400

October 27th, 2021 × #webdev#horror#halloween

Horror Web Dev Stories - 2021

Scott and Wes share horror stories submitted by developers about production bugs, data loss, performance issues, and more.

or

For episode 400, Scott and Wes talk about web dev horror stories - 2021 edition!

LogRocket - Sponsor

LogRocket lets you replay what users do on your site, helping you reproduce bugs and fix issues faster. It's an exception tracker, a session re-player and a performance monitor. Get 14 days free at logrocket.com/syntax.

Mux - Sponsor

Mux Video is an API-first platform that makes it easy for any developer to build beautiful video. Powered by data and designed by video experts, your video will work perfectly on every device, every time. Mux Video handles storage, encoding, and delivery so you can focus on building your product. Live streaming is just as easy and Mux will scale with you as you grow, whether you're serving a few dozen streams or a few million. Visit mux.com/syntax.

Linode - Sponsor

Whether you’re working on a personal project or managing enterprise infrastructure, you deserve simple, affordable, and accessible cloud computing solutions that allow you to take your project to the next level. Simplify your cloud infrastructure with Linode’s Linux virtual machines and develop, deploy, and scale your modern applications faster and easier. Get started on Linode today with a $100 in free credit for listeners of Syntax. You can find all the details at linode.com/syntax. Linode has 11 global data centers and provides 24/7/365 human support with no tiers or hand-offs regardless of your plan size. In addition to shared and dedicated compute instances, you can use your $100 in credit on S3-compatible object storage, Managed Kubernetes, and more. Visit linode.com/syntax and click on the “Create Free Account” button to get started.

Show Notes

02:54 - Hi guys, love the show. I wanted to share with you something that happened just the other day (Oct 4th), I was starting my new job today at a large tech company. They use React for everything (even DNS!, don't ask me how, it's complicated). I figured I'd celebrate my first day and push some code to prod, (how hard could useEffect be right?) Next thing you know, they ended up bringing in a guy with an angle grinder to get access to the server cage.

04:15 - No one from Denver can buy

06:38 - Bug accidentally gives $90 million to users

08:34 - Share Pointy Knives

  • Hi! I'm a developer at a consulting firm in Sweden, writing C# on the backend and using React with either JavaScript or TypeScript and hosting things in Azure 99% of the time (and 1% in SharePoint). I was in my last week at my last job before I was due to start my new job. Worked 12 h/day to keep up with all the handovers etc. to colleagues so they would have a chance to continue working on the solutions I have taken care of. One project was a process tool hosted in SharePoint Online. The guy who would oversee it had -1% experience with SharePoint (which I pointed out to my bosses). But to make things a bit easier, I wrote a deploy script to ease things a bit. Starts the terminal and runs the script towards the acceptance environment. Umpteen million errors appear... Which is strange, because there would only be about 20 commands (which can cause errors like these). I log into the environment to double check if I now accidentally entered the wrong values in the script (which looked okay according to me). But I get a 404 error when I try to reach the environment... I log into the admin interface; I discover that the site is gone... Also checking the trash can, there are no things there. Very strange. I find that I'm in a different folder than the one where I saved my script... In that folder there is an old deploy script that was used when the project was started a thousand years ago (which was not used after the project was "finished"). The first thing the script does is force delete the site and then try to create a new empty site... The site is gone with lists and everything (lists are a SharePoint thing, think of it as sql-lite), there are no backups of the acceptance environment (although it is very important). I just feel a little panicked about how I'm going to solve this. However, I remember testing a tool six months ago to copy entire environments. Where the first attempt was made on the acceptance environment. Finds the cloned environment and can use the same tool to clone it back. It took only 8-12 hours of work to create all the new things done in the environment in the last 6 months instead of X number of hours to build everything from scratch.
  • Once I updated a feature that saves accessories on orders (same solution). However, I failed to add all the new fields to the production environment. Which meant that accessories were not saved at all... Which was discovered after a week... I fixed the error in 5 minutes and the sellers had to contact x number of customers to double check what kind of accessories they would have for their orders...

11:22 - External HD

  • One time I needed to format a server. It was an outdated Windows server. I selected all the files and copied and pasted to an external hard drive. My drive was pretty fast and it took like a minute. I was like: “Wow! That’s a great external hd”. Formatted the server and, as soon as I realized it didn’t copy 10% of the files, I had that face. We all know that face. Anyways. Tried to restore the files using some HD recovery tools but they were all corrupted, not by the formatting itself but for the installation of the new OS. My boss was pissed! I was very young so I blame it on the server. I’m not proud of it. But why the heck they would ask a developer to format a server in the first place? By the way, my birthday is on Halloween. Spoooky.

13:07 - Hey Loser

  • I was testing new code to automate mass-mailings to our customers. Who knows what demon drove me but I wrote the “test” mailings like ransom notes: “Dear loser! Fork over all your $$$ or else!” Well, all was looking great and I wa s feeling pretty pleased with myself. Progress bars were sliding and counters were spinning. But I could hear a rising commotion from the marketing guys behind me. Phones ringing, voices raised. Turns out I had moronically wired myself to the production database! Even worse for me, I’d only been at the company a month or two. I thought my goose was cooked and the Big Boss was plenty mad, but I owned up right away and apologized. We put out a cover story that we’d been hacked and all was forgiven.

15:01 - HE HATE ME

  • I was part of the developer team that accidentally leaked the 8 cities the XFL, an alternate football league, a week before their press conference. ewrestling.com/article/wwe-ac… We were using Contentful and Gatsby. A junior dev entered the information into the prod space instead of the UAT space and when we released some bug fixes, it picked up the contact us content update. I found out after seeing stories pop up in Google News when I was about to go to sleep. Was taking the content down when we started getting calls from the CIO of the WWE. The league went bust because of COVID.

19:23 - I Don't Have Memory of This

  • I had two pretty bad code changes that only showed their problems when they went live in production. Around 6 years ago, I was running into a large performance issue with some of our queries running slowly against this giant DB. We were using JPA/Hibernate and we had a bunch of joins that were done lazily. I switched a few of them to eager so that they would create a single SQL statement instead of a bunch (or thousands). The change worked fine on my dev environment, QA, and staging. Staging was supposed to be representative of production. So we went live and within minutes the entire system went down because of out of memory errors. We quickly switched back to the lazy joins. We found out that staging had more memory and fewer DB records than production though they were supposed to be exactly the same.

21:05 - Your Performance is Slowing us down

  • Back when VMWare was becoming a thing, like 2010 or so. I was working at an ecomm site and we were seeing slow performance between the app server and some data services. I decided to build a little multithreaded logger that could track when a query to Oracle Financials was running too slow and generate a warning. Oracle Financials was doing the credit card transactions, orders, and all the rest of the sites DB work. The code had no impact on my dev, QA, and staging environments. We were hitting well over our minimum number of concurrent users. We deployed it to production and then the system got slower and slower, but never crashed. Again, production and staging were set up differently. Staging was a bare-metal server. Production was running on an ESXi server on a host that was split 4 ways. The multi-threaded code meant to detect performance degradations was slowing the whole system down when it tried to synchronize data across threads. I was pretty embarrassed by both these two issues. It went to show that production is its own special thing and that you really don't know if your server-side code is really going to work until it starts running there.

23:15 - Dead Button

  • Way back when mainframes were king, a guy I worked with pushed a button in, that if released, would immediately take down the entire company. He stood there for 4 hours, holding the button in, until we could let it crash after business hours. We gave him a chair after 2 hours.

25:12 - No Deploys on Fridays

  • I was a junior dev working on our company's website. They were HTML + nunjucks templates that were later being integrated with the backend using some Python witchcraft. There was also a metric ton of JS libraries added (like Babra for page transitions, threejs for a cool interactive animation on the landing page etc.). Didn't really get much of all this package.json stuff at that seniority level. So after running yarn or npm or whatever, and seeing some warnings about a couple packages being outdated, I decided to update some of them. It ran great locally, but I didn't build the prod version, as I didn't know there could be any differences. I was working on some minor feature (or maybe even some minor bug) and the PM decided there's no time for code review. So I pushed it to the repo, the backend guy did his integration, and launched it on prod. As it turned out, there were some breaking changes in one of the libraries I decided to update. It crashed the entire site. On Friday. At 4:30PM. And that, kids, is why you don't deploy on Fridays.

27:33 - Stupid Selfie

  • Horror story for you Wes. I work for one of the biggest retailers in the UK and we were working on an app that would go on a 'media wall' in their flagship store in London. Basically a giant 200-inch screen in the middle of the store that social content can go on. Turns out that I left my local Dev version connected to the production API when I uploaded a couple of stupid selfies of my big head in the office. Get a call the next day to ask why my face is on the medial wall.

28:37 - Soda

  • I was a computer operator back in the late 1960’s, operating a Honeywell mainframe. The consoles were huge, about the size of a dishwashing machine, with the console typewriter and printer inset in the middle, on top. I had a soft drink on the console, next to the typewriter mechanism. We were told never to bring a drink into the room but we all did it, especially on third shift. Long story short, someone called my name, I turned around and knocked the glass of soda into the console. Had to be completely replaced – machine was down for two days. My boss was not happy.

31:22 - Oof

  • A bigger horror story. I had my own software company in the 90’s and was in Singapore, customizing my software package for Johnson & Higgins Insurance Brokers – I had their Asian contract for my Insurance Broker/Accounting package. I spent a good 40 hours on Saturday and Sunday, making all the changes they asked for, getting ready for a demo on Monday morning. I finished up about 4am on Monday morning and was cleaning up my files. All this work was done on a Novell server. Print files had an extension of .prt and I had a ton of them in the main directory from all of the testing I had done. I was cleaning out old files, getting ready to back everything up and I thought I would delete all of the print files. I mistakenly keyed in erase *.prg, instead of erase *.prt (or whatever the delete command was – can’t remember it now). Programming files have a .prg extension – I had deleted all of my updated files from the weekend. In desperation I called Novell in Utah, hoping they could help me recover the files, but no-go. The demo Monday morning was not fun.

33:24 - Young Dev

  • I was a young dev right out of college. My first job was at a child support company where we had desktop apps that would handle case information more efficiently than using Excel. My first project was to write a POC that would later be implemented into a new, bigger app that consolidated all the “POCs” for various parts of the child support process. For some odd reason, I still don’t know why to this day, my boss wanted me to write this “new” app on top of an old app with a bunch of legacy code. I never understood why but as a young dev fresh out of school, you tend to just do what you’re told. In school, I mainly used PHP/HTML/CSS for learning how to work with a database; this job however used C#/.NET for their desktop apps so I was doing a lot of learning as I went. I remember finally learning how to connect to the database and run some SQL after fighting with this old pile of legacy code. In early versions, I chose to handle creates/updates for these records in the same function. My young, dumb self wrote a try catch statement that would attempt to create the record and if it failed, it would try to update the record. Before the first production release, I updated the flow to handle creates/updates in separate functions - but never removed the update in the catch block of the original function now used for creates only. Somehow I, or any PM/QA, never failed on a create and hit this catch block while testing. Fast-forward probably 9-12 months later, I got a ticket to investigate why every case’s data looked the same in Production. I login to the app, search a few case numbers and sure enough, every case’s data is the same. I began freaking out as I had no clue how this could’ve happened. I mean it had never happened in all the dev work, testing, and months of live Production use. After I investigated with a senior dev, we realized the try block had failed and the update query in the catch block ran for that record - we also realized that I left off the where clause in the related SQL query to specify which record needs updating - so ALL records got updated with this data. Thankfully, we kept regular back-ups and were able to restore the data to a recent timeframe without users losing a ton of work. We commented out that database update call and redeployed the code ASAP. Also the senior dev was cool about it and was like “hey, it happens to all of us at some point”. Let’s just say I’ve learned a ton since then and definitely steer clear from writing code like that. You live and you learn I suppose.

38:40 - Where Wolf

  • Here’s my development tale of terror: One night I was burning the midnight oil trying to get caught up on a never-ending workload. At the time I was working for an online travel booking site. It was after 11, and the last thing I had to do for the night was to rename one of the hotels in our production database. So I wrote my query: UPDATE hotels SET name=‘Some Hotel Chain’; One problem, I FORGOT THE WHERE CLAUSE. Suddenly, over 5,000 hotels in our production database all had the same name. This was around 2003, so well before the time of point-in-time restores, and we were only backing up the database every week at that point. I was panicking. Fortunately, I had a dump of the production database that I had created only a couple of hours earlier sitting on my local hard drive. So thankfully, I was able to restore almost all of the hotel names, save for a couple that signed up after that data dump, and my boss was none the wiser. That’s when I learned that working late hours is not worth it, because at some point you are so tired that you can no longer make good decisions.

41:19 - I Want Your Job

  • When I first started out I worked for a consultancy and they trained us in sales meetings to help managers get promoted because we were coming in to make them "look good". This was okay b/c obviously, we were coming in as a contractor; however, after being laid off due to 9/11 (yes, this was about 20 years ago), I was looking for a new job and during an interview when asked where I'd like to be in X years, I mentioned to the hiring manager that I wanted to eventually do what he was doing. Well, I guess he didn't take it that I wanted to make him get promoted to then take his spot. Safe to say I didn't get hired. 🤦‍♂️😜

××× SIIIIICK ××× PIIIICKS ×××

Shameless Plugs

Tweet us your tasty treats!