• by tartuffe78 on 7/20/2016, 1:03:23 PM

    One of my most memorable ones:

    We were developing an Android app that involved a custom camera implementation. In the camera we were getting the actual frames (about 15 fps was all we could manage on most Gingerbread devices) and saving them out in a custom file format.

    We were dealing with lots of device-specific camera issues (HTC had one device with 2 back-facing cameras that gave us a lot of trouble), and one day our QA person came up to us and said: "It's working fine on this device, except it crashes when I try to film out the window."

    We were already feeling stressed about all the issues and didn't believe her. We figured it was just some race condition, a correlation != causation thing. We asked her to show us, and sure enough, time after time you would turn the camera to face the window and it would crash.

    We showed our manager and the designers, because we couldn't believe how weird it was.

    The cause and fix turned out to be simple enough. We kept the lights off in the office for the most part, and the camera frame rate jumped way up when the scene was bright. When the camera turned to the window, the fps went above anything we had seen before, and our app threw an OutOfMemoryError because we weren't processing the frames fast enough and they were stacking up.
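
    A minimal sketch of one way to guard against this kind of backlog: a bounded buffer that drops the oldest frame when the consumer falls behind. This is not the fix described above (the post doesn't say what it was), and it is written in Python rather than Android Java to keep it short. The `FrameBuffer` class, the `simulate` helper, the capacity of 4, and the 30 fps vs. 15 fps rates are all assumptions for illustration.

```python
from collections import deque


class FrameBuffer:
    """Bounded frame buffer: when full, the oldest frame is dropped."""

    def __init__(self, capacity):
        # deque(maxlen=...) evicts from the left when a new item is appended.
        self._frames = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, frame):
        # Count the eviction that append() is about to cause.
        if len(self._frames) == self._frames.maxlen:
            self.dropped += 1
        self._frames.append(frame)

    def pop(self):
        # Returns None when there is nothing to process.
        return self._frames.popleft() if self._frames else None

    def __len__(self):
        return len(self._frames)


def simulate(ticks, capacity):
    """Producer pushes every tick; consumer only keeps up every other tick.

    This mimics a camera that suddenly delivers twice as many frames as
    the writer can save (hypothetical 30 fps in, 15 fps out).
    """
    buf = FrameBuffer(capacity)
    processed = 0
    peak = 0
    for tick in range(ticks):
        buf.push(tick)
        peak = max(peak, len(buf))
        if tick % 2 == 1 and buf.pop() is not None:
            processed += 1
    return buf, processed, peak


if __name__ == "__main__":
    buf, processed, peak = simulate(ticks=60, capacity=4)
    print("processed:", processed, "dropped:", buf.dropped, "peak:", peak)
```

    With an unbounded list in place of the bounded deque, the backlog in this simulation would grow for as long as the camera stayed pointed at the window; here memory use is capped and the cost shows up as dropped frames instead.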

    It drove home the importance of real world testing to us.

  • by s3b on 7/20/2016, 6:36:24 AM

    The Apple Graphing Calculator story - http://www.pacifict.com/Story/

  • by kim-toms on 7/23/2016, 1:56:42 AM

    Once, long ago, I worked at a company that made hardware that plugged into a Unibus backplane. This backplane was used by DEC (an early minicomputer maker) systems. The backplane, much like those of today, was a series of card-edge connectors.

    We had a product which communicated across this backplane using programmed I/O with a driver installed into the kernel of the machine (running RSTS/E).

    We'd developed a new version which was able to use DMA to directly access the buffer memory of the system processor, which was pretty small, only 256K bytes.

    We had a system we tried to install this on in England, and there was some problem we could not figure out. It just wouldn't work. Now, the installation procedure involved the backplane intimately. In those days, the backplanes weren't a printed circuit card, but rather a series of connectors connected by wire-wrap wire. To install a DMA card, you had to unwrap the DMA grant line from the backplane. We even had special cards to insert which jumpered it back if you needed to remove one of the DMA cards. After several days of trying various things, we gave up.

    About 11 years later, the installer and I were talking over beers, and we finally figured out the problem. Of course, by that time we were both working at a new company. The issue was that the grant line was damaged or removed in another slot of the backplane, but wasn't correctly jumpered back. When they moved around a card to 'test' the slot, it didn't use DMA, and so worked fine.

  • by malux85 on 7/20/2016, 9:58:15 PM

    Room for a DevOps story?

    I was managing a very busy Cassandra cluster of 14 nodes. These machines had 16GB of RAM and, as a very horrible hack for OOM problems, 8GB of swap space too. With Cassandra, you always have to keep at least as much disk space free as the size of your largest table, because during "major compactions" the table may have to be rewritten.

    One of these machines was desperately running out of disk space, so I turned the swap space off to claim the extra 8GB (I just had to get the compaction out of the way, then these machines would be upgraded and this all wouldn't be a problem anymore)

    So turning off the swap space, I could see the kernel moving data back into RAM, and I was also watching the diskspace fill up from the compaction. They were both going at a fairly linear rate, but the system was going to run out of disk space BEFORE the swapfile was released.

    But not by much: the system would run out of disk space about 30 seconds before the swap space would be free. Now, I knew that this Cassandra configuration was set so that a node wouldn't be considered "dead" unless it was out of communication with the cluster for longer than 10 seconds, so I used KILL to freeze the Cassandra process a few times, but never for longer than 10 seconds.

    I was able to freeze the process enough that the swap space was freed before the disk space ran out, and the node communicated with the cluster enough to remain "active".
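
    A rough sketch of the freeze trick, for anyone curious. The post only says "KILL"; this assumes it means sending SIGSTOP and then SIGCONT (what `kill -STOP` and `kill -CONT` do on POSIX systems), since SIGKILL would terminate the process instead of freezing it. The `pause_process` helper and its `max_pause` parameter are hypothetical names, and the 10-second default just mirrors the timeout mentioned above.

```python
import os
import signal
import time


def pause_process(pid, pause_seconds, max_pause=10.0):
    """Freeze a process briefly, then let it continue (POSIX only).

    Refuses to pause for max_pause seconds or longer, so the process is
    never silent for longer than the cluster's failure timeout
    (assumed here to be 10 seconds).
    """
    if pause_seconds >= max_pause:
        raise ValueError("pause must be shorter than the failure timeout")
    os.kill(pid, signal.SIGSTOP)      # freeze: process stops being scheduled
    try:
        time.sleep(pause_seconds)
    finally:
        os.kill(pid, signal.SIGCONT)  # always resume, even if interrupted
```

    Usage would look something like `pause_process(cassandra_pid, 8)`, repeated a few times with gaps in between so the node can talk to the cluster, where `cassandra_pid` is whatever PID the process has on that machine.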

    Lesson Learned: These machines were severely underspec'd!

  • by LarryMade2 on 7/20/2016, 12:29:56 AM

    Scratch Monkey - It's a sad classic of computing lore. Here is a brief and less harrowing version of it: http://www.catb.org/jargon/html/S/scratch-monkey.html

  • by Rannath on 7/20/2016, 7:20:55 PM

    I was recently doing some Win32 coding and getting an odd access violation that hit JUST before WinMain/main exited, only on x64 and only in Release mode. I deleted/un-deleted nearly every bit of code to find my problem. Turns out I had the wrong string going into UnregisterClass in my window handling class. Took me ~4 hours to find the bug and all of 10 seconds to fix it. Always double & triple check when copying & pasting.

    Now I'm using SDL, since it's just a little hobbyist game. :) Also don't reinvent the wheel. Especially if someone else is offering free wheels.

  • by Fatters on 7/20/2016, 8:43:22 AM

    There are a couple of great ones from Dave Baggett during the development of Crash Bandicoot.

    https://www.quora.com/How-did-game-developers-pack-entire-ga...

    https://www.quora.com/Whats-the-hardest-bug-youve-debugged