Are complex algorithms relevant in the real world?
I have been asked this in a variety of tones: some curious, some anxious, and some condescending. Let’s see…
Scene 1

Morgan Stanley, November 2014.
My friend was drawing on the board while I looked on. We were on different teams, each assigned a project. His program took 5 hours to execute; the requirement said one hour. As the ‘algorithm expert’, I was called in to help.
The program had to find the friends of all users in a social network. After explaining the use case, my friend showed me his approach. I was determined to find an optimization, and I did. Immediately.
“An adjacency matrix?! No, use an adjacency list. The execution time will reduce in your case to O(N). Or you could use an adjacency tree! That would – blah blah….”
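My suggestion boils down to this sketch (Python, with a made-up five-user network; the original program's language wasn't part of the story). Listing every user's friends from a matrix means scanning N cells per user, O(N²) in total; an adjacency list stores only the real friendships, so the same walk is O(N + E):

```python
# Hypothetical five-user network; user IDs are simply 0..4.
N = 5
friendships = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]

# Adjacency matrix: N*N cells, scanned row by row -> O(N^2) to list everyone's friends.
matrix = [[False] * N for _ in range(N)]
for a, b in friendships:
    matrix[a][b] = matrix[b][a] = True
friends_matrix = {u: [v for v in range(N) if matrix[u][v]] for u in range(N)}

# Adjacency list: only actual edges stored -> O(N + E) for the same walk.
adj = {u: [] for u in range(N)}
for a, b in friendships:
    adj[a].append(b)
    adj[b].append(a)

# Both representations describe the same network.
assert friends_matrix == {u: sorted(adj[u]) for u in range(N)}
```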
I ran some calculations and guaranteed that the program could not take more than a second. My friend hurried to his computer, and I got back to my project. He came back the next day: the program had taken 5 hours again.
There had to be a bug in his code. Looking into it, I found the problem. And it wasn’t a bug at all.
“You are making a DB call for each user?”
“Where is your DB?”
“California, I think.”
Each DB (database) call was traveling halfway around the world to query the database before heading back. Every call took 5 seconds, and there were 1700 users. To top it all off, the DB gateway had grown suspicious of the large number of requests from a single client and was throttling them.
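The back-of-the-envelope math alone explains most of the runtime, before throttling pushes it toward the observed 5 hours:

```python
users = 1700           # users needing a friends lookup (from the story)
seconds_per_call = 5   # one cross-country round trip per user

total_seconds = users * seconds_per_call
hours = total_seconds / 3600
print(hours)  # ~2.4 hours of pure network waiting, one call at a time
```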
“A stored procedure will do the job. You can code the algorithm into it.”
A stored procedure, as the name suggests, is a piece of code stored in the database. It executes within the database and returns the results in a single call, avoiding all the back and forth.
Within a few hours, his team was celebrating. Their program’s execution time had dropped to 12 minutes.
Scene 2

My own project was client-facing. What little backend work there was lived in a script written by the project manager.
From the start, the code struck me as disproportionately long. 300 lines to make a simple DB call? Here is what it did:
- Query the entire database table.
- Store the results in a map.
- Filter them on a date range.
- Sort them by date.
I changed it to:
- Query the table on the date range directly, with the results ordered by the database and returned in a list, and only the relevant columns sent back.
It looks like I have just squashed the previous list into a single line, but that makes a big difference. The sorting and filtering now happen at the DB layer, and the network carries far less data. The program’s speed went up by 400%.
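The change is easy to show with Python's built-in sqlite3 (the table and column names here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, day TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "2014-11-03", 10.0), (2, "2014-12-01", 20.0),
                  (3, "2014-11-20", 30.0), (4, "2015-01-05", 40.0)])

# Before: pull the whole table over the wire, then filter and sort on the client.
rows = conn.execute("SELECT * FROM orders").fetchall()
in_range = [r for r in rows if "2014-11-01" <= r[1] <= "2014-11-30"]
before = sorted(in_range, key=lambda r: r[1])

# After: let the database filter, sort, and project; only relevant columns come back.
after = conn.execute(
    "SELECT id, day FROM orders "
    "WHERE day BETWEEN '2014-11-01' AND '2014-11-30' "
    "ORDER BY day").fetchall()
```

Both versions produce the same November rows; the second just does the work where the data lives.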
I soon realized: it didn’t matter. Our data set was small, requiring no optimization. The previous script ran almost as fast, and I never mentioned the changes to the project manager.
Scene 3

I am now studying system design in detail: how organizations write software that handles millions of users. Specifically, how large organizations stay available, reliable, and fast despite the enormous load on their machines.
While going through eventual consistency, consistent hashing, and the like, I started appreciating algorithms. Algorithms enabled these systems; they were saving millions of dollars for firms worldwide. And algorithms let me understand and appreciate the complexity of these systems.
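Consistent hashing, for one, fits in a dozen lines. A minimal sketch (server names made up; real systems add virtual nodes for better balance): servers and keys are hashed onto a ring, and each key lands on the next server clockwise, so removing a server only remaps the keys that lived on it.

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    """Stable hash onto the ring (md5 here; any stable hash works)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Minimal consistent-hash ring: a key maps to the next server clockwise."""
    def __init__(self, servers):
        self.ring = sorted((ring_hash(s), s) for s in servers)

    def lookup(self, key):
        positions = [p for p, _ in self.ring]
        i = bisect.bisect(positions, ring_hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["server-a", "server-b", "server-c"])
# With `hash(key) % n_servers` instead, removing one server would
# reshuffle almost every key; here only server-c's keys move.
```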
“You rarely code complex algorithms in the real world” is a gross simplification. Most problems in the world are easy to solve. A more accurate statement would be: “You will rarely face a complex-solvable-monetizable problem in the real world”. And that is, of course, true. But the number of adjectives in that statement makes it trivial.
Scenes 1 and 2 encourage me to:
1. Not jump to conclusions.
2. Remember that optimizations need not be algorithmic.
3. Keep the bigger picture in mind.
But Scene 3 reminds me that, all around the world, we are solving difficult problems through study and ingenuity.
Thanks for reading!
I have changed some of the project details in this post to avoid revealing private data.