This week I ran into something interesting on my current project. In the project, we have a PostgreSQL datamart where we store a ton of data generated from a machine learning model.
Data is added to the database every time a run finishes, and each run contains hundreds of thousands of entries. On top of that, we do around 200 runs per day, which adds up to at least 20M rows per day. Ouch.
Sometimes our favorite tools make things so simple by abstracting away the technical implementation that we lose track of what is actually going on behind the scenes, which causes us to implement things incorrectly and introduce unintended bugs in our code.
I ran into one of these bugs a few days ago with Celery. I had a Celery task that accepted a deserialized YAML file as a dictionary and then modified some...
The stability of your asynchronous background tasks is crucial for your system design. When you move processing work out of the main application and leverage something like Celery to execute it in the background, it’s important that you can feel confident those tasks get executed correctly without having to babysit them and wait for the results.
There are generally two things that can go wrong when you send a task to a Celery worker to process...
These days, many of the large tech companies that people aspire to work for follow a very similar recruitment process, split into multiple steps that gradually filter the applicants down to a few finalists.
This process usually looks something like this:
Coding Challenge (Either live or using online tools)
Partner/Manager/Team Leader Interview
The steps can vary slightly, but in almost all cases they start off by...
Reacting to calls to Celery tasks is one of the first things you will want to dig deeper into as soon as you start scratching the surface of Celery. How do you react when a task finishes or fails, and then trigger some other code when these events occur?
For example, I was working on a project recently where we leveraged Celery to do asynchronous communication between services in a distributed system. We had multiple applications that were...
Parsing file paths, web addresses or file names is something that we have to come back to over and over again in different situations. It could be that you want to validate the file extension of a file name, or perhaps that you want to get the hostname from a full URL.
There are plenty of different methods to break a path string down into smaller components so that you can get the information you need. A lot...
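As a rough illustration of the two cases mentioned above (checking a file extension and pulling the hostname out of a URL), here is a minimal sketch using only Python’s standard library. The helper names `file_extension` and `hostname_of` and all sample inputs are made up for this example; it is one possible approach, not necessarily the one the full post describes.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def file_extension(file_name: str) -> str:
    """Return the lowercased final extension of a file name, e.g. '.csv'."""
    return PurePosixPath(file_name).suffix.lower()


def hostname_of(url: str) -> Optional[str]:
    """Return the hostname of a URL, or None if the string has no network location."""
    return urlparse(url).hostname


# Hypothetical sample path, used purely for illustration.
sample_path = PurePosixPath("/data/reports/2020/summary.tar.gz")

file_name = sample_path.name          # last component of the path
extensions = sample_path.suffixes     # every extension, in order
parent_dir = str(sample_path.parent)  # everything before the file name

print(file_name, extensions, parent_dir)
print(file_extension("Report.CSV"))
print(hostname_of("https://www.example.com:8080/blog/post?id=1"))
```

`PurePosixPath` only manipulates the string and never touches the file system, which makes it a reasonable fit for validating user-supplied file names. `urlparse` splits a URL into its components, and its `hostname` attribute drops the port and lowercases the host.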
Personally, I feel that code reviews have been one of the key tools in my toolbox that have helped me develop the most within the past few years.
Not only have they helped me and my team create a high-quality, consistent and easy-to-use code base, but they have also given me great perspective and insight into what I can do better, and how I can take the next steps toward becoming a better software...
Scaling Django in production to serve your application to thousands of visitors is one of the most popular topics that people ask about in the Django community. I can’t even count the times I’ve seen related questions in Facebook groups, on Stack Overflow and in other discussion forums.
Yes, Django scales incredibly well, and we can see that from use cases such as Bitbucket, Quora, Instagram, Disqus and other services that use Django to...
Believe it or not, during the past few months I’ve managed to delete the database of this website not just once but twice (almost three times!). Imagine that: how can someone be so clumsy as to delete their whole database?!
Well, at least I’ve learned my lesson. Obviously, I didn’t just write DROP DATABASE or anything like it by accident. What actually happened was that I accidentally reprovisioned my database instance, which recreated a fresh version with a...
During the last few months, I’ve been working on a project for one of the largest retailers in the world where we use machine learning and data science to help them predict future sales.
This global company has a huge amount of data, and one of the trickiest parts of the project is gathering all the data from their stores, users, and markets. Some data is accessible from APIs while other data must be manually uploaded in Excel or...