Why code idempotence is essential in workers: a case study

Hello everyone! Today's article is a short but important case study on how a lack of idempotence in your code can cause you endless headaches.

What is code idempotence?

Simply put, idempotence is the property that running a piece of code once or many times always yields the same result; repeated executions don't change the outcome beyond the first one.
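
As a tiny illustration, using hypothetical code rather than anything from the case study below: assigning a value is idempotent, while appending one is not.

user = { email: nil, log: [] }

# Idempotent: running this line once or a hundred times leaves
# user[:email] in the same state.
user[:email] = "amr@example.com"

# Not idempotent: every extra run appends another entry, so the
# result depends on how many times the line executed.
user[:log] << "signed_in"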

Not having idempotence in your code can lead to headaches and debugging sessions that we could have prevented in the first place by implementing it right.

[Case Study] A worker bulk writing to the database

Imagine this: a worker takes in a CSV URL, downloads the CSV and parses it. After parsing, it iterates over each row of the CSV and inserts that row into the database.

The example code below is written in Ruby using Sidekiq, a background job processor.

class InsertBulkCSVJob < ActiveJob::Base
  queue_as :bulk_csv_queue
  sidekiq_options retry: 5

  def perform(csv_url)
    csv = parse_csv(csv_url)
    csv.each do |row|
      # logic to insert the row into the database
    end
  end

  def parse_csv(csv_url)
    # downloads the csv, parses it and returns it
  end
end

Unexpected failures might hit this worker and prevent it from continuing. For example:

  1. One record might fail to insert (a violated constraint, a missing required value, etc.)

  2. The worker process might get terminated (the host machine dies, the process crashes, etc.)

If your worker has a retry policy, the job will be retried once things are healthy again. On retry, it downloads the CSV once more and starts inserting the rows into the database from the beginning.

But what if we had already made some progress in our CSV? Let's say we had already inserted 20% of the rows; retrying will cause data duplication, where that first 20% gets inserted twice. Writing code with this scenario in mind makes a huge difference and saves you a lot of time down the road.
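
To make that concrete, here is a hypothetical sketch (the rows and the crash point are made up) of how a plain retry duplicates data:

rows = [{ name: "a" }, { name: "b" }, { name: "c" }, { name: "d" }]

# First attempt: inserts "a" and "b", then the process dies.
# Sidekiq retries the job, which starts from the top again,
# so "a" and "b" end up in the table a second time.
rows.each do |row|
  OurModel.create!(row) # no memory of how far the previous attempt got
end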

How can we improve the code written above?

The simplest way to approach an issue like this is by tracking our progress. For example, we can store a key named after the job id in Redis and increment it every time an insertion succeeds. Let's take a look at the code below.

class InsertBulkCSVJob < ActiveJob::Base
  queue_as :bulk_csv_queue
  sidekiq_options retry: 5

  def perform(csv_url)
    csv = parse_csv(csv_url)
    insert_bulk_events(csv)
  end

  def insert_bulk_events(csv)
    redis = Redis.new

    # Create the progress counter only if it doesn't already exist,
    # so a retry doesn't reset it back to 0.
    redis.setnx(job_id, 0)

    # Resume from the first row that hasn't been inserted yet.
    (redis.get(job_id).to_i...csv.length).each do |i|
      OurModel.transaction do
        OurModel.create!(csv[i])
        redis.incr(job_id)
      end
    end

    # The job finished, so the checkpoint is no longer needed.
    redis.del(job_id)
  end

  def parse_csv(csv_url)
    # downloads the csv, parses it and returns it
  end
end

  • If we look at the insert_bulk_events method, we start by writing a key (the job_id) with an initial value of 0 to the key-value store. SETNX means "set if not exists", so it won't overwrite the value if it's already present; there is a short snippet after this list showing how these commands behave.

  • When the job fails, Sidekiq retries it under the same job_id instead of issuing a new one, which is very handy in our case.

  • We start iterating over the CSV from the starting index, which is the current value of the key we just set.

  • We wrap the insertion together with the increment in one transaction to prevent unnecessary increments: if create! raises, the increment never runs and the row is rolled back.

  • Finally, we delete the key once we finish and the job reaches its end.
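
For reference, this is roughly how those three commands behave with the redis-rb client (the key name here is made up):

require "redis"

redis = Redis.new

redis.setnx("job-123", 0)  # => true: key created with value 0
redis.setnx("job-123", 99) # => false: key exists, value stays 0
redis.incr("job-123")      # => 1: atomically increments the counter
redis.get("job-123")       # => "1" (Redis returns strings)
redis.del("job-123")       # => 1: key removed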

This way, if the job has to retry for any reason, it picks up from the first index that still needs to be inserted. However, it's still prone to some failure modes, for example if the key gets deleted from Redis for any reason. Another downside is the number of extra write commands being made, one per row, but that's beside the problem we're discussing.
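If that write volume ever did become a real concern, one option, sketched here as an idea rather than part of the solution above (it reuses redis and job_id from the job), is to checkpoint every N rows instead of every row:

BATCH_SIZE = 100

start = redis.get(job_id).to_i
csv[start..].each_slice(BATCH_SIZE) do |batch|
  OurModel.transaction do
    batch.each { |row| OurModel.create!(row) }
  end
  # One Redis write per batch instead of per row. The trade-off: a
  # crash between the commit and this increment can replay up to one
  # whole batch on retry.
  redis.incrby(job_id, batch.size)
end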

We can see how writing code with idempotence in mind can make a huge difference in how we approach a solution. The same goes for related concerns like concurrency and race conditions. Having even 1% doubt that something like this might happen helps improve your code's resilience a lot. That said, depending on the use case it can be a double-edged sword, because you don't want to over-engineer and overcomplicate things.

This was a quick article discussing my thoughts on a case that I found interesting. Stay safe, and till the next one!
