Wednesday, November 6, 2019

Reliable queue

During research for my current project, I found an exciting topic.
Instead of going with the usual solutions such as celery, RQ, rabbit, or zeromq, I discovered a post about building reliable queues using just redis.

The solution itself is quite straightforward.
You push incoming tasks to a redis list with [l|r]push, and workers use [b]rpoplpush to retrieve a task and atomically move it to a processing list.
N workers consume that list, and you obtain your desired result.
Each worker has two simple steps: set a lock (setex) and run the task.
A monitor evaluates the locks, and if a task does not finish before its lock expires, it pushes the task back onto the queue to be rerun.
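
Here is a minimal sketch of the producer and worker halves, assuming redis-py; the list names (tasks:pending, tasks:processing) and the run_task handler are made up for illustration:

    import json
    import redis

    r = redis.Redis()

    PENDING = "tasks:pending"
    PROCESSING = "tasks:processing"
    LOCK_TTL = 60  # seconds the worker gets before the monitor may re-queue the task

    def produce(task_id, payload):
        # Producers push incoming tasks onto the pending list.
        r.lpush(PENDING, json.dumps({"id": task_id, "payload": payload}))

    def work():
        while True:
            # Atomically move one task from pending to processing.
            raw = r.brpoplpush(PENDING, PROCESSING, timeout=5)
            if raw is None:
                continue
            task = json.loads(raw)
            # Step 1: set a lock with a TTL (setex).
            r.setex(f"lock:{task['id']}", LOCK_TTL, raw)
            # Step 2: run the task.
            run_task(task)  # hypothetical task handler
            # On success, drop the lock and remove the task from processing.
            r.delete(f"lock:{task['id']}")
            r.lrem(PROCESSING, 1, raw)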

This pattern provides several good points:
  • I can scale redis.
  • I can put up to n workers to consume.
  • The monitor can take care of tasks that do not complete before they expire.
  • Messages can indicate how long they take.
  • If the worker crashes, the monitor can put the task back in the queue (sketched after this list).
  • Tasks are in memory.
The bad points I foresee:
  • If the monitor crashes and does not catch up with redis, it is a potential disaster.
  • If redis crashes, everything is lost. I need to work on this point.
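
A minimal sketch of that monitor, under the same assumptions as the worker sketch above: it walks the processing list and re-queues any task whose lock has expired, which covers workers that crashed or hung.

    import json
    import time
    import redis

    r = redis.Redis()

    PENDING = "tasks:pending"
    PROCESSING = "tasks:processing"

    def monitor(interval=10):
        while True:
            for raw in r.lrange(PROCESSING, 0, -1):
                task = json.loads(raw)
                if not r.exists(f"lock:{task['id']}"):
                    # The lock expired before the worker finished:
                    # push the task back to pending and drop it from processing.
                    r.lpush(PENDING, raw)
                    r.lrem(PROCESSING, 1, raw)
            time.sleep(interval)

One gap this sketch does not close is the small window between the brpoplpush and the setex; if the monitor runs in exactly that window, it could re-queue a task a worker is just about to lock.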

Besides those points, there is a fundamental constraint.
I'm running on ECS again, and the largest instance my client allows me to use is a T2.small.
At one point, I was allowed to add A2.xlarge or A2.large instances, but the cost of adapting to ARM exceeds the time I can spend on this.

During the sprint, my concept was pushed to a later sprint, but I'm spending my free time writing an implementation of this excellent solution.

Perhaps I'm getting ahead of myself, but I'm excited to learn something new.

I'm writing this proof of concept here. Perhaps I'm overly optimistic and do not yet see the future problems, such as how redis scaling works, how to synchronize the tasks well, or how to weigh I/O against execution time, which is an important part of keeping everything oiled.

The other fundamental part of this research involves gRPC. The workers use a gRPC client to deliver the message, so the execution step seems trivial: we receive all the data and call the client.
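
As a rough idea of that hand-off, a worker call could look like the sketch below; the ResultService stub, the results_pb2 modules, and the Publish RPC are hypothetical names that would come from a .proto file I'm not showing here.

    import grpc
    import results_pb2
    import results_pb2_grpc

    def send_result(task_id, value, target="localhost:50051"):
        # One channel per worker would be better than one per call;
        # this keeps the sketch short.
        with grpc.insecure_channel(target) as channel:
            stub = results_pb2_grpc.ResultServiceStub(channel)
            # A single unary call per finished task: receive the data, call the client.
            return stub.Publish(results_pb2.Result(task_id=task_id, value=value))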

To create random blocking time, I went with the classic Fibonacci calculation. During this implementation, I found the closed formula for Fibonacci; it caught my attention, and I got a bit sidetracked from my implementation.
I have never had much capacity for maths, and I recognize that I'm not good at it, but the closed formula for Fibonacci, as an alternative to recursion, is fantastic; however, I'm noticing that if we calculate the Fibonacci of 100, the result from the closed formula differs from the exact value.
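
The closed formula in question is Binet's: F(n) = (phi^n - psi^n) / sqrt(5), with phi = (1 + sqrt(5)) / 2 and psi = (1 - sqrt(5)) / 2. A quick comparison against the plain iterative version shows the drift:

    import math

    def fib_iterative(n):
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    def fib_closed(n):
        sqrt5 = math.sqrt(5)
        phi = (1 + sqrt5) / 2
        psi = (1 - sqrt5) / 2
        return round((phi ** n - psi ** n) / sqrt5)

    print(fib_iterative(100))  # 354224848179261915075
    print(fib_closed(100))     # differs from the exact value

The culprit is not the formula but the floating point: phi^100 is around 7.9e20, while a 64-bit float only carries about 15-16 significant digits, so the rounded result stops matching the exact integer well below n = 100.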

Either way, this is my weekend project, among others.
I doubt that anyone is reading this, but well, if this works, this is a log of operations.
