Thursday, November 21, 2019

ECS, CircleCI, NLB and gRPC

I finished a migration from Heroku to ECS using CircleCI orbs for AWS.
In this post I detail some aspects of the migration and the peculiarities of the project.



I have known about CircleCI since 2016. Back then, all I knew was that it was easier to configure than Jenkins, and the project I was working on already used it.
I did not pay much attention to it at the time, since my main task was development.
Cut to mid-2019: the company I work for switched from Jenkins to CircleCI. They did not want to spend resources or time on CI; they wanted to "have it working without spending time fixing it."
They moved to CircleCI, and everything has been working without the problems we had with Jenkins.
The new project I was working on was deployed to Heroku with a bash script.
We opted to use ECS, and we found that CircleCI had support for it.
The documentation for the ECS orb is not clear at all unless you first spend time learning the terminology of CircleCI orbs.
It does not clearly differentiate between a job and a workflow, and I spent days confused.



I already wrote about ECS; three years later, I'm back with it. Things are pretty much the same as before. Now we have Fargate, which I have not used, but the documentation on crucial aspects like green/blue deployment is tailored towards it, which sucks, but it is what it is.
The basics are the same: you create a cluster, you create services, and you place tasks on it.
The new things I'm using this year are:
  1. Green/Blue deployment (soon).
  2. NLB for gRPC (more on it in the gRPC section).
What I do have to recommend is this: if you are planning on expensive IO operations and the biggest machine you plan to use is a "t2.small", then stay away from ECS. You are going to have downtime because the IO burns through the CPU credits fast.
We have exactly that situation with a monolithic application that we won't spend time refactoring. The section of the code that works with PDFs is IO-heavy: some parts abuse IO because of the library they use, and some parts abuse IO in the monolithic application itself.
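
To put rough numbers on the CPU credit warning, here is a minimal sketch of the arithmetic. The t2.small figures (1 vCPU, 12 credits earned per hour, a cap of 288 credits) are assumptions taken from the AWS burstable instance documentation and should be checked there; `BurstableInstance` and `hours_until_empty` are hypothetical names for this example.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BurstableInstance:
    """Credit parameters of a burstable EC2 instance (values are assumptions)."""
    vcpus: int
    credits_per_hour: float  # credits earned every hour
    max_balance: float       # cap on accrued credits

    def hours_until_empty(self, utilization: float,
                          starting_balance: Optional[float] = None) -> float:
        """Hours of sustained load before the credit balance reaches zero.

        One CPU credit is one vCPU running at 100% for one minute, so an hour
        at `utilization` (0.0 to 1.0) spends utilization * vcpus * 60 credits.
        """
        balance = self.max_balance if starting_balance is None else starting_balance
        spent_per_hour = utilization * self.vcpus * 60
        net_drain = spent_per_hour - self.credits_per_hour
        if net_drain <= 0:
            return math.inf  # at or below baseline, the balance never drains
        return balance / net_drain


# Assumed t2.small figures; verify against the AWS documentation.
T2_SMALL = BurstableInstance(vcpus=1, credits_per_hour=12, max_balance=288)

if __name__ == "__main__":
    for load in (0.5, 1.0):
        hours = T2_SMALL.hours_until_empty(load)
        print(f"{load:.0%} sustained CPU: credits last about {hours:.1f} hours")
```

Under these assumptions, even a full credit balance is gone after a few hours of sustained load, and an instance that starts with a small balance runs out much sooner. Once the balance hits zero the instance is throttled to its baseline, which is presumably where the downtime comes from.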



Again with gRPC. Same situation as before, but this time I learned a bit more about how to debug the server internally and how to force the channel to close.
The "options" parameter for the server and for the channel on the client receives a list of tuples. It lacks documentation in Python, and you need to dig deep into the source to figure out how things work.
The list of options you can use is also missing from the Python documentation, so I ended up reading Go examples, which gave me an idea.
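
For reference, this is roughly what passing "options" looks like in Python. It is a minimal sketch, assuming the `grpcio` package is installed. The option keys are gRPC core channel arguments, while the values, the port and the helper names (`server_options`, `client_options`, `build_server`, `build_channel`) are made up for illustration and are not taken from the project.

```python
from concurrent import futures
from typing import List, Tuple, Union

# Each option is a (key, value) tuple; keys are gRPC core channel arguments.
ChannelArg = Tuple[str, Union[int, str]]


def server_options(max_age_ms: int = 30_000, grace_ms: int = 5_000) -> List[ChannelArg]:
    """Server-side options (illustrative values, not recommendations)."""
    return [
        # Ask the server to close connections older than max_age_ms.
        ("grpc.max_connection_age_ms", max_age_ms),
        # Time allowed for in-flight calls to finish before the hard close.
        ("grpc.max_connection_age_grace_ms", grace_ms),
        ("grpc.keepalive_time_ms", 10_000),
    ]


def client_options() -> List[ChannelArg]:
    """Client-side options (illustrative values)."""
    return [
        ("grpc.keepalive_time_ms", 10_000),
        ("grpc.keepalive_timeout_ms", 5_000),
    ]


def build_server(port: int = 50051):
    """Create a server with the options above; the port is hypothetical."""
    import grpc  # imported here so the option helpers work without grpcio

    server = grpc.server(
        futures.ThreadPoolExecutor(max_workers=4),
        options=server_options(),
    )
    server.add_insecure_port(f"[::]:{port}")
    return server


def build_channel(target: str):
    """Create a client channel with the options above."""
    import grpc

    return grpc.insecure_channel(target, options=client_options())
```

Of these, `grpc.max_connection_age_ms` is the one most related to forcing closure: the server ends connections after that age, so clients have to reconnect.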



I'm writing this on 11/21/19: NLB does indeed work with the ECS EC2 launch type without any problem.

The crux of the problem


I am using ECS (EC2) with a Network Load Balancer to get the gRPC server working.
We are using an internal NLB because we are not exposing gRPC to the outside world; it is only for our microservices.
The main problem is that the NLB does not balance the load across the machines behind it. An NLB balances connections, not requests, and gRPC sends all its calls over one long-lived connection. Once a connection is open through the NLB, it keeps being reused, even if you close the channel in the client.
That cascades into another problem: if I create an autoscaling group because the EC2 instance serving requests is degrading, I have no way to force the traffic to rebalance without downtime, which indeed sucks.
In theory, gRPC also offers balancing at the client level, but since I'm deploying to ECS, I don't have a good way to pin the IP of the server, because the IP changes after the first deploy.
I don't know; perhaps I could opt for a strategy like attaching an Elastic IP and find a way to wire that up in ECS?
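
For what it's worth, client-level balancing in Python would look roughly like the sketch below. It is not a confirmed fix: it assumes the `grpcio` package and a DNS name that resolves to every server instance, which this setup does not have yet. The host `grpc-backend.internal`, the port and the helper names are hypothetical.

```python
from typing import List, Tuple


def dns_target(host: str, port: int) -> str:
    """Build a target using gRPC's DNS resolver scheme."""
    return f"dns:///{host}:{port}"


def round_robin_options() -> List[Tuple[str, str]]:
    """Channel options asking the client to rotate across resolved addresses."""
    return [("grpc.lb_policy_name", "round_robin")]


def build_balanced_channel(host: str, port: int):
    """Create a channel that balances calls on the client side."""
    import grpc  # imported here so the helpers above work without grpcio

    return grpc.insecure_channel(dns_target(host, port),
                                 options=round_robin_options())


if __name__ == "__main__":
    # Hypothetical internal name; nothing in this setup provides it yet.
    print(dns_target("grpc-backend.internal", 50051))
```

With `round_robin`, the client opens a connection to each address the name resolves to and spreads calls across them, so the open question stays the same: how to keep that name pointing at the right IPs after each ECS deploy.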

I will write my findings if I ever find a solution to this problem.

