New AutoSpotting release - 1.2.1-6

Instance deregistration from ECS clusters

Hello,

I'm happy to announce a new release of AutoSpotting, which now also drains ECS tasks from terminating EC2 instances. This makes sure no new tasks are scheduled on them and prompts ECS to immediately launch replacement tasks.

You can get it as usual from the AWS Marketplace. So far this feature is only available through CloudFormation, but Terraform support is coming soon, so stay tuned.

Background

For a little context on this work, about two weeks ago we had our biggest release ever, which included about a dozen improvements.

One of the highlights of that release was improved Spot termination handling, which launches new Spot instances with diversified failover to On Demand instances, pretty much ensuring you always get capacity to replace your Spot instances.

Shortly afterwards, as a follow-up to the same work, we had another important release that added support for automatically draining connections from Classic Load Balancers and ALB/NLB target groups on Spot terminations. This works transparently to the applications you're running and to how you deploy them, be it EKS, ECS, Beanstalk or even plain EC2.

This request initially came from an AutoSpotting user who runs their application on ECS. It turns out ECS doesn't properly trigger these connection draining actions, and they had been noticing dropped connections for a long time.

Soon after this latest release, the same user reported that even though the load balancer connection draining worked as expected, some ECS tasks were still being scheduled on the terminating EC2 instances, where they were very short-lived. This didn't result in any dropped connections or other visible impact, but it was still suboptimal, so I decided to fix it.

I quickly came up with a way to deregister terminating instances from ECS, and after some testing on my side, the user also confirmed that the ECS tasks are drained and no tasks are scheduled on the terminating instances.
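To illustrate what draining an instance from ECS involves, here is a minimal Go sketch; it is not AutoSpotting's actual code. The real ECS API has an `UpdateContainerInstancesState` call that can set a container instance to `DRAINING`, after which ECS stops placing new tasks on it and starts replacement service tasks elsewhere. The sketch assumes that approach (rather than `DeregisterContainerInstance`), and the `ECSClient` interface, `drainInstance` function and `fakeECS` stand-in are all hypothetical, so it runs without AWS credentials.

```go
package main

import "fmt"

// ContainerInstance is a minimal, hypothetical view of an ECS container
// instance: its ARN, the EC2 instance backing it, and its status.
type ContainerInstance struct {
	ARN           string
	EC2InstanceID string
	Status        string // "ACTIVE" or "DRAINING"
}

// ECSClient is a hypothetical interface loosely modeled on the ECS
// ListContainerInstances/DescribeContainerInstances and
// UpdateContainerInstancesState calls. It is NOT the AWS SDK API.
type ECSClient interface {
	ContainerInstances(cluster string) ([]ContainerInstance, error)
	SetStatus(cluster, arn, status string) error
}

// drainInstance looks up the container instance backed by ec2ID in the
// given cluster and sets it to DRAINING, which tells ECS to stop placing
// new tasks on it and to start replacement tasks elsewhere.
func drainInstance(c ECSClient, cluster, ec2ID string) (string, error) {
	instances, err := c.ContainerInstances(cluster)
	if err != nil {
		return "", err
	}
	for _, ci := range instances {
		if ci.EC2InstanceID != ec2ID {
			continue
		}
		if ci.Status == "DRAINING" {
			return ci.ARN, nil // already draining, nothing to do
		}
		return ci.ARN, c.SetStatus(cluster, ci.ARN, "DRAINING")
	}
	return "", fmt.Errorf("instance %s is not registered in cluster %s", ec2ID, cluster)
}

// fakeECS is an in-memory stand-in used only to exercise drainInstance.
type fakeECS struct {
	instances map[string][]ContainerInstance
}

func (f *fakeECS) ContainerInstances(cluster string) ([]ContainerInstance, error) {
	return f.instances[cluster], nil
}

func (f *fakeECS) SetStatus(cluster, arn, status string) error {
	for i := range f.instances[cluster] {
		if f.instances[cluster][i].ARN == arn {
			f.instances[cluster][i].Status = status
			return nil
		}
	}
	return fmt.Errorf("unknown container instance %s", arn)
}

func main() {
	ecs := &fakeECS{instances: map[string][]ContainerInstance{
		"demo-cluster": {
			{ARN: "container-instance-1", EC2InstanceID: "i-aaa", Status: "ACTIVE"},
			{ARN: "container-instance-2", EC2InstanceID: "i-bbb", Status: "ACTIVE"},
		},
	}}
	arn, err := drainInstance(ecs, "demo-cluster", "i-bbb")
	fmt.Println(arn, err, ecs.instances["demo-cluster"][1].Status)
}
```

Running it prints the drained container instance's identifier, a nil error and its new `DRAINING` status. Keeping the ECS calls behind an interface is only a design choice that makes the lookup-then-drain logic easy to test; a real implementation would back it with the AWS SDK.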

After the fact I realized (or rather remembered) that ECS natively supports task and load balancer connection draining, but it is disabled by default and enabling it requires a change to the userdata script.

Besides reliability, one of the main promises of AutoSpotting is that it is easy to roll out because it doesn't require any configuration changes from users, so asking users to make such changes is not an option.

In theory we could replace our new explicit load balancer connection draining API calls with code that automates this userdata change. Besides requiring us to patch the userdata, which I'm reluctant to do, that native functionality also has some limitations, such as the lack of support for Rebalancing Recommendation events, which our current implementation also handles when AutoSpotting is configured to do so.
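The difference in event coverage can be sketched as follows. Spot interruption warnings and rebalance recommendations arrive as EventBridge events with the `detail-type` values `EC2 Spot Instance Interruption Warning` and `EC2 Instance Rebalance Recommendation`. The hypothetical `instanceToDrain` function picks the instance to drain, with a `handleRebalance` flag standing in for AutoSpotting's configuration; it is a sketch under these assumptions, not AutoSpotting's event handling code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EventBridge detail-type values for the two EC2 events discussed above.
const (
	spotInterruptionWarning = "EC2 Spot Instance Interruption Warning"
	rebalanceRecommendation = "EC2 Instance Rebalance Recommendation"
)

// ec2Event keeps only the fields this sketch needs from the event JSON.
type ec2Event struct {
	DetailType string `json:"detail-type"`
	Detail     struct {
		InstanceID string `json:"instance-id"`
	} `json:"detail"`
}

// instanceToDrain returns the ID of the EC2 instance to drain for an
// event, or "" when the event should be ignored. handleRebalance is a
// hypothetical flag standing in for AutoSpotting's own configuration.
func instanceToDrain(raw []byte, handleRebalance bool) (string, error) {
	var ev ec2Event
	if err := json.Unmarshal(raw, &ev); err != nil {
		return "", err
	}
	switch ev.DetailType {
	case spotInterruptionWarning:
		return ev.Detail.InstanceID, nil
	case rebalanceRecommendation:
		if handleRebalance {
			return ev.Detail.InstanceID, nil
		}
	}
	return "", nil
}

func main() {
	rebalance := []byte(`{"detail-type":"EC2 Instance Rebalance Recommendation","detail":{"instance-id":"i-0123"}}`)
	on, _ := instanceToDrain(rebalance, true)
	off, _ := instanceToDrain(rebalance, false)
	fmt.Printf("with rebalance handling: %q, without: %q\n", on, off)
}
```

Unknown event types are simply ignored, so the same handler can sit behind a broader EventBridge rule without draining instances by mistake.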

That's pretty much it. I'm happy we were able to refine this draining logic even further and fully cover ECS users.

I'm looking forward to your feedback on this feature, as well as on what you'd like to see built next. Any input that helps us prioritize our roadmap is more than welcome.

Best regards,

Cristian