

Our Journey to Zero Downtime Rolling Updates with Ambassador

Here at Lifion, by ADP, we’re building a distributed platform and product suite for our clients around the world. Given this, it is critical that we can release updates to our platform continuously while it is online, in a way that is transparent to our users, who may be in different regions and time zones and who rely on the system being available for their daily work. In this post, I am sharing a small glimpse into our journey towards achieving zero-downtime rolling updates with Kubernetes.

We run our platform workloads on EKS, AWS’s managed Kubernetes offering. As our API Gateway, we use Ambassador, an open source distribution of Envoy proxy designed for Kubernetes. Our platform consists of 150+ microservices, most of them written in Node.js, running in many pods across multiple worker nodes.

Problem Statement

As we roll out our NextGen HCM solution to more and more customers and demands on our platform increase, we want to continue to ensure the reliability of our underlying infrastructure. However, we observed that during rolling updates of some of the services on our Kubernetes cluster, pods were dying immediately upon being put into Terminating status, rather than stopping gracefully as desired.
