Kubernetes Migration: Lessons from Moving 50 Services to K8s
Real-world insights from a large-scale Kubernetes migration, including what to migrate first, common pitfalls, and why lift-and-shift rarely works.
"We should move to Kubernetes" sounds simple. It is not.
Over the past two years, I led the migration of 50+ services from EC2-based deployments to Amazon EKS. Here is everything I learned, including the mistakes we made so you do not have to.
Why We Migrated
Our EC2-based infrastructure worked. But we were hitting limits:
- Deployment friction: Each service had its own deployment scripts, configurations, and quirks
- Resource inefficiency: Instances ran at 20-30% utilization because each service needed headroom
- Scaling lag: Autoscaling took minutes, not seconds
- Environment drift: Dev, staging, and production configurations diverged over time
Kubernetes promised standardization, efficiency, and faster iteration. It delivered, but the path was harder than expected.
What We Got Right
Started with Stateless Services
Our first migrations were simple, stateless APIs. No databases, no persistent volumes, no special networking requirements.
This let us build platform capabilities incrementally:
- Basic deployment pipelines
- Logging and monitoring
- Secret management
- Basic networking
By the time we tackled complex services, the platform was mature.
Invested in Platform Tooling
We did not ask every team to become Kubernetes experts. We built a platform team and created abstractions:
Standardized Helm charts: Teams fill in a values file; platform provides the chart.
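To make that concrete, here is a sketch of the kind of values file a service team would own. The schema, names, and numbers are illustrative, not our exact chart interface; the point is that teams describe what their service needs and the platform chart turns it into Deployments, Services, and autoscalers.

```yaml
# values.yaml (hypothetical team-owned file; the platform chart renders the manifests)
service:
  name: orders-api
  image:
    repository: example.com/orders-api   # placeholder registry/repo
    tag: "1.4.2"
  port: 8080
resources:
  requests:
    cpu: 250m
    memory: 256Mi
autoscaling:
  minReplicas: 3
  maxReplicas: 12
  targetCPUUtilizationPercent: 70
ingress:
  host: orders.internal.example.com      # placeholder hostname
```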
GitOps with ArgoCD: All deployments go through Git. No kubectl in production.
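Each service maps to an ArgoCD Application that points at its chart and values in Git, roughly like this (repo URL, paths, and namespaces are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: orders-api                  # hypothetical service name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/deployments.git   # placeholder repo
    targetRevision: main
    path: services/orders-api       # chart + values live here
  destination:
    server: https://kubernetes.default.svc
    namespace: orders
  syncPolicy:
    automated:
      prune: true                   # remove resources deleted from Git
      selfHeal: true                # revert manual drift in the cluster
```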
Centralized secrets management: External Secrets Operator syncs from AWS Secrets Manager.
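In practice a service declares an ExternalSecret and the operator materializes a regular Kubernetes Secret from Secrets Manager; a minimal sketch, assuming a cluster-wide secret store named aws-secrets-manager (store name and secret paths are hypothetical):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: orders-api-secrets          # hypothetical
spec:
  refreshInterval: 1h               # re-sync from Secrets Manager hourly
  secretStoreRef:
    name: aws-secrets-manager       # assumed ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: orders-api-secrets        # resulting Kubernetes Secret
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: prod/orders-api/database-url   # hypothetical Secrets Manager path
```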
Observability stack: Prometheus, Grafana, and Datadog integrated out of the box.
Teams deploy services without knowing Kubernetes internals.
Ran Parallel Environments
For critical services, we ran Kubernetes and EC2 in parallel for weeks:
1. Deploy to both environments
2. Route 1% of traffic to Kubernetes
3. Compare latency, error rates, and behavior
4. Gradually increase Kubernetes traffic
5. Decommission EC2 only after 100% of traffic has run on K8s for 2+ weeks
This saved us multiple times when we discovered issues that only appeared under production load.
What We Got Wrong
Underestimated Networking Complexity
Kubernetes networking is its own world. We struggled with:
DNS resolution: Pod DNS settings differ from EC2. Services that resolved hostnames had unexpected behavior.
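A common culprit behind this kind of surprise is the default ndots:5 setting in pod resolv.conf, which makes external hostnames cycle through the cluster search domains before resolving. A minimal sketch of one mitigation, assuming that is the problem (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-example                 # hypothetical
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"                  # resolve external FQDNs without trying cluster suffixes first
  containers:
    - name: app
      image: example.com/api:1.0.0  # placeholder image
```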
Load balancer costs: Every Service of type LoadBalancer creates an ELB. At 50 services, that is 50 ELBs. We switched to ingress controllers.
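With an ingress controller, services stay as plain ClusterIP Services and a single Ingress (and a single load balancer) fans traffic out to them; roughly like this (hostnames and service names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shared-ingress              # hypothetical
spec:
  ingressClassName: nginx           # assumes an nginx ingress controller is installed
  rules:
    - host: orders.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: orders        # ClusterIP Service, no dedicated ELB
                port:
                  number: 80
    - host: billing.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: billing
                port:
                  number: 80
```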
Network policies: We did not implement them initially. Retrofitting was painful.
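If I were starting again, I would ship a default-deny policy per namespace from day one and add explicit allows as needed; a minimal sketch (namespace and labels are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: orders                 # hypothetical namespace
spec:
  podSelector: {}                   # applies to every pod in the namespace
  policyTypes:
    - Ingress                       # no ingress rules defined, so all inbound traffic is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-controller
  namespace: orders
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed controller namespace
```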
Service mesh evaluation paralysis: We spent months debating Istio vs Linkerd vs no mesh. We should have started without a mesh and added it later.
Tried to Lift and Shift
Our initial approach: take existing containers, add Kubernetes manifests, deploy.
This worked for simple services and failed for everything else:
- Health checks: Kubernetes probes are stricter than ELB health checks. Services that "worked" on EC2 crash-looped on K8s.
- Graceful shutdown: Services that ignored SIGTERM caused deployment failures.
- Resource limits: Services without limits got OOMKilled or starved neighbors.
- Twelve-factor violations: Services that read local files or wrote local logs failed mysteriously.
We ended up refactoring every service anyway. We should have planned for that from the start.
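For a sense of what that refactoring involved, here is a hedged sketch of the Deployment settings most services ended up needing: real probes, a termination grace period with a preStop drain, and explicit resource requests and limits. Names, paths, and numbers are illustrative, not values from any of our services.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api                  # hypothetical service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      terminationGracePeriodSeconds: 45    # time allowed for graceful shutdown after SIGTERM
      containers:
        - name: app
          image: example.com/orders-api:1.4.2   # placeholder image
          ports:
            - containerPort: 8080
          readinessProbe:                  # gates traffic until the service is actually ready
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:                   # restarts the container if it wedges
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]   # let the load balancer drain before SIGTERM
          resources:
            requests:                      # used for scheduling and bin-packing
              cpu: 250m
              memory: 256Mi
            limits:                        # prevents one service from starving its neighbors
              cpu: "1"
              memory: 512Mi
```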
Skipped the Staging Environment
"We will test in production with low traffic" was a mistake.
We should have built a production-identical staging environment on Kubernetes first. It would have caught:
- Resource limit tuning issues
- Network policy problems
- Secrets management edge cases
- Observability gaps
Did Not Plan for Stateful Services
We knew stateful services would be hard. We did not realize how hard.
Databases stayed on RDS (right choice). But stateful services like:
- Redis (with persistence)
- Elasticsearch
- Kafka
...each required weeks of planning: persistent volumes, node affinity, disruption budgets, and backup strategies.
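To give one example of what "disruption budgets" means in practice, a PodDisruptionBudget keeps node drains and cluster upgrades from taking down too many replicas of a stateful workload at once (name and labels are hypothetical):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb                   # hypothetical
spec:
  minAvailable: 2                   # voluntary disruptions must leave at least 2 brokers running
  selector:
    matchLabels:
      app: kafka
```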
Our advice: keep stateful workloads on managed services unless you have a very good reason.
Migration Playbook
Based on our experience, here is how I would approach a Kubernetes migration today:
Phase 1: Foundation (2-3 months)
- Set up EKS cluster with proper VPC design
- Implement GitOps pipeline (Flux or ArgoCD)
- Deploy observability stack
- Establish secrets management
- Create standard Helm charts
- Document everything
Phase 2: Pilot (1-2 months)
- Migrate 2-3 simple, non-critical services
- Run in parallel with existing infrastructure
- Iterate on platform based on learnings
- Build team confidence
Phase 3: Accelerate (3-6 months)
- Migrate services in waves (5-10 at a time)
- Prioritize by risk (low risk first)
- Build internal champions on each team
- Refine platform tooling based on feedback
Phase 4: Complete (2-3 months)
- Migrate complex and stateful services
- Decommission legacy infrastructure
- Optimize resource utilization
- Document lessons learned
Cost Reality Check
Kubernetes is not automatically cheaper. Our first-year cost comparison:
| Category | Before (EC2) | After (EKS) |
|---|---|---|
| Compute | $35K/month | $25K/month |
| Load Balancers | $2K/month | $4K/month |
| Platform Tooling | $0 | $3K/month |
| Data Transfer | $3K/month | $4K/month |
| Engineering Time | N/A | 3 FTE for 6 months |
Compute savings came to $10K/month, but once the higher load balancer, tooling, and data transfer costs are included, net savings were closer to $4K/month, and it took 18 months to realize them.
The real value was not cost savings. It was developer velocity, deployment confidence, and standardization.
Should You Migrate?
Kubernetes makes sense if:
- You have 10+ services
- Deployment friction slows you down
- You need better resource utilization
- You want infrastructure standardization
- You can invest in platform engineering
Kubernetes does not make sense if:
- You have a handful of simple services
- Your team is small (< 10 engineers)
- You do not have platform engineering capacity
- Your current setup is working fine
The Bottom Line
Kubernetes migration is a multi-quarter investment. It will be harder than you expect, take longer than you plan, and require more organizational change than you anticipate.
But done right, it pays dividends for years: faster deployments, better reliability, and a foundation for scaling. Just go in with eyes open.