Kubernetes Migration: Lessons from Moving 50 Services to K8s
Real-world insights from a large-scale Kubernetes migration, including what to migrate first, common pitfalls, and why lift-and-shift rarely works.
"We should move to Kubernetes" sounds simple. It is not.
Over the past two years, I led the migration of 50+ services from EC2-based deployments to Amazon EKS. Here is everything I learned, including the mistakes we made so you do not have to.
Why We Migrated
Our EC2-based infrastructure worked. But we were hitting limits:
- Deployment friction: Each service had its own deployment scripts, configurations, and quirks
- Resource inefficiency: Instances ran at 20-30% utilization because each service needed headroom
- Scaling lag: Autoscaling took minutes, not seconds
- Environment drift: Dev, staging, and production configurations diverged over time
Kubernetes promised standardization, efficiency, and faster iteration. It delivered, but the path was harder than expected.
What We Got Right
Started with Stateless Services
Our first migrations were simple, stateless APIs. No databases, no persistent volumes, no special networking requirements.
This let us build platform capabilities incrementally:
- Basic deployment pipelines
- Logging and monitoring
- Secret management
- Basic networking
By the time we tackled complex services, the platform was mature.
Invested in Platform Tooling
We did not ask every team to become Kubernetes experts. We built a platform team and created abstractions:
Standardized Helm charts: Teams fill in a values file; platform provides the chart.
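To make that concrete, here is a sketch of the kind of values file a service team would own. The schema, names, and numbers are illustrative, not our exact chart interface; the point is that teams describe what their service needs and the platform chart turns it into Deployments, Services, and autoscalers.

```yaml
# values.yaml (hypothetical team-owned file; the platform chart renders the manifests)
service:
  name: orders-api
  image:
    repository: example.com/orders-api   # placeholder registry/repo
    tag: "1.4.2"
  port: 8080
resources:
  requests:
    cpu: 250m
    memory: 256Mi
autoscaling:
  minReplicas: 3
  maxReplicas: 12
  targetCPUUtilizationPercent: 70
ingress:
  host: orders.internal.example.com      # placeholder hostname
```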
GitOps with ArgoCD: All deployments go through Git. No kubectl in production.
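Each service maps to an ArgoCD Application that points at its chart and values in Git, roughly like this (repo URL, paths, and namespaces are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: orders-api                  # hypothetical service name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/deployments.git   # placeholder repo
    targetRevision: main
    path: services/orders-api       # chart + values live here
  destination:
    server: https://kubernetes.default.svc
    namespace: orders
  syncPolicy:
    automated:
      prune: true                   # remove resources deleted from Git
      selfHeal: true                # revert manual drift in the cluster
```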
Centralized secrets management: External Secrets Operator syncs from AWS Secrets Manager.
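In practice a service declares an ExternalSecret and the operator materializes a regular Kubernetes Secret from Secrets Manager; a minimal sketch, assuming a cluster-wide secret store named aws-secrets-manager (store name and secret paths are hypothetical):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: orders-api-secrets          # hypothetical
spec:
  refreshInterval: 1h               # re-sync from Secrets Manager hourly
  secretStoreRef:
    name: aws-secrets-manager       # assumed ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: orders-api-secrets        # resulting Kubernetes Secret
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: prod/orders-api/database-url   # hypothetical Secrets Manager path
```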
Observability stack: Prometheus, Grafana, and Datadog integrated out of the box.
Teams deploy services without knowing Kubernetes internals.
Ran Parallel Environments
For critical services, we ran Kubernetes and EC2 in parallel for weeks:
1. Deploy to both environments
2. Route 1% of traffic to Kubernetes
3. Compare latency, error rates, and behavior
4. Gradually increase Kubernetes traffic
5. Decommission EC2 only after 100% of traffic has run on K8s for 2+ weeks
This saved us multiple times when we discovered issues that only appeared under production load.
What We Got Wrong
Underestimated Networking Complexity
Kubernetes networking is its own world. We struggled with:
DNS resolution: Pod DNS settings differ from EC2. Services that resolved hostnames had unexpected behavior.
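A common culprit behind this kind of surprise is the default ndots:5 setting in pod resolv.conf, which makes external hostnames cycle through the cluster search domains before resolving. A minimal sketch of one mitigation, assuming that is the problem (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-example                 # hypothetical
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"                  # resolve external FQDNs without trying cluster suffixes first
  containers:
    - name: app
      image: example.com/api:1.0.0  # placeholder image
```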
Load balancer costs: Every Service of type LoadBalancer creates an ELB. At 50 services, that is 50 ELBs. We switched to ingress controllers.
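With an ingress controller, services stay as plain ClusterIP Services and a single Ingress (and a single load balancer) fans traffic out to them; roughly like this (hostnames and service names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shared-ingress              # hypothetical
spec:
  ingressClassName: nginx           # assumes an nginx ingress controller is installed
  rules:
    - host: orders.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: orders        # ClusterIP Service, no dedicated ELB
                port:
                  number: 80
    - host: billing.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: billing
                port:
                  number: 80
```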
Network policies: We did not implement them initially. Retrofitting was painful.
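If I were starting again, I would ship a default-deny policy per namespace from day one and add explicit allows as needed; a minimal sketch (namespace and labels are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: orders                 # hypothetical namespace
spec:
  podSelector: {}                   # applies to every pod in the namespace
  policyTypes:
    - Ingress                       # no ingress rules defined, so all inbound traffic is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-controller
  namespace: orders
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed controller namespace
```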
Service mesh evaluation paralysis: We spent months debating Istio vs Linkerd vs no mesh. We should have started without a mesh and added it later.
Tried to Lift and Shift
Our initial approach: take existing containers, add Kubernetes manifests, deploy.
This worked for simple services and failed for everything else:
- Health checks: Kubernetes probes are stricter than ELB health checks. Services that "worked" on EC2 crash-looped on K8s.
- Graceful shutdown: Services that ignored SIGTERM caused deployment failures.
- Resource limits: Services without limits got OOMKilled or starved neighbors.
- Twelve-factor violations: Services that read local files or wrote local logs failed mysteriously.
We ended up refactoring every service anyway. We should have planned for that from the start.
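For a sense of what that refactoring involved, here is a hedged sketch of the Deployment settings most services ended up needing: real probes, a termination grace period with a preStop drain, and explicit resource requests and limits. Names, paths, and numbers are illustrative, not values from any of our services.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api                  # hypothetical service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      terminationGracePeriodSeconds: 45    # time allowed for graceful shutdown after SIGTERM
      containers:
        - name: app
          image: example.com/orders-api:1.4.2   # placeholder image
          ports:
            - containerPort: 8080
          readinessProbe:                  # gates traffic until the service is actually ready
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:                   # restarts the container if it wedges
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]   # let the load balancer drain before SIGTERM
          resources:
            requests:                      # used for scheduling and bin-packing
              cpu: 250m
              memory: 256Mi
            limits:                        # prevents one service from starving its neighbors
              cpu: "1"
              memory: 512Mi
```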
Skipped the Staging Environment
"We will test in production with low traffic" was a mistake.
We should have built a production-identical staging environment on Kubernetes first. It would have caught:
- Resource limit tuning issues
- Network policy problems
- Secrets management edge cases
- Observability gaps
Did Not Plan for Stateful Services
We knew stateful services would be hard. We did not realize how hard.
Databases stayed on RDS (right choice). But stateful services like:
- Redis (with persistence)
- Elasticsearch
- Kafka
...each required weeks of planning: persistent volumes, node affinity, disruption budgets, and backup strategies.
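To give one example of what "disruption budgets" means in practice, a PodDisruptionBudget keeps node drains and cluster upgrades from taking down too many replicas of a stateful workload at once (name and labels are hypothetical):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kafka-pdb                   # hypothetical
spec:
  minAvailable: 2                   # voluntary disruptions must leave at least 2 brokers running
  selector:
    matchLabels:
      app: kafka
```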
Our advice: keep stateful workloads on managed services unless you have a very good reason.
Migration Playbook
Based on our experience, here is how I would approach a Kubernetes migration today:
Phase 1: Foundation (2-3 months)
- Set up EKS cluster with proper VPC design
- Implement GitOps pipeline (Flux or ArgoCD)
- Deploy observability stack
- Establish secrets management
- Create standard Helm charts
- Document everything
Phase 2: Pilot (1-2 months)
- Migrate 2-3 simple, non-critical services
- Run in parallel with existing infrastructure
- Iterate on platform based on learnings
- Build team confidence
Phase 3: Accelerate (3-6 months)
- Migrate services in waves (5-10 at a time)
- Prioritize by risk (low risk first)
- Build internal champions on each team
- Refine platform tooling based on feedback
Phase 4: Complete (2-3 months)
- Migrate complex and stateful services
- Decommission legacy infrastructure
- Optimize resource utilization
- Document lessons learned
Cost Reality Check
Kubernetes is not automatically cheaper. Our first-year cost comparison:
| Category | Before (EC2) | After (EKS) |
|---|---|---|
| Compute | $35K/month | $25K/month |
| Load Balancers | $2K/month | $4K/month |
| Platform Tooling | $0 | $3K/month |
| Data Transfer | $3K/month | $4K/month |
| Engineering Time | N/A | 3 FTE for 6 months |
Compute savings came to $10K/month, but once the higher load balancer, tooling, and data transfer costs are included, net savings were closer to $4K/month, and it took 18 months to realize them.
The real value was not cost savings. It was developer velocity, deployment confidence, and standardization.
Should You Migrate?
Kubernetes makes sense if:
- You have 10+ services
- Deployment friction slows you down
- You need better resource utilization
- You want infrastructure standardization
- You can invest in platform engineering
Kubernetes does not make sense if:
- You have a handful of simple services
- Your team is small (< 10 engineers)
- You do not have platform engineering capacity
- Your current setup is working fine
The Bottom Line
Kubernetes migration is a multi-quarter investment. It will be harder than you expect, take longer than you plan, and require more organizational change than you anticipate.
But done right, it pays dividends for years: faster deployments, better reliability, and a foundation for scaling. Just go in with eyes open.