Services

Engagements with a clear scope,
not open-ended "developer time."

Each of these is something I've shipped in production, repeatedly. Pick the one that matches your problem — or describe your situation and I'll tell you which fits.

// 01 · streaming

Event-streaming pipeline build

A Kafka / Confluent Cloud pipeline designed to handle real volume from day one — and stay up.

  • Topic design, partitioning, retention strategy
  • Producers & Kafka Streams processing
  • Exactly-once / idempotency patterns
  • Sink connectors (S3, DynamoDB, DocumentDB)
  • Schema Registry & compatibility governance

// 02 · migration

Zero-downtime migration

The version upgrade or platform move everyone's been avoiding — done across every environment without an outage.

  • Confluent / Kafka version upgrades
  • Cluster linking & multi-region failover
  • Static-credential → IAM role auth cutover
  • Staged rollout across 10+ environments
  • Rollback runbook before anything ships

// 03 · cloud

AWS platform & IaC

Infrastructure that's reproducible, reviewable, and not living in one person's head or the console.

  • ECS / EKS / Lambda architecture
  • Terraform from scratch or cleanup of drift
  • Task placement & cost/throughput tuning
  • DynamoDB / RDS data layer design
  • CI/CD with Jenkins + secrets management

// 04 · reliability

Observability platform

Stop flying blind. End-to-end visibility built so the next incident pages you before the customer notices.

  • Datadog / Grafana dashboards from zero
  • OpenTelemetry distributed tracing
  • Consumer-lag & broker-health monitoring
  • Alert routing (Opsgenie / PagerDuty)
  • SLOs that reflect what users actually feel

// 05 · rescue

Incident RCA & rescue

The bug you can't reproduce, the fleet-wide failure, the thing that breaks only in prod. I've found these before.

  • Root-cause analysis on production incidents
  • State-lock, deadlock & race-condition debugging
  • Fleet-wide fixes via infrastructure-as-code
  • Duplicate-message & data-integrity issues
  • Post-incident hardening so it can't recur

// 06 · build

Backend & integration build

Java / Spring Boot services and third-party integrations done by someone who's wired up the messy ones.

  • Java 17 / Spring Boot microservices
  • REST & vendor API integration
  • Redis caching & distributed locking
  • AI-assisted features (Claude / Bedrock, RAG)
  • Android backends (see BidHound, NearMe)

Not sure which one you need?

That's normal. Describe the problem and I'll tell you the shortest path to fixing it — and whether I'm the right person for it.

Describe your project →