Engagements with a clear edge, not open-ended "developer time."
Each of these is something I've shipped in production, repeatedly. Pick the one that matches your problem — or describe your situation and I'll tell you which fits.
Event-streaming pipeline build
A Kafka / Confluent Cloud pipeline designed to handle real volume from day one — and stay up.
- Topic design, partitioning, retention strategy
- Producers & Kafka Streams processing
- Exactly-once / idempotency patterns
- Sink connectors (S3, DynamoDB, DocumentDB)
- Schema Registry & compatibility governance
Zero-downtime migration
The version upgrade or platform move everyone's been avoiding — done across every environment without an outage.
- Confluent / Kafka version upgrades
- Cluster linking & multi-region failover
- Static-credential → IAM role auth cutover
- Staged rollout across 10+ environments
- Rollback runbook before anything ships
AWS platform & IaC
Infrastructure that's reproducible, reviewable, and not living in one person's head or in the console.
- ECS / EKS / Lambda architecture
- Terraform from scratch, or cleanup of drifted state
- Task placement & cost/throughput tuning
- DynamoDB / RDS data layer design
- CI/CD with Jenkins + secrets management
Observability platform
Stop flying blind. End-to-end visibility built so the next incident pages you before the customer notices.
- Datadog / Grafana dashboards from zero
- OpenTelemetry distributed tracing
- Consumer-lag & broker-health monitoring
- Alert routing (OpsGenie / PagerDuty)
- SLOs that reflect what users actually feel
Incident RCA & rescue
The bug you can't reproduce, the fleet-wide failure, the thing that breaks only in prod. I've tracked these down before.
- Root-cause analysis on production incidents
- State-lock, deadlock & race-condition debugging
- Fleet-wide fixes via infrastructure-as-code
- Duplicate-message & data-integrity issues
- Post-incident hardening so it can't recur
Backend & integration build
Java / Spring Boot services and third-party integrations done by someone who's wired up the messy ones.
- Java 17 / Spring Boot microservices
- REST & vendor API integration
- Redis caching & distributed locking
- AI-assisted features (Claude / Bedrock, RAG)
- Android backends (see BidHound, NearMe)
Not sure which one you need?
That's normal. Describe the problem and I'll tell you the shortest path to fixing it — and whether I'm the right person for it.