2023·Senior Engineer
Distributed Task Orchestrator
A reliable task orchestration platform with retries, scheduling, and observability, running tens of millions of tasks per day.
Problem
The company outgrew its homegrown cron system. Critical financial workflows ran on a fragile mix of shell scripts and database triggers with no central observability.
Solution
- Designed an opinionated orchestrator with clear separation between control plane (scheduling, retries, state) and data plane (workers).
- Used PostgreSQL with SKIP LOCKED for fair dispatch and Redis for ephemeral coordination.
- Built an explicit idempotency contract: every task carries a stable key, and delivery is guaranteed at-least-once with deterministic dedup at the worker.
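A minimal sketch of how the dispatch and idempotency pieces could fit together, not the production implementation. The table and column names in the query (`tasks`, `status`, `run_at`, `idempotency_key`), the `Task` and `Worker` classes, and the in-memory result store are all assumptions made for illustration; a real worker would keep the dedup record in durable storage.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical claim query (schema assumed, not executed in this sketch).
# SKIP LOCKED lets concurrent dispatchers pass over rows another
# transaction has already locked instead of blocking on them.
CLAIM_SQL = """
UPDATE tasks
SET status = 'running'
WHERE id = (
    SELECT id FROM tasks
    WHERE status = 'pending'
    ORDER BY run_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, idempotency_key, payload;
"""


@dataclass(frozen=True)
class Task:
    key: str      # stable idempotency key, identical on every redelivery
    payload: int


class Worker:
    """Runs a handler at most once per key, even if a task is delivered twice."""

    def __init__(self, handler: Callable[[int], int]) -> None:
        self._handler = handler
        # In-memory stand-in for a durable store of completed keys.
        self._results: Dict[str, int] = {}

    def handle(self, task: Task) -> int:
        # Deterministic dedup: a key seen before returns the stored result
        # without re-running the side effect.
        if task.key in self._results:
            return self._results[task.key]
        result = self._handler(task.payload)
        self._results[task.key] = result
        return result


calls = []


def charge(amount: int) -> int:
    calls.append(amount)  # records each real execution of the side effect
    return amount * 2


worker = Worker(charge)
first = worker.handle(Task("invoice-42", 100))
second = worker.handle(Task("invoice-42", 100))  # simulated redelivery
```

Here the second delivery of `invoice-42` returns the stored result and `charge` runs only once, which is what lets the control plane retry freely under at-least-once delivery.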
Result
- 10x throughput on the same hardware, scaling linearly until the database became the bottleneck.
- Mean time to recover from bad deploys dropped from hours to minutes.
- Platform became the default substrate for new internal workflows.