Case 01 — Job-Intelligence Platform
Distributed 7-server crawler, 16 ATS integrations, continuous enrichment across 2.5M open positions in the DACH market.
The challenge
A DACH recruitment leader needed a data layer its own team could no longer sustain: millions of active positions, refreshed daily, pulled from 16 different applicant-tracking systems, enriched with contacts, salary bands, company metadata and semantic description analysis. Without downtime, without data gaps, with forensically auditable quality control.
Architecture
A master node orchestrates the API, cron scheduling, the daemon-keeper and frontend delivery. Six specialised workers split the load by domain: ATS crawling, career-page extraction, description shards and geo-discovery. A dedicated database host runs behind a PgBouncer connection pool.
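The unified connection management credited in the outcome hinges on routing all workers through that pool. A minimal PgBouncer sketch might look like the fragment below; the database name, ports and pool sizes are illustrative assumptions, not the project's actual settings.

```ini
; pgbouncer.ini — hypothetical values for illustration
[databases]
; all six workers and the master connect here instead of Postgres directly
jobs = host=127.0.0.1 port=5432 dbname=jobs

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
; transaction pooling lets many short-lived worker transactions
; share a small number of real server connections
pool_mode = transaction
max_client_conn = 400
default_pool_size = 20
```

Transaction pooling is the usual choice for many short batch transactions, since each worker holds a server connection only for the duration of one commit.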
Pipeline
8-shard description pipeline (resilient)
- 01 Sharding via hashtext — deterministic distribution across 8 partitions
- 02 One Python process per shard, each with a dedicated log file
- 03 Endless reconnect with exponential backoff [1, 2, 4, 8, 16, 30] s
- 04 Mini-batch commit every 50 rows — idempotent, UPDATE-only
- 05 Daemon-keeper with Telegram alerts — auto-restart on miss, log tail and OOM check
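The sharding, backoff and mini-batch steps above can be sketched in a few lines of Python. This is an illustration, not the project's code: `crc32` stands in for Postgres's `hashtext()` (any stable hash works, the point is that a given row always maps to the same shard), and the function names `shard_of`, `backoff_delay` and `batches` are invented for the example.

```python
import itertools
import zlib

# Backoff schedule from the pipeline description: delays capped at 30 s.
BACKOFF = [1, 2, 4, 8, 16, 30]

def shard_of(key: str, shards: int = 8) -> int:
    """Deterministic shard assignment across 8 partitions.
    crc32 is a stand-in for Postgres hashtext(); only stability matters,
    so every row is always processed by the same shard worker."""
    return zlib.crc32(key.encode("utf-8")) % shards

def backoff_delay(attempt: int) -> int:
    """Delay (seconds) before reconnect attempt `attempt` (0-based).
    Retries are endless; the delay is capped at the schedule's last step."""
    return BACKOFF[min(attempt, len(BACKOFF) - 1)]

def batches(rows, size: int = 50):
    """Yield mini-batches of `size` rows. In the real pipeline each batch
    becomes one idempotent, UPDATE-only transaction, so a crash between
    commits loses at most 50 rows of work and a replay is harmless."""
    it = iter(rows)
    while chunk := list(itertools.islice(it, size)):
        yield chunk
```

Because the batch writes are UPDATE-only and idempotent, the daemon-keeper can restart a crashed shard process at any point and simply let it re-run the unfinished batch.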
Technology stack
Outcome
Since go-live: 99.9%+ uptime. Description coverage 84%, email coverage 65%, quality score climbing toward 80%. The pipeline runs 04:30–07:30 daily with zero operator intervention. Two years of planned remediation were made obsolete by unified connection management and a cluster-wide daemon-keeper.