Case 01 — Job-Intelligence Platform
Distributed 7-server crawler, 16 ATS integrations, continuous enrichment across 2.5M open positions in the DACH market.
The challenge
A DACH recruitment leader needed a data layer its own team could no longer sustain: millions of active positions, refreshed daily, pulled from 16 different applicant-tracking systems, enriched with contacts, salary bands, company metadata and semantic description analysis. Without downtime, without data gaps, with forensically auditable quality control.
Architecture
A master node orchestrates the API, cron scheduling, the daemon-keeper and frontend delivery. Six specialised workers split the load by domain: ATS crawling, career-page extraction, description shards and geo-discovery. A dedicated database host runs behind a PgBouncer connection pool.
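The unified connection management credited in the outcome hinges on routing all workers through that pool. A minimal PgBouncer sketch might look like the fragment below; the database name, ports and pool sizes are illustrative assumptions, not the project's actual settings.

```ini
; pgbouncer.ini — hypothetical values for illustration
[databases]
; all six workers and the master connect here instead of Postgres directly
jobs = host=127.0.0.1 port=5432 dbname=jobs

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
; transaction pooling lets many short-lived worker transactions
; share a small number of real server connections
pool_mode = transaction
max_client_conn = 400
default_pool_size = 20
```

Transaction pooling is the usual choice for many short batch transactions, since each worker holds a server connection only for the duration of one commit.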
Pipeline
8-shard description pipeline (resilient)
- 01 Sharding via hashtext — deterministic distribution across 8 partitions
- 02 One Python process per shard, each with a dedicated log file
- 03 Endless reconnect with exponential backoff [1, 2, 4, 8, 16, 30] s
- 04 Mini-batch commit every 50 rows — idempotent, UPDATE-only
- 05 Daemon-keeper with Telegram alerts — auto-restart on miss, log tail and OOM check
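The sharding, backoff and mini-batch steps above can be sketched in a few lines of Python. This is an illustration, not the project's code: `crc32` stands in for Postgres's `hashtext()` (any stable hash works, the point is that a given row always maps to the same shard), and the function names `shard_of`, `backoff_delay` and `batches` are invented for the example.

```python
import itertools
import zlib

# Backoff schedule from the pipeline description: delays capped at 30 s.
BACKOFF = [1, 2, 4, 8, 16, 30]

def shard_of(key: str, shards: int = 8) -> int:
    """Deterministic shard assignment across 8 partitions.
    crc32 is a stand-in for Postgres hashtext(); only stability matters,
    so every row is always processed by the same shard worker."""
    return zlib.crc32(key.encode("utf-8")) % shards

def backoff_delay(attempt: int) -> int:
    """Delay (seconds) before reconnect attempt `attempt` (0-based).
    Retries are endless; the delay is capped at the schedule's last step."""
    return BACKOFF[min(attempt, len(BACKOFF) - 1)]

def batches(rows, size: int = 50):
    """Yield mini-batches of `size` rows. In the real pipeline each batch
    becomes one idempotent, UPDATE-only transaction, so a crash between
    commits loses at most 50 rows of work and a replay is harmless."""
    it = iter(rows)
    while chunk := list(itertools.islice(it, size)):
        yield chunk
```

Because the batch writes are UPDATE-only and idempotent, the daemon-keeper can restart a crashed shard process at any point and simply let it re-run the unfinished batch.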
Technology stack
Outcome
Since go-live: 99.9%+ uptime. Description coverage 84%, email coverage 65%, quality score climbing toward 80%. The pipeline runs 04:30–07:30 daily with zero operator intervention. Two years of planned remediation were made obsolete by unified connection management and a cluster-wide daemon-keeper.