Call Center & Operations

The Problem

High-volume processing breaks when every step is an LLM call.

The instinct when automating call center workflows is to throw an agent at every step. But transcription is an API call. Routing is a lookup table. Classification might be a fine-tuned model or a simple heuristic. Wrapping all of these in the same LLM agent adds latency, cost, and failure modes that aren't necessary. The real challenge is mixing step types — fast deterministic operations and slower LLM-powered ones — in a workflow that scales to thousands of items per shift.

How It Works

Per-call processing with sharded_queue. Shift summaries with map_reduce.

The workflow has two distinct phases. The per-call pipeline runs each recording through a sequence of steps using sharded_queue — distributing thousands of items across workers. At the end of the shift, a map_reduce run aggregates trends across all calls into reports for managers.

Transcribe recordings

Each recording is sent to a transcription API — Whisper, Deepgram, or whichever the team uses. This is a plain function call: POST the audio, receive the transcript. No LLM, no agent. Fast, cheap, and cacheable. The transcript is written to the call's workspace directory for downstream steps.

Function call — transcription API
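The transcription step can be sketched as a plain function with a content-hash cache. This is a minimal illustration, not Epsilon's API: `transcribe_recording`, the `transcript.json` file name, and the one-directory-per-call layout are assumptions, and the `transcribe` callable stands in for whichever vendor client the team uses.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def transcribe_recording(
    audio_path: Path,
    workspace: Path,
    transcribe: Callable[[bytes], str],
) -> Path:
    """Transcribe one recording and write the result to the call's workspace.

    `transcribe` is a hypothetical stand-in for the vendor client: it takes
    raw audio bytes and returns the transcript text.
    """
    audio = audio_path.read_bytes()
    digest = hashlib.sha256(audio).hexdigest()  # cache key: hash of the audio content

    call_dir = workspace / audio_path.stem  # assumed layout: one directory per call
    call_dir.mkdir(parents=True, exist_ok=True)
    out = call_dir / "transcript.json"

    # Cache hit: the same audio was already transcribed, so skip the API call.
    if out.exists():
        cached = json.loads(out.read_text())
        if cached.get("audio_sha256") == digest:
            return out

    text = transcribe(audio)
    out.write_text(json.dumps({"audio_sha256": digest, "text": text}))
    return out
```

Keying the cache on the audio hash rather than the file name means a re-uploaded or renamed recording is not transcribed twice.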

Extract key entities from the transcript

An agent reads the transcript and extracts structured data: customer name and account number (if mentioned), product referenced, issue type, specific complaint or request, sentiment, and any commitments made by the call center representative. This is where an LLM earns its place: transcripts are messy, and entity extraction requires understanding context, not just pattern matching.

Agent
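Because downstream steps are deterministic, it helps to validate the agent's reply before passing it on. The sketch below assumes the agent is prompted to return JSON. The `CallEntities` record, the field names, and the sentiment label set are illustrative assumptions modeled on the fields listed above, not a schema Epsilon defines.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed label set; the source does not specify sentiment values.
SENTIMENTS = {"positive", "neutral", "negative"}
REQUIRED = ("issue_type", "request", "sentiment")


@dataclass
class CallEntities:
    issue_type: str
    request: str
    sentiment: str
    customer_name: Optional[str] = None   # only if mentioned on the call
    account_number: Optional[str] = None  # only if mentioned on the call
    product: Optional[str] = None
    commitments: List[str] = field(default_factory=list)


def parse_entities(raw: str) -> CallEntities:
    """Validate the agent's JSON reply and convert it to a typed record."""
    data = json.loads(raw)  # malformed JSON raises JSONDecodeError, a ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in REQUIRED if not data.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if data["sentiment"] not in SENTIMENTS:
        raise ValueError(f"unknown sentiment: {data['sentiment']!r}")
    return CallEntities(
        issue_type=data["issue_type"],
        request=data["request"],
        sentiment=data["sentiment"],
        customer_name=data.get("customer_name"),
        account_number=data.get("account_number"),
        product=data.get("product"),
        commitments=list(data.get("commitments") or []),
    )
```

A `ValueError` here is a natural signal for a per-item retry, since the same transcript may yield a well-formed reply on a second attempt.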

Classify urgency

Using the extracted entities and the full transcript, classify the call: critical (safety issue, executive escalation, regulatory), high (billing dispute over $500, service outage), normal, or low. For most calls this is a deterministic rule applied to the extracted fields. For ambiguous cases, such as a customer who's upset but not escalating, an agent makes the decision.

Function call or Agent
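The rules-first, agent-as-fallback pattern might look like the sketch below. The issue-type strings, the `disputed_amount` field, and the use of negative sentiment as the ambiguity test are assumptions made for illustration. The source gives no rule for "low", so this sketch never assigns it by rule and only accepts it from the agent.

```python
from typing import Callable

LEVELS = ("critical", "high", "normal", "low")
# Hypothetical issue-type labels for the critical categories named above.
CRITICAL_ISSUES = {"safety", "executive_escalation", "regulatory"}


def classify_urgency(entities: dict, ask_agent: Callable[[dict], str]) -> str:
    """Apply deterministic rules first; fall back to an agent only if ambiguous.

    `ask_agent` is a stand-in for the LLM call and must return one of LEVELS.
    """
    issue = entities.get("issue_type")
    if issue in CRITICAL_ISSUES:
        return "critical"
    if issue == "service_outage":
        return "high"
    # `disputed_amount` is an assumed numeric field (in dollars).
    if issue == "billing_dispute" and (entities.get("disputed_amount") or 0) > 500:
        return "high"

    # No rule fired, but the customer is upset: treat as ambiguous.
    if entities.get("sentiment") == "negative":
        level = ask_agent(entities)
        if level not in LEVELS:
            raise ValueError(f"agent returned unknown urgency: {level!r}")
        return level
    return "normal"
```

Keeping the agent behind the rules means the LLM is only invoked for the subset of calls the rules cannot settle.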

Route to the appropriate team

Apply the routing table: billing issues go to the billing queue, technical issues go to tier-2, executive escalations go directly to a named handler. This is a lookup and assignment, so no LLM is needed. The result is a structured record, with the call's extracted fields and urgency level attached, written to the routing database.

Function call — routing table
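As a rough sketch, the routing step reduces to a dictionary lookup that builds the record to be stored. The queue names, the `general_queue` default, and the record layout are placeholders chosen for illustration. Writing the record to the routing database is left to the caller.

```python
# Hypothetical routing table: issue type -> destination queue or handler.
ROUTING_TABLE = {
    "billing": "billing_queue",
    "technical": "tier2",
    "executive_escalation": "executive_handler",
}
DEFAULT_DESTINATION = "general_queue"  # assumed fallback for unlisted issue types


def route_call(call_id: str, entities: dict, urgency: str) -> dict:
    """Look up the destination and build the routing record; no I/O, no LLM."""
    destination = ROUTING_TABLE.get(entities.get("issue_type"), DEFAULT_DESTINATION)
    return {
        "call_id": call_id,
        "destination": destination,
        "urgency": urgency,
        "entities": entities,
    }
```

Because the function has no side effects, the same inputs always produce the same record, which makes retries safe.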

Generate shift summaries for managers

At the end of the shift, a map_reduce run aggregates all processed calls. At the map stage, each batch of ~50 calls is summarized by an agent: top issue types, sentiment distribution, escalation rate, common product mentions. At the reduce stage, a synthesis agent combines the batch summaries into a single shift report: what patterns emerged, what's different from yesterday, what needs attention tomorrow morning.

Agent — map_reduce
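The two stages can be sketched with plain functions. Here `summarize_batch` and `synthesize` are hypothetical callables standing in for the map-stage and reduce-stage agents. Only the batching and the order of the stages are shown, not how Epsilon schedules them.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

S = TypeVar("S")  # type of a per-batch summary
R = TypeVar("R")  # type of the final report


def batches(items: Sequence[dict], size: int = 50) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def shift_report(
    calls: Sequence[dict],
    summarize_batch: Callable[[Sequence[dict]], S],
    synthesize: Callable[[List[S]], R],
    batch_size: int = 50,
) -> R:
    """Map: summarize each batch. Reduce: combine the batch summaries."""
    summaries = [summarize_batch(batch) for batch in batches(calls, batch_size)]
    return synthesize(summaries)
```

With a batch size of 50, a shift of 1,247 calls yields 25 batches, the last holding 47 calls. Each map-stage call sees only its own batch, so the batches can be summarized independently.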

Why Epsilon

Mix deterministic and LLM steps in the same workflow. Scale to thousands of items.

sharded_queue for volume

Processing 3,000 calls in a shift means 3,000 items moving through the four-step per-call pipeline. The sharded_queue topology distributes items across a worker pool, handles failures and retries per item, and writes results to a shared workspace. You don't write a queue processor; Epsilon is one.
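To make the two ideas concrete, here is a generic sketch of stable shard assignment and per-item retries. It illustrates the general technique only. How Epsilon assigns shards or retries items internally is not described in the source, and `shard_for`, `process_with_retries`, and the three-attempt default are hypothetical.

```python
import zlib
from typing import Any, Callable


def shard_for(item_id: str, num_shards: int) -> int:
    """Map an item to a shard. CRC32 is stable across processes,
    unlike the built-in hash() for strings."""
    return zlib.crc32(item_id.encode("utf-8")) % num_shards


def process_with_retries(
    item: Any,
    handler: Callable[[Any], Any],
    max_attempts: int = 3,
) -> dict:
    """Run `handler` on one item, retrying on failure.

    A failing item is reported in the result instead of raising, so one
    bad recording does not stop the rest of the shard.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            result = handler(item)
            return {"item": item, "status": "ok", "result": result, "attempts": attempt}
        except Exception as exc:  # broad on purpose: isolate any per-item failure
            last_error = exc
    return {
        "item": item,
        "status": "failed",
        "error": repr(last_error),
        "attempts": max_attempts,
    }
```

Isolating failures per item is what allows a run to finish with a handful of retried items rather than failing as a whole.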

Only use LLMs where they matter

Transcription is an API call. Routing is a lookup. Urgency classification is usually a rule. Epsilon treats all of these as steps in the same workflow — you use an agent where interpretation is needed and a function call where it isn't. The result is faster, cheaper, and more predictable than running everything through an LLM.
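One way to picture the mix is a pipeline whose steps are tagged by kind. The `Step` type and `run_pipeline` helper below are illustrative assumptions, not Epsilon's interface. They show that function steps and agent steps can share one calling convention while remaining distinguishable for cost and latency accounting.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    kind: str  # "function" (deterministic) or "agent" (LLM-backed)
    run: Callable[[dict], dict]  # takes the record so far, returns new fields


def run_pipeline(steps: List[Step], record: dict) -> Tuple[dict, List[Tuple[str, str]]]:
    """Run steps in order, merging each step's output into the record.

    Returns the final record and a trace of (step name, kind) pairs.
    """
    trace = []
    for step in steps:
        record = {**record, **step.run(record)}
        trace.append((step.name, step.kind))
    return record, trace
```

Counting the `"agent"` entries in the trace gives the number of LLM calls a record actually incurred.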

map_reduce for aggregation

Summarizing 3,000 calls in a single LLM context is impractical. map_reduce breaks the calls into batches, summarizes each batch, and aggregates the summaries. The shift report a manager reads is the output of a hierarchical aggregation, not of a single agent trying to hold thousands of calls in memory.

Running It

One processing run per shift. One summary run at the end.

Drop the recording paths into a task file. Epsilon processes every call, routes it, and produces the shift report. No manual intervention unless a critical escalation fires.

# process all calls from the morning shift
$ epsilon runs create --topology sharded_queue \
    --task "Process morning shift recordings: /recordings/2026-04-10-am/*.mp3" \
    --implementation python:call_pipeline.py:run

run_id: r-4b7f2e  topology: sharded_queue  status: running
  items: 1247  workers: 16  shards: 40
  complete: 312  in_progress: 640  queued: 295

$ epsilon runs get r-4b7f2e
  status: complete  items: 1247/1247  failed: 3 (retried, resolved)
  routed: billing=412 tier2=298 normal=534 critical=3
  artifacts: call_records/ routing_log.jsonl

# generate end-of-shift summary for managers
$ epsilon runs create --topology map_reduce \
    --task "Summarize morning shift: run r-4b7f2e. Highlight trends, escalations, anomalies vs yesterday." \
    --implementation python:shift_summary.py:run

run_id: r-9c1d4a  topology: map_reduce  status: complete
  batches: 25  aggregation levels: 2
  artifacts: shift_summary_2026-04-10-am.pdf