Deep Research

How It Works

Fan out, extract, cross-reference, synthesize.

The workflow decomposes a research brief into parallel search agents, normalizes their outputs into a shared workspace, and hands everything to a synthesis agent that produces the final deliverable. The dag topology handles the fan-out. When the input is too large for a single fan-out, the map_reduce topology distributes the work across worker pools and aggregates the results.
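The sketch below shows the fan-out as a step graph that a dag topology might encode. It is minimal and uses Python's standard-library graphlib, not anything from Epsilon. The step names (decompose, search_N, reconcile, synthesize) are hypothetical, and extraction is folded into the search nodes for brevity. Each printed batch contains only steps whose dependencies are all satisfied, so everything in a batch could run in parallel.

```python
from graphlib import TopologicalSorter

def build_graph(num_queries: int) -> dict[str, set[str]]:
    """Hypothetical step graph: each key maps to the steps it depends on."""
    searches = {f"search_{i}" for i in range(num_queries)}
    graph: dict[str, set[str]] = {"decompose": set()}
    for step in searches:
        graph[step] = {"decompose"}      # fan-out: every search waits on decomposition
    graph["reconcile"] = set(searches)   # fan-in: reconcile waits on every search
    graph["synthesize"] = {"reconcile"}
    return graph

def parallel_batches(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into batches whose members could all run at the same time."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # pretend the whole batch finished
    return batches

for batch in parallel_batches(build_graph(3)):
    print(batch)
# ['decompose']
# ['search_0', 'search_1', 'search_2']
# ['reconcile']
# ['synthesize']
```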

Define research questions and search strategy

An agent reads the research brief — say, "map the vector database landscape: key vendors, pricing models, differentiation, recent funding, customer segments" — and decomposes it into a set of specific search queries grouped by topic. This decomposition is itself an LLM task: it requires understanding the domain and knowing which questions are worth asking.
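The decomposition step's output has to be machine-readable before anything can fan out. This is a minimal sketch under one assumption: the decomposition agent is prompted to reply with a JSON list of {topic, query} objects. That format, the SearchQuery type, and parse_plan are all illustrative and not part of Epsilon. The code validates the reply and groups the queries by topic.

```python
import json
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class SearchQuery:
    topic: str
    query: str

# Stand-in for the decomposition agent's reply. A real run would get this text
# from an LLM; the JSON shape here is an assumption for illustration.
LLM_REPLY = """
[
  {"topic": "pricing", "query": "Pinecone pricing model"},
  {"topic": "pricing", "query": "Qdrant Cloud pricing tiers"},
  {"topic": "funding", "query": "Weaviate recent funding announcements"}
]
"""

def parse_plan(reply: str) -> dict[str, list[SearchQuery]]:
    """Validate the model's JSON reply and group its queries by topic."""
    plan: dict[str, list[SearchQuery]] = defaultdict(list)
    for item in json.loads(reply):
        topic = str(item.get("topic", "")).strip()
        query = str(item.get("query", "")).strip()
        if not topic or not query:
            raise ValueError(f"incomplete query entry: {item!r}")
        plan[topic].append(SearchQuery(topic, query))
    return dict(plan)

for topic, queries in parse_plan(LLM_REPLY).items():
    print(topic, [q.query for q in queries])
```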

Agent

Fan out search queries in parallel

Each query spawns a dedicated search agent. For a competitive analysis, you might have 8–12 agents running simultaneously: one per competitor, plus agents covering pricing, open-source alternatives, recent press coverage, and analyst takes. Each agent runs independently, writes its raw findings to the shared workspace, and terminates.
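Here is the fan-out pattern as a minimal sketch in plain Python, with a thread pool standing in for Epsilon's scheduler. search_agent and fan_out are hypothetical names, and the agent body is a placeholder for real search and LLM calls. The raw_NN.json file naming is also an assumption. Each agent writes one file to the shared workspace and returns.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def search_agent(index: int, query: str, workspace: Path) -> Path:
    """Hypothetical search agent; a real one would call a search API and an LLM.

    It writes its raw findings to the shared workspace and returns the file path.
    """
    findings = {"query": query, "raw_results": [f"placeholder result for {query!r}"]}
    out_path = workspace / f"raw_{index:02d}.json"
    out_path.write_text(json.dumps(findings))
    return out_path

def fan_out(queries: list[str], workspace: Path, max_workers: int = 12) -> list[Path]:
    """Run one agent per query concurrently; paths come back in query order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(search_agent, i, q, workspace) for i, q in enumerate(queries)
        ]
        # .result() re-raises any exception an agent hit
        return [f.result() for f in futures]

with tempfile.TemporaryDirectory() as tmp_dir:
    workspace = Path(tmp_dir)
    fan_out(["Pinecone pricing", "Qdrant pricing", "pgvector benchmarks"], workspace)
    print(sorted(p.name for p in workspace.iterdir()))
    # ['raw_00.json', 'raw_01.json', 'raw_02.json']
```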

Agent — parallel fan-out

Scrape, extract, and normalize content

Each search agent fetches pages, extracts relevant content, and normalizes it into a consistent schema: source URL, publication date, key claims, named entities, confidence level. The schema is enforced by the step contract — downstream steps expect structured data, not raw HTML.
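One way to enforce a step contract is to express the normalization schema as a typed record that refuses to exist in an invalid state. The field names below follow the list above. The SourceRecord class, the three confidence levels, and the specific checks are assumptions for illustration, and ExampleDB is a placeholder vendor.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
from urllib.parse import urlparse

CONFIDENCE_LEVELS = {"low", "medium", "high"}  # assumed scale

@dataclass(frozen=True)
class SourceRecord:
    source_url: str
    publication_date: Optional[date]  # None when the page carries no date
    key_claims: tuple[str, ...]
    named_entities: tuple[str, ...]
    confidence: str

    def __post_init__(self) -> None:
        # Contract checks: reject records that downstream steps could not use.
        if urlparse(self.source_url).scheme not in ("http", "https"):
            raise ValueError(f"not an http(s) URL: {self.source_url}")
        if not self.key_claims:
            raise ValueError("record has no key claims")
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence level: {self.confidence}")

record = SourceRecord(
    source_url="https://example.com/pricing",
    publication_date=date(2024, 1, 15),
    key_claims=("ExampleDB offers a free starter tier",),
    named_entities=("ExampleDB",),
    confidence="high",
)
print(record.source_url, record.confidence)
```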

Agent — per source

Cross-reference findings and resolve contradictions

A dedicated reconciliation agent reads all normalized outputs from the shared workspace and identifies contradictions: Pinecone's pricing page says one thing, while a TechCrunch article from three months ago says another. The agent flags each discrepancy, assigns confidence weights based on source recency and authority, and produces a reconciled fact set.
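One simple way to turn "recency and authority" into numbers is to multiply an authority score for the source type by an exponential decay on the claim's age. The sketch then keeps the highest-weighted value per subject and flags any subject where sources disagree. The authority table, the 180-day half-life, and the names Claim, weight, and reconcile are hypothetical choices, not Epsilon's scoring. ExampleDB is a placeholder vendor.

```python
from dataclasses import dataclass
from datetime import date

# Assumed authority scores per source type; a real deployment would tune these.
AUTHORITY = {"vendor": 1.0, "press": 0.7, "forum": 0.4}
HALF_LIFE_DAYS = 180  # assumption: a claim loses half its weight every ~6 months

@dataclass(frozen=True)
class Claim:
    subject: str       # what the claim is about, e.g. a vendor's starter tier
    value: str         # what the source says about it
    source_kind: str   # key into AUTHORITY
    published: date

def weight(claim: Claim, today: date) -> float:
    """Authority score times exponential decay on the claim's age in days."""
    age_days = max((today - claim.published).days, 0)
    return AUTHORITY[claim.source_kind] * 0.5 ** (age_days / HALF_LIFE_DAYS)

def reconcile(claims: list[Claim], today: date) -> dict[str, dict]:
    """Pick the highest-weighted value per subject and flag disagreements."""
    by_subject: dict[str, list[Claim]] = {}
    for c in claims:
        by_subject.setdefault(c.subject, []).append(c)
    facts = {}
    for subject, group in by_subject.items():
        best = max(group, key=lambda c: weight(c, today))
        facts[subject] = {
            "value": best.value,
            "weight": round(weight(best, today), 3),
            "conflict": len({c.value for c in group}) > 1,
        }
    return facts

today = date(2024, 6, 1)
print(reconcile([
    Claim("ExampleDB starter tier", "free", "vendor", today),
    Claim("ExampleDB starter tier", "$25/month", "press", date(2024, 3, 3)),
], today))
```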

Agent

Synthesize into a structured report with citations

The synthesis agent reads the reconciled fact set and writes the final report: executive summary, vendor-by-vendor breakdown, comparison tables, key themes, gaps, and recommendations. Every claim links back to a source in the workspace. The output is a structured document, not a chat response.
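Keeping every claim tied to its source can be as simple as carrying the URL alongside each fact and assigning citation numbers at render time. This sketch assumes a Markdown-style report and uses hypothetical Fact and render_section names. It refuses to render a fact that has no source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    text: str
    source_url: str

def render_section(title: str, facts: list[Fact]) -> str:
    """Render facts with numbered citations plus a source list for the section."""
    index: dict[str, int] = {}  # source URL -> citation number, in first-use order
    lines = [f"## {title}", ""]
    for fact in facts:
        if not fact.source_url:
            raise ValueError(f"uncited claim: {fact.text!r}")
        n = index.setdefault(fact.source_url, len(index) + 1)
        lines.append(f"- {fact.text} [{n}]")
    lines += ["", "Sources:"]
    lines += [f"[{n}] {url}" for url, n in index.items()]
    return "\n".join(lines)

print(render_section("Pricing", [
    Fact("ExampleDB has a free starter tier", "https://example.com/pricing"),
    Fact("ExampleDB raised prices last quarter", "https://news.example.org/story"),
]))
```

Numbers are assigned on first use, so a source cited by several claims keeps a single number.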

Agent + QA loop

Why Epsilon

One agent can't hold the context. A team with a shared workspace can.

Parallelism that actually works

Running 12 searches in parallel isn't just an asyncio loop; it's a coordination problem. Which agents are done? Which failed? What do downstream steps get to see? Epsilon's dag topology handles all of that. You write the search agent once; Epsilon fans it out and collects the results.
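The questions above imply some bookkeeping, sketched here with a thread pool so it runs standalone. Every agent's outcome is recorded with its status and attempt count. Transient failures get one retry, and only successful results are passed downstream. AgentResult, run_with_retry, fan_out_and_collect, and the simulated flaky agent are hypothetical. Epsilon's own retry policy is not described here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class AgentResult:
    query: str
    status: str          # "succeeded" or "failed"
    attempts: int
    output: object = None
    error: str = ""

def run_with_retry(agent, query: str, max_attempts: int = 2) -> AgentResult:
    """Call the agent, retrying on any exception up to max_attempts times."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            return AgentResult(query, "succeeded", attempt, output=agent(query))
        except Exception as exc:  # any agent failure counts as retryable here
            last_error = repr(exc)
    return AgentResult(query, "failed", max_attempts, error=last_error)

def fan_out_and_collect(agent, queries: list[str], max_workers: int = 12):
    """Return every result (for status reporting) and the successes (for downstream)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda q: run_with_retry(agent, q), queries))
    visible = [r for r in results if r.status == "succeeded"]
    return results, visible

def make_flaky_agent(fail_once_on: str):
    """Simulated agent that times out on its first call for one query."""
    seen: set[str] = set()
    lock = threading.Lock()
    def agent(query: str) -> str:
        with lock:
            first_try = query not in seen
            seen.add(query)
        if query == fail_once_on and first_try:
            raise TimeoutError("simulated search timeout")
        return f"findings for {query}"
    return agent

results, visible = fan_out_and_collect(
    make_flaky_agent("analyst takes"), ["pricing", "analyst takes", "funding"]
)
for r in results:
    print(r.query, r.status, r.attempts)
```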

Shared workspace as memory

Every agent writes its findings to a shared workspace directory. The reconciliation and synthesis agents read from that same directory. There is no context window to manage, no prompt engineering to pass findings between steps. The filesystem is the message bus.
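If the filesystem is the message bus, readers must never pick up a half-written file. One common technique, sketched here under that assumption, is to write each record to a temporary file in the same directory and then rename it into place atomically with os.replace. Readers glob only the final *.json names. publish and read_all are hypothetical helpers, not Epsilon APIs.

```python
import json
import os
import tempfile
from pathlib import Path

def publish(workspace: Path, name: str, findings: dict) -> Path:
    """Write findings so that readers never observe a partially written file."""
    final = workspace / f"{name}.json"
    fd, tmp_path = tempfile.mkstemp(dir=workspace, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(findings, f)
        # Rename within one directory is atomic on POSIX filesystems.
        os.replace(tmp_path, final)
    except Exception:
        # Don't leave a stray temp file behind if serialization fails.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final

def read_all(workspace: Path) -> list[dict]:
    """Read every published record; in-progress .tmp files are never matched."""
    return [json.loads(p.read_text()) for p in sorted(workspace.glob("*.json"))]

with tempfile.TemporaryDirectory() as tmp_dir:
    ws = Path(tmp_dir)
    publish(ws, "search_pinecone", {"claims": ["placeholder claim"]})
    publish(ws, "search_qdrant", {"claims": ["another placeholder"]})
    print(read_all(ws))
```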

Scale with map_reduce

For large research projects, such as due diligence across 200 companies or a patent landscape across 500 filings, switch from dag to map_reduce. Epsilon distributes the work across worker pools, aggregates at each level of the tree, and produces the same structured output regardless of input size.
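A small tree reduction shows the "aggregate at each level of the tree" idea: map each item to a summary, then combine summaries in groups of a fixed fan-in until one remains. With a fan-in of 10, 200 items take three levels (200 to 20 to 2 to 1). That is the same depth shown for the map_reduce run in the sample output, though how Epsilon actually chooses its grouping isn't stated. tree_reduce, summarize_company, and merge are hypothetical. The map step here runs sequentially, whereas Epsilon would distribute it across workers.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def tree_reduce(items: list[T], combine: Callable[[list[T]], T],
                fan_in: int = 10) -> tuple[T, int]:
    """Combine items in groups of `fan_in` until one result remains.

    Returns the final result and the number of aggregation levels used.
    """
    if not items:
        raise ValueError("nothing to reduce")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level_items, levels = list(items), 0
    while len(level_items) > 1:
        # Each group in a level is independent, so a real system could run them in parallel.
        level_items = [
            combine(level_items[i:i + fan_in])
            for i in range(0, len(level_items), fan_in)
        ]
        levels += 1
    return level_items[0], levels

def summarize_company(name: str) -> dict:
    # Placeholder for a per-company research agent (the "map" step).
    return {"companies": [name]}

def merge(summaries: list[dict]) -> dict:
    # Placeholder "reduce" step: concatenate the per-company summaries.
    return {"companies": [c for s in summaries for c in s["companies"]]}

mapped = [summarize_company(f"startup-{i:03d}") for i in range(200)]
report, levels = tree_reduce(mapped, merge, fan_in=10)
print(len(report["companies"]), levels)  # 200 3
```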

Running It

Describe what you want. Get a report.

The research brief becomes the task. Epsilon handles the decomposition, fan-out, and synthesis. You get a structured document with citations.

# competitive analysis: vector database landscape
$ epsilon runs create --topology dag \
    --task "Competitive analysis: vector database market. Cover Pinecone, Weaviate, Qdrant, Milvus, pgvector. Pricing, differentiation, funding, customer segments." \
    --implementation python:research_workflow.py:run

run_id: r-9e2f7d  topology: dag  status: running
  step 1/5  decompose         complete (12 search queries)
  step 2/5  search_agents     running (12 parallel)

# check progress mid-run
$ epsilon runs get r-9e2f7d
  step 2/5  search_agents     complete (11/12 succeeded, 1 retried)
  step 3/5  extract_normalize complete
  step 4/5  reconcile         running
  step 5/5  synthesize        queued

# large-scale research: switch to map_reduce
$ epsilon runs create --topology map_reduce \
    --task "Due diligence: 200 climate-tech startups from Crunchbase export" \
    --implementation python:research_workflow.py:run

run_id: r-1a9b3c  topology: map_reduce  status: running
  workers: 20  items: 200  aggregation: 3 levels

Good research isn't one good search. It's fifty mediocre ones, reconciled.