System architecture

Ori is a real-time voice AI platform: it answers and places phone calls, runs a live speech-to-speech conversation against a configurable bot, and returns a fully analysed, recorded, dispositioned call record — at fleet scale. The whole system is four cooperating services around a shared data layer and a telephony plane. The defining design rule is a clean split between the control plane (which owns durable business state) and the runtime plane (which runs the actual calls and holds no long-lived state).

The platform at a glance

The single most important line in the platform diagram is Backend → runtime config → Fleet → results → Backend. The voice fleet fetches everything it needs at the start of each call and hands back a complete record at the end. It stores nothing in between. That one property is what lets the fleet scale horizontally and means a crashed worker costs you at most one call.
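The fetch-at-start, deliver-at-end contract can be sketched in a few lines. This is an illustration only, not the fleet's actual code: `handle_call`, `CallResult`, and the three injected callables are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CallResult:
    """Hypothetical shape of what a worker hands back to the Backend."""
    call_id: str
    transcript: list[str]


def handle_call(
    call_id: str,
    fetch_config: Callable[[str], dict],
    run_conversation: Callable[[dict], list[str]],
    deliver_result: Callable[[CallResult], None],
) -> None:
    """Run one call while keeping no state before or after it."""
    config = fetch_config(call_id)           # Backend -> runtime config -> Fleet
    transcript = run_conversation(config)    # the live call itself
    deliver_result(CallResult(call_id, transcript))  # Fleet -> results -> Backend
    # Nothing is cached here: the next call starts from a fresh fetch.
```

Because every input arrives through `fetch_config` and every output leaves through `deliver_result`, any worker can serve any call, which is the property the paragraph above relies on.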

The four services

Service	Plane	What it owns	Stack
Console	Operator surface	The dashboard operators use: bot builder, campaign builder, call logs, reports, settings, knowledge bases	React + Vite single-page app
Backend API	Control plane	Auth, bots, campaigns, system settings, per-call runtime config, durable call records, CRM integrations, API keys, recordings access, fleet selection	Python · FastAPI
Voice fleet	Runtime / media plane	Live call workers, transport adapters, the speech pipeline, recording upload, post-call packaging, result delivery	Python · FastAPI · Pipecat
Dialler	Campaign execution plane	Campaign leases, predictive pacing, retry scheduling, outbound SIP dialing, answering-machine screening, attaching answered calls to the fleet	Python · asyncio worker

Control plane = the source of truth

The Backend and its data layer (MongoDB, Redis, vector DB, object storage) hold everything durable: who the bots are, what the campaigns contain, every call record. If it must survive a restart, it lives here.

Runtime plane = disposable muscle

The fleet and dialler do the heavy, real-time work but keep no durable state. They can be scaled out, restarted, or replaced freely — they always re-fetch config from the control plane.

The life of a call

Two call shapes converge on the same pipeline and the same post-call record. This is the flow that ties every service together.

A call begins

Either the dialler places an outbound campaign call (paced so it never overruns the fleet or the carrier), or a customer dials in to a configured number. Both arrive at the telephony plane.
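As a rough illustration of pacing that respects both limits, the sketch below computes how many new outbound attempts to start in one pacing tick. The function name, its parameters, and the idea of dividing by an expected answer rate are assumptions made for this example; the dialler's real predictive algorithm is not described in this document.

```python
import math


def dial_budget(
    free_workers: int,
    dials_in_flight: int,
    carrier_headroom: int,
    answer_rate: float = 1.0,
) -> int:
    """Return how many new calls to place this tick (hypothetical model).

    free_workers     -- fleet workers currently idle
    dials_in_flight  -- attempts already ringing, not yet answered
    carrier_headroom -- additional attempts the carrier will accept right now
    answer_rate      -- assumed fraction of attempts that get answered (0 < r <= 1)
    """
    if not 0 < answer_rate <= 1:
        raise ValueError("answer_rate must be in (0, 1]")
    # Attempts that would fill the free workers if answer_rate of them connect.
    target_attempts = math.floor(free_workers / answer_rate)
    wanted = max(0, target_attempts - dials_in_flight)
    # Never exceed what the carrier can take.
    return min(wanted, max(0, carrier_headroom))
```

With `answer_rate=1.0` the budget never exceeds the number of free workers; lower values over-dial on the assumption that some attempts go unanswered, which trades idle time against the risk of an answered call finding no free worker.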

A worker picks it up

The call lands on exactly one free fleet worker. The worker immediately asks the Backend for that bot’s runtime config — prompts, voice settings, tools, and any CRM data for this contact.

The conversation runs

Inside the worker, the speech pipeline runs the loop: speech-to-text → language model (with tools) → text-to-speech, with voice-activity and turn detection deciding who’s speaking. The audio is recorded.
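The loop can be pictured with plain callables standing in for each stage. This is a simplified sketch, not the Pipecat pipeline the fleet actually runs: `conversation_loop`, `is_speech`, `stt`, `llm`, and `tts` are hypothetical stand-ins, and real turn detection is considerably more involved than a single silent frame.

```python
from typing import Callable, Iterable, Iterator


def conversation_loop(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],
    stt: Callable[[bytes], str],
    llm: Callable[[list], str],
    tts: Callable[[str], bytes],
) -> Iterator[bytes]:
    """Yield one synthesized reply per detected user turn."""
    history: list[tuple[str, str]] = []
    buffered: list[bytes] = []
    for frame in frames:
        if is_speech(frame):
            buffered.append(frame)       # caller is still talking
            continue
        if not buffered:
            continue                     # silence with nothing to process
        # First silent frame after speech: treat it as the end of the turn.
        user_text = stt(b"".join(buffered))
        buffered.clear()
        history.append(("user", user_text))
        reply = llm(history)             # language model sees the whole history
        history.append(("assistant", reply))
        yield tts(reply)
    # Speech still buffered when the stream ends is dropped in this sketch.
```

A usage example with toy stages: treating non-empty frames as speech, decoding bytes as the "transcription", and upper-casing as the "model" turns the frames `b"hi", b"!", b"", b"yo", b""` into two replies.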

The call ends and is finalised

The worker uploads the recording, then hands the Backend a complete result: transcript, post-call analysis, quality-control findings, and a disposition. The Backend stores it and can push it to a CRM.
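The ordering matters: the recording is uploaded first, so the result handed to the Backend can carry a reference to it. A minimal sketch of that step follows, assuming a JSON payload; the field names, `CallRecord`, `finalise_call`, and both injected callables are hypothetical and not the platform's real contract.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class CallRecord:
    """Hypothetical end-of-call payload."""
    call_id: str
    recording_ref: str
    transcript: list[str]
    analysis: dict
    qc_findings: list[str]
    disposition: str


def finalise_call(
    call_id: str,
    audio: bytes,
    transcript: list[str],
    analysis: dict,
    qc_findings: list[str],
    disposition: str,
    upload_recording: Callable[[str, bytes], str],
    deliver_result: Callable[[str], None],
) -> CallRecord:
    """Upload the recording, then deliver one complete record."""
    recording_ref = upload_recording(call_id, audio)   # step 1: recording
    record = CallRecord(
        call_id, recording_ref, transcript, analysis, qc_findings, disposition
    )
    deliver_result(json.dumps(asdict(record)))         # step 2: full result
    return record
```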

Inside a fleet host

The fleet’s scaling model is deliberately simple. Each host runs many single-call worker processes behind a reverse proxy that hands each worker one call at a time.

Property	Value	Why it matters
Calls per worker	1	A stuck or crashed worker can only ever lose one call, never a batch.
Worker count	N per host (commonly 16 on a 4-CPU / 8 GB host)	Total host capacity = number of workers.
Proxy rule	`max_conns=1`	Stops a busy worker from being handed a second call.
Worker state	none	Workers re-fetch config every call, so any worker can take any call.
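The table's two consequences, one call per worker and host capacity equal to worker count, can be modelled directly. The sketch below is a toy model of the proxy rule and of capacity planning, not the proxy's configuration; `pick_worker` and `hosts_needed` are hypothetical helpers, and 16 workers per host is the "commonly" cited figure from the table rather than a fixed value.

```python
import math
from typing import Optional

MAX_CONNS = 1  # mirrors the proxy rule: one connection per worker


def pick_worker(active_calls: list[int]) -> Optional[int]:
    """Return the index of the first worker with spare capacity, else None."""
    for index, count in enumerate(active_calls):
        if count < MAX_CONNS:
            return index
    return None  # every worker is busy; the host is at capacity


def hosts_needed(target_concurrent_calls: int, workers_per_host: int = 16) -> int:
    """Hosts required when each worker carries exactly one call."""
    return math.ceil(target_concurrent_calls / workers_per_host)
```

For example, 100 concurrent calls at 16 workers per host gives 100 / 16 = 6.25, so seven hosts are needed under this model.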

How a change reaches production

Code does not move by hand. Every change flows through version control and a build server before it lands on the servers — see the developer and DevOps sections for the full detail.

Where to go next

Run it locally

For developers: prerequisites, cloning the repos, and running the stack on your own machine.

Deploy it

For DevOps: the Bitbucket → Jenkins pipeline, host deployment, configuration, and the operations runbook.

Repository map

The four repositories, their boundaries, and the contracts between them.

Operations manual

For the team running day-to-day calling: bots, campaigns, calls, and reports.
