Realtime Voice API BETA

Connect AI to real phone calls

Get an audio stream via WebSocket from any call — inbound or outbound — and connect it to OpenAI, Claude, ElevenLabs, or your own model.


How does it work?

1. Phone: Get a number from us.
2. 46elks: We handle all the technical work to deliver the call as a WebSocket stream.
3. Your app: You receive the audio stream and decide what to do with it.
4. AI / Tools: Choose any AI you want. Build your own tools and use them in the call.

Try it now.

Try one of our demos and see what's possible with AI in real phone calls.

Order a pizza

Call our AI and order a pizza. It takes your order, asks about toppings, and confirms delivery.

+46 76-686 77 77

Interactive demo

Pizza demo: call and order a pizza with AI

Talk to our documentation

Ask questions about the 46elks API directly over the phone. Our assistant knows it inside out.

+46 76-686 29 49

You own the entire chain

No black box. You get the audio stream and send back audio. In between you do exactly what you want — transcribe, analyze, generate responses, call your tools.

Works with any AI you choose

OpenAI Realtime, Claude, ElevenLabs, Whisper, your own model — doesn't matter. The only requirement is that your app receives and sends audio.
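To make "receives and sends audio" concrete, here is a minimal sketch of the smallest possible app: one that echoes the caller's audio back. It assumes the JSON message shapes used on this page (`hello`, `listening`, `sending`, `audio`, `bye`) and leaves out the WebSocket transport. `echo_replies` is a hypothetical helper. It takes one decoded 46elks message and returns the messages to send back.

```python
import json


def echo_replies(msg: dict) -> list[dict]:
    """Return the 46elks messages to send in response to one incoming message.

    Hypothetical helper: echoes the caller's audio straight back, using the
    same format in both directions so no resampling is needed.
    """
    kind = msg.get("t")
    if kind == "hello":
        # Start receiving caller audio and announce the format we send back
        return [
            {"t": "listening", "format": "pcm_24000"},
            {"t": "sending", "format": "pcm_24000"},
        ]
    if kind == "audio":
        # Send the same base64 payload back to the caller
        return [{"t": "audio", "data": msg["data"]}]
    # "bye" and anything unrecognised need no reply
    return []


if __name__ == "__main__":
    incoming = json.loads('{"t": "audio", "data": "AAAA"}')
    print(echo_replies(incoming))
```

In a real server you would call this for every received frame and `json.dumps` each returned message onto the socket. Swapping the echo for an AI model changes only what happens in the `audio` branch.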

Easy to build on

A complete AI voice assistant in ~60 lines of Python. No SIP. No Asterisk. Just a WebSocket.

WebSocket URL:

wss://my.server.io/audio

import asyncio
import json


async def openai_bridge(elks_ws, openai_ws):
    # Get the call metadata from the hello message
    hello = json.loads(await elks_ws.recv())
    print(f"Received {hello['to']} <- {hello['from']} ({hello['callid']})")

    # Tell the API the format we want to receive audio in
    await elks_ws.send(json.dumps({
        "t": "listening",
        "format": "pcm_24000"
    }))

    # Tell the API the format we'll be sending audio in
    await elks_ws.send(json.dumps({
        "t": "sending",
        "format": "pcm_24000"
    }))

    # Keep one pending recv() per socket and react to whichever finishes first
    elks_recv = asyncio.create_task(elks_ws.recv())
    openai_recv = asyncio.create_task(openai_ws.recv())

    try:
        while True:
            receivers = [elks_recv, openai_recv]
            done, _ = await asyncio.wait(receivers, return_when=asyncio.FIRST_COMPLETED)

            if elks_recv in done:
                raw = elks_recv.result()
                elks_recv = asyncio.create_task(elks_ws.recv())
                msg = json.loads(raw)

                if msg["t"] == "audio":
                    # Forward the audio to OpenAI
                    await openai_ws.send(json.dumps({
                        "type": "input_audio_buffer.append",
                        "audio": msg["data"],
                    }))

                elif msg["t"] == "bye":
                    # The call has ended: stop any response in progress and log the bye message
                    await openai_ws.send(json.dumps({"type": "response.cancel"}))
                    print("Call ended:", msg)
                    break

            if openai_recv in done:
                raw = openai_recv.result()
                openai_recv = asyncio.create_task(openai_ws.recv())
                msg = json.loads(raw)

                if msg["type"] == "input_audio_buffer.speech_started":
                    # The caller started talking: cancel the AI response and drop queued audio
                    await openai_ws.send(json.dumps({"type": "response.cancel"}))
                    await elks_ws.send(json.dumps({"t": "interrupt"}))

                elif msg["type"] == "response.audio.delta":
                    # Forward the audio to 46elks
                    await elks_ws.send(json.dumps({
                        "t": "audio",
                        "data": msg["delta"],
                    }))

                elif msg["type"] in ("response.audio.done", "response.done", "response.cancelled"):
                    # Re-send "sending" so 46elks stops ignoring our audio after an interrupt
                    await elks_ws.send(json.dumps({
                        "t": "sending",
                        "format": "pcm_24000"
                    }))
    finally:
        # Don't leave recv() tasks pending once the call is over
        elks_recv.cancel()
        openai_recv.cancel()

WebSocket URL:

wss://quiet-lake-1234.ngrok-free.app/audio

Develop locally with ngrok

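ngrok creates a secure tunnel from the internet to your local machine, so 46elks can reach your WebSocket server without you having to deploy it. The command below forwards a public https:// address to port 8000 on your machine. Use the same hostname with wss:// (plus your path, such as /audio) as your WebSocket URL.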

ngrok http 8000
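ngrok prints a public https:// forwarding address. 46elks needs the wss:// form of the same host, plus the path your server listens on. Here is a small sketch, assuming that path is `/audio` as in the examples on this page. `to_websocket_url` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def to_websocket_url(forwarding_url: str, path: str = "/audio") -> str:
    """Turn an ngrok https:// forwarding URL into the wss:// URL 46elks should use."""
    parts = urlsplit(forwarding_url)
    if parts.scheme not in ("https", "wss"):
        raise ValueError("expected an https:// forwarding URL")
    # Same host, secure WebSocket scheme, and the path your server listens on
    return urlunsplit(("wss", parts.netloc, path, "", ""))


print(to_websocket_url("https://quiet-lake-1234.ngrok-free.app"))
# prints wss://quiet-lake-1234.ngrok-free.app/audio
```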

Download ngrok

WebSocket URL:

wss://your-voice-agent.lovable.app/audio

Build a demo with Lovable

Paste the prompt below into Lovable to generate a working AI phone assistant with configurable system prompt and welcome message.

Build an AI phone assistant that receives voice calls via 46elks and responds in real time using the OpenAI Realtime API. The system should consist of an Edge Function (WebSocket bridge), a database for call logging, and a dashboard frontend.
---
### 1. Database
Create two tables:
**Table `calls`:**
- `id` (uuid, primary key, default gen_random_uuid())
- `call_id` (text, unique, not null) — 46elks call ID
- `from_number` (text, not null)
- `to_number` (text, not null)
- `status` (text, default 'active')
- `started_at` (timestamptz, default now())
- `ended_at` (timestamptz, nullable)
- `duration_seconds` (integer, nullable)
**Table `call_messages`:**
- `id` (uuid, primary key, default gen_random_uuid())
- `call_id` (text, not null, foreign key → calls.call_id)
- `role` (text, not null) — 'user' or 'assistant'
- `content` (text, not null)
- `created_at` (timestamptz, default now())
Enable Realtime on both tables. Disable RLS (the tables are only written by the edge function using the service role key, and the frontend reads them publicly).
---
### 2. Edge Function: `voice-stream`
Create an Edge Function named `voice-stream`. Disable JWT verification (verify_jwt = false) so that 46elks can connect directly.
The function is a WebSocket bridge between 46elks and the OpenAI Realtime API.
#### ⚠️ CRITICAL: 46elks WebSocket Protocol
This is the most important part. 46elks uses a JSON-based protocol over WebSocket. If you don't follow this exactly, no audio will be sent or received — the call will be silent in both directions with no error messages.
**Handshake sequence:**
1. **46elks sends** `{"t": "hello", ...}` when the connection opens
2. **You MUST respond** `{"t": "listening", "format": "pcm_24000"}` — this activates the audio stream from the caller to you
3. **Before sending the first audio packet back**, you MUST send `{"t": "sending", "format": "pcm_24000"}` — this opens the write buffer so the caller can hear you
4. **Audio data** is sent as `{"t": "audio", "data": ""}` in both directions
5. **Termination** arrives as `{"t": "bye", "reason": "..."}` — close the OpenAI session and update the database
**Audio format:** PCM 16-bit, 24000 Hz (`pcm_24000` for 46elks, `pcm16` for OpenAI)
#### OpenAI Realtime API Connection
Connect to `wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17` with subprotocols:
```
["realtime", "openai-insecure-api-key.", "openai-beta.realtime-v1"]
```
**Session configuration** (send as `session.update` immediately after the OpenAI socket opens):
```json
{
  "type": "session.update",
  "session": {
    "modalities": ["text", "audio"],
    "instructions": "You are a helpful AI assistant.",
    "voice": "alloy",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "input_audio_transcription": {
      "model": "gpt-4o-mini-transcribe"
    },
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500
    }
  }
}
```
#### ⚠️ CRITICAL: Greeting Phrase Timing
Do NOT send `response.create` for the greeting immediately. Wait for:
1. OpenAI sends `session.updated` (confirms the session is configured)
2. Then a `setTimeout` of 500ms (lets the audio pipeline stabilize)
3. Only THEN send:
```json
{
  "type": "response.create",
  "response": {
    "modalities": ["text", "audio"],
    "instructions": "Start by greeting the user. Speak in English."
  }
}
```
```
#### Audio Flow
**Incoming (caller → AI):**
- 46elks sends `{"t": "audio", "data": ""}` 
- Forward to OpenAI as `{"type": "input_audio_buffer.append", "audio": ""}`
**Outgoing (AI → caller):**
- OpenAI sends `response.audio.delta` with `delta` (base64 PCM)
- First time: send `{"t": "sending", "format": "pcm_24000"}` to 46elks
- Then send `{"t": "audio", "data": ""}` to 46elks
#### Transcription and Database
- `response.audio_transcript.delta` — accumulate assistant text
- `response.audio_transcript.done` — save to `call_messages` with role='assistant'
- `conversation.item.input_audio_transcription.completed` — save to `call_messages` with role='user'
#### Call Termination
On `{"t": "bye"}` from 46elks, or `onclose` on the elks socket:
- Close the OpenAI socket
- Update the `calls` table: status='completed', ended_at, duration_seconds
#### Complete Flow Overview
```
46elks                    Edge Function                 OpenAI Realtime
  |                           |                              |
  |--- WebSocket connect ---->|                              |
  |                           |--- WebSocket connect ------->|
  |                           |<-- session.created ----------|
  |--- {"t":"hello"} -------->|                              |
  |<-- {"t":"listening"} -----|--- session.update ---------->|
  |                           |<-- session.updated ----------|
  |                           |    (wait 500ms)              |
  |                           |--- response.create --------->|
  |                           |<-- response.audio.delta -----|
  |<-- {"t":"sending"} -------|                              |
  |<-- {"t":"audio"} ---------|                              |
  |                           |                              |
  |--- {"t":"audio"} -------->|--- input_audio_buffer ------>|
  |                           |<-- response.audio.delta -----|
  |<-- {"t":"audio"} ---------|                              |
  |                           |                              |
  |--- {"t":"bye"} ---------->|--- close ------------------->|
```
---
### 3. Frontend
Create a dark dashboard with:
**Header:** "AI Phone Assistant" with a phone icon.
**Setup Guide:** A step-by-step guide with 4 steps:
1. Get a 46elks account and phone number (link to 46elks.se)
2. Add your OpenAI API key as a secret in Lovable Cloud
3. Point your 46elks number to the WebSocket URL (show the URL with a copy button)
4. Test by calling the number
The WebSocket URL should be constructed dynamically: `wss:///functions/v1/voice-stream`
**Call Log:** A table showing all calls with:
- Time, from number, to number, duration, status (active/completed)
- Clickable row that expands to show the transcript (user and AI messages)
- Real-time updates via Supabase Realtime (subscribe to `calls` and `call_messages`)
- Active calls should have a pulsing green dot
**Design:** Dark theme with green accents (terminal feel). Use JetBrains Mono for code snippets and monospace text.
---
### 4. 46elks Configuration
The user needs to configure their 46elks number in their dashboard:
- Set `voice_start` to the WebSocket URL from step 3 above
- 46elks will automatically open a WebSocket connection for each incoming call and send `callid`, `from`, and `to` as query parameters
---
### 5. Secrets
The following secret needs to be configured in Lovable Cloud:
- `OPENAI_API_KEY` — OpenAI API key with access to the Realtime API
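The 46elks handshake rules in the prompt above can be expressed as a small state tracker. Three of them are stated directly: reply `listening` to `hello`, send `sending` before the first audio packet, and send audio as `{"t": "audio"}`. The fourth is an assumption drawn from how the Python bridge earlier on this page re-sends `sending` after an interrupt: it supposes that outgoing audio is ignored after an `interrupt` until a fresh `sending` arrives. The class and method names are hypothetical.

```python
from dataclasses import dataclass

FORMAT = "pcm_24000"


@dataclass
class ElksOutput:
    """Tracks whether 46elks is ready to play our audio (hypothetical helper)."""

    sending_open: bool = False

    def on_hello(self, hello: dict) -> list[dict]:
        # Reply with "listening" so caller audio starts flowing to us
        return [{"t": "listening", "format": FORMAT}]

    def audio(self, b64_pcm: str) -> list[dict]:
        # "sending" must precede the first audio packet we send
        out = []
        if not self.sending_open:
            out.append({"t": "sending", "format": FORMAT})
            self.sending_open = True
        out.append({"t": "audio", "data": b64_pcm})
        return out

    def interrupt(self) -> list[dict]:
        # Assumption: after an interrupt, audio is ignored until "sending" is re-sent
        self.sending_open = False
        return [{"t": "interrupt"}]
```

Keeping this state in one place makes the "silent call, no error" failure mode described above harder to hit, because every outgoing audio packet goes through `audio()`.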

This is a working AI voice assistant in about 60 lines. Swap out OpenAI for any model you want.

Check out our Getting started guide ->

What you can build

AI voice assistant

An AI that answers calls, speaks naturally, and takes actions — books appointments, checks orders, answers questions. With tool-calling it becomes an agent, not just a voice.
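Here is one way tool calling could slot into the OpenAI bridge. When the model finishes a function call, run your code and hand the result back. This sketch assumes the OpenAI Realtime API (beta) event shapes: a `response.output_item.done` event whose `item` has type `function_call`, answered with a `conversation.item.create` of type `function_call_output` followed by `response.create`. Check these against OpenAI's current reference. The tool must also be declared in the `tools` list of your `session.update`, which is not shown here. `check_order` is a hypothetical tool.

```python
import json


def check_order(order_id: str) -> dict:
    # Hypothetical tool: replace with a lookup in your own systems
    return {"order_id": order_id, "status": "out for delivery"}


TOOLS = {"check_order": check_order}


def handle_tool_call(event: dict) -> list[dict]:
    """Turn a finished function-call event into follow-up events for OpenAI."""
    item = event.get("item", {})
    if event.get("type") != "response.output_item.done" or item.get("type") != "function_call":
        return []
    func = TOOLS.get(item["name"])
    args = json.loads(item.get("arguments") or "{}")
    result = func(**args) if func else {"error": f"unknown tool {item['name']}"}
    return [
        # Give the tool result back to the model, tied to the original call_id
        {"type": "conversation.item.create",
         "item": {"type": "function_call_output",
                  "call_id": item["call_id"],
                  "output": json.dumps(result)}},
        # Ask the model to continue speaking with the result in context
        {"type": "response.create"},
    ]
```

In the bridge, each returned event would be sent with `await openai_ws.send(json.dumps(event))` from the OpenAI branch of the loop.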

AI that calls out

Your app initiates a call and connects an AI agent. Outbound sales, reminders, follow-ups.

Real-time transcription

Transcribe live. Flag keywords. Run sentiment analysis while the call is in progress. Surface relevant information in real time.
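Keyword flagging can work on the same transcription events the Lovable prompt stores in `call_messages`. This is a sketch. It assumes OpenAI's `conversation.item.input_audio_transcription.completed` event carries the caller's text in a `transcript` field. The watch list is hypothetical. Matching is word by word, so multi-word phrases are not caught.

```python
import re

KEYWORDS = {"cancel", "refund", "complaint"}  # hypothetical watch list


def flag_keywords(event: dict, keywords: set[str] = KEYWORDS) -> set[str]:
    """Return watched keywords found in a finished caller transcript event."""
    if event.get("type") != "conversation.item.input_audio_transcription.completed":
        return set()
    # Lower-case and split into words so "Refund." matches "refund"
    words = set(re.findall(r"[\w']+", event.get("transcript", "").lower()))
    return words & keywords
```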

System integration

CRM updates, ticket management, logging — automatic, in real time. The phone becomes an interface to your systems.

Under the hood

Protocol

WebSocket (wss://) with full duplex.

Audio format

PCM 8/16/24 kHz, G.711, G.722, Opus, MP3, WAV.
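For raw PCM, bandwidth follows directly from the sample rate. Assuming 16-bit (2-byte) mono samples, as with the `pcm_24000` format used above, 24 kHz is 48,000 bytes per second. A 20 ms chunk is then 960 bytes, or 1,280 characters once base64-encoded. The 20 ms chunk size is only an example, not a documented 46elks frame size.

```python
import math

BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM


def pcm_frame_bytes(sample_rate_hz: int, frame_ms: int) -> int:
    """Raw bytes in one frame of 16-bit mono PCM."""
    return sample_rate_hz * BYTES_PER_SAMPLE * frame_ms // 1000


def base64_length(n_bytes: int) -> int:
    """Characters needed to base64-encode n_bytes, including padding."""
    return 4 * math.ceil(n_bytes / 3)


for rate in (8000, 16000, 24000):
    raw = pcm_frame_bytes(rate, 20)
    print(f"{rate} Hz: {raw} bytes per 20 ms, {base64_length(raw)} base64 chars")
```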

Direction

Inbound and outbound calls.

Streams

Separate streams for caller and agent, individually controllable.

Control

The bye message ends the call gracefully. The interrupt message immediately clears audio still buffered for playback to the caller.

Integration

JSON messages, base64-encoded audio.
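Every audio frame, in either direction, is a JSON message with a base64 payload. The sketch below decodes such a message into sample values (for example, for level metering or your own speech detection) and encodes samples back into a message. It assumes 16-bit little-endian mono PCM, as with `pcm_24000`. The helper names are hypothetical.

```python
import array
import base64
import json
import sys


def decode_pcm16(message: str) -> array.array:
    """Decode a {"t": "audio"} JSON message into signed 16-bit samples."""
    msg = json.loads(message)
    if msg.get("t") != "audio":
        raise ValueError("not an audio message")
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(base64.b64decode(msg["data"]))
    # Assumption: payload is little-endian; swap on big-endian machines
    if sys.byteorder == "big":
        samples.byteswap()
    return samples


def encode_pcm16(samples: array.array) -> str:
    """Build an outgoing {"t": "audio"} message from signed 16-bit samples."""
    out = array.array("h", samples)  # copy so the caller's array is untouched
    if sys.byteorder == "big":
        out.byteswap()
    payload = base64.b64encode(out.tobytes()).decode("ascii")
    return json.dumps({"t": "audio", "data": payload})
```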


Pricing

Realtime Voice is included with 46elks virtual numbers

0.15 SEK/min to connect a call to WebSocket
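As a worked example, at 0.15 SEK per minute a 4-minute call costs 0.60 SEK to connect to a WebSocket. The billing granularity is not stated on this page, so the sketch below offers both per-started-minute and per-second proration as assumptions. It covers only this connection fee; other call charges may apply, as listed in the full price list.

```python
import math
from decimal import Decimal

RATE_SEK_PER_MIN = Decimal("0.15")  # WebSocket connection fee from this page


def websocket_cost_sek(duration_s: int, per_started_minute: bool = True) -> Decimal:
    """Estimated WebSocket connection cost for one call, in SEK.

    Assumption: the rounding rule is unknown, so choose between charging
    each started minute and prorating per second.
    """
    if per_started_minute:
        minutes = Decimal(math.ceil(duration_s / 60))
    else:
        minutes = Decimal(duration_s) / 60
    return (minutes * RATE_SEK_PER_MIN).quantize(Decimal("0.01"))
```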

3 concurrent calls per number. Talk to us if you need more.

See full price list →

Get in touch

If you’re wondering how you can use 46elks for a project, don’t hesitate to contact us.

+46767861004 help@46elks.com
