💡  Beta product

This product is in Beta. This means that the product and the information about it are subject to change, updates or removal.
If you test this product, please let us know your feedback, so that we can make it the best possible product for you. Please share your feedback with us here.

Using interrupt to clear buffered audio

AI assistants tend to generate audio much faster than it can be played back; for example, it’s not uncommon to receive 1s of audio data in 100–300ms. If your application sends this audio to the API immediately, the data will be buffered and played back with the correct timing.
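To get a feel for how far ahead of playback the buffer can run, the sketch below estimates the playback duration of one audio chunk. It assumes that pcm_24000 means 16-bit mono PCM at 24 kHz and that the chunk is base64-encoded. Neither is confirmed here, and playback_seconds is a hypothetical helper, not part of the API.

```python
import base64

SAMPLE_RATE = 24_000   # Hz, taken from the "pcm_24000" format name
BYTES_PER_SAMPLE = 2   # assumption: 16-bit mono PCM


def playback_seconds(b64_audio: str) -> float:
    """Estimate how long a base64-encoded PCM chunk takes to play back."""
    pcm = base64.b64decode(b64_audio)
    return len(pcm) / (SAMPLE_RATE * BYTES_PER_SAMPLE)


# 48,000 bytes of silence is one second of audio under the assumptions above
chunk = base64.b64encode(bytes(48_000)).decode("ascii")
print(playback_seconds(chunk))  # 1.0
```

If a chunk like this arrives in 200 ms, roughly 0.8 s of audio is still waiting in the buffer once it has been sent.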

This behaviour keeps your integration code simple, since you don’t have to deal with complex timing issues. However, there are situations where you want to stop playback of audio that has already been buffered: if a user starts speaking, you’d normally want the AI to stop talking over them. For these cases there is the interrupt message.

The interrupt message tells the API to immediately clear its audio buffer and ignore all subsequent audio messages. To start sending audio again, your application must first send a new sending message.
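The ordering described above can be sketched as a small wrapper that remembers whether the API is currently accepting audio. ElksAudioSender and its method names are hypothetical and not part of the API. The message shapes ("t": "sending", "audio", "interrupt") are taken from the example further down. Dropping audio locally while interrupted is this sketch's own choice, made because the API would ignore that audio anyway.

```python
import asyncio
import json


class ElksAudioSender:
    """Hypothetical helper that tracks whether the API is accepting audio."""

    def __init__(self, send, audio_format="pcm_24000"):
        self._send = send            # async callable taking one str, e.g. a websocket's send
        self._format = audio_format
        self._active = False         # True between a "sending" and the next "interrupt"

    async def start(self):
        # Announce the audio format; needed again after every interrupt
        await self._send(json.dumps({"t": "sending", "format": self._format}))
        self._active = True

    async def audio(self, data):
        if not self._active:
            # After an interrupt the API ignores audio, so drop it locally
            return False
        await self._send(json.dumps({"t": "audio", "data": data}))
        return True

    async def interrupt(self):
        # Clear the API's playback buffer; audio is ignored until start() is called
        await self._send(json.dumps({"t": "interrupt"}))
        self._active = False


async def demo():
    sent = []

    async def fake_send(raw):
        # Stand-in for a websocket: record only the message type
        sent.append(json.loads(raw)["t"])

    sender = ElksAudioSender(fake_send)
    await sender.start()
    await sender.audio("AAAA")   # forwarded
    await sender.interrupt()
    await sender.audio("BBBB")   # dropped locally
    await sender.start()
    await sender.audio("CCCC")   # forwarded
    return sent


print(asyncio.run(demo()))
# ['sending', 'audio', 'interrupt', 'sending', 'audio']
```

When to call start() again is up to the application. In a bridge, a natural point is when the interrupted AI response has finished or been cancelled, so that stale audio from that response is never played.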

Example: Using the OpenAI realtime API

This example creates an audio bridge between the 46elks API and an OpenAI agent. All audio received from the 46elks API is forwarded to OpenAI, and all audio received from OpenAI is forwarded back to the caller. The interrupt message is used to silence the AI’s output when the user starts speaking.

import asyncio
import json


async def openai_bridge(elks_ws, openai_ws):
    # Get the call metadata from the hello message
    hello = json.loads(await elks_ws.recv())
    print(f"Received {hello['to']} <- {hello['from']} ({hello['callid']})")

    # Tell the API the format we want to receive audio in
    await elks_ws.send(json.dumps({
        "t": "listening",
        "format": "pcm_24000"
    }))

    # Tell the API the format we'll be sending audio in
    await elks_ws.send(json.dumps({
        "t": "sending",
        "format": "pcm_24000"
    }))

    elks_recv = asyncio.create_task(elks_ws.recv())
    openai_recv = asyncio.create_task(openai_ws.recv())

    while True:
        receivers = [elks_recv, openai_recv]
        done, _ = await asyncio.wait(receivers, return_when=asyncio.FIRST_COMPLETED)

        if elks_recv in done:
            raw = elks_recv.result()
            elks_recv = asyncio.create_task(elks_ws.recv())

            msg = json.loads(raw)

            if msg["t"] == "audio":
                # Forward the audio to OpenAI
                await openai_ws.send(json.dumps({
                    "type": "input_audio_buffer.append",
                    "audio": msg["data"],
                }))

            elif msg["t"] == "bye":
                # The call has ended
                await openai_ws.send(json.dumps({
                    "type": "response.cancel"
                }))
                print("Call ended:", msg["message"])
                # Stop the pending receive tasks before leaving the loop
                elks_recv.cancel()
                openai_recv.cancel()
                break

        if openai_recv in done:
            raw = openai_recv.result()
            openai_recv = asyncio.create_task(openai_ws.recv())

            msg = json.loads(raw)

            if msg["type"] == "input_audio_buffer.speech_started":
                # The user started speaking: cancel the AI response
                await openai_ws.send(json.dumps({
                    "type": "response.cancel"
                }))
                # Clear the audio buffer and start ignoring audio
                await elks_ws.send(json.dumps({
                    "t": "interrupt"
                }))

            elif msg["type"] == "response.audio.delta":
                # Forward the audio to 46elks
                await elks_ws.send(json.dumps({
                    "t": "audio",
                    "data": msg["delta"],
                }))

            elif msg["type"] in ("response.audio.done", "response.done", "response.cancelled"):
                # The response has ended: send a new sending message
                # so the API stops ignoring audio
                await elks_ws.send(json.dumps({
                    "t": "sending",
                    "format": "pcm_24000"
                }))