Integrating STT and TTS for Real-time Transcription and Text-to-Speech in Wazo Calls

julius · November 6, 2024, 4:21am

Hi everyone,

I’m working on a project to integrate Speech-to-Text (STT) and Text-to-Speech (TTS) capabilities into Wazo calls. The goal is to achieve real-time transcription of user speech and playback of the transcribed text as synthesized audio.

Here’s a breakdown of the desired workflow:

User Speaks: A user speaks into their phone or headset during a Wazo call.
Audio Capture: Wazo captures the audio stream from the call.
STT Integration: The captured audio is sent to an STT API (e.g., Google Cloud Speech-to-Text, Amazon Transcribe) for transcription.
Transcription Return: The transcribed text is returned to Wazo.
TTS Integration: The transcribed text is sent to a TTS API (e.g., Google Text-to-Speech, Amazon Polly) for text-to-speech conversion.
Audio Playback: The synthesized audio is streamed back to Wazo and played in the call session.

I’m looking for guidance and advice on the following:

Wazo APIs: Which Wazo APIs or modules can be used to capture audio streams and inject synthesized audio into a call?
External API Integration: How can I integrate external STT and TTS APIs with Wazo? Node Red would be a good option
Real-time Processing: What strategies can be employed to ensure low-latency processing and minimal impact on call quality?
Error Handling: How can I handle potential errors or failures in the STT and TTS processes?

Any insights, code examples, or best practices would be greatly appreciated.

Thank you in advance for your help!

julienfr · November 6, 2024, 11:29am

I haven’t done it, but here some ideas.

1/ using application endpoint

you can create an application
POST /applications/{application_uuid}/calls/{call_id}/stt/start
with options

max_time = 0 (unlimited),
engine = 2

Look at

2/ using node Red

I have examples, but here some docs:

cheers

Topic		Replies	Views
Bulk Meta Data and recording Extraction help Installation/Configuration	0	277	July 21, 2020
IVR et google TTS Installation/Configuration	2	355	August 2, 2022
Android/IOS App and WebRTC client - Looking for developers English	0	391	March 22, 2021
Node-red: Traitement du flux audio Français	4	192	October 18, 2023
Can Wazo do app to app calls? English	1	351	May 13, 2020

Integrating STT and TTS for Real-time Transcription and Text-to-Speech in Wazo Calls

Related topics