Do you use voice prompts in your application? If yes, then it is good to have a library of pre-recorded audio files for the prompts. However, if the prompts are not decided in advance, you have to generate them on the fly.
With the help of a Text-to-Speech (TTS) API, you can instantly generate audio clips from text messages. It is a pretty handy approach, considering the complexity of synthesizing speech. This blog post shows you how to leverage the HiBrainy TTS API to build a web service for generating speech audio files from text messages.
Overview of HiBrainy Text to Speech API
The HiBrainy TTS API is a powerful and simple API for generating audio clips from text messages (also known as speech synthesis). To try this API, follow these steps to sign up for your free RapidAPI account and access the API console.
1. Sign Up for RapidAPI Account
To begin using the HiBrainy TTS API, you’ll first need to sign up for a free RapidAPI developer account. With this account, you get a universal API Key to access all APIs hosted in RapidAPI.
RapidAPI is the world’s largest API marketplace, with over 10,000 APIs and a community of over 1,000,000 developers. Our goal is to help developers find and connect to APIs to help them build amazing apps.
2. Subscribe to HiBrainy Text to Speech API
Once you have signed up, log in to your RapidAPI account and access the API console.
Click on the “Pricing” tab at the top. The HiBrainy TTS API offers a “Basic” plan. It has a very generous free quota of 500k API calls per month. Subscribe to the plan, and you are all set to try the API.
3. Test Your API Subscription
It’s time to do a test run of the HiBrainy TTS API.
Go back to the “Endpoints” tab of the API console. On the console, you can see the API endpoint, “POST Speak”, and its parameters.
Keeping the default values for the parameters under “Request Body”, you can trigger the API. You should get an API response with a 200 status code. Depending upon your browser’s capability, you will also see an HTML audio widget for playing the synthesized speech returned from the API.
With this step, you are now ready to use this API in your applications. We will use this API within the Flask framework in a Python environment. You can extract the code snippet for invoking this API using the Python requests library.
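To make the request shape concrete, here is a minimal sketch of such a call. The URL, host, and form fields are the ones used later in this post; the function names `build_tts_request` and `synthesize` are hypothetical helpers introduced only for illustration, and the response handling assumes the API returns the raw audio bytes in the response body.

```python
from urllib.parse import urlencode

# Endpoint and host as used in the /tts/create handler later in this post.
TTS_URL = "https://text-to-speech5.p.rapidapi.com/api/tts"
TTS_HOST = "text-to-speech5.p.rapidapi.com"


def build_tts_request(text, api_key, language="en", tech="deep"):
    """Return (url, headers, body) for a call to the "POST Speak" endpoint."""
    headers = {
        "x-rapidapi-host": TTS_HOST,
        "x-rapidapi-key": api_key,
        "content-type": "application/x-www-form-urlencoded",
    }
    # urlencode escapes spaces, '&', '=' and non-ASCII characters in the text,
    # so arbitrary user input cannot break the form body.
    body = urlencode({"tech": tech, "text": text, "language": language})
    return TTS_URL, headers, body


def synthesize(text, api_key):
    """Send the request and return the audio bytes (needs network access)."""
    import requests  # third-party package: pip install requests

    url, headers, body = build_tts_request(text, api_key)
    response = requests.post(url, data=body, headers=headers, timeout=30)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    return response.content
```

Splitting the request construction from the network call keeps the first part easy to check without spending any API quota.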
Building a Web Service for Text to Speech Synthesis
Let us build something interesting with the HiBrainy TTS API. If your application use case relies on dynamically generated voice prompts, this API is a perfect ally. However, managing the workflows involving text to speech generation, querying and fetching the audio clips can become a tedious chore to handle within the application code.
That is where you need to think of separating responsibility. By having a web service to manage all the workflows of a text to speech synthesis process, you can build a REST API to expose the specific workflow actions to client applications.
Using one of the popular Python REST frameworks like Flask, you can quickly deploy a REST API service that leverages the HiBrainy TTS API. It acts as an intermediary between the HiBrainy TTS API and the client application. It relieves the client application from having to manage the chores related to dynamic text to speech synthesis internally.
Prerequisites
Considering the Flask framework, you need to set up the following components in your development environment to build and host the TTS web service.
- Python3: Download and install the Python3 runtime environment from the official website.
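- Flask: The Flask micro framework defines the APIs for the web service. You can install it using the pip command pip install flask.
- Requests: The service code uses the requests library to call the HiBrainy TTS API. You can install it using the pip command pip install requests.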
Text to Speech Service Architecture
The backend architecture of the TTS web service is envisioned as follows.
The service exposes three API endpoints.
- /tts/create : This is the API endpoint for creating a new audio clip from an input text. This endpoint, in turn, calls the HiBrainy TTS API and stores the generated audio clip as an audio file with a random name. The mapping between text and audio file is maintained in a row within the SQLite database with a unique ID.
- /tts/list : This API endpoint retrieves the list of all the generated audio files and their corresponding text message.
- /tts/<id> : This API returns the audio clip of a pre-generated text message, identified by its unique ID.
For testing, it also defines a default ‘/’ route for displaying an HTML page. This page acts as the client application that interacts with the service via the API endpoints defined above.
In the next section, we define the different code blocks for the web service for handling each API endpoint.
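Before looking at the Flask handlers, it may help to see the storage operations behind the three endpoints in isolation. The following is only a sketch: `ClipStore` and its method names are hypothetical and do not appear in the service code, and it uses an in-memory SQLite database so that it can run on its own. The table layout matches the TTSRecords table created later in this post.

```python
import sqlite3


class ClipStore:
    """Hypothetical wrapper around the TTSRecords table (illustration only)."""

    def __init__(self, path=":memory:"):
        self.con = sqlite3.connect(path)
        self.con.execute(
            "CREATE TABLE IF NOT EXISTS TTSRecords ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, text TEXT, audio TEXT)"
        )

    def create(self, text, audio_name):
        """Storage step behind /tts/create: save the mapping, return its ID."""
        with self.con:  # commits on success, rolls back on an exception
            cur = self.con.execute(
                "INSERT INTO TTSRecords (text, audio) VALUES (?, ?)",
                (text, audio_name),
            )
        return cur.lastrowid

    def list_all(self):
        """Storage step behind /tts/list: every ID with its text message."""
        query = "SELECT id, text FROM TTSRecords ORDER BY id"
        return self.con.execute(query).fetchall()

    def audio_for(self, record_id):
        """Storage step behind /tts/<id>: audio file name, or None if unknown."""
        row = self.con.execute(
            "SELECT audio FROM TTSRecords WHERE id = ?", (record_id,)
        ).fetchone()
        return row[0] if row else None
```

For example, after `store = ClipStore()` and `store.create("hello", "abcde")`, calling `store.audio_for(1)` gives back the file name `"abcde"`.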
Text to Speech Service Code Blocks
Before getting into coding, it is a good practice to set up a project directory structure. Create a top level project directory named TTSService. Within that, create two subdirectories named static and templates.
Now launch your favorite Python editor and create a new file TTS.py within the top level project directory, TTSService. Follow along the subsections below to add each of the code blocks for the TTS service.
Code Block #1: Import statements and default route
TTS.py is the main code file that defines all the API endpoints and their respective business logic. First, add a basic Flask application with the default route. To do that, we need to import specific Flask libraries and a few other libraries as follows.
File: TTSService/TTS.py
```python
from flask import Flask, redirect, url_for, request, render_template, flash
import requests, string, random
import sqlite3 as sql
import pathlib
import sys

app = Flask(__name__)


@app.route('/')
def default():
    return render_template('TTS.html')
```
The default route ‘/’ serves an HTML file named TTS.html. This is the client web app for testing the TTS web service. You will define the content of TTS.html a few code blocks down the line. So let’s keep it aside for now.
Code Block #2: Handling /tts/create route
Now let’s define the business logic for handling /tts/create API endpoint.
Add the following code below the default route.
File: TTSService/TTS.py
```python
@app.route('/tts/create', methods=['POST', 'GET'])
def create():
    # A plain GET just shows the client page again.
    if request.method != 'POST':
        return render_template('TTS.html')

    user = request.form['nm']
    url = "https://text-to-speech5.p.rapidapi.com/api/tts"
    # Passing a dict lets requests URL-encode the text (spaces, '&', and so on).
    payload = {"tech": "deep", "text": user, "language": "en"}
    headers = {
        'x-rapidapi-host': "text-to-speech5.p.rapidapi.com",
        'x-rapidapi-key': "<YOUR_RAPIDAPI_KEY>",
        'content-type': "application/x-www-form-urlencoded"
    }

    try:
        response = requests.post(url, data=payload, headers=headers)
    except requests.exceptions.ConnectionError as errc:
        return str(errc)

    if response.status_code != 200:
        # status_code is an int, so convert it before concatenating.
        return str(response.status_code) + " Error"

    rand = get_random_string(5)

    # Record the mapping between the text message and the audio file name.
    try:
        with sql.connect("database.db") as con:
            cur = con.cursor()
            cur.execute("INSERT INTO TTSRecords (text,audio) VALUES (?,?)", (user, rand))
            con.commit()
            print("Record successfully added")
    except sql.Error:
        # The "with" block above already rolls back a failed transaction.
        print(sys.exc_info()[0])

    # Store the response body as a .wav file in the static subdirectory.
    wav_path = pathlib.Path(__file__).parent / "static" / (rand + ".wav")
    with open(wav_path, "wb") as wav_file:
        wav_file.write(response.content)

    return render_template('TTS.html')
```
Make sure to replace the placeholder <YOUR_RAPIDAPI_KEY> with your actual API key.
This code makes a call to the HiBrainy TTS API with a text message. Upon a successful response, it generates a random file name for the audio clip and stores the text message and audio clip file name in a row in the SQLite database. Subsequently, the API response body is extracted and stored as a .wav file within the static subdirectory.
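The file-writing step can also be sketched on its own. `save_audio_clip` below is a hypothetical helper, not part of TTS.py, and it assumes the API response body is the complete audio file. It creates the target directory if it is missing, which the handler above does not do.

```python
from pathlib import Path


def save_audio_clip(content, name, static_dir):
    """Write audio bytes to <static_dir>/<name>.wav and return the path."""
    static_dir = Path(static_dir)
    # Create the directory on first use so a missing "static" folder
    # does not turn into a FileNotFoundError.
    static_dir.mkdir(parents=True, exist_ok=True)
    target = static_dir / f"{name}.wav"
    # write_bytes opens, writes, and closes the file in one call.
    target.write_bytes(content)
    return target
```

In the handler, this would be called with `response.content`, the random file name, and the path of the static subdirectory.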
Code Block #3: Handling /tts/list route
Append the following code in the TTS.py file for the /tts/list endpoint.
File: TTSService/TTS.py
```python
@app.route('/tts/list')
def list_records():  # named list_records to avoid shadowing the built-in list
    con = sql.connect("database.db")
    con.row_factory = sql.Row  # allows access by column name, e.g. row['id']
    cur = con.cursor()
    cur.execute("select * from TTSRecords")
    rows = cur.fetchall()
    if len(rows) > 0:
        return render_template("TTS.html", rows=rows)
    else:
        return "No Entries Found!"
```
The logic for this endpoint is simple. It retrieves all the stored text messages and their IDs from the SQLite database and passes them as the rows argument to the TTS.html template.
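The template can read columns by name because of the sqlite3.Row row factory. As a small illustration, the sketch below converts such rows into plain dictionaries, which could be useful if you ever wanted this endpoint to return JSON instead of HTML. `fetch_records` is a hypothetical helper and is not used by the service code.

```python
import sqlite3


def fetch_records(con):
    """Return all TTSRecords rows as a list of plain dicts."""
    # sqlite3.Row supports access by column name and conversion to dict.
    con.row_factory = sqlite3.Row
    rows = con.execute("SELECT * FROM TTSRecords").fetchall()
    return [dict(row) for row in rows]
```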
Code Block #4: Handling /tts/<ID> route
Append the following code in the TTS.py file for the /tts/<ID> endpoint.
File: TTSService/TTS.py
```python
@app.route('/tts/<ID>')
def openfile(ID):
    print(ID)
    con = sql.connect("database.db")
    con.row_factory = sql.Row
    cur = con.cursor()
    cur.execute("select * from TTSRecords where id=?", (ID,))
    lst = cur.fetchall()
    # An unknown ID would otherwise raise an IndexError on lst[0].
    if not lst:
        return "No Entry Found!", 404
    return render_template("audio.html", name=lst[0]['audio'])
```
This route is a dynamic endpoint for retrieving the audio clip of each text message. The audio file name is queried from the SQLite database based on the ID and rendered through a template file named audio.html.
Code Block #5: Adding common function and main block
Lastly, you wrap up the code by adding the function for generating a random string for the audio clip file name, followed by the main block.
File: TTSService/TTS.py
```python
def get_random_string(length):
    letters = string.ascii_lowercase
    result_str = ''.join(random.choice(letters) for i in range(length))
    return result_str


if __name__ == '__main__':
    app.run(debug = True)
```
The main block initializes the flask app in debug mode. While running, this launches a development server on http://localhost:5000.
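One thing to keep in mind about the file name helper: five lowercase letters allow 26^5 = 11,881,376 different names, and get_random_string does not check whether a name is already taken, so a new clip could in principle overwrite an older one. If that matters for your use case, one possible alternative is sketched below. `unique_clip_name` is a hypothetical replacement, not part of the original service.

```python
import secrets
from pathlib import Path


def unique_clip_name(static_dir, nbytes=8):
    """Return a random hex name with no matching .wav file in static_dir."""
    static_dir = Path(static_dir)
    while True:
        # token_hex(nbytes) yields 2 * nbytes hexadecimal characters.
        name = secrets.token_hex(nbytes)
        if not (static_dir / f"{name}.wav").exists():
            return name
```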
The HTML Template Code Blocks
HTML templates are used to generate dynamic web pages based on specific parameters. For the TTSService, we define two HTML templates. These are not a part of the service. They are just a client application stub to test the TTSService APIs that we have defined earlier.
Let’s add the code blocks for the templates.
Code Block #6: TTS Main Page Template
This is the main page. It has a dropdown to invoke the various API endpoints.
File: TTSService/templates/TTS.html
```html
<html>

<head>
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
  <!-- jQuery library -->
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <!-- Latest compiled JavaScript -->
  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/js/bootstrap.min.js"></script>
  <style>
    .modal {
      display: none; /* Hidden by default */
      position: fixed; /* Stay in place */
      z-index: 1; /* Sit on top */
      padding-top: 100px; /* Location of the box */
      left: 0;
      top: 0;
      width: 100%; /* Full width */
      height: 100%; /* Full height */
      overflow: auto; /* Enable scroll if needed */
      background-color: rgb(0, 0, 0); /* Fallback color */
      background-color: rgba(0, 0, 0, 0.4); /* Black w/ opacity */
    }

    .loader {
      margin: auto;
      border: 16px solid #f3f3f3;
      border-radius: 50%;
      border-top: 16px solid #3498db;
      width: 120px;
      height: 120px;
      -webkit-animation: spin 2s linear infinite; /* Safari */
      animation: spin 2s linear infinite;
    }

    @keyframes spin {
      0% {
        transform: rotate(0deg);
      }

      100% {
        transform: rotate(360deg);
      }
    }

    #dropdown {
      width: 200px;
      margin: auto;
      border: 2px solid black;
    }

    #form {
      margin: auto;
    }

    #table {
      margin: auto;
    }
  </style>
</head>

<body style="text-align: center;">
  <script>
    function dropdownChange() {
      if (document.getElementById('dropdown').value == 'create') {
        document.getElementById('form').style.display = '';
        var audioEl = document.getElementById('audio');
        if (audioEl) document.getElementById('audio').style.display = 'none';
        else document.getElementById('table').style.display = 'none';
      } else if (document.getElementById('dropdown').value == 'list') {
        document.getElementById('form').style.display = 'none';
        window.location.replace("http://localhost:5000/tts/list");
      }
    }

    function showLoading() {
      document.getElementById("myModal").style.display = "block";
    }
  </script>
  <h1>Text to Speech Service</h1>
  <select onchange='dropdownChange()' class="select form-control" id="dropdown" name="dropdown">
    <option selected="true" disabled="disabled">--Choose--</option>
    <option value="create">create</option>
    <option value="list">list</option>
  </select>
  <div style="margin-top:25px;">
    <form action="http://localhost:5000/tts/create" method="post" id="form" style="display: none">
      <p>Enter Name:</p>
      <p><input type="text" name="nm" /></p>
      <p><input type="submit" value="submit" onclick="showLoading()" /></p>
    </form>
  </div>
  {% if rows|length > 0 %}
  <table border=1 id="table">
    <thead>
      <td>ID</td>
      <td>Text</td>
    </thead>
    {% for row in rows %}
    <tr>
      <td>{{row['id']}}</td>
      <td><a href="http://localhost:5000/tts/{{row['id']}}">{{row['text']}}</a></td>
    </tr>
    {% endfor %}
  </table>
  {% endif %}
  {% if lt|length > 0 %}
  <div id="audio">
    {% for l in lt %}
    <h5>{{l['id']}}</h5><audio controls>
      <source src='{{url_for('static', filename='')}}{{l['audio']}}.wav'></audio>
    {% endfor %}
  </div>
  {% endif %}
  <div id="myModal" class="modal">
    <!-- Modal content -->
    <div class="loader"> </div>
  </div>
</body>

</html>
```
The HTML part of the page is pretty straightforward. However, you should notice some non-HTML syntax enclosed within {% %} and {{ }} delimiters. This is Jinja template syntax. It contains programming constructs to generate individual parts of the HTML dynamically.
The Jinja template code renders the <table> element to display all the previously generated text messages.
Code Block #7: Audio Playback Page Template
This is the page that is displayed when the user clicks on a text message in the list of previously generated text messages.
File: TTSService/templates/audio.html
```html
<html>

<body>
  <audio controls><source src='{{url_for('static', filename='')}}{{name}}.wav' ></audio>
</body>

</html>
```
This template receives the file name of the speech audio file, which was stored in the SQLite database during generation. It renders the HTML5 <audio> element to play the generated speech for the text message.
Testing the Application
You are all set to test the TTSService application now. Before proceeding, make sure that you saved all the files.
Before testing the application, you have to create an SQLite database. You can write a small Python script to do that.
File: TTSService/createdb.py
```python
import sqlite3

conn = sqlite3.connect('database.db')
print("Opened database successfully")

conn.execute('CREATE TABLE TTSRecords (id INTEGER PRIMARY KEY AUTOINCREMENT, text TEXT, audio TEXT)')
print("Table created successfully")

conn.close()
```
Open a terminal, change directory to the top level project directory, and run this script using Python.
python createdb.py
You should see a new file named ‘database.db’ under the top level project directory.
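Note that running createdb.py a second time fails, because the TTSRecords table already exists. If you prefer a setup step that is safe to repeat, a variant along these lines might help. `init_db` is a hypothetical function name; the table definition is the same as in createdb.py.

```python
import sqlite3


def init_db(path="database.db"):
    """Create the TTSRecords table if needed and return its column names."""
    con = sqlite3.connect(path)
    try:
        # IF NOT EXISTS makes the statement a no-op on later runs.
        con.execute(
            "CREATE TABLE IF NOT EXISTS TTSRecords ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, text TEXT, audio TEXT)"
        )
        con.commit()
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        info = con.execute("PRAGMA table_info(TTSRecords)").fetchall()
        return [row[1] for row in info]
    finally:
        con.close()
```

Returning the column names gives a quick way to confirm that the table has the layout the service expects.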
To run this application, issue the following command on the terminal.
python TTS.py
This command will start the flask application server on localhost:5000. Open your browser and point to this URL to load the web app. Go ahead and explore the dropdown options to generate and listen to the audio clips.
Conclusion
As you have experienced, it is easy to build and expose a set of REST APIs for managing a specific service, in this case, the text to speech processing. Flask provides its development server, which we have used for this purpose.
However, in the real world, the Flask application typically resides behind a front-end web server such as Apache or Nginx. This is required for effectively handling heavy load due to many API requests coming in parallel. There are other considerations for heavy traffic, like load balancing, going the container way, or deploying a more modular architecture, but that’s a discussion for another day.
We hope you enjoyed this tutorial. We will be back with a few more REST API implementations in other languages and frameworks soon. Take care.
What is text to speech conversion?
Text to speech conversion is a way of converting a text message into its corresponding spoken sound, in the form of an audio clip. You can perform a text to speech conversion using one of the many tools that allow you to mimic the speech in a specific voice profile, such as gender or age or ethnicity. However, the easiest way to achieve this is with the help of a text to speech API.
Is there an API for text to speech conversion?
Yes, it is possible to use an API for text to speech conversion. You can explore one of the many APIs for text to speech synthesis hosted on the RapidAPI marketplace.
How to use a text to speech API?
You can use the HiBrainy text to speech API to generate audio clips of a text message. The API allows a simple interface to send an input text message and returns a payload containing the binary data of the audio clip. You can sign up for a free account in RapidAPI and subscribe to the HiBrainy API to use it.
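Since the response is binary data, it can be worth checking that it really looks like an audio file before saving it. The sketch below assumes the API returns WAV data, which the .wav extension used in this post suggests but does not confirm. `looks_like_wav` and `make_silent_wav` are hypothetical helpers; the second one only generates sample data for trying out the first.

```python
import io
import wave


def looks_like_wav(data):
    """Return True if the bytes start with a RIFF/WAVE header."""
    # A WAV file begins with "RIFF", a 4-byte size field, then "WAVE".
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def make_silent_wav(frames=800, rate=8000):
    """Build a short, silent, mono 16-bit WAV clip in memory for testing."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * frames)
    return buffer.getvalue()
```

A check like this would catch the case where an error page or a JSON error message arrives with a success status and would otherwise be saved as an unplayable file.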