Text to Speech: A Look at the Qt Speech Module

By Jeff Tranter Wednesday, June 28, 2017

In this post we'll take a look at one of the newer Qt modules: Qt Speech.

What Is Qt Speech?

Qt Speech is a module providing cross-platform support for text to speech: it converts text to speech output in real time. Common use cases include making software accessible to visually impaired users and supporting scenarios where users cannot use a touchscreen, mouse, or keyboard, such as in-vehicle applications. The module may support other features, such as speech recognition, in the future.

The Qt Speech module was first introduced in the Qt 5.8.0 release. As of Qt 5.9.0 it is considered to be at a Technical Preview status, meaning that the APIs could still change.

Features

Qt Speech provides a C++ API (there are no QML bindings). It uses back end plugins to support different speech engines.

It provides support for different locales (language, script, and country). Typically several voices are available, each with a name, gender, and age. You can set the pitch, rate, and volume of the speech.

It is event-driven. Speech playback is non-blocking. You can query the state (Ready, Speaking, Paused, or BackendError) and call slots to start, pause, resume, or stop playback. Signals are emitted when properties or the state change.

Supported Platforms/Plugins

Qt Speech is supported on Linux, Android, macOS, iOS, and Windows. On Linux it uses libspeechd and flite. On other platforms it uses the native text to speech APIs. There is also a back end for the commercial Nuance Vocalizer software.

Using the API

There is a single module header file, <QTextToSpeech>, that pulls in all of the relevant classes. To add the module to your qmake project, use the line:

QT += texttospeech

C++ Classes

The module provides three classes. QTextToSpeech is the most important one, providing access to the text-to-speech engines and features. QVoice allows setting and retrieving the values of a particular voice. The QTextToSpeechPlugin class is the base for all text-to-speech plug-ins.

Code Example

Here is a very simple example application that illustrates most of the classes and APIs. To keep it short and standalone, it is not event driven. It displays some information about the available speech engines and locales, and the properties of the default engine. It also outputs some speech. You can download the source code and a qmake project file here.

// Qt Speech minimal example.
#include <QDebug>
#include <QLocale>
#include <QStringList>
#include <QTextToSpeech>
#include <QThread>
#include <QVoice>

int main()
{
    // List the available engines.
    const QStringList engines = QTextToSpeech::availableEngines();
    qDebug() << "Available engines:";
    for (const auto &engine : engines) {
        qDebug() << "  " << engine;
    }

    // Create an instance using the default engine/plugin.
    // A stack object is cleaned up automatically when main() returns.
    QTextToSpeech speech;

    // List the available locales.
    qDebug() << "Available locales:";
    for (const auto &locale : speech.availableLocales()) {
        qDebug() << "  " << locale;
    }

    // Set locale.
    speech.setLocale(QLocale(QLocale::English, QLocale::LatinScript, QLocale::UnitedStates));

    // List the available voices.
    qDebug() << "Available voices:";
    for (const auto &voice : speech.availableVoices()) {
        qDebug() << "  " << voice.name();
    }

    // Display properties.
    qDebug() << "Locale:" << speech.locale();
    qDebug() << "Pitch:" << speech.pitch();
    qDebug() << "Rate:" << speech.rate();
    qDebug() << "Voice:" << speech.voice().name();
    qDebug() << "Volume:" << speech.volume();
    qDebug() << "State:" << speech.state();

    // Say something. This call returns immediately.
    speech.say("Hello, world! This is the Qt speech engine.");

    // Wait for sound to play before exiting.
    QThread::sleep(10);
    return 0;
}

The sleep at the end is needed so that the program does not exit before the speech has finished playing. In a real application you would typically run an event loop and track the state of the engine using its signals.

This example was tested with Qt 5.9.0 under Linux, Windows, and macOS. It will also work with Qt 5.8.0.

Here is some typical output on a Linux system:

Available engines:
   "speechd"
Available locales:
   QLocale(Afrikaans, Latin, SouthAfrica)
   QLocale(Bulgarian, Cyrillic, Bulgaria)
   QLocale(Bosnian, Latin, BosniaAndHerzegowina)
   QLocale(English, Latin, UnitedStates)
   ...
Available voices:
   "en-westindies"
   "english-us"
   "english_wmids"
   "english_rp"
   "english-north"
   "default"
Locale: QLocale(English, Latin, UnitedStates)
Pitch: 0
Rate: 0
Voice: "en-westindies"
Volume: 0
State: 0

Graphical Example

If you have installed Qt's examples, the Qt source includes a larger example located in <QTDIR>/examples/speech/hello_speak. It allows setting most of the engine properties and entering text. Below are screenshots of the application running on the three desktop platforms.

[Screenshots: Linux, Windows, macOS]

Other Issues and Comments

If you build Qt from source on an Ubuntu Linux system, the packages needed for Qt Speech are flite, flite1-dev, libflite1, and speech-dispatcher-flite. On the other platforms there are no additional dependencies.

The open source Flite engine used on Linux, which is based on the Festival software developed at the University of Edinburgh and Carnegie Mellon University, does not produce particularly high quality speech. The quality under macOS and Windows is better, as those platforms use commercial speech software.

Summary

The Qt Speech module makes it easy to incorporate speech into your application in a cross-platform manner. It is still considered to be at Technical Preview status, so the APIs are subject to change and we can expect to see more functionality in the future.

References

Qt Speech, Qt on-line documentation, https://doc.qt.io/qt-5/qtspeech-index.html
Speech Dispatcher, project website, https://devel.freebsoft.org/speechd
CMU Flite: A Small, Fast Run Time Synthesis Engine, project website, http://www.festvox.org/flite