Abstract by ebrahimessa


How a TTS can help businesses?

People on the move need to check a growing volume of dynamic data about their daily
activities: news, sports, weather, stock options, bank accounts, e-mail and so on. Serving
this data by phone requires a powerful TTS engine that converts it into a human voice in real
time and, at the same time, handles a heavy load of telephone calls using the latest
load-balancing and streaming techniques.

How a TTS technology can save money and time

TTS technology is becoming indispensable for businesses that need to give their customers
the latest vital information in real time. These businesses usually use IVR (Interactive Voice
Response) systems and call centers to communicate this information to their customers and
prospects. Converting vital data stored in Web sites, databases and files into a human voice
through traditional studio recordings is expensive and time-consuming, and because the
information is usually dynamic, the process becomes long and hard. In some cases it is
impossible to keep up with the changes using human recordings. Only a powerful TTS engine
that supports both Arabic and English can be the right solution for the information stored in
IVR systems and call centers.

Why an Arabic TTS is different?

Arabic is a difficult language for TTS, unlike languages such as English, French, or Spanish.
Those languages, written in the Latin alphabet, spell out their vowels as letters, while Arabic
marks its short vowels with special characters called "diacritics". These diacritics give
Arabic words their correct meaning inside a sentence. For example, two Arabic words with
different meanings can be written exactly the same way, and only the diacritics help the
reader distinguish them.
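
The ambiguity described above can be reproduced with a short Python sketch. The two example
words (kataba, "he wrote", and kutub, "books") are chosen here for illustration and do not come
from Sakhr's material; the helper name strip_diacritics is also just an assumption of this
sketch. Removing the diacritics leaves the same three letters in both cases.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop Unicode nonspacing marks (category Mn), which include the Arabic diacritics."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Illustrative words, written with escapes to avoid right-to-left display issues.
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf+fatha, ta+fatha, ba+fatha: "he wrote"
kutub = "\u0643\u064F\u062A\u064F\u0628"         # kaf+damma, ta+damma, ba: "books"

bare_kataba = strip_diacritics(kataba)
bare_kutub = strip_diacritics(kutub)

# The diacritized forms differ, but the undiacritized spellings are identical.
print(kataba != kutub, bare_kataba == bare_kutub)
```

A TTS engine that receives only the bare spelling therefore has to restore the diacritics
before it can pick a pronunciation.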

For this reason, Sakhr developed the Diacritizer engine, which automatically adds the
diacritics needed in Arabic texts. The Diacritizer is the main component in Arabic TTS.
Without it, the output of the TTS engine would be inaccurate and unclear. Since native
speakers write Arabic without diacritics, the TTS engine must handle non-diacritized text.
The Diacritizer converts the non-diacritized text into diacritized text, and the TTS engine
then converts it into a clear, human-sounding Arabic voice.

How to License Sakhr TTS?

Sakhr TTS is licensed for developers as an SDK (Software Development Kit) and for end users
as run-time licenses.

The Technology

Text-to-speech technology converts any computer-readable text into human-sounding
synthetic speech. Arabic is at least one order of magnitude more difficult than other common
languages because written text usually lacks diacritics, i.e. the vowelization needed to utter
any input properly.

A major limitation in the development of Arabic TTS was the constraint imposed by handling
undiacritized Arabic text. The Sakhr automatic Diacritizer is integrated with a speech
synthesizer engine to produce a working system that takes undiacritized Arabic as input and
delivers high quality speech as output. Generating such output would be impossible without
an automatic Diacritizer, because most words have many different possible pronunciations
when written without diacritics. Sakhr uses its unique Diacritizer to provide the TTS
synthesizer with the vowelization it needs to produce a natural and intelligible sound.

The Sakhr text-to-speech (TTS) engine is composed of three basic parts. The Linguistic Module
converts the input text into a phonetic transcription. The Phonetic Module calculates speech
parameters, and the Acoustic Module uses those parameters to generate synthetic speech.

The Linguistic Module

This module is composed of four parts: Text Normalization, Grapheme To Phoneme (G2P)
conversion, Lexical Analysis and Syntactic Analysis. Text Normalization handles language-
dependent abbreviations, dates, currencies, time indications, phone numbers and other
special symbols. It also correctly handles quotation marks, parentheses, apostrophes and
punctuation marks. After Grapheme To Phoneme conversion, the system resolves
pronunciation ambiguities through lexical and syntactic analysis and identifies the proper
prosodic phrases for each sentence. The output is a phonetic representation of the input.
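
To make the Text Normalization step more concrete, here is a minimal Python sketch for
English input. It is not Sakhr's implementation: the abbreviation table, the function names and
the rules (dollar amounts, numbers below 100 read as words, longer numbers read digit by digit)
are assumptions made for this example only.

```python
import re

# Hypothetical abbreviation table; a real system would use a much larger,
# language-dependent one.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(digits: str) -> str:
    """Spell out 0-99 as a number; read anything longer digit by digit."""
    if len(digits) > 2:
        return " ".join(ONES[int(d)] for d in digits)
    n = int(digits)
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text: str) -> str:
    """Expand currencies, abbreviations and digits into speakable words."""
    # Currency: "$25" -> "25 dollars" (pluralization is not handled here).
    text = re.sub(r"\$(\d+)", r"\1 dollars", text)
    # Abbreviations: replace whole whitespace-separated tokens only.
    text = " ".join(ABBREVIATIONS.get(tok, tok) for tok in text.split())
    # Digits: spell out every remaining run of digits.
    return re.sub(r"\d+", lambda m: number_to_words(m.group()), text)


normalized = normalize("Dr. Smith paid $25 at 9")
print(normalized)
```

The order of the rules matters in this sketch: currency symbols are rewritten before digits are
spelled out, so that the amount is still recognizable as a number when the symbol is handled.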

The Phonetic Module

The Phonetic Module performs segmental synthesis and creates high quality prosodic
patterns. To create synthetic speech, the Sakhr engine is flexible enough to use the
appropriate speech segments, such as diphones, triphones, tetraphones or longer units. Those
segments, which are taken from human speech, preserve phoneme transitions as well as
co-articulation effects. By concatenating the speech segments, high quality synthetic speech
is obtained. To synthesize intelligible and natural sounding speech, it is essential to create
good prosodic characteristics. This is accomplished by producing good intonation contours
and assigning the correct duration to each phoneme.
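
The idea of concatenating diphones can be sketched in a few lines of Python. This is a toy
illustration, not the Sakhr engine: the phoneme symbols, the "_" silence marker, the function
names and the tiny inventory of fake sample values are all assumptions of the example.

```python
def to_diphones(phonemes, silence="_"):
    """Turn a phoneme sequence into diphone names, padding with silence at both ends."""
    padded = [silence, *phonemes, silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def concatenate(diphones, inventory):
    """Join the recorded samples of each diphone into one sample list."""
    samples = []
    for name in diphones:
        samples.extend(inventory[name])  # KeyError if the unit was never recorded
    return samples


# Hypothetical inventory: each diphone maps to a short list of made-up sample values.
inventory = {
    "_-k": [0, 1],
    "k-a": [2, 3, 4],
    "a-t": [5, 6],
    "t-_": [7, 0],
}

units = to_diphones(["k", "a", "t"])
speech = concatenate(units, inventory)
print(units, len(speech))
```

Because each diphone spans the transition from one phoneme to the next, the joins fall in the
relatively stable middle of a phoneme, which is how the transitions are preserved.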

The Acoustic Module

The Acoustic Processing Module converts the speech data created in the previous stage into
speech signals. Sakhr's concatenation of speech segments and synthesis of prosody are based
on the latest synthesis techniques. The output is an array of wave samples with sampling
rates ranging from 8 kHz to 44 kHz, covering a broad range of quality levels and
applications, from telephony to CD audio quality.
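
As a rough illustration of what "an array of wave samples" at a given sampling rate means,
the Python sketch below generates a short test tone and packs it as 16-bit mono WAV data using
only the standard library. The tone, the function names, and the choice of 8000 Hz (telephony)
and 44100 Hz (the CD audio rate) are assumptions of the example; the sketch does not show how
the Sakhr engine produces its output.

```python
import io
import math
import struct
import wave


def tone_samples(rate_hz, duration_s=0.5, freq_hz=440.0):
    """Return 16-bit integer samples of a sine tone at the given sampling rate."""
    count = int(rate_hz * duration_s)
    return [int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / rate_hz))
            for i in range(count)]


def to_wav_bytes(samples, rate_hz):
    """Pack samples as a mono, 16-bit PCM WAV file held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wav.setframerate(rate_hz)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


telephony = tone_samples(8000)    # telephony-grade rate
cd_quality = tone_samples(44100)  # CD audio rate
print(len(telephony), len(cd_quality), len(to_wav_bytes(telephony, 8000)))
```

The same half second of audio needs more than five times as many samples at the CD rate as at
the telephony rate, which is the trade-off between quality and data volume that the range of
sampling rates covers.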

Development Tools & Application Programming Interface

For Sakhr customers, Application Programming Interfaces (APIs) are available as a Software
Development Kit (SDK), an ActiveX control and a COM object. The functionality of the APIs
varies depending on the market and the intended use of the end product. Among other things,
the APIs let the application developer start and stop text-to-speech synthesis, select the
correct exception dictionaries, and load the proper application and language. In addition to
the APIs, other application design tools are available, such as an editor for developing
personal exception dictionaries. SAPI 4 and SAPI 5.1 versions are available upon customer
request.
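
The role of an exception dictionary can be pictured with a small Python sketch: a word is
looked up in the dictionary first, and only if it is missing does the engine fall back to its
general pronunciation rules. Everything here is hypothetical, including the function names, the
phoneme symbols and the letter-by-letter fallback; it does not reproduce the Sakhr API or the
format of its dictionaries.

```python
def rule_based_g2p(word):
    """Toy fallback: treat every letter as one phoneme (not a real G2P model)."""
    return list(word.lower())


def pronounce(word, exceptions):
    """Prefer the exception dictionary entry; otherwise use the fallback rules."""
    key = word.lower()
    if key in exceptions:
        return list(exceptions[key])  # copy so callers cannot alter the dictionary
    return rule_based_g2p(word)


# Hypothetical personal exception dictionary with one hand-written entry.
personal_exceptions = {"sakhr": ["s", "a", "x", "r"]}

from_dictionary = pronounce("Sakhr", personal_exceptions)
from_rules = pronounce("text", personal_exceptions)
print(from_dictionary, from_rules)
```

Keeping the overrides in a separate table means a user can correct the pronunciation of names
or jargon without touching the general rules.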

Feature Set

• Convert any computer text into natural sounding speech output with phonetic input

• Control speaker volume, speech rate, and speech pitch

• Natural sounding speech output in both male & female voices

• Editor for personal exception dictionary

• Sakhr SDK is a Windows-based software package for application developers and

• Supports Arabic as the default language and automatically handles Latin characters as
English text. Any SAPI-compliant English, German, Spanish, French, Italian, etc. engine can
replace the built-in English TTS.

Hardware Availability

Sakhr has ported its text-to-speech engine, APIs and application development tools to a vast
range of hardware solutions. For example, Sakhr supports a full system on Intel Pentium III or
higher machines, Computer Telephony (CT) boards, and Audiotext platforms.
