5 books on Speech Generation [PDF]
These books describe speech synthesis technologies, i.e. how to convert text into natural-sounding speech using AI models and how to train AI to reproduce intonations, stresses, emotions and accents. They also cover the use of TTS (Text-to-Speech) in voice assistants, audiobooks and call centers.
1. Generative AI: Techniques, Models and Applications
2025 by Rajan Gupta, Sanju Tiwari, Poonam Chaudhary

This book is like a giant GPT-generated list with numbering like 6.3.4.5.2, so after reading the table of contents, you will have practically studied the entire content of the book. In general, the book is universal - about generative AI, but I was most interested in the part about end-to-end TTS models - these are models that convert text into speech audio without separately engineered intermediate stages such as phonemization, manual spectrogram construction or separate prosody modeling. This can make it possible to voice text in real time. In conventional TTS systems, text is first analyzed, converted into phonemes, and marked up with stresses and pauses. Then a mel-spectrogram is generated based on this linguistic information, and finally the spectrogram is converted into an audio signal. Popular neural models such as Tacotron and FastSpeech generate mel-spectrograms on the fly and rely on a vocoder to produce the audio, while VITS generates the waveform directly.
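The conventional pipeline described above can be sketched as a chain of small functions. This is only a toy illustration under loud assumptions: the lexicon, the function names, the constants, and the fake "acoustic model" and "vocoder" are all hypothetical stand-ins, not taken from the book or from any real TTS library. It shows the order of the stages and the shape of the data passed between them, and it does not produce intelligible speech.

```python
import math
import re

# Hypothetical toy lexicon; a real system would use a pronunciation
# dictionary or a grapheme-to-phoneme model.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}
PUNCTUATION = set(",.!?")

N_MELS = 4            # toy value; real systems often use 80 mel bands
FRAMES_PER_PHONE = 3  # toy fixed duration for every phoneme
HOP = 8               # audio samples produced per spectrogram frame


def analyze_text(text):
    """Text analysis: lowercase, then split into words and punctuation."""
    return re.findall(r"[a-z']+|[,.!?]", text.lower())


def to_phonemes(tokens):
    """Phoneme conversion plus pause marking (punctuation -> pause)."""
    phones = []
    for tok in tokens:
        if tok in PUNCTUATION:
            phones.append("<pause>")
        else:
            # Unknown words fall back to being spelled letter by letter.
            phones.extend(LEXICON.get(tok, list(tok.upper())))
    return phones


def acoustic_model(phones):
    """Stand-in acoustic model: a fake mel-spectrogram, frame by frame."""
    frames = []
    for p in phones:
        # Pauses are silent; other phonemes get an arbitrary level in (0, 1].
        level = 0.0 if p == "<pause>" else (sum(map(ord, p)) % 10 + 1) / 10
        frames.extend([[level] * N_MELS for _ in range(FRAMES_PER_PHONE)])
    return frames


def vocoder(frames):
    """Stand-in vocoder: turns each frame into HOP samples of a sine wave."""
    samples = []
    for frame in frames:
        amp = sum(frame) / len(frame)
        for n in range(HOP):
            samples.append(amp * math.sin(2 * math.pi * n / HOP))
    return samples


def synthesize(text):
    """Run the stages in the conventional order: text -> phonemes -> mel -> audio."""
    tokens = analyze_text(text)
    phones = to_phonemes(tokens)
    mel = acoustic_model(phones)
    return vocoder(mel)


audio = synthesize("Hello, world.")
print(len(audio), "samples")
```

In an end-to-end model, a single trained network would take over most or all of these hand-written stages; the separate functions here exist only to make the stage boundaries visible.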
Download PDF
2. Progress in Speech Synthesis
2013 by Jan P.H. van Santen, Richard Sproat, Joseph Olive, Julia Hirschberg

This book describes recent advancements in text-to-speech synthesis and is based on global research contributions. In particular, it explores how signal processing and source modeling are improving synthesized speech quality, and how prosodic analysis and synthesis play a significant role in natural-sounding speech. In addition, visual speech synthesis, which involves lip movements, enhances human-computer interaction. Perceptual evaluation methods are also essential to measure the success of speech synthesis systems.
Download PDF
3. An Introduction to Text-to-Speech Synthesis
2013 by Thierry Dutoit

The book approaches speech synthesis from both linguistic and engineering perspectives. On the linguistic side, it covers natural language processing and language modeling; on the engineering side, digital signal processing and concatenative synthesis. The two sides have to be carefully coordinated. The conclusion is that integration of phonetics and speech communication is vital for creating natural-sounding speech.
Download PDF
4. Text-to-Speech Synthesis
2009 by Paul Taylor

This author also agrees that speech synthesis combines linguistics, phonetics and signal processing. Thus, this field bridges computer science, linguistics and electrical engineering in practical applications. Traditional techniques like formant synthesis are still relevant for understanding speech production, but unit selection and hidden Markov models are more modern methods of speech synthesis. You'll also learn why statistical text analysis is integral to creating realistic text-to-speech conversions.
Download PDF
5. Speech Synthesis and Recognition
2002 by Wendy Holmes

This book was written when speech technology was becoming increasingly important for human-machine communication and applicable across engineering and IT disciplines. It claims that it can be understood without mathematical expertise: "No advanced knowledge of phonetics or speech signal properties is required to grasp the field's basics". The book also provides a practical guide for professionals looking to incorporate speech technology into their systems.
Download PDF
How to download PDF:
1. Install Gooreader
2. Enter the Book ID into the search box and press Enter
3. Click the "Download Book" icon and select PDF*
* - note that for yellow books only preview pages are downloaded