SSML Generator

Build Speech Synthesis Markup Language for text-to-speech applications

SSML Tag Reference

Tag | Purpose | Example
<break> | Insert a pause | <break time="1s"/>
<emphasis> | Stress words | <emphasis level="strong">now</emphasis>
<prosody> | Rate, pitch, volume | <prosody rate="fast">hurry</prosody>
<say-as> | Interpret content type | <say-as interpret-as="date">2024-01-15</say-as>
<phoneme> | Pronunciation | <phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>
<sub> | Substitution | <sub alias="World Wide Web">WWW</sub>

Say-As Interpret Types

characters ABC → A-B-C
cardinal 123 → one hundred twenty-three
ordinal 1 → first
digits 123 → 1-2-3
date 2024-01-15
time 14:30
telephone +1-555-1234
currency $49.99

Complete Guide to Speech Synthesis Markup Language

Speech Synthesis Markup Language (SSML) gives developers precise control over text-to-speech output, transforming robotic readings into natural, expressive speech. Our free SSML generator helps voice developers, Alexa skill builders, and content creators construct valid SSML markup without memorizing tag syntax, supporting all major TTS platforms including Amazon Polly, Google Cloud TTS, and Microsoft Azure.

Why SSML Matters for Voice Applications

Plain text fed to TTS engines often sounds unnatural. Abbreviations get mispronounced. Numbers are read as cardinals when they should be ordinals. Stress falls on the wrong words. SSML solves these problems by giving the engine explicit instructions for pronunciation, pacing, and emphasis. For voice assistants, IVR systems, audiobook production, and accessibility applications, SSML is essential for professional-quality speech output.

The Break Tag: Controlling Pauses

Natural speech contains strategic pauses for comprehension and emphasis. The <break> tag inserts pauses of specified duration (time="500ms" or time="2s") or strength (x-weak, weak, medium, strong, x-strong). Use short breaks between list items, medium breaks between sentences, and strong breaks between topics. Pauses also help listeners process complex information before continuing.
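A minimal Python sketch of both break forms, assuming you assemble SSML as plain strings. The helpers `break_tag` and `join_with_pauses` are hypothetical, as is the 300 ms default pause. This sketch accepts either a duration or a strength, although the W3C specification also permits both on one tag.

```python
import re
from xml.sax.saxutils import escape

# Strength values defined by the W3C SSML specification.
BREAK_STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}
# Durations such as "500ms", "2s", or "1.5s".
BREAK_TIME = re.compile(r"\d+(\.\d+)?(ms|s)")


def break_tag(time=None, strength=None):
    """Return a <break/> element using either a duration or a strength."""
    if (time is None) == (strength is None):
        raise ValueError("pass exactly one of time or strength")
    if time is not None:
        if not BREAK_TIME.fullmatch(time):
            raise ValueError(f"invalid break time: {time!r}")
        return f'<break time="{time}"/>'
    if strength not in BREAK_STRENGTHS:
        raise ValueError(f"invalid break strength: {strength!r}")
    return f'<break strength="{strength}"/>'


def join_with_pauses(items, pause="300ms"):
    """Speak list items with a short pause between each one (text is XML-escaped)."""
    separator = break_tag(time=pause)
    return "<speak>" + separator.join(escape(item) for item in items) + "</speak>"


print(join_with_pauses(["Eggs", "Milk", "Bread"]))
# <speak>Eggs<break time="300ms"/>Milk<break time="300ms"/>Bread</speak>
```

Validating the value before emitting the tag catches typos such as `time="500"`, which some engines reject and others silently ignore.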

Prosody: Rate, Pitch, and Volume

The <prosody> tag controls three fundamental speech attributes. Rate adjusts speaking speed with keywords ("x-slow", "slow", "medium", "fast", "x-fast") or percentages of the default rate (80%, 120%). Pitch raises or lowers the voice's baseline pitch, from "x-low" to "x-high". Volume ranges from "silent" through "x-soft", "soft", "medium", and "loud" to "x-loud". Combine attributes for effect: a slow rate with low pitch conveys gravity, while a fast rate with high pitch suggests excitement.
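The sketch below shows one way to wrap text in <prosody> while emitting only the attributes you actually set. The `prosody` helper is hypothetical. It validates rate keywords and whole-number percentages but passes pitch and volume through unchecked, because the accepted forms for those differ more between platforms.

```python
import re
from xml.sax.saxutils import escape, quoteattr

RATE_KEYWORDS = {"x-slow", "slow", "medium", "fast", "x-fast"}


def prosody(text, rate=None, pitch=None, volume=None):
    """Wrap text in <prosody>, emitting only the attributes that are set."""
    if rate is not None and rate not in RATE_KEYWORDS and not re.fullmatch(r"\d+%", rate):
        raise ValueError(f"unsupported rate: {rate!r}")
    attrs = {"rate": rate, "pitch": pitch, "volume": volume}
    rendered = " ".join(
        f"{name}={quoteattr(value)}" for name, value in attrs.items() if value is not None
    )
    if not rendered:
        raise ValueError("set at least one of rate, pitch, or volume")
    return f"<prosody {rendered}>{escape(text)}</prosody>"


# Slow and low for gravity; fast and high for excitement.
solemn = prosody("We regret to inform you.", rate="slow", pitch="low")
excited = prosody("You won!", rate="120%", pitch="high")
```

Combining the attributes on one tag, rather than nesting several <prosody> elements, keeps the intent in one place.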

Say-As: Interpreting Special Content

The <say-as> tag tells TTS engines how to interpret ambiguous content. "characters" spells out letters (FBI becomes F-B-I). "cardinal" and "ordinal" control number pronunciation. "date" reads dates, optionally with a format attribute (for example format="ymd") to resolve ambiguous orderings. "telephone" reads phone numbers naturally. "currency", on platforms that support it, pronounces money correctly ($5.99 as "five dollars and ninety-nine cents"). Without say-as, engines guess the interpretation, often incorrectly.
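Here is a small, hypothetical `say_as` helper that turns any extra keyword arguments into attributes. The `format="ymd"` value follows the W3C say-as conventions that Amazon Polly also uses; other engines may expect different format strings. The helper does not check whether your platform supports a given `interpret-as` value.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(text, interpret_as, **extra):
    """Build a <say-as> element; extra keyword arguments become attributes."""
    attrs = {"interpret-as": interpret_as, **extra}
    rendered = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<say-as {rendered}>{escape(text)}</say-as>"


ssml = (
    "<speak>The "
    + say_as("FBI", "characters")
    + " office reopens on "
    + say_as("2024-01-15", "date", format="ymd")
    + ".</speak>"
)
```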

Pronunciation with Phoneme and Sub

Names, foreign words, and technical terms are often mispronounced. The <phoneme> tag specifies a pronunciation in IPA (International Phonetic Alphabet) or another alphabet the platform supports, such as X-SAMPA. The simpler <sub> tag replaces the written text with a spoken alternative: <sub alias="doctor">Dr.</sub> ensures "Dr." is read as "doctor" rather than "drive", regardless of context. Use sub for abbreviations, acronyms, and brand names with unusual pronunciations.
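The following sketch builds both tags and applies <sub> automatically from a lexicon. The helper names and the two-entry lexicon are hypothetical. The `alphabet` attribute is set explicitly because several platforms expect it. `expand_abbreviations` matches substrings anywhere, without word boundaries, so keep lexicon keys distinctive.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def phoneme(text, ph, alphabet="ipa"):
    """Give an explicit pronunciation for text in the named phonetic alphabet."""
    return f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>{escape(text)}</phoneme>"


def sub(text, alias):
    """Display `text` but speak `alias` instead."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


def expand_abbreviations(text, lexicon):
    """Wrap every occurrence of a lexicon key in <sub>, escaping everything else."""
    if not lexicon:
        return escape(text)
    # Longest keys first, so a longer key wins over a shorter key that is its prefix.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    pieces, last = [], 0
    for match in pattern.finditer(text):
        pieces.append(escape(text[last:match.start()]))
        pieces.append(sub(match.group(0), lexicon[match.group(0)]))
        last = match.end()
    pieces.append(escape(text[last:]))
    return "".join(pieces)


LEXICON = {"Dr.": "doctor", "WWW": "World Wide Web"}  # hypothetical example lexicon
print(expand_abbreviations("Ask Dr. Lee about the WWW.", LEXICON))
```

Scanning the text in a single pass avoids re-matching abbreviations inside aliases that were already inserted.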

Emphasis for Natural Expression

The <emphasis> tag adds stress to words, mimicking how humans emphasize important information. Levels include "reduced" (less emphasis than the surrounding text), "moderate" (the default), and "strong" (significant stress). Strategic emphasis improves comprehension: "The meeting is TOMORROW" conveys urgency that plain text cannot. Overusing emphasis, however, sounds unnatural, so apply it selectively.
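Here is a minimal, hypothetical `emphasis` helper that rejects level names outside the W3C set. The W3C set also includes "none", which marks text that should not be emphasized. The example stresses only the one word that carries the news.

```python
from xml.sax.saxutils import escape

# Levels defined by the W3C SSML specification; "moderate" is the default.
EMPHASIS_LEVELS = {"strong", "moderate", "none", "reduced"}


def emphasis(text, level="moderate"):
    """Wrap text in <emphasis> after checking the level name."""
    if level not in EMPHASIS_LEVELS:
        raise ValueError(f"invalid emphasis level: {level!r}")
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


ssml = "<speak>The meeting is " + emphasis("tomorrow", "strong") + ".</speak>"
```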

Amazon Polly-Specific Features

Amazon Polly extends standard SSML with proprietary tags. The <amazon:effect> tag applies whispered speech (name="whispered") or dynamic range compression, DRC (name="drc"), for better audio on small speakers. The <amazon:breath> tag inserts realistic breathing sounds. The <amazon:domain> tag switches to a speaking style such as conversational or news, and Alexa skills also offer a long-form style. Which tags work depends on the voice and engine, so check Polly's documentation for the voice you use. Applied carefully, these features create remarkably natural Alexa skill responses.
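This sketch builds Polly's extension tags and sends them to Polly through boto3's `synthesize_speech` call. The builder names, the `Joanna` default voice, and the `standard` engine default are illustrative choices. As we understand it, whispered speech and breaths target standard voices, while domains target neural voices. Confirm this against Polly's supported-tags table before relying on it. `synthesize_mp3` needs boto3 and AWS credentials, which is why boto3 is imported only inside that function.

```python
from xml.sax.saxutils import escape, quoteattr


def whispered(text):
    """Amazon Polly whispered-speech effect."""
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def breath(duration=None, volume=None):
    """Amazon Polly breath; omitted attributes fall back to Polly's defaults."""
    attrs = {"duration": duration, "volume": volume}
    rendered = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items() if v is not None)
    return f"<amazon:breath{rendered}/>"


def domain(name, inner_ssml):
    """Wrap already-built SSML in a Polly speaking-style domain such as "news"."""
    return f"<amazon:domain name={quoteattr(name)}>{inner_ssml}</amazon:domain>"


def synthesize_mp3(ssml, voice_id="Joanna", engine="standard"):
    """Send SSML to Amazon Polly and return the MP3 bytes."""
    import boto3  # imported lazily so the builders above work without boto3

    client = boto3.client("polly")
    response = client.synthesize_speech(
        Text=ssml, TextType="ssml", OutputFormat="mp3", VoiceId=voice_id, Engine=engine
    )
    return response["AudioStream"].read()


ssml = "<speak>I have a secret." + breath() + whispered("It is a surprise party.") + "</speak>"
```

Keeping vendor tags in their own helpers makes them easy to find and remove when the same content has to run on another platform.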

Platform Compatibility Considerations

While SSML is a W3C standard, implementations vary between platforms. Amazon Polly, Google Cloud TTS, Microsoft Azure, and IBM Watson each support different tag subsets and may interpret attributes differently. Always test SSML on your target platform. Start with widely supported tags (break, prosody, say-as, emphasis) before using platform-specific extensions that may not transfer.
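When content written for one platform must run on another, one hedged fallback is to strip vendor-prefixed tags and keep the text they wrap. The hypothetical `strip_vendor_tags` below does this with a regular expression for Amazon's `amazon:` prefix and Microsoft Azure's `mstts:` prefix. It assumes no literal `>` appears inside attribute values, which escaped output from the XML helpers satisfies.

```python
import re


def strip_vendor_tags(ssml, prefixes=("amazon", "mstts")):
    """Remove vendor-prefixed tags but keep the text they wrap.

    Self-closing tags such as <amazon:breath/> disappear entirely; wrappers
    such as <amazon:effect> are unwrapped so their content is still spoken.
    """
    alternation = "|".join(re.escape(prefix) for prefix in prefixes)
    return re.sub(rf"</?(?:{alternation}):[^>]*>", "", ssml)


polly_ssml = '<speak>Hi <amazon:effect name="whispered">there</amazon:effect><amazon:breath/></speak>'
portable = strip_vendor_tags(polly_ssml)
```

This fallback loses the effect rather than failing the request. Whether that is acceptable depends on how important the effect is to the message.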

Best Practices for SSML Development

Build SSML incrementally, testing after each modification. Use paragraph (<p>) and sentence (<s>) tags to structure content logically. Keep prosody changes subtle; extreme rate or pitch values sound robotic. Test with multiple voices, since SSML effects vary between voice models. Consider your audience: accessibility users may need slower rates, while entertainment applications might use dramatic prosody variations.
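Here is a sketch of two of these practices: structuring text with <p> and <s>, and keeping rate changes within a narrow band. Both helpers are hypothetical. The 85% to 115% band is an arbitrary illustration of "subtle", not a platform rule.

```python
from xml.sax.saxutils import escape


def structure(paragraphs):
    """Build <speak> from a list of paragraphs, each a list of sentences."""
    body = "".join(
        "<p>" + "".join(f"<s>{escape(sentence)}</s>" for sentence in paragraph) + "</p>"
        for paragraph in paragraphs
    )
    return f"<speak>{body}</speak>"


def subtle_rate(percent, low=85, high=115):
    """Clamp a prosody rate percentage to an assumed 'subtle' band."""
    return f"{max(low, min(high, percent))}%"


ssml = structure([["Hello.", "Welcome back."], ["Your order has shipped."]])
```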

Common SSML Mistakes to Avoid

Malformed XML breaks SSML parsing entirely, so ensure all tags close properly and escape special characters such as & and <. Excessive breaks create awkward stuttering. Conflicting prosody settings (for example, a slow rate nested inside a fast one) produce unpredictable results. An incorrect phoneme makes pronunciation worse, not better. Forgetting the root <speak> tag causes failures on many platforms. This generator helps avoid syntax errors through structured tag insertion.
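Two of these mistakes, malformed XML and a missing <speak> root, can be caught before you send anything to a TTS service. The hypothetical `check_ssml` below uses Python's expat parser without namespace processing, so vendor tags such as `amazon:breath` parse without a namespace declaration. It does not detect excessive breaks, conflicting prosody, or wrong phonemes.

```python
import xml.parsers.expat


def check_ssml(ssml):
    """Return a list of problems found in an SSML string (empty if none).

    Checks well-formedness and the root <speak> element only.
    """
    elements = []
    # No namespace processing, so vendor names like amazon:breath parse as-is.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: elements.append(name)
    try:
        parser.Parse(ssml, True)
    except xml.parsers.expat.ExpatError as err:
        return [f"malformed XML: {err}"]
    # A successful parse guarantees at least one element: the root.
    if elements[0] != "speak":
        return [f"root element is <{elements[0]}>, not <speak>"]
    return []


print(check_ssml("<speak>Fish & chips</speak>"))  # reports the unescaped "&"
```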

Frequently Asked Questions