Our Journey with the Token Announcement System Audio Implementation
When we first started building our token system for the clinic's front desk, we knew one thing for sure: it had to be simple, reliable, and sound professional. At first, we were tempted to use the free Google Translate voice service. But that service is unofficial and could stop working without notice, and for a clinic where clear communication is key, especially when patients are waiting to hear their token number, we couldn't take that chance.
That’s when we decided to go with the official Google Cloud Text-to-Speech API. Yes, it meant a bit of extra setup and some cost, but it was 100% worth it.
Why? Because in the end, it’s not just about technology. It’s about making sure our clinic feels organized, welcoming, and trustworthy, and that starts with something as simple as a token being called out right, every time.
Figure: A comparison of TTS implementations, Google Translate vs. Google Cloud TTS API
Google Translate TTS vs. Google Cloud Text-to-Speech in Apps Script Web Apps
When adding dynamic audio announcements to a Google Apps Script web application, such as our token queue system, developers often face a crucial decision: leverage the informal, free Google Translate Text-to-Speech (TTS) service, or opt for the robust, official, and paid Google Cloud Text-to-Speech API.
Each has its distinct advantages and disadvantages, making the "best" choice highly dependent on your project's specific requirements.
In our token system, we initially explored the unofficial Google Translate TTS for simplicity, but for a more reliable and production-ready solution, the Google Cloud TTS API would be the preferred route. Let's delve into a detailed comparison to understand why, and help you decide which path to take for your own projects.
The "Free" Option: Google Translate TTS (The Unofficial Link)
Many developers, when first encountering the need for TTS in Apps Script, stumble upon the unofficial Google Translate TTS service. This isn't a documented API, but rather a public-facing URL endpoint used by the Google Translate website itself to play translations.
How it Works:
You can construct a URL that, when accessed, returns an MP3 audio file of the translated text.
https://translate.google.com/translate_tts?ie=UTF-8&q=YOUR_TEXT&tl=en&client=tw-ob
q: The text you want to convert to speech.
tl: The target language (e.g., en for English, hi for Hindi).
client=tw-ob: A parameter often found to make the request work.
ie=UTF-8: The input text encoding.
In Apps Script (acting as your backend server), you'd use UrlFetchApp.fetch() to get this audio, then encode it to Base64 to send to your frontend, or save it to Google Drive.
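As a rough sketch of the client-side half of this flow, the helper below wraps a Base64 string in a data URI that an HTML Audio element can play. The function name `toAudioDataUri` and the default MIME type are our own assumptions for illustration, not part of any Google API.

```javascript
// Hypothetical helper: turn Base64-encoded MP3 bytes into a playable data URI.
function toAudioDataUri(base64Audio, mimeType = 'audio/mpeg') {
  if (typeof base64Audio !== 'string' || base64Audio.length === 0) {
    throw new Error('Expected a non-empty Base64 string');
  }
  return `data:${mimeType};base64,${base64Audio}`;
}

// In the browser (client-side HTML of the web app), playback could look like:
//   google.script.run
//     .withSuccessHandler(b64 => new Audio(toAudioDataUri(b64)).play())
//     .getUnofficialTranslateAudio('Token Number 101', 'en');
```

Browsers may refuse to play audio before the user has interacted with the page, so it is worth testing this on the actual front-desk display.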
Why We Might Have Used It (Initial Exploration / Proof of Concept):
Cost-Free: For personal projects, small internal tools, or quick prototypes, the most appealing factor is zero cost.
Simplicity: No API key setup, no Google Cloud Project configuration. Just construct a URL and fetch.
Rapid Prototyping: Get a voice working very quickly to validate an idea.
Limitations and Why It's Generally NOT Recommended for Production:
Unofficial & Unstable: This is the biggest drawback. Since it's not a public API, Google can change or block this endpoint at any time without notice, breaking your application.
Rate Limits: While not explicitly documented, frequent or high-volume requests can lead to temporary IP blocking or errors.
Limited Customization: You have no control over voice type (male/female), pitch, speaking rate, or more advanced features like SSML (Speech Synthesis Markup Language) for fine-tuning pronunciation.
Quality: The audio quality is generally acceptable but might not be as natural or high-fidelity as professional TTS services.
Legal/Commercial Use: Using an unofficial endpoint for commercial applications can be legally risky and is against Google's terms of service.
Error Handling: It's difficult to get meaningful error messages from this service beyond generic network errors.
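One way to soften the rate-limit risk is to avoid requesting the same announcement twice. The sketch below caches audio by language and text. `getCachedAudio` and the key format are hypothetical names of ours, and the cache argument is assumed to expose `get(key)` and `put(key, value, seconds)` the way Apps Script's `CacheService.getScriptCache()` does.

```javascript
// Hypothetical helper: return cached audio if present, otherwise generate and cache it.
// `cache` needs get(key) and put(key, value, ttlSeconds);
// `generate(text, languageCode)` produces the Base64 audio string.
function getCachedAudio(cache, text, languageCode, generate, ttlSeconds = 21600) {
  const key = `tts:${languageCode}:${text}`;
  const hit = cache.get(key);
  if (hit !== null && hit !== undefined) {
    return hit; // Served from cache, so no network request is made
  }
  const audio = generate(text, languageCode);
  cache.put(key, audio, ttlSeconds);
  return audio;
}

// In Apps Script, usage might look like:
//   getCachedAudio(CacheService.getScriptCache(), 'Token Number 101', 'en',
//                  getUnofficialTranslateAudio);
```

Apps Script's cache limits key length, value size, and expiry time, so this suits short announcements; longer texts would likely need a hashed key or a different store.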
Example (Apps Script, Code.gs):
function getUnofficialTranslateAudio(textToSpeak, languageCode = 'en') {
  // Encode the text so spaces and non-ASCII characters survive in the URL
  const url = `https://translate.google.com/translate_tts?ie=UTF-8&q=${encodeURIComponent(textToSpeak)}&tl=${languageCode}&client=tw-ob`;
  try {
    const response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
    if (response.getResponseCode() === 200) {
      const audioBlob = response.getBlob();
      // Convert blob to Base64 to send to client-side HTML
      return Utilities.base64Encode(audioBlob.getBytes());
    } else {
      Logger.log(`Error fetching audio: ${response.getResponseCode()} - ${response.getContentText()}`);
      throw new Error("Failed to fetch audio from unofficial TTS. Status: " + response.getResponseCode());
    }
  } catch (e) {
    Logger.log("Exception in getUnofficialTranslateAudio: " + e.message);
    throw new Error("Could not generate audio (unofficial): " + e.message);
  }
}
// Example usage in your nextToken() function:
// const tokenNumber = "101";
// const audioData = {};
// audioData.nextTokenAudioLang1 = getUnofficialTranslateAudio(`Token Number ${tokenNumber}`, 'en');
// audioData.nextTokenAudioLang2 = getUnofficialTranslateAudio(`टोकन संख्या ${tokenNumber}`, 'hi');
// This data would then be returned to the client and played.
The Professional Choice: Google Cloud Text-to-Speech API
For any serious application, the Google Cloud Text-to-Speech API is the officially supported choice. It's a robust, feature-rich, and scalable service designed for developers. The workflow image below shows an example.
How it Works:
This involves:
Google Cloud Project Setup: Creating a project in Google Cloud Platform.
API Enablement: Enabling the Text-to-Speech API within that project.
Authentication: Creating an API key, or setting up a service account or OAuth. (For Apps Script, a simple API key often suffices for basic web app calls.)
Making a POST Request: Sending a JSON payload containing your text, voice selection, and audio configuration to the API endpoint.
Key Parameters in the JSON payload (the "name" field selects a specific voice, here a WaveNet female voice):
{
  "input": {
    "text": "Your text to convert."
  },
  "voice": {
    "languageCode": "en-US",
    "name": "en-US-Wavenet-F",
    "ssmlGender": "FEMALE"
  },
  "audioConfig": {
    "audioEncoding": "MP3",
    "speakingRate": 1.0,
    "pitch": 0.0
  }
}
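A small builder can keep this payload consistent across languages. This is a minimal sketch: `buildSynthesizeRequest` and its option names are hypothetical, and the defaults simply mirror the sample payload.

```javascript
// Hypothetical helper: assemble a text:synthesize request body.
function buildSynthesizeRequest(text, options = {}) {
  const {
    languageCode = 'en-US',
    voiceName = 'en-US-Wavenet-F',
    ssmlGender = 'FEMALE',
    speakingRate = 1.0,
    pitch = 0.0,
  } = options;
  return {
    input: { text },
    voice: { languageCode, name: voiceName, ssmlGender },
    audioConfig: { audioEncoding: 'MP3', speakingRate, pitch },
  };
}

// Example: a slightly slower Hindi announcement (voice name is an assumption).
const hindiRequest = buildSynthesizeRequest('टोकन संख्या 101', {
  languageCode: 'hi-IN',
  voiceName: 'hi-IN-Wavenet-C',
  ssmlGender: 'MALE',
  speakingRate: 0.9,
});
```

The resulting object can be passed through JSON.stringify() and sent as the POST body.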
Why We Would Use It (Production / Robust Solutions):
Reliability & Stability: This is an official Google Cloud product with Service Level Agreements (SLAs). It's designed for continuous service.
High Quality & Natural Voices: Offers standard voices and premium WaveNet voices, which are incredibly natural-sounding and human-like.
Extensive Customization:
Many Voices: A vast selection of voices across numerous languages and dialects, including different genders and accents.
SSML Support: Use Speech Synthesis Markup Language to control pauses, pronunciation, emphasis, speaking rate, pitch, and more, allowing for highly natural and expressive speech.
Audio Configuration: Control audio encoding, speaking rate, and pitch directly.
Scalability: Designed to handle high volumes of requests.
Error Handling: Provides structured error responses, making debugging and user feedback much more effective.
Commercial Use: Fully compliant with Google's terms for commercial applications.
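To make the SSML point concrete, here is a sketch of how a token announcement might be marked up so the digits are read one at a time, with a short pause afterwards. `buildTokenSsml` is a hypothetical helper of ours. The `<say-as>` and `<break>` elements are standard SSML, but how a particular voice renders them is worth checking by ear.

```javascript
// Escape characters that have special meaning in XML/SSML.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: build SSML that spells out the token digit by digit.
function buildTokenSsml(prefix, tokenNumber, pauseMs = 500) {
  return '<speak>' +
    escapeXml(prefix) + ' ' +
    `<say-as interpret-as="characters">${escapeXml(tokenNumber)}</say-as>` +
    `<break time="${pauseMs}ms"/>` +
    '</speak>';
}

// To send SSML, the request's "input" uses an "ssml" field in place of "text".
const ssmlInput = { ssml: buildTokenSsml('Token Number', '101') };
```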
Limitations:
Cost: It's a pay-as-you-go service. While the initial free tier is generous, large volumes of speech synthesis will incur costs.
Setup Complexity: Requires setting up a Google Cloud Project, enabling APIs, and handling authentication (though API keys are relatively straightforward for Apps Script web apps).
Apps Script Libraries: While direct UrlFetchApp calls work, integrating with more complex Cloud APIs might sometimes benefit from Apps Script libraries, adding another layer of setup.
Example
// Remember to enable the Cloud Text-to-Speech API in your Google Cloud Project
// and set up an API key linked to your Apps Script project.
const GOOGLE_CLOUD_TTS_API_KEY = 'YOUR_GOOGLE_CLOUD_API_KEY'; // Store this securely in PropertiesService in a real app

function getCloudTranslateAudio(textToSpeak, languageCode = 'en-US', voiceName = 'en-US-Wavenet-F', ssmlGender = 'FEMALE') {
  const url = `https://texttospeech.googleapis.com/v1/text:synthesize?key=${GOOGLE_CLOUD_TTS_API_KEY}`;
  const payload = {
    input: {
      text: textToSpeak
    },
    voice: {
      languageCode: languageCode,
      name: voiceName,
      ssmlGender: ssmlGender
    },
    audioConfig: {
      audioEncoding: "MP3" // Can also be LINEAR16, OGG_OPUS
    }
  };
  const options = {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify(payload),
    muteHttpExceptions: true // Capture errors rather than throwing
  };
  try {
    const response = UrlFetchApp.fetch(url, options);
    const responseCode = response.getResponseCode();
    const responseText = response.getContentText();
    if (responseCode === 200) {
      const data = JSON.parse(responseText);
      return `data:audio/mp3;base64,${data.audioContent}`; // Return Base64 data URI
    } else {
      Logger.log(`Cloud TTS Error: ${responseCode} - ${responseText}`);
      // The error body is normally JSON, but guard against non-JSON responses
      let message = 'Unknown error';
      try {
        const errorData = JSON.parse(responseText);
        message = (errorData.error && errorData.error.message) || message;
      } catch (parseError) {
        // Keep the default message if the body is not valid JSON
      }
      throw new Error(`Cloud TTS API error: ${message}`);
    }
  } catch (e) {
    Logger.log("Exception in getCloudTranslateAudio: " + e.message);
    throw new Error("Could not generate audio (Cloud TTS): " + e.message);
  }
}
// Example usage in your nextToken() function:
// const tokenNumber = "101";
// const audioData = {};
// audioData.nextTokenAudioLang1 = getCloudTranslateAudio(`Token Number ${tokenNumber}`, 'en-US', 'en-US-Wavenet-F', 'FEMALE');
// audioData.nextTokenAudioLang2 = getCloudTranslateAudio(`टोकन संख्या ${tokenNumber}`, 'hi-IN', 'hi-IN-Wavenet-C', 'MALE'); // Assuming a Hindi WaveNet voice
// This data would then be returned to the client and played.
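Since the comment in the example above suggests keeping the key out of source code, here is one hedged way to do that. The property name `TTS_API_KEY` and the helpers `getTtsApiKey` and `buildTtsUrl` are assumptions of ours. In Apps Script you would pass `PropertiesService.getScriptProperties()` as the first argument.

```javascript
// Hypothetical helper: read the API key from a properties store.
// `properties` needs a getProperty(name) method, as Apps Script's Properties object has.
function getTtsApiKey(properties, propertyName = 'TTS_API_KEY') {
  const key = properties.getProperty(propertyName);
  if (!key) {
    throw new Error(`Missing script property "${propertyName}"`);
  }
  return key;
}

// Build the endpoint URL with the key URL-encoded.
function buildTtsUrl(apiKey) {
  return `https://texttospeech.googleapis.com/v1/text:synthesize?key=${encodeURIComponent(apiKey)}`;
}

// In Apps Script:
//   const url = buildTtsUrl(getTtsApiKey(PropertiesService.getScriptProperties()));
```

Failing loudly when the property is missing makes a misconfigured deployment easier to spot than a silent authentication error from the API.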
So, Finally, What We Concluded:
When to Use Which:
Choose Google Translate TTS (Unofficial) if:
You're building a personal side project or a very quick proof-of-concept.
Cost is the absolute highest priority, and reliability or quality are secondary concerns.
You don't need any customization beyond basic text-to-speech in a specific language.
You understand and accept the risk that your audio functionality might break without warning.
Choose Google Cloud Text-to-Speech API if:
You're building a production-ready application or a tool for a business.
Reliability, stability, and high-quality audio are critical.
You need specific voices, accents, or genders.
You require fine-grained control over speech synthesis using SSML (e.g., for complex pronunciations, emphasis, or pauses).
Your application will have moderate to high usage, where undocumented rate limits would be an issue.
You are comfortable with the Google Cloud Platform ecosystem and potential associated costs.
You plan to integrate with other Google Cloud services in the future.
It was, once again, a roller-coaster ride of learning and experimenting with cloud-based interfaces like the Text-to-Speech API. We felt it was worth writing up so that other web app developers can get some insight into the issues we faced and the solutions we settled on for our web apps.