The Future of AI
Artificial intelligence is evolving rapidly...
More text here specifically for the test...
💡 Tip: Highlight any text on this page to listen to it.
🗈 Note: The amount of characters are usually limited usually 3000.
User: You said
Here I would like to learn all about speech generation in browser in just html css js.
Gemini: You can generate speech in the browser natively using the Web Speech API. It requires no external libraries and is built directly into Chrome, Firefox, Safari, and Edge.
Here is a complete guide to understanding and building it with just HTML, CSS, and JS.
The system relies on two main JavaScript objects:
SpeechSynthesisUtterance: The “request.” It contains what you want to say, the language, pitch, and speed.
window.speechSynthesis: The “controller.” It receives the request and makes the browser speak it.
If you just want to verify it works, open your browser console (F12) and type this:
JavaScript let msg = new SpeechSynthesisUtterance(“Hello, I am your browser speaking.”); window.speechSynthesis.speak(msg);
Here is a full application. It includes a text area, voice selector, and sliders for speed and pitch.
HTML (index.html) HTML <!DOCTYPE html>
CSS (style.css) CSS body { font-family: sans-serif; background: #f0f2f5; display: flex; justify-content: center; align-items: center; height: 100vh; margin: 0; }
.speech-box { background: white; padding: 2rem; border-radius: 12px; box-shadow: 0 4px 15px rgba(0,0,0,0.1); width: 400px; }
textarea { width: 100%; height: 120px; margin-bottom: 1rem; padding: 10px; border: 1px solid #ccc; border-radius: 6px; font-family: inherit; box-sizing: border-box; /* Ensures padding doesn’t break width */ }
.settings label { display: block; margin-bottom: 0.5rem; font-weight: bold; font-size: 0.9rem; }
select, input[type=”range”] { width: 100%; margin-bottom: 1rem; }
.buttons { display: flex; gap: 10px; }
button { flex: 1; padding: 12px; background: #007bff; color: white; border: none; border-radius: 6px; cursor: pointer; font-size: 1rem; }
button:hover { background: #0056b3; } button.secondary { background: #6c757d; } button.secondary:hover { background: #545b62; }
JavaScript (script.js)
This script handles the asynchronous loading of voices and the actual speaking.
JavaScript // Init SpeechSynth API const synth = window.speechSynthesis;
// DOM Elements const textInput = document.getElementById(‘text-input’); const voiceSelect = document.getElementById(‘voice-select’); const rateInput = document.getElementById(‘rate’); const pitchInput = document.getElementById(‘pitch’); const rateVal = document.getElementById(‘rate-val’); const pitchVal = document.getElementById(‘pitch-val’);
// Variables let voices = [];
function populateVoices() { // getVoices() retrieves all available voices on your OS/Browser voices = synth.getVoices();
// Reset the dropdown
voiceSelect.innerHTML = '';
// Create an option for every voice found
voices.forEach((voice) => {
const option = document.createElement('option');
// Display name and language (e.g. "Google US English (en-US)")
option.textContent = `${voice.name} (${voice.lang})`;
// Add specific data attributes so we can find it later
option.setAttribute('data-name', voice.name);
option.setAttribute('data-lang', voice.lang);
voiceSelect.appendChild(option);
}); }
// ⚠️ IMPORTANT: Voices are loaded asynchronously. // We must wait for the ‘voiceschanged’ event before populating the list. populateVoices(); if (speechSynthesis.onvoiceschanged !== undefined) { speechSynthesis.onvoiceschanged = populateVoices; }
// Update slider value displays rateInput.addEventListener(‘input’, () => rateVal.textContent = rateInput.value); pitchInput.addEventListener(‘input’, () => pitchVal.textContent = pitchInput.value);
// The Speak Function function speak() { // If it’s already speaking, stop it first to restart if (synth.speaking) { synth.cancel(); }
if (textInput.value === '') return;
// 1. Create the Utterance
const utterThis = new SpeechSynthesisUtterance(textInput.value);
// 2. Find the selected voice object in our voices array
const selectedOption = voiceSelect.selectedOptions[0].getAttribute('data-name');
const selectedVoice = voices.find(v => v.name === selectedOption);
// 3. Apply properties
utterThis.voice = selectedVoice;
utterThis.pitch = pitchInput.value;
utterThis.rate = rateInput.value;
// 4. Speak
synth.speak(utterThis); }
// Event Listeners for Buttons document.getElementById(‘speak-btn’).addEventListener(‘click’, speak); document.getElementById(‘stop-btn’).addEventListener(‘click’, () => synth.cancel());
The voiceschanged Event:
When a page loads, window.speechSynthesis.getVoices() often returns an empty array [] initially.
Browsers load voices in the background. You must listen for the onvoiceschanged event (as shown in the JS above) to know when the list is actually ready.
Browser & OS Differences:
Chrome usually provides “Google” voices (cloud-enhanced) which sound very natural.
Safari/iOS uses the built-in Apple voices.
Offline vs Online: Some voices (like Chrome’s high-quality ones) may require an internet connection. Native OS voices usually work offline.
Mobile Restrictions:
On mobile browsers (iOS/Android), speech must be triggered by a user action (like a click or tap). You generally cannot make the webpage “speak” automatically the moment it loads; the browser will block it to prevent annoyance.
Character Limits:
If you try to speak a massive chunk of text (thousands of words), the browser may time out or stop halfway. For long text, it is best practice to split the text by periods (.) and speak one sentence at a time.
Would you like to see how to implement the “highlight words as they are spoken” feature using the onboundary event?
User: You said
Before that in getVoices(), are there other ways to get more voices?
Gemini: The short answer is no, you cannot add voices using just JavaScript code.
The getVoices() method is read-only. It simply lists whatever voices are already installed on the user’s device or provided by the browser itself. You cannot “download” or “inject” a new voice file (like a font) via your website code.
However, you can get more voices to appear in that list by changing the environment (the OS or Browser) settings. Here is how the voice ecosystem works and how to expand it.
When you call getVoices(), the browser aggregates voices from two sources:
The Operating System (Offline): Windows, macOS, Android, and iOS all have system-level text-to-speech voices installed.
The Browser (Online/Cloud): Some browsers (like Chrome and Edge) inject their own high-quality, server-side voices into the list if you have an internet connection.
Since you cannot add them via code, the user of your computer/phone must install them. Once installed on the OS, your JavaScript will automatically detect them.
On Windows
Windows limits the “high quality” voices available to the browser by default, but you can add language packs.
Go to Settings > Time & Language > Speech.
Under “Manage voices,” click Add voices.
Select a new language (e.g., “English (UK)” or “Japanese”).
Restart the browser. The new voices will appear in your JS getVoices() list.
On macOS
Macs usually have excellent voices (like “Samantha” or “Daniel”), but you can download even higher-quality “Enhanced” versions.
Go to System Settings > Accessibility > Spoken Content.
Click on the “System Voice” dropdown and select Manage Voices….
Find a voice you like and click the download icon (cloud arrow).
Restart the browser.
On Android / iOS
Android: Go to Settings > Accessibility > Text-to-speech output. You can often install the “Google Speech Services” pack or Samsung’s voice pack.
iOS: Go to Settings > Accessibility > Spoken Content > Voices. Download “Enhanced” versions of voices like Siri or others.
If you are testing your app and want to hear better voices immediately without installing anything, try these browsers:
Microsoft Edge: Edge has the absolute best native TTS support right now. It injects “Microsoft Online” voices (Natural) into getVoices(). These sound almost human (AI-generated) and are free to use within the browser.
Google Chrome: Chrome usually adds “Google US English”, “Google Deutsch”, etc. These are better than standard OS voices but generally robotic compared to Edge’s “Natural” voices.
If you strictly need a specific, high-quality voice (like the TikTok voice or a celebrity voice) for all users regardless of their device, the native window.speechSynthesis API cannot do it.
You would have to use an external API (like Google Cloud TTS, Amazon Polly, or OpenAI), send the text to their server, and receive an MP3 file back to play using standard HTML Audio:
. This, however, stops being “just HTML/CSS/JS” and requires API keys and usually a backend.
Next Step: Would you like to see the code for the “karaoke style” highlighter (highlighting words as they are spoken), or would you prefer to explore how to save the speech to an audio file?
User: You said
Not yet, one more thing, how about on Linux?
Gemini: Linux is the most “DIY” of all the platforms when it comes to speech. Unlike Windows or macOS, which have a single, unified voice engine, Linux browsers typically communicate with a system service called Speech Dispatcher (speech-dispatcher).
If your browser cannot find voices, or if they sound very robotic (like a Dalek from 1980), it is usually because you are using the default eSpeak engine.
Here is how to improve the situation on Linux (specifically Ubuntu/Debian, but the concepts apply to Arch/Fedora too).
Your browser does not talk to the audio driver directly. The chain looks like this: Browser (JS) -> Speech Dispatcher (Linux System) -> Synthesizer Engine (eSpeak/Festival) -> Speakers
To get more voices, you must install new Synthesizer Engines and tell Speech Dispatcher to use them.
Festival provides voices that are fuller and less “buzzy” than eSpeak, though still clearly synthetic.
Open your terminal.
Install the engine and the speech dispatcher module:
Bash sudo apt update sudo apt install festival speech-dispatcher-festival
Restart your browser. You should see new voices (often named english-us, english_rp, etc.) in your getVoices() list.
Option B: eSpeak NG (More languages, still robotic)
If you need specific language support rather than audio quality, upgrade to the “Next Generation” eSpeak.
Bash sudo apt install espeak-ng
Option C: Google Voices (The Chrome “Cheat”)
If you use Google Chrome on Linux, you don’t actually need to install anything. Chrome on Linux usually bypasses the OS system for its own voices and provides the “Google US English”, “Google UK English”, etc., directly from the binary.
Firefox on Linux is stricter; it relies 100% on what you have installed in the OS (Speech Dispatcher).
On Linux, simply installing the package isn’t always enough. You sometimes have to edit the Speech Dispatcher config file to enable the new module.
Open the config file:
Bash sudo nano /etc/speech-dispatcher/speechd.conf
Scroll down to the AddModule section.
Make sure the module you want is uncommented (remove the # at the start of the line).
Example: Ensure AddModule “festival” “sd_festival” “festival.conf” does not have a # in front of it.
Restart the dispatcher:
Bash systemctl restart speech-dispatcher
Restart your browser.
Best Quality: Use Google Chrome. It brings its own high-quality voices and ignores the messy Linux subsystem.
Most Open: Use Firefox + Festival (or PicoTTS if you can find the package for your distro).
Default: almost always eSpeak, which sounds very harsh and robotic.
Would you like to move on to the “Karaoke” text highlighter now, or do you have a specific Linux distribution you are trying to configure?
User: You said
Back to the html css js, is possible to download the generated speech as a sound file?
Gemini: The short answer is no, the native Web Speech API cannot do this directly.
The window.speechSynthesis API is designed as a “fire-and-forget” player. It sends the text directly to your device’s audio driver (speakers), completely bypassing the browser’s own audio processing engine. Because of this, JavaScript never “touches” the audio data, so you cannot save it to a file or manipulate it.
However, there is a clever JavaScript workaround.
You can use the Screen Capture API (getDisplayMedia) to record the audio of your own tab while the speech is playing. It’s a bit of a hack, but it is the only way to do it with pure front-end code and no server.
Here is how to build a “Speech Recorder”.
The “Tab Recording” Workaround
This method works by asking the user to “Share this Tab” (with audio), recording the stream while the browser speaks, and then saving that recording as a file.
HTML
Add a record button to your existing HTML:
HTML
JavaScript
Add this logic to your script. It uses MediaRecorder to capture the tab’s audio output.
JavaScript const recordBtn = document.getElementById(‘record-btn’); const audioPlayer = document.getElementById(‘audio-player’);
recordBtn.addEventListener(‘click’, async () => { // 1. Check if text exists const text = document.getElementById(‘text-input’).value; if (!text) return alert(“Please type some text first!”);
try {
// 2. Ask user to select the CURRENT TAB to record
// ⚠️ IMPORTANT: User MUST check "Share tab audio" in the popup!
const stream = await navigator.mediaDevices.getDisplayMedia({
video: true, // Video is required to get tab audio, even if we don't use it
audio: true,
preferCurrentTab: true, // Encourages browser to offer current tab
selfBrowserSurface: "include"
});
// 3. Create the MediaRecorder
const mediaRecorder = new MediaRecorder(stream);
const audioChunks = [];
mediaRecorder.ondataavailable = event => {
audioChunks.push(event.data);
};
mediaRecorder.onstop = () => {
// 6. Create Audio Blob (WebM or MP4 depending on browser)
const audioBlob = new Blob(audioChunks, { type: 'audio/webm' });
const audioUrl = URL.createObjectURL(audioBlob);
// Play it back or Download it
audioPlayer.src = audioUrl;
audioPlayer.style.display = "block";
// Trigger auto-download
const link = document.createElement('a');
link.href = audioUrl;
link.download = 'speech-recording.webm';
link.click();
// Stop all tracks to turn off the "sharing" indicator
stream.getTracks().forEach(track => track.stop());
};
// 4. Start Recording and Speaking
mediaRecorder.start();
const utterance = new SpeechSynthesisUtterance(text);
// Match the voice/settings from your UI
const selectedOption = document.getElementById('voice-select').selectedOptions[0].getAttribute('data-name');
utterance.voice = voices.find(v => v.name === selectedOption);
utterance.rate = document.getElementById('rate').value;
utterance.pitch = document.getElementById('pitch').value;
// 5. When speech finishes, stop recording
utterance.onend = () => {
// Add a tiny delay to catch the very end of the audio
setTimeout(() => {
mediaRecorder.stop();
}, 500);
};
window.speechSynthesis.speak(utterance);
} catch (err) {
console.error("Error recording:", err);
alert("You must select 'Share Tab Audio' for this to work.");
} });
Critical Limitations of this Method
User Friction: The user must manually select their browser tab in the popup window and tick the box that says “Also share tab audio”. You cannot automate this permissions step for security reasons.
Format: The resulting file is usually a .webm video file (with a black screen and audio track). You can rename it to .weba or use a converter library (like ffmpeg.wasm) to convert it to MP3, but .webm plays in most browsers.
Noise: This records all audio from the tab. If you have a YouTube video playing in the background of the same tab, it will record that too.
The “Clean” Alternative (External Libraries)
If you want to generate a clean MP3 file without the clumsy “screen share” popup, you must stop using the browser’s native speechSynthesis and use a JavaScript library that generates the sound itself.
meSpeak.js / speak.js: These are ancient libraries (ports of eSpeak). They sound robotic, but they generate audio data purely in variables, allowing you to instantly export a WAV file without recording anything.
APIs (Google/Amazon/OpenAI): You send text to them, they send back an MP3 file. This is how professional sites do it, but it requires an internet connection and usually a paid key.
Would you like to try the “Karaoke” text highlighter now to wrap up the project, or do you want to stick with the recording concept?
User: You said
Moving on to another project. Supposed I have text contents on my website. What is your recommended way to add speaker button on my texts? Consider the size of the text as well. My website structure have js separated as libraries.
Gemini: Since you want to keep your JS separated as a library, the best architectural approach is a Data-Attribute Driver.
This allows you to create a generic tts-manager.js file (your library) that you include once. Then, you can turn any text on your site into speech just by adding a button with a data-target attribute in your HTML.
Here is the robust, production-ready way to do it.
You mentioned “consider the size of the text.” This is critical because browsers have a bug (especially Chrome) where speechSynthesis times out and stops abruptly if an utterance is longer than ~15 seconds.
Wrong Way: speak(wholeArticleText) -> Stops randomly.
Recommended Way: “Chunking.” Your library should automatically split long text into sentences and feed them to the browser one by one. This ensures stable playback for infinite length.
You simply place a button and point it to the container you want to read using data-target.
HTML
"Simplicity is the ultimate sophistication."
Artificial intelligence is evolving rapidly...
More text here specifically for the test...
Save this as a standalone file. It handles button toggling (Play/Stop), icon switching, and the critical text splitting for long content.
JavaScript class TTSManager { constructor() { this.synth = window.speechSynthesis; this.currentUtterance = null; this.isSpeaking = false; this.activeButton = null; this.chunks = []; // To store split sentences this.chunkIndex = 0;
// Bind global click listener for any .tts-btn
document.addEventListener('click', (e) => {
const btn = e.target.closest('.tts-btn');
if (btn) this.handleButtonClick(btn);
});
}
handleButtonClick(btn) {
// If clicking the same button that is currently playing: STOP
if (this.activeButton === btn && this.isSpeaking) {
this.stop();
return;
}
// If clicking a different button while one is playing: STOP current, START new
if (this.isSpeaking) {
this.stop();
}
this.start(btn);
}
start(btn) {
const targetSelector = btn.getAttribute('data-target');
const targetEl = document.querySelector(targetSelector);
if (!targetEl) return console.error('TTS: Target not found');
this.activeButton = btn;
this.isSpeaking = true;
this.updateButtonUI(btn, 'stop');
// 1. Get text and clean it up
const text = targetEl.innerText;
// 2. SMART CHUNKING: Split by punctuation (. ! ? \n) to avoid browser timeout limits
// This regex splits but keeps the punctuation attached to the sentence
this.chunks = text.match(/[^.!?\n]+[.!?\n]+/g) || [text];
this.chunkIndex = 0;
this.speakNextChunk();
}
speakNextChunk() {
if (this.chunkIndex >= this.chunks.length) {
this.stop(); // Done
return;
}
const chunk = this.chunks[this.chunkIndex];
const utterance = new SpeechSynthesisUtterance(chunk);
// Optional: Configure voice/rate here
utterance.rate = 1;
utterance.onend = () => {
this.chunkIndex++;
if (this.isSpeaking) {
this.speakNextChunk();
}
};
utterance.onerror = (e) => {
console.error('TTS Error:', e);
this.stop();
};
this.synth.speak(utterance);
}
stop() {
this.isSpeaking = false;
this.synth.cancel();
if (this.activeButton) {
this.updateButtonUI(this.activeButton, 'play');
this.activeButton = null;
}
}
updateButtonUI(btn, state) {
// You can customize this to toggle classes or text
if (state === 'stop') {
btn.innerHTML = '⏹️ Stop';
btn.classList.add('playing');
} else {
// Restore original text or icon
btn.innerHTML = '🔊 Listen';
btn.classList.remove('playing');
}
} }
// Initialize on load document.addEventListener(‘DOMContentLoaded’, () => { window.ttsManager = new TTSManager(); });
Add visual feedback so the user knows which button is active.
CSS .tts-btn { cursor: pointer; border: 1px solid #ccc; background: #fff; padding: 5px 10px; border-radius: 4px; transition: all 0.2s; font-size: 0.9rem; display: inline-flex; align-items: center; gap: 5px; }
.tts-btn:hover { background: #f0f0f0; }
/* Active State */ .tts-btn.playing { background: #ffebee; border-color: #ef5350; color: #c62828; animation: pulse 1.5s infinite; }
@keyframes pulse { 0% { box-shadow: 0 0 0 0 rgba(239, 83, 80, 0.4); } 70% { box-shadow: 0 0 0 6px rgba(239, 83, 80, 0); } 100% { box-shadow: 0 0 0 0 rgba(239, 83, 80, 0); } }
Why this approach is “Recommended”?
Decoupled: The Javascript doesn’t care about your HTML structure. It only looks for data-target. You can redesign your page, move the text, or move the button, and it will still work as long as the ID matches.
Solves the “Long Text” Bug: By splitting the text into an array of sentences (this.chunks) and feeding them recursively (onend triggers the next chunk), you bypass the 15-second browser timeout. This makes it safe for 2,000-word blog posts.
One Controller: It uses a single class instance (ttsManager). If a user clicks “Play” on a second article while the first is reading, the manager knows to stop the first one automatically. No overlapping voices.
User: You said
Here I have to pu ids on each div of texts and audio buttons. Is this the simplest way or is there a way to automate this?
Gemini: Yes, manually matching IDs is tedious and error-prone. You can absolutely automate this.
The cleanest approach is the “Auto-Injector” pattern.
Instead of writing
Here is the “Zero-Config” solution.
You just add a class. No buttons, no IDs.
HTML
This is the first section of text. The button will appear automatically.
Here is another standalone paragraph that needs audio.
Update your TTSManager class to include an injectButtons() method. This script runs on page load, finds every .readable element, creates a button, and attaches the event listener directly.
JavaScript class TTSManager { constructor() { this.synth = window.speechSynthesis; this.isSpeaking = false; this.currentUtterance = null; this.activeButton = null;
// Auto-run the injector
this.injectButtons();
}
injectButtons() {
const targets = document.querySelectorAll('.readable');
targets.forEach((target, index) => {
// 1. Create the button
const btn = document.createElement('button');
btn.className = 'tts-btn';
btn.innerHTML = '🔊'; // Icon only to save space
btn.title = "Listen to this text";
// 2. Insert button into the container
// We prepend it so it appears at the start (top-right or top-left)
// You must set the parent to position: relative in CSS
target.prepend(btn);
// 3. Direct binding (No IDs needed!)
btn.addEventListener('click', (e) => {
// Prevent bubbling so clicking the button doesn't trigger other clicks
e.stopPropagation();
this.handleButtonClick(btn, target);
});
});
}
handleButtonClick(btn, targetEl) {
if (this.activeButton === btn && this.isSpeaking) {
this.stop();
return;
}
if (this.isSpeaking) this.stop();
this.start(btn, targetEl);
}
start(btn, targetEl) {
this.activeButton = btn;
this.isSpeaking = true;
this.updateButtonUI(btn, 'stop');
const text = targetEl.innerText;
// Simple chunking to prevent browser timeout (essential for long text)
// Splits by periods, exclamation, or question marks
const sentences = text.match(/[^.!?]+[.!?]+/g) || [text];
let i = 0;
const speakNext = () => {
if (i >= sentences.length || !this.isSpeaking) {
this.stop();
return;
}
const utterance = new SpeechSynthesisUtterance(sentences[i]);
utterance.onend = () => {
i++;
speakNext();
};
this.synth.speak(utterance);
};
speakNext();
}
stop() {
this.isSpeaking = false;
this.synth.cancel();
if (this.activeButton) {
this.updateButtonUI(this.activeButton, 'play');
this.activeButton = null;
}
}
updateButtonUI(btn, state) {
if (state === 'stop') {
btn.innerHTML = '⏹️'; // Stop icon
btn.classList.add('playing');
} else {
btn.innerHTML = '🔊'; // Speaker icon
btn.classList.remove('playing');
}
} }
// Initialize document.addEventListener(‘DOMContentLoaded’, () => new TTSManager());
Since we are injecting the button inside your text container, we need to make sure it floats nicely in the corner without breaking your text flow.
CSS /* 1. The container must be relative so the button positions correctly / .readable { position: relative; padding-right: 40px; / Make space for button so text doesn’t overlap */ }
/* 2. The Injected Button / .tts-btn { position: absolute; top: 0; right: 0; / Pins it to top-right corner */
background: transparent;
border: 1px solid #ddd;
border-radius: 50%;
width: 32px;
height: 32px;
cursor: pointer;
font-size: 16px;
display: flex;
align-items: center;
justify-content: center;
transition: all 0.2s;
opacity: 0.5; /* Subtle by default */ }
/* Make it pop when hovering over the text block */ .readable:hover .tts-btn { opacity: 1; background: #fff; box-shadow: 0 2px 5px rgba(0,0,0,0.1); }
.tts-btn:hover { background: #e3f2fd; transform: scale(1.1); }
.tts-btn.playing { background: #ffebee; border-color: red; opacity: 1; animation: pulse 1s infinite; }
@keyframes pulse { 0% { transform: scale(1); } 50% { transform: scale(1.1); } 100% { transform: scale(1); } }
Why this is better
No IDs: The script uses the DOM node reference directly (targetEl). You never need to invent names like id=”text-section-42”.
Scalable: You can have 100 different text blocks on one page. Just add the class, and the script handles the logic for all of them.
Clean UI: The CSS position: absolute ensures the button sits neatly in the corner of your paragraph or div, appearing unobtrusively.
Would you like to know how to add a “Global Floating Player” (like a sticky footer) instead of little buttons on every paragraph?
User: You said
Yes and explain what that is as well.
Gemini: A Global Floating Player is a persistent control bar—usually fixed to the bottom of the screen (like on Spotify or a podcast app)—that stays visible while the user scrolls.
Instead of hunting for the specific paragraph button to stop or pause the audio, the user has a central “remote control” always available on their screen.
How it works
The Trigger: We still inject small “Play” icons next to your .readable text (using the auto-injector method).
The Action: Clicking an icon doesn’t just play sound; it activates the Global Player at the bottom of the screen.
The Control: The user uses the Global Player to Pause, Resume, or Stop, regardless of where they scroll.
Here is the complete implementation.
You need your content (marked with .readable) and a separate hidden div for the player.
HTML
A global player improves user experience significantly because users often scroll away from the content they are listening to.
We use fixed positioning in CSS to keep the bar stuck to the bottom of the viewport.
We style the player to be “sticky” at the bottom and animate its entrance.
CSS /* The Floating Bar / #tts-player { position: fixed; bottom: 0; left: 0; width: 100%; background: #222; / Dark theme */ color: white; padding: 15px 20px; box-shadow: 0 -2px 10px rgba(0,0,0,0.2); display: flex; flex-direction: column; gap: 10px; transition: transform 0.3s ease-in-out; z-index: 1000; }
/* Hide it by pushing it off-screen */ .player-hidden { transform: translateY(100%); }
.player-content { display: flex; justify-content: space-between; align-items: center; }
.player-controls button { background: #444; border: none; color: white; padding: 8px 15px; border-radius: 4px; cursor: pointer; font-size: 1.2rem; margin-left: 10px; }
.player-controls button:hover { background: #666; }
/* Visual Progress Bar */ .progress-bar { width: 100%; height: 4px; background: #444; border-radius: 2px; overflow: hidden; }
#progress-fill { height: 100%; width: 0%; background: #4caf50; /* Green */ transition: width 0.3s linear; }
/* The injected triggers in the text */ .readable { position: relative; } .tts-trigger { float: right; cursor: pointer; background: none; border: 1px solid #ccc; border-radius: 50%; width: 24px; height: 24px; font-size: 12px; margin-left: 10px; opacity: 0.6; } .tts-trigger:hover { opacity: 1; transform: scale(1.1); }
This is a more advanced version of the previous script. It links the “Trigger Buttons” to the “Global Player.”
JavaScript class GlobalTTS { constructor() { this.synth = window.speechSynthesis; this.playerEl = document.getElementById(‘tts-player’); this.statusEl = document.getElementById(‘player-status’); this.progressEl = document.getElementById(‘progress-fill’);
this.chunks = [];
this.currentChunkIndex = 0;
this.isPaused = false;
// Auto-inject triggers
this.injectTriggers();
// Bind Player Controls
document.getElementById('player-stop').addEventListener('click', () => this.stop());
document.getElementById('player-pause').addEventListener('click', () => this.togglePause());
}
injectTriggers() {
const targets = document.querySelectorAll('.readable');
targets.forEach(target => {
const btn = document.createElement('button');
btn.className = 'tts-trigger';
btn.innerHTML = '🔊';
btn.title = "Read this section";
// Insert at the start of text
target.prepend(btn);
btn.addEventListener('click', (e) => {
e.stopPropagation();
this.playSection(target.innerText);
});
});
}
playSection(text) {
// 1. Reset current state
this.stop(false); // false = don't hide player yet
// 2. Prepare Chunks (Sentence splitting)
this.chunks = text.match(/[^.!?]+[.!?]+/g) || [text];
this.currentChunkIndex = 0;
// 3. Show Player
this.playerEl.classList.remove('player-hidden');
this.isPaused = false;
document.getElementById('player-pause').innerHTML = '⏸️'; // Show Pause icon
// 4. Start
this.speakNext();
}
speakNext() {
if (this.currentChunkIndex >= this.chunks.length) {
this.stop(); // Finished
return;
}
const chunk = this.chunks[this.currentChunkIndex];
const utterance = new SpeechSynthesisUtterance(chunk);
utterance.onend = () => {
if (this.isPaused) return; // Do not proceed if paused
this.currentChunkIndex++;
this.updateProgress();
this.speakNext();
};
// Update Text Status
this.statusEl.innerText = `Reading sentence ${this.currentChunkIndex + 1} of ${this.chunks.length}`;
this.synth.speak(utterance);
}
togglePause() {
if (this.isPaused) {
// RESUME
this.isPaused = false;
document.getElementById('player-pause').innerHTML = '⏸️';
this.speakNext(); // Resume logic
} else {
// PAUSE
this.isPaused = true;
this.synth.cancel(); // Stop current audio
document.getElementById('player-pause').innerHTML = '▶️';
this.statusEl.innerText = "Paused";
}
}
stop(hide = true) {
this.synth.cancel();
this.chunks = [];
this.currentChunkIndex = 0;
this.isPaused = false;
if (hide) {
this.playerEl.classList.add('player-hidden');
}
}
updateProgress() {
const pct = ((this.currentChunkIndex) / this.chunks.length) * 100;
this.progressEl.style.width = `${pct}%`;
} }
// Init document.addEventListener(‘DOMContentLoaded’, () => new GlobalTTS());
Why this architecture is powerful:
State Management: Notice how togglePause() works. The native speechSynthesis.pause() is notoriously buggy on some browsers (it can freeze the engine).
My Fix: I “fake” a pause by canceling the audio (synth.cancel()) but remembering the index (currentChunkIndex). When you click Resume, it simply starts speaking again from that specific sentence index. It is bulletproof.
User Focus: The user can start a long article, scroll down to look at a footer or comments, and still pause the audio instantly without scrolling back up to find the paragraph.
Visual Feedback: The simple green progress bar (#progress-fill) gives users a visual cue of how much is left.