Fal Speech-to-Text — Free AI

Fal Speech-to-Text requires purchased tokens.

Model Details

Hosted on: Free.ai

Category: Misc

Cost: ~500 tokens/msg

About

Fal Speech-to-Text is a Misc-category model offered by Free.ai. Each generation costs approximately 54,000 tokens. Compare it against our self-hosted models, which run free within your daily limit.

Use via API

curl https://api.free.ai/v1/chat/ \
  -H "Authorization: Bearer YOUR_KEY" \
  -d '{"model":"premium/speech-to-text"}'
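The same call can be sketched in Python with only the standard library. The URL, the Authorization header, and the model name come from the curl command above; the Content-Type header and the helper name build_request are assumptions added for illustration, and the request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://api.free.ai/v1/chat/"  # endpoint from the curl example


def build_request(api_key, model="premium/speech-to-text"):
    """Build (but do not send) a POST request mirroring the curl command."""
    body = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the API expects a JSON content type for a JSON body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request("YOUR_KEY")
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the headers and body before spending tokens.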

FAQ

Free.ai offers Whisper-powered speech to text with support for 99 languages, subtitle export, speaker detection, and live microphone capture, at no cost.

Upload an audio or video file (MP3, WAV, MP4, M4A), click Transcribe, and get accurate speech to text in seconds. Or record live from your microphone.

YouTube videos can be transcribed directly: paste the URL in the URL tab, and the speech to text tool extracts the audio and transcribes it. This also works with Instagram, TikTok, Spotify, and 1,300+ other platforms.

Multiple languages are supported: let the tool auto-detect the language or select one of 99. Our speech to text handles accents, background noise, and mixed-language audio well.

Batch transcription is supported. Select multiple audio files at once; each is sent through speech to text with progress tracking, and the results can be downloaded separately or combined.

An API is available. The speech to text API at /api/ is OpenAI-compatible: upload audio programmatically and receive JSON with the transcript, language, and timestamps.

Speakers can be identified. Toggle Speaker Detection before uploading and the speech to text output is labelled per speaker (Speaker 1, Speaker 2…). This adds 50% to the token cost.
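As a worked example of the 50% surcharge, here is a minimal sketch. The function name estimate_tokens is hypothetical, and rounding the surcharge up to a whole token is an assumption; the page does not say how fractions are handled.

```python
import math


def estimate_tokens(base_tokens, speaker_detection=False):
    """Estimate token cost, adding 50% when Speaker Detection is on."""
    if not speaker_detection:
        return base_tokens
    # Assumption: a fractional surcharge is rounded up to a whole token.
    return base_tokens + math.ceil(base_tokens / 2)


print(estimate_tokens(500))                          # 500
print(estimate_tokens(500, speaker_detection=True))  # 750
```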

Speech to text accepts files up to 500MB per upload. For multi-hour content, split the audio into chunks first.
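One way to plan the split is by time rather than by bytes, so that each chunk stays under the upload limit. The sketch below assumes a roughly constant bitrate, reads "500MB" as 500,000,000 bytes, and keeps a 10% safety margin; plan_chunks and its parameters are hypothetical names, and the actual cutting would be done with a separate audio tool.

```python
import math

# Assumption: "500MB" is read as decimal megabytes.
MAX_UPLOAD_BYTES = 500 * 1000 * 1000


def plan_chunks(file_size_bytes, duration_seconds,
                max_bytes=MAX_UPLOAD_BYTES, safety=0.9):
    """Return (start, end) times in seconds for equal-length chunks.

    Assumes a roughly constant bitrate, so size scales with duration.
    """
    if file_size_bytes <= 0 or duration_seconds <= 0:
        raise ValueError("file size and duration must be positive")
    budget = max_bytes * safety  # leave headroom below the limit
    n_chunks = max(1, math.ceil(file_size_bytes / budget))
    chunk_len = duration_seconds / n_chunks
    return [
        (i * chunk_len, min((i + 1) * chunk_len, duration_seconds))
        for i in range(n_chunks)
    ]


# A 1.2 GB, two-hour recording is planned as three 40-minute chunks.
print(plan_chunks(1_200_000_000, 7200))
```

Splitting on time boundaries keeps each chunk a valid audio file, which cutting at arbitrary byte offsets would not.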

Accuracy is high for clear audio: typically 95%+ word accuracy in English with our Whisper large-v3 backend. Quality depends on audio clarity, accent, and background noise.

The transcript is fully editable in place: fix errors, reformat, and copy or download it as TXT, SRT, or VTT.
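To illustrate what the SRT export contains, the sketch below turns timestamped segments into SRT text. The segment shape (a list of dicts with start, end, and text, times in seconds) is an assumption modelled on OpenAI-style verbose transcription output, not a documented Free.ai response; the function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Convert segments (assumed keys: start, end, text) to SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)


sample = [
    {"start": 0.0, "end": 2.5, "text": " Hello "},
    {"start": 2.5, "end": 4.0, "text": "World"},
]
print(segments_to_srt(sample))
```

VTT output differs mainly in its WEBVTT header line and in using a period instead of a comma before the milliseconds.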

Audio is kept private: it is processed on our own GPUs and deleted after speech to text completes. Nothing is stored long-term, shared, or used for training.

Transcription also works in chat: upload an audio or video file in /chat/ and ask the AI to transcribe it, then combine speech to text with follow-up questions and summarization in one workflow.
