Moonshine Base
Free.ai (self-hosted) · stt · ~500 tokens per minute
Moonshine Base is a speech-to-text model built by Useful Sensors. It is strongest at low-latency live transcription and on embedded devices. It is self-hosted on Free.ai GPUs and runs free against your daily token pool (about 500 tokens per minute). Released under the MIT license; commercial use is permitted on Free.ai.
Use via API
OpenAI-compatible REST API. Generate a key and call this model in seconds.
curl -X POST https://api.free.ai/v1/stt/ \
-H "Authorization: Bearer sk-free-..." \
-H "Content-Type: application/json" \
-d '{"model":"moonshine-base","audio_url":"https://..."}'
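For callers who prefer Python, the curl call above might be mirrored roughly as follows. This is a minimal sketch using only the standard library: the endpoint, headers, and JSON fields are copied from the curl example, while the function names `build_stt_request` and `transcribe` are hypothetical and the response is assumed to be a JSON object.

```python
import json
import urllib.request

API_URL = "https://api.free.ai/v1/stt/"  # endpoint taken from the curl example


def build_stt_request(api_key: str, audio_url: str,
                      model: str = "moonshine-base") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = json.dumps({"model": model, "audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def transcribe(api_key: str, audio_url: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the body (a JSON object is assumed)."""
    request = build_stt_request(api_key, audio_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the headers and payload before any network call is made.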
Frequently Asked Questions
Moonshine Base transcribes spoken audio into text. Upload an MP3, WAV, M4A, or video file and Moonshine Base returns the full transcript plus optional SRT/VTT subtitles with timestamps.
Moonshine Base handles dozens of languages. For comparison, Whisper-family models cover 90+, Parakeet covers ~25, and others vary. Pick "auto-detect" or specify the language for highest accuracy.
Word-error rate is 5–10% on clean English audio and 10–20% on noisy or accented audio. Larger variants of the same architecture do meaningfully better on hard cases, so pick a larger model when the audio is rough.
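To make the word-error-rate figures concrete, here is a small sketch of the standard definition: substitutions plus deletions plus insertions, divided by the number of words in the reference transcript. It illustrates the metric only and is not Free.ai's evaluation code. The function name is hypothetical, and lowercasing plus whitespace splitting is an assumed (simplistic) normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the reference words processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

One wrong word in a ten-word reference gives a rate of 0.1, which sits at the upper end of the clean-audio range quoted above.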
Timestamps are included: every segment has start and end times. Export as SRT or VTT and the times map straight onto your video.
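If you want to build subtitles yourself from the per-segment timestamps, the conversion to SRT could look like the sketch below. The segment shape (a dict with `start` and `end` in seconds plus `text`) is an assumption for illustration, not the documented response schema, and both function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        text = segment["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)
```

WebVTT cues use the same layout with a period instead of a comma before the milliseconds and a `WEBVTT` header line at the top of the file.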
Moonshine Base runs on our own GPUs and draws on your daily free pool first; after that, $5 buys 200,000 paid tokens. Usage is about 500 tokens per minute.
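The figures above lend themselves to a quick back-of-the-envelope estimate. The sketch below assumes the rate means roughly 500 tokens per minute of audio and that paid tokens cost a pro-rata share of the $5 pack; both are assumptions, as is the `free_tokens_left` parameter, since the size of the daily free pool is not stated here.

```python
TOKENS_PER_AUDIO_MINUTE = 500    # approximate rate quoted on this page
PAID_TOKENS_PER_PACK = 200_000   # tokens bought with one pack
PACK_PRICE_USD = 5.00            # price of one pack


def estimate_usage(audio_minutes: float, free_tokens_left: int = 0) -> dict:
    """Estimate token use and the pro-rata paid cost for a recording."""
    tokens_needed = round(audio_minutes * TOKENS_PER_AUDIO_MINUTE)
    # The free pool is consumed first; only the remainder is paid.
    paid_tokens = max(0, tokens_needed - free_tokens_left)
    cost = paid_tokens * PACK_PRICE_USD / PAID_TOKENS_PER_PACK
    return {
        "tokens_needed": tokens_needed,
        "paid_tokens": paid_tokens,
        "approx_cost_usd": round(cost, 4),
    }
```

Under these assumptions a 60-minute recording needs about 30,000 tokens, and one $5 pack covers roughly 400 minutes of audio.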
Supported formats: MP3, WAV, M4A, FLAC, OGG, plus video (MP4, MOV, WebM), from which we extract the audio. Max 500 MB per upload. For longer files, split with /audio/cut/ or use /v1/stt/batch/.
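A client can catch most rejected uploads before sending anything by checking the format and size limits listed above. The sketch below is one way to do that. The helper name `check_upload` is hypothetical, and treating "500 MB" as 500 × 1024 × 1024 bytes is an assumption; the service may count megabytes differently.

```python
from pathlib import Path

# Extensions listed on this page (audio plus video containers).
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg",
                        ".mp4", ".mov", ".webm"}
MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # assumes "500 MB" means 500 MiB


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks OK."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds 500 MB; split it or use the batch endpoint")
    return problems
```

In practice `size_bytes` would come from `Path(path).stat().st_size` on the local file.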
Speaker diarization is a separate pass — toggle "diarize" on /transcribe/. Moonshine Base handles the transcription; diarization labels each segment with Speaker 1 / Speaker 2 / etc.
Batch upload is supported: /batch/ accepts a folder of audio files. Each transcript lands in /account/?tab=history under the original filename. To preserve the folder tree, use the API.
API access is available: POST your audio to /v1/stt/transcribe/ with model set to "moonshine-base". The response is JSON with the text, segments, and word-level timestamps. /api/ has the full reference.
Self-hosted models keep audio on our GPUs; premium models pass it through under a data processing agreement (DPA). Audio is deleted after the share window (24h for anonymous users, 7d for signed-in users). We do not train on your inputs.
Commercial use is allowed: Free.ai grants commercial use of transcripts. You need rights to the audio you upload (your own recording, licensed material, or content used with consent).
Real-time factor is roughly 0.05–0.2×, so a 60-minute podcast transcribes in 3–12 minutes. Premium models often finish faster. Use the queue button if you want to close the tab while the job runs.
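The real-time factor turns directly into a wait-time estimate: processing time is the audio duration multiplied by the factor. A minimal sketch, using the range quoted above as default values (the function name is hypothetical, and actual times will vary with load):

```python
def estimated_processing_minutes(audio_minutes: float,
                                 rtf_low: float = 0.05,
                                 rtf_high: float = 0.2) -> tuple[float, float]:
    """Return the (best case, worst case) processing time in minutes."""
    return (audio_minutes * rtf_low, audio_minutes * rtf_high)
```

For a 60-minute recording this gives the 3 to 12 minute window mentioned above; a 10-minute voice memo would land between about half a minute and 2 minutes.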