Azure Media Indexer

Speech to text is a powerful media processor that enhances the accessibility and discoverability of your content!

Speech to text takes your multimedia content and uses advanced natural language processing algorithms to generate spoken word transcripts for captions, index files for search engines, and keyword files for tagging!

Speech to text currently supports the following languages: English, Spanish, French, Italian, German, Portuguese, Chinese (Mandarin, Simplified), Arabic, Russian (new), and Japanese (new).

This demo shows the captioning scenario, letting you view and edit the transcript outputs of a speech to text job to create perfectly accurate caption tracks.

Once you've made any necessary edits, you can export the captions to standard WebVTT or TTML formats for your own use.
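
To give a rough idea of what a WebVTT export contains, here is a minimal Python sketch that turns edited transcript segments into a WebVTT caption track. It is an illustration only, not the demo's actual export code: the `(start, end, text)` tuple layout and the function names `format_timestamp` and `to_webvtt` are assumptions made for this example.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments) -> str:
    """Build a WebVTT document from (start, end, text) tuples.

    The tuple layout is hypothetical; adapt it to however your
    edited transcript is actually stored.
    """
    lines = ["WEBVTT", ""]  # required header followed by a blank line
    for start, end, text in segments:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text.strip())
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical edited transcript segments, times in seconds.
    print(to_webvtt([(0.0, 2.5, "Welcome to the demo."),
                     (2.5, 5.0, "Captions are easy to edit.")]))
```

Each cue is a timing line followed by the caption text, so edits to the transcript map directly onto the text lines of the exported file.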

Try selecting one of the sample languages below, and see how easily you can generate 100% accurate transcripts with the outputs of Speech to text!
