Paying for Automated Metadata
Requesting Whisper Automated Transcription
In Aviary it is possible to request automated speech-to-text transcription from OpenAI's Whisper, directly from Aviary resources and in bulk from the resources table (similar to the existing IBM Watson functionality). Features of the service are:
Low cost/high quality output ($0.040 per minute of audio/video recording)
Skip filler words (filler words such as “umm” are skipped by default to improve readability)
Proper noun prompts (the option to provide a prompt of up to 244 characters to improve transcription of proper nouns)
Sentence/phrase level transcription
Supports almost 100 languages. See Whisper language support for more information.
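As a rough illustration of the pricing and prompt limit listed above, the following Python sketch estimates the cost of a request and checks a proper noun prompt against the 244-character limit. The function names are hypothetical and are not part of Aviary. The sketch also assumes that cost is strictly proportional to duration and rounded to the nearest cent, which may not match how Aviary actually bills.

```python
from decimal import Decimal, ROUND_HALF_UP

# Values taken from the feature list above.
WHISPER_RATE_PER_MINUTE = Decimal("0.040")  # USD per minute of audio/video
MAX_PROMPT_CHARS = 244


def estimate_cost(duration_seconds: int) -> Decimal:
    """Estimate the transcription cost in USD.

    Assumption: billing is proportional to duration and rounded to the
    nearest cent. Aviary's real billing rules may differ.
    """
    minutes = Decimal(duration_seconds) / Decimal(60)
    cost = minutes * WHISPER_RATE_PER_MINUTE
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def prompt_fits(prompt: str) -> bool:
    """Return True if the proper noun prompt is within the character limit."""
    return len(prompt) <= MAX_PROMPT_CHARS


# Example: a one-hour recording and a short list of proper nouns.
one_hour_cost = estimate_cost(60 * 60)
prompt_ok = prompt_fits("Aviary, AVP, Whisper")
```

Under these assumptions, a one-hour recording works out to 60 × $0.040 = $2.40.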
To request Whisper Automated Transcription, follow these steps:
Don’t forget that all transcription requests must be approved by an Org Owner.
Requesting Whisper Automated Translation to English
Requesting IBM Watson Automated Transcription
In Aviary it is possible to request automated speech-to-text transcription from IBM Watson, directly from Aviary resources and in bulk from the resources table. Features of the service are:
Automatic speaker diarization (the option will turn on speaker identification)
Smart formatting (the option to generate special formatting for dates, times, series of digits and numbers, phone numbers, currency values, internet email, and web addresses)
Profanity filter (the option will censor profanity from the results)
Speaker hesitation markers (the option to add or remove speaker "%HESITATION" markers from the results)
Word level transcription
Supports several languages. See IBM language support for more information.
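If a transcript was generated with hesitation markers left in and you later want a cleaner reading copy, the markers can be removed from exported text after the fact. The following Python sketch shows one way to do this. The function name is hypothetical, and it assumes the marker appears literally as "%HESITATION" in the exported text, as described in the feature list above.

```python
import re

# Marker text as it appears in the feature list above.
HESITATION_MARKER = "%HESITATION"


def strip_hesitations(text: str) -> str:
    """Remove hesitation markers and collapse the leftover whitespace."""
    without_markers = text.replace(HESITATION_MARKER, " ")
    return re.sub(r"\s+", " ", without_markers).strip()


cleaned = strip_hesitations("so %HESITATION we moved to %HESITATION Ohio")
```

This only handles the marker itself. Punctuation left dangling next to a removed marker would still need manual review.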
Automatically Align a Legacy Transcript with Gentle
In Aviary it is possible to align existing transcripts to the timecode of the associated media file. To do this, we have implemented Gentle (http://lowerquality.com/gentle), an application that takes text transcripts without time codes and aligns them with audiovisual media files. Gentle is designed specifically for English speech and text, but is very lenient. This means it can align transcripts to lower quality audio recordings, and it can also handle highly edited legacy transcripts that include words not spoken in the recording or omit words that were spoken. Gentle simply aligns all the words it can recognize and arranges the transcript words it cannot recognize in between the recognized sections. Gentle is also a good choice for legacy transcripts formatted in longer paragraphs. The align existing transcript service automatically produces a new transcript with the same paragraph formatting as the original and also creates sentence-by-sentence captions to display in the media player for accessibility.
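To make the idea of placing unrecognized words "in between the recognized sections" more concrete, here is a small Python sketch. It is not Gentle's actual algorithm. It assumes, purely for illustration, that unrecognized words are spaced evenly between the nearest recognized words on either side. The function name and data layout are hypothetical.

```python
from typing import List, Optional, Tuple

# (word, start time in seconds), where None means the word was not recognized.
Word = Tuple[str, Optional[float]]


def fill_unaligned(words: List[Word], media_duration: float) -> List[Tuple[str, float]]:
    """Assign evenly spaced times to runs of unrecognized words.

    Illustrative assumption: each run is spread linearly between the
    recognized word before it (or 0.0) and the one after it (or the end
    of the media file).
    """
    result: List[Tuple[str, float]] = []
    n = len(words)
    i = 0
    while i < n:
        text, start = words[i]
        if start is not None:
            result.append((text, start))
            i += 1
            continue
        # Find the end of this run of unrecognized words: indexes [i, j).
        j = i
        while j < n and words[j][1] is None:
            j += 1
        left = words[i - 1][1] if i > 0 else 0.0
        right = words[j][1] if j < n else media_duration
        step = (right - left) / (j - i + 1)
        for k in range(i, j):
            result.append((words[k][0], left + step * (k - i + 1)))
        i = j
    return result


aligned = fill_unaligned(
    [("the", 1.0), ("quick", None), ("brown", None), ("fox", 4.0)],
    media_duration=10.0,
)
```

In this example "quick" and "brown" fall at 2.0 and 3.0 seconds, between the recognized words "the" (1.0) and "fox" (4.0).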
Features of the service are:
Aligns existing legacy transcripts that do not contain time codes. This service is particularly useful for highly edited transcripts that either omit or add many words relative to what is spoken on the recording.
Maintains the original spacing and paragraph formatting, unlike automated transcripts or captions.
Outputs both a paragraph-formatted transcript, which is better for reading, and short-snippet captions to use for accessibility purposes.
Note: this service runs only once per day. Depending on when the request is placed, it may take up to 24 hours for the service to complete.