AI-based Auto-Transcription


Since 2024, INTERACT Premium offers an offline, AI-based auto-transcription module.

After correct installation, this module enables you to auto-transcribe any video with clear audio locally, fully GDPR compliant.

Note: An internet connection is required during the installation of INTERACT and the first run of the auto-transcription module.

Start an Automated Transcription

Open your video in INTERACT

Open an INTERACT data file.

Create a DataSet by clicking the Add Set button.

Click Insert > Reference > Multimedia to DataSet.

Or...

Right-click inside the corresponding DataSet.

Choose Insert file reference > Link current videos to current DataSet from the context menu.

Make sure you can open the linked video by double-clicking a time stamp of the DataSet; otherwise the transcription tool is not able to 'find' the video.

Configure Auto-Transcript

Open the configuration dialog with the command Text > Text analysis > Autotranscribe-Whisper.

[Screenshot: the Autotranscribe-Whisper menu command]

The configuration dialog appears:

[Screenshot: the Auto-Transcribe configuration dialog with the available transcript export formats]

With the default settings, you already receive a rather good transcription of your video. The individual settings are explained below; a short sketch after the list shows how they map onto the Whisper engine.

Model - The selected model determines both the quality of the result and the time it takes to complete the transcription. The base model is a very good compromise. For a rough index of the spoken words, even the tiny model might be sufficient; you will need to test this for yourself.

Language - Select Automatic for automated language detection. Note that not all languages can be transcribed correctly.

Device - Specifies whether the workload is handled by the CPU or the GPU. If you have a good CUDA-compatible graphics board, GPU processing is much faster.*

Number of threads - Specifies across how many CPU threads the workload is spread; the default is a sensible starting point.

Transcript format - Specifies the file format of the resulting text file. SRT and VTT are subtitle formats that can also be imported directly into INTERACT.
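
For reference, a minimal SRT file consists of numbered cues, each with a start and end time followed by the spoken text. The timing and text below are purely illustrative:

  1
  00:00:01,000 --> 00:00:03,200
  Hello and welcome to the session.

  2
  00:00:03,400 --> 00:00:05,900
  Let us begin with a short overview.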
 

Transcript type - Specifies how the Events are created: Per sentence or per Word. A per-Word transcription creates an Event for every single word, which gives you accurate timing for each word.

Output path - Specifies where the transcript file is stored. Creating this file in the same directory as the video makes it easy to find.

Add transcripts as INTERACT Events - This option ensures that INTERACT Events are created automatically. If you clear this option, you can still import the resulting SRT file into INTERACT later.
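
INTERACT handles all of this internally and does not expose these settings as code. Purely as an illustration, the sketch below shows how the dialog settings conceptually map onto the open-source Whisper Python package, which the menu name "Autotranscribe-Whisper" and the Torch footnote suggest the module is built on. The package usage, file paths, and parameter values here are assumptions for illustration only:

  # Sketch: how the dialog settings map onto the open-source Whisper package.
  # INTERACT does this internally; paths and values here are illustrative.
  import torch
  import whisper

  VIDEO = "C:/Videos/session01.mp4"   # hypothetical path to your linked video

  # Device: use the GPU only if a CUDA-capable board and drivers are present.
  device = "cuda" if torch.cuda.is_available() else "cpu"

  # Number of threads: how the CPU workload is spread (relevant on CPU only).
  torch.set_num_threads(4)

  # Model: tiny / base / small / medium / large - quality vs. speed trade-off.
  model = whisper.load_model("base", device=device)

  # Language: None triggers automatic language detection.
  # word_timestamps=True would correspond to a per-Word transcript type.
  result = model.transcribe(VIDEO, language=None, word_timestamps=False)

  # Transcript format: write the recognized segments out as a simple SRT file.
  def srt_time(seconds: float) -> str:
      ms = int(round(seconds * 1000))
      h, rem = divmod(ms, 3_600_000)
      m, rem = divmod(rem, 60_000)
      s, ms = divmod(rem, 1000)
      return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

  with open("session01.srt", "w", encoding="utf-8") as f:
      for i, seg in enumerate(result["segments"], start=1):
          f.write(f"{i}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n")
          f.write(seg["text"].strip() + "\n\n")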

Repeated Transcription Passes

The drop-down at the bottom offers the following options:

o Skip file and do not create Events - If the video has already been transcribed, nothing happens.

o Overwrite and transcribe again - Previous transcriptions are overwritten and the video is processed again.

o Use existing transcript for creating Events - Previous transcriptions are used to re-create Events in the current data file.

*) Activating the GPU does NOT work out of the box. It requires the installation of compatible CUDA drivers for your graphics card and a matching version of the corresponding Python Torch modules.
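
If you are unsure whether your Python Torch installation can see the GPU, a quick check like the following reports the relevant versions (this assumes the torch package is importable in the Python environment the module uses):

  import torch

  print("Torch version:", torch.__version__)
  print("CUDA build:   ", torch.version.cuda)          # None for a CPU-only build
  print("GPU available:", torch.cuda.is_available())
  if torch.cuda.is_available():
      print("GPU name:     ", torch.cuda.get_device_name(0))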

The model you select determines the quality of the transcription; the better the quality, the longer it takes for the transcription to complete.

[Screenshot: the Auto-Transcribe progress bar]

The length of the video and the number of spoken words are further important factors for the duration of the task.

Some rough indications of the duration of the transcription process:

o A 30-second video processed on a decent Core i7 CPU takes about 30 seconds with the base model, but about 5 minutes with the medium model.

o The same 30-second video on a correctly configured GPU takes less than 20 seconds with the medium model and about 3 minutes with the large model (if your GPU offers enough memory).

These are only rough estimates and cannot be scaled linearly to longer videos, but they indicate the differences between the models.