Before using the service, please read the preliminary information, which describes the steps required to access the CLARIN-PL developer interface.
Whisper is an ASR (Automatic Speech Recognition) service that transcribes the content of audio files. It is based on the external Whisper model from OpenAI and supports speech recognition in Polish and English.
Select the appropriate language option for the input file:
lang - available language options defining the selected model:
- diabiz - model diabiz for the Polish language
- en - model en for the English language
- es/fr/... - model multilingual
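The option-to-model mapping listed above can be sketched as a small lookup. This is only an illustration: the function name `model_for_lang` is hypothetical, and it assumes that any value other than `diabiz` or `en` falls under the multilingual model, as the "es/fr/..." entry suggests.

```python
def model_for_lang(lang: str) -> str:
    """Return the model name implied by a `lang` value, following the list above."""
    # Values with a dedicated model, as listed in the documentation.
    dedicated = {"diabiz": "diabiz", "en": "en"}
    # Assumption: every other language code is served by the multilingual model.
    return dedicated.get(lang, "multilingual")


if __name__ == "__main__":
    for value in ("diabiz", "en", "es", "fr"):
        print(value, "->", model_for_lang(value))
```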
The service can be run on Windows with default values using the following LPMN query:
['whisper']
To set the lang option explicitly (here, en):
[{'whisper': {'lang': 'en'}}]
- input data in the form of a compressed directory (.zip)
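The two query forms shown above can also be built programmatically. The sketch below is not part of any CLARIN-PL client library: `build_whisper_lpmn` and `to_lpmn_string` are hypothetical helper names, and the code only reproduces the query text as it appears in this document.

```python
from typing import Optional


def build_whisper_lpmn(lang: Optional[str] = None) -> list:
    """Build the LPMN query as a Python structure.

    With no `lang`, the default-values form is returned; otherwise the
    `lang` option is set explicitly.
    """
    if lang is None:
        return ["whisper"]
    return [{"whisper": {"lang": lang}}]


def to_lpmn_string(query: list) -> str:
    """Render the query in the single-quoted form used in this document."""
    # repr() of lists, dicts and strings uses single quotes, matching the examples.
    return repr(query)


if __name__ == "__main__":
    print(to_lpmn_string(build_whisper_lpmn()))      # ['whisper']
    print(to_lpmn_string(build_whisper_lpmn("en")))  # [{'whisper': {'lang': 'en'}}]
```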
Input: an audio file in .wav format.
Output: a text file containing the transcription of the audio.
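When several recordings are submitted as a compressed directory (.zip), the archive can be prepared with Python's standard library. This is a minimal sketch under an assumption: it stores the .wav files flat at the top level of the archive, while the exact layout the service expects is not specified here. The function name `zip_wav_files` is hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_wav_files(source_dir: str, archive_path: str) -> List[str]:
    """Pack every .wav file found directly in `source_dir` into a .zip archive.

    Returns the file names stored in the archive, in sorted order.
    """
    source = Path(source_dir)
    wav_files = sorted(
        p for p in source.iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in wav_files:
            # Assumption: files are stored without parent directories.
            archive.write(path, arcname=path.name)
    return [p.name for p in wav_files]
```

Calling `zip_wav_files("recordings", "recordings.zip")` would, for example, collect all .wav files from a local `recordings` directory; non-audio files in that directory are skipped.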
In Colab: Whisper - Transcription of the content of audio files
Marcin Oleksy, Jan Wieczorek, Dorota Drużyłowska, Julia Klyus, Aleksandra Domogała, Krzysztof Hwaszcz, Hanna Kędzierska, Daria Mikoś, Anita Wróż (2022) "DiaBiz.Kom - towards a Polish Dialogue Act Corpus Based on ISO 24617-2 Standard", Proceedings of the 29th International Conference on Computational Linguistics, International Committee on Computational Linguistics: Gyeongju, Republic of Korea, 3631–3638.
(C) CLARIN-PL