SpokesBiz is a corpus of conversational Polish that currently comprises over 650 hours of recordings. The transcribed recordings have been diarized and manually annotated for punctuation and casing. The corpus can be searched by 23 categories, including year of recording, type of communication, number of words in a given segment, and the speaker's age and level of education (“none”, “primary”, “secondary” or “higher”).
SpokesBiz was developed at the University of Łódź within the CLARIN-BIZ project (2020-2023) in collaboration with VoiceLab.
To access SpokesBiz, please fill in the Access Request Form. Once you have completed it, an e-mail with an access link will be sent to the address you provided.
Piotr Pęzik, Sylwia Karasińska, Anna Cichosz, Łukasz Jałowiecki, Konrad Kaczyński, Małgorzata Krawentek, Karolina Walkusz, Paweł Wilk, Mariusz Kleć, Krzysztof Szklanny, Szymon Marszałkowski (2023) "SpokesBiz – an Open Corpus of Conversational Polish".
(C) CLARIN-PL