Spokes is a multimedia search engine for a unique corpus of casual conversational Polish. It supports advanced data mining and visualization through formal queries. The interface and documentation are available only in English. Developers can also access the text data and recordings through dedicated services.
Spokes is currently being developed by the PELCRA team as part of the Polish CLARIN Infrastructure.
The corpus currently contains 658,181 utterances (8,535,617 words, 1,024 hours) in 1,594 transcriptions of mostly casual conversations. The conversations were recorded in everyday communicative contexts, transcribed, anonymised and annotated with sociolinguistic metadata. Most of the transcriptions are also time-aligned with the original audio recordings.
Complete documentation is available on the PELCRA Tools website.
Spokes can be used, among other things, in linguistics and in other humanities and social science disciplines that require analysing samples of natural spoken language with data exploration and visualization tools.
The service is available here. It does not require creating an account or logging in.
Spokes also includes an experimental instance for the spoken component of the British National Corpus, without audio recordings. It can be used to explore the spoken BNC data sets released as a result of a joint project of the British Library Sound Archive and the Oxford University Phonetics Laboratory.
Spokes uses the SlopeQ 2 query syntax. The structure of queries is described in the Spokes documentation on the PELCRA Tools website.
Piotr Pęzik (2015) "Spokes – a Search and Exploration Service for Conversational Corpus Data", Selected Papers from the CLARIN 2014 Conference, October 24–25, 2014, Soesterberg, The Netherlands, pp. 99–109. Linköping Electronic Conference Proceedings. Linköping University Electronic Press, Linköpings universitet.
(C) CLARIN-PL