Before using the service, please read the preliminary information, which describes the steps required to access the CLARIN-PL developer interface.
Punctuator is a service that adds punctuation to text in Polish, English, and Russian. It was developed to restore punctuation in text produced by transcribing speech.
Note that Punctuator adds punctuation to a text; it does not correct existing punctuation. Correct preparation of the input data is therefore crucial for the service to work properly: the input text should not contain any punctuation.
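As an illustration of preparing such input, the following minimal Python sketch removes punctuation from a text before it is submitted. The function names `strip_punctuation` and `prepare_input` are hypothetical helpers and not part of the service. The sketch assumes that "punctuation" means every character in a Unicode punctuation category, so intra-word hyphens and apostrophes are removed as well and may need separate handling. Lowercasing is included only for completeness, since this page states that the service lowercases the text itself.

```python
import re
import unicodedata


def strip_punctuation(text: str) -> str:
    """Remove punctuation characters and collapse runs of whitespace.

    Hypothetical helper: assumes every character whose Unicode general
    category starts with "P" counts as punctuation (this also drops
    hyphens and apostrophes inside words).
    """
    cleaned = "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )
    # Removing marks can leave double spaces behind; normalise them.
    return re.sub(r"\s+", " ", cleaned).strip()


def prepare_input(text: str) -> str:
    """Strip punctuation and lowercase the text (lowercasing is optional)."""
    return strip_punctuation(text).lower()


if __name__ == "__main__":
    print(prepare_input("Dzień dobry, panie Marszałku. Jak się pan ma?"))
```

The result can be saved to a plain text file and used as input data for the service.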
The current version of the model was trained on a parliamentary corpus and a subset of the Polish Wikipedia for the task of restoring the original punctuation. It is based on the BERT architecture.
Performance parameters:
The service allows you to choose the language of the processed text:
It may be useful in research that:
The service can be run:
language:
pl - Polish (default)
en - English
ru - Russian
The service can be run in the Windows system with default values using the following LPMN query: ['punctuator']
[['punctuator']] - input data in the form of a compressed directory (.zip)
[{'punctuator':{'language':'en'}}] - input data in English
[{'punctuator':{'language':'ru'}}] - input data in Russian
The prepared text file should not contain any punctuation marks or capital letters. If the text does contain punctuation marks, Punctuator processes it as if they were not present, and all capital letters are converted to lowercase before the text is processed.
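The LPMN queries listed on this page differ only in the language option and in the extra pair of brackets used for a compressed directory. The Python sketch below assembles these query strings. `build_lpmn_query` is a hypothetical helper, not part of any CLARIN-PL client. Combining the .zip form with a non-default language is not shown on this page and is an assumption made here by analogy.

```python
SUPPORTED_LANGUAGES = ("pl", "en", "ru")


def build_lpmn_query(language: str = "pl", zipped: bool = False) -> str:
    """Return an LPMN query string for the punctuator task.

    Hypothetical helper that reproduces the query forms listed on this page.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    if language == "pl":
        # Polish is the default, so the bare task name is enough.
        step = "'punctuator'"
    else:
        step = f"{{'punctuator':{{'language':'{language}'}}}}"

    query = f"[{step}]"
    # A compressed directory (.zip) uses an additional pair of brackets.
    # Assumption: the same wrapping also applies to non-default languages.
    return f"[{query}]" if zipped else query


if __name__ == "__main__":
    print(build_lpmn_query())             # default values
    print(build_lpmn_query(zipped=True))  # .zip input
    print(build_lpmn_query("en"))         # English input
```

Building the string in one place keeps the quoting consistent and rejects language codes that the service does not list.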
A text file with added punctuation marks and capital letters.
In Colab: Adding punctuation to the text
(C) CLARIN-PL