Before using the service, please read the preliminary information containing a description of steps that enable access to the CLARIN-PL developer interface.
The service detects emotions and sentiment in Polish text. The analysis can be performed on the entire text, on selected paragraphs, or on individual sentences. The service can process raw text, single text files, and corpora.
It takes into account the eight emotion categories from Plutchik's model (joy, trust, anticipation, surprise, fear, sadness, disgust, and anger) as well as three sentiment categories: positive, negative, and neutral. The service returns a confidence score for each category, ranging from 0 to 1. The closer a score is to 1, the more confident the model is that the given emotion or sentiment is expressed in the analysed text fragment.
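As a rough illustration of how such scores might be read, the sketch below keeps only the categories whose score reaches a cut-off. The function name `expressed_categories`, the default threshold of 0.5, and the example scores are assumptions made for this example; the service itself does not define a threshold.

```python
def expressed_categories(scores, threshold=0.5):
    """Return categories whose confidence score reaches the threshold,
    ordered from the highest score to the lowest.

    `threshold` is an illustrative cut-off, not a value defined by the service.
    """
    for name, value in scores.items():
        # Scores are documented to lie between 0 and 1.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name!r} is outside [0, 1]: {value}")
    return sorted(
        (name for name, value in scores.items() if value >= threshold),
        key=lambda name: scores[name],
        reverse=True,
    )


# Made-up scores for one text fragment (not real service output).
example_scores = {"joy": 0.91, "trust": 0.62, "sadness": 0.03, "positive": 0.88}
print(expressed_categories(example_scores))  # ['joy', 'positive', 'trust']
```

Because several emotions can score high at once, a threshold keeps all of them instead of forcing a single label; a suitable cut-off would have to be chosen for the application at hand.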
The scoring model is based on the Transformer architecture (XLM-RoBERTa-Large) and was trained on the CLARIN-Emo corpus. The quality of the model (F1-micro score across all categories) is 0.84. The exact quality results are presented in Table 3 of the article describing CLARIN-Emo.
The service can be run with the following parameter:
mode: selection of the analysis level
text - if the document is no longer than ~200 words, the entire document is processed. If it is longer, it is split semantically into chunks that are as long as possible. This is the default option.
paragraph - paragraphs are processed
sentence - sentences are processed
The service can be run in the Windows system with default values using the following LPMN query: ['emotagger'].
['any2txt', 'emotagger'] - input data in the form of a text file (.txt)
['any2txt', {'emotagger': {'mode': 'sentence'}}] - sentence-level segmentation of the text
Input data: a text, a .zip file containing a text file, or a corpus of texts.
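The LPMN queries shown above are plain nested lists and dictionaries, so they can be assembled programmatically. The helper below is a minimal sketch: `build_lpmn` and its `from_file` flag are hypothetical names introduced for this example, and how the resulting query is submitted to the service is not covered here.

```python
VALID_MODES = {"text", "paragraph", "sentence"}


def build_lpmn(mode=None, from_file=False):
    """Build an LPMN query for emotagger as a Python list.

    mode      -- one of the documented analysis levels, or None to rely
                 on the service default
    from_file -- prepend the 'any2txt' step, as in the file-based examples
    """
    if mode is not None and mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Without options the step is just its name; with options it is a
    # single-key dictionary mapping the name to its parameters.
    step = "emotagger" if mode is None else {"emotagger": {"mode": mode}}
    return (["any2txt"] if from_file else []) + [step]


print(build_lpmn())                            # ['emotagger']
print(build_lpmn(from_file=True))              # ['any2txt', 'emotagger']
print(build_lpmn("sentence", from_file=True))  # ['any2txt', {'emotagger': {'mode': 'sentence'}}]
```

Validating `mode` before building the query catches typos locally rather than after a request has been sent.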
Output data:
text - a segment of the input text after segmentation
joy
trust
anticipation
surprise
fear
sadness
disgust
anger
positive
negative
neutral
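To show how the fields listed above might be used together, the sketch below splits one result record into its text, its strongest emotion, and its strongest sentiment. It assumes the record is available as a flat mapping keyed by the field names above; the actual response format may differ, and the record shown is made up for illustration.

```python
EMOTIONS = ("joy", "trust", "anticipation", "surprise",
            "fear", "sadness", "disgust", "anger")
SENTIMENTS = ("positive", "negative", "neutral")


def summarize(record):
    """Return the segment text with its top-scoring emotion and sentiment."""
    emotions = {name: record[name] for name in EMOTIONS}
    sentiments = {name: record[name] for name in SENTIMENTS}
    return {
        "text": record["text"],
        "top_emotion": max(emotions, key=emotions.get),
        "top_sentiment": max(sentiments, key=sentiments.get),
    }


# Made-up record for one segment (not real service output).
record = {
    "text": "To był wspaniały dzień.",
    "joy": 0.93, "trust": 0.41, "anticipation": 0.12, "surprise": 0.08,
    "fear": 0.01, "sadness": 0.02, "disgust": 0.01, "anger": 0.01,
    "positive": 0.95, "negative": 0.02, "neutral": 0.06,
}
print(summarize(record))
```

Emotions and sentiment are ranked separately because they are distinct groups of categories: a segment has a dominant emotion and, independently, a dominant sentiment.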
When files or corpora are processed, the analysis results are saved to an Excel file containing the following columns:
filename
chunk id - ID of the text fragment
text - a segment of the input text after segmentation
In Colab: Emotagger - wyznaczanie emocji i wydźwięku w tekście (Emotagger: determining emotions and sentiment in text).
A script with an example query to a locally running service: https://gitlab.clarin-pl.eu/nlpworkers/emotagger/-/blob/master/test.py?ref_type=heads.
Bartłomiej Koptyra, Anh Ngo, Łukasz Radliński, Jan Kocoń (2023) "CLARIN-Emo: Training Emotion Recognition Models Using Human Annotation and ChatGPT", Computational Science – ICCS 2023. Lecture Notes in Computer Science, vol 14073, Springer, Cham, 365-379.
(C) CLARIN-PL