Before using the service, please read the preliminary information, which describes the steps required to access the CLARIN-PL developer interface.
TermoPL is a service for extracting terms from a set of Polish texts. It searches for both single words and multi-word expressions, using a grammatical agreement mechanism, and detects words and phrases that are characteristic of a given domain. The service is based on the TermoPL tool developed at IPI PAN and is deployed in the CLARIN-PL infrastructure.
Terminology extraction can be useful in research involving:
TermoPL can be used in various fields, including journalism, medicine, law and e-commerce.
The service can be run with the following options:
mw: true - only multi-word terms will be returned
sw - the path in the service system to the file containing the list of words that are not taken into account (stop list)
cp - the path in the service system to the file containing the list of compound preposition patterns that are not taken into account
Note: All options are described in the TermoPL documentation.
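The option list above can be illustrated with a small Python sketch that assembles the TermoPL step of an LPMN query. The helper `termopl_task` is hypothetical, and both the `{"termopl": {...}}` shape for passing options and the rendering of the query with `json.dumps` are assumptions about LPMN syntax rather than documented behaviour (the queries in this section are written as single-quoted lists).

```python
import json


def termopl_task(mw=None, sw=None, cp=None):
    """Build the TermoPL step of an LPMN query (hypothetical helper).

    mw: True to return only multi-word terms.
    sw: path (on the service side) to a stop list file.
    cp: path (on the service side) to a compound preposition pattern file.
    Returns the bare tool name when no option is set, i.e. default values.
    """
    options = {}
    if mw is not None:
        if not isinstance(mw, bool):
            raise TypeError("mw must be True or False")
        options["mw"] = mw
    for name, path in (("sw", sw), ("cp", cp)):
        if path is not None:
            if not isinstance(path, str) or not path:
                raise ValueError(f"{name} must be a non-empty path string")
            options[name] = path
    # Assumed LPMN shape for a tool with options: {"tool": {option: value}}
    return {"termopl": options} if options else "termopl"


if __name__ == "__main__":
    query = [["any2txt", "postagger"], termopl_task(mw=True)]
    # Assumes the LPMN parser accepts the JSON rendering of the query.
    print(json.dumps(query))
```

Running it prints `[["any2txt", "postagger"], {"termopl": {"mw": true}}]`. Returning the bare string `"termopl"` when no option is given keeps the default query unchanged.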
The service can be run in the Windows system with default values using the following LPMN query:
[['any2txt','postagger'],'termopl']
- input data in Polish in JSONL format.
['unzip','termopl']
- the auxiliary unzip tool ensures that the .zip archive is handled correctly
Input data: a .zip archive containing text files.
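Input for the ['unzip','termopl'] query has to be a .zip archive of text files. A minimal Python sketch for preparing such an archive with the standard `zipfile` module follows; the helper name `pack_texts`, the flat layout without subdirectories and the `*.txt` file pattern are assumptions, not requirements stated by the service.

```python
import tempfile
import zipfile
from pathlib import Path

# LPMN query from this section for .zip input.
ZIP_QUERY = ["unzip", "termopl"]


def pack_texts(src_dir, archive_path):
    """Pack every *.txt file in src_dir into a flat .zip archive.

    Hypothetical helper: returns the archived file names in sorted order.
    """
    paths = sorted(Path(src_dir).glob("*.txt"))
    if not paths:
        raise ValueError(f"no .txt files found in {src_dir}")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # Store each file under its bare name, without directory prefixes.
            zf.write(path, arcname=path.name)
    return [path.name for path in paths]


if __name__ == "__main__":
    # Demo with a throwaway directory holding two short Polish texts.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "doc1.txt").write_text("Analiza tekstu.", encoding="utf-8")
        Path(tmp, "doc2.txt").write_text("Ekstrakcja terminów.", encoding="utf-8")
        print(pack_texts(tmp, Path(tmp, "corpus.zip")), ZIP_QUERY)
```

The file list is collected before the archive is opened, so an empty source directory raises an error instead of producing an empty .zip.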
Output data: a directory containing the following files:
terms.csv - the default file format,
termsshort.csv - the terms.csv file limited to 1000 records,
terms.xlsx - the terms.csv file converted to Excel format, with the encoding and column descriptions set. It is recommended to use this file.
lemmasDictionary.json - a file containing lemmas, including lemmas of multi-word expressions, formatted as a start list for Fextor3.

In Colab: TermoPL - Term extraction from a corpus of texts
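termsshort.csv is described as terms.csv limited to 1000 records. The Python sketch below reproduces that truncation locally with the standard `csv` module, for example to produce a shorter file with a different limit. The helper names `shorten_terms` and `write_terms`, the presence of a header row and the comma delimiter are assumptions; check them against an actual terms.csv before relying on the result.

```python
import csv
import io
import itertools


def shorten_terms(lines, limit=1000, delimiter=",", has_header=True):
    """Return (header, records) with at most `limit` records.

    Hypothetical helper mirroring the termsshort.csv description. `lines`
    is any iterable of CSV lines, e.g. an open file or io.StringIO.
    Whether terms.csv has a header row and which delimiter it uses are
    assumptions to verify against real output.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None) if has_header else None
    # islice stops after `limit` rows without reading the rest of the file.
    records = list(itertools.islice(reader, limit))
    return header, records


def write_terms(out, header, records, delimiter=","):
    """Write the header (if any) and the records back out as CSV."""
    writer = csv.writer(out, delimiter=delimiter)
    if header is not None:
        writer.writerow(header)
    writer.writerows(records)


if __name__ == "__main__":
    # Made-up sample data; the real column names of terms.csv may differ.
    sample = io.StringIO("term,value\nkorpus tekstów,2\ntermin,1\n")
    header, rows = shorten_terms(sample, limit=1)
    out = io.StringIO()
    write_terms(out, header, rows)
    print(out.getvalue())
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, so that line endings are not doubled on Windows.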
(C) CLARIN-PL