Before using the service, please read the preliminary information, which describes the steps required to access the CLARIN-PL developer interface.
TermoPL is a service for extracting terms from a collection of Polish texts. It identifies both single words and multi-word expressions, using grammatical agreement rules to recognize multi-word phrases, and detects words and phrases that are characteristic of a given domain. The service is built on the TermoPL tool developed at IPI PAN and deployed in the CLARIN-PL infrastructure.
Terminology extraction can be useful in research involving:
TermoPL can be used in various fields, including journalism, medicine, law and e-commerce.
The service can be run:
mw : true - only multi-word terms are returned,
sw - path in the service system to a file containing a list of words to be ignored (stop list),
cp - path in the service system to a file containing a list of compound prepositions to be ignored.
Note: all options are described in the TermoPL documentation.
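The sketch below shows one way to assemble the options above into a query alongside the default query [['any2txt','postagger'],'termopl']. It assumes, without confirmation from this page, that LPMN accepts per-tool options written as a one-key dictionary {tool_name: {option: value}}; the function name build_termopl_query and all file paths are hypothetical placeholders. Check the TermoPL and LPMN documentation for the exact syntax, including how boolean values such as true must be written.

```python
# Sketch only: assumes LPMN accepts per-tool options written as
# {tool_name: {option: value}}; verify against the LPMN documentation.


def build_termopl_query(options=None):
    """Return an LPMN query (as nested Python lists) ending in termopl."""
    preprocessing = ["any2txt", "postagger"]
    if not options:
        # No options: identical to the default query on this page.
        return [preprocessing, "termopl"]
    # Assumed convention: a tool with options becomes a one-key dict.
    return [preprocessing, {"termopl": dict(options)}]


# Hypothetical paths in the service system; replace them with real ones.
query = build_termopl_query({
    "mw": True,                               # multi-word terms only
    "sw": "/hypothetical/stoplist.txt",       # stop list
    "cp": "/hypothetical/compound_preps.txt", # compound prepositions
})
print(query)
```

Calling build_termopl_query() with no arguments reproduces the default query, so the same helper covers both ways of running the service.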
The service can be run in the Windows system with default option values using the following LPMN query: [['any2txt','postagger'],'termopl'] - input: Polish text in JSONL format.
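JSONL (JSON Lines) means one JSON object per line. The following minimal sketch writes and reads such a file. The field name "text" is a hypothetical choice, since this page does not specify the expected schema. ensure_ascii=False keeps Polish diacritics readable instead of escaping them.

```python
import json
import os
import tempfile


def write_jsonl(texts, path):
    """Write each text as one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for text in texts:
            # "text" is a hypothetical field name; ensure_ascii=False keeps
            # Polish characters such as "ł" and "ż" unescaped.
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Parse a JSON Lines file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "input.jsonl")
write_jsonl(["Zapalenie płuc jest chorobą.", "Łódź"], demo_path)
records = read_jsonl(demo_path)
print(records)
```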
['unzip','termopl'] - the auxiliary tool unzip ensures that the input archive is handled correctly. Input: a .zip archive containing text files.
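A .zip archive of text files for the ['unzip','termopl'] query can be prepared with Python's standard zipfile module, as in the sketch below. The .txt extension filter and the flat layout (all files at the archive root) are assumptions; this page does not say whether subdirectories inside the archive are supported. The helper name zip_texts is hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_texts(src_dir, archive_path):
    """Pack every .txt file in src_dir into a flat .zip archive."""
    src = Path(src_dir)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for txt in sorted(src.glob("*.txt")):
            # Store files at the archive root (flat layout is an assumption).
            zf.write(txt, arcname=txt.name)
    return archive_path


# Demo with two small Polish texts in a temporary directory.
work = Path(tempfile.mkdtemp())
corpus = work / "corpus"
corpus.mkdir()
(corpus / "doc1.txt").write_text("Zapalenie płuc.", encoding="utf-8")
(corpus / "doc2.txt").write_text("Niewydolność serca.", encoding="utf-8")
archive = zip_texts(corpus, work / "corpus.zip")

with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(names)
```

The archive is written outside the corpus directory so that it never ends up packing itself.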
Output: a directory containing the following files:
terms.csv - default file format,
termsshort.csv - the terms.csv file limited to 1000 records,
terms.xlsx - the terms.csv file converted to an Excel file with the encoding and column descriptions set; this is the recommended file,
lemmasDictionary.json - a file containing lemmas, including lemmas of multi-word expressions, formatted as a startlist for Fextor3.
In Colab: TermoPL - Term extraction from a corpus of texts
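To inspect terms.csv programmatically, the sketch below reads the header and the first n records, roughly mirroring how termsshort.csv relates to terms.csv. It assumes the short file keeps the first records in file order, and that terms.csv is comma-separated and UTF-8 encoded. Neither is confirmed here, so both are exposed as parameters. The column names in the demo file are hypothetical. When encoding problems appear, terms.xlsx remains the recommended choice.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path


def head_records(path, n=1000, delimiter=",", encoding="utf-8"):
    """Return the header row and at most n data rows of a CSV file."""
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)
        rows = list(islice(reader, n))
    return header, rows


# Demo on a small hypothetical file; real column names may differ.
demo = Path(tempfile.mkdtemp()) / "terms.csv"
demo.write_text(
    "term,score\nzapalenie płuc,12.5\nniewydolność serca,9.0\nlek,3.1\n",
    encoding="utf-8",
)
header, rows = head_records(demo, n=2)
print(header, rows)
```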
(C) CLARIN-PL