Before using the service, please read the preliminary information containing a description of steps that enable access to the CLARIN-PL developer interface.
Sentence converts sentences into sequences of word embeddings or sequences of words using a selected deep model. Depending on the model, the result is, for example, word embeddings or keywords. The resulting embeddings can then be used, for example, to calculate the semantic similarity of sentences or for other purposes consistent with the research direction.
Word embedding is a technique for representing the meaning of words as vectors in a vector space. An embedding vector is a numeric vector obtained by transforming a given word from a text; it represents the word's occurrence in a specific context.
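As an illustration of the semantic similarity use case mentioned above, the following is a minimal sketch that averages word embeddings into one vector per sentence and compares the results with cosine similarity. Averaging is only one common way to pool word vectors and is an assumption here, not something the service prescribes. The function names and the toy 3-dimensional vectors are hypothetical and are not output of any model listed on this page.

```python
import math


def mean_vector(word_vectors):
    """Average a non-empty list of equal-length word vectors into one vector."""
    if not word_vectors:
        raise ValueError("need at least one word vector")
    dim = len(word_vectors[0])
    if any(len(v) != dim for v in word_vectors):
        raise ValueError("all word vectors must have the same length")
    count = len(word_vectors)
    return [sum(v[i] for v in word_vectors) / count for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy word embeddings, invented for illustration only.
sentence_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
sentence_b = [[1.0, 1.0, 0.0]]
sentence_c = [[0.0, 0.0, 1.0]]

sim_ab = cosine_similarity(mean_vector(sentence_a), mean_vector(sentence_b))
sim_ac = cosine_similarity(mean_vector(sentence_a), mean_vector(sentence_c))
print(sim_ab, sim_ac)
```

Values close to 1 indicate sentence vectors that point in nearly the same direction, and values close to 0 indicate unrelated directions. With real embeddings the vectors would come from the service output rather than being written by hand.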
Models
The following deep models are currently available in the service:
Models 1-3 are used to generate embeddings, and 4-5 to generate keywords.
Choose a model from the ones listed above.
Sentence can be run through the LPMN Client service from Python code, using the classes and methods of the lpmn_client_biz library.
Note: Since Sentence processes only lists of sentences, it uses only the run_sent method.
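The requirement that input be a list of sentences can be checked before the call is made. The sketch below is an assumption-heavy illustration: it takes any object that exposes a run_sent method accepting a list of strings, and the exact signature and return type of run_sent in lpmn_client_biz may differ. The helper submit_sentences and the stand-in class EchoTask are hypothetical and exist only so the example runs without a connection to the service.

```python
def submit_sentences(task, sentences):
    """Validate the input and pass it to task.run_sent.

    ASSUMPTION: `task` exposes run_sent(list_of_sentences). The real
    lpmn_client_biz signature and return value are not confirmed here.
    """
    if isinstance(sentences, str):
        raise TypeError("pass a list of sentences, not a single string")
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [s.strip() for s in sentences if s.strip()]
    if not cleaned:
        raise ValueError("the list of sentences is empty")
    return task.run_sent(cleaned)


class EchoTask:
    """Hypothetical offline stand-in for a configured task object."""

    def run_sent(self, sentences):
        # Returns what it received instead of contacting the service.
        return {"received": sentences}


result = submit_sentences(EchoTask(), ["  First sentence. ", "", "Second sentence."])
print(result)
```

In real use, EchoTask would be replaced by a task object created with lpmn_client_biz according to the library's own documentation.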
Model
- for example:
model = "sbert-klej-cdsc-r"
Text
Output data depends on the selected model. The following models generate word embeddings:
whereas the following generate keywords:
To compute embeddings for long texts, or to supply input data as a file or directory, use Embedder.
In Colab: Sentence - Determining the embeddings of short texts
(C) CLARIN-PL