CLARIN Tool Portal

Active filters:

Tool task: Lemmatisation

66 record(s) found

Search results

GreynirPackage (2021-05-12)

3 resources

GreynirPackage is a Python 3 package for working with Icelandic natural language text. Greynir can parse text into sentence trees, find lemmas, inflect noun phrases, assign part-of-speech tags and much more. Among other uses, Greynir's sentence trees can serve to extract information from text, for instance about people, titles, entities, facts, actions and opinions. Greynir uses the Tokenizer package, by the same authors, to tokenize text. More information is available at https://github.com/mideind/GreynirPackage and detailed documentation (in English) at https://greynir.is/doc/.

IceNLP Natural Language Processing toolkit

3 resources

IceNLP is an open-source Natural Language Processing (NLP) toolkit for analyzing and processing Icelandic text. The toolkit is implemented in Java.

GreynirPackage 3.5.2 (22.10)

2 resources

GreynirPackage is a Python 3 package for working with Icelandic natural language text. Greynir can parse text into sentence trees, find lemmas, inflect noun phrases, assign part-of-speech tags and much more. Among other uses, Greynir's sentence trees can serve to extract information from text, for instance about people, titles, entities, facts, actions and opinions. Greynir uses the Tokenizer package, by the same authors, to tokenize text (see http://hdl.handle.net/20.500.12537/262). More information is available at https://github.com/icelandic-lt/GreynirEngine and detailed documentation (in English) at https://greynir.is/doc/.

The Trankit model for linguistic processing of written and spoken Slovenian 1.2

2 resources

This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the concatenation of the SSJ UD treebank of written Slovenian (featuring fiction, non-fiction, periodicals and Wikipedia texts) and the SST UD treebank of spoken Slovenian (featuring transcriptions of spontaneous speech in various settings). It predicts sentence segmentation, tokenization, lemmatization, language-specific morphological annotation (MULTEXT-East morphosyntactic tags), as well as universal part-of-speech tags, morphological features, and dependency parses in accordance with the Universal Dependencies annotation scheme (https://universaldependencies.org/). Compared with its counterpart models trained on the SSJ (http://hdl.handle.net/11356/1963) or SST datasets only, this model performs significantly better on spoken transcripts and matches their state-of-the-art performance on written texts. It can therefore be recommended as the default, 'universal' Trankit model for processing Slovenian, regardless of the data type. To use this model, follow the instructions in the GitHub repository (https://github.com/clarinsi/trankit-train) or refer to the Trankit documentation (https://trankit.readthedocs.io/en/latest/training.html#loading). The ZIP file contains models for both xlm-roberta-large (which delivers better performance but requires more hardware resources) and xlm-roberta-base. Compared with the previous version, this version was trained on a newer, slightly improved version of the SSJ UD treebank (UD v2.14, https://github.com/UniversalDependencies/UD_Slovenian-SSJ/tree/r2.14) and a substantially extended and improved version of the SST UD treebank (https://github.com/UniversalDependencies/UD_Slovenian-SST/tree/r2.15), and thus produces significantly better results for spoken data.
In contrast to the previous versions of this model (1.0, 1.1), model 1.2 was trained on a new SST train-dev-test split introduced in UD v2.15.
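
As a rough illustration of the lemmatization output such a model produces, the sketch below collects (word form, lemma) pairs from a Trankit-style result. It is not taken from the model's own instructions: the helper names `collect_lemmas` and `lemmatize_with_trankit`, the `model_dir` path, and the `"customized"` pipeline category are assumptions, and the expected dictionary layout (`sentences`, `tokens`, `expanded`, `lemma`) follows the output format described in the Trankit documentation. Consult the repository linked above for the authoritative loading steps.

```python
from typing import Any, Dict, List, Tuple


def collect_lemmas(doc: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (word form, lemma) pairs from a Trankit-style document dict."""
    pairs: List[Tuple[str, str]] = []
    for sentence in doc.get("sentences", []):
        for token in sentence.get("tokens", []):
            # Multi-word tokens list their syntactic words under "expanded";
            # ordinary tokens carry the lemma directly.
            words = token.get("expanded", [token])
            for word in words:
                pairs.append((word["text"], word.get("lemma", "")))
    return pairs


def lemmatize_with_trankit(text: str, model_dir: str) -> List[Tuple[str, str]]:
    """Hypothetical wrapper: run a locally unpacked model over raw text."""
    # Imported lazily so collect_lemmas works without Trankit installed.
    from trankit import Pipeline

    # Assumption: the unpacked model is loaded as a customized pipeline
    # whose files live under model_dir.
    pipeline = Pipeline(lang="customized", cache_dir=model_dir)
    return collect_lemmas(pipeline.lemmatize(text))
```

Keeping the dictionary walk separate from the pipeline call means the extraction logic can be checked on a hand-built result without downloading the model.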

Text Tonsorium - Lemmas.

The Text Tonsorium designs and enacts workflows that fulfil your goal. Here, the goal is set to "lemmatization of the input". Once in Text Tonsorium, you can refine or change the goal.

LiLa Text Linker

The LiLa Text Linker is a POS tagger and lemmatizer for Latin that also provides, for each analyzed token, a link to the lemma entry in the LOD-compliant LiLa Lemma Bank. The tool was produced in the context of the 'LiLa - Linking Latin' project (https://lila-erc.eu/).

