WebLicht Easy Chain for POS Tagging and Lemmatization (German). The pipeline makes use of WebLicht's TCF converter, the IMS tokenizer, and the IMS TreeTagger. WebLicht's Tundra can be used to visualize the result.
WebLicht Easy Chain for POS Tagging and Lemmatization (English). The pipeline makes use of WebLicht's TCF converter, the Stanford tokenizer, the Jitar POS Tagger, and the lemmatizer service from MorphAdorner. WebLicht's Tundra can be used to visualize the result.
WebLicht Easy Chain for POS Tagging and Lemmatization (French). The pipeline makes use of WebLicht's TCF converter, the IMS tokenizer, and the IMS TreeTagger. WebLicht's Tundra can be used to visualize the result.
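Each Easy Chain runs its services in a fixed order, and every step consumes the previous step's output. The minimal sketch below illustrates only that composition pattern, using toy stand-ins. The `Doc` class (in place of a TCF document), `tokenize`, `tag_and_lemmatize`, `run_chain` and the small `LEXICON` are all hypothetical. They are not WebLicht services or APIs, and their output resembles neither real TCF nor TreeTagger output beyond borrowing STTS-style tag names for the German example.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Callable, Dict, List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a TCF document: raw text plus annotation layers."""
    text: str
    layers: Dict[str, list] = field(default_factory=dict)


Stage = Callable[[Doc], Doc]

# Hypothetical lexicon standing in for a trained tagger/lemmatizer model.
LEXICON: Dict[str, Tuple[str, str]] = {
    "Die": ("ART", "die"),
    "Katzen": ("NN", "Katze"),
    "schlafen": ("VVFIN", "schlafen"),
    ".": ("$.", "."),
}


def tokenize(doc: Doc) -> Doc:
    """Toy tokenizer: split on whitespace and detach a trailing period."""
    tokens: List[str] = []
    for word in doc.text.split():
        if len(word) > 1 and word.endswith("."):
            tokens.extend([word[:-1], "."])
        else:
            tokens.append(word)
    doc.layers["tokens"] = tokens
    return doc


def tag_and_lemmatize(doc: Doc) -> Doc:
    """Toy tagger: look each token up, falling back to an unknown tag."""
    analyses = [LEXICON.get(tok, ("UNK", tok)) for tok in doc.layers["tokens"]]
    doc.layers["pos"] = [pos for pos, _ in analyses]
    doc.layers["lemmas"] = [lemma for _, lemma in analyses]
    return doc


def run_chain(text: str, stages: List[Stage]) -> Doc:
    """Wrap the text (the 'converter' step), then feed each stage's output to the next."""
    return reduce(lambda doc, stage: stage(doc), stages, Doc(text))


result = run_chain("Die Katzen schlafen.", [tokenize, tag_and_lemmatize])
print(result.layers["pos"])     # ['ART', 'NN', 'VVFIN', '$.']
print(result.layers["lemmas"])  # ['die', 'Katze', 'schlafen', '.']
```

Because each stage is a function from document to document, one tokenizer or tagger can be swapped for another without touching the other steps. The German, English and French chains differ in exactly this way.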
Frog integrates memory-based natural language processing (NLP) modules developed for Dutch. It performs automatic linguistic enrichment such as part-of-speech tagging, lemmatisation, named entity recognition, shallow parsing, dependency parsing and morphological analysis. All NLP modules are based on TiMBL.
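Memory-based learning, the approach behind TiMBL, keeps all training examples in memory. It classifies a new case by the labels of its most similar stored examples. The sketch below shows that idea in its simplest form as a toy Dutch tagger:

* Features are a three-word window.
* Similarity is the overlap metric, i.e. the number of mismatching features.
* The label is a majority vote over the stored examples at the smallest distance.

This is not Frog's or TiMBL's code. TiMBL also offers feature weighting (e.g. gain ratio), other metrics and larger k, all omitted here. The class name, helper names and tags (loosely CGN-style) are illustrative.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Instance = Tuple[str, ...]  # feature vector: (previous word, word, next word)


def windows(words: Sequence[str]) -> List[Instance]:
    """Turn a sentence into one (prev, focus, next) feature vector per word."""
    padded = ["_"] + list(words) + ["_"]
    return [tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]


def overlap_distance(a: Instance, b: Instance) -> int:
    """Overlap metric: count the feature positions where the values differ."""
    return sum(x != y for x, y in zip(a, b))


class MemoryBasedTagger:
    """Hypothetical toy tagger: stores every training instance verbatim and
    labels new instances by majority vote over the nearest stored ones."""

    def __init__(self) -> None:
        self.memory: List[Tuple[Instance, str]] = []

    def train(self, sentence: Sequence[str], tags: Sequence[str]) -> None:
        # "Training" is just storage: no abstraction or pruning of examples.
        self.memory.extend(zip(windows(sentence), tags))

    def classify(self, instance: Instance) -> str:
        if not self.memory:
            raise ValueError("no training data stored")
        best = min(overlap_distance(instance, ex) for ex, _ in self.memory)
        votes = Counter(tag for ex, tag in self.memory
                        if overlap_distance(instance, ex) == best)
        return votes.most_common(1)[0][0]

    def tag(self, sentence: Sequence[str]) -> List[str]:
        return [self.classify(inst) for inst in windows(sentence)]


tagger = MemoryBasedTagger()
tagger.train(["de", "kat", "slaapt"], ["LID", "N", "WW"])
tagger.train(["de", "hond", "blaft"], ["LID", "N", "WW"])
print(tagger.tag(["de", "kat", "blaft"]))  # ['LID', 'N', 'WW']
```

The unseen sentence "de kat blaft" is still tagged correctly. Each of its windows differs in only one feature from some stored window.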
GreynirPackage is a Python 3 package for working with Icelandic natural language text. Greynir can parse text into sentence trees, find lemmas, inflect noun phrases, assign part-of-speech tags and much more. Greynir's sentence trees can inter alia be used to extract information from text, for instance about people, titles, entities, facts, actions and opinions. Greynir uses the Tokenizer package, by the same authors, to tokenize text. More information at https://github.com/mideind/GreynirPackage and detailed documentation at https://greynir.is/doc/.
IceParser is a shallow parser for Icelandic. The parser comprises a sequence of finite-state transducers, which incrementally add syntactic information to the input text. IceParser takes part-of-speech (POS) tagged text as input and produces output annotated with both constituent structure and syntactic functions.
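The cascade idea can be illustrated with ordinary regular-expression substitutions standing in for the finite-state transducers. Each pass rewrites the whole tagged sentence, and later passes build on the brackets inserted by earlier ones. This is a hypothetical sketch, not IceParser's rules or output format. The following are assumptions made for the example:

* the `word/tag` input encoding;
* the bracket labels;
* the tags, which only follow the tagset's convention that the first letter encodes the word class (`n` noun, `l` adjective, `s` verb).

```python
import re
from typing import List, Tuple

# Each pass is one (pattern, replacement) rewrite over the whole sentence
# string. Later passes only see the output of earlier ones, so annotation is
# added incrementally, as in a cascade of transducers.
PASSES: List[Tuple[str, str]] = [
    # 1. Noun phrases: any adjectives (tag starts with "l") followed by a
    #    noun (tag starts with "n"). (?<!\S) anchors at a token boundary.
    (r"(?<!\S)((?:\S+/l\S*\s+)*\S+/n\S*)", r"[NP \1 ]"),
    # 2. Verb phrases: a single verb (tag starts with "s").
    (r"(?<!\S)(\S+/s\S*)", r"[VP \1 ]"),
    # 3. Syntactic function: an NP directly before a VP is marked as subject.
    (r"(\[NP [^\]]*\])(\s+\[VP)", r"{*SUBJ \1 }\2"),
    # 4. Syntactic function: an NP directly after a VP is marked as object.
    (r"(\[VP [^\]]*\]\s+)(\[NP [^\]]*\])", r"\1{*OBJ \2 }"),
]


def shallow_parse(tagged: str) -> str:
    """Apply every rewrite pass in order to a 'word/tag word/tag ...' string."""
    for pattern, replacement in PASSES:
        tagged = re.sub(pattern, replacement, tagged)
    return tagged


print(shallow_parse("Ása/nven-s sá/sfg3eþ bjarta/lveosf sól/nveo ./."))
# {*SUBJ [NP Ása/nven-s ] } [VP sá/sfg3eþ ] {*OBJ [NP bjarta/lveosf sól/nveo ] } ./.
```

Each pass only has to recognise a local pattern in the previous pass's output, so the passes stay simple. The price is that a mistake in an early pass, such as a missed noun phrase, propagates to every later one.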
The distributed file contains the entirety of IceNLP, a toolkit of various NLP tools for processing and analysing Icelandic. The current version of IceParser in IceNLP has been specifically changed and updated to be able to annotate input tagged with the revised Icelandic POS tagset.
This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/), trained on the SST treebank of spoken Slovenian (UD v2.15, https://github.com/UniversalDependencies/UD_Slovenian-SST/tree/r2.15) featuring transcriptions of spontaneous speech in various everyday settings.
It is able to predict sentence segmentation, tokenization, lemmatization, language-specific morphological annotation (MULTEXT-East morphosyntactic tags), as well as universal part-of-speech tagging, morphological feature prediction, and dependency parses in accordance with the Universal Dependencies annotation scheme (https://universaldependencies.org/).
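The Universal Dependencies layers listed above are commonly exchanged in the ten-column CoNLL-U format. The sketch below serialises a simplified, hand-written sentence/token dictionary to CoNLL-U. The dictionary layout (keys such as `lemma`, `upos`, `xpos`, `feats`, `head`, `deprel`) is an assumption loosely modelled on Trankit's output. Check the Trankit documentation for the exact structure, including multi-word tokens, which this sketch ignores. The function name `to_conllu` is hypothetical, and the annotation values are illustrative, not output of this model.

```python
from typing import Any, Dict, List

# First eight CoNLL-U columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL.
COLUMNS = ["id", "text", "lemma", "upos", "xpos", "feats", "head", "deprel"]


def to_conllu(doc: Dict[str, List[Dict[str, Any]]]) -> str:
    """Serialise a simplified sentence/token dict as CoNLL-U text."""
    blocks = []
    for sentence in doc["sentences"]:
        lines = [f"# text = {sentence['text']}"]
        for token in sentence["tokens"]:
            # Missing values become "_", the CoNLL-U placeholder.
            fields = [str(token.get(key, "_")) for key in COLUMNS]
            # DEPS and MISC columns are left empty in this sketch.
            fields += ["_", "_"]
            lines.append("\t".join(fields))
        blocks.append("\n".join(lines))
    # CoNLL-U ends every sentence, including the last, with a blank line.
    return "\n\n".join(blocks) + "\n\n"


doc = {
    "sentences": [
        {
            "text": "Pes laja.",
            "tokens": [
                {"id": 1, "text": "Pes", "lemma": "pes", "upos": "NOUN",
                 "xpos": "Ncmsn", "feats": "Case=Nom|Gender=Masc|Number=Sing",
                 "head": 2, "deprel": "nsubj"},
                # xpos/feats omitted here to show the "_" fallback.
                {"id": 2, "text": "laja", "lemma": "lajati", "upos": "VERB",
                 "head": 0, "deprel": "root"},
                {"id": 3, "text": ".", "lemma": ".", "upos": "PUNCT",
                 "xpos": "Z", "head": 2, "deprel": "punct"},
            ],
        }
    ]
}

print(to_conllu(doc), end="")
```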
Please note this model has been published for archiving purposes only. For production use, we recommend the state-of-the-art Trankit model available at http://hdl.handle.net/11356/1965 (v1.2 or newer). That model was trained on both spoken (SST) and written (SSJ) data and performs significantly better than the model featured in this submission.
In comparison with version 1.0, this model was trained on a new train-dev-test split of the SST treebank introduced in release UD v2.15.
This is a retrained Slovenian spoken language model for the Trankit v1.1.1 library (https://pypi.org/project/trankit/). It is able to predict sentence segmentation, tokenization, lemmatization, language-specific morphological annotation (MULTEXT-East morphosyntactic tags), as well as universal part-of-speech tagging, morphological feature prediction, and dependency parsing in accordance with the Universal Dependencies annotation scheme (https://universaldependencies.org/).
The model was trained on a combination of two datasets published by Universal Dependencies in release 2.12: the spoken SST treebank (https://github.com/UniversalDependencies/UD_Slovenian-SST/tree/r2.12) and the written SSJ treebank (https://github.com/UniversalDependencies/UD_Slovenian-SSJ/tree/r2.12). Evaluated on the spoken SST test set, it achieves F1 scores of 97.78 for lemmas, 97.19 for UPOS, 95.05 for XPOS and 81.26 for LAS. This is significantly better than the counterpart model trained on written SSJ data only (http://hdl.handle.net/11356/1870).
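The scores above are per-word measures. UPOS counts correct universal tags, and LAS (labelled attachment score) counts words whose head and dependency relation are both correct. The sketch below computes them, assuming the system output and the gold standard share the same tokenization. In that case precision, recall and F1 coincide with plain accuracy.

The official CoNLL 2018 shared task evaluation script additionally aligns words when tokenizations differ; this sketch does not. Like that script, it compares only the universal part of each relation, so `nsubj:pass` counts as `nsubj`. The `Word` type and `attachment_scores` function are hypothetical, and the toy data does not reproduce the figures reported here.

```python
from typing import Dict, List, NamedTuple


class Word(NamedTuple):
    upos: str
    head: int     # index of the governing word, 0 for the root
    deprel: str


def base(rel: str) -> str:
    """Drop language-specific subtypes, e.g. 'nsubj:pass' -> 'nsubj'."""
    return rel.split(":")[0]


def attachment_scores(gold: List[Word], system: List[Word]) -> Dict[str, float]:
    """Per-word UPOS, UAS and LAS in percent, assuming identical tokenization.

    With identical tokenization every word is aligned one-to-one, so
    precision, recall and F1 all reduce to the fraction of correct words.
    """
    if len(gold) != len(system) or not gold:
        raise ValueError("sketch requires identical, non-empty tokenization")
    pairs = list(zip(gold, system))
    upos = sum(g.upos == s.upos for g, s in pairs)
    uas = sum(g.head == s.head for g, s in pairs)
    las = sum(g.head == s.head and base(g.deprel) == base(s.deprel) for g, s in pairs)
    n = len(pairs)
    return {"UPOS": round(100 * upos / n, 2),
            "UAS": round(100 * uas / n, 2),
            "LAS": round(100 * las / n, 2)}


gold = [Word("NOUN", 2, "nsubj"), Word("VERB", 0, "root"), Word("PUNCT", 2, "punct")]
system = [Word("NOUN", 2, "obj"), Word("VERB", 0, "root"), Word("PUNCT", 1, "punct")]
print(attachment_scores(gold, system))  # {'UPOS': 100.0, 'UAS': 66.67, 'LAS': 33.33}
```

In the toy example every tag is correct, but one word has the wrong label and another the wrong head. LAS is therefore much lower than UPOS, mirroring the gap between the UPOS and LAS figures above.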
To use this model, please follow the instructions in our GitHub repository (https://github.com/clarinsi/trankit-train) or refer to the Trankit documentation (https://trankit.readthedocs.io/en/latest/training.html#loading). This ZIP file contains models for both xlm-roberta-large (which delivers better performance but requires more hardware resources) and xlm-roberta-base.