Active filters:

  • Tool task: Part-of-speech tagging
66 record(s) found

Search results

  • GreynirPackage 3.5.2 (22.10)

    GreynirPackage is a Python 3 package for working with Icelandic natural language text. Greynir can parse text into sentence trees, find lemmas, inflect noun phrases, assign part-of-speech tags and much more. Among other things, Greynir's sentence trees can be used to extract information from text, for instance about people, titles, entities, facts, actions and opinions. Greynir uses the Tokenizer package, by the same authors, to tokenize text (see http://hdl.handle.net/20.500.12537/262). More information is available at https://github.com/icelandic-lt/GreynirEngine, and detailed documentation at https://greynir.is/doc/.
  • ABLTagger (PoS) - 1.0.0

    A part-of-speech (PoS) tagger for Icelandic. This submission contains ABLTagger v1.0.0, a PoS tagger that works with the revised tagset and achieves an accuracy of 95.59% on MIM-Gold (cross-validation). For additional details, error analysis and error categorization for this tagger and other taggers (including a previous version of ABLTagger), see the I4 report for milestone 3 (2020) in the Language Technology Programme for Icelandic 2019-2023. For the most recent versions and for installation, usage and other instructions, see https://github.com/cadia-lvl/POS.

    Available on CLARIN:
      - Python wheel, version 1.0.0
      - GitHub repository at version 1.0.0
      - Model files (tagger and dictionaries)
      - Docker image, version 1.0.0
  • The Trankit model for linguistic processing of written and spoken Slovenian 1.2

    This is a retrained Slovenian model for the Trankit v1.1.1 library for multilingual natural language processing (https://pypi.org/project/trankit/). It was trained on the concatenation of the SSJ UD treebank of written Slovenian (featuring fiction, non-fiction, periodicals and Wikipedia texts) and the SST UD treebank of spoken Slovenian (featuring transcriptions of spontaneous speech in various settings). It predicts sentence segmentation, tokenization, lemmatization and language-specific morphological annotation (MULTEXT-East morphosyntactic tags), as well as universal part-of-speech tags, morphological features and dependency parses following the Universal Dependencies annotation scheme (https://universaldependencies.org/). Compared with its counterpart models trained only on SSJ (http://hdl.handle.net/11356/1963) or only on SST, this model performs significantly better on spoken transcripts and achieves the same state-of-the-art performance on written texts. It can therefore be recommended as the default, 'universal' Trankit model for processing Slovenian, regardless of data type. To use this model, follow the instructions in our GitHub repository (https://github.com/clarinsi/trankit-train) or refer to the Trankit documentation (https://trankit.readthedocs.io/en/latest/training.html#loading). This ZIP file contains models for both xlm-roberta-large (which delivers better performance but requires more hardware resources) and xlm-roberta-base.

    Compared with the previous version, this version was trained on a newer, slightly improved version of the SSJ UD treebank (UD v2.14, https://github.com/UniversalDependencies/UD_Slovenian-SSJ/tree/r2.14) and a substantially extended and improved version of the SST UD treebank (https://github.com/UniversalDependencies/UD_Slovenian-SST/tree/r2.15), producing significantly better results on spoken data. Unlike the previous versions of this model (1.0 and 1.1), version 1.2 was trained on the new SST train/dev/test split introduced in UD v2.15.
  • Postagger

    A set of tools used in natural language processing to assign labels, or tags, to text elements such as words or tokens. Postagger operates after the text has been analyzed by a morphological or syntactic tagger; its purpose is to make the final classification and assign the appropriate labels to individual text elements.
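The ABLTagger record reports 95.59% accuracy on MIM-Gold under cross-validation. PoS tagging accuracy is commonly the share of tokens whose predicted tag matches the gold tag. The sketch below assumes that definition and also assumes the reported figure is the mean of per-fold accuracies; the record does not say whether scores were averaged per fold or pooled over all tokens. The function names and the tags in the usage example are hypothetical placeholders, not MIM-Gold data or ABLTagger output.

```python
from statistics import mean
from typing import Sequence


def token_accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty fold")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def cross_validated_accuracy(
    folds: Sequence[tuple[Sequence[str], Sequence[str]]],
) -> float:
    """Mean of per-fold token accuracies (an assumed aggregation rule)."""
    return mean(token_accuracy(gold, pred) for gold, pred in folds)


if __name__ == "__main__":
    # Hypothetical placeholder tags for two tiny folds.
    folds = [
        (["NOUN", "VERB", "ADJ"], ["NOUN", "VERB", "ADJ"]),  # 3 of 3 correct
        (["NOUN", "VERB", "ADJ", "ADP"], ["NOUN", "VERB", "NOUN", "ADP"]),  # 3 of 4 correct
    ]
    print(f"{cross_validated_accuracy(folds):.4f}")
```

If the evaluation instead pooled all tokens across folds, the result would be total correct tokens divided by total tokens, which can differ slightly from the per-fold mean when folds have different sizes.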
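The Trankit record lists several annotation layers: lemmas, MULTEXT-East morphosyntactic tags, universal part-of-speech tags, morphological features and dependency parses. Universal Dependencies data is usually exchanged in the CoNLL-U format, where each token line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The sketch below reads such lines using only the standard library, assuming the language-specific XPOS column holds the MULTEXT-East tag. It does not call Trankit, and the sample sentence and its annotations are illustrative, not actual model output.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: str
    form: str
    lemma: str
    upos: str           # universal part-of-speech tag
    xpos: str           # language-specific tag (assumed MULTEXT-East here)
    feats: dict[str, str]
    head: str           # ID of the syntactic head, "0" for the root
    deprel: str         # dependency relation to the head


def parse_feats(field: str) -> dict[str, str]:
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if field == "_":
        return {}
    return dict(item.split("=", 1) for item in field.split("|"))


def read_conllu(text: str) -> list[list[Token]]:
    """Split CoNLL-U text into sentences, each a list of Token tuples."""
    sentences: list[list[Token]] = []
    current: list[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as '# text = ...'
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # Skip multiword-token ranges (e.g. '1-2') and empty nodes (e.g. '1.1').
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append(Token(cols[0], cols[1], cols[2], cols[3], cols[4],
                             parse_feats(cols[5]), cols[6], cols[7]))
    if current:
        sentences.append(current)
    return sentences


# Illustrative annotations for a short Slovenian sentence ("The cat sleeps.").
SAMPLE = "\n".join([
    "# text = Mačka spi.",
    "\t".join(["1", "Mačka", "mačka", "NOUN", "Ncfsn",
               "Case=Nom|Gender=Fem|Number=Sing", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "spi", "spati", "VERB", "Vmpr3s",
               "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin",
               "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", "Z", "_", "2", "punct", "_", "_"]),
    "",
])

if __name__ == "__main__":
    for sentence in read_conllu(SAMPLE):
        for tok in sentence:
            print(tok.form, tok.lemma, tok.upos, tok.xpos,
                  tok.feats.get("Case", "-"), tok.head, tok.deprel)
```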
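The Postagger record describes a stage that runs after a morphological or syntactic analyzer and makes the final choice of label for each text element, but it does not say how that choice is made. As a generic illustration only, the sketch below assumes the earlier stage returns a set of candidate tags per token. It then picks the candidate seen most often for that word in some tagged training data, falls back to overall tag frequency for unseen words, and breaks remaining ties alphabetically. The function names, tag labels and frequency rule are hypothetical and are not Postagger's actual method.

```python
from collections import Counter, defaultdict
from typing import Iterable, Sequence


def train_counts(
    tagged: Iterable[tuple[str, str]],
) -> tuple[dict[str, Counter], Counter]:
    """Count per-word tag frequencies and overall tag frequencies."""
    per_word: dict[str, Counter] = defaultdict(Counter)
    overall: Counter = Counter()
    for word, tag in tagged:
        per_word[word.lower()][tag] += 1
        overall[tag] += 1
    return dict(per_word), overall


def choose_tags(
    tokens: Sequence[str],
    candidates: Sequence[set[str]],
    per_word: dict[str, Counter],
    overall: Counter,
) -> list[str]:
    """Pick one tag per token from its candidate set (toy disambiguation rule)."""
    result = []
    for word, options in zip(tokens, candidates):
        if not options:
            raise ValueError(f"no candidate tags for {word!r}")
        seen = per_word.get(word.lower(), Counter())
        # Rank by word-specific count, then overall count; iterating over the
        # sorted options means max() keeps the alphabetically first tie.
        best = max(sorted(options), key=lambda t: (seen[t], overall[t]))
        result.append(best)
    return result


if __name__ == "__main__":
    training = [("fish", "NOUN"), ("fish", "NOUN"), ("fish", "VERB"),
                ("can", "AUX"), ("can", "AUX"), ("can", "AUX"),
                ("can", "NOUN"), ("swim", "VERB")]
    per_word, overall = train_counts(training)
    tokens = ["Fish", "can", "swim"]
    candidates = [{"NOUN", "VERB"}, {"NOUN", "AUX", "VERB"}, {"VERB"}]
    print(choose_tags(tokens, candidates, per_word, overall))
```

A rule like this ignores context, which is exactly where a real post-analysis tagger would be expected to do better; it only shows where the final selection step sits in the pipeline.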