Result filters

Metadata provider

Language

Resource type

Availability

Loading...
703 record(s) found

Search results

  • The CLASSLA-StanfordNLP model for lemmatisation of non-standard Serbian 1.1

    The model for lemmatisation of non-standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200), the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), the hr500k training corpus (http://hdl.handle.net/11356/1183) and the RAPUT corpus (https://www.aclweb.org/anthology/L16-1513/), using the srLex inflectional lexicon (http://hdl.handle.net/11356/1233). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~97.62. The difference to the previous version of the lemmatizer is that now it relies solely on XPOS annotations, and not on a combination of UPOS, FEATS (lexicon lookup) and XPOS (lemma prediction) annotations.
  • HaskPL

    HaskPL is a Polish phraseological database designed for language professionals including linguists, language teachers, lexicographers, language materials developers and translators. Query results can be visualised and exported as spreadsheets. A complementary tool is HaskProof (http://pelcra.clarin-pl.eu:9894/#/lang/pl) identifying potential collocations in any text inserted by the user.
  • CUBBITT Translation Models (en-cs) (v1.0)

    CUBBITT En-Cs translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2014 (BLEU): en->cs: 27.6 cs->en: 34.4 (Evaluated using multeval: https://github.com/jhclark/multeval)
  • PyTorch model for Slovenian Named Entity Recognition SloNER 1.0

    The SloNER is a model for Slovenian Named Entity Recognition. It is is a PyTorch neural network model, intended for usage with the HuggingFace transformers library (https://github.com/huggingface/transformers). The model is based on the Slovenian RoBERTa contextual embeddings model SloBERTa 2.0 (http://hdl.handle.net/11356/1397). The model was trained on the SUK 1.0 training corpus (http://hdl.handle.net/11356/1747).The source code of the model is available on GitHub repository https://github.com/clarinsi/SloNER.
  • The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian 1.3

    This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204). The model produces simultaneously UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~97.06. The difference to the previous version of the model is that the model now also includes the Sloleks inflectional lexicon.
  • Voice control and question answering (22.10)

    [English] The goal of this work package was to develop Kaldi recipes for voice control and question answering systems for Icelandic. We defined six tasks and either generated or gathered data for each, normalized the data and trained Kaldi language models. Included in this submission are six ASR language models, an acoustic model, the training data for the language model and all the code used to generate the data and create the models. For further information have a look at the file README.md. [Icelandic] Markmiðið með þessu verkefni var að búa til talgreiningar uppskriftir með Kalda fyrir raddskipanir og fyrirspurnir. Við skilgreindum sex verkefni og annaðhvort söfnuðum eða bjuggum til gögn fyrir hvert og eitt þeirra, undirbjuggum gögnin og þjálfuðum mállíkön. Í þessu safni er að finna sex sérhæfð mállíkön, hljóðlíkan, gögnin sem voru notuð til þess að búa til mállíkönin ásamt öllum kóða sem notaður var til þess að búa til gögnin og líkönin. Freakri upplýsingar má finna í skránni README.md.
  • Liner2.5 model NER

    Przygotował: Michał Marcińczuk <marcinczuk@gmail.com> Data: 25.05.2016 Projekt: Clarin-PL (http://clarin-pl.eu) Autorzy: Michał Marcińczuk, Jan Kocoń, Michał Krautforst Modele do narzędzia Liner2.5 do rozpoznawania jednostek identyfikacyjnych. Narzędzie Liner2.5 dostępne jest pod linkiem http://hdl.handle.net/11321/231. Paczka zawiera trzy modele: 1. config-nam.ini -- granice jednostek identyfikacyjnych, 2. config-top9.ini -- granice i ogólna kategoryzacja jednostek (9 kategorii), 3. config-n82.ini -- granice i szczegółowa kategoryzacja jednostek (82 kategorie).
  • The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Bulgarian 1.0

    This model for morphosyntactic annotation of standard Bulgarian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the BulTreeBank training corpus (http://hdl.handle.net/11495/D93F-C6E9-65D9-2) and using the CoNLL2017 word embeddings (http://hdl.handle.net/11234/1-1989). The model produces simultaneously UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~96.8.