Result filters

Metadata provider

Language

Resource type

Availability

  • Share-alike

Active filters:

  • Availability: Share-alike
Loading...
193 record(s) found

Search results

  • Slavic Forest, Norwegian Wood (models)

    Trained models for UDPipe used to produce our final submission to the Vardial 2017 CLP shared task (https://bitbucket.org/hy-crossNLP/vardial2017). The SK model was trained on CS data, the HR model on SL data, and the SV model on a concatenation of DA and NO data. The scripts and commands used to create the models are part of separate submission (http://hdl.handle.net/11234/1-1970). The models were trained with UDPipe version 3e65d69 from 3rd Jan 2017, obtained from https://github.com/ufal/udpipe -- their functionality with newer or older versions of UDPipe is not guaranteed. We list here the Bash command sequences that can be used to reproduce our results submitted to VarDial 2017. The input files must be in CoNLLU format. The models only use the form, UPOS, and Universal Features fields (SK only uses the form). You must have UDPipe installed. The feats2FEAT.py script, which prunes the universal features, is bundled with this submission. SK -- tag and parse with the model: udpipe --tag --parse sk-translex.v2.norm.feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu A slightly better after-deadline model (sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe), which we mention in the accompanying paper, is also included. It is applied in the same way (udpipe --tag --parse sk-translex.v2.norm.Case-feats07.w2v.trainonpred.udpipe sk-ud-predPoS-test.conllu). HR -- prune the Features to keep only Case and parse with the model: python3 feats2FEAT.py Case < hr-ud-predPoS-test.conllu | udpipe --parse hr-translex.v2.norm.Case.w2v.trainonpred.udpipe NO -- put the UPOS annotation aside, tag Features with the model, merge with the left-aside UPOS annotation, and parse with the model (this hassle is because UDPipe cannot be told to keep UPOS and only change Features): cut -f1-4 no-ud-predPoS-test.conllu > tmp udpipe --tag no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe no-ud-predPoS-test.conllu | cut -f5- | paste tmp - | sed 's/^\t$//' | udpipe --parse no-translex.v2.norm.tgttagupos.srctagfeats.Case.w2v.udpipe
  • ForFun 1.0

    ForFun is a database of linguistic forms and their syntactic functions built with the use of the multi-layer annotated corpora of Czech, the Prague Dependency Treebanks. The purpose of the Prague Database of Forms and Functions (ForFun) is to help the linguists to study the form-function relation, which we assume to be one of the principal tasks of both theoretical linguistics and natural language processing. A prototypical question to be asked is "What purposes does a preposition 'po' serve for" or "What are the linguistic means in the sentence that can express the meaning 'a destination of an action'?". There are almost 1500 distinct forms (besides the 'po' preposition) and 65 distinct functions (besides the 'destination').
  • Universal Dependencies 2.0 Models for UDPipe (2017-08-01)

    Tokenizer, POS Tagger, Lemmatizer and Parser models for all 50 languages of Universal Depenencies 2.0 Treebanks, created solely using UD 2.0 data (http://hdl.handle.net/11234/1-1983). The model documentation including performance can be found at http://ufal.mff.cuni.cz/udpipe/users-manual#universal_dependencies_20_models . To use these models, you need UDPipe binary version at least 1.2, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to models itself, all additional data and value of hyperparameters used for training are available in the second archive, allowing reproducible training.
  • The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.2

    The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). The estimated F1 of the lemma annotations is ~99.0. The difference to the previous version is that now it relies solely on XPOS annotations, and not on a combination of UPOS, FEATS (lexicon lookup) and XPOS (lemma prediction) annotations.
  • Translation Models (en-de) (v1.0)

    En-De translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2020 (BLEU): en->de: 25.9 de->en: 33.4 (Evaluated using multeval: https://github.com/jhclark/multeval)
  • The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Croatian 1.1

    The model for morphosyntactic annotation of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1183) and using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1205). The model produces simultaneously UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~94.1. The difference to the previous version of the model is that now the whole XPOS tag is predicted and not specific characters, as was the case in stanfordnlp, which resulted in illegal XPOS tags (and slightly decreased performance).
  • Universal Dependencies 2.5 Models for UDPipe (2019-12-06)

    Tokenizer, POS Tagger, Lemmatizer and Parser models for 94 treebanks of 61 languages of Universal Depenencies 2.5 Treebanks, created solely using UD 2.5 data (http://hdl.handle.net/11234/1-3105). The model documentation including performance can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_25_models . To use these models, you need UDPipe binary version at least 1.2, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to models itself, all additional data and value of hyperparameters used for training are available in the second archive, allowing reproducible training.
  • The CLASSLA-Stanza model for lemmatisation of standard Serbian 2.1

    The model for lemmatisation of standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) combined with the Serbian non-standard training corpus ReLDI-NormTagNER-sr (http://hdl.handle.net/11356/1794) and using the srLex inflectional lexicon (http://hdl.handle.net/11356/1233). The estimated F1 of the lemma annotations is ~98.02. The difference to the previous version is that this version was trained on a combination of the standard (SETimes.SR) and non-standard (ReLDI-NormTagNER-sr) Serbian training corpora.