CLARIN Tool Portal

698 record(s) found

Search results

Universal Dependencies 2.6 models for UDPipe 2 (2020-08-31)

2 resources

Tokenizer, POS Tagger, Lemmatizer and Parser models for 99 treebanks of 63 languages of Universal Dependencies 2.6 Treebanks, created solely using UD 2.6 data (https://hdl.handle.net/11234/1-3226). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_26_models . To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .

Corpus2MWE

2 resources

A CCL reader (Corpus2) with MWE detection.

The CLASSLA-Stanza model for morphosyntactic annotation of standard Croatian 2.1

3 resources

The model for morphosyntactic annotation of standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the hr500k training corpus (http://hdl.handle.net/11356/1792) and using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1790). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~94.87. Unlike the previous version, this model was trained on the new version of the hr500k corpus and the new version of the Croatian word embeddings.

The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Serbian 1.2

3 resources

The model for morphosyntactic annotation of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) and using the CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1206). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~95.2. Unlike the previous version, this model limits the pre-trained embeddings to 250 thousand entries and is adapted to the new code base.

KPWr n82 NER model (on Polish RoBERTa base)

2 resources

The named entity recognition model for fine-grained categories of entities (82 types) was trained on the KPWr corpus using the Polish RoBERTa base language model. Details can be found on the following page: https://github.com/mczuk/xlm-roberta-ner

PyTorch model for Slovenian Coreference Resolution

2 resources

Slovenian model for coreference resolution: a neural network based on a customized transformer architecture, usable with the code published on https://github.com/matejklemen/slovene-coreference-resolution. The model is based on the Slovenian CroSloEngual BERT 1.1 model (http://hdl.handle.net/11356/1330). It was trained on the SUK 1.0 training corpus (http://hdl.handle.net/11356/1747), specifically the SentiCoref subcorpus. Using the evaluation setting where entity mentions are assumed to be correctly pre-detected, the model achieves the following metric values:
MUC: precision = 0.931, recall = 0.957, F1 = 0.943
BCubed: precision = 0.887, recall = 0.947, F1 = 0.914
CEAFe: precision = 0.945, recall = 0.893, F1 = 0.916
CoNLL-12: precision = 0.921, recall = 0.932, F1 = 0.924
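
The CoNLL-12 figures reported for this model can be reproduced from the other three metrics. The following is a minimal sketch, assuming that the CoNLL-12 row is the unweighted mean of MUC, BCubed and CEAFe (the usual definition of the CoNLL-2012 score for F1, and consistent with the reported precision and recall). The function name `conll_average` is hypothetical and is not part of the published code.

```python
# Scores copied from the record description: (precision, recall, F1).
scores = {
    "MUC": (0.931, 0.957, 0.943),
    "BCubed": (0.887, 0.947, 0.914),
    "CEAFe": (0.945, 0.893, 0.916),
}


def conll_average(scores):
    """Return the unweighted mean of each column, rounded to 3 decimals.

    Assumption: the CoNLL-12 row is the plain average of the three metrics.
    """
    n = len(scores)
    columns = zip(*scores.values())  # precision, recall and F1 columns
    return tuple(round(sum(column) / n, 3) for column in columns)


precision, recall, f1 = conll_average(scores)
print(f"CoNLL-12: precision = {precision}, recall = {recall}, F1 = {f1}")
```

Note that the reported F1 values are not exactly the harmonic mean of the reported precision and recall, so this sketch averages the published F1 values directly instead of recomputing them.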

SuperMatrix

2 resources

SuperMatrix is a system that supports the automatic extraction of semantic relations, based on the analysis of large text corpora. The system was developed as a tool for expanding the Polish wordnet (Słowosieć). Expansion consists of two steps: the system suggests potential links between lexical units, and linguists then verify these suggestions and decide in which form they enter the wordnet. This speeds up the work and preserves the integrity of the data entered.

Grafon

1 resource

Representation of sentence semantics with deepened semantic graphs. The graphs are composed from the output of the saper tool (https://clarin-pl.eu/dspace/handle/11321/278).

CORDEX inflectional lookup data 1.0

2 resources

The inflectional data lookup module is an optional component of the cordex library (https://github.com/clarinsi/cordex/) that significantly improves the quality of the results. The module consists of a pickled dictionary of 111,660 lemmas that maps each lemma to its word forms. Each word form in the dictionary is accompanied by its MULTEXT-East morphosyntactic description, relevant features (custom features extracted from the morphosyntactic descriptions with the help of https://gitea.cjvt.si/generic/conversion_utils), and its frequency within the Gigafida 2.0 corpus (http://hdl.handle.net/11356/1320), or Gigafida 1.0 when other information is unavailable. The dictionary is used to select the most frequent word form of a lemma that satisfies additional filtering conditions (e.g. to find the most frequently used singular word form of the lemma "centralen", i.e. "centralni").
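
The selection step described above can be sketched as follows. This is an illustration only: the `WordForm` structure, the feature names, the function `most_frequent_form` and the frequency counts are all hypothetical toy values, not the actual layout or contents of the pickled dictionary, and the morphosyntactic description field is left out for brevity.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class WordForm(NamedTuple):
    """Hypothetical record for one word form of a lemma."""
    form: str
    features: Dict[str, str]
    frequency: int  # corpus frequency; the numbers below are invented


# Toy stand-in for the lemma -> word forms dictionary (not real data).
LOOKUP: Dict[str, List[WordForm]] = {
    "centralen": [
        WordForm("centralen", {"number": "singular"}, 10),
        WordForm("centralni", {"number": "singular"}, 50),
        WordForm("centralna", {"number": "singular"}, 30),
        WordForm("centralnih", {"number": "plural"}, 40),
    ],
}


def most_frequent_form(
    lookup: Dict[str, List[WordForm]],
    lemma: str,
    keep: Callable[[WordForm], bool],
) -> Optional[str]:
    """Return the most frequent form of `lemma` that passes the `keep` filter."""
    candidates = [wf for wf in lookup.get(lemma, []) if keep(wf)]
    if not candidates:
        return None  # unknown lemma or no form satisfies the filter
    return max(candidates, key=lambda wf: wf.frequency).form


def is_singular(wf: WordForm) -> bool:
    return wf.features.get("number") == "singular"


print(most_frequent_form(LOOKUP, "centralen", is_singular))
```

With these toy counts, the singular filter selects "centralni", matching the example in the description; the real module may structure its data and filters differently.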

Malach Center User Interface 1.0

2 resources

Source code of the first full and running version of the Malach Center User Interface. It does not contain data or metadata for the digital objects and resources.


