CLARIN Tool Portal

698 record(s) found

Search results

The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.3

2 resources

The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). The estimated F1 of the lemma annotations is ~99.7. The difference from the previous version is that the internal lexicon is built on the lexicon training data only, and not on the (automatically XPOS-annotated) corpus data.

The CLASSLA-Stanza model for lemmatisation of standard Croatian 2.1

2 resources

The model for lemmatisation of standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the hr500k training corpus (http://hdl.handle.net/11356/1792) and using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). The estimated F1 of the lemma annotations is ~98.02. The difference to the previous version is that this version was trained on the new version of the hr500k corpus.

The CLASSLA-StanfordNLP model for lemmatisation of non-standard Slovenian 1.1

2 resources

The model for lemmatisation of non-standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and the Janes-Tag corpus (http://hdl.handle.net/11356/1238), using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~98.86. The difference to the previous version of the lemmatizer is that now it relies solely on XPOS annotations, and not on a combination of UPOS, FEATS (lexicon lookup) and XPOS (lemma prediction) annotations.
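The diacritic augmentation described above can be illustrated with a minimal sketch. It is not the CLASSLA training code: the function names, the sampling fraction, and the mapping of đ to d are assumptions made for illustration only.

```python
import random
import unicodedata

# Letters without a Unicode decomposition need an explicit mapping.
# Assumption: "đ" (used in Croatian and Serbian) is reduced to plain "d".
EXTRA = str.maketrans({"đ": "d", "Đ": "D"})


def strip_diacritics(text: str) -> str:
    """Return text with diacritics removed (č -> c, š -> s, ž -> z, ...)."""
    decomposed = unicodedata.normalize("NFD", text.translate(EXTRA))
    # Drop the combining marks left behind by the decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def augment(sentences, fraction=0.5, seed=0):
    """Append diacritic-free copies of a random sample of the sentences.

    `fraction` (hypothetical parameter) is the share of sentences repeated.
    """
    rng = random.Random(seed)
    k = int(len(sentences) * fraction)
    sampled = rng.sample(list(sentences), k)
    return list(sentences) + [strip_diacritics(s) for s in sampled]
```

Calling `augment(corpus, fraction=0.5)` would keep every original sentence and add stripped copies of half of them, so a model trained on the result sees both spellings.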

Saper

1 resource

Shallow semantic parser for processing Polish texts. It provides word sense disambiguation, mapping to SUMO concepts, and semantic role labelling.

DG-POLFIE: POLFIE and Malt-based syntactic parser

1 resource

DG-POLFIE is a prototype parser that tries to merge parse fragments generated by POLFIE using the Polish Dependency Parser. DG-POLFIE aims to improve the coverage of the POLFIE parser (i.e. the percentage of sentences with at least one analysis). To increase the number of Polish sentences and constructions that can be parsed with the POLFIE-based parser, DG-POLFIE defines rules that use the dependency structure to build a full parse from the FRAGMENTS provided by POLFIE.
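Coverage as defined above (the percentage of sentences with at least one analysis) can be computed as in the following sketch. The function name and the input format, a list of per-sentence analysis counts, are assumptions for illustration and are not part of DG-POLFIE.

```python
def coverage(analyses_per_sentence):
    """Percentage of sentences that received at least one analysis.

    `analyses_per_sentence` is a hypothetical input: one integer per
    sentence, giving the number of analyses the parser produced for it.
    """
    if not analyses_per_sentence:
        return 0.0
    parsed = sum(1 for n in analyses_per_sentence if n > 0)
    return 100.0 * parsed / len(analyses_per_sentence)
```

For example, if two of four sentences receive no analysis, the coverage is 50 percent; any sentence for which a full parse can be built from fragments would raise that figure.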

LiStr: Linguistic Structure Induction Toolkit

2 resources

This toolkit comprises tools and supporting scripts for unsupervised induction of dependency trees from raw texts or from texts with part-of-speech tags already assigned. It also includes scripts for simple machine translation based on unsupervised parsing and scripts for minimally supervised parsing into the Universal Dependencies style.

The CLASSLA-Stanza model for morphosyntactic annotation of non-standard Serbian 2.1

3 resources

This model for morphosyntactic annotation of non-standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200), the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1794) and the hr500k training corpus (http://hdl.handle.net/11356/1792), using the CLARIN.SI-embed.sr word embeddings (http://hdl.handle.net/11356/1789). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The model produces simultaneously UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~92.64. The difference to the previous version of the model is that this version uses the new version of Serbian word embeddings and is trained on a combination of three training corpora (SETimes.SR, ReLDI-NormTagNER-sr, hr500k).

The CLASSLA-StanfordNLP model for named entity recognition of non-standard Croatian 1.0

2 resources

This model for named entity recognition of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1183), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), using the CLARIN.SI-embed.hr word embeddings (http://hdl.handle.net/11356/1205). The training corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed.

Samrómur-Adolescents Kaldi Recipe 22.06

2 resources

The "Samrómur-Adolescents Kaldi Recipe 22.06" is a code recipe intended to show how to integrate the adolescent portion of the corpus "Samrómur Children's Icelandic Speech Data 21.09" [1] and the "Icelandic Language Models with Pronunciations 22.01" [2] to create automatic speech recognition systems using the Kaldi toolkit [3].

UDify Pretrained Model

3 resources

Pretrained model weights for the UDify model, and extracted BERT weights in pytorch-transformers format. Note that these weights slightly differ from those used in the paper.
