CLARIN Tool Portal

The CLASSLA-StanfordNLP model for lemmatisation of standard Croatian 1.1

2 resources

The model for lemmatisation of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1183) and using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). The estimated F1 of the lemma annotations is ~97.6. The difference to the previous version of the model is that it is trained with the lemmatiser padding bug removed, cf. https://github.com/stanfordnlp/stanfordnlp/issues/143.

The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.2

2 resources

The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). The estimated F1 of the lemma annotations is ~99.0. The difference to the previous version is that now it relies solely on XPOS annotations, and not on a combination of UPOS, FEATS (lexicon lookup) and XPOS (lemma prediction) annotations.
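The lookup-then-predict arrangement mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the CLASSLA implementation: the function name `lemmatise`, the dictionary layout, the toy entries and the fallback `predict` callable are all hypothetical. It shows a lemmatiser that consults an inflectional lexicon keyed on the word form and its XPOS tag alone, and calls a predictor only when the lexicon has no entry.

```python
from typing import Callable, Dict, Tuple

# Hypothetical lexicon layout: (word form, XPOS tag) -> lemma.
Lexicon = Dict[Tuple[str, str], str]


def lemmatise(form: str, xpos: str, lexicon: Lexicon,
              predict: Callable[[str, str], str]) -> str:
    """Return the lexicon lemma for (form, xpos), else fall back to predict()."""
    key = (form, xpos)
    if key in lexicon:
        return lexicon[key]
    # Assumption: the fallback model also sees only the form and its XPOS tag.
    return predict(form, xpos)


# Toy entries for illustration only; the tags are assumed MULTEXT-East style.
toy_lexicon: Lexicon = {
    ("je", "Va-r3s-n"): "biti",   # auxiliary "is"
    ("je", "Vmpr3s"): "jesti",    # main verb "eats"
}


def identity_predictor(form: str, xpos: str) -> str:
    """Stand-in for a trained model: returns the form unchanged."""
    return form


lemma_aux = lemmatise("je", "Va-r3s-n", toy_lexicon, identity_predictor)
lemma_verb = lemmatise("je", "Vmpr3s", toy_lexicon, identity_predictor)
lemma_unknown = lemmatise("Ljubljana", "Npfsn", toy_lexicon, identity_predictor)
```

In this sketch the same surface form resolves to different lemmas depending only on the XPOS tag, which is why a single tag layer can serve both the lookup and the prediction step.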

The CLASSLA-Stanza model for lemmatisation of standard Serbian 2.1

2 resources

The model for lemmatisation of standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) combined with the Serbian non-standard training corpus ReLDI-NormTagNER-sr (http://hdl.handle.net/11356/1794) and using the srLex inflectional lexicon (http://hdl.handle.net/11356/1233). The estimated F1 of the lemma annotations is ~98.02. The difference to the previous version is that this version was trained on a combination of the standard (SETimes.SR) and non-standard (ReLDI-NormTagNER-sr) Serbian training corpora.

The CLASSLA-StanfordNLP model for lemmatisation of standard Serbian 1.1

2 resources

The model for lemmatisation of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) and using the srLex inflectional lexicon (http://hdl.handle.net/11356/1233). The estimated F1 of the lemma annotations is ~97.9. The difference to the previous version of the model is that it is trained with the lemmatiser padding bug removed, cf. https://github.com/stanfordnlp/stanfordnlp/issues/143.

The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.1

2 resources

The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). The estimated F1 of the lemma annotations is ~99.0. The difference to the previous version of the model is that it is trained with the lemmatiser padding bug removed, cf. https://github.com/stanfordnlp/stanfordnlp/issues/143.

The CLASSLA-StanfordNLP model for lemmatisation of standard Croatian

2 resources

The model for lemmatisation of standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1183) and using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). The estimated F1 of the lemma annotations is ~97.6.

The CLASSLA-StanfordNLP model for lemmatisation of standard Bulgarian 1.1

2 resources

The model for lemmatisation of standard Bulgarian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the BulTreeBank training corpus (http://hdl.handle.net/11495/D93F-C6E9-65D9-2) and using the Bulgarian inflectional lexicon (Popov, Simov, and Vidinska 1998). The estimated F1 of the lemma annotations is ~98.8. The difference to the previous version of the lemmatiser is that now it relies solely on XPOS annotations, and not on a combination of UPOS, FEATS (lexicon lookup) and XPOS (lemma prediction) annotations.

The CLASSLA-StanfordNLP model for lemmatisation of standard Slovenian 1.3

2 resources

The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). The estimated F1 of the lemma annotations is ~99.7. The difference to the previous version is that the internal lexicon is built on the lexicon training data only, and not on the (automatically XPOS-annotated) corpus data.

The CLASSLA-Stanza model for lemmatisation of standard Croatian 2.1

2 resources

The model for lemmatisation of standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the hr500k training corpus (http://hdl.handle.net/11356/1792) and using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). The estimated F1 of the lemma annotations is ~98.02. The difference to the previous version is that this version was trained on the new version of the hr500k corpus.

The CLASSLA-StanfordNLP model for lemmatisation of non-standard Slovenian 1.1

2 resources

The model for lemmatisation of non-standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and the Janes-Tag corpus (http://hdl.handle.net/11356/1238), using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~98.86. The difference to the previous version of the lemmatiser is that now it relies solely on XPOS annotations, and not on a combination of UPOS, FEATS (lexicon lookup) and XPOS (lemma prediction) annotations.
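The diacritic augmentation described above might look roughly like the following. This is a minimal sketch, not the procedure used to build the model: the function names, the `fraction` and `seed` parameters, the (form, lemma) sentence layout, and the choice to strip only the word forms while leaving the lemmas untouched are all assumptions made for illustration.

```python
import random
import unicodedata
from typing import List, Tuple

Sentence = List[Tuple[str, str]]  # assumed layout: (word form, lemma) pairs


def strip_diacritics(text: str) -> str:
    """Drop combining marks after canonical decomposition (č -> c, š -> s, ž -> z).

    Letters without a canonical decomposition, such as đ, are left unchanged.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def augment_with_stripped(sentences: List[Sentence], fraction: float = 0.5,
                          seed: int = 0) -> List[Sentence]:
    """Return the originals plus diacritic-free copies of a random subset.

    Assumption: only the word form is stripped; the lemma keeps its diacritics.
    """
    rng = random.Random(seed)
    k = round(len(sentences) * fraction)
    chosen = rng.sample(sentences, k)
    copies = [[(strip_diacritics(form), lemma) for form, lemma in sent]
              for sent in chosen]
    return list(sentences) + copies


corpus = [[("šola", "šola")], [("hiše", "hiša")]]
augmented = augment_with_stripped(corpus, fraction=1.0)
```

With `fraction=1.0` every sentence is repeated once without diacritics, so the model would see both "hiše" and "hise" paired with the lemma "hiša".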
