CLARIN Tool Portal

698 record(s) found

Search results

The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Bulgarian 1.0

3 resources

This model for morphosyntactic annotation of standard Bulgarian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the BulTreeBank training corpus (http://hdl.handle.net/11495/D93F-C6E9-65D9-2) and using the CoNLL2017 word embeddings (http://hdl.handle.net/11234/1-1989). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~96.8.

HaskEN

2 resources

HaskEN is an English phraseological database designed for language professionals including linguists, language teachers, lexicographers, language materials developers and translators. Query results can be visualised and exported as spreadsheets.

Poliqarp2

6 resources

Poliqarp2 is a linguistic search engine capable of searching large corpora annotated on multiple levels. It is not an upgraded version of Poliqarp but completely new software developed from scratch.

The CLASSLA-StanfordNLP model for morphosyntactic annotation of standard Slovenian 1.1

3 resources

This model for morphosyntactic annotation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~97.06. Unlike the previous version, this model predicts the whole XPOS tag rather than individual characters of the tag, as was the case in stanfordnlp; character-level prediction resulted in illegal XPOS tags and slightly decreased performance.

ENIAMtoolkit (2017-03-06)

2 resources

ENIAMtoolkit is a collection of libraries that perform tokenization, lemmatization and part-of-speech tagging; detect MWEs and abbreviations; split text into sentences; and perform LCG parsing.

NELexicon2

2 resources

NELexicon2 is an extended version of a gazetteer of proper names, containing over 2.3 million unique strings. NELexicon has been enriched with the following resources:
- diminutives of first names,
- foreign-language forms of Polish first names,
- names extracted from Polish Wikipedia infoboxes,
- inflected forms of names from Polish Wikipedia infoboxes, extracted from Wikipedia internal links,
- a list of names recognised by Liner2 with the 56 nam model that occur 5 or more times (because these names were recognised automatically, the list may contain incorrectly recognised names),
- inflected forms of names extracted from the Polish Wiktionary.

Toposław

2 resources

Toposław is an editor of multi-word unit inflection lexicons.

The CLASSLA-StanfordNLP model for lemmatisation of non-standard Serbian 1.1

2 resources

The model for lemmatisation of non-standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200), the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), the hr500k training corpus (http://hdl.handle.net/11356/1183) and the RAPUT corpus (https://www.aclweb.org/anthology/L16-1513/), using the srLex inflectional lexicon (http://hdl.handle.net/11356/1233). To handle missing diacritics, these corpora were additionally augmented by repeating parts of them with diacritics removed. The estimated F1 of the lemma annotations is ~97.62. Unlike the previous version, the lemmatizer now relies solely on XPOS annotations rather than on a combination of UPOS, FEATS (lexicon lookup) and XPOS (lemma prediction) annotations.
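
The diacritic augmentation mentioned in this description could look roughly like the sketch below. It is only an illustration, not the procedure actually used to build the model: the function names, the sampled fraction and the mapping of "đ" to "d" are all assumptions made for this example.

```python
import random
import unicodedata

# Assumption: "đ"/"Đ" have no Unicode decomposition, so they are mapped by hand.
# The choice of "d" (rather than "dj") is illustrative only.
EXTRA = str.maketrans({"đ": "d", "Đ": "D"})


def strip_diacritics(text: str) -> str:
    """Return text with combining marks removed (č -> c, š -> s, ...)."""
    decomposed = unicodedata.normalize("NFD", text.translate(EXTRA))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def augment(sentences, fraction=0.5, seed=0):
    """Append diacritic-free copies of a random part of the corpus.

    `fraction` and `seed` are hypothetical parameters for this sketch.
    """
    rng = random.Random(seed)
    sample = rng.sample(list(sentences), int(len(sentences) * fraction))
    return list(sentences) + [strip_diacritics(s) for s in sample]
```

Calling `augment(corpus_sentences)` on a list of sentence strings would return the original sentences followed by de-diacritised copies of a random half of them.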

The CLASSLA-Stanza model for morphosyntactic annotation of standard Macedonian 2.1

3 resources

This model for morphosyntactic annotation of standard Macedonian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the 1984 training corpus expanded with the Macedonian SETimes corpus (to be published) and using the Macedonian CLARIN.SI word embeddings (http://hdl.handle.net/11356/1788). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~97.14. Unlike the previous version, this version was trained on a larger training dataset and with the new version of the Macedonian word embeddings.

MUSCIMarker

2 resources

MUSCIMarker is an open-source tool for annotating visual objects and their relationships in binary images. It is implemented in Python, is known to run on Windows, Linux and OS X, and supports working offline. MUSCIMarker is being used to create a dataset of musical notation symbols, but it can support any object set. The online user documentation is currently (12.2016) incomplete, as it is continually changing to reflect annotators' comments and incorporate new features. This version of the software is *not* the final one, and it is under continuous development (we're currently working on adding grayscale image support with auto-binarization, and Android support for touch-based annotation). However, the current version (1.1) has already been used to annotate more than 100 pages of sheet music across all the major desktop OSes, and I believe it is already in a state where it can be useful beyond my immediate use case of gathering music notation data.


Result filters

Metadata provider

Language

Resource type

Type of tool

Tool task

Field of study

Availability

Organisation

Project

Keywords
