CLARIN Tool Portal

698 record(s) found

Search results

Miðeind's Neural Constituency Parser - v. 1.0

2 resources

The Miðeind neural constituency parser is an experimental variant of the Berkeley neural parser architecture. It is self-contained and runs plug-and-play from a Docker image. POS tags are currently not part of its constituency trees.

The input to the parser is the full path to a text file (${INPUT_FILE}) in which each line contains one sentence to be parsed. No prior tokenization is required. The output is written to ${OUTPUT_DIR}/output.txt as line-separated bracketed trees (the Icelandic version of this description says the trees are separated by a blank line). The output follows the bracketed tree format described at https://www.ling.upenn.edu/~janabeck/tutorial.html

To run the parser, use the following:

docker run --volume ${INPUT_FILE}:/data/input.txt --volume ${OUTPUT_DIR}:/data/ mideind/neural-parser:${TAG}
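To illustrate how the parser output might be consumed, here is a minimal Python sketch that reads bracketed trees and recovers the words of each sentence. It assumes Penn-style brackets with a phrase label as the first item of every node and bare words as leaves (since no POS tags are emitted). The labels `S`, `NP`, `VP` and the sample sentences are hypothetical and are not taken from real parser output.

```python
from typing import List


def parse_tree(line: str) -> list:
    """Parse one bracketed tree into nested lists: [label, child, child, ...]."""
    tokens = line.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node() -> list:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "(":
            raise ValueError("expected '('")
        pos += 1  # consume '('
        node: list = []
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read_node())
            else:
                node.append(tokens[pos])
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets")
        pos += 1  # consume ')'
        return node

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def leaves(tree: list) -> List[str]:
    """Collect the words of a tree, treating a leading string as the node label."""
    children = tree[1:] if tree and isinstance(tree[0], str) else tree
    words: List[str] = []
    for child in children:
        if isinstance(child, list):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


def read_trees(text: str) -> List[list]:
    """Parse every non-empty line, so blank separator lines are tolerated."""
    return [parse_tree(line) for line in text.splitlines() if line.strip()]


# Hypothetical sample in the assumed format, not real parser output.
sample = "(S (NP Hundurinn) (VP geltir))\n\n(S (NP Kötturinn) (VP sefur))\n"
trees = read_trees(sample)
sentences = [" ".join(leaves(t)) for t in trees]
```

In practice the `sample` string would be replaced by the contents of ${OUTPUT_DIR}/output.txt. Words that themselves contain brackets may be escaped in the real output; this sketch does not handle that case.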

WebStylo

2 resources

Web-based, open stylometry system based on Multilevel Text Analysis. It runs the cluto and stylo (R system) clustering methods and is built on a Natural Language Processing workflow engine, which is included in the distribution.

VIADAT-REPO+DEPOSIT

2 resources

VIADAT-REPO is an add-on module for the lindat-dspace platform that supports depositing oral-history data records, including their specific metadata workflow. It was created within the VIADAT project and will form part of a "virtual assistant" for processing, annotating, enriching and accessing audio and video recordings. This package also contains the VIADAT-DEPOSIT module, bundled with VIADAT-REPO to ease integration.

The CLASSLA-Stanza model for lemmatisation of standard Slovenian 2.0

2 resources

This model for lemmatisation of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The estimated F1 of the lemma annotations is ~99.11. Compared with the previous version, this model was trained on the SUK training corpus and uses new embeddings and the new version of the Slovene morphological lexicon, Sloleks 3.0 (http://hdl.handle.net/11356/1745).

Toposław 2 (2016-05-31)

3 resources

Toposław 2 is an editor of multi-word unit inflection lexicons.

ENIAM

4 resources

ENIAM: Categorial Syntactic-Semantic Parser for Polish

Liner2.6 model NER NKJP

3 resources

The package contains a pre-trained Liner2 (https://github.com/CLARIN-PL/Liner2) model for recognising named entities according to the NKJP guidelines. The model was trained on the NKJP corpus (http://nkjp.pl/) and evaluated in PolEval 2018 Task 2 (http://poleval.pl/tasks/), where it took third place with the following results: Exact 0.778, Overlap 0.818, Final 0.810.

References:
* NKJP corpus in TEI format: http://clip.ipipan.waw.pl/NationalCorpusOfPolish?action=AttachFile&do=view&target=NKJP-PodkorpusMilionowy-1.2.tar.gz
* PolEval 2018 Task 2 evaluation corpus: http://mozart.ipipan.waw.pl/~axw/poleval2018/
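The three reported scores appear to be related by a weighted average. The sketch below assumes the PolEval 2018 Task 2 weighting of 0.8 for the overlap score and 0.2 for the exact-match score; that weighting is not stated in this record and should be checked against the task description. The function name `poleval_final` is hypothetical.

```python
def poleval_final(exact: float, overlap: float, overlap_weight: float = 0.8) -> float:
    """Combine exact and overlap scores (assumed weighting: 0.8 overlap, 0.2 exact)."""
    return overlap_weight * overlap + (1.0 - overlap_weight) * exact


# Scores reported for the Liner2.6 NER NKJP model.
final = poleval_final(exact=0.778, overlap=0.818)
print(round(final, 3))
```

Under this assumed weighting, 0.8 × 0.818 + 0.2 × 0.778 = 0.6544 + 0.1556 = 0.810, which agrees with the reported Final score.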

The CLASSLA-Stanza model for lemmatisation of non-standard Croatian 2.1

2 resources

The model for lemmatisation of non-standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the hr500k training corpus (http://hdl.handle.net/11356/1792) and the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1793), using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). To handle missing diacritics, these corpora were additionally augmented by repeating parts of them with the diacritics removed. The estimated F1 of the lemma annotations is ~94.23. Unlike the previous version, this model is trained on a combination of two corpora (hr500k, ReLDI-NormTagNER-hr).
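The diacritic augmentation mentioned above could look roughly like the following Python sketch. It is an illustration only, not the procedure used to build the model: the function names, the `fraction` parameter, the mapping of đ to d, and the sample sentences are all assumptions.

```python
import random
import unicodedata
from typing import List

# 'đ' has no Unicode decomposition, so it is mapped explicitly (assumed mapping).
EXTRA_MAP = str.maketrans({"đ": "d", "Đ": "D"})


def strip_diacritics(text: str) -> str:
    """Remove combining marks (č -> c, š -> s, ž -> z, ć -> c) and map đ -> d."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped).translate(EXTRA_MAP)


def augment(sentences: List[str], fraction: float = 0.5, seed: int = 0) -> List[str]:
    """Return the corpus plus diacritic-free copies of a random subset of it."""
    rng = random.Random(seed)
    copies = [strip_diacritics(s) for s in sentences if rng.random() < fraction]
    return list(sentences) + copies


# Hypothetical sentences for illustration.
corpus = ["Čovjek šeće ulicom.", "Đak uči."]
augmented = augment(corpus, fraction=1.0)
```

With `fraction=1.0` every sentence is repeated once without diacritics; a smaller value repeats only part of the corpus, in line with the description of repeating "parts of the corpora".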

EVALD 3.0 – Evaluator of Discourse

3 resources

EVALD 3.0 automatically evaluates surface coherence (cohesion) in Czech texts written by native speakers of Czech.

The CLASSLA-Stanza model for semantic role labeling of standard Slovenian 2.0

2 resources

The model for semantic role labeling of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) extended with the MaCoCu-sl Slovenian web corpus (http://hdl.handle.net/11356/1517). The estimated F1 of the semantic role annotations is ~76.24. Compared with the previous version, this model was trained on the SUK training corpus and uses the updated word embeddings.

