698 record(s) found

Search results

  • Miðeind's Neural Constituency Parser - v. 1.0

    The Miðeind neural constituency parser is an experimental variant of the Berkeley neural parser architecture. It is self-contained and runs plug-and-play from a Docker image. POS tags are currently not part of its constituency trees. The input is the full absolute path to a text file (${INPUT_FILE}) in which each line contains one sentence to be parsed; no prior tokenization is required. The output is written to ${OUTPUT_DIR}/output.txt as line-separated bracketed trees, with a blank line between trees. To run the parser, use:

        docker run --volume ${INPUT_FILE}:/data/input.txt --volume ${OUTPUT_DIR}:/data/ mideind/neural-parser:${TAG}

    The output follows the bracketed tree format described at https://www.ling.upenn.edu/~janabeck/tutorial.html
  • WebStylo

    A web-based, open stylometry system built on multilevel text analysis. It runs the clustering methods of cluto and stylo (R system). It is based on a Natural Language Processing workflow engine, which is included in the distribution.
  • VIADAT-REPO+DEPOSIT

    VIADAT-REPO is an add-on module for the lindat-dspace platform that allows data records from the field of oral history to be deposited, including the field's specific metadata workflow. It was created within the VIADAT project and will be part of a "virtual assistant" for processing, annotating, enriching and accessing audio and video recordings. This package contains the VIADAT-DEPOSIT module, bundled with VIADAT-REPO to ease integration.
  • The CLASSLA-Stanza model for lemmatisation of standard Slovenian 2.0

    This model for lemmatisation of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The estimated F1 of the lemma annotations is ~99.11. Unlike the previous version, this model was trained on the SUK training corpus and uses the new embeddings and the new version of the Slovene morphological lexicon, Sloleks 3.0 (http://hdl.handle.net/11356/1745).
  • Liner2.6 model NER NKJP

    The package contains a pre-trained Liner2 (https://github.com/CLARIN-PL/Liner2) model for recognising named entities according to the NKJP guidelines. The model was trained on the NKJP corpus (http://nkjp.pl/) and evaluated in PolEval 2018 Task 2 (http://poleval.pl/tasks/), where it took third place with the following results: Exact: 0.778, Overlap: 0.818, Final: 0.810. References:
      * NKJP corpus in TEI format: http://clip.ipipan.waw.pl/NationalCorpusOfPolish?action=AttachFile&do=view&target=NKJP-PodkorpusMilionowy-1.2.tar.gz
      * PolEval 2018 Task 2 evaluation corpus: http://mozart.ipipan.waw.pl/~axw/poleval2018/
  • The CLASSLA-Stanza model for lemmatisation of non-standard Croatian 2.1

    The model for lemmatisation of non-standard Croatian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the hr500k training corpus (http://hdl.handle.net/11356/1792) and the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1793), using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). To handle missing diacritics, the corpora were augmented by repeating parts of them with the diacritics removed. The estimated F1 of the lemma annotations is ~94.23. Unlike the previous version, this model is trained on a combination of two corpora (hr500k and ReLDI-NormTagNER-hr).
  • The CLASSLA-Stanza model for semantic role labeling of standard Slovenian 2.0

    The model for semantic role labeling of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SUK training corpus (http://hdl.handle.net/11356/1747) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204) extended with the MaCoCu-sl Slovenian web corpus (http://hdl.handle.net/11356/1517). The estimated F1 of the semantic role annotations is ~76.24. Unlike the previous version, this model was trained on the SUK training corpus and uses the updated word embeddings.
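
The record for Miðeind's Neural Constituency Parser describes a docker invocation and an output file of bracketed trees. The following is a minimal Python sketch of how that workflow might be scripted, not part of the parser's distribution. The function names (build_docker_command, parse_bracketed, read_trees, leaves) are hypothetical helpers, and the phrase labels and tag value in the usage example are illustrative assumptions; the record does not list the parser's actual label set or available image tags.

```python
import os
import re


def build_docker_command(input_file, output_dir, tag):
    """Build the argv list for the docker command quoted in the record.

    The tag value is supplied by the caller; the record only shows ${TAG}.
    """
    return [
        "docker", "run",
        "--volume", f"{os.path.abspath(input_file)}:/data/input.txt",
        "--volume", f"{os.path.abspath(output_dir)}:/data/",
        f"mideind/neural-parser:{tag}",
    ]


def parse_bracketed(text):
    """Parse one bracketed tree into nested lists of the form [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            leaf = tokens[pos]
            pos += 1
            return leaf
        pos += 1  # skip "("
        items = []
        while tokens[pos] != ")":
            items.append(node())
        pos += 1  # skip ")"
        return items

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the end of the tree")
    return tree


def read_trees(output_text):
    """Parse every non-blank line of an output.txt-style string as one tree."""
    return [parse_bracketed(line) for line in output_text.splitlines() if line.strip()]


def leaves(tree):
    """Return the words of a parsed tree in sentence order."""
    if isinstance(tree, str):
        return [tree]
    # Skip the phrase label when the bracket starts with one.
    children = tree[1:] if tree and isinstance(tree[0], str) else tree
    words = []
    for child in children:
        words.extend(leaves(child))
    return words
```

A possible usage, assuming an illustrative tag and made-up labels:

```python
cmd = build_docker_command("/tmp/input.txt", "/tmp/out", "1.0")  # tag is an assumption
# subprocess.run(cmd, check=True) would start the container if docker is installed.

trees = read_trees("(S (NP Hundurinn) (VP hljóp))\n\n(S (NP Konan) (VP las))\n")
for tree in trees:
    print(leaves(tree))
```

Building the command as a list rather than a single string avoids shell quoting problems when the paths contain spaces. Unbalanced brackets in the input will surface as an IndexError in this sketch.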