CLARIN Tool Portal

698 record(s) found

Search results

VIADAT

2 resources

This component integrates other VIADAT modules; together with VIADAT-REPO this composes the Virtual Assistant for accessing historical audiovisual data. The zip archive contains sources for the following modules: VIADAT, VIADAT-DEPOSIT, VIADAT-TEXT, VIADAT-ANNOTATE, VIADAT-ANALYZE, VIADAT-STAT, VIADAT-GIS and VIADAT-SEARCH. Developed in cooperation with ÚSD AV ČR and NFA.

CorPipe 23 multilingual CorefUD 1.2 model (corpipe23-corefud1.2-240906)

2 resources

The `corpipe23-corefud1.2-240906` is a `mT5-large`-based multilingual model for coreference resolution usable in CorPipe 23 <https://github.com/ufal/crac2023-corpipe>. It is released under the CC BY-NC-SA 4.0 license. The model is language agnostic (no corpus id on input), so in theory it can be used to predict coreference in any `mT5` language. However, the model expects empty nodes to be already present in the input, as predicted by https://www.kaggle.com/models/ufal-mff/crac2024_zero_nodes_baseline/. This model was presented in the CorPipe 24 paper as an alternative to a single-stage approach, in which the empty nodes are predicted jointly with coreference resolution (via http://hdl.handle.net/11234/1-5672); the single-stage approach is roughly twice as fast but of slightly worse quality.

The CLASSLA-StanfordNLP model for named entity recognition of standard Bulgarian 1.0

3 resources

This model for named entity recognition of standard Bulgarian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the BulTreeBank training corpus (http://hdl.handle.net/11495/D93F-C6E9-65D9-2) and using the CoNLL2017 word embeddings (http://hdl.handle.net/11234/1-1989).

Universal Dependencies 1.2 Models for UDPipe

2 resources

Tokenizer, POS Tagger, Lemmatizer and Parser models for all Universal Dependencies 1.2 Treebanks, created solely using UD 1.2 data (http://hdl.handle.net/11234/1-1548). To use these models, you need the UDPipe binary, which you can download from http://ufal.mff.cuni.cz/udpipe.

EVALD 4.0 for Beginners – Evaluator of Discourse

3 resources

EVALD 4.0 for Beginners is software for the automatic evaluation of Czech texts written by non-native speakers of Czech who are language beginners.

Slovenian text summarization models

6 resources

A text summarization task aims to convert a longer text into a shorter text while preserving the essential information of the source text. In general, there are two approaches to text summarization. The extractive approach simply reproduces the most important sentences or parts of the text, whereas the abstractive approach is more similar to human-made summaries. We release 5 models that cover extractive, abstractive, and hybrid types:
Metamodel: a neural model based on the Doc2Vec document representation that suggests the best summarizer.
Graph-based model: an unsupervised graph-based extractive approach that returns the N most relevant sentences.
Headline model: a supervised abstractive approach (T5 architecture) that returns headline-like abstracts.
Article model: a supervised abstractive approach (T5 architecture) that returns short summaries.
Hybrid-long model: an unsupervised hybrid (graph-based and transformer model-based) approach that returns short summaries of long texts.
Details and instructions to run and train the models are available at https://github.com/clarinsi/SloSummarizer. The web service with a demo is available at https://slovenscina.eu/povzemanje.

Piper TTS (VITS) models for Talrómur1

6 resources

Models for four voices from the Talrómur [1] corpus, trained with VITS [2] and exported to onnxruntime [3] for Piper TTS [4]. The four voices are Búi, Salka, Steinn and Ugla.
[1] http://hdl.handle.net/20.500.12537/104
[2] https://github.com/jaywalnut310/vits/
[3] https://onnxruntime.ai/
[4] https://github.com/rhasspy/piper

Word embeddings CLARIN.SI-embed.mk 2.0

3 resources

CLARIN.SI-embed.mk contains word embeddings induced from a large collection of Macedonian texts crawled from the .mk top-level domain. The embeddings are based on the skip-gram model of fastText trained on 933,231,582 tokens of running text for 986,670 lowercased surface forms. This version differs from the previous version of the embeddings in that it was trained on the original dataset expanded with the MaCoCu-mk web crawl corpus (http://hdl.handle.net/11356/1512).

CroSloEngual BERT 1.1

4 resources

Trilingual BERT (Bidirectional Encoder Representations from Transformers) model, trained on Croatian, Slovenian, and English data. A state-of-the-art tool representing words/tokens as contextually dependent word embeddings, used for various NLP classification tasks by fine-tuning the model end-to-end. CroSloEngual BERT consists of neural network weights and configuration files in PyTorch format (i.e. to be used with the PyTorch library). Changes in version 1.1: fixed the vocab.txt file, as the previous version had an error causing very bad results during fine-tuning and/or evaluation.

Plumper

1 resource

Ontology mapper. Mapping plWordNet onto SUMO ontology.

