CLARIN Tool Portal

DigiLing e-Learning Hub: e-Courses for Digital Linguistics

8 resources

The files represent exported e-learning resources created within the DigiLing project, www.digiling.eu. We have identified seven core subjects in Digital Linguistics and built seven corresponding courses:

- Introduction to Text Processing and Analysis
- Introduction to Python for Linguists
- Computational Lexicology and Lexicography
- Localization Tools and Workflows
- Post-Editing Machine Translation
- Mining and Managing Multilingual Terminology
- Variability of Languages in Time and Space

The data format is .mbz, a compressed archive compatible with any e-learning environment running Moodle.
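Because .mbz is a compressed archive, its contents can be inspected before importing a course into Moodle. The following is a minimal sketch, assuming the archive is either a gzip-compressed tar file or a zip file (the two containers Moodle backups commonly use); the function name `list_mbz_members` is hypothetical and not part of the DigiLing package.

```python
import tarfile
import zipfile


def list_mbz_members(path):
    """Return the member names stored in a Moodle .mbz backup archive.

    Assumption: the archive is a (gzip-compressed) tar file or a zip file.
    """
    if tarfile.is_tarfile(path):
        # "r:*" lets tarfile detect the compression transparently.
        with tarfile.open(path, "r:*") as archive:
            return archive.getnames()
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    raise ValueError(f"{path} is neither a tar nor a zip archive")


if __name__ == "__main__":
    import sys

    # Usage: python list_mbz.py course.mbz
    for name in list_mbz_members(sys.argv[1]):
        print(name)
```

Listing the members first is a cheap way to confirm that a download is complete and is a real backup archive before uploading it to a Moodle instance.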

Universal Dependencies 2.0 Models for UDPipe (2017-08-01)

3 resources

Tokenizer, POS Tagger, Lemmatizer and Parser models for all 50 languages of the Universal Dependencies 2.0 Treebanks, created solely using UD 2.0 data (http://hdl.handle.net/11234/1-1983). The model documentation, including performance figures, can be found at http://ufal.mff.cuni.cz/udpipe/users-manual#universal_dependencies_20_models . To use these models, you need the UDPipe binary, version 1.2 or later, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to the models themselves, all additional data and the hyperparameter values used for training are available in the second archive, allowing reproducible training.
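As an illustration of how such a model might be used, the sketch below calls the UDPipe binary with its `--tokenize`, `--tag` and `--parse` options and reads the CoNLL-U output it prints. It is a sketch under stated assumptions, not the maintainers' recommended workflow: the helper names (`udpipe_command`, `parse_conllu`, `run_udpipe`) are hypothetical, and the binary is assumed to be on the `PATH` as `udpipe`.

```python
import subprocess


def udpipe_command(model_path, input_path, binary="udpipe"):
    """Build the argument list for a tokenize + tag + parse run."""
    return [binary, "--tokenize", "--tag", "--parse", model_path, input_path]


def parse_conllu(text):
    """Parse CoNLL-U text into sentences of (form, lemma, upos, head, deprel)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        head = int(cols[6]) if cols[6].isdigit() else None
        current.append((cols[1], cols[2], cols[3], head, cols[7]))
    if current:
        sentences.append(current)
    return sentences


def run_udpipe(model_path, input_path, binary="udpipe"):
    """Run UDPipe (assumed to be installed) and return the parsed sentences."""
    result = subprocess.run(
        udpipe_command(model_path, input_path, binary),
        capture_output=True, text=True, check=True,
    )
    return parse_conllu(result.stdout)
```

A call such as `run_udpipe("english-ud-2.0-170801.udpipe", "input.txt")` would return one list of token tuples per sentence; the model file name here is only illustrative and should be replaced with the file actually extracted from the archive.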

Translation Models (en-de) (v1.0)

2 resources

En-De translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). The models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyperparameters), please contact the archive maintainer. Evaluation on newstest2020 (BLEU, computed with multeval, https://github.com/jhclark/multeval): en->de 25.9, de->en 33.4.
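For readers who only need translations rather than the exported models, the hosted service can be queried over HTTP. The sketch below builds such a request with the Python standard library. Only the base URL comes from the description above; the `/api/v2/models/{src}-{tgt}` path and the `input_text` form field are assumptions that should be checked against the service's own API documentation, and the function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Base URL taken from the resource description.
BASE_URL = "https://lindat.mff.cuni.cz/services/translation"


def build_request(text, src="en", tgt="de"):
    """Build a POST request for one translation direction.

    Assumption: the endpoint path and the "input_text" field name are
    not confirmed by the resource description.
    """
    url = f"{BASE_URL}/api/v2/models/{src}-{tgt}"
    data = urlencode({"input_text": text}).encode("utf-8")
    return Request(url, data=data, headers={"Accept": "text/plain"}, method="POST")


def translate(text, src="en", tgt="de", timeout=30):
    """Send the request and return the response body as text."""
    with urlopen(build_request(text, src, tgt), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Separating `build_request` from `translate` keeps the network call in one place, so the request can be inspected or logged without contacting the service.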

Universal Dependencies 2.5 Models for UDPipe (2019-12-06)

97 resources

Tokenizer, POS Tagger, Lemmatizer and Parser models for 94 treebanks of 61 languages of the Universal Dependencies 2.5 Treebanks, created solely using UD 2.5 data (http://hdl.handle.net/11234/1-3105). The model documentation, including performance figures, can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_25_models . To use these models, you need the UDPipe binary, version 1.2 or later, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to the models themselves, all additional data and the hyperparameter values used for training are available in the second archive, allowing reproducible training.

Universal Dependencies 2.4 Models for UDPipe (2019-05-31)

93 resources

Tokenizer, POS Tagger, Lemmatizer and Parser models for 90 treebanks of 60 languages of the Universal Dependencies 2.4 Treebanks, created solely using UD 2.4 data (http://hdl.handle.net/11234/1-2988). The model documentation, including performance figures, can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_24_models . To use these models, you need the UDPipe binary, version 1.2 or later, which you can download from http://ufal.mff.cuni.cz/udpipe . In addition to the models themselves, all additional data and the hyperparameter values used for training are available in the second archive, allowing reproducible training.

WiKNN Text Classifier

2 resources

WiKNN is an online text classifier service for Polish and English texts. It supports hierarchical labelled classification of user-submitted texts with Wikipedia categories. WiKNN is available through a web-based interface (http://pelcra.clarin-pl.eu/tools/classifier/) and as a REST service with interactive documentation available at http://clarin.pelcra.pl/apidocs/wiknn.

CorPipe 24 Multilingual CorefUD 1.2 Model (corpipe24-corefud1.2-240906)

2 resources

`corpipe24-corefud1.2-240906` is an `mT5-large`-based multilingual model for coreference resolution, usable in CorPipe 24 (https://github.com/ufal/crac2024-corpipe). It is released under the CC BY-NC-SA 4.0 license. The model is language agnostic (no corpus id on input), so in theory it can be used to predict coreference in any `mT5` language. The model also jointly predicts the empty nodes needed for zero coreference. The paper introducing this model also presents an alternative two-stage approach that first predicts empty nodes (via https://www.kaggle.com/models/ufal-mff/crac2024_zero_nodes_baseline/) and then performs coreference resolution (via http://hdl.handle.net/11234/1-5673); this approach is roughly twice as slow but slightly better.

LiStr: Linguistic Structure Induction Toolkit

2 resources

This toolkit comprises tools and supporting scripts for unsupervised induction of dependency trees from raw texts or from texts with already assigned part-of-speech tags. It also includes scripts for simple machine translation based on unsupervised parsing and scripts for minimally supervised parsing into the Universal Dependencies style.

UDify Pretrained Model

3 resources

Pretrained model weights for the UDify model, and extracted BERT weights in pytorch-transformers format. Note that these weights slightly differ from those used in the paper.

TMODS:ENG-CZE -- query translation

3 resources

AMALACH project component TMODS:ENG-CZE; machine translation of queries from Czech to English. This archive contains models for the Moses decoder (binarized and pruned to allow for real-time translation) and configuration files for the MTMonkey toolkit. The aim of this package is to provide a full service for Czech->English translation that can easily be used as a component in a larger software solution. (The required tools are freely available and an installation guide is included in the package.) The translation models were trained on the CzEng 1.0 corpus and Europarl. The monolingual data for LM estimation additionally contains WMT news crawls until 2013.
