Marian multilingual translation model from Catalan into Romanian, Italian and Occitan. Primary CUNI submission for WMT21 Multilingual
Low-Resource Translation for Indo-European Languages Shared Task.
This submission contains trained end-to-end models for the Neural Monkey toolkit for Czech and English, solving three NLP tasks: machine translation, image captioning, and sentiment analysis.
The models are trained on standard datasets and achieve state-of-the-art or near state-of-the-art performance in the tasks.
The models are described in the accompanying paper.
The same models can also be invoked via the online demo: https://ufal.mff.cuni.cz/grants/lsd
There are several separate ZIP archives here, each containing one model solving one of the tasks for one language.
To use a model, you first need to install Neural Monkey: https://github.com/ufal/neuralmonkey
To ensure correct functioning of the model, please use the exact version of Neural Monkey specified by the commit hash stored in the 'git_commit' file in the model directory.
Each model directory contains a 'run.ini' Neural Monkey configuration file, to be used to run the model. See the Neural Monkey documentation to learn how to do that (you may need to update some paths to correspond to your filesystem organization).
The 'experiment.ini' file, which was used to train the model, is also included.
Then there are files containing the model itself, files containing the input and output vocabularies, etc.
For the sentiment analyzers, you should tokenize your input data using the Moses tokenizer: https://pypi.org/project/mosestokenizer/
For the machine translation, you do not need to tokenize the data, as this is done by the model.
For image captioning, you need to:
- download a trained ResNet: http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz
- clone the git repository with TensorFlow models: https://github.com/tensorflow/models
- preprocess the input images with the Neural Monkey 'scripts/imagenet_features.py' script (https://github.com/ufal/neuralmonkey/blob/master/scripts/imagenet_features.py) -- you need to specify the path to ResNet and to the TensorFlow models to this script
Feel free to contact the authors of this submission in case you run into problems!
The latinpipe-evalatin24-240520 is a PhilBerta-based model for LatinPipe 2024 <https://github.com/ufal/evalatin2024-latinpipe>, performing tagging, lemmatization, and dependency parsing of Latin, based on the winning entry to the EvaLatin 2024 <https://circse.github.io/LT4HALA/2024/EvaLatin> shared task. It is released under the CC BY-NC-SA 4.0 license.
EVALD 2.0 for Foreigners is a software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.
Marian NMT model for Catalan to Occitan translation. Primary CUNI submission for WMT21 Multilingual
Low-Resource Translation for Indo-European Languages Shared Task.
CUBBITT En-Fr translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/).
Models are compatible with Tensor2tensor version 1.6.6.
For details about the model training (data, model hyper-parameters), please contact the archive maintainer.
Evaluation on newstest2014 (BLEU):
en->fr: 38.2
fr->en: 36.7
(Evaluated using multeval: https://github.com/jhclark/multeval)
Tokenizer, POS Tagger, Lemmatizer and Parser models for 123 treebanks of 69 languages of Universal Depenencies 2.10 Treebanks, created solely using UD 2.10 data (https://hdl.handle.net/11234/1-4758). The model documentation including performance can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_210_models .
To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .
The PELCRA for NKJP search engine 2 provides access to the full National Corpus of Polish dataset (over 1.5 billion word tokens). In addition to linguistically motivated corpus queries, it supports a number of data exploration and visualisation features. Most of the functionality of the search engine is available through a REST web service. Access to the API is available upon request.
The version of the Tool Portal that you are currently using
is recording the behaviour of its user for testing purposes.
By pressing "Continue" below, you agree to the recording of your
actions while using this site. If you do not wish to agree to this,
please navigate away from this site.