CLARIN Tool Portal

Czech image captioning, machine translation, and sentiment analysis (Neural Monkey models)

7 resources

This submission contains trained end-to-end models for the Neural Monkey toolkit for Czech and English, solving three NLP tasks: machine translation, image captioning, and sentiment analysis. The models are trained on standard datasets and achieve state-of-the-art or near state-of-the-art performance on these tasks. They are described in the accompanying paper, and the same models can also be invoked via the online demo (https://ufal.mff.cuni.cz/grants/lsd).

There are several separate ZIP archives here, each containing one model that solves one of the tasks for one language. To use a model, first install Neural Monkey (https://github.com/ufal/neuralmonkey). To ensure the model works correctly, use the exact version of Neural Monkey specified by the commit hash stored in the 'git_commit' file in the model directory.

Each model directory contains a 'run.ini' Neural Monkey configuration file for running the model; see the Neural Monkey documentation for how to do this (you may need to update some paths to match your filesystem layout). The 'experiment.ini' file that was used to train the model is also included, along with the files containing the model itself, the input and output vocabularies, etc.

For the sentiment analyzers, tokenize your input data with the Moses tokenizer (https://pypi.org/project/mosestokenizer/). For machine translation, you do not need to tokenize the data, as the model does this itself.
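The version-pinning step described above can be scripted. The Python sketch below assumes that the 'git_commit' file holds a bare commit hash (possibly followed by a newline); the function names are hypothetical, and the sketch only produces the clone and checkout commands, leaving installation itself to the Neural Monkey documentation.

```python
import re
from pathlib import Path

NEURALMONKEY_REPO = "https://github.com/ufal/neuralmonkey"


def read_required_commit(model_dir):
    """Read the Neural Monkey commit hash from <model_dir>/git_commit.

    Assumption: the file contains only the hash, possibly with a trailing newline.
    """
    commit = (Path(model_dir) / "git_commit").read_text(encoding="utf-8").strip()
    # Fail loudly if the content does not look like a git hash (7 to 40 hex digits).
    if not re.fullmatch(r"[0-9a-fA-F]{7,40}", commit):
        raise ValueError(f"unexpected git_commit content in {model_dir}: {commit!r}")
    return commit


def checkout_commands(model_dir, target="neuralmonkey"):
    """Return argv lists that clone Neural Monkey and pin it to the model's commit."""
    commit = read_required_commit(model_dir)
    return [
        ["git", "clone", NEURALMONKEY_REPO, target],
        ["git", "-C", target, "checkout", commit],
    ]
```

Each returned command can then be executed with `subprocess.run(argv, check=True)`. Returning the commands instead of running them directly keeps the sketch easy to inspect before anything touches the filesystem.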
For image captioning, you need to:
- download a trained ResNet: http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz
- clone the git repository with TensorFlow models: https://github.com/tensorflow/models
- preprocess the input images with the Neural Monkey script 'scripts/imagenet_features.py' (https://github.com/ufal/neuralmonkey/blob/master/scripts/imagenet_features.py), passing it the paths to the downloaded ResNet and to the cloned TensorFlow models repository

Feel free to contact the authors of this submission if you run into problems!
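The first of these steps can be done with the Python standard library alone. This is a sketch: the function name is hypothetical, and because the layout of the ResNet archive is not described here, it simply returns every extracted file whose name contains ".ckpt".

```python
import tarfile
import urllib.request
from pathlib import Path

# URL of the trained ResNet listed in the instructions above.
RESNET_URL = "http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz"


def fetch_resnet(dest_dir, url=RESNET_URL, download=True):
    """Download (if needed) and unpack the ResNet archive; return checkpoint-like files.

    Assumption: the checkpoint files contain ".ckpt" in their names.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if download and not archive.exists():
        urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support extraction filters.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(p for p in dest.rglob("*") if ".ckpt" in p.name)
```

The returned path, together with the path to the cloned tensorflow/models repository, is what 'imagenet_features.py' needs. Check that script's argument definitions for the exact option names rather than guessing them.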

CUBBITT Translation Models (en-fr) (v1.0)

3 resources

CUBBITT En-Fr translation models, exported for TensorFlow Serving and available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). The models are compatible with Tensor2Tensor version 1.6.6. For details about model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2014 (BLEU, computed with multeval: https://github.com/jhclark/multeval): en->fr 38.2, fr->en 36.7.
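For readers unfamiliar with the metric: BLEU combines clipped n-gram precisions (n = 1 to 4) through a geometric mean and multiplies the result by a brevity penalty. The Python sketch below implements that standard corpus-level definition for pre-tokenized text with one reference per sentence and no smoothing. It is not multeval, whose tokenization and settings may differ, so it should not be expected to reproduce the reported scores exactly; the function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) for tokenized sentences, one reference each, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            # Counter intersection keeps min(hyp count, ref count): the "clipping" step.
            matches[n - 1] += sum((ngram_counts(hyp, n) & ngram_counts(ref, n)).values())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # log(0) is undefined; unsmoothed BLEU is 0 in this case
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only output shorter than the references is penalized.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

A hypothesis identical to its reference scores 100.0, while a corpus with no matching 4-gram scores 0.0 because no smoothing is applied; this is why BLEU is normally reported over a whole test set such as newstest2014 rather than per sentence.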

CUBBITT Translation Models (en-cs) (v1.0)

3 resources

CUBBITT En-Cs translation models, exported for TensorFlow Serving and available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). The models are compatible with Tensor2Tensor version 1.6.6. For details about model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2014 (BLEU, computed with multeval: https://github.com/jhclark/multeval): en->cs 27.6, cs->en 34.4.

Neural Machine Translation model for Slovene-English language pair RSDO-DS4-NMT 1.2.6

3 resources

This Neural Machine Translation model for the Slovene-English language pair was trained following the NVIDIA NeMo NMT AAYN ("Attention Is All You Need" Transformer) recipe (for details, see the official NVIDIA NeMo NMT documentation, https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/machine_translation/machine_translation.html, and the NVIDIA NeMo GitHub repository, https://github.com/NVIDIA/NeMo). It translates text from Slovene to English and vice versa. The training corpus was built from publicly available datasets, including Parallel corpus EN-SL RSDO4 1.0 (https://www.clarin.si/repository/xmlui/handle/11356/1457), as well as a small portion of proprietary data. In total, the training corpus consisted of 32,638,758 translation pairs and the validation corpus of 8,163 translation pairs. The model was trained on 64 GPUs and on the validation corpus reached a SacreBLEU score of 48.3191 (at epoch 37) for Slovene-to-English translation and 53.8191 (at epoch 47) for English-to-Slovene translation.

GreynirSeq Domain Translation Pipeline (22.06)

3 resources

This is a pipeline for creating GreynirSeq domain-aware translation models. A valid checkpoint of a base translation model based on mBART25 can be fine-tuned into a domain translation model, which can then be queried with a label for the requested domain. As the base, we recommend the English-Icelandic translation models available at https://repository.clarin.is/repository/xmlui/handle/20.500.12537/125 in the CLARIN-IS repository. The included preprocessing script expects the training corpus as a .tsv input file with three fields (domains, english, icelandic). The script finetune.sh fine-tunes the model until convergence. Finally, evaluate.sh computes BLEU on the Flores development set. See the README file for further details on setting up an environment and fetching data.
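Because the preprocessing script expects a three-field .tsv file, it can help to validate the training corpus before running it. The Python sketch below assumes UTF-8 text, tab separators, one record per line, and no header row, none of which is stated above; the helper names are hypothetical, and the 'domains' field is kept as an opaque string because its label format is not described.

```python
from typing import Iterable, Iterator, NamedTuple


class DomainPair(NamedTuple):
    domains: str    # domain label(s); format left opaque (assumption)
    english: str
    icelandic: str


def write_domain_tsv(path, rows: Iterable[DomainPair]) -> None:
    """Write one tab-separated (domains, english, icelandic) record per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for row in rows:
            for field in row:
                # Tabs or line breaks inside a field would corrupt the three-column layout.
                if any(c in field for c in "\t\r\n"):
                    raise ValueError(f"field contains a tab or line break: {field!r}")
            f.write("\t".join(row) + "\n")


def read_domain_tsv(path) -> Iterator[DomainPair]:
    """Yield records from a three-field TSV file, rejecting malformed lines."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines
            fields = line.split("\t")
            if len(fields) != 3:
                raise ValueError(f"{path}:{line_no}: expected 3 fields, got {len(fields)}")
            yield DomainPair(*fields)
```

Running `list(read_domain_tsv("train.tsv"))` before preprocessing surfaces the first malformed line with its line number, which is usually easier to debug than a failure deep inside the fine-tuning scripts.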

NeMo Neural Machine Translation service RSDO-DS4-NMT-API 1.0

2 resources

Neural Machine Translation service for NeMo AAYN Base models. For details about building such models, see the official NVIDIA NeMo documentation (https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/machine_translation/machine_translation.html) and the NVIDIA NeMo GitHub repository (https://github.com/NVIDIA/NeMo). A model for the SL-EN language pair can be downloaded from http://hdl.handle.net/11356/1736. The service accepts a source language, a target language, and either a single string or a list of strings to translate; the result has the same format as the request (a single string or a list of strings). The maximum accepted text length is 5,000 characters. Note that translating one 5,000-character text block on CPU uses all available cores, consumes up to 3 GB of RAM, and may take about 200 s (on a system with 24 vCPUs). See the service README.md for further details.
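Because a request is limited to 5,000 characters, longer documents have to be split on the client side. Since the service also accepts a list of strings and returns a list in the same format, the pieces can be sent together and the translations joined afterwards. The Python sketch below assumes the limit applies to each string in the list (the description does not say whether it is per string or per request); the function name is hypothetical, and splitting at spaces may cut sentences in half, so a sentence splitter would give better boundaries in practice.

```python
from typing import List

MAX_CHARS = 5000  # maximum accepted text length from the service description


def split_for_service(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters.

    Cuts just before the last space inside each window when there is one,
    otherwise cuts hard at `limit`. Concatenating the chunks restores `text`.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + limit, len(text))
        if end < len(text):
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut  # the next chunk begins with this space
        chunks.append(text[start:end])
        start = end
    return chunks
```

Keeping the chunks lossless (their concatenation equals the input) makes it easy to check that nothing was dropped before the list is sent to the service.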

Debiasing Algorithm through Model Adaptation

2 resources

Debiasing Algorithm through Model Adaptation (DAMA) is based on guarding stereotypical gender signals and model editing. DAMA is applied to specific modules prone to convey gender bias, as identified by causal tracing. Our novel method effectively reduces gender bias in LLaMA models on three diagnostic tests: generation, coreference (WinoBias), and stereotypical sentence likelihood (StereoSet). The method does not change the model’s architecture, parameter count, or inference cost. We have also shown that the model’s performance on language modeling and on a diverse set of downstream tasks is almost unaffected. This package contains both the source code and English, English-to-Czech, and English-to-German datasets.

MT: Moses-SMT (1.0)

5 resources

Moses phrase-based statistical machine translation (Moses PBSMT) is a system for developing and running machine translation models. It is distributed here as four packages:
1. Code from a GitHub repository to train and run models
2. Pretrained is-en system (Docker)
3. Pretrained en-is system (Docker)
4. Frontend to pre- and postprocess text for translation (Docker)
The models here are not exactly the same as those used for human evaluation. These models have additionally been trained on open dictionaries to extend their vocabularies.

GreynirT2T Serving - En--Is NMT Inference and Pre-trained Models (1.0)

3 resources

Code and models required to run the GreynirT2T Transformer NMT system for translation between English and Icelandic. Includes a Docker Compose file that starts a REST web server making the translation models available to clients.

GreynirT2T - En--Is NMT with Tensor2Tensor (1.0)

2 resources

A program library for training English-Icelandic neural machine translation systems, built on top of Tensor2Tensor and TensorFlow. Supports training with or without back-translated data.
