Active filters:

  • Tool task: Machine translation
33 record(s) found

Search results

  • EdUKate Czech-Ukrainian translation model 2024

    This package includes a Czech-to-Ukrainian translation model adapted for the educational domain. The model is exported into the TensorFlow Serving format (using Tensor2tensor version 1.6.6), so it can be used in the Charles Translator service (https://translator.cuni.cz) and in the web portal Škola s nadhledem. The model was developed within the EdUKate project, which aims to help mitigate language barriers between non-Czech-speaking children in the Czech Republic and education in the Czech school system. The project focuses on the development and dissemination of multilingual digital learning materials for students in primary and secondary schools.
  • Semi-supervised Icelandic-Polish Translation System (22.09)

    This bi-directional Icelandic-Polish translation model was trained with fairseq (https://github.com/facebookresearch/fairseq) using semi-supervised translation, starting from the mBART50 model. Training followed a multi-task curriculum of three tasks: the model first learned to denoise sentences, was then trained to translate using aligned parallel texts, and was finally given monolingual texts in both Icelandic and Polish, from which it iteratively created back-translations during training. For the PL-IS direction, the model achieves a BLEU score of 27.60 on held-out true parallel data and 15.30 on the out-of-domain Flores devset. For the IS-PL direction, it achieves 27.70 on the held-out true data and 13.30 on the Flores devset.
  • Translation Models (en-de) (v1.0)

    En-De translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). The models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2020 (BLEU): en->de: 25.9, de->en: 33.4 (evaluated using multeval: https://github.com/jhclark/multeval).
  • TMODS:ENG-CZE -- query translation

    AMALACH project component TMODS:ENG-CZE; machine translation of queries from Czech to English. This archive contains models for the Moses decoder (binarized and pruned to allow for real-time translation) and configuration files for the MTMonkey toolkit. The aim of this package is to provide a full service for Czech->English translation that can easily be used as a component in a larger software solution. (The required tools are freely available, and an installation guide is included in the package.) The translation models were trained on the CzEng 1.0 corpus and Europarl. The monolingual data for LM estimation additionally contains WMT news crawls until 2013.
  • GreynirTranslate - mBART25 NMT models for Translations between Icelandic and English (1.0)

    Provided are general-domain IS-EN and EN-IS translation models developed by Miðeind ehf. They are based on a multilingual BART model (https://arxiv.org/pdf/2001.08210.pdf) and fine-tuned for translation on parallel and back-translated data. The models were trained with the Fairseq sequence modeling toolkit, which is built on PyTorch. The package provides the model files, a sentencepiece subword-tokenizing model, and dictionary files for running the models locally. You can run the scripts infer-enis.sh and infer-isen.sh to test the models by translating sentences on the command line. For translating documents and evaluating results, you will need to binarize the data using fairseq-preprocess and use fairseq-generate for translating. Please refer to the Fairseq documentation for further information on running a pre-trained model: https://fairseq.readthedocs.io/en/latest/
  • Czech image captioning, machine translation, sentiment analysis and summarization (Neural Monkey models)

    This submission contains trained end-to-end models for the Neural Monkey toolkit for Czech and English, solving four NLP tasks: machine translation, image captioning, sentiment analysis, and summarization. The models are trained on standard datasets and achieve state-of-the-art or near state-of-the-art performance in the tasks. The models are described in the accompanying paper. The same models can also be invoked via the online demo: https://ufal.mff.cuni.cz/grants/lsd

    In addition to the models presented in the referenced paper (developed and published in 2018), we include models for automatic news summarization for Czech and English developed in 2019. The Czech models were trained using the SumeCzech dataset (https://www.aclweb.org/anthology/L18-1551.pdf), the English models were trained using the CNN-Daily Mail corpus (https://arxiv.org/pdf/1704.04368.pdf) using the standard recurrent sequence-to-sequence architecture.

    There are several separate ZIP archives here, each containing one model solving one of the tasks for one language. To use a model, you first need to install Neural Monkey: https://github.com/ufal/neuralmonkey. To ensure correct functioning of the model, please use the exact version of Neural Monkey specified by the commit hash stored in the 'git_commit' file in the model directory.

    Each model directory contains a 'run.ini' Neural Monkey configuration file, to be used to run the model. See the Neural Monkey documentation to learn how to do that (you may need to update some paths to correspond to your filesystem organization). The 'experiment.ini' file, which was used to train the model, is also included. Then there are files containing the model itself, files containing the input and output vocabularies, etc.

    For the sentiment analyzers, you should tokenize your input data using the Moses tokenizer: https://pypi.org/project/mosestokenizer/

    For the machine translation, you do not need to tokenize the data, as this is done by the model.

    For image captioning, you need to:
      - download a trained ResNet: http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz
      - clone the git repository with TensorFlow models: https://github.com/tensorflow/models
      - preprocess the input images with the Neural Monkey 'scripts/imagenet_features.py' script (https://github.com/ufal/neuralmonkey/blob/master/scripts/imagenet_features.py) -- you need to specify the path to ResNet and to the TensorFlow models to this script

    The summarization models require input that is tokenized with Moses Tokenizer (https://github.com/alvations/sacremoses) and lower-cased.

    Feel free to contact the authors of this submission in case you run into problems!
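
The Neural Monkey record above asks users to install the exact toolkit version named by the commit hash in each model's 'git_commit' file. A minimal Python sketch of that step is given below. It assumes the file holds the hash as plain text (the listing does not describe the file's format), and the helper names `read_pinned_commit` and `pinned_checkout_commands` as well as the clone directory are hypothetical. The sketch only assembles the git commands and does not run them.

```python
import re
from pathlib import Path

# Repository URL as given in the record above.
NEURAL_MONKEY_REPO = "https://github.com/ufal/neuralmonkey"


def read_pinned_commit(model_dir):
    """Return the commit hash stored in <model_dir>/git_commit.

    Assumption: the file contains the hash as plain text; only the first
    whitespace-separated token is used.
    """
    tokens = Path(model_dir, "git_commit").read_text(encoding="utf-8").split()
    if not tokens:
        raise ValueError("git_commit file is empty")
    commit = tokens[0]
    # Accept abbreviated (7+ characters) or full (40 characters) hex hashes.
    if not re.fullmatch(r"[0-9a-fA-F]{7,40}", commit):
        raise ValueError(f"not a commit hash: {commit!r}")
    return commit


def pinned_checkout_commands(model_dir, clone_dir="neuralmonkey"):
    """Build (but do not execute) the git commands that pin Neural Monkey."""
    commit = read_pinned_commit(model_dir)
    return [
        ["git", "clone", NEURAL_MONKEY_REPO, clone_dir],
        ["git", "-C", clone_dir, "checkout", commit],
    ]


if __name__ == "__main__":
    import sys

    # Usage: python pin_neuralmonkey.py path/to/unpacked/model_dir
    for command in pinned_checkout_commands(sys.argv[1]):
        print(" ".join(command))
```

Each returned command is an argument list, so it can be passed to `subprocess.run(command, check=True)` once the printed commands have been reviewed.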
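
For the GreynirTranslate record above, the document-translation route (binarize with fairseq-preprocess, then translate with fairseq-generate) could look roughly like the Python sketch below, which only assembles the two command lines. Everything specific in it is an assumption: the file and directory names, the language codes, and the helper names are hypothetical, and the input is assumed to be already segmented with the package's sentencepiece model. The mBART-specific options the models need are not given in the listing, so they are left to an `extra_args` placeholder; the bundled infer-enis.sh and infer-isen.sh scripts are the place to look for the real values.

```python
import shlex


def preprocess_command(src, tgt, test_prefix, src_dict, tgt_dict, dest_dir):
    """Build a fairseq-preprocess call that binarizes a test set.

    Expects the files <test_prefix>.<src> and <test_prefix>.<tgt> to exist
    and reuses the dictionary files shipped with the model.
    """
    return [
        "fairseq-preprocess",
        "--source-lang", src,
        "--target-lang", tgt,
        "--testpref", test_prefix,
        "--srcdict", src_dict,
        "--tgtdict", tgt_dict,
        "--destdir", dest_dir,
    ]


def generate_command(data_dir, model_path, src, tgt, beam=5, extra_args=()):
    """Build a fairseq-generate call over the binarized test set."""
    return [
        "fairseq-generate", data_dir,
        "--path", model_path,
        "--source-lang", src,
        "--target-lang", tgt,
        "--gen-subset", "test",
        "--beam", str(beam),
        # Undo sentencepiece segmentation in the printed hypotheses.
        "--remove-bpe", "sentencepiece",
        *extra_args,  # placeholder for model-specific options (assumption)
    ]


if __name__ == "__main__":
    # All paths and language codes below are hypothetical placeholders.
    steps = [
        preprocess_command("is", "en", "data/test.spm",
                           "model/dict.txt", "model/dict.txt", "data-bin"),
        generate_command("data-bin", "model/checkpoint.pt", "is", "en"),
    ]
    for step in steps:
        print(shlex.join(step))
```

Building the commands as argument lists keeps quoting out of the picture; `shlex.join` is used only to print them in a form that can be pasted into a shell.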