CLARIN Tool Portal

CUBBITT Translation Models (en-cs) (v1.0)

3 resources

CUBBITT En-Cs translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2014 (BLEU): en->cs: 27.6, cs->en: 34.4 (evaluated using multeval: https://github.com/jhclark/multeval).


EVALD 2.0 for Foreigners

3 resources

EVALD 2.0 for Foreigners is software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.


EVALD 4.0 for Foreigners – Evaluator of Discourse

3 resources

EVALD 4.0 for Foreigners is software for automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.


NameTag

1 resource

NameTag is an open-source tool for named entity recognition (NER). NameTag identifies proper names in text and classifies them into predefined categories, such as names of persons, locations, organizations, etc.

LINDAT Translation

1 resource

The input file size is limited to 100kB. Translates from->to:
Czech->English, Hindi, French, Russian, German
English->Russian, German, Czech, Hindi, French
Russian->German, French, Czech, Hindi, English
German->Russian, Hindi, Czech, English, French
French->Russian, German, Czech, English, Hindi
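The size limit and the language pairs listed above can be expressed as a small pre-flight check before submitting a file. The following is only a sketch: the names `SUPPORTED_PAIRS`, `MAX_INPUT_BYTES`, `can_translate` and `check_request` are hypothetical and not part of the service, and it assumes that "100kB" means 100 * 1000 bytes measured on UTF-8 encoded text, which the listing does not specify.

```python
# Language pairs copied from the service description; names are illustrative.
SUPPORTED_PAIRS = {
    "Czech": {"English", "Hindi", "French", "Russian", "German"},
    "English": {"Russian", "German", "Czech", "Hindi", "French"},
    "Russian": {"German", "French", "Czech", "Hindi", "English"},
    "German": {"Russian", "Hindi", "Czech", "English", "French"},
    "French": {"Russian", "German", "Czech", "English", "Hindi"},
}

# Assumption: "100kB" is read as 100 * 1000 bytes; the service may count differently.
MAX_INPUT_BYTES = 100 * 1000


def can_translate(source: str, target: str) -> bool:
    """Return True if the listing names `target` as an output for `source`."""
    return target in SUPPORTED_PAIRS.get(source, set())


def check_request(text: str, source: str, target: str) -> None:
    """Raise ValueError if the pair is not listed or the input is too large."""
    if not can_translate(source, target):
        raise ValueError(f"Pair {source}->{target} is not listed as supported")
    # Assumption: size is measured on the UTF-8 encoding of the text.
    size = len(text.encode("utf-8"))
    if size > MAX_INPUT_BYTES:
        raise ValueError(f"Input is {size} bytes, limit is {MAX_INPUT_BYTES}")
```

Note that Hindi appears only as a target language in the listing, so `can_translate("Hindi", "Czech")` is false under this table.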

UDPipe

1 resource

UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files. UDPipe is language-agnostic and can be trained given only annotated data in CoNLL-U format. Trained models are provided for nearly all UD treebanks.
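For readers unfamiliar with CoNLL-U, the format stores one token per line in ten tab-separated columns, with comment lines starting with `#` and a blank line between sentences. The sketch below reads that layout in plain Python. It does not use UDPipe itself; the function name `parse_conllu` and the sample sentence are made up for illustration, and multiword-token and empty-node lines are kept as ordinary rows without special handling.

```python
from typing import Dict, List

# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped here.
            continue
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(f"Expected 10 columns, got {len(columns)}: {line!r}")
        current.append(dict(zip(FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample sentence, written by hand for this example.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

for token in parse_conllu(SAMPLE)[0]:
    print(token["id"], token["form"], token["upos"], token["head"], token["deprel"])
```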

EduPo: Analysis and Generation of Czech Poetry, v0.5

2 resources

A suite of tools for analysis and generation of Czech poetry. This is a snapshot of the public GitHub repository at https://github.com/ufal/edupo: the beta version of the tool suite, released together with a scientific paper at the NLP4DH 2025 conference.


Generator of Czech lyrics according to structure

3 resources

Fine-tuned Czech TinyLlama model (https://huggingface.co/BUT-FIT/CSTinyLlama-1.2B) and Czech GPT2 small model (https://huggingface.co/lchaloupsky/czech-gpt2-oscar) that generate lyrics of song sections based on the provided syllable counts, keywords and rhyme scheme. The TinyLlama-based model yields better results; however, the GPT2-based model can run locally. Both models are discussed in the bachelor thesis "Generation of Czech Lyrics to Cover Songs".


Debiasing Algorithm through Model Adaptation

2 resources

Debiasing Algorithm through Model Adaptation (DAMA) is based on guarding stereotypical gender signals and model editing. DAMA is performed on specific modules prone to convey gender bias, as shown by causal tracing. Our novel method effectively reduces gender bias in LLaMA models in three diagnostic tests: generation, coreference (WinoBias), and stereotypical sentence likelihood (StereoSet). The method does not change the model’s architecture, parameter count, or inference cost. We have also shown that the model’s performance in language modeling and a diverse set of downstream tasks is almost unaffected. This package contains both the source code and English, English-to-Czech, and English-to-German datasets.


THEaiTRobot 2.0

4 resources

The THEaiTRobot 2.0 tool allows the user to interactively generate scripts for individual theatre play scenes. The previous version of the tool (http://hdl.handle.net/11234/1-3507) was based on the GPT-2 XL generative language model, used without any fine-tuning, as we found that with a prompt formatted as a part of a theatre play script, the model usually generates a continuation that retains the format. The current version also uses vanilla GPT-2 by default, but can instead use a GPT-2 medium model fine-tuned on theatre play scripts (as well as film and TV series scripts). Apart from the basic "flat" generation using a theatrical starting prompt and the script model, the tool also features a second, hierarchical variant, where in the first step, a play synopsis is generated from its title using a synopsis model (GPT-2 medium fine-tuned on synopses of theatre plays, as well as film, TV series and book synopses). The synopsis is then used as input for the second stage, which uses the script model. The models are selected by setting the MODEL variable in start_server.sh and start_syn_server.sh. THEaiTRobot 2.0 was used to generate the second THEaiTRE play, "Permeation/Prostoupení".
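The difference between the flat and the hierarchical variant described above can be sketched as two small functions that take the models as plain callables. This is not the tool's actual code: `generate_flat`, `generate_hierarchical` and the two toy models are hypothetical stand-ins, and a real setup would plug in the GPT-2 based script and synopsis models instead.

```python
from typing import Callable, Dict

# A "model" here is any function mapping an input text to generated text.
Generator = Callable[[str], str]


def generate_flat(prompt: str, script_model: Generator) -> str:
    """Flat variant: the script model continues a theatrical starting prompt."""
    return prompt + script_model(prompt)


def generate_hierarchical(title: str,
                          synopsis_model: Generator,
                          script_model: Generator) -> Dict[str, str]:
    """Hierarchical variant: title -> synopsis -> script."""
    # Stage 1: the synopsis model produces a synopsis from the title.
    synopsis = synopsis_model(title)
    # Stage 2: the synopsis is the input for the script model.
    script = script_model(synopsis)
    return {"title": title, "synopsis": synopsis, "script": script}


# Toy stand-ins so the sketch runs without any language model (assumptions).
def toy_synopsis_model(title: str) -> str:
    return f"A play about {title.lower()}."


def toy_script_model(text: str) -> str:
    return f"[scene continuing from: {text}]"


result = generate_hierarchical("The Robot", toy_synopsis_model, toy_script_model)
print(result["synopsis"])
print(result["script"])
```

Passing the models in as arguments keeps the two stages independent, which mirrors how the description treats the script model and the synopsis model as separately configurable.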

