CLARIN Tool Portal

WebRICE - An Open Source Web Reader (21.06)

2 resources

WebRICE (Web Reader ICE) is an open source web reader in development at Reykjavik University. We hope that Icelandic developers will add this free software to their websites so that Icelandic audiences can listen to the web instead of reading it. For users, a WebRICE browser extension is also available.

UDPipe

2 resources

UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files. UDPipe is language-agnostic and can be trained given only annotated data in CoNLL-U format. Trained models are provided for nearly all UD treebanks. UDPipe is available as a binary, as a library for C++, Python, Perl, Java and C#, and as a web service. UDPipe is free software under the Mozilla Public License 2.0 (http://www.mozilla.org/MPL/2.0/). The linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license (http://creativecommons.org/licenses/by-nc-sa/4.0/), although for some models the original data used to create the model may impose additional licensing conditions. UDPipe is versioned using Semantic Versioning (http://semver.org/). The UDPipe website http://ufal.mff.cuni.cz/udpipe contains download links for both the released packages and the trained models, hosts the documentation and offers an online demo. The UDPipe development repository http://github.com/ufal/udpipe is hosted on GitHub.
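For readers unfamiliar with the CoNLL-U format that UDPipe reads and writes, the following is a minimal sketch of a reader written in plain Python. It does not use the UDPipe bindings; the function name `parse_conllu`, the field names in `CONLLU_FIELDS` and the sample sentence are illustrative assumptions. It relies only on the general layout of CoNLL-U: one token per line with ten tab-separated columns, comment lines starting with `#`, and a blank line between sentences.

```python
from typing import Dict, List

# The ten tab-separated CoNLL-U columns, in order (names chosen for this sketch).
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dictionaries."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped here.
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        # Handle input that does not end with a blank line.
        sentences.append(current)
    return sentences


# Hypothetical sample input, hand-written for illustration.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)

if __name__ == "__main__":
    for token in parse_conllu(SAMPLE)[0]:
        print(token["id"], token["form"], token["lemma"], token["upos"], token["deprel"])
```

All values are kept as strings; a fuller reader would also need to handle multiword token ranges and empty nodes, which this sketch does not interpret.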

Debiasing Algorithm through Model Adaptation

2 resources

Debiasing Algorithm through Model Adaptation (DAMA) is based on guarding stereotypical gender signals and model editing. DAMA is performed on specific modules prone to convey gender bias, as shown by causal tracing. Our novel method effectively reduces gender bias in LLaMA models in three diagnostic tests: generation, coreference (WinoBias), and stereotypical sentence likelihood (StereoSet). The method does not change the model’s architecture, parameter count, or inference cost. We have also shown that the model’s performance in language modeling and a diverse set of downstream tasks is almost unaffected. This package contains both the source code and the English, English-to-Czech, and English-to-German datasets.

Corpus extraction tool LIST 1.3

2 resources

The LIST corpus extraction tool is a Java program for extracting lists from text corpora on the levels of characters, word parts, words, and word sets. It supports VERT and TEI P5 XML formats and outputs .CSV files that can be imported into Microsoft Excel or similar statistical processing software. Version 1.3 adds support for the KOST 2.0 Slovene Learner Corpus (http://hdl.handle.net/11356/1887) in XML format. It also allows program execution using the command line (see 00README.txt for details), and uses a later version of Java (tested using JDK 21). In addition, Windows users no longer need to have Java installed on their computers to run the program.
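As a rough illustration of what a frequency list at the word and character levels looks like when exported as CSV, here is a minimal Python sketch. It is not the LIST implementation (LIST is a Java program that reads VERT and TEI P5 XML); the helper names `frequency_list` and `to_csv`, the column headers and the toy token sequence are all assumptions made for this example.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def frequency_list(items: Iterable[str]) -> List[Tuple[str, int]]:
    """Count items and sort by descending frequency, then alphabetically."""
    counts = Counter(items)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


def to_csv(rows: Sequence[Tuple[str, int]], header: Tuple[str, str]) -> str:
    """Serialise a frequency list as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Toy, already-tokenised input standing in for a corpus.
tokens = "the cat saw the dog".split()

# Word-level list and character-level list from the same tokens.
word_rows = frequency_list(tokens)
char_rows = frequency_list(ch for tok in tokens for ch in tok)

if __name__ == "__main__":
    print(to_csv(word_rows, ("word", "frequency")))
    print(to_csv(char_rows, ("character", "frequency")))
```

The resulting CSV text can be written to a file and opened in a spreadsheet program, which mirrors the workflow the description outlines.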

WordnetLoom 2.0

4 resources

WordnetLoom 2.0 executable files for plWordNet 4.0. Source code is available at https://github.com/CLARIN-PL/WordnetLoom. WordnetLoom is a wordnet editor application built for the construction of the largest Polish wordnet, plWordNet. WordnetLoom provides two means of interaction: a form-based one, implemented initially, and a visual, graph-based one, introduced more recently. The graph-based interactive presentation of the wordnet structure enables browsing and direct editing of the structure of lexico-semantic relations and synsets. WordnetLoom works in a distributed environment, i.e. several linguists can work simultaneously from different sites on the same central database.

MT: Moses-SMT (1.0)

5 resources

Moses phrase-based statistical machine translation (Moses PBSMT) is a system used to develop and run machine translation models. It is distributed here as four packages: (1) code from a GitHub repository to train and run models; (2) a pretrained is-en system (Docker); (3) a pretrained en-is system (Docker); (4) a frontend to pre- and postprocess text for translation (Docker). The models here are not exactly the same as those used for human evaluation: they have additionally been trained on open dictionaries to extend their vocabularies.

ZRCola 2

2 resources

ZRCola is an input system designed mainly, although not exclusively, for linguistic use. It allows the user to combine basic letters with any diacritic marks and insert the resulting complex characters into texts with ease. The system consists of an input program and a font, which can also be installed separately. The font is based on the Unicode standard and includes a vastly enlarged set of Latin, Cyrillic and other characters for Slavic writing systems in the Private Use Area.

GreynirT2T Serving - En--Is NMT Inference and Pre-trained Models (1.0)

3 resources

Code and models required to run the GreynirT2T Transformer NMT system for translation between English and Icelandic. Includes a Docker-Compose file that starts a REST web server making the translation models available to clients.

RÚV-DI Speaker Diarization (20.09)

2 resources

This is a set of speaker diarization recipes that depend on the Kaldi speech toolkit. There are two types of recipes: recipes for decoding unseen audio, and recipes for training diarization models on the RÚV-DI data. The tool also lists the diarization error rate (DER) on the RÚV-DI dataset for most of the recipes. All DERs reported in this tool are scored with no unscored collars and include overlapping speech.
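As background on the metric, DER is conventionally the sum of missed speech, false-alarm speech and speaker-confusion time, divided by the total reference speech time. Scoring with no collar means that no tolerance window around reference speaker boundaries is excluded, and including overlapping speech means that regions with several simultaneous speakers count towards the totals. The sketch below only shows the final arithmetic on already-measured durations; the function name and the example numbers are hypothetical and are not taken from the RÚV-DI recipes or their results.

```python
def diarization_error_rate(
    missed: float,
    false_alarm: float,
    confusion: float,
    total_reference: float,
) -> float:
    """Return DER as a fraction, given error durations in seconds.

    missed          -- reference speech the system did not label as speech
    false_alarm     -- system speech where the reference has none
    confusion       -- speech attributed to the wrong speaker
    total_reference -- total reference speech time (overlap counted per speaker)
    """
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (missed + false_alarm + confusion) / total_reference


if __name__ == "__main__":
    # Hypothetical durations, for illustration only.
    der = diarization_error_rate(
        missed=3.0, false_alarm=2.0, confusion=5.0, total_reference=100.0
    )
    print(f"DER = {der:.1%}")
```

Because false alarms are counted against the reference duration only, DER can exceed 100% for a very poor system.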

GreynirT2T - En--Is NMT with Tensor2Tensor (1.0)

2 resources

A program library for training English-Icelandic neural machine translation systems, built on top of Tensor2Tensor and TensorFlow. Supports training with or without back-translated data.
