CLARIN Tool Portal

GlowTTS models for Talrómur1 (22.10)

6 resources

This release contains GlowTTS models for four different voices from the Talrómur 1 [1] corpus. The models were trained using the Coqui TTS library after it was adapted for Icelandic. For each voice, the release includes the model, its configuration file, the training log and the training recipe.

[1] http://hdl.handle.net/20.500.12537/104


Semi-supervised Icelandic-Polish Translation System (22.09)

8 resources

This bi-directional Icelandic-Polish translation model was trained with fairseq (https://github.com/facebookresearch/fairseq) using semi-supervised translation, starting from the mBART50 model. The model was trained with a multi-task curriculum: first to denoise sentences, then to translate using aligned parallel texts. Finally, the model was given monolingual texts in both Icelandic and Polish, from which it iteratively creates back-translations. For the PL-IS direction the model achieves a BLEU score of 27.60 on held-out true parallel data and 15.30 on the out-of-domain Flores devset. For the IS-PL direction the model achieves a BLEU score of 27.70 on the true data and 13.30 on the Flores devset.
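The back-translation step described above can be illustrated with a minimal sketch. It is not the fairseq training code: the two `toy_*` functions are hypothetical stand-ins for the model's two translation directions, and the sentences are made-up examples. The point is only that each real monolingual sentence becomes the target side of a synthetic training pair.

```python
from typing import Callable, List, Tuple

Translator = Callable[[str], str]


def back_translate(monolingual: List[str],
                   reverse_model: Translator) -> List[Tuple[str, str]]:
    """Pair each real target sentence with a machine-made source sentence."""
    return [(reverse_model(sentence), sentence) for sentence in monolingual]


# Hypothetical stand-ins for the two directions of the real model.
def toy_is_to_pl(text: str) -> str:
    return "pl:" + text


def toy_pl_to_is(text: str) -> str:
    return "is:" + text


polish_mono = ["Dzień dobry", "Dziękuję"]
icelandic_mono = ["Góðan daginn"]

# IS->PL training pairs: the Polish side is real, the Icelandic side is synthetic.
is_pl_pairs = back_translate(polish_mono, toy_pl_to_is)
# PL->IS training pairs: the Icelandic side is real, the Polish side is synthetic.
pl_is_pairs = back_translate(icelandic_mono, toy_is_to_pl)
```

In an iterative setup, the synthetic pairs would be regenerated as the model improves, so later rounds train on cleaner source sentences.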


Piper TTS (VITS) models for Talrómur1

6 resources

Models for four voices from the Talrómur [1] corpus, trained with VITS [2] and exported to ONNX Runtime [3] format for Piper TTS [4]. The four voices are Búi, Salka, Steinn and Ugla.

[1] http://hdl.handle.net/20.500.12537/104
[2] https://github.com/jaywalnut310/vits/
[3] https://onnxruntime.ai/
[4] https://github.com/rhasspy/piper
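As a rough illustration of how such an exported voice might be used, the sketch below builds a Piper command line and pipes text to it. The `--model` and `--output_file` flags follow the Piper README; the file names `bui.onnx` and `out.wav` are hypothetical, and it is assumed that the `piper` executable is on the PATH and that the model's JSON config sits next to the `.onnx` file.

```python
import subprocess
from typing import List


def piper_command(model_path: str, output_wav: str) -> List[str]:
    """Build the Piper invocation; the text to speak is read from stdin."""
    return ["piper", "--model", model_path, "--output_file", output_wav]


def synthesize(text: str, model_path: str, output_wav: str) -> None:
    """Run Piper on the given text (requires Piper to be installed)."""
    subprocess.run(piper_command(model_path, output_wav),
                   input=text.encode("utf-8"), check=True)


# Hypothetical file names; the actual names in the release may differ.
cmd = piper_command("bui.onnx", "out.wav")
```

Calling `synthesize("Góðan daginn", "bui.onnx", "out.wav")` would then write the audio file, provided the assumptions above hold.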


Multi-speaker GlowTTS model for Talrómur 2 (prerelease) (22.10)

3 resources

This release includes a partially trained multi-speaker model using the GlowTTS architecture in the Coqui TTS library [1]. The model is trained on all of the speakers in the Talrómur 2 [2] corpus. The release includes the model, the training log, the model configuration file and the recipe used to train the model. The included model is the best one produced by training up to the time of publishing. At run time, any of the Talrómur 2 voices can be selected to produce a similar-sounding synthesized voice.

[1] https://github.com/cadia-lvl/coqui-ai-TTS/releases/tag/M9
[2] http://hdl.handle.net/20.500.12537/167
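Selecting a voice at run time might look like the sketch below, which only assembles a command line. The flags are those of the upstream Coqui TTS `tts` command; whether the adapted fork referenced in [1] accepts exactly the same flags is an assumption, and the file names and the speaker identifier `speaker_a` are hypothetical.

```python
from typing import List, Optional


def coqui_tts_command(text: str, model_path: str, config_path: str,
                      out_path: str, speaker: Optional[str] = None) -> List[str]:
    """Build a `tts` invocation, optionally selecting one speaker."""
    cmd = ["tts", "--text", text,
           "--model_path", model_path,
           "--config_path", config_path,
           "--out_path", out_path]
    if speaker is not None:
        # Assumed flag for choosing a voice in a multi-speaker model.
        cmd += ["--speaker_idx", speaker]
    return cmd


# Hypothetical file names and speaker identifier.
cmd = coqui_tts_command("Góðan daginn", "best_model.pth", "config.json",
                        "out.wav", speaker="speaker_a")
```

The resulting list can be passed to `subprocess.run` once the library is installed and the real file names from the release are substituted.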


RÚV-DI Speaker Diarization v5 models (21.05)

2 resources

This archive contains files generated from the recipe in kaldi-speaker-diarization/v5/. Its contents should be placed in a similar directory, with symbolic links to diarization/, sid/, steps/, etc. It was created when Kaldi's master branch was at git commit 321d3959dabf667ea73cc98881400614308ccbbb.

The v5 models are trained on the Althingi Parliamentary Speech corpus, available on malfong.is. They use MFCCs, x-vectors, PLDA and AHC. The recipe uses the Icelandic RÚV-DI corpus as two hold-out sets for tuning parameters. The RÚV-DI corpus is currently not publicly available.
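The final clustering stage (AHC, agglomerative hierarchical clustering) can be sketched in a few lines. This is a toy illustration, not the Kaldi recipe: it uses average-linkage cosine distance in place of PLDA scores, two-dimensional vectors in place of x-vectors, and an arbitrary stopping threshold of the kind that would normally be tuned on held-out data.

```python
import math
from typing import List


def cosine_distance(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def ahc(embeddings: List[List[float]], threshold: float) -> List[int]:
    """Merge the closest clusters until the best distance exceeds the threshold."""
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average pairwise distance between members of the two clusters.
        dists = [cosine_distance(embeddings[i], embeddings[j])
                 for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy segment embeddings: two segments per speaker.
segments = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = ahc(segments, threshold=0.3)
```

Each label identifies one hypothesised speaker; segments sharing a label are attributed to the same voice.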


Samrómur-Adolescents Kaldi Recipe 22.06

2 resources

The "Samrómur-Adolescents Kaldi Recipe 22.06" is a code recipe intended to show how to integrate the adolescent portion of the corpus "Samrómur Children's Icelandic Speech Data 21.09" [1] and the "Icelandic Language Models with Pronunciations 22.01" [2] to create automatic speech recognition systems using the Kaldi toolkit [3].


Samrómur DeepSpeech Recipe 22.06

2 resources

The "Samrómur DeepSpeech Recipe 22.06" is a code recipe intended to show how to integrate the corpus "Samromur 21.05" [1] and the "DeepSpeech Scorer for Icelandic 22.06" [2] to create automatic speech recognition systems using Mozilla's DeepSpeech recognizer [3].


Multilabel Error Classifier (Icelandic Error Corpus categories) for Sentences (22.01)

1 resource

The Icelandic Error Corpus (IEC) was used to fine-tune the Icelandic language model IceBERT for sentence classification. The objective was to train grammatical error detection models that could classify whether a sentence contains a particular error type. The model can mark sentences as including one or more of the following issues: coherence, grammar, orthography, other, style and vocabulary. The overall F1 score is a modest 64%.
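Because a sentence can carry several labels at once, a multilabel classifier typically scores each category independently and keeps every category whose score passes a threshold. The sketch below shows that decision step only; the logits are made-up numbers, and the 0.5 threshold is an assumption rather than a documented setting of this model.

```python
import math
from typing import List

LABELS = ["coherence", "grammar", "orthography", "other", "style", "vocabulary"]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits: List[float], threshold: float = 0.5) -> List[str]:
    """Keep every label whose independent probability reaches the threshold."""
    return [label for label, logit in zip(LABELS, logits)
            if sigmoid(logit) >= threshold]


# Hypothetical per-label scores for one sentence.
example_logits = [-2.0, 1.5, 0.3, -3.0, -0.5, -1.0]
predicted = predict_labels(example_logits)
```

With these example scores the sentence would be tagged with both `grammar` and `orthography`, which a single-label classifier could not express.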


Error Classifier (Icelandic Error Corpus categories) for Tokens (22.05)

1 resource

The Icelandic Error Corpus (http://hdl.handle.net/20.500.12537/73) was used to fine-tune the Icelandic language model IceBERT-xlmr-ic3 for token classification. The objective was to train grammatical error detection models that could classify whether a token range contains a particular error type. The model can mark tokens as including one of the following issue categories: coherence, grammar, orthography, other, style and vocabulary. The overall F1 score is 71; the scores for individual categories are as follows: coherence: 0; grammar: 63; orthography: 86; other: 0; vocabulary: 15.2.


Binary Error Classifier for Icelandic Sentences (22.09)

6 resources

The model is a fine-tuned byT5-base Transformer model for error detection in natural language. It is tuned for sentence classification using parallel synthetic error data and real error data from the iceErrorCorpus (IceEC, http://hdl.handle.net/20.500.12537/73) and the three specialised error corpora (L2: http://hdl.handle.net/20.500.12537/131, dyslexia: http://hdl.handle.net/20.500.12537/132, child language: http://hdl.handle.net/20.500.12537/133). The synthetic error data (35M lines of parallel data) was created by filtering and then scrambling the Icelandic Gigaword Corpus (IGC, http://hdl.handle.net/20.500.12537/192) to simulate real grammatical and typographical errors. The pretrained byT5 model was first trained on the synthetic data and then fine-tuned on the real error data from IceEC. The objective was to train a grammatical error detection model that could classify whether a sentence contains an error or not. The overall F1 score is 72.8% (precision: 76.3, recall: 71.7).
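For readers comparing the three reported figures, the standard F1 formula (the harmonic mean of precision and recall) can be applied directly. The sketch below does only that arithmetic. Plugging in the reported precision and recall gives roughly 73.9 rather than 72.8, so the published F1 may have been averaged over classes or evaluation sets; the source does not say, and that explanation is only a guess.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall, in percent.
f1 = f1_score(76.3, 71.7)
print(round(f1, 1))
```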

