Result filters

Metadata provider

Language

Resource type

Availability

Loading...
703 record(s) found

Search results

  • RÚV-DI Speaker Diarization (20.09)

    These are a set of speaker diarization recipes which depend on the speech toolkit Kaldi. There are two types of recipes here. First are recipes used for decoding unseen audio. The second type of recipes are for training diarization models on the Rúv-di data. This tool also lists the DER for the Rúv-di dataset on most of the recipes. All DERs within this tool have no unscored collars and include overlapping speech
  • Grapheme-to-phoneme (g2p) module for Icelandic (22.10)

    ENGLISH: Grapheme-to-phoneme (g2p) module for Icelandic. The module can be used to transcribe Icelandic in four pronunciation variants (standard pronunciation, north Iceland, north-east Iceland, south Iceland), with different levels of detail and in four different phonetic alphabets. Default output is X-SAMPA phonetic alphabet without syllabification or stress labeling, according to standard pronunciation. The module transcribes English words using the Icelandic phoneset but close to English transcription rules. A transcription dictionary is also a part of the package. The package can be installed from PyPI: pip install ice-g2p ICELANDIC: Hljóðritunarforrit (g2p) fyrir íslensku. Forritið má nota til þess að hljóðrita íslensku skv. fjórum framburðartilbrigðum (hefðbundnum framburði, harðmæli, rödduðum framburði og hv-framburði), með mismiklum upplýsingum og í fjórum mismunandi hljóðritunarstafrófum. Séu engar stillingar sérvaldar þá skilar forritið úttaki í X-SAMPA hljóðritunarstafrófinu, án atkvæðaskiptinga eða áherslumerkinga, skv. hefðbundnum framburði. Forritið hljóðritar ensk orð með íslenskum hljóðritunartáknum en eins nálægt enskum reglum og mögulegt er. Framburðarorðabók fylgir pakkanum. Hægt er að sækja pakkann á PyPI: pip install ice-g2p
  • Tokenizer for Icelandic text (3.4.2) (22.10)

    Tokenizer is a compact pure-Python 3 executable program and module for tokenizing Icelandic text. It converts input text to streams of tokens, where each token is a separate word, punctuation sign, number/amount, date, e-mail, URL/URI, etc. It also segments the token stream into sentences, considering corner cases such as abbreviations and dates in the middle of sentences. More information at: https://github.com/icelandic-lt/Tokenizer Tokenizer er pakki fyrir Python 3, ásamt skipanalínutóli, sem sér um tilreiðslu íslensks texta. Pakkinn umbreytir inntakstexta í tókastraum. Hver tóki er stakt orð, greinarmerki, tala/upphæð, dags-/tímasetning, netfang, vefslóð o.s.frv. Tólið skiptir tókastraumnum einnig í setningar og tekur tillit til jaðartilvika eins og skammstafana og dagsetninga í miðjum setningum. Frekari upplýsingar á: https://github.com/icelandic-lt/Tokenizer
  • Editor for pronunciation dictionaries

    A web application for the editing of pronunciation dictionaries. The tool offers detailed annotation of entries, e.g. on compounds, prefixes, dialects and part-of-speech. Exports dictionaries in .tsv format for use in speech applications. Vefviðmót til þess að vinna með framburðarorðabækur. Tólið býður upp á að merkja upplýsingar með hverju orði, t.d. hvort orðið sé samsett, byrji á forskeyti, framburðartilbrigði og orðflokk. Unna orðalista er svo hægt að flytja út á .tsv formi til notkunar í taltæknihugbúnaði.
  • KAMOKO-Digitalizer

    This editor was developed especially for the needs of the KAMOKO project (https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-3261). The editor allows the quick entry of example sentences and sentence variants as well as the corresponding speaker ratings.
  • ABLTagger (PoS) - 2.0.0

    A Part-of-Speech (PoS) tagger for Icelandic. In this submission, you will find ABLTagger v2.0.0. This is a PoS tagger that works with the revised tagset and achieves an accuracy of 96.95% on MIM-Gold (cross-validation). For additional details, error analysis and categorization of this tagger and other taggers (including a previous version of ABLTagger), see I4 report for M4 (2021) in Language Technology Programme for Icelandic 2019-2023. For installation, usage, and other instructions see https://github.com/cadia-lvl/POS/releases/tag/m4 You should also check if a newer version is out (see README.md - versions) on CLARIN: - Model files - Docker image, version 2.0.0 ------------------------------------------------------------------------------------------- Markari fyrir íslensku. Í þessum pakka er ABLTagger v2.0.0. Þetta er markari sem virkar fyrir nýja markamengið og nær 96,95% nákvæmni á MÍM-Gull (krossprófanir). Fyrir nánari upplýsingar, villugreiningu og villuflokkun fyrir þennan markara og aðra (ásamt fyrri útgáfu af þessum markara), sjá I4 skýrslu fyrir vörðu 4 (2021) í Máltækniáætlun fyrir íslensku 2019-2023. Fyrir uppsetningar-, notenda- og aðrar leiðbeiningar sjá https://github.com/cadia-lvl/POS/releases/tag/m4 Einnig er gott að athuga þar hvort ný útgáfa sé komin út (sjá README.md - versions) Á CLARIN: - Líkan - Docker mynd, útgáfa 2.0.0
  • CEC6-Converter

    Diese Software erlaubt eine Konvertierung von *.cec6.gz-Dateien in 24 Formate, die in der Korpuslinguistik / NLProc üblich sind. Die Ausführung ist unter allen modernen Betriebssystemen möglich (Windows, Linux, MacOS). Die Binärdateien wurden für die x64-Architektur kompiliert. Sollten Sie einen Prozessor (CPU) verwenden, der eine x86- oder ARM-Architektur hat, dann nutzen Sie bitte die Anleitung: andere Betriebssysteme bzw. x86 / ARM / ARM64. --- This software allows the conversion of *.cec6.gz files into 24 formats that are commonly used in corpus linguistics / NLProc. Execution is possible under all modern operating systems (Windows, Linux, MacOS). The binary files have been compiled for the x64 architecture. If you are using a processor (CPU) with x86 or ARM architecture, please use the instructions for "other operating systems or x86 / ARM / ARM64".