CLARIN Tool Portal

Active filters:

Project: Development of Slovene in a Digital Environment

33 record(s) found

Search results

Slovene Text Denormalizator RSDO-DS2-DENORM 1.0

2 resources

This Text Denormalisator converts Slovene spoken-form text into written-form text. Typically it is used as a post-processing step in Automatic Speech Recognition, which traditionally outputs spoken-form text. As input it accepts text in either string form, list of tokens, or a list of dictionaries with a mandatory "text" field. The output is a dictionary. Example of use: denormalize("Danes, osmega sedmega dva tisoč dvaindvajset, je lep sončen dan, saj je zunaj prijetnih petindvajset stopinj Celzija.") {'denormalized_content': [{'text': 'Danes', 'index': [0]}, {'text': ',', 'index': [1]}, {'text': '8.', 'index': [2]}, {'text': '7.', 'index': [3]}, {'text': '2022', 'index': [4, 5, 6]}, {'text': ',', 'index': [7]}, {'text': 'je', 'index': [8]}, {'text': 'lep', 'index': [9]}, {'text': 'sončen', 'index': [10]}, {'text': 'dan', 'index': [11]}, {'text': ',', 'index': [12]}, {'text': 'saj', 'index': [13]}, {'text': 'je', 'index': [14]}, {'text': 'zunaj', 'index': [15]}, {'text': 'prijetnih', 'index': [16]}, {'text': '25', 'index': [17]}, {'text': '°C', 'index': [18, 19]}, {'text': '.', 'index': [20]}], 'denormalized_string': 'Danes, 8. 7. 2022, je lep sončen dan, saj je zunaj prijetnih 25 °C.'}

Use "Slovene Text Denormalizator RSDO-DS2-DENORM 1.0"
Slovene Conformer CTC BPE E2E Automated Speech Recognition model RSDO-DS2-ASR-E2E 2.0

2 resources

This Conformer CTC BPE E2E Automated Speech Recognition model was trained following the NVIDIA NeMo Conformer-CTC recipe (for details see the official NVIDIA NeMo NMT documentation, https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/intro.html, and NVIDIA NeMo GitHub repository https://github.com/NVIDIA/NeMo). It provides functionality for transcribing Slovene speech to text. The training, development and test datasets were based on the Artur dataset and consisted of 630.38, 16.48 and 15.12 hours of transcribed speech in standardised form, respectively. The model was trained for 200 epochs and reached WER 0.0429 on the development and WER 0.0558 on the test dataset.

Use "Slovene Conformer CTC BPE E2E Automated Speech Recognition model RSDO-DS2-ASR-E2E 2.0"
Q-CAT Corpus Annotation Tool 1.2

2 resources

The Q-CAT (Querying-Supported Corpus Annotation Tool) is a computational tool for manual annotation of language corpora, which also enables advanced queries on top of these annotations. The tool has been used in various annotation campaigns related to the ssj500k reference training corpus of Slovenian (http://hdl.handle.net/11356/1210), such as named entities, dependency syntax, semantic roles and multi-word expressions, but it can also be used for adding new annotation layers of various types to this or other language corpora. Q-CAT is a .NET application, which runs on Windows operating system. Version 1.1 enables the automatic attribution of token IDs and personalized font adjustments. Version 1.2 supports the CONLL-U format and working with UD POS tags.

Use "Q-CAT Corpus Annotation Tool 1.2"

Result filters

Metadata provider

Language

Resource type

Tool task

Availability

Project

Keywords

Active filters:

Search results

Slovene Text Denormalizator RSDO-DS2-DENORM 1.0

Slovene Conformer CTC BPE E2E Automated Speech Recognition model RSDO-DS2-ASR-E2E 2.0

Q-CAT Corpus Annotation Tool 1.2

Result filters

Metadata provider

Language

Resource type

Tool task

Availability

Project

Keywords

Active filters:

Search results

Slovene Text Denormalizator RSDO-DS2-DENORM 1.0

Slovene Conformer CTC BPE E2E Automated Speech Recognition model RSDO-DS2-ASR-E2E 2.0

Q-CAT Corpus Annotation Tool 1.2

Session recording