CLARIN Tool Portal

698 record(s) found

Search results

Automatic Transcription of Oral History Interviews

1 resources

This webservice and web application uses automatic speech recognition to provide the transcriptions of recordings spoken in Dutch. You can upload and process only one file per project. For bulk processing and other questions, please contact Henk van den Heuvel at h.vandenheuvel@let.ru.nl.
TQE: Transcription Quality Evaluation

1 resources

The Transcription Quality Evaluation (TQE) tool is an instrument that automatically evaluates the quality of phonetic transcriptions. The application makes it possible to upload pairs of files consisting of an audio file and a transcription file and process them as follows: the audio signal and the phonetic transcription are aligned, segment boundaries are derived for each phone, and for each segment-phone combination it is determined how well they fit together, i.e. for each phone a TQE measure (a confidence measure) is determined, a number ranging from 0-100%, indicating how good the fit is, i.e. the quality of the phone transcription. The higher the number, the better the fit. The output of the TQE tool consists of a TQE measure and the segment boundaries for each phone in the corpus. The TQE tool thus makes it possible to find (sequences of) segments for which the match of the phone symbols with the audio signal is not optimal, in other words, the TQE tool can be used to check the quality of phonetic transcriptions. This can be useful for validating (manual) phonetic transcriptions, but also to compare and select (‘competing’) transcriptions, e.g. to study pronunciation variation. The TQE tool can thus be usefully applied in all research – in various (sub-) fields of humanities and language and speech technology (L&ST) – in which audio and phonetic transcriptions are involved.
PILNAR: Pilgrimage Narratives - a corpus for studying the profile of the modern pilgrim

1 resources

A corpus of pilgrimage narratives with Dutch texts written after ca. 2000 that present the thoughts and impressions of pilgrims to Santiago de Compostela. The PILNAR corpus is a source for research for a variety of (sub)disciplines: culture studies, ritual and religious studies, but also media and e-culture studies (cf the use of blogs and other social media for the self-presentation of experiences). Only for authorized users. The PILNAR corpus contains six subcorpora: - Volumes of De Jacobsstaf 1986-: 84 pdf files; - Volumes of De Pelgrim of the Flemish Society of Santiago de Compostella nos. 1-4 (16mb, 10mb, 16mb) (both societies work collaborate closely); - Volumes of Ultreia, a newsletter; 3 issues available now: January, February, April 2011; - Pilgrimage accounts and blogs by pilgrims available via the Societies Netherlands: circa 140 files; Flemish: circa 138 files; - A corpus of pilgrimage narratives compiled on the occasion of the exhibition in Museum Catharijneconvent held in collaboration with the Society: www.pelgrimsverhalen.nl; (link is external) already on the site now: about 180 fields (as of July 2011); - Accounts and narratives that come in after a specially targeted notice via the site and periodical by the Society (De Jacobsstaf), with perhaps a Flemish companion piece (De Pelgrim).
AVResearcherXL: Exploring audiovisual metadata in historical context

1 resources

AVResearcherXL is a tool for exploring radio and television programme descriptions, television subtitles and general newspaper articles. The interface searches across the catalogue "iMMix" of the Netherlands Institute for Sound and Vision and a selection of newspapers of KB Royal Archive of the Netherlands. By the end of 2014, the data used by AVResearcherXL are: iMMix 932,035 broadcasts indexed 18,124 broadcasts with subtitles 1 January 1900 is the date of the first broadcast in the index 26 October 2013 is the date of the last broadcast in the index KB newspapers 25,811,413 articles indexed 16,294,029 articles are of type "artikel" 8,483,542 articles are of type "advertentie" 630,929 articles are of type "illustratie met onderschrift" 402,913 articles are of type "familiebericht" 1 January 1900 is the date of the first article in the index 30 November 1994 is the date of the last article in the index AVResearcherXL is financially supported by CLARIN-NL within the QuaMeRDES-project and by CLARIAH-SEED within the Research Instruments for Media Studies-project. AVResearcherXL is an extended version of MeRDES, the tool developed in 2012 by the NWO-CATCH project BRIDGE. MeRDES was further developed into AVResearcher by the Netherlands Institute for Sound and Vision in 2013. AVResearcherXL is a collaborative project of Centre for Television in Transition (Utrecht University), Intelligent Systems Lab Amsterdam (University of Amsterdam) and the Netherlands Institute for Sound and Vision. The partners worked together with Dispectu for the development of the interface and back-end, and with Frontwise for the styling of the interface.

Bron, M., Gorp, J. van, Nack, F., Rijke, M. de, Vishneuski, Andrei and Leeuw, J.S. de (2012). A Subjunctive Exploratory Search Interface to Support Media Studies Researchers. SIGIR '12: 35th international ACM SIGIR conference on Research and development in information retrieval Portland, Oregon: ACM.

Huurnink, B., Bronner, A., Bron, M., Gorp, J. van, Goede, B. de and Wees, J. van (2013). AVResearcher: Exploring Audiovisual Metadata. DIR 2013: Dutch-Belgian Information Retrieval Conference Delft: DIR.
Alpino: a dependency parser for Dutch (CLST web service and application)

1 resources

This is a web service and web application to the Alpino dependency parser for Dutch, developed in the context of the PIONIER Project Algorithms for Linguistic Processing.

git clone --depth 1 git://urd.let.rug.nl/Alpino.git
BNM-I: Linked Data on Middle Dutch Sources Kept Worldwide

1 resources

Web application for consultation, using facetted search, and collaborative editing of the curated e-BNM collection of textual, codicological and historical information about thousands of Middle Dutch manuscripts kept world wide.The Bibliotheca Neerlandica Manuscripta and Impressa collects and makes available information on medieval manuscripts produced in the Netherlands regardless where they are kept. Documentation activities concentrate on the Middle-Dutch texts and their authors that have been transmitted in these manuscripts, on the individuals and institutions that have been involved in the manuscript production (scribes, illuminators, monasteries) and on the former and present manuscript owners. Since 1991 two-thirds of this ‘paper’ information, checked and supplemented with information from recent publications, has been converted into electronic data and incorporated in a database ( BNM-I ), which can be searched online. In 2013 this database was converted in the e-BNM+ project into a flexible datastructure that turned BNM-I into a key open access resource to which many other resources can easily be linked. The new BNM-I: - will be freely accessible for every user, anywhere in the world; - can easily implement new contributions or corrections by scientists; - can easily be linked to related databases - in the near future cross searching several databases in one interface will be possible; - will be prepared for the inclusion of new data, like: research data on Middle Dutch texts that were printed before 1541 and the books in which they are preserved; - articles on Middle Dutch texts and their authors (associated with the current thesaurised information).
OpenConvert

1 resources

The OpenConvert tools convert to TEI or FOLiA from a number of input formats (alto, text, word, HTML, ePub). The tools are available as a Java command line tool, a web service and a web application.The OpenConvert Tools were created by IVDNT in the OpenConvert project. The OpenConvert tools convert to TEI or FOLiA from a number of input formats (alto, text, word, HTML, ePub). The tools are available as a Java command line tool, a web service and a web application. Furthermore, as a proof of concept, the website currently provides two annotation tools: a simple Tokenizer for TEI files and a modern Dutch part of speech tagger.

The tool service can be called as a REST webservice which returns responses in XML, allowing it to be part of a webservice tool chain.

Input TEI, plain text, HTML

ALTO XML input

ePub input

directory containing files of a valid input type

zip file (with extension .zip) containing files of a valid input type

Free for academic use. Non-applicable for commercial parties

CLARIN based login required. The Clarin federation accepts login from many europian institutions. please seehttp://www.clarin.eu/content/service-provider-federation for more details

input file name (File upload)

Format of input file

Format of output file

to specify the tagger or tokeniser

input file mimetype is application/tei+xml

input file mimetype is text/html

input file mimetype is text/alto+xml

input file mimetype is application/msword

input file mimetype is application/epub+zip

input file mimetype is text/plain

output file mimetype is application/tei+xml

output file mimetype is text/folia+xml

Basic tagger-lemmatizer for modern Dutch

a TEI tokenizer
ePistolarium: A Web-based Humanities’ Collaboratory on Correspondences

1 resources

Circulation of Knowledge and Learned Practices in the 17th-century Dutch Republic (CKCC) investigates the circulation of knowledge in the 17th-century Dutch Republic. A multi-disciplinary project team consisting of historians, literature researchers, linguists and computer scientists works together in this project and created a web-based Humanities’ Collaboratory on Correspondences. This project, is carried out thanks to a NWO Medium investment subsidy and with CLARIN subsidies to make the resources available withing the CLARIN domain. A consortium of Dutch universities and cultural heritage institutions is building a web-based collaboratory (an online space for asynchronous collaboration) around a corpus of 20.000 letters of scholars who lived in the 17th-century Dutch Republic to answer the research question: how did knowledge circulate in the 17th century? Hereto, it will be necessary to analyze this large amount of correspondence systematically. Based on this (extendable) corpus, we will implement a content processing workflow that consists of iterative cycles of conceptual analysis, enrichment with several layers of annotation and visualization. With advice from CLARIN-EU in the first stage of the project a demonstrator was developed which implements techniques of keyword extraction. The second stage consists of evaluating existing more complex tools en techniques that can tackle one or more aspects of the targeted grammatical, content-related, and network complexity analysis, annotation, and visualization. The phase shall identify a set of tools that can be readily utilized in CKCC, as well as tools that need to be adapted or extended to the needs of CKCC; in short, by the end of this phase resources, requirements and risks shall become clear (deadline: December 2010). In the third stage the collaboratory is further developed according to the description in the CKCC project goals, centering around the technique of concept extraction. These three stages constitute the Work Package Analysis Tools, the core of the CKCC project, which was supported by CLARIN-NL. Other Work Packages provide data and software tools needed to create a complete system: the digital corpus of letters (WP6), the editing collaboratory that will contain the letters (WP1), and the archiving environment for data and software (WP2).

Ravenek, W, van den Heuvel, C and Gerritsen, G. 2017. The ePistolarium: Origins and Techniques. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 317–323. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.26. License: CC-BY 4.0
FESLI: Functional elements in Specific Language Impairment

1 resources

Tool for the quantitative and qualitative comparison of the acquisition of functional elements (morphological inflection, articles, pronouns etcetera) in a corpus with data from monolingual and bilingual children (Dutch - Turkish) with and without Specific Language Impairment (SLI). The FESLI-data come from two NWO-sponsored projects: BiSLI and Variflex. The numbers of children included in the resources are: - 12 bilingual children without language impairment (SLI); - 25 monolingual children with SLI; - 20 bilingual children with SLI. The children´s ages ranged from 6;0 to 8:5. For more precise information about the specific age distribution in each group, the reader is referred the dissertation written by Antje Orgassa (http://dare.uva.nl/document/147433 (link is external)). The non-impaired children were included in the Variflex project (data collected by Elma Blom) and also used in the BiSLI project; the data from the children with SLI were exclusive to the biSLI project. The technology used in the FESLI web application is based on modules of the COAVA web application.
WFT-GTB: Integrating the Wurdboek fan ˈe Fryske Taal into the Geïntegreerde TaalBank

1 resources

The Dictionary of the Frisian Language (Wurdboek fan de Fryske Taal) is online available via the GTB dictionary web application. The GTB also holds other major Dutch historical dictionaries, such as the Dictionary of Old Dutch (ONW), the Dictionary of early Middle Dutch (VMNW), the Dictionary of Middle Dutch (MNW), and the Dictionary of the Dutch language (WNT). The digital surrounding enables extensive forms of free and structured search queries, including comparative studies with Dutch materials. The Wurdboek fan de Fryske Taal (Dictionary of the Frisian Language)-project includes the vocabulary of Modern West Frisian from the period 1800-1975. The dictionary’s metalanguage is Dutch. A volume of 400 pages comes out every year, the first one in 1984. The editorial phase was finalized in 2009, the final editing and publication phase in 2010.

Modern Dutch Lemma and Frisian lemma

Describes the origin of a word

describes the meaning of a words

describes the structure of a word

describes the possible spellings of a word

Depuydt, K, de Does, J, Duijff, P and Sijens, H. 2017. Making the Dictionary of the Frisian Language Available in the Dutch Historical Dictionary Portal. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 151–165. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.13. License: CC-BY 4.0

Use "WFT-GTB: Integrating the Wurdboek fan ˈe Fryske Taal into the Geïntegreerde TaalBank"

Result filters

Metadata provider

Language

Resource type

Type of tool

Tool task

Field of study

Availability

Organisation

Project

Keywords

Search results

Automatic Transcription of Oral History Interviews

TQE: Transcription Quality Evaluation

PILNAR: Pilgrimage Narratives - a corpus for studying the profile of the modern pilgrim

AVResearcherXL: Exploring audiovisual metadata in historical context

Alpino: a dependency parser for Dutch (CLST web service and application)

BNM-I: Linked Data on Middle Dutch Sources Kept Worldwide

OpenConvert

ePistolarium: A Web-based Humanities’ Collaboratory on Correspondences

FESLI: Functional elements in Specific Language Impairment

WFT-GTB: Integrating the Wurdboek fan ˈe Fryske Taal into the Geïntegreerde TaalBank

Result filters

Metadata provider

Language

Resource type

Type of tool

Tool task

Field of study

Availability

Organisation

Project

Keywords

Search results

Session recording