CLARIN Tool Portal

Blacklab AutoSearch Corpus Search

1 resource

This demonstrator lets users define one or more corpora and upload data for them, after which the corpora automatically become searchable in a private workspace. Users can upload text data annotated with lemma and part-of-speech tags in TEI or FoLiA format, either as a single XML file or as an archive (zip or tar.gz) containing several XML files. Corpus size is limited initially (25 MB per uploaded file; 500,000 tokens for an entire corpus), but these limits may be raised later. The search application is powered by the INL BlackLab corpus search engine. The search interface is the same as the one used in, for example, the Corpus of Contemporary Dutch (Corpus Hedendaags Nederlands).
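The limits above can be checked locally before uploading. The following is a minimal sketch, not part of AutoSearch itself: the function names are hypothetical, "25 MB" is assumed to mean 25 × 1024 × 1024 bytes, and tokens are approximated by counting `<w>` elements (the word element in both TEI and FoLiA), which may differ from how the service counts them.

```python
import xml.etree.ElementTree as ET

# Assumption: "25 MB" is read as 25 MiB; the service may count differently.
MAX_FILE_BYTES = 25 * 1024 * 1024
MAX_CORPUS_TOKENS = 500_000


def local_name(tag):
    """Reduce a namespaced tag such as '{http://...}w' to 'w'."""
    return tag.rsplit("}", 1)[-1]


def count_tokens(xml_bytes):
    """Count <w> elements, used here as a stand-in for tokens."""
    root = ET.fromstring(xml_bytes)
    return sum(1 for el in root.iter() if local_name(el.tag) == "w")


def check_corpus(files):
    """files: mapping of file name -> XML content as bytes.

    Returns a list of problems; an empty list means the corpus
    is within both limits under the assumptions stated above.
    """
    problems = []
    total_tokens = 0
    for name, data in files.items():
        if len(data) > MAX_FILE_BYTES:
            problems.append(f"{name}: {len(data)} bytes exceeds the per-file limit")
        total_tokens += count_tokens(data)
    if total_tokens > MAX_CORPUS_TOKENS:
        problems.append(f"corpus: {total_tokens} tokens exceeds the corpus limit")
    return problems
```

Matching on the local element name keeps the check independent of whether the file uses the TEI or the FoLiA namespace.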

Corpus of Contemporary Dutch

1 resource

The Corpus of Contemporary Dutch (Corpus Hedendaags Nederlands, CLARIN) is a collection of more than 800,000 texts from newspapers, magazines, TV news broadcasts and legal materials (1814-2013). The corpus was created by combining the older 5, 27 and 38 Million Words Corpora and the PAROLE Corpus, supplemented by newspaper texts from NRC and De Standaard (until 2013). In addition, it contains corpus material from Suriname and the Dutch Antilles.

Dictionary of Early Middle Dutch

1 resource

The Dictionary of Old Dutch (ONW) online is the electronic version of the ONW. The dictionary describes the Old Dutch vocabulary from the period 500 to 1200.

Modern Dutch Lemma

Describes the origin of a word

Describes the meaning of a word

Describes the structure of a word

Dictionary of the Dutch Language

1 resource

The Woordenboek der Nederlandsche Taal (Dictionary of the Dutch Language) describes the meaning and history of hundreds of thousands of words from written Dutch from 1500 to 1976.

Modern Dutch Lemma

Describes the origin of a word

Describes the meaning of a word

Describes the structure of a word

Dictionary of Middle Dutch

1 resource

Search application for the Middle Dutch Dictionary (Middelnederlandsch Woordenboek), which describes the vocabulary of the Dutch language as spoken from the 13th to the 16th century.

Modern Dutch Lemma

Describes the origin of a word

Describes the meaning of a word

Describes the structure of a word

Dictionary of Old Dutch

1 resource

The Dictionary of Old Dutch (ONW) online is the electronic version of the ONW. The dictionary describes the Old Dutch vocabulary from the period 500 to 1200.

Modern Dutch Lemma

Describes the origin of a word

Describes the meaning of a word

Describes the structure of a word

Frog: An advanced Natural Language Processing Suite for Dutch (Web Service and Application)

1 resource

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. It performs automatic linguistic enrichment such as part-of-speech tagging, lemmatisation, named entity recognition, shallow parsing, dependency parsing and morphological analysis. All NLP modules are based on TiMBL.

Iris Hendrickx, Antal van den Bosch, Maarten van Gompel, Ko van der Sloot and Walter Daelemans. 2016. Frog: A Natural Language Processing Suite for Dutch. CLST Technical Report 16-02, pp. 99-114. Nijmegen, the Netherlands. https://github.com/LanguageMachines/frog/blob/master/docs/frogmanual.pdf

Van den Bosch, A., Busser, G.J., Daelemans, W., and Canisius, S. (2007). An efficient memory-based morphosyntactic tagger and parser for Dutch. In F. van Eynde, P. Dirix, I. Schuurman, and V. Vandeghinste (Eds.), Selected Papers of the 17th Computational Linguistics in the Netherlands Meeting, Leuven, Belgium, pp. 99-114. http://ilk.uvt.nl/downloads/pub/papers/tadpole-final.pdf

Frog (plain text input)

Frog (folia+xml input)

Nederlab, online laboratory for humanities research on Dutch text collections

1 resource

The Nederlab project aims to bring together all digitized texts relevant to Dutch national heritage and the history of Dutch language and culture (c. 800 - present) in one user-friendly, tool-enriched, open-access web interface, allowing scholars to simultaneously search and analyze data from texts spanning the full recorded history of the Netherlands, its language and culture. The project builds on various initiatives: for corpora, Nederlab collaborates with scientific libraries and institutions; for infrastructure, with CLARIN (and CLARIAH); for tools, with eHumanities programmes such as Catch, IMPACT and CLARIN (TICCL, frog). Nederlab will offer a large number of search options with which researchers can find occurrences of a particular term in a particular corpus or subcorpus. It will also offer visualization of search results through line graphs, bar graphs, circle graphs or scatter graphs. Furthermore, this online lab will offer a large set of tools, such as tokenization tools, tools for spelling normalization, PoS-tagging tools, lemmatization tools, a computational historical lexicon and indices. Also offered are (semi-)automatic syntactic parsing, tools for text mining, data mining and sentiment mining, named entity recognition tools, coreference resolution tools, plagiarism detection tools, paraphrase detection tools and cartographical tools. The first version of Nederlab was launched in early 2015; it will be expanded until the end of 2017. Nederlab is financed by NWO, KNAW, CLARIAH and CLARIN-NL.

http://www.nederlab.nl/wp/?page_id=12

Metadata Editor, Browser and Organiser for IMDI and CMDI

1 resource

Arbil (Archive Builder) is a metadata editor, browser and organiser for metadata in IMDI and CMDI format. It is a Java desktop application that runs on most operating systems. Arbil can be used to create new metadata from scratch for resources on your local machine, or to download and modify metadata that are already in an archive. Arbil is a generic CMDI editor and therefore supports all CMDI profiles. It has a built-in file type verification tool that is configured to check files against the list of accepted file types for The Language Archive; this can, however, be overridden for other archives.

Automatic Annotation of Multi-modal Language Resources

1 resource

The AAM-LR project provides a web service that helps field researchers annotate audio and video recordings. At the top level, the service marks the time intervals at which specific persons in the recording are speaking. In addition, the service provides a global phonetic annotation, using language-independent phone models and phonetic features. Speech is separated from speaker noises such as laughing. Note: this service has been withdrawn, and the URLs and PID no longer resolve.
