Active filters:

  • Availability: Academic
  • Project: CLARIN-NL
8 record(s) found

Search results

  • Blacklab AutoSearch Corpus Search

    This demonstrator lets users define one or more corpora and upload data for them, after which the corpora are automatically made searchable in a private workspace. Users can upload text data annotated with lemma and part-of-speech tags in TEI or FoLiA format, either as a single XML file or as an archive (zip or tar.gz) containing several XML files. Corpus size is limited for now (25 MB per uploaded file; 500,000 tokens for an entire corpus), but these limits may be raised later. The search application is powered by the INL BlackLab corpus search engine. The search interface is the same as the one used in, for example, the Corpus of Contemporary Dutch / Corpus Hedendaags Nederlands.
  • Corpus of Contemporary Dutch

    The Corpus of Contemporary Dutch (Corpus Hedendaags Nederlands (CLARIN)) is a collection of more than 800,000 texts from newspapers, journals, TV news broadcasts and legal materials (1814-2013). The corpus was created by combining the older 5, 27 and 38 Million Words corpora and the PAROLE Corpus, supplemented by newspaper texts from NRC and De Standaard (until 2013). In addition, it contains corpus material from Suriname and the Dutch Antilles.
  • Dictionary of Middle Dutch

    Search application for the Middle Dutch Dictionary (Middelnederlandsch Woordenboek), which describes the vocabulary of Dutch as spoken from the 13th to the 16th century.
    Modern Dutch Lemma
    Describes the origin of a word
    Describes the meaning of a word
    Describes the structure of a word
  • Dictionary of Old Dutch

    The Dictionary of Old Dutch (ONW) online is the electronic version of the ONW. The dictionary describes the Old Dutch vocabulary from the period 500 to 1200.
    Modern Dutch Lemma
    Describes the origin of a word
    Describes the meaning of a word
    Describes the structure of a word
  • VU University Diachronic News text Corpus

    The diachronic corpus has been brought in line with the current standards and formats used in the STEVIN Nederlandstalig Referentiecorpus (SoNaR, under development), which has been adapted to the more general FoLiA format (documented by Van Gompel, 2012). These standards and formats have been extended with new layers of annotation. As a result, the corpus conforms to the current CLARIN infrastructure.
  • OpenSONAR: a 500 MW reference corpus of Contemporary Written Dutch

    SoNaR is a 500-million-word reference corpus of contemporary written Dutch for use in different types of linguistic (incl. lexicographic) and HLT research and in the development of applications. The STEVIN-funded SoNaR project (2008-2011) built on the results of the D-Coi and Corea projects, which were awarded funding in the first call for proposals within the STEVIN programme. SoNaR contains over 500 million words (i.e. word tokens) of full texts from a wide variety of text types, including texts from both conventional media and new media. All texts except those from social media (Twitter, chat, SMS) have been tokenized, tagged for part of speech and lemmatized, and in the same set the named entities have been labelled. All annotations were produced automatically; no manual verification took place. The texts are enriched with several annotations (part-of-speech and lemma information) and are available as FoLiA XML files (folia.xml). OpenSONAR is an online application for exploring and searching the SoNaR corpus. The system relies on BlackLab Server as back end and WhiteLab as user interface.
    van de Camp, M, Reynaert, M and Oostdijk, N. 2017. WhiteLab 2.0: A Web Interface for Corpus Exploitation. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 231–243. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.19. License: CC-BY 4.0
    de Does, J, Niestadt, J and Depuydt, K. 2017. Creating Research Environments with BlackLab. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 245–257. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.20. License: CC-BY 4.0
    Oostdijk, N., Reynaert, M., Hoste, V., Schuurman, I. (2013) The Construction of a 500 Million Word Reference Corpus of Contemporary Written Dutch in: Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme (eds. P. Spyns, J. Odijk), Springer Verlag.
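
The upload limits quoted for Blacklab AutoSearch Corpus Search (25 MB per uploaded file, 500,000 tokens per corpus, XML files or zip/tar.gz archives) can be checked locally before uploading. The following is only a minimal sketch under stated assumptions: the `UploadFile` record and the `check_corpus` function are hypothetical helpers and not part of AutoSearch or BlackLab, "25 MB" is read as 25 × 1024 × 1024 bytes, and token counts are assumed to be known in advance.

```python
from dataclasses import dataclass

# Limits as quoted in the Blacklab AutoSearch description.
MAX_FILE_BYTES = 25 * 1024 * 1024    # assumption: "25 MB" interpreted as 25 MiB
MAX_CORPUS_TOKENS = 500_000          # limit for an entire corpus
ALLOWED_SUFFIXES = (".xml", ".zip", ".tar.gz")


@dataclass
class UploadFile:
    """Hypothetical description of one file intended for upload."""
    name: str
    size_bytes: int
    token_count: int


def check_corpus(files):
    """Return a list of problems; an empty list means the corpus fits the limits."""
    problems = []
    for f in files:
        # str.endswith accepts a tuple of candidate suffixes.
        if not f.name.lower().endswith(ALLOWED_SUFFIXES):
            problems.append(f"{f.name}: expected .xml, .zip or .tar.gz")
        if f.size_bytes > MAX_FILE_BYTES:
            problems.append(f"{f.name}: larger than 25 MB")
    total_tokens = sum(f.token_count for f in files)
    if total_tokens > MAX_CORPUS_TOKENS:
        problems.append(
            f"corpus: {total_tokens} tokens exceeds {MAX_CORPUS_TOKENS}"
        )
    return problems
```

Calling `check_corpus` with a list of `UploadFile` records returns one message per violated limit, so an empty result suggests the corpus is within the limits stated above. The actual service may count size or tokens differently.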