CLARIN Tool Portal

Manual Oral History Annotation Tool

1 resource

The Oral History Annotation Tool, developed by the Centre for Language and Speech Technology (CLST) at Radboud University Nijmegen, lets users annotate and search oral history resources. The tool has been used to enrich a corpus of 250 interviews from the Living Oral History Workbench with commentary. All 250 interviews are searchable through a fragment finder and can be annotated. These annotations can be shared with other researchers, making the interviews available and more easily accessible to a much wider range of researchers in the humanities in general and in linguistics in particular. The Annotation Tool is available only for scientific research and only after approval by the Veterans Institute.

Interview data can be used in a number of ways, such as comparative research, restudy or follow-up study, re-analysis / secondary analysis, research design and methodological advancement, replication and validation of published work, and teaching and learning. Recent experiences with the re-use of interview data show that there is an enormous potential for this type of data. Multidisciplinary research is carried out especially on interview data related to the Second World War and other military conflicts.

This corpus consists of about 30 fully transcribed interviews from the Veteran Tapes VP project and 250 interviews resulting from the Living Oral History Workbench project:
- 120 World War II interviews presenting a range of experiences and frames of reference of Dutch soldiers between 1935 and 1945;
- 100 interviews with veterans of the Dutch East Indies. This collection exhibits a large diversity in experiences at the local level in guerrilla warfare;
- 30 interviews with veterans of New Guinea. This is a relatively unknown conflict with very interesting elements (soldiers left in uncertainty and isolation, and the pressure of the international community to decolonize the area).

Each interview lasts between 1 and 1.5 hours.

GrNe: Greek-Dutch dictionary

1 resource

Online dictionary (ancient) Greek - Dutch for the letter Pi. Search functions include searches for Greek lemmata, searches for declined or conjugated Greek word-forms that lead to the correct lemma ('lemmatizer'), searches for Dutch words leading to different Greek lemmata, and etymological searches. The dictionary is linked to Logeion, the international website of Greek dictionaries at the University of Chicago.

A new Ancient Greek-Dutch dictionary is currently under construction at Leiden University. The dictionary is being financed through the 2010 Spinoza award of project director Ineke Sluiter. CLARIN funding enabled the digital production of the letter Pi. Currently, the letters beta, gamma, zeta, pi and sigma are available online. The developers estimate that a complete first version of the dictionary will be finished by the end of 2015 and that it will be published by the end of 2016.

The corpus covered by this dictionary spans Greek literature from its beginnings (Homer) and consists of ca. 3,680,000 words (tokens). It includes all classical authors from the 5th and 4th centuries BCE and a selection of later Greek (based on the likelihood that the text will be used by the target groups), though the New Testament, Lucian and Plutarch are included in full. The dictionary will eventually contain ca. 52,500 headwords. It is based on a thorough comparison of state-of-the-art dictionaries, supplemented with material from the Thesaurus Linguae Graecae.

Greek morphology is complicated. To use a dictionary effectively, the user needs a rather high level of initial language competence in order to relate the word-form found in a text to the correct basic lemma form, where the definition of the word can be found. This digital dictionary, however, has an added ‘lemmatizer’ function, which enables the user to type in the word as found in the text and to be redirected to the correct lemma. The digital resource enables both Greek-Dutch searches and searches for the possible Greek equivalents of Dutch terms. This also makes it possible to explore the relation between semantic fields in Dutch and Greek. For example, it is possible to locate all Greek words that have ‘courage’ as part of their definition. Furthermore, the digital resource makes it possible to locate different Greek words with the same etymological roots. Finally, the dictionary is linked to the website of the University of Chicago, where a comparison of all Greek-x dictionaries is supported. There, one can enter a Greek word and be provided with the equivalents and definitions in all the dictionaries that are linked on this website.
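The two lookup directions described here (word-form to lemma, and Dutch term to Greek lemmata) can be pictured as two small mappings. The following is only a minimal sketch: the table entries, glosses and the names `nfc`, `lemmatize` and `greek_for_dutch` are illustrative assumptions and say nothing about how the dictionary itself is implemented.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that visually identical Greek strings compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy data: a handful of inflected forms and their lemmata.
FORM_TO_LEMMA = {
    nfc(form): nfc(lemma)
    for form, lemma in [
        ("ἔλυσα", "λύω"),
        ("λύομεν", "λύω"),
        ("ἀνδρός", "ἀνήρ"),
        ("ἄνδρες", "ἀνήρ"),
    ]
}

# Hypothetical toy glosses: Greek lemma -> Dutch words used in its definition.
LEMMA_TO_GLOSSES = {
    nfc("λύω"): ["losmaken"],
    nfc("ἀνήρ"): ["man"],
    nfc("ἀνδρεία"): ["moed"],
    nfc("θάρσος"): ["moed", "durf"],
}


def lemmatize(form):
    """Return the lemma for a word-form as found in a text, or None if unknown."""
    return FORM_TO_LEMMA.get(nfc(form))


def greek_for_dutch(dutch_word):
    """Return all Greek lemmata whose glosses contain the given Dutch word."""
    return sorted(
        lemma for lemma, glosses in LEMMA_TO_GLOSSES.items() if dutch_word in glosses
    )


print(lemmatize("ἀνδρός"))
print(greek_for_dutch("moed"))
```

Normalizing every string to NFC is a design choice for the sketch: polytonic Greek can be encoded in several equivalent ways, and a lookup table is only useful if typed input and stored keys agree.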

COBWWWEB: Connections Between Women and Writings Within European Borders

1 resource

The WomenWriters database includes biographical data on more than 4,000 authors and over 22,000 references to reception data found in sources such as the periodical press, early literary history and private correspondence. A significant part of the dataset was collected in the NWO digitizing project The International Reception of Women’s Writing (2004-2007), focusing on authors received in the Netherlands. A second NWO internationalising project, New approaches to European Women’s Writing (2007-2010), and the subsequent COST Action Women Writers in History (2009-2013) brought together a large international community of scholars and used the Dutch data collection as an example for colleagues elsewhere. COBWWWEB connects the various national projects on this subject into a growing international data network. A virtual research environment on top of this network makes all material from participating data providers accessible for European and interdisciplinary research.

PaQu - Parse and Query

1 resource

PaQu uses the Alpino parser to make treebanks of your own text corpus, and to search in these treebanks using an interface based on the LASSY Word Relations Search interface (http://dev.clarin.nl/node/1966). Several treebanks are already available in the application, such as: Lassy Klein (1M words, manually checked syntactic analysis) and Lassy Groot (700M words, syntactic analysis automatically assigned by Alpino). PaQu offers two ways to search through the syntactically annotated texts. The first option is to use the search bar to look for word pairs, optionally complemented by their syntactic relationship. The second search option is to use the query language XPath.
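To give an impression of what an XPath query over a dependency tree looks like, here is a minimal sketch in Python. The XML fragment is a heavily simplified, hand-written tree that only imitates the nested `node` elements with `rel`, `cat`, `pos`, `word` and `lemma` attributes; it is an assumption for illustration, not output of the parser. Python's `xml.etree.ElementTree` supports only a subset of XPath, so queries accepted by PaQu may be richer than those shown.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written dependency tree for "Jan leest een boek" (assumed layout).
SAMPLE = """<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node rel="su" pos="name" word="Jan" lemma="Jan"/>
      <node rel="hd" pos="verb" word="leest" lemma="lezen"/>
      <node cat="np" rel="obj1">
        <node rel="det" pos="det" word="een" lemma="een"/>
        <node rel="hd" pos="noun" word="boek" lemma="boek"/>
      </node>
    </node>
  </node>
</alpino_ds>"""

root = ET.fromstring(SAMPLE)

# All nodes that function as subject (rel="su").
subjects = [n.get("word") for n in root.findall(".//node[@rel='su']")]

# The head word of every direct object (rel="obj1").
object_heads = [
    n.get("word") for n in root.findall(".//node[@rel='obj1']/node[@rel='hd']")
]

print(subjects)
print(object_heads)
```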

Odijk, J, van Noord, G, Kleiweg, P and Tjong Kim Sang, E. 2017. The Parse and Query (PaQu) Application. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 281–297. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.23. License: CC-BY 4.0

Arthurian Fiction

1 resource

This research tool provides information on medieval Arthurian narratives and the manuscripts in which they are transmitted throughout Europe. The tool gives access to a database consisting of linked records on over two hundred texts, more than a thousand manuscripts and two hundred persons. The database is a work in progress: a considerable number of records have yet to be completed, while fresh discoveries of narratives and manuscripts invite new entries. The compilers of the database hope that this tool will contribute to further research into Arthurian fiction as a pan-European phenomenon.

The Arthurian Fiction web application enables searching for manuscripts, narratives and persons in the Arthurian Fiction narratives and manuscripts metadata database, Arthurian Fiction Data. Each of these object types can be searched using facets specific to the object type. These include:
- for manuscripts: institute, date, origin, physical form, extant leaves, leaf sizes, illustration type, scripts, scribe, patron and several more;
- for narratives: date, origin, languages, cycle, manuscript, author, patron, verse type, meter, length, intertextuality properties and many more;
- for persons: name, gender, subtype, background, manuscript, and narratives.

The user can, if desired, select a subset of the facets to work with. In addition, keyword search is possible for all fields, query results can be sorted by a variety of keys, and queries can be saved. There is also a web service with an API for the Arthurian Fiction narratives and manuscripts database. This web service uses Solr queries sent via HTTP POST requests.
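As an illustration of what a Solr query sent over HTTP POST can look like, the sketch below builds such a request without sending it. The endpoint URL and the facet field names are hypothetical placeholders, since the actual address and schema of the web service are not given here; `q`, `rows`, `wt`, `facet` and `facet.field` are standard Solr request parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; replace with the real service address.
SOLR_URL = "https://example.org/solr/arthurian/select"


def build_solr_request(keyword, facet_fields, rows=10):
    """Build (but do not send) a form-encoded POST request for a Solr select handler."""
    params = [
        ("q", keyword),      # keyword query
        ("rows", str(rows)),  # number of results to return
        ("wt", "json"),      # ask for a JSON response
        ("facet", "true"),   # enable faceting
    ]
    # One facet.field parameter per requested facet (field names are assumptions).
    params += [("facet.field", field) for field in facet_fields]
    body = urlencode(params).encode("utf-8")
    return Request(
        SOLR_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_solr_request("Lancelot", ["origin", "date"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out because the endpoint above is not real.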


Besamusca, A.A.M. and Quinlan, J. (2012). The Fringes of Arthurian Fiction. Arthurian literature, 29, 191-241.

Boot, P. (2012), Manuscripten koning Arthur op tafel, E-Data & Research 7(1), 2012.

Dalen-Oskam, K. van and Besamusca, B. (2011), Arthurian Fiction in Medieval Europe: Narratives and Manuscripts, presentation held at the CLARIN-NL Kick-off meeting Call 2, Utrecht, February 9, 2011.

Dalen-Oskam, K. van (2011), ArthurianFiction, presentation held at the Call 3 information session, Utrecht, August 25, 2011.

Usage

3 resources

This system allows you to convert images of your book pages into editable text, presented in a text format called XML (eXtensible Markup Language), and more specifically in the variant known as Text Encoding Initiative (TEI) XML. This format was developed specifically for marking up or annotating the text you want to work on, i.e. for adding all manner of further information to the actual text, for example to build a critical edition of it, which is most likely exactly what you want to do with your author's work.

Betti, A, Reynaert, M and van den Berg, H. 2017. @PhilosTEI: Building Corpora for Philosophers. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 379–392. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.32. License: CC-BY 4.0

VU University Diachronic News text Corpus

The diachronic corpus has been brought in line with current standards and formats as used in the STEVIN Nederlandstalig Referentiecorpus (SoNaR, under development), which has been adapted to the more general FoLiA format (documented by Van Gompel, 2012). These standards and formats have been extended with new layers of annotation. As a result, the corpus adheres to the present-day CLARIN infrastructure.


OpenSONAR: a 500 MW reference corpus of Contemporary Written Dutch

SoNaR is a 500-million-word reference corpus of contemporary written Dutch for use in different types of linguistic (incl. lexicographic) and HLT research and in the development of applications. The STEVIN-funded SoNaR project (2008-2011) built on the results obtained in the D-Coi and Corea projects, which were awarded funding in the first call for proposals within the STEVIN programme. SoNaR contains over 500 million words (i.e. word tokens) of full texts from a wide variety of text types, including both texts from conventional media and texts from the new media. All texts except those from social media (Twitter, Chat, SMS) have been tokenized, tagged for part of speech and lemmatized, and in the same set the Named Entities have been labelled. All annotations were produced automatically; no manual verification took place. The texts are enriched with several annotations (part-of-speech and lemma information) and are available as FoLiA XML files (folia.xml). OpenSONAR is an online application for exploring and searching the SoNaR corpus. The system relies on BlackLab Server as back-end and WhiteLab as user interface.
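To show how the token, part-of-speech and lemma layers of a FoLiA file can be read, here is a minimal sketch using Python's standard library. The fragment is a hand-written, stripped-down example (real FoLiA documents carry identifiers, metadata and annotation declarations that are omitted here), so its exact shape is an assumption for illustration only.

```python
import xml.etree.ElementTree as ET

# FoLiA elements live in this XML namespace.
NS = {"f": "http://ilk.uvt.nl/folia"}

# Simplified, hand-written fragment; not taken from the corpus.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia">
  <text>
    <s>
      <w><t>De</t><pos class="LID(bep,stan,rest)"/><lemma class="de"/></w>
      <w><t>kat</t><pos class="N(soort,ev,basis,zijd,stan)"/><lemma class="kat"/></w>
      <w><t>slaapt</t><pos class="WW(pv,tgw,met-t)"/><lemma class="slapen"/></w>
    </s>
  </text>
</FoLiA>"""

root = ET.fromstring(SAMPLE)

# Collect (word form, part-of-speech tag, lemma) for every word element.
tokens = []
for word in root.iterfind(".//f:w", NS):
    form = word.find("f:t", NS).text
    pos_tag = word.find("f:pos", NS).get("class")
    lemma = word.find("f:lemma", NS).get("class")
    tokens.append((form, pos_tag, lemma))

for form, pos_tag, lemma in tokens:
    print(form, pos_tag, lemma, sep="\t")
```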

van de Camp, M, Reynaert, M and Oostdijk, N. 2017. WhiteLab 2.0: A Web Interface for Corpus Exploitation. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 231–243. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.19. License: CC-BY 4.0

de Does, J, Niestadt, J and Depuydt, K. 2017. Creating Research Environments with BlackLab. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 245–257. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.20. License: CC-BY 4.0

Oostdijk, N., Reynaert, M., Hoste, V., Schuurman, I. (2013) The Construction of a 500 Million Word Reference Corpus of Contemporary Written Dutch in: Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme (eds. P. Spyns, J. Odijk), Springer Verlag.

Texcavator End-user Manual

WAHSP/BiLand has been succeeded by Texcavator: http://texcavator.surfsaralabs.nl/

Texcavator enables a researcher to use full-text search on the newspaper archive of the Dutch Royal Library. On top of that, it offers visualizations such as word clouds, timelines and heat maps. It also provides services to refine searches, such as filtering, stopword removal, normalization and stemming.

Texcavator also gives access to ShiCo (Shifting Concepts), developed by Carlos Martinez Ortiz (NL eScience Center). ShiCo is a tool for visualizing concepts that shift over time. We refer to a concept as the set of words that are related to a given seed word. ShiCo uses a set of semantic models (word2vec) spanning a number of years to explore how concepts change over time: words related to a given concept at time t=0 may differ from the words related to the same concept at time t=n.

Texcavator originated from the earlier text mining applications WAHSP and BiLand. During the Translantis project, the application was renamed Texcavator and further developed by the UvA (Fons Laan). In May 2014, development was taken over by the Netherlands eScience Center (Janneke van der Zwaan). From April 2015 onwards, Texcavator was developed at the Digital Humanities lab of Utrecht University (Julian Gonggrijp and Martijn van der Klis). ShiCo was created in cooperation with the NL eScience Center (Carlos Martinez Ortiz).
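The idea that the words related to a seed word can differ between time periods can be sketched with one small vector model per period. This is not ShiCo's actual algorithm, only an illustration of the general principle: the two-dimensional vectors below are invented toy values (real word2vec models have hundreds of dimensions and are trained on text), and the names `cosine`, `related_words` and `MODELS` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_words(model, seed, k=2):
    """Return the k words in the model that are most similar to the seed word."""
    seed_vec = model[seed]
    scored = [(cosine(seed_vec, vec), word) for word, vec in model.items() if word != seed]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]


# Invented toy "models", one per period (illustration data only).
MODELS = {
    "t=0": {
        "drug": (1.0, 0.0),
        "medicine": (0.9, 0.1),
        "pharmacy": (0.8, 0.2),
        "crime": (0.0, 1.0),
        "police": (0.1, 0.9),
    },
    "t=n": {
        "drug": (0.2, 1.0),
        "medicine": (1.0, 0.1),
        "pharmacy": (0.9, 0.0),
        "crime": (0.1, 1.0),
        "police": (0.3, 0.9),
    },
}

# The concept around the same seed word, computed separately for each period.
concept_by_period = {
    period: related_words(model, "drug") for period, model in MODELS.items()
}
for period, words in concept_by_period.items():
    print(period, words)
```

In this toy setup the seed word's neighbours change between the two periods, which is the kind of shift the tool is meant to make visible.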

Snelders, S, Huijnen, P, Verheul, J, de Rijke, M and Pieters, T. 2017. A Digital Humanities Approach to the History of Culture and Science: Drugs and Eugenics Revisited in Early 20th-Century Dutch Newspapers, Using Semantic Text Mining. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 325–336. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.27. License: CC-BY 4.0

MIMORE: Microcomparative Morphosyntax Research Tool

With the MIMORE search engine one can search three databases together, using text strings, part-of-speech tags and syntactic variables. The researcher can combine categories and features into complex tags or use predefined tags. All categories and features are defined in ISOcat. Since all sentences have a location code, the morphosyntactic phenomena found in a set of sentences resulting from a search can be automatically plotted on a geographic map. It is possible to include more than one morphosyntactic phenomenon in one map, thus visualizing potential correlations between these phenomena. There is also a user-friendly function to export the data to a statistical program.

The data in DynaSAND, the dynamic syntactic atlas of the Dutch dialects (http://www.meertens.knaw.nl/sand/), were collected between 2000 and 2005 through oral interviews (fieldwork and telephone) in about 300 locations across the Netherlands, Belgium and a small part of north-west France. Dialect speakers were asked to judge and/or translate some 150 test sentences. DynaSAND makes available the full recordings and transcriptions of these interviews. Together, the DynaSAND data cover the syntactic variation in the Dutch language area in the left periphery of the clause (the complementizer system and complementizer agreement), variation in subject pronoun form depending on syntactic position, subject pronoun doubling, cliticization on YES/NO, the reflexive system, fronting constructions (Wh-clauses, relative clauses, topicalization), word order and morphological variation in verb clusters, negation and quantification.

The data in DiDDD (Diversity in Dutch DP Design; http://www.meertens.knaw.nl/diddd/) were collected between 2005 and 2009 through oral and written interviews in about 200 locations in the Dutch language area, with a methodology highly parallel to that of DynaSAND. The data involve translations of and judgements on test sentences. For 29 interviews there are sound recordings that have been aligned with their transcriptions. The DiDDD data cover the morphosyntactic variation within nominal groups, in particular possessives, partitives, noun ellipsis, the demonstrative system, the numeral modification system, what-for constructions, quantitative er, adjectival inflection, negation and exclamatives.

The data in GTRP (Goeman, Taeldeman, van Reenen Project; http://www.meertens.knaw.nl/mand/database/) were collected between 1979 and 2000 through oral interviews in about 600 locations in the Dutch language area. Informants were asked to translate words or short sentences. Part of the transcriptions have been aligned with the sound recordings. The morphological data in GTRP include plural forms of nouns, diminutives, gender on nouns and adjectives, comparatives, superlatives, verbal inflection including participles, and subject, object and possessive pronouns.

S. Barbiers, M. van Koppen, H. Bennis, N. Corver, MIcrocomparative MOrphosyntactic REsearch (MIMORE): Mapping partial grammars of Flemish, Brabantish and Dutch. Lingua Vol. 178, 5-31. doi:10.1016/j.lingua.2015.10.018
