703 record(s) found

Search results

  • Fast and easy development of pronunciation lexicons for names

    The AUTONOMATA transcription tool set consists of a transcription tool and learning tools, with which one can enrich word lists with precise pronunciation information. The tool uses a general grapheme-to-phoneme converter (the g2p-converter).
    This STEVIN project is about the investigation of new pronunciation modeling technologies that can improve the automatic recognition of spoken names in the context of a POI (Point-of-Interest) information providing business service. Collaboration with RU (Nijmegen), UiL (Utrecht), Nuance and TeleAtlas.
  • COAVA: Cognition, Acquisition and Variation Tool

    In COAVA two sets of databases are made available in a standardized way: one with historical dialect data (the databases WBD and WLD, with lexical data of the Brabantish and Limburgian dialects between 1880 and 1980) and one with first language acquisition data (four databases from the CHILDES project). The databases contain linguistic information (dialect form, standardised form (“Dutchified”), lexical meaning), geographical information (locality, dialect area, province) and information on the source (inquiry forms or monotopic dictionaries and the date of documentation). The visualisation of the first two sets of information will lead to lexical maps. Users will typically reach the data through the browsable concept taxonomy; the databases can thus be approached via search tools as well as via a thematic taxonomy. This taxonomy was developed for the dialect databases and covers the general vocabulary.
    COAVA (COgnition, Acquisition and VAriation Tool) brings together two strange bedfellows: first language acquisition and historical dialectology. In historical linguistics there is a common assumption that language change in the past is due to non-target-like transmission of linguistic features between generations, i.e. between parents and children. Despite this assumption, the two disciplines remain isolated from each other owing to, among other things, different methods of data collection and different types of resources with empirical data. The aim of the COAVA project was to demonstrate that this common assumption in historical linguistics can be examined in detail with the help of Digital Humanities. This interdisciplinary research aims to develop a tool for easily exploring the linguistic characteristics of concepts.
    Leonie Cornips, Jos Swanenberg, Wilbert Heeringa, Folkert de Vriend (2016). The relationship between first language acquisition and dialect variation: Linking resources from distinct disciplines in a CLARIN-NL project. Lingua, Vol. 178, 07.2016, p. 32-45. doi:10.1016/j.lingua.2015.11.007
    Cornips, L., Swanenberg, J., Vriend, F. de, Heeringa, W. (2012), Is what we have acquired early, less vulnerable to variation? A comparison between data from dialectlexicography and data from first language acquisition. http://www.meertens.knaw.nl/coavasite/wp-content/uploads/2012/10/Abstract-SIDG-2-JS.pdf
    Cornips, L., Kemps-Snijders, M., Snijders, M., Swanenberg, J. and Vriend, F. de (2011). Bridging the Gap between First Language Acquisition and Historical Dialectology with the Help of Digital Humanities. SDH Copenhagen. http://www.meertens.knaw.nl/coavasite/wp-content/uploads/2011/11/Paper-SDH.pdf
  • Namescape Named Entity Recognition

    Searching and visualizing Named Entities in modern Dutch novels. The named entity (NE) tagging and resolution in NameScape enables quantitative and repeatable research where previously only guesswork and anecdotal evidence were feasible. The visualisation module enables researchers with a less technical background to draw conclusions about the functions of names in literary work and helps them to explore the material in search of more interesting questions (and answers). Users from other communities (sociolinguistics, sentiment analysis, …) also benefit from the NE-tagged data, especially since the NE recognizer is available as a web service, enabling researchers to annotate their own research data.
    Datasets in NameScape (total of 1,129 books):
    - Corpus Sanders: a corpus of 582 Dutch novels written and published between 1970 and 2009.
    - Corpus Huygens: 22 novels manually tagged with detailed named entity information. IPR for this corpus do not allow distribution.
    - Corpus eBooks: 7000+ Dutch eBooks tagged automatically with basic NER features and person name part information. IPR for this corpus do not allow distribution.
    - Corpus SoNaR Books: 105 Dutch books, NE tagged.
    - Corpus Gutenberg Dutch: 530 NE-tagged TEI files converted from the Epub versions of the corresponding Gutenberg documents.
    Recent research has conclusively proven that names in literary works can only be put fully into perspective when studied in a wider context (landscape) of names, either in the same text or in related material (the onymic landscape or “namescape”). Research on large corpora is needed to gain a better understanding of, e.g., what is characteristic of a certain period, genre, author or cultural region. The data necessary for research on this scale simply does not exist yet. NameScape aims to fill this need by providing a substantial number of literary works annotated with a rich tag set, thereby enabling researchers to perform their research in more depth than previously possible. Several exploratory visualization tools help the scholar to answer old questions and uncover many more new ones, which can be addressed using the demonstrator.
    de Does, J, Depuydt, K, van Dalen-Oskam, K and Marx, M. 2017. Namescape: Named Entity Recognition from a Literary Perspective. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 361–370. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.30. License: CC-BY 4.0
    Karina van Dalen-Oskam (2013), Nordic Noir: a background check on Inspector Van Veeteren, 31 May 2012, http://blog.namescape.nl/?p=47
  • AVResearcherXL: Exploring audiovisual metadata in historical context

    AVResearcherXL is a tool for exploring radio and television programme descriptions, television subtitles and general newspaper articles. The interface searches across the catalogue "iMMix" of the Netherlands Institute for Sound and Vision and a selection of newspapers of KB Royal Archive of the Netherlands.
    As of the end of 2014, AVResearcherXL uses the following data.
    iMMix:
    - 932,035 broadcasts indexed
    - 18,124 broadcasts with subtitles
    - 1 January 1900 is the date of the first broadcast in the index
    - 26 October 2013 is the date of the last broadcast in the index
    KB newspapers:
    - 25,811,413 articles indexed
    - 16,294,029 articles are of type "artikel"
    - 8,483,542 articles are of type "advertentie"
    - 630,929 articles are of type "illustratie met onderschrift"
    - 402,913 articles are of type "familiebericht"
    - 1 January 1900 is the date of the first article in the index
    - 30 November 1994 is the date of the last article in the index
    AVResearcherXL is financially supported by CLARIN-NL within the QuaMeRDES-project and by CLARIAH-SEED within the Research Instruments for Media Studies-project. AVResearcherXL is an extended version of MeRDES, the tool developed in 2012 by the NWO-CATCH project BRIDGE. MeRDES was further developed into AVResearcher by the Netherlands Institute for Sound and Vision in 2013. AVResearcherXL is a collaborative project of Centre for Television in Transition (Utrecht University), Intelligent Systems Lab Amsterdam (University of Amsterdam) and the Netherlands Institute for Sound and Vision. The partners worked together with Dispectu for the development of the interface and back-end, and with Frontwise for the styling of the interface.
    Bron, M., Gorp, J. van, Nack, F., Rijke, M. de, Vishneuski, Andrei and Leeuw, J.S. de (2012). A Subjunctive Exploratory Search Interface to Support Media Studies Researchers. SIGIR '12: 35th international ACM SIGIR conference on Research and development in information retrieval Portland, Oregon: ACM.
    Huurnink, B., Bronner, A., Bron, M., Gorp, J. van, Goede, B. de and Wees, J. van (2013). AVResearcher: Exploring Audiovisual Metadata. DIR 2013: Dutch-Belgian Information Retrieval Conference Delft: DIR.
  • Namescape Search

    Searching and visualizing Named Entities in modern Dutch novels. The named entity (NE) tagging and resolution in NameScape enables quantitative and repeatable research where previously only guesswork and anecdotal evidence were feasible. The visualisation module enables researchers with a less technical background to draw conclusions about the functions of names in literary work and helps them to explore the material in search of more interesting questions (and answers). Users from other communities (sociolinguistics, sentiment analysis, …) also benefit from the NE-tagged data, especially since the NE recognizer is available as a web service, enabling researchers to annotate their own research data.
    Datasets in NameScape (total of 1,129 books):
    - Corpus Sanders: a corpus of 582 Dutch novels written and published between 1970 and 2009.
    - Corpus Huygens: 22 novels manually tagged with detailed named entity information. IPR for this corpus do not allow distribution.
    - Corpus eBooks: 7000+ Dutch eBooks tagged automatically with basic NER features and person name part information. IPR for this corpus do not allow distribution.
    - Corpus SoNaR Books: 105 Dutch books, NE tagged.
    - Corpus Gutenberg Dutch: 530 NE-tagged TEI files converted from the Epub versions of the corresponding Gutenberg documents.
    Recent research has conclusively proven that names in literary works can only be put fully into perspective when studied in a wider context (landscape) of names, either in the same text or in related material (the onymic landscape or “namescape”). Research on large corpora is needed to gain a better understanding of, e.g., what is characteristic of a certain period, genre, author or cultural region. The data necessary for research on this scale simply does not exist yet. NameScape aims to fill this need by providing a substantial number of literary works annotated with a rich tag set, thereby enabling researchers to perform their research in more depth than previously possible. Several exploratory visualization tools help the scholar to answer old questions and uncover many more new ones, which can be addressed using the demonstrator.
    de Does, J, Depuydt, K, van Dalen-Oskam, K and Marx, M. 2017. Namescape: Named Entity Recognition from a Literary Perspective. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 361–370. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.30. License: CC-BY 4.0
    Karina van Dalen-Oskam (2013), Nordic Noir: a background check on Inspector Van Veeteren, 31 May 2012, http://blog.namescape.nl/?p=47
  • PILNAR: Pilgrimage Narratives - a corpus for studying the profile of the modern pilgrim

    A corpus of pilgrimage narratives with Dutch texts written after ca. 2000 that present the thoughts and impressions of pilgrims to Santiago de Compostela. The PILNAR corpus is a source for research in a variety of (sub)disciplines: culture studies, ritual and religious studies, but also media and e-culture studies (cf. the use of blogs and other social media for the self-presentation of experiences). Only for authorized users.
    The PILNAR corpus contains six subcorpora:
    - Volumes of De Jacobsstaf 1986-: 84 pdf files;
    - Volumes of De Pelgrim of the Flemish Society of Santiago de Compostela, nos. 1-4 (16 MB, 10 MB, 16 MB) (the two societies collaborate closely);
    - Volumes of Ultreia, a newsletter; 3 issues available now: January, February, April 2011;
    - Pilgrimage accounts and blogs by pilgrims available via the Societies (Netherlands: circa 140 files; Flemish: circa 138 files);
    - A corpus of pilgrimage narratives compiled on the occasion of the exhibition in Museum Catharijneconvent held in collaboration with the Society: www.pelgrimsverhalen.nl; already on the site now: about 180 fields (as of July 2011);
    - Accounts and narratives that come in after a specially targeted notice via the site and periodical by the Society (De Jacobsstaf), with perhaps a Flemish companion piece (De Pelgrim).
  • RemBench - a Digital Workbench for Rembrandt Research

    RemBench enables one to search and browse for works of art, artists, primary sources and library sources related to Rembrandt, using faceted search by location, author/artist name, author/artist type and date range, and/or exact and fuzzy keyword search. It offers both a web application and a RESTful web service. RemBench combines the content of four different databases behind one search interface: RKDartists and RKDimages, two databases maintained by the Netherlands Institute for Art History (RKD); RemDoc, a collection of original documents related to Rembrandt van Rijn from the period between 1475 and circa 1750; and RUQuest, a library system that provides access to full-text articles as well as the complete collection of (e-)books and journals from the Radboud University Library Catalogue. RemBench does not influence the content of these databases.
    Verberne, S, van Leeuwen, R, Gerritsen, G and Boves, L. 2017. RemBench: A Digital Workbench for Rembrandt Research. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 337–350. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.28. License: CC-BY 4.0
  • Evaluating Repetitions, or how to Improve your Multilingual ASR System by doing Nothing

    A demo of a speech recognizer for POIs (Points of Interest). This demo recognizes overnight accommodation addresses and eateries in several large cities (including Amsterdam, Antwerpen, Gent and Rotterdam).
    This STEVIN project is about the investigation of new pronunciation modeling technologies that can improve the automatic recognition of spoken names in the context of a POI (Point-of-Interest) information providing business service. Collaboration with RU (Nijmegen), UiL (Utrecht), Nuance and TeleAtlas.
  • Manual Oral History Annotation Tool

    The Oral History Annotation tool, developed by the Centre for Language and Speech Technology (CLST) at the Radboud University Nijmegen, enables one to annotate and search in oral history resources. The tool has been used to enrich a corpus of 250 interviews from the Living Oral History Workbench with commentary. All 250 interviews are searchable through a fragment finder and can be annotated. These annotations can be shared with other researchers, making the interviews available and more easily accessible to a much wider range of researchers in the humanities in general and in linguistics in particular. The Annotation Tool is only available for scientific research and only after approval by the Veterans Institute.
    Interview data can be used in a number of ways, such as comparative research, restudy or follow-up study, re-analysis / secondary analysis, research design and methodological advancement, replication and validation of published work, and for teaching and learning. Recent experiences with the re-use of interview data show that there is an enormous potential for this type of data. Multidisciplinary research is carried out especially on interview data related to the Second World War and other military conflicts.
    This corpus consists of about 30 fully transcribed interviews from the Veteran Tapes VP project, and 250 interviews resulting from the Living Oral History Workbench project:
    - 120 World War II interviews presenting a range of experiences and frames of reference of Dutch soldiers between 1935 and 1945;
    - 100 interviews with veterans of the Dutch East Indies. This collection exhibits a large diversity in experiences at the local level in guerrilla warfare;
    - 30 interviews with veterans of New Guinea. This is a relatively unknown conflict with very interesting elements (soldiers left in uncertainty and isolation, and the pressure of the international community to decolonize the area).
    Each interview lasts between 1 and 1.5 hours.