CLARIN Tool Portal

698 record(s) found

Search results

WIP: War in Parliament

An advanced search engine for the OCR-ed collection of scanned proceedings of the Dutch Hansard (Handelingen der Staten-Generaal, 1930-1995). These proceedings are available as a fully annotated, semi-structured dataset for historical and social science research. The output of the search engine can be restricted by speaker name, party, date range, and other criteria. References to the Second World War (WW II) have shaped political debate in the Netherlands for many decades. However, we have no systematic knowledge of why, how often, when, by whom or from which political party, and in which context these references were made. Nor do we know the meanings politicians ascribed to the war years, the lessons the war was supposed to teach, or how all of this influenced political decision-making. WIP helps answer these questions and will help us better understand the complex legacies of WW II. The WIP project bridges the gap between historical and social science practices and the possibilities offered by large corpora and language resources, in particular CLARIN tools for Dutch. The dataset, the Handelingen der Staten-Generaal (Dutch Hansard), has been made compliant with CLARIN, ISOcat and ISO/TC 37/SC 4 standards. The search engine for this dataset uses an intuitive and powerful query language based on XPath, and its output can be fed directly into analysis programs such as SPSS. Integrating this technology with important historical research questions will directly contribute to new and innovative ways of writing about history. Search results can be exported in CSV (comma-separated values) format, which makes it easy to calculate statistics offline from a result set and to apply further filters.
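
The offline statistics mentioned above could look roughly like the following minimal Python sketch. The column names (speaker, party, date) and the sample rows are invented for illustration; the actual layout of a WIP CSV export is not documented here and may differ.

```python
import csv
import io
from collections import Counter

# Invented sample standing in for an exported result set.
# The columns speaker, party and date are assumptions, not the real export schema.
SAMPLE_EXPORT = """speaker,party,date
Speaker A,Party X,1965-03-02
Speaker B,Party Y,1965-03-02
Speaker A,Party X,1971-11-18
Speaker C,Party Z,1971-11-18
"""


def count_by(rows, column):
    """Count how many result rows share each value of `column`."""
    return Counter(row[column] for row in rows)


def hits_per_year(rows, date_column="date"):
    """Count result rows per calendar year, assuming ISO dates (YYYY-MM-DD)."""
    return Counter(row[date_column][:4] for row in rows)


# Read the export into a list of dictionaries, one per result row.
rows = list(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))

party_counts = count_by(rows, "party")
year_counts = hits_per_year(rows)

print(party_counts.most_common())
print(sorted(year_counts.items()))
```

For a real export, the in-memory sample would be replaced by an opened file, and the column names adjusted to whatever the export actually contains.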

Marx, M. (2011), Oorlog in de Kamer, NRC, March 3, 2011

‘Waarom politici graag over de oorlog praten’, NRC-Handelsblad, 25 February 2011

‘Zoekmachine vindt relevante WO2-verwijzingen in Handelingen der Staten Generaal. Dat doet denken aan de oorlog’, in: E-data & research, Volume 6, No. 2, October 2011 http://www.edata.nl/0602_011011/pdf/0602_011011_1.pdf

‘NIOD ontwikkelt zoekmachine die verwijzingen naar de oorlog opspoort’, in: Informatie Professional. Vakblad voor informatiewerkers, no. 11 (2011)

L. Buitinck and M. Marx (2012), ‘Two-stage named entity recognition using averaged perceptrons’, in: Proc. NLDB 2012, pp. 171-176
Dupira; the Dutch Parser for IR Applications

Dupira is a rule-based parser, generated by means of the AGFL parser generator from the Dupira grammar, lexicon and fact tables. By means of transductions which are specified in the grammar (and can be modified), the parser transduces sentences to dependency graphs. Dupira was developed for practical applications in Information Retrieval and for Information Systems needing a Natural Language interface. Its intended users are computer scientists and computer professionals rather than linguists.

DSS: Dutch Ships and Sailors

A tool chain and methodology for converting legacy datasets in the area of maritime history. Set up to accommodate more than 25 datasets, the initial population consists of 4 selected maritime-historical datasets. The maritime industry has been central to regional and global economic, social and cultural exchange. It is also one of the best historically documented sectors of human activity. Many aspects of it have been recorded by shipping companies, governments, newspapers and other institutions. In the past few decades, much of the data in the preserved historical source material has been digitized. Among the most interesting data are those on shipping movements and crew members. The Dutch Republic in the 17th and 18th centuries had to rely to a large extent on immigration to man its fleet. Especially in Asian waters, it also relied on Asian crews. Information about the same shipping movements and crew composition is often spread over several historical sources and hence over several databases. The data often refer to the same ‘places’, ‘ships’, ‘persons’ and ‘events’. When the available databases are linked, the data complement and amplify one another, and new research possibilities open up. Ideally, we would want to follow a ship from port to port, and crew members pursuing their careers from ship to ship. The Dutch Ships and Sailors project provides a tool chain and methodology for converting legacy datasets. The infrastructure includes common vocabularies to normalize and enrich existing data. Links are established between the datasets and to other relevant datasets. In doing so, Dutch Ships and Sailors builds a (semantic) web-based structure that aims to function as a future platform and infrastructure for maritime historical datasets.
Initially, this portal contains the following datasets:
- Historische Kranten of the Koninklijke Bibliotheek;
- The Monsterrollen databases, containing elaborate data on the crew composition of ships from the Northern Netherlands (c. 1800-1930) and providing information on the sailors involved, such as their places of origin, wages and ages;
- The VOC Opvarenden databases, providing extensive data on crews of VOC ships leaving the Republic;
- The Dutch-Asiatic Shipping database, providing data on all inter-continental voyages of VOC ships;
- The Generale Zeemonsterrollen database, providing data on the crew composition and sometimes location of VOC ships stationed in Asia and not engaged in inter-continental shipping.
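
The idea of linking records that refer to the same ship across different databases can be illustrated with a small Python sketch. This is not the project's actual tool chain, which the description above only characterizes in general terms; it is a simple exact-match join on a normalized ship name, and all field names and sample records are invented.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Reduce a ship name to a comparison key: no accents, lower case, no punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


def link_records(left, right, key="ship"):
    """Pair records from two datasets whose normalized `key` values are identical."""
    index = defaultdict(list)
    for record in right:
        index[normalize_name(record[key])].append(record)
    return [
        (left_record, right_record)
        for left_record in left
        for right_record in index.get(normalize_name(left_record[key]), [])
    ]


# Invented sample records; the field names are hypothetical.
voyages = [
    {"ship": "Zeeland", "voyage_id": "v1"},
    {"ship": "De Hoop", "voyage_id": "v2"},
]
muster_rolls = [
    {"ship": "de  hoop", "sailor": "sailor-1"},
    {"ship": "Amsterdam", "sailor": "sailor-2"},
]

links = link_records(voyages, muster_rolls)
for voyage, roll in links:
    print(voyage["voyage_id"], "<->", roll["sailor"])
```

Real historical sources would need more than a name match, since different ships often share a name; dates, places and shared vocabularies would have to be taken into account as well.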

Victor de Boer, Jur Leinenga, Matthias van Rossum and Rik Hoekstra. Dutch Ships and Sailors Linked Data Cloud. In Proceedings of the International Semantic Web Conference (ISWC 2014), 19-23 October, Riva del Garda, Italy, 2014.

A. Bravo Balado. Information extraction on newspaper archives for historical research. A Dutch maritime history case study. M.Sc. thesis VU University Amsterdam (forthcoming), 2014.

Andrea Bravo Balado, Victor de Boer, and Guus Schreiber. Linking historical ship records to a newspaper archive. Proceedings of the 6th International Conference on Social Informatics (workshops). LNCS. ed. Luca Maria Aiello, Daniel McFarland, 2014.

R. Ponstein. Reconciling Dutch Ships and Sailors. M.Sc. thesis VU University Amsterdam, 2014.

WAHSP/BILAND: web application for (bilingual) historical sentiment mining in public media

WAHSP/BILAND has been succeeded by TexCavator: http://texcavator.surfsaralabs.nl/

WAHSP/BILAND is a research tool for historians that uses textual data of news media from the period 1863-1940 of the Koninklijke Bibliotheek and Staatsbibliothek zu Berlin as input material. One can search with single query terms or with combinations thereof. Apart from showing the articles that match the query, the tool can visualize the results as word clouds of single articles with sentiment words highlighted, or as a word cloud of the whole result set together with newspaper statistics derived from the metadata. WAHSP/BILAND enables historians to collect and process large bilingual (Dutch and German) sets of opinionated text data from news media and to extract discourse identity and intensity patterns in two different countries with different scripts (e.g. Latin and Gothic). This tool offers a unique opportunity for non-technical humanities researchers to perform a new kind of historical e-research for studying changing opinions, notions and perceptions regarding public health and policy issues. The text mining tools for opinion/sentiment extraction that form the technological base for WAHSP/BILAND have been developed within the NTU/STEVIN DuOMAn project. The technology includes algorithms and tools for identification of polarity (positive/support or negative/criticism), sources (opinion-holders), frequency of items and specific targets of discourses. The tools and subjectivity lexicons are implemented as modules of ‘Fietstas’, a web service for text analysis. Fietstas also provides other essential text processing modules (morphological normalization, format and encoding reconciliation, named entity recognition and normalization, etc.) and visualization modules (interactive word clouds and timelines). Fietstas has been developed and is being used for processing large-scale datasets in the context of several projects, such as DuOMAn.
A text translation service based on machine learning can be used to translate existing lexicons and documents between Dutch and German (in both directions). The web application uses this functionality of Fietstas to support interactive creation, expansion and refinement of lexicons specific to the user’s research questions and needs. For BILAND, new bilingual and biscriptural lexicons have been developed. The application uses the visualization features of Fietstas to allow users to examine the research domain along the dimensions of time, context, and the identity and frequency of the discourse. WAHSP/BILAND is meant to be generic and testable in all domains where analysis of topics, contexts and attitudes in large volumes of text is needed.
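
As a rough illustration of lexicon-based polarity identification as described above, the following Python sketch counts positive and negative lexicon words in a text. It does not use Fietstas or the DuOMAn tools, whose interfaces are not described here; the toy lexicon, the function names and the sample sentence are all invented.

```python
import re
from collections import Counter

# Invented toy lexicon; real subjectivity lexicons are far larger and language-specific.
LEXICON = {
    "good": "positive",
    "benefit": "positive",
    "danger": "negative",
    "harmful": "negative",
}


def tokenize(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"\w+", text.lower())


def polarity_counts(text, lexicon=LEXICON):
    """Count how many tokens carry each polarity according to the lexicon."""
    counts = Counter(lexicon[token] for token in tokenize(text) if token in lexicon)
    return {"positive": counts["positive"], "negative": counts["negative"]}


article = "The new treatment is good, but critics call it harmful and a danger."
result = polarity_counts(article)
print(result)
```

A per-article count of this kind is the sort of figure that could feed a highlighted word cloud or a timeline; identifying opinion holders and targets, as the tool description mentions, requires considerably more than word matching.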

Snelders, S, Huijnen, P, Verheul, J, de Rijke, M and Pieters, T. 2017. A Digital Humanities Approach to the History of Culture and Science: Drugs and Eugenics Revisited in Early 20th-Century Dutch Newspapers, Using Semantic Text Mining. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 325–336. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.27. License: CC-BY 4.0
Croatian Gemma-based Beta Large Language Model

7 resources

Croatian Pythia-based Beta Large Language Model

7 resources

HR-GPT Beta Large Language Model

3 resources

Ilvars - Latvian Male VITS Text-to-Speech Model (vers. 2023)

A neural model for text-to-speech (TTS) synthesis in Latvian, trained using VITS on a 25-hour speech corpus of audiobooks read in a male voice. Available for academic and non-commercial purposes via an API. To get access to the API, please send a request to info@ailab.lv.


