VK: Verrijkt Koninkrijk (Enriched Kingdom)
Dr Loe de Jong’s Het Koninkrijk der Nederlanden in de Tweede Wereldoorlog (The Kingdom of the Netherlands in the Second World War) remains the most appealing history of German-occupied Dutch society (1940-1945). Published between 1969 and 1991, its 14 volumes, comprising 30 parts and 18,000 pages, serve both as an authoritative work for a general audience and as an unavoidable point of reference for scholars. In VK this corpus is enriched with:
- Tokenization, sentence splitting, part-of-speech tagging and lemmatization (done with the FROG software from Tilburg University);
- Named entity recognition (done using UvA's NE tagger, specially trained for Dutch within the Stevin DuoMan project);
- Polarity tagging, i.e. the positive or negative connotation of words (done using UvA's FietsTas software, developed for Dutch within the Stevin DuoMan project);
- Named entity reconciliation by linking to Wikipedia (done using software developed by Edgar Meij (UvA)).
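To make the enrichment layers listed above concrete, here is a minimal sketch of how a single enriched token could be represented once all four layers are combined. The class name, field names, and tag values are hypothetical and chosen for illustration only; they are not VK's actual data format or the tag sets used by the tools named above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EnrichedToken:
    """Hypothetical record combining the enrichment layers for one token."""
    text: str                            # surface form after tokenization
    lemma: str                           # lemmatized form
    pos: str                             # part-of-speech tag (simplified, illustrative labels)
    entity_type: Optional[str] = None    # named-entity label, if any
    polarity: Optional[str] = None       # "positive" / "negative", if tagged
    wikipedia_url: Optional[str] = None  # link produced by entity reconciliation


def linked_entities(tokens: List[EnrichedToken]) -> List[Tuple[str, str]]:
    """Return (surface form, Wikipedia URL) pairs for tokens that carry a link."""
    return [(t.text, t.wikipedia_url) for t in tokens if t.wikipedia_url]


# Illustrative sentence fragment; the annotations are invented examples,
# not output of the actual VK pipeline.
sentence = [
    EnrichedToken("Rotterdam", "Rotterdam", "N", entity_type="LOC",
                  wikipedia_url="https://nl.wikipedia.org/wiki/Rotterdam"),
    EnrichedToken("werd", "worden", "V"),
    EnrichedToken("verwoest", "verwoesten", "V", polarity="negative"),
]

print(linked_entities(sentence))
```

Keeping every layer optional except the token, lemma, and part of speech reflects the fact that most words are neither named entities nor polarity-bearing.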
REST web interface, HTTP GET
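As a rough illustration of what querying a REST interface over HTTP GET involves, the sketch below builds a GET request URL from query parameters. The base URL and the parameter names (`q`, `limit`) are placeholders assumed for this example; the actual VK endpoint and its parameters are not specified here and should be taken from the service's own documentation.

```python
from urllib.parse import urlencode

# Hypothetical placeholder address; NOT the real VK service URL.
BASE_URL = "https://example.org/vk/search"


def build_query_url(base_url: str, **params) -> str:
    """Return base_url with params encoded as an HTTP GET query string."""
    # Sort the keys so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    return f"{base_url}?{query}" if query else base_url


# Hypothetical parameters: a search term and a result limit.
url = build_query_url(BASE_URL, q="Rotterdam", limit=10)
print(url)
```

The resulting URL could then be fetched with any HTTP client, for example `urllib.request.urlopen(url)` from the Python standard library.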
De Boer, V., J. van Doornik, L. Buitinck, K. Ribbens, and T. Veken. Enriched Access to a Large War Historical Text using the Back of the Book Index. Extended abstract presented at the Workshop on Semantic Web and Information Extraction (SWAIE 2012), Galway, Ireland, 9 October 2012.
Buitinck, L., and M. Marx. Two-Stage Named-Entity Recognition Using Averaged Perceptrons. In Proceedings of NLDB, Groningen, Netherlands, 2012. http://link.springer.com/chapter/10.1007%2F978-3-642-31178-9_17