CLARIN Tool Portal

Liner2.5

2 resources

Generic framework for information extraction tasks, including recognition of named entities, temporal expressions, spatial expressions and events.

Use "Liner2.5"

CorPipe 23 multilingual CorefUD 1.1 model (corpipe23-corefud1.1-231206)

2 resources

`corpipe23-corefud1.1-231206` is a `mT5-large`-based multilingual model for coreference resolution, usable in CorPipe 23 (https://github.com/ufal/crac2023-corpipe). It is released under the CC BY-NC-SA 4.0 license. The model is language-agnostic (it takes no _corpus id_ on input), so it can be used to predict coreference in any language covered by `mT5` (for zero-shot evaluation, see the paper). Note, however, that empty nodes must already be present in the input; they are not predicted (the same setting as in the CRAC23 shared task).

Use "CorPipe 23 multilingual CorefUD 1.1 model (corpipe23-corefud1.1-231206)"

Keyword Extractor

1 resource

Tool for extracting key phrases from text using the TextRank algorithm.
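The listing does not describe how the Keyword Extractor itself is implemented. As a rough illustration of the general TextRank idea only, the sketch below ranks single words by running PageRank over a word co-occurrence graph. The function name `textrank_keywords`, the tiny stopword list, the window size and the damping factor are all assumptions made for this example, and filtering stopwords before building the graph is a simplification.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not the tool's own list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on",
             "with", "by", "are", "as", "it", "that", "this"}


def textrank_keywords(text, window=3, damping=0.85, iterations=50, top_k=5):
    """Rank words by PageRank over an undirected co-occurrence graph."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOPWORDS]

    # Link each word to the words that follow it within the window.
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    if not neighbours:
        return []

    # Iterate the PageRank update a fixed number of times.
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        new_scores = {}
        for w in neighbours:
            incoming = sum(scores[n] / len(neighbours[n])
                           for n in neighbours[w])
            new_scores[w] = (1 - damping) + damping * incoming
        scores = new_scores

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

Calling `textrank_keywords(some_text, top_k=10)` returns up to ten candidate keywords. A fuller implementation would typically also merge adjacent top-ranked words into multi-word key phrases.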

Use "Keyword Extractor"

Cinderella - tool for Clustering and Classifications of Texts in Polish

2 resources

A system for clustering and classifying texts in Polish. Source code.

Use "Cinderella - tool for Clustering and Classifications of Texts in Polish"

Paralela corpus and search engine

3 resources

Paralela is an open-ended, opportunistic parallel corpus of Polish-English and English-Polish translations. It currently contains 262 million words in 10,877,000 translation segments. The Paralela online search engine supports the SlopeQ query syntax for bilingual Polish-English corpus queries over the full dataset. Both the full texts and query results can be accessed and exported through the online application at http://paralela.clarin-pl.eu.

Use "Paralela corpus and search engine"

PELCRA for National Corpus of Polish Search Engine 2

2 resources

The PELCRA for NKJP search engine 2 provides access to the full National Corpus of Polish dataset (over 1.5 billion word tokens). In addition to linguistically motivated corpus queries, it supports a number of data exploration and visualisation features. Most of the functionality of the search engine is available through a REST web service. Access to the API is available upon request.

Use "PELCRA for National Corpus of Polish Search Engine 2"

Universal Dependencies 2.6 models for UDPipe 2 (2020-08-31)

2 resources

Tokenizer, POS tagger, lemmatizer and parser models for 99 treebanks of 63 languages of Universal Dependencies 2.6, created solely using UD 2.6 data (https://hdl.handle.net/11234/1-3226). The model documentation, including performance, can be found at https://ufal.mff.cuni.cz/udpipe/2/models#universal_dependencies_26_models . To use these models, you need UDPipe version 2.0, which you can download from https://ufal.mff.cuni.cz/udpipe/2 .

Use "Universal Dependencies 2.6 models for UDPipe 2 (2020-08-31)"

Corpus2MWE

2 resources

A CCL reader (Corpus2) with MWE detection.

Use "Corpus2MWE"

KPWr n82 NER model (on Polish RoBERTa base)

2 resources

This named entity recognition model for fine-grained entity categories (82 types) was trained on the KPWr corpus using the Polish RoBERTa base language model. Details can be found at https://github.com/mczuk/xlm-roberta-ner

Use "KPWr n82 NER model (on Polish RoBERTa base)"

SuperMatrix

2 resources

SuperMatrix is a system that supports the automatic extraction of semantic relations based on the analysis of large text corpora. It was developed as a tool for expanding the Polish wordnet (Słowosieć). Expansion consists of two steps: the system suggests potential links between lexical units, and linguists verify these suggestions and decide which of them go into the wordnet. This speeds up the work and preserves the integrity of data entry.

Use "SuperMatrix"

