Web-based system for natural language processing of Polish texts. It allows running complex workflows of language and machine learning tools and makes them available via REST web services.
TimeAssign is a program that recognizes temporal expressions and assigns TimeML labels to words in Polish text, using a Bi-LSTM-based neural network and word-form embeddings.
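Because TimeAssign labels individual words, recovering whole temporal expressions means grouping consecutive word labels into spans. The minimal sketch below assumes a BIO-style encoding over TimeML TIMEX3 types (labels such as `B-DATE`, `I-DATE` and `O`). TimeAssign's actual tag set and output format may differ, and `labels_to_spans` is a hypothetical helper, not part of the tool.

```python
from typing import List, Optional, Tuple

def labels_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group per-word BIO labels into (type, text) spans.

    Assumes hypothetical labels such as "B-DATE", "I-DATE" and "O";
    this is not necessarily TimeAssign's real output format.
    """
    spans: List[Tuple[str, List[str]]] = []
    current_type: Optional[str] = None
    for token, label in zip(tokens, labels):
        if label == "O":
            current_type = None  # outside any temporal expression
            continue
        prefix, _, timex_type = label.partition("-")
        if prefix == "I" and current_type == timex_type:
            spans[-1][1].append(token)  # continue the open expression
        else:
            # "B-" (or an "I-" that does not continue the same type) opens a new expression
            spans.append((timex_type, [token]))
            current_type = timex_type
    return [(t, " ".join(words)) for t, words in spans]

# Hand-written example labels, not real TimeAssign output.
tokens = ["Spotkanie", "25", "maja", "2016", "potrwa", "dwie", "godziny", "."]
labels = ["O", "B-DATE", "I-DATE", "I-DATE", "O", "B-DURATION", "I-DURATION", "O"]
print(labels_to_spans(tokens, labels))
```

Treating a stray `I-` label as the start of a new span is a lenient design choice; a stricter reader could reject such sequences instead.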
Tokenizer, POS tagger, lemmatizer and parser models for 90 treebanks of 60 languages from Universal Dependencies 2.4, created solely from UD 2.4 data (http://hdl.handle.net/11234/1-2988). The model documentation, including performance figures, can be found at http://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_24_models .
To use these models, you need the UDPipe binary, version 1.2 or later, which you can download from http://ufal.mff.cuni.cz/udpipe .
In addition to the models themselves, all additional data and the hyperparameter values used for training are available in the second archive, allowing the training to be reproduced.
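UDPipe writes its tokenization, tagging, lemmatization and parsing results in the CoNLL-U format: ten tab-separated columns per word, `#` comment lines, and a blank line after each sentence. The following minimal sketch reads such output into plain Python records. It skips multiword-token ranges (IDs like `1-2`) and empty nodes (IDs like `1.1`). `read_conllu` and `Word` are hypothetical names, not part of UDPipe, and the sample sentence is hand-written rather than real model output.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Word:
    id: int
    form: str
    lemma: str
    upos: str
    head: Optional[int]  # None when the output was not parsed (HEAD is "_")
    deprel: str

def read_conllu(text: str) -> List[List[Word]]:
    """Parse CoNLL-U text into sentences, each a list of syntactic words."""
    sentences: List[List[Word]] = []
    current: List[Word] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token range (e.g. 1-2) or empty node (e.g. 1.1)
        head = None if cols[6] == "_" else int(cols[6])
        current.append(Word(int(cols[0]), cols[1], cols[2], cols[3], head, cols[7]))
    if current:
        sentences.append(current)  # input without a trailing blank line
    return sentences

sample = (
    "# text = Ala ma kota.\n"
    "1\tAla\tAla\tPROPN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tma\tmieć\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tkota\tkot\tNOUN\t_\t_\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)
for word in read_conllu(sample)[0]:
    print(word.id, word.form, word.lemma, word.head, word.deprel)
```

Keeping `head` optional lets the same reader handle output from a pipeline that was run with only the tokenizer or tagger enabled.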
Prepared by: Michał Marcińczuk <marcinczuk@gmail.com>
Date: 25.05.2016
Project: Clarin-PL (http://clarin-pl.eu)
Authors: Michał Marcińczuk, Jan Kocoń, Michał Krautforst
Models for the Liner2.5 tool for named entity recognition.
The Liner2.5 tool is available at http://hdl.handle.net/11321/231.
The package contains three models:
1. config-nam.ini -- named entity boundaries,
2. config-top9.ini -- entity boundaries and coarse-grained categorization (9 categories),
3. config-n82.ini -- entity boundaries and fine-grained categorization (82 categories).
WiKNN is an online text classification service for Polish and English texts. It performs hierarchical classification of user-submitted texts, labelling them with Wikipedia categories. WiKNN is available through a web interface (http://pelcra.clarin-pl.eu/tools/classifier/) and as a REST service with interactive documentation at http://clarin.pelcra.pl/apidocs/wiknn.
The `corpipe24-corefud1.2-240906` is a `mT5-large`-based multilingual model for coreference resolution usable in CorPipe 24 (https://github.com/ufal/crac2024-corpipe). It is released under the CC BY-NC-SA 4.0 license.
The model is language-agnostic (it takes no corpus ID as input), so in theory it can be used to predict coreference in any language supported by `mT5`.
This model also jointly predicts the empty nodes needed for zero coreference. The paper introducing this model also presents an alternative two-stage approach, which first predicts empty nodes (via https://www.kaggle.com/models/ufal-mff/crac2024_zero_nodes_baseline/) and then performs coreference resolution (via http://hdl.handle.net/11234/1-5673); it is roughly twice as slow but performs slightly better.
Integrated parser is an application that combines and normalizes the outputs of several parsers for Polish. It is based on the ENIAM processing stream, extended with Polish Dependency Parser, Świgra and POLFIE. Individual parsers can be turned on and off according to the user's requirements.
DG-POLFIE is a prototype parser that tries to merge parse fragments generated by POLFIE using Polish Dependency Parser.
DG-POLFIE aims to improve the coverage of the POLFIE parser (i.e. the percentage of sentences with at least one analysis). To increase the number of Polish sentences and constructions that can be parsed with the POLFIE-based parser, DG-POLFIE defines rules that use the dependency structure to build a full parse from the FRAGMENTS provided by POLFIE.
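The general idea of using a dependency structure to join fragments can be illustrated with a deliberately simplified sketch. It assumes that each fragment is a set of token positions, that the dependency parse gives one head per token (0 for the root), and that the fragments together cover every token. Each fragment is then attached under the fragment containing the head of its topmost token. This is a hypothetical illustration, not DG-POLFIE's actual rule set, and `fragment_root` and `attach_fragments` are invented names.

```python
from typing import Dict, List, Optional, Set

def fragment_root(fragment: Set[int], heads: Dict[int, int]) -> int:
    """Return the single token of the fragment whose dependency head lies outside it."""
    outside = [t for t in fragment if heads[t] not in fragment]
    if len(outside) != 1:
        raise ValueError(f"fragment {sorted(fragment)} is not one dependency subtree")
    return outside[0]

def attach_fragments(fragments: List[Set[int]], heads: Dict[int, int]) -> Dict[int, Optional[int]]:
    """Map each fragment index to the fragment it attaches under (None = top level).

    Assumes the fragments partition all tokens and that heads[t] == 0 marks the root.
    """
    owner = {t: i for i, frag in enumerate(fragments) for t in frag}
    parent: Dict[int, Optional[int]] = {}
    for i, frag in enumerate(fragments):
        head = heads[fragment_root(frag, heads)]
        if head == 0:
            parent[i] = None
        elif head in owner:
            parent[i] = owner[head]
        else:
            raise ValueError(f"token {head} is not covered by any fragment")
    return parent

# Hypothetical example: "Ala ma kota ." with dependency heads 1->2, 2->0, 3->2, 4->2,
# where the fragment parser produced two pieces: {Ala, ma, .} and {kota}.
heads = {1: 2, 2: 0, 3: 2, 4: 2}
print(attach_fragments([{1, 2, 4}, {3}], heads))  # {0: None, 1: 0}
```

Requiring each fragment to be a single dependency subtree keeps the sketch simple. A real system would also have to reconcile the fragments' own internal analyses with the dependency tree.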