CLARIN Tool Portal

ENIAMtoolkit

2 resources

ENIAMtoolkit is a collection of libraries that: - perform tokenization, lemmatization, part of speech tagging; - detect MWE and abbreviations; - split text into sentences.

Use "ENIAMtoolkit"

ENIAMtoolkit (2017-03-06)

2 resources

ENIAMtoolkit is a collection of libraries that: - perform tokenization, lemmatization, part of speech tagging; - detect MWE and abbreviations; - split text into sentences; - LCG parsing.

Use "ENIAMtoolkit (2017-03-06)"

Ucto tokenizes text files: it separates words from punctuation, and splits sentences. This is one of the first tasks for almost any Natural Language Processing application. Ucto offers several other basic preprocessing steps such as changing case that you can all use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation.

Use "ucto"

Tokenizer for Icelandic text (3.4.1) (2022-05-31)

3 resources

Tokenizer is a compact pure-Python (2.7 and 3) executable program and module for tokenizing Icelandic text. It converts input text to streams of tokens, where each token is a separate word, punctuation sign, number/amount, date, e-mail, URL/URI, etc. It also segments the token stream into sentences, considering corner cases such as abbreviations and dates in the middle of sentences. More information at: https://github.com/mideind/Tokenizer

Use "Tokenizer for Icelandic text (3.4.1) (2022-05-31)"

tokenizer

4 resources

Tokenizer is a tool with wich one can design dedicated tokenizers for texts from some domain of interest.

Use "tokenizer"

Tokenizer for Icelandic text (2.3.1)

3 resources

Tokenizer is a compact pure-Python (2.7 and 3) executable program and module for tokenizing Icelandic text. It converts input text to streams of tokens, where each token is a separate word, punctuation sign, number/amount, date, e-mail, URL/URI, etc. It also segments the token stream into sentences, considering corner cases such as abbreviations and dates in the middle of sentences. More information at: https://github.com/mideind/Tokenizer Tokenizer er pakki fyrir Python 2.7 og 3, ásamt skipanalínutóli, sem sér um tilreiðslu íslensks texta. Pakkinn umbreytir inntakstexta í tókastraum. Hver tóki er stakt orð, greinarmerki, tala/upphæð, dags-/tímasetning, netfang, vefslóð o.s.frv. Tólið skiptir tókastraumnum einnig í setningar og tekur tillit til jaðartilvika eins og skammstafana og dagsetninga í miðjum setningum. Frekari upplýsingar á: https://github.com/mideind/Tokenizer

Use "Tokenizer for Icelandic text (2.3.1)"

Tokenizer for Icelandic text (3.4.2) (22.10)

2 resources

Tokenizer is a compact pure-Python 3 executable program and module for tokenizing Icelandic text. It converts input text to streams of tokens, where each token is a separate word, punctuation sign, number/amount, date, e-mail, URL/URI, etc. It also segments the token stream into sentences, considering corner cases such as abbreviations and dates in the middle of sentences. More information at: https://github.com/icelandic-lt/Tokenizer Tokenizer er pakki fyrir Python 3, ásamt skipanalínutóli, sem sér um tilreiðslu íslensks texta. Pakkinn umbreytir inntakstexta í tókastraum. Hver tóki er stakt orð, greinarmerki, tala/upphæð, dags-/tímasetning, netfang, vefslóð o.s.frv. Tólið skiptir tókastraumnum einnig í setningar og tekur tillit til jaðartilvika eins og skammstafana og dagsetninga í miðjum setningum. Frekari upplýsingar á: https://github.com/icelandic-lt/Tokenizer

Use "Tokenizer for Icelandic text (3.4.2) (22.10)"

Tokenizer for Icelandic text (3.3.3)

3 resources

Tokenizer is a compact pure-Python (2.7 and 3) executable program and module for tokenizing Icelandic text. It converts input text to streams of tokens, where each token is a separate word, punctuation sign, number/amount, date, e-mail, URL/URI, etc. It also segments the token stream into sentences, considering corner cases such as abbreviations and dates in the middle of sentences. More information at: https://github.com/mideind/Tokenizer Tokenizer er pakki fyrir Python 2.7 og 3, ásamt skipanalínutóli, sem sér um tilreiðslu íslensks texta. Pakkinn umbreytir inntakstexta í tókastraum. Hver tóki er stakt orð, greinarmerki, tala/upphæð, dags-/tímasetning, netfang, vefslóð o.s.frv. Tólið skiptir tókastraumnum einnig í setningar og tekur tillit til jaðartilvika eins og skammstafana og dagsetninga í miðjum setningum. Frekari upplýsingar á: https://github.com/mideind/Tokenizer

Use "Tokenizer for Icelandic text (3.3.3)"

Tokenizer for Icelandic text (2.0.3)

2 resources

Tokenizer is a compact pure-Python (2 and 3) executable program and module for tokenizing Icelandic text. It converts input text to streams of tokens, where each token is a separate word, punctuation sign, number/amount, date, e-mail, URL/URI, etc. It also segments the token stream into sentences, considering corner cases such as abbreviations and dates in the middle of sentences.

Use "Tokenizer for Icelandic text (2.0.3)"

IceNLP Natural Language Processing toolkit

3 resources

IceNLP is an open source Natural Language Processing (NLP) toolkit for analyzing and processing Icelandic text. The toolkit is implemented in Java. IceNLP er safn málgreiningartóla, gefið út með opnu leyfi, til þess að greina og vinna íslenskan texta. Tólin eru unnin í Java.

Use "IceNLP Natural Language Processing toolkit"

Result filters

Metadata provider

Language

Resource type

Tool task

Field of study

Availability

Organisation

Project

Keywords

Active filters:

Search results

ENIAMtoolkit

ENIAMtoolkit (2017-03-06)

ucto

Tokenizer for Icelandic text (3.4.1) (2022-05-31)

tokenizer

Tokenizer for Icelandic text (2.3.1)

Tokenizer for Icelandic text (3.4.2) (22.10)

Tokenizer for Icelandic text (3.3.3)

Tokenizer for Icelandic text (2.0.3)

IceNLP Natural Language Processing toolkit

Result filters

Metadata provider

Language

Resource type

Tool task

Field of study

Availability

Organisation

Project

Keywords

Active filters:

Search results

Session recording