Active filters:

  • Availability: Share-alike
193 record(s) found

Search results

  • CLARIN Concept Registry

    The CCR is a concept registry that follows the W3C SKOS recommendation. It was chosen by CLARIN to serve as a semantic registry to overcome semantic interoperability issues with CMDI metadata and with the different annotation tag sets used for linguistic annotation. The CCR is part of the CMDI metadata infrastructure. The W3C SKOS recommendation, and the OpenSKOS implementation thereof, provide the means for ‘data-sharing, bridging several different fields of knowledge, technology and practice’. According to this model, each concept is assigned a unique administrative identifier, together with information on the status or decision-making process associated with the concept. In addition, concept specifications in the CCR contain linguistic descriptions, such as definitions and examples, and can be associated with a variety of labels.
  • CMDI to RDF conversion

    There is a growing amount of online information available in RDF format as Linked Open Data (LOD), and a strong community actively promotes its use. Publishing information as LOD is also considered an important signal that the publisher is actively seeking to share information with a world full of new potential users. Added advantages of LOD, when used well, are explicit semantics and high interoperability. However, the difficulty of modelling for non-expert users offsets these advantages, which is one reason why modelling systems such as CMDI are used. The CMDI2RDF project aims to bring the advantages of LOD to the CMDI world, making the huge store of CMDI information available to new groups of users while offering CLARIN a powerful tool for experimenting with new metadata discovery possibilities. The CMD2RDF service was created to allow connection with the growing LOD world and to facilitate experiments within CLARIN that merge CMDI with other RDF-based information sources. One of the promises of LOD is the ease of linking datasets together and answering queries based on this ‘cloud’ of LOD datasets. Thus, in the enrichment and use-case part of the project, we looked at other datasets to link to the CLARIN joint metadata domain. We used the WALS N3 RDF dump for one of the use cases. Although in the end it is relatively easy to go from a specific typological feature to the CMD records via a shared URI, the exercise still showcased a weakness of the Linked Data approach: one has to carefully inspect the property paths involved. In this case the path was broken, as there was no clear way to go from the WALS feature data to the WALS language info except by extracting the WALS language code from the feature URI pattern and inserting it into the language URI pattern.
    This shows that although the big LOD cloud has potential for knowledge discovery across dataset boundaries, design decisions in the individual datasets can still hamper algorithms, and manual inspection is needed. The CMD2RDF service was developed at the TLA/MPI for Psycholinguistics and DANS and later moved to the Meertens Institute, where the expertise remains.
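    The URI workaround described in this record (extracting the language code from a feature URI and inserting it into the language URI pattern) can be sketched roughly as follows. This is only an illustration: the two URI patterns and the function name are hypothetical placeholders on example.org, not the actual WALS patterns or the CMD2RDF code.

```python
import re

# Hypothetical feature-datapoint URI pattern (an assumption, NOT the real WALS pattern):
#   http://example.org/wals/datapoint/<feature id>/wals_code_<language code>
FEATURE_URI_PATTERN = re.compile(
    r"^http://example\.org/wals/datapoint/(?P<feature>[0-9]+[A-Z])/wals_code_(?P<code>[a-z]+)$"
)

# Hypothetical language URI template (also an assumption).
LANGUAGE_URI_TEMPLATE = "http://example.org/wals/languoid/lect/wals_code_{code}"


def language_uri_from_feature_uri(feature_uri: str) -> str:
    """Build the language URI by reusing the language code embedded in a feature URI."""
    match = FEATURE_URI_PATTERN.match(feature_uri)
    if match is None:
        # No RDF property links the two resources, so a URI that does not
        # follow the expected pattern cannot be bridged automatically.
        raise ValueError(f"Unrecognised feature URI: {feature_uri}")
    return LANGUAGE_URI_TEMPLATE.format(code=match.group("code"))


if __name__ == "__main__":
    print(language_uri_from_feature_uri(
        "http://example.org/wals/datapoint/81A/wals_code_abk"
    ))
```

    Because the link exists only in the shape of the URI strings rather than as an explicit property, a bridge of this kind has to be written after manually inspecting the dataset, which is the weakness the description points out.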
  • Assamese POS Tagger

    The Assamese POS tagger is a CRF++-based POS tagger. CRF++ is a customizable open-source implementation of Conditional Random Fields for tagging/labeling continuous text. CRF++ is designed for generic use and can be applied to any natural language, provided a tagset is available. The CRF++ tool is written in C++.

    1. These Assamese NLP resources, including the tools and applications, were developed during research and development projects as well as Masters and Ph.D. thesis work.
    2. They were mainly developed or generated at the Gauhati University Department of Computer Science and Department of Information Technology.
    3. These resources are used by students and researchers for further study and research, as well as for the design and development of tools and applications.
    4. Computational linguistics in Assamese is not yet well developed. Natural Language Processing work has mainly started during the last two decades, and most of the resources are first-generation resources with ample scope for upgrading, enriching, and cleaning.
    5. These are valuable and essential resources for all researchers in Assamese NLP, as the language requires more and more NLP work to make Assamese a rich medium for the digital world.
    6. Anyone interested in, or in need of, such resources may express their interest in the required resources, and will be advised on how they can be made available.
    7. These are purely research materials and may be used only for further research.
    8. Researchers may visit the NLP Lab of the Department of Information Technology, Gauhati University, Guwahati, India, or contact us.
    9. Researchers interested in collaborative work, and students interested in project work, are welcome.
    10. The contact person is Professor Shikhar Kr. Sarma, Department of Information Technology, Gauhati University, Guwahati 781014, Assam, India. Email: sks@gauhati.ac.in