Catalog Reference : ELRA-U-W0324
Greek biomedical corpus
The Greek biomedical corpus contains 11.5 million word-forms from periodical articles and conference papers in Modern Greek. Around 6,250 documents were collected from the Internet (in HTML and PDF formats).
Annotation includes structural data, morphosyntactic and semantic tagging, and the identification of biomedical words and multi-word terms. The corpus is annotated in XML, following the TEI guidelines.
It was collected in the framework of the IATROLEXI project, which aims to develop NLP applications in the biomedical domain, especially for text indexing, information extraction, information retrieval, and question answering. The project intends to build annotation tools for the Greek language, as well as a lexicon and ontologies for biomedical terminology.
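As a rough illustration of the TEI-style XML annotation described above, the following Python sketch reads word-level tags and multi-word terms from a small sample. The sample document, the use of the TEI P5 namespace, the `lemma` and `pos` attributes, the tag values, and the `type="biomedical"` attribute on `<term>` are all assumptions made for this example; the corpus's actual markup is not documented here and may differ.

```python
import xml.etree.ElementTree as ET

# Assumption: the files use the TEI P5 namespace. Older TEI versions have none.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample; the real corpus may use other elements, attributes, and tag values.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>
        <s>
          <w lemma="η" pos="AT">Η</w>
          <term type="biomedical">
            <w lemma="αρτηριακός" pos="AJ">αρτηριακή</w>
            <w lemma="πίεση" pos="NO">πίεση</w>
          </term>
          <w lemma="αυξάνω" pos="VB">αυξάνεται</w>
        </s>
      </p>
    </body>
  </text>
</TEI>"""


def extract_tokens(xml_text):
    """Return (word-form, lemma, pos) for every <w> element, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (w.text, w.get("lemma"), w.get("pos"))
        for w in root.iterfind(".//tei:w", TEI_NS)
    ]


def extract_terms(xml_text):
    """Return each <term> as a string of its word-forms joined by spaces."""
    root = ET.fromstring(xml_text)
    terms = []
    for term in root.iterfind(".//tei:term", TEI_NS):
        words = [w.text for w in term.iterfind("tei:w", TEI_NS)]
        terms.append(" ".join(words))
    return terms


if __name__ == "__main__":
    for form, lemma, pos in extract_tokens(SAMPLE):
        print(form, lemma, pos)
    print(extract_terms(SAMPLE))
```

Keeping the morphosyntactic tags on `<w>` and wrapping multi-word terms in an enclosing `<term>` lets both layers be read from the same file in one pass, but this is only one plausible encoding.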
Production
Project : IATROLEXI
Applications
Possible applications : Information retrieval
Application area : Research
Contents
Written corpus
Number of languages : Monolingual
Language(s) : Greek (Greece)
Document source : Internet
Annotation Scheme : TEI
Annotation language : XML
Joint Copyright © 2008 ELRA & ELDA