ELRA - ELRA-U-W0324 : Greek biomedical corpus

Catalog Reference : ELRA-U-W0324

Greek biomedical corpus

The Greek biomedical corpus contains 11.5 million word-forms from periodical articles and conference papers in Modern Greek. Around 6,250 documents were collected from the Internet in HTML and PDF formats.

Annotation includes structural information, morphosyntactic and semantic tagging, and the identification of biomedical words and multi-word terms. The corpus is encoded in XML following the TEI (Text Encoding Initiative) guidelines.
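The catalogue entry does not document which TEI elements the corpus actually uses, so the following Python sketch is only an illustration under stated assumptions. It supposes that word tokens are <w> elements carrying pos and lemma attributes (both defined for <w> in TEI P5) and that multi-word terms are wrapped in <term> elements. The sample document, the tag values ADJ/NOUN and the helper names local, text_of, words and multiword_terms are hypothetical and not taken from the IATROLEXI files.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical sample document: the real IATROLEXI markup may differ.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <term type="biomedical"><w pos="ADJ" lemma="οξύς">Οξύ</w>
      <w pos="NOUN" lemma="έμφραγμα">έμφραγμα</w>
      <w pos="NOUN" lemma="μυοκάρδιο">μυοκαρδίου</w></term>
  </p></body></text>
</TEI>"""


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix so files with or without the TEI namespace both work."""
    return tag.rsplit("}", 1)[-1]


def text_of(el: ET.Element) -> str:
    """Concatenate all text inside an element and trim surrounding whitespace."""
    return "".join(el.itertext()).strip()


def words(root: ET.Element) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (form, pos, lemma) for every <w> element, in document order."""
    return [(text_of(el), el.get("pos"), el.get("lemma"))
            for el in root.iter() if local(el.tag) == "w"]


def multiword_terms(root: ET.Element) -> List[str]:
    """Return the surface form of every <term> that contains more than one <w>."""
    found = []
    for el in root.iter():
        if local(el.tag) == "term":
            tokens = [text_of(w) for w in el.iter() if local(w.tag) == "w"]
            if len(tokens) > 1:
                found.append(" ".join(tokens))
    return found


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for form, pos, lemma in words(root):
        print(form, pos, lemma)
    print(multiword_terms(root))
```

Comparing local tag names rather than fully qualified ones keeps the sketch independent of whether a file declares the TEI P5 namespace; a real reader for this corpus would have to be adapted to its actual element and attribute names.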

It was collected within the IATROLEXI project, which aims to develop NLP applications for the biomedical domain, especially for text indexing, information extraction, information retrieval and question answering. The project also intends to build annotation tools for Greek, as well as a lexicon and ontologies for biomedical terminology.

Production

Project : IATROLEXI

Applications

Possible applications : Information retrieval
Application area : Research

Contents

Written corpus
Number of languages : Monolingual
Language(s) : Greek (Greece)
Document source : Internet
Annotation scheme : TEI
Annotation language : XML