Catalog reference: ELRA-U-W0371
Corpus of Clinical Data
This is a large corpus of discharge letters collected from the medical records system of a hospital in Sweden. It consists of database records ("posts") taken from tables of particular interest for text and data mining, such as 'clinical history' and 'final diagnoses'.
The corpus has been automatically annotated for named entities (NEs) using the XML tags ENAMEX, TIMEX and NUMEX. The annotation covers seven NE types: persons, locations, organizations, drug names, disease names, time expressions and various measure expressions (such as age and temperature).
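To illustrate how inline ENAMEX/TIMEX/NUMEX markup can be read back, here is a minimal sketch. It assumes MUC-style tags that carry a `TYPE` attribute (e.g. `<ENAMEX TYPE="PERSON">...</ENAMEX>`); the exact attribute names and type labels used in this corpus are not given here, so the values below, the helper `extract_entities` and the English sample sentence are all hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: MUC-style inline markup such as <ENAMEX TYPE="PERSON">...</ENAMEX>;
# TIMEX and NUMEX are assumed to follow the same pattern. Tags are assumed not to nest.
NE_PATTERN = re.compile(
    r'<(ENAMEX|TIMEX|NUMEX)\s+TYPE="([^"]+)"\s*>(.*?)</\1>',
    re.DOTALL,
)

def extract_entities(text):
    """Return (tag, type, surface string) triples for every annotated entity."""
    return [(m.group(1), m.group(2), m.group(3)) for m in NE_PATTERN.finditer(text)]

# Invented example sentence; the TYPE labels are hypothetical.
sample = (
    'Patient <ENAMEX TYPE="PERSON">Anna Svensson</ENAMEX>, aged '
    '<NUMEX TYPE="AGE">67</NUMEX>, admitted <TIMEX TYPE="DATE">2006-03-14</TIMEX> '
    'with a temperature of <NUMEX TYPE="TEMPERATURE">38.9 C</NUMEX>.'
)

entities = extract_entities(sample)
# Tally entities per NE type, e.g. to check how much of a document is annotated.
counts = Counter(ne_type for _, ne_type, _ in entities)
print(entities)
print(counts)
```

A regular expression is used here only because the assumed markup is flat; if the real files nest tags or embed them in larger XML documents, a proper XML parser would be the safer choice.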
The purpose of the NE annotation is to make the clinical corpus easy to anonymise for further text and data mining tasks.
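One way such annotation can support anonymisation is sketched below: identifying entities are replaced with a `[TYPE]` placeholder, while other annotated spans keep their text and simply lose their tags. This assumes the same kind of MUC-style markup with a `TYPE` attribute; the type labels, the helper `anonymise` and the choice of which types to mask are hypothetical, not the corpus's documented procedure.

```python
import re

# ASSUMPTION: MUC-style inline markup such as <ENAMEX TYPE="PERSON">...</ENAMEX>,
# with TIMEX and NUMEX following the same pattern and no nested tags.
NE_PATTERN = re.compile(
    r'<(ENAMEX|TIMEX|NUMEX)\s+TYPE="([^"]+)"\s*>(.*?)</\1>',
    re.DOTALL,
)

# Hypothetical set of identifying types to mask. Drug and disease names are left
# in place because downstream text mining usually needs them.
DEFAULT_MASK = frozenset({"PERSON", "LOCATION", "ORGANIZATION", "DATE"})

def anonymise(text, mask_types=DEFAULT_MASK):
    """Replace masked entities with [TYPE] and strip the tags from all other entities."""
    def substitute(match):
        ne_type, surface = match.group(2), match.group(3)
        return f"[{ne_type}]" if ne_type in mask_types else surface
    return NE_PATTERN.sub(substitute, text)

# Invented example sentence with hypothetical TYPE labels.
sample = (
    'Patient <ENAMEX TYPE="PERSON">Anna Svensson</ENAMEX> was admitted to '
    '<ENAMEX TYPE="LOCATION">Ward 12</ENAMEX> on <TIMEX TYPE="DATE">2006-03-14</TIMEX> '
    'and given <ENAMEX TYPE="DRUG">paracetamol</ENAMEX>.'
)
print(anonymise(sample))
# -> Patient [PERSON] was admitted to [LOCATION] on [DATE] and given paracetamol.
```

Keeping the mask as a parameter lets the same annotated text be anonymised to different degrees, for example masking all time expressions for stricter de-identification.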
Size: approximately 1 GB
Production
Creation date: 2007
Applications
Possible applications: Information retrieval
Application area: Research
Contents
Written corpus
Number of languages: Monolingual
Language(s): Swedish
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4