ELRA - ELRA-U-W0371 : Corpus of Clinical Data

Catalog Reference : ELRA-U-W0371

Corpus of Clinical Data

This is a large corpus of discharge letters collected from a medical system used in a hospital in Sweden. It consists of database 'posts' (entries taken from tables of special interest for text and data mining, such as 'clinical history' and 'final diagnoses').

This corpus has been automatically annotated for named entities (NEs), using the XML identifiers ENAMEX, TIMEX and NUMEX. The annotation covers seven types of NE: persons, locations, organizations, names of drugs, names of diseases, time expressions, and different types of measure expressions (such as 'age' and 'temperature').
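As a rough illustration of how such inline annotation could be read, the Python sketch below pulls tagged spans out of a string. The entry above names only the elements ENAMEX, TIMEX and NUMEX; the TYPE attribute, the type codes (PRS, DRG, DAT, TMP) and the sample sentence are invented for this example and are not taken from the corpus.

```python
import re
from collections import Counter

# Assumption: MUC-style inline markup with a TYPE attribute.
# The catalogue entry confirms only the element names, not the attributes.
TAG_RE = re.compile(
    r'<(ENAMEX|TIMEX|NUMEX)\s+TYPE="([^"]*)">(.*?)</\1>',
    re.DOTALL,
)

def extract_entities(text):
    """Return (element, type, surface string) for every tagged span."""
    return [(m.group(1), m.group(2), m.group(3)) for m in TAG_RE.finditer(text)]

# Invented sample sentence; the type codes are hypothetical.
sample = (
    'Patient <ENAMEX TYPE="PRS">Anna Svensson</ENAMEX> fick '
    '<ENAMEX TYPE="DRG">Alvedon</ENAMEX> den <TIMEX TYPE="DAT">3 maj</TIMEX>, '
    'temp <NUMEX TYPE="TMP">38,5</NUMEX>.'
)

entities = extract_entities(sample)
counts = Counter(element for element, _, _ in entities)
```

Here `entities` holds one tuple per annotated span and `counts` tallies spans per element, which is a quick way to check how the three identifiers are distributed in a file.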

The aim of annotating the corpus for NEs was to allow easy anonymising of a clinical corpus for further text and data mining tasks.
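A minimal sketch of how NE annotation can support anonymising, assuming MUC-style ENAMEX elements with a TYPE attribute: spans whose type is considered identifying are replaced by a placeholder, while other spans (for example drug names) are left in place. The attribute name, the type codes and the choice of which types count as sensitive are assumptions made for this illustration, not a description of how the corpus was actually processed.

```python
import re

# Assumption: ENAMEX elements carry a TYPE attribute (not confirmed by the entry).
ENAMEX_RE = re.compile(r'<ENAMEX\s+TYPE="([^"]*)">.*?</ENAMEX>', re.DOTALL)

# Hypothetical type codes treated as identifying information.
SENSITIVE_TYPES = {"PRS", "LOC", "ORG"}

def anonymise(text):
    """Replace sensitive ENAMEX spans with a bracketed type placeholder."""
    def replace(match):
        ne_type = match.group(1)
        if ne_type in SENSITIVE_TYPES:
            return f"[{ne_type}]"
        return match.group(0)  # keep non-sensitive annotations unchanged
    return ENAMEX_RE.sub(replace, text)

# Invented sample sentence for illustration only.
sample = (
    'Patient <ENAMEX TYPE="PRS">Anna Svensson</ENAMEX> fick '
    '<ENAMEX TYPE="DRG">Alvedon</ENAMEX>.'
)
masked = anonymise(sample)
```

Keeping the type in the placeholder preserves some context for later text mining while removing the identifying string itself.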

Size: ~1GB

Production

Creation date : 2007

Applications

Possible applications : Information retrieval
Application area : Research

Contents

written corpus
Number of languages : Monolingual
Language(s) : Swedish