Universal Catalogue  
Languages: English
Catalogue reference: ELRA-U-W0371
Corpus of Clinical Data
This is a large corpus of discharge letters collected from a medical system used in a hospital in Sweden. It consists of database 'posts' taken from tables of particular interest for text and data mining, such as 'clinical history' and 'final diagnoses'.

The corpus has been automatically annotated for named entities (NEs) using the XML tags ENAMEX, TIMEX and NUMEX. The annotation covers seven NE types: persons, locations, organizations, drug names, disease names, time expressions and various measure expressions (such as 'age' and 'temperature').

The NE annotation was added to make it easy to anonymise the clinical corpus for subsequent text and data mining tasks.
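
The catalogue entry does not specify the exact tag attributes, so the following is only a minimal sketch assuming MUC-style markup, where each entity is wrapped as `<ENAMEX TYPE="PERSON">…</ENAMEX>`. It illustrates how such annotation could support anonymisation: entities of selected types are replaced by a type placeholder, and all other tagged entities are kept as plain text. The `TYPE` values, the `anonymise` function and the sample sentence are hypothetical, and nested tags are not handled.

```python
import re

# Assumed MUC-style markup: <ENAMEX TYPE="PERSON">...</ENAMEX>, likewise for TIMEX and NUMEX.
# The backreference \1 makes the closing tag match the opening tag name.
TAG_RE = re.compile(r'<(ENAMEX|TIMEX|NUMEX)\s+TYPE="([^"]+)">(.*?)</\1>', re.DOTALL)


def anonymise(text, types_to_mask=frozenset({"PERSON", "LOCATION", "ORGANIZATION"})):
    """Replace entities whose TYPE is in types_to_mask with [TYPE]; strip tags from the rest."""
    def repl(match):
        entity_type, content = match.group(2), match.group(3)
        return f"[{entity_type}]" if entity_type in types_to_mask else content
    return TAG_RE.sub(repl, text)


# Hypothetical annotated sentence (not taken from the corpus).
sample = ('Patient <ENAMEX TYPE="PERSON">Anna Svensson</ENAMEX> from '
          '<ENAMEX TYPE="LOCATION">Uppsala</ENAMEX> was seen on '
          '<TIMEX TYPE="DATE">2007-03-12</TIMEX>.')

print(anonymise(sample))
```

Passing a different `types_to_mask`, for example `{"PERSON", "DATE"}`, would also mask time expressions while leaving locations readable. Which types must be masked depends on the applicable privacy requirements, not on this sketch.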

Size: ~1 GB
Production
Creation date: 2007
Applications
Possible applications: Information retrieval
Application area: Research
Contents: Written corpus

Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4