Universal Catalogue  
Languages
English
Catalog Reference: ELRA-U-W0316
Berlin central station Corpus
This is an English corpus of 1,068 web pages related to the "Berlin central station" ("Berlin Hauptbahnhof" in German), collected from the English results of Google queries.

It contains plain text files converted from HTML and PDF. The 55,255 sentences have been annotated for Named Entities (NE), and 10,773 relation instances have been automatically extracted from these sentences.

This corpus was built as part of a study in the area of Information Extraction (IE). It can be used as a test corpus when training IE systems (for relation extraction, NE recognition, coreference resolution, etc.).
Applications
Application Area: Training
Contents
written corpus

Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4