Catalog Reference : ELRA-U-W0316
Berlin central station Corpus
This is an English corpus of 1,068 web pages related to the "Berlin central station" ("Berlin Hauptbahnhof" in German), collected from the English results of Google queries.
It contains plain text files converted from HTML and PDF. The 55,255 sentences have been annotated for Named Entities (NE), and 10,773 relation instances have been automatically extracted from these sentences.
This corpus was built as part of a study in Information Extraction (IE). It can be used for training and testing IE systems (relation extraction, NE recognition, coreference resolution, etc.).
Applications
Application area : Training
Contents
Written corpus
Number of languages : Monolingual
Language(s) : English
Document source : Internet
Annotation language : HTML
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA