ELRA - ELRA-U-W 0146 : The LOGON parallel tourist corpus of Norwegian-English texts

Catalog Reference : ELRA-U-W 0146

The LOGON parallel tourist corpus of Norwegian-English texts

The LOGON corpus is a collection of Norwegian-English parallel texts from the domain of tourism. It is composed of several subcorpora:

- one subcorpus of general tourist texts: 180,000 words in each language. The quality of the translation varies considerably from one text to another.
- three subcorpora based on published books in the hiking domain: the Jotunheimen texts (30,000 words in Norwegian), the Turglede texts (40,000 words) and the Preikestolen texts (4,000 words). The translations of these texts are of high quality.
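
As a rough illustration, the subcorpus sizes listed above can be added up in a few lines of Python. This is only a sketch: the dictionary keys and variable names are made up for the example, and it assumes that the Turglede and Preikestolen figures are counted on the same (Norwegian) side as the Jotunheimen figure, which the description does not state.

```python
# Sizes in words, copied from the subcorpus list above.
# Assumption: all four figures refer to one language side of the corpus;
# the description only says so explicitly for two of them.
subcorpus_words = {
    "general tourist texts": 180_000,
    "Jotunheimen": 30_000,
    "Turglede": 40_000,
    "Preikestolen": 4_000,
}

# The three hiking subcorpora are everything except the general texts.
hiking_words = sum(
    n for name, n in subcorpus_words.items() if name != "general tourist texts"
)
total_words = sum(subcorpus_words.values())

print(f"hiking subcorpora: {hiking_words:,} words")
print(f"all subcorpora:    {total_words:,} words")
```

Under that assumption the hiking texts make up a little under a third of the material, and the overall sum is of the same order as the token count given in the catalogue metadata. Whether that metadata figure counts one language side or uses a different tokenization is not specified.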

The texts have been aligned using the IMS Corpus Workbench and tagged with the Oslo-Bergen Tagger (for Norwegian) and TreeTagger (for English).

The corpus was designed to serve as training and testing material for the LOGON machine translation project.

Production

Project : LOGON project

Applications


Application Area : Research, Tourism

Contents

Written corpus
Number of languages : Bilingual
Language(s) : Norwegian, English
Alignment : Sentence
Number of tokens : 255,000
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Morphological
Annotation Mode : Automatic