Catalog Reference : ELRA-U-W 0146
The LOGON parallel tourist corpus of Norwegian-English texts
The LOGON corpus is a collection of Norwegian-English parallel texts from the domain of tourism. It is composed of several subcorpora:
- one subcorpus of general tourist texts: 180,000 words in each language. The quality of the translation varies considerably from one text to another.
- three subcorpora based on published books in the hiking domain: the Jotunheimen texts (30,000 words in Norwegian), the Turglede texts (40,000 words) and the Preikestolen texts (4,000 words). The translation of those texts is of high quality.
The texts have been aligned using the IMS Corpus Workbench and tagged with the Oslo-Bergen Tagger (for Norwegian) and TreeTagger (for English).
The corpus was designed to serve as training and testing material for the LOGON machine translation project.
Production
Project : LOGON project
Applications
Application Area : Research, Tourism
Contents
Written corpus
Number of languages : Bilingual
Language(s) : Norwegian, English
Alignment : Sentence
Number of tokens : 255,000
Annotation Coverage : Full
Annotation Granularity : Word
Annotation level : Morphological
Annotation Mode : Automatic
Joint Copyright © 2008 ELRA & ELDA