Catalog Reference : ELRA-U-W0357
Indonesian - English Parallel Corpus
The Indonesian - English Parallel Corpus (PANL-BPPT) is a sentence-aligned corpus of 1 million words in English and Bahasa Indonesia.
It contains:
- 500,000 words from the Penn Treebank Corpus, manually translated into Bahasa Indonesia,
- 500,000 words from various online sources, translated into English.
The corpus is distributed in XML format.
Part-of-speech tags are provided for 500,000 words of the Bahasa Indonesia text.
Production
Project : Pan Localization
Applications
Application area : Research
Contents
Written corpus
Number of languages : Bilingual
Language(s) : English <-> Indonesian
Alignment : Sentence
Number of tokens : 1 million words
Annotation Granularity : Word
Annotation level : Morphological
Annotation language : XML
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4