Catalog Reference : ELRA-U-W0306
SAWA Corpus
The SAWA Corpus is a parallel English-Swahili corpus containing about one million words per language.
The SAWA Corpus consists of parallel texts, collected from various bilingual documents :
- extracts from the Bible (New Testament),
- extracts from the Quran,
- the UN Declaration of Human Rights,
- movie subtitles,
- example sentences from a bilingual English-Swahili dictionary,
- bilingual investment reports,
- texts from a local Kenyan translator.
The corpus has been tokenized, converted to UTF-8, and word-aligned.
Production
Project :
SAWA BOF UA-2007
Creation date :
2009
Contents
written corpus
Number of languages :
Parallel
Language(s) :
English <<< >>> Swahili
Character set :
UTF-8
Alignment :
Word
Joint Copyright © 2008 ELRA & ELDA