COMPARA is a bidirectional parallel corpus based on an open-ended collection of Portuguese-English and English-Portuguese source texts and translations (so far covering published fiction).
It can be used to study translation and automatically compare and contrast English and Portuguese.
This corpus is divided into two subsets: the "balanced corpus", made up of various types of texts, and the "financial corpus", which contains only texts from the economic-financial domain.
It is a collection of 27 translations (1-3 pages long) from the German, English, Italian or Spanish press, totaling 15,000 words in 637 sentences.
The Aarhus Corpus of Tagged Old Danish Texts (ACOD) is a morphologically tagged corpus of approximately 36,000 words of Old Danish text from the period 1174 to ca. 1400.