Universal Catalogue  
Written Corpora
Displaying 241 to 260 (of 730 products)

ELRA-U-W 0233
The Swedish Immigrant Newspaper corpus 


The Swedish Immigrant Newspaper Corpus is available in nine different languages: Swedish, Albanian, Arabic, English, Finnish, Persian, Polish, Serbo-Croatian and Spanish.
Language(s) : Swedish - Albanian - Arabic - English - Finnish - Persian - Polish - Serbo-Croatian - Spanish



ELRA-U-W 0234
Swedish-Turkish Parallel Corpus 


This Swedish-Turkish parallel corpus is a balanced corpus composed of fiction and technical documents.

Source language: Swedish (150,000 words)
Target language: Turkish (100,000 words)

Texts are annotated with POS and morphological features. They are automatically aligned at sentence and word levels.
Language(s) : Swedish - Turkish



ELRA-U-W 0235
Swedish Political Texts 


The Swedish Political Texts corpus consists of texts from the Swedish government. It is a parallel corpus in five languages: German, English, Spanish, French and Swedish. It contains 11,000 words per language.
Language(s) : Swedish-German - Swedish-English - Swedish-Spanish - Swedish-French



ELRA-U-W 0236
Syntax-oriented Corpus of Portuguese Dialects (CORDIAL-SIN)


CORDIAL-SIN is a 500,000-word corpus of European Portuguese. It consists of transcriptions of spontaneous and semi-directed oral data collected all over the country in various projects. The aim was to gather a representative corpus of the dialects spoken in Portugal.
Data are available in four forms: verbatim transcription, normalized transcription, POS-annotated and syntactically annotated.
Language(s) : Portuguese (Portugal)



ELRA-U-W 0237
CORLEX 


CORLEX is a Portuguese corpus designed with the aim of compiling a lexicon for European Portuguese. CORLEX was extracted from the CRPC; it contains 6,210,438 words and gathers texts of different types and topics.
Language(s) : Portuguese (Portugal)



ELRA-U-W 0238
MiniCors 


MiniCors is a semantically tagged Spanish corpus with 13,477 sentences and 565,782 words. It is partially tagged according to criteria of frequency and polysemy degree.
Language(s) : Spanish



ELRA-U-W 0239
MiniCors-Cat 


MiniCors-Cat is a semantically tagged Catalan corpus with 6,722 tagged examples, covering 45,509 sentences and 1,451,778 words. The tagging was done with the MiniDir-Cat dictionary.
Language(s) : Catalan



ELRA-U-W 0240
Sejong Corpus 


The Sejong corpus is a Korean raw corpus composed of written and spoken texts. It contains 57 million words, plus an additional 75 million words of pre-existing electronic texts.
Language(s) : Korean



ELRA-U-W 0241
DGT Multilingual Translation Memory of the Acquis Communautaire (DGT-TM)


The DGT-TM is a translation memory created from the text collection of the Acquis Communautaire. A translation memory is a collection of small text segments (sentences or parts of sentences) and their translations.
Language(s) : Bulgarian - Czech - Danish - Dutch - English - Estonian - German - Greek - Finnish - French - Italian - Hungarian - Latvian - Lithuanian - Maltese - Polish - Portuguese - Romanian - Slovak - Spanish - Swedish - Slovene



ELRA-U-W 0242
Reference Corpus of Present-day Galician Language (CORGA)


CORGA is a corpus of contemporary Galician (from 1975 to the present). It includes 23 million words from different genres.
Language(s) : Galician (Spain)



ELRA-U-W 0243
ESF Database 


The ESF Database is a collection of data from five European countries: France, Germany, Great Britain, The Netherlands and Sweden.
It contains transcriptions of second language data from adult immigrant workers living in Western Europe.
Language(s) : English - German - French - Dutch - Swedish



ELRA-U-W 0244
ANDES Corpus 


The ANDES corpus is a collection of recorded and transcribed language materials from the Andes.
Language(s) : Quechua - Spanish



ELRA-U-W 0245
INTERA Multilingual Corpus 


The INTERA corpus contains 12 million written words in various domains: law, health, education, tourism, environment, politics, finance.
It is a parallel corpus in which texts are aligned at sentence level (TMX standard), annotated at sentence level, morphologically tagged and lemmatized (XCES).

Language pairs: Bulgarian - English, Greek - English, Serbian - English and Slovene - English.
Language(s) : Bulgarian-English - Serbian-English - Slovene-English - Greek-English



ELRA-U-W 0246
CINTIL Corpus 


CINTIL is a linguistically interpreted corpus of Portuguese. It contains 1 million annotated tokens and has been manually verified by linguistic experts.
Language(s) : Portuguese



ELRA-U-W 0247
Treebank of Old Indo-European Languages 


This is a parallel treebank of old Indo-European versions of the New Testament. It covers Greek, Latin, Gothic, Armenian and Old Church Slavonic.
Currently 10,037 sentences have been annotated.
Language(s) :



ELRA-U-W 0248
Swiss Text Corpus 


This is a corpus of the German language as written in Switzerland. It contains 20th-century texts of different types.
Language(s) : German (Switzerland)



ELRA-U-W 0249
TUNA Reference Corpus 


The TUNA corpus contains descriptions of objects and people in English. It is annotated at the semantic level with a domain representation.
Language(s) : English



ELRA-U-W 0250
GREC Corpus 


The GREC corpus contains 2,000 short introductory texts from Wikipedia entries, including about 18,000 annotated referring expressions.
Language(s) : English



ELRA-U-W 0251
Bijankhan corpus 


The Bijankhan corpus is a Persian (Farsi) corpus containing daily news and common texts, for a total of 2.6 million words. It has been manually tagged.
Language(s) : Farsi



ELRA-U-W 0252
Hamshahri Corpus 


The Hamshahri corpus is a Persian (Farsi) text collection comprising news texts from the Hamshahri daily newspaper from 1996 to 2002. It contains more than 160,000 news articles on various subjects.
Language(s) : Persian




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4