Universal Catalogue  
Written Corpora
Displaying 41 to 60 (of 730 products)

ELRA-U-W 0032
The Corpus of Estonian Literary Language (CELL)


This Estonian corpus (mostly newspapers and fiction) is divided by decade, from 1890 to 1999. The texts come in two versions: annotated according to the TEI, or unannotated.
Language(s) : Estonian (Estonia)



ELRA-U-W 0033
The FIDA Corpus of Slovene Language (FIDA)


The FIDA corpus is a reference corpus of the Slovene language containing 100,000,000 words. It gathers contemporary written texts and transcripts of speech data across a range of genres, from literary to scientific texts.
Language(s) : Slovenian (Slovenia)



ELRA-U-W 0034
CESS-ESP Spanish Corpus 


This corpus contains 188,650 words of Spanish which have been syntactically annotated within the framework of the CESS-ECE project.
Language(s) : Spanish (Spain)



ELRA-U-W 0035
CESS-CAT Catalan Corpus 


This corpus contains 492,846 words of Catalan which have been syntactically annotated within the framework of the CESS-ECE project.
Language(s) : Catalan (Spain)



ELRA-U-W 0036
AnCora-ESP Spanish Corpus 


AnCora-ESP is a Spanish corpus of 188,513 words which has been semantically annotated (still under development, aim: 500,000 words).
Language(s) : Spanish (Spain)



ELRA-U-W 0037
AnCora-CAT Catalan Corpus 


AnCora-CAT is a Catalan corpus of 395,379 words which has been semantically annotated (still under development, aim: 500,000 words).
Language(s) : Catalan (Spain)



ELRA-U-W 0038
CESS-EUS Basque Corpus 


This corpus contains 350,000 words of Basque which have been syntactically annotated within the framework of the CESS-ECE project (still under development).
Language(s) : Basque (Spain)



ELRA-U-W 0039
The Europarl Corpus 


The Europarl Corpus is a multilingual collection of texts extracted from the proceedings of the European Parliament. It covers 11 languages: Danish, German, Greek, English, Spanish, Finnish, French, Italian, Dutch, Portuguese, and Swedish. The number of words is close to 55 million for each language.
Language(s) : Danish (Denmark) - German (Germany) - Greek (Greece) - English (United Kingdom) - Spanish (Spain) - Finnish (Finland) - French (France) - Italian (Italy) - Dutch (Netherlands) - Swedish (Sweden) - Portuguese (Portugal)



ELRA-U-W 0040
Danish-English Europarl Corpus 


This Danish-English parallel corpus is extracted from the proceedings of the European Parliament (04/1996-10/2009). It contains 1,684,664 aligned sentences, 43,692,760 words in L1 and 46,282,519 words in L2.
Language(s) : Danish (Denmark) - English (United Kingdom)



ELRA-U-W 0041
Greek-English Europarl Corpus 


This Greek-English parallel corpus is extracted from the proceedings of the European Parliament (04/1996-10/2009). It contains 960,356 aligned sentences.
Language(s) : Greek (Greece) - English (United Kingdom)



ELRA-U-W 0042
Spanish-English Europarl Corpus 


This Spanish-English parallel corpus is extracted from the proceedings of the European Parliament (04/1996-10/2009). It contains 1,689,850 aligned sentences, 48,860,242 words in L1 and 46,843,295 words in L2.
Language(s) : Spanish (Spain) - English (United Kingdom)



ELRA-U-W 0043
Finnish-English Europarl Corpus 


This Finnish-English parallel corpus is extracted from the proceedings of the European Parliament (04/1996-10/2009). It contains 1,646,143 aligned sentences, 32,355,142 words in L1 and 45,136,552 words in L2.
Language(s) : Finnish (Finland) - English (United Kingdom)



ELRA-U-W 0044
French-English Europarl Corpus 


This French-English parallel corpus is extracted from the proceedings of the European Parliament (04/1996-10/2009). It contains 1,723,705 aligned sentences, 51,708,806 words in L1 and 47,915,991 words in L2.
Language(s) : French (France) - English (United Kingdom)



ELRA-U-W 0045
Italian-English Europarl Corpus 


This Italian-English parallel corpus is extracted from the proceedings of the European Parliament (04/1996-10/2009). It contains 1,635,140 aligned sentences, 46,380,851 words in L1 and 47,236,441 words in L2.
Language(s) : Italian (Italy) - English (United Kingdom)



ELRA-U-W 0046
Dutch-English Europarl Corpus 


This Dutch-English parallel corpus is extracted from the proceedings of the European Parliament (04/1996-10/2009). It contains 1,715,710 aligned sentences, 47,477,378 words in L1 and 47,166,762 words in L2.
Language(s) : Dutch (Netherlands) - English (United Kingdom)



ELRA-U-W 0047
Portuguese-English Europarl Corpus 


This Portuguese-English parallel corpus is extracted from the proceedings of the European Parliament (04/1996-10/2009). It contains 1,681,991 aligned sentences, 47,621,552 words in L1 and 47,000,805 words in L2.
Language(s) : Portuguese (Portugal) - English (United Kingdom)



ELRA-U-W 0048
Swedish-English Europarl Corpus 


This Swedish-English parallel corpus is extracted from the proceedings of the European Parliament (04/1996-10/2009). It contains 1,570,411 aligned sentences, 38,537,243 words in L1 and 42,810,628 words in L2.
Language(s) : Swedish (Sweden) - English (United Kingdom)



ELRA-U-W 0049
German-English Europarl Corpus 


This German-English parallel corpus is extracted from the proceedings of the European Parliament (04/1996-10/2009). It contains 1,581,107 aligned sentences, 41,587,670 words in L1 and 43,848,958 words in L2.
Language(s) : German (Germany) - English (United Kingdom)



ELRA-U-W 0050
CAST3LB Spanish Treebank 


CAST3LB is a Spanish treebank of 100,000 words corresponding to 4,000 sentences. The annotation covers part-of-speech (POS) tags for morphosyntactic information, and constituents and functions for syntactic information.
Language(s) : Spanish (Spain)



ELRA-U-W 0051
CAT3LB Catalan Treebank 


CAT3LB is a Catalan treebank of 100,000 words corresponding to 2,600 sentences. The annotation covers part-of-speech (POS) tags for morphosyntactic information, and constituents and functions for syntactic information.
Language(s) : Catalan (Spain)




Joint Copyright © 2008 ELRA & ELDA
Universal Catalogue 1.0.4