Universal Catalogue

Written Corpora

Displaying 41 to 60 (of 730 products)

ELRA-U-W 0032

The Corpus of Estonian Literary Language (CELL)

This Estonian corpus (mostly newspapers and fiction) is divided into decades, from 1890 to 1999. The texts come in two versions: annotated according to the TEI (Text Encoding Initiative) guidelines, or unannotated.
Language(s): Estonian (Estonia)

ELRA-U-W 0033

The FIDA Corpus of Slovene Language (FIDA)

The FIDA corpus is a reference corpus of the Slovene language containing 100 million words. It comprises contemporary written texts and transcribed speech from a range of genres, from literary to scientific texts.
Language(s): Slovenian (Slovenia)

ELRA-U-W 0034

CESS-ESP Spanish Corpus

This corpus contains 188,650 words of Spanish that have been syntactically annotated within the framework of the CESS-ECE project.
Language(s): Spanish (Spain)

ELRA-U-W 0035

CESS-CAT Catalan Corpus

This corpus contains 492,846 words of Catalan that have been syntactically annotated within the framework of the CESS-ECE project.
Language(s): Catalan (Spain)

ELRA-U-W 0036

AnCora-ESP Spanish Corpus

AnCora-ESP is a Spanish corpus of 188,513 words that has been semantically annotated (still under development; target size: 500,000 words).
Language(s): Spanish (Spain)

ELRA-U-W 0037

AnCora-CAT Catalan Corpus

AnCora-CAT is a Catalan corpus of 395,379 words that has been semantically annotated (still under development; target size: 500,000 words).
Language(s): Catalan (Spain)

ELRA-U-W 0038

CESS-EUS Basque Corpus

This corpus contains 350,000 words of Basque that have been syntactically annotated within the framework of the CESS-ECE project (still under development).
Language(s): Basque (Spain)

ELRA-U-W 0039

The Europarl Corpus

The Europarl Corpus is a multilingual collection of texts extracted from the proceedings of the European Parliament. It covers 11 languages: Danish, German, Greek, English, Spanish, Finnish, French, Italian, Dutch, Portuguese and Swedish, with close to 55 million words for each language.
Language(s): Danish (Denmark) - German (Germany) - Greek (Greece) - English (United Kingdom) - Spanish (Spain) - Finnish (Finland) - French (France) - Italian (Italy) - Dutch (Netherlands) - Swedish (Sweden) - Portuguese (Portugal)

ELRA-U-W 0040

Danish-English Europarl Corpus

This Danish-English parallel corpus is extracted from the proceedings of the European Parliament (April 1996 to October 2009). It contains 1,684,664 aligned sentences, with 43,692,760 words in L1 (Danish) and 46,282,519 words in L2 (English).
Language(s): Danish (Denmark) - English (United Kingdom)

ELRA-U-W 0041

Greek-English Europarl Corpus

This Greek-English parallel corpus is extracted from the proceedings of the European Parliament (April 1996 to October 2009). It contains 960,356 aligned sentences.
Language(s): Greek (Greece) - English (United Kingdom)

ELRA-U-W 0042

Spanish-English Europarl Corpus

This Spanish-English parallel corpus is extracted from the proceedings of the European Parliament (April 1996 to October 2009). It contains 1,689,850 aligned sentences, with 48,860,242 words in L1 (Spanish) and 46,843,295 words in L2 (English).
Language(s): Spanish (Spain) - English (United Kingdom)

ELRA-U-W 0043

Finnish-English Europarl Corpus

This Finnish-English parallel corpus is extracted from the proceedings of the European Parliament (April 1996 to October 2009). It contains 1,646,143 aligned sentences, with 32,355,142 words in L1 (Finnish) and 45,136,552 words in L2 (English).
Language(s): Finnish (Finland) - English (United Kingdom)

ELRA-U-W 0044

French-English Europarl Corpus

This French-English parallel corpus is extracted from the proceedings of the European Parliament (April 1996 to October 2009). It contains 1,723,705 aligned sentences, with 51,708,806 words in L1 (French) and 47,915,991 words in L2 (English).
Language(s): French (France) - English (United Kingdom)

ELRA-U-W 0045

Italian-English Europarl Corpus

This Italian-English parallel corpus is extracted from the proceedings of the European Parliament (April 1996 to October 2009). It contains 1,635,140 aligned sentences, with 46,380,851 words in L1 (Italian) and 47,236,441 words in L2 (English).
Language(s): Italian (Italy) - English (United Kingdom)

ELRA-U-W 0046

Dutch-English Europarl Corpus

This Dutch-English parallel corpus is extracted from the proceedings of the European Parliament (April 1996 to October 2009). It contains 1,715,710 aligned sentences, with 47,477,378 words in L1 (Dutch) and 47,166,762 words in L2 (English).
Language(s): Dutch (Netherlands) - English (United Kingdom)

ELRA-U-W 0047

Portuguese-English Europarl Corpus

This Portuguese-English parallel corpus is extracted from the proceedings of the European Parliament (April 1996 to October 2009). It contains 1,681,991 aligned sentences, with 47,621,552 words in L1 (Portuguese) and 47,000,805 words in L2 (English).
Language(s): Portuguese (Portugal) - English (United Kingdom)

ELRA-U-W 0048

Swedish-English Europarl Corpus

This Swedish-English parallel corpus is extracted from the proceedings of the European Parliament (April 1996 to October 2009). It contains 1,570,411 aligned sentences, with 38,537,243 words in L1 (Swedish) and 42,810,628 words in L2 (English).
Language(s): Swedish (Sweden) - English (United Kingdom)

ELRA-U-W 0049

German-English Europarl Corpus

This German-English parallel corpus is extracted from the proceedings of the European Parliament (April 1996 to October 2009). It contains 1,581,107 aligned sentences, with 41,587,670 words in L1 (German) and 43,848,958 words in L2 (English).
Language(s): German (Germany) - English (United Kingdom)

ELRA-U-W 0050

CAST3LB Spanish Treebank

CAST3LB is a Spanish treebank of 100,000 words, corresponding to 4,000 sentences. The annotation covers morphosyntactic information (part-of-speech tags) and syntactic information (constituents and functions).
Language(s): Spanish (Spain)

ELRA-U-W 0051

CAT3LB Catalan Treebank

CAT3LB is a Catalan treebank of 100,000 words, corresponding to 2,600 sentences. The annotation covers morphosyntactic information (part-of-speech tags) and syntactic information (constituents and functions).
Language(s): Catalan (Spain)
