Universal Catalogue

Written Corpora

Displaying 621 to 640 (of 730 products)

ELRA-WC0001

BulTreeBank

This is a treebank for Bulgarian annotated with detailed syntactic information.
Language(s) : Bulgarian

ELRA-WC2

BLIS Parallel Text (Hong Kong Hansards)

Hong Kong Hansards contains excerpts from the Official Record of Proceedings (hansards) of the Legislative Council of Hong Kong.
Language(s) : Chinese - English

ELRA-WC3

EFE News Text

Written data from the EFE news agency.
Language(s) : Spanish

ELRA-WC300

Monolingual Web Corpus

3 TB of data downloaded from the web and filtered.
Language(s) : English

ELRA-WC301

MUSE Corpus

It amounts to 300k and consists of annotated data in the domain of news politics.
Language(s) : Greek

ELRA-WC302

TimeBank Corpus

The corpus contains 186 news report documents, with a total of 68.5K words.
Language(s) : English

ELRA-WC304

Italian Newspaper Corpus

This corpus contains 17 articles for a total of 10,000 words, from the Italian newspaper "il Sole-24 Ore".
Language(s) : Italian

ELRA-WC305

MEDLEX Corpus

This is a medical corpus of 50,000 documents totalling 20 million tokens.
Language(s) : Swedish (Sweden)

ELRA-WC306

Reference Corpus of Written Dutch

The project is still ongoing and the corpus has not yet been constructed. The aim is to produce a 500-million-word reference corpus of written Dutch.
Language(s) : Dutch

ELRA-WC307

DiaCORIS Corpus

The aim of the project is to extend the CORIS/CODIS Corpus: the DiaCORIS Corpus will include Italian texts produced between 1861 and 1945. The total size will be 15 million words.
Language(s) : Italian

ELRA-WC308

CORIS/CODIS Corpus

The CORIS/CODIS corpus is a reference corpus for modern Italian. It contains texts from the last two decades of the 20th century, for a total of 100 million words.
Language(s) : Italian

ELRA-WC309

OVI (Opera del Vocabolario Italiano) Database

It contains about 19 million words of literary and non-literary texts, in prose and poetry, written in early/old Italian from the beginning of the XIII century to 1375.
Language(s) : Italian

ELRA-WC310

BIVIO (Biblioteca Virtuale On-Line) Corpus

It consists of texts in the domain of the history of Italian Renaissance fine arts: about 200 literary and essayistic works by about 60 authors of the XV-XVII centuries.
Language(s) : Italian

ELRA-WC311

LIZ (Letteratura italiana Zanichelli) Corpus

The corpus contains literary texts: 1000 works in poetry or prose from the XIII to the XX century.
Language(s) : Italian

ELRA-WC312

The Italian Section of the Biblioteca Digitale IntraText

It consists of 2575 texts, mainly in the domains of religion, theology and morals.
Language(s) : Italian

ELRA-WC313

The Progetto Manuzio Corpus

It contains about 1200 literary and non-literary texts.
Language(s) : Italian

ELRA-WC314

Annotated Czech-English Aligned Corpus

515 sentences from the Prague Czech-English Dependency Treebank were manually annotated.
Language(s) : Czech - English

ELRA-WC315

Hebrew Cantillation Tree Bank

In the Masoretic text of the Hebrew Bible, the cantillation marks the division and subdivision of each verse. This structural information of every verse has been represented as a tree in XML format, constituting a cantillation tree bank.
Language(s) : Hebrew

ELRA-WC316

Hunglish Corpus Written Resources

This is a sentence-aligned English–Hungarian parallel corpus. It contains 23.7 million English and 29.4 million Hungarian words in 2.07 million sentence pairs from 5 genres of text.
Language(s) : Hungarian - English

ELRA-WC317

Hungarian Webcorpus

This corpus contains 1.48 billion words (589 million were fully filtered) extracted from 18 million pages downloaded from the .hu domain.
Language(s) : Hungarian
