|
Language Resources |
Displaying 521 to 540 (of 730 products) |
Result Pages: 27 |
This is an 800,000-word skeleton-parsed corpus of computer manuals.
Language(s) : English
|
|
|
|
A subsample of the Associated Press Corpus, containing American newswire reports, annotated to show the reference of pronouns and lexical cohesion. It contains approximately 100,000 words.
Language(s) : English (USA)
|
|
|
|
It consists of 30 million words of written English taken from literature, magazines, papers and more ephemeral materials such as leaflets and packaging.
Language(s) : English
|
|
|
|
It consists of approximately 1,250,000 words in each language and contains EC official documents on telecommunications. The corpus is part-of-speech tagged and lemmatized.
Language(s) : English - French
|
|
|
|
This is a 1,000,000-word trilingual corpus aligned at the sentence level. It is made up of texts from the telecommunications domain. It has been part-of-speech tagged in all three languages.
Language(s) : English - French - Spanish
|
|
|
|
It contains samples from texts covering the Old, Middle, and Early Modern English periods. It consists of 1,500,000 words in total.
Language(s) : English
|
|
|
|
It consists of approximately one million words of English pamphlet literature covering the years 1640-1740. It is being tagged for part-of-speech and lemmatized.
Language(s) : English
|
|
|
|
It consists of approximately 400,000 words for each language from literary, technical, scientific and institutional domains.
Language(s) : English - French
|
|
|
|
It consists of transcribed conversations in family contexts.
Language(s) : Bulgarian
|
|
|
|
It consists of transcribed recordings of broadcasts of the debates of the 7th Great National Assembly on 31 October 1990.
Language(s) : Bulgarian
|
|
|
|
This is a part-of-speech tagged corpus of Thai text with syntactic word class annotation. It contains approximately 2,560,000 words.
Language(s) : Thai
|
|
|
|
It contains three texts annotated with predominantly lexical notes.
Language(s) : Bulgarian
|
|
|
|
This electronic literary text archive contains 301 poems in HTML format.
Language(s) : Bulgarian
|
|
|
|
It consists of approximately 275,000 words and represents text genres such as news, legal texts and poetry, encoded in SGML according to the Corpus Encoding Standard (CES).
Language(s) : Bulgarian
|
|
|
|
It contains various kinds of texts dating from 1833 to 1988: literary texts (theatre, poetry, etc.) and non-literary texts (legal documents, scientific articles, etc.).
Language(s) : Catalan
|
|
|
|
It consists of Chinese e-texts ranging from the classical pre-Qin and Song periods to the Qing and modern ones: electronic versions of Chinese philosophical texts created by the Confucian Etext Project, electronic versions of Chinese philosophical texts from other sources, and information on the preparation and use of these texts, with links to further information.
Language(s) : English
|
|
|
|
This is a corpus of Chinese text segmented into words and annotated with part-of-speech labels and syntactic bracketing, modeled on the English TreeBank. It contains 500,000 words (over 824,000 Chinese characters).
Language(s) : English
|
|
|
|
The goal of the Penn Chinese Proposition Bank project is to create a corpus of text annotated with information about basic semantic propositions. Predicate-argument relations are being added to the syntactic trees of the Penn Chinese Treebank.
Language(s) : English
|
|
|
|
It contains 56 Apache texts and their English translations, collected in the summers of 1930 and 1931 and in the spring of 1934.
Language(s) : English
|
|
|
|
This corpus has been compiled in order to study the syntactic and semantic structure of "up/on" and its equivalents in French ("sur") and Dutch ("op"). It consists of 2 million words and is divided into two subcorpora (fiction and non-fiction).
Language(s) : Dutch - English - French
|
|
|
|