Universal Catalogue  
Catalog Reference: ELRA-U-W0334
The W3C Corpus
The W3C corpus contains data collected from a crawl of the World Wide Web Consortium's sites (w3.org). It includes mailing lists, public web pages (HTML), and text extracted from other file types such as PDF.

The W3C data has been annotated for question answering (QA) topic relevance, for use in the TREC Enterprise track in 2005 and 2006.

The mailing list subset contains nearly 200,000 documents, organized into more than 50,000 email threads that are linked by reply-to relations and subject overlap.
Contents: written corpus

Joint Copyright © 2008 ELRA & ELDA