This is a computer learner corpus for French as a foreign language containing approximately 400 texts (100,000 words) written by Swedish learners of French with different levels of proficiency and by French native speakers.
Two task types were used: storytelling tasks based on picture sequences, and descriptive narratives based on personal experiences.
This is a character-based Chinese language resource. It represents the ideographic writing system of Chinese.
Possible applications of Hantology include studying the development of Chinese lexicalization, improving Chinese language processing, studying the conceptual structure of the Chinese and English lexicons (by comparison with WordNet), and comparing different ideographic writing systems.
The collection consists of 34,651 printed documents that were scanned to produce document images. The documents cover historical, philosophical, cultural, and political topics.
The BNC is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to offer broad coverage of spoken and written British English.
Lebanon; a leading Arabic publishing house specializing in scientific books in Arabic, especially books on IT and new technology. Encoding: windows-1256 (Arabic).