ELRA - ELRA-U-W0386 : Japanese Textbook Corpus

Catalog Reference : ELRA-U-W0386

Japanese Textbook Corpus

The Japanese Textbook Corpus contains 1,478 text samples extracted from 127 textbooks (about 1,000,000 characters). It consists of digitized excerpts from textbooks used in elementary schools, junior high schools and high schools (grades 1 to 13) in Japan.

It was compiled to serve as a reference standard for readability assessment and to support the development of a readability measurement method.

Production

Project : Daigo Project (Natural Language Processing Technologies to Enhance Readability)

Creation date : 2008

Applications


Application Area : Research

Contents

written corpus
Number of languages : Monolingual
Language(s) : Japanese
Number of tokens : about 1,000,000 characters