ELRA - ELRA-WC365 : Bio1 Corpus

Catalog Reference : ELRA-WC365

Bio1 Corpus

Bio1 is an English biomedical corpus of 27,476 tokens. It is annotated for protein names and for the following additional entity types: DNA, RNA, cell line, cell type, mono-organism, multi-celled organism, virus, sublocation and tissue.

Contents

written corpus
Number of languages : Monolingual
Language(s) : English
Number of tokens : 27,476