Catalog Reference: ELRA-WC362
BioNLP Coling 2004 Corpus
This biomedical English dataset contains 650,720 tokens. It has been tokenised by splitting at whitespace and at sentence punctuation. The corpus is annotated for protein names, as well as for DNA, RNA, cell line and cell type names.
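As an illustration only, the kind of tokenisation described here (splits at whitespace and at sentence punctuation) could be sketched as follows. This is not the tool used to build the corpus: the function name `tokenise` and the exact set of punctuation marks in `SENTENCE_PUNCT` are assumptions made for this example, and the catalogue entry does not specify them.

```python
import re

# Assumed set of sentence punctuation marks; the catalogue entry does not list them.
SENTENCE_PUNCT = ".,;:!?"

# A token is either a run of characters that are neither whitespace nor
# sentence punctuation, or a single sentence punctuation mark.
_PUNCT_CLASS = re.escape(SENTENCE_PUNCT)
_TOKEN_RE = re.compile(rf"[^\s{_PUNCT_CLASS}]+|[{_PUNCT_CLASS}]")


def tokenise(text):
    """Split text at whitespace and emit each sentence punctuation mark as its own token."""
    return _TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenise("IL-2 gene expression requires NF-kappa B."))
```

Hyphens are not treated as split points in this sketch, so a name such as IL-2 stays a single token. A simple rule like this also splits decimal numbers at the period, which a real tokeniser may or may not do.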
Contents
written corpus
Number of languages: Monolingual
Language(s): English
Number of tokens: 650,720
Friday 01 November, 2024
Joint Copyright © 2008 ELRA & ELDA