Extracted concepts from the CORD-19 dataset by knowledge base (semantic type). For each knowledge base, the table shows the number of unique terms found, the total number of extracted concepts, the number of papers containing terms from that knowledge base, and the percentage of the entire dataset that those papers represent.
Knowledge base | Unique terms | Extracted concepts | Papers | Coverage (%) |
Body parts | 1332 | 172 438 | 77 400 | 37 |
Core knowledge base | 1434 | 338 552 | 102 037 | 49 |
Disease or syndrome | 7195 | 507 819 | 152 402 | 73 |
Finding | 5580 | 526 433 | 145 504 | 70 |
Genome | 9395 | 419 413 | 86 073 | 41 |
Immunological factor | 1845 | 130 996 | 45 912 | 22 |
Pharmacological substance | 2599 | 58 308 | 30 494 | 15 |
Symptoms and side effects | 8883 | 630 116 | 144 063 | 69 |
Therapeutic or preventive procedure | 4923 | 332 260 | 111 277 | 54 |
Virus | 1308 | 240 993 | 84 325 | 41 |
Total | 44 494 | 3 357 328 | 195 958 | 94 |
Papers may contain multiple extracted concepts, and a concept may be found in multiple papers within a knowledge base; hence, we report both the total number of concepts extracted by the natural language processing tool and the number of unique terms.
CORD-19, COVID-19 Open Research Dataset.
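The arithmetic behind the Total row can be checked with a short sketch. The figures below are transcribed from the table; the variable names are illustrative and are not part of the original analysis. The check suggests that unique terms and extracted concepts add up across knowledge bases, whereas the paper counts do not, presumably because a paper that contains terms from several knowledge bases is counted once in the Total row.

```python
# Rows transcribed from the table above:
# knowledge base -> (unique terms, extracted concepts, papers)
rows = {
    "Body parts": (1332, 172_438, 77_400),
    "Core knowledge base": (1434, 338_552, 102_037),
    "Disease or syndrome": (7195, 507_819, 152_402),
    "Finding": (5580, 526_433, 145_504),
    "Genome": (9395, 419_413, 86_073),
    "Immunological factor": (1845, 130_996, 45_912),
    "Pharmacological substance": (2599, 58_308, 30_494),
    "Symptoms and side effects": (8883, 630_116, 144_063),
    "Therapeutic or preventive procedure": (4923, 332_260, 111_277),
    "Virus": (1308, 240_993, 84_325),
}

# Total row as reported in the table.
total_terms, total_concepts, total_papers = 44_494, 3_357_328, 195_958

# Column sums over the ten knowledge bases.
sum_terms = sum(r[0] for r in rows.values())
sum_concepts = sum(r[1] for r in rows.values())
sum_papers = sum(r[2] for r in rows.values())

# Terms and concepts are additive across knowledge bases.
print("unique terms match total:", sum_terms == total_terms)
print("extracted concepts match total:", sum_concepts == total_concepts)

# Paper counts overlap between knowledge bases, so their sum exceeds
# the number of distinct papers in the Total row.
print("summed papers:", sum_papers, "distinct papers:", total_papers)

# Average number of extracted concepts per paper within each knowledge base.
for name, (_, concepts, papers) in rows.items():
    print(f"{name}: {concepts / papers:.1f} concepts per paper")
```

The last loop is only a convenience view of the relationship described in the footnote: it divides extracted concepts by papers for each knowledge base and makes no claim about how the original authors summarized their data.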