Table 1

Concepts extracted from the CORD-19 dataset, by knowledge base (semantic type). For each knowledge base, the table shows the number of unique terms found, the total number of extracted concepts, the number of papers containing terms from that knowledge base, and the percentage coverage across the entire dataset.

| Knowledge base | Unique terms | Extracted concepts | Papers | Coverage (%) |
| --- | ---: | ---: | ---: | ---: |
| Body parts | 1332 | 172 438 | 77 400 | 37 |
| Core knowledge base | 1434 | 338 552 | 102 037 | 49 |
| Disease or syndrome | 7195 | 507 819 | 152 402 | 73 |
| Finding | 5580 | 526 433 | 145 504 | 70 |
| Genome | 9395 | 419 413 | 86 073 | 41 |
| Immunological factor | 1845 | 130 996 | 45 912 | 22 |
| Pharmacological substance | 2599 | 58 308 | 30 494 | 15 |
| Symptoms and side effects | 8883 | 630 116 | 144 063 | 69 |
| Therapeutic or preventive procedure | 4923 | 332 260 | 111 277 | 54 |
| Virus | 1308 | 240 993 | 84 325 | 41 |
| Total | 44 494 | 3 357 328 | 195 958 | 94 |
  • Papers may contain multiple extracted concepts, and concepts may be found in multiple papers within the knowledge base; hence, we report both the total number of concepts extracted by the natural language processing tool and the number of unique terms.

  • CORD-19, COVID-19 Open Research Dataset.
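
As a reading aid, the arithmetic behind the Total row can be checked with a short script. This is only a sketch: the figures are copied from Table 1, while the names `ROWS`, `TOTAL`, and `column_sums` are illustrative and not part of the original analysis. It shows that the unique-term and extracted-concept totals equal the sums of their columns, whereas the paper total is much smaller than the column sum because a single paper can contain terms from several knowledge bases.

```python
# Figures copied from Table 1:
# (knowledge base, unique terms, extracted concepts, papers)
ROWS = [
    ("Body parts", 1332, 172438, 77400),
    ("Core knowledge base", 1434, 338552, 102037),
    ("Disease or syndrome", 7195, 507819, 152402),
    ("Finding", 5580, 526433, 145504),
    ("Genome", 9395, 419413, 86073),
    ("Immunological factor", 1845, 130996, 45912),
    ("Pharmacological substance", 2599, 58308, 30494),
    ("Symptoms and side effects", 8883, 630116, 144063),
    ("Therapeutic or preventive procedure", 4923, 332260, 111277),
    ("Virus", 1308, 240993, 84325),
]
TOTAL = ("Total", 44494, 3357328, 195958)


def column_sums(rows):
    """Return the column sums (unique terms, extracted concepts, papers)."""
    unique = sum(r[1] for r in rows)
    concepts = sum(r[2] for r in rows)
    papers = sum(r[3] for r in rows)
    return unique, concepts, papers


unique_sum, concept_sum, paper_sum = column_sums(ROWS)

# Unique terms and extracted concepts add up to the Total row.
print(unique_sum == TOTAL[1], concept_sum == TOTAL[2])

# Papers do not: the same paper is counted once per knowledge base it mentions,
# so the column sum exceeds the number of distinct papers in the Total row.
print(paper_sum > TOTAL[3])
```

The Coverage (%) column is not recomputed here because the total number of papers in the dataset is not stated in the table itself.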