The Pharmome Map: A Comprehensive Public Dataset for Drug-Target Interaction Modeling

Published: November 18, 2025

By: Elaine McVey Houskeeper, Georgia Channing (Hugging Science)

The Missing Pharmome Map

Drugs are used by billions of people worldwide every year. Despite their widespread use, our understanding of their complete effects on the human body remains limited. Pharmaceutical companies primarily focus on developing drugs with a specific target in mind, and the process often overlooks a comprehensive understanding of all the ways a drug might interact with other proteins in the body.

Currently, our knowledge of drug-target interactions resembles a very sparse matrix, where we know the primary target of a drug and some off-target effects, but most of the matrix remains a mystery.

Imagine, instead, having a complete map that quantifies the interaction of every approved drug with every potential protein it can act upon – this is the pharmome map.

With this complete map, we can address crucial questions like:

  • What patterns of drug activity are linked to adverse drug events?
  • Are a drug’s effects driven by its primary target alone, or by interactions with multiple targets (polypharmacology)?
  • How do drug interactions in the pharmome map inform the effects of combining different medications (polypharmacy)?
  • Can we identify drugs suitable for treating new conditions (drug repurposing) by analyzing their target activity patterns?
  • Can we predict the target activity patterns of new compounds based on their structure (structure-activity relationship modeling)?

The pharmome map can be combined with the large volumes of outcome data already available from clinical trials, adverse event reports, individual health records, and various “-omics” databases. A complete map would offer new insight into how those outcomes relate to drug activity.

Image of the Pharmome Map

Mapping is Underway!

EvE Bio is a non-profit organization (part of Convergent Research’s Focused Research Organization initiative) dedicated to creating and publicly sharing the pharmome map. They develop standardized assays for different classes of drug targets and conduct high-throughput screening to gather quantitative measurements. By prioritizing dataset creation, EvE Bio is able to produce a comprehensive, consistent dataset ideal for machine learning. Their publicly available dataset is already the largest of its kind and is continuously expanding.

Currently, EvE Bio is focusing on a 1,397-compound library, largely composed of FDA-approved small molecule drugs, tested against key classes of drug targets. These targets were chosen for their therapeutic relevance, small molecule druggability, and suitability for large-scale in vitro assays. The three main target classes are:

  • Nuclear Receptors (NRs): These receptors directly control gene expression and influence long-term cellular behavior. They are targets for over 10% of approved small molecule drugs and can be modulated in various ways (agonist, antagonist, inverse agonist). Their activity is measured through biochemical co-factor recruitment assays.
  • 7-Transmembrane Receptors (7TMs / GPCRs): These receptors on the cell surface detect extracellular signals and trigger intracellular responses. They are targets for over a third of FDA-approved drugs across various therapeutic areas. 7TMs are highly druggable and can be selectively activated. Their activity is measured using cell-based assays.
  • Protein Kinases (PKs): These enzymes regulate many cellular processes by catalyzing phosphorylation. They are increasingly targeted for drugs, especially in cancer treatment. Their activity is measured through biochemical competition-based ligand binding assays.

In 2026, the number of 7TM and PK targets in EvE Bio’s dataset will triple, and the 7TM data will include G-protein and β-arrestin signaling measurements, enabling the modeling of biased signaling, a key area for improving drug design.

In addition to these target classes, the dataset includes measurements of cell viability for each compound, reflecting cytotoxic effects. This data is crucial for interpreting 7TM antagonism results, as cell death can sometimes mimic antagonism in cell-based assays.

Data Structure

The core measurements in the dataset are compound activity and potency. For each compound-assay combination, the dataset provides:

  • outcome_is_active: Indicates if the compound showed activity.
  • outcome_max_activity: The maximum observed activity as a percentage of a reference standard.
  • is_quantified: Whether the compound’s potency could be measured within the tested concentration range.
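  • outcome_potency_pxc50: Quantified potency, reported as pXC50 (the negative base-10 logarithm of the IC50 or EC50 concentration, expressed in molar units). Higher pXC50 values indicate higher potency. The lowest quantifiable pXC50 in this dataset is 5, which corresponds to the 10 μM top of the tested concentration range.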
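To make the pXC50 scale concrete, here is a minimal sketch of the conversion between pXC50 and concentration. It assumes the usual convention of a base-10 logarithm of a concentration in mol/L; the function names are illustrative and are not part of the dataset or any library.

```python
import math


def pxc50_to_molar(pxc50: float) -> float:
    """Return the XC50 concentration in mol/L for a given pXC50."""
    return 10.0 ** (-pxc50)


def molar_to_pxc50(concentration_molar: float) -> float:
    """Return pXC50 for an XC50 concentration given in mol/L."""
    if concentration_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(concentration_molar)


# 10 uM (1e-5 mol/L) maps to the lowest quantifiable pXC50 mentioned above.
print(molar_to_pxc50(1e-5))

# A pXC50 of 7 corresponds to 100 nM; printed here in nanomolar.
print(pxc50_to_molar(7.0) * 1e9)
```

Because the scale is logarithmic, each one-unit increase in pXC50 means a tenfold lower concentration is needed for the half-maximal effect.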

Image illustrating Data Structure

EvE Bio uses a two-phase screening process:

  1. Screening Phase: All compound-assay combinations are tested at three concentrations, with two replicates.
  2. Profiling Phase: Compounds that meet specific criteria based on the screening phase advance to this phase, where a wider concentration range (10 μM to 10 pM) is tested to determine full concentration-response curves. Low-potency compounds may be reported as active but not quantified due to the concentration limits.
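As a rough illustration of what a concentration-response curve over this range looks like, the sketch below builds a log-spaced concentration series from 10 μM down to 10 pM and evaluates a standard four-parameter logistic (Hill) model on it. The number of points per decade, the Hill slope, and the use of this particular model are assumptions made for illustration; the article does not state how EvE Bio spaces concentrations or fits curves.

```python
import math


def log_spaced_concentrations(top_molar, bottom_molar, points_per_decade=2):
    """Return concentrations (mol/L) from top to bottom, evenly spaced in log10."""
    decades = math.log10(top_molar / bottom_molar)
    n_points = round(decades * points_per_decade) + 1
    step = decades / (n_points - 1)
    return [top_molar * 10 ** (-i * step) for i in range(n_points)]


def hill_response(conc_molar, pxc50, max_activity, hill_slope=1.0, baseline=0.0):
    """Four-parameter logistic response (percent of reference) at one concentration."""
    xc50 = 10 ** (-pxc50)
    return baseline + (max_activity - baseline) / (1 + (xc50 / conc_molar) ** hill_slope)


# Profiling range from the article: 10 uM down to 10 pM.
concentrations = log_spaced_concentrations(10e-6, 10e-12)

# Hypothetical compound: pXC50 of 7 and 80% maximum activity.
for conc in concentrations:
    print(f"{conc:.2e} M -> {hill_response(conc, 7.0, 80.0):.1f}%")
```

With this model, the response at the XC50 concentration is halfway between baseline and maximum activity. A compound whose XC50 lies above 10 μM never reaches that midpoint within the tested range, which is one way to picture why it can be reported as active but not quantified.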

The dataset also includes flags for potential assay interference:

  • viability_flag: Indicates if cell viability was affected.
  • frequency_flag: Flags compounds that frequently appeared in a target class assay, which could suggest non-specific interference.
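One way these flags might be used is to set aside active results that could reflect interference. The sketch below works on a few made-up records; the field names outcome_is_active, viability_flag, and frequency_flag come from the description above, while the boolean encoding, the compound and target keys, and the example values are assumptions.

```python
# Made-up records for illustration only; real rows carry many more fields.
records = [
    {"compound": "cmpd_A", "target": "target_1", "outcome_is_active": True,
     "viability_flag": False, "frequency_flag": False},
    {"compound": "cmpd_B", "target": "target_1", "outcome_is_active": True,
     "viability_flag": True, "frequency_flag": False},
    {"compound": "cmpd_C", "target": "target_2", "outcome_is_active": True,
     "viability_flag": False, "frequency_flag": True},
    {"compound": "cmpd_D", "target": "target_2", "outcome_is_active": False,
     "viability_flag": False, "frequency_flag": False},
]


def unflagged_actives(rows):
    """Keep active rows that carry neither interference flag."""
    return [
        row for row in rows
        if row["outcome_is_active"]
        and not row["viability_flag"]
        and not row["frequency_flag"]
    ]


clean = unflagged_actives(records)
print([row["compound"] for row in clean])
```

Whether to drop flagged rows or keep them as a separate category depends on the analysis; flagged results are not necessarily false positives.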

The dataset is structured with one row per target, compound, mode, and mechanism. NRs and 7TMs have two modes each, while PKs and cell viability have one. Various identifiers are provided for both compounds (SMILES, InChIKey, CAS number, UNII, DrugBank ID) and targets (gene, UniProt ID, mutant/wildtype indicators).
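Because the data is in long format, a common first step is to pivot it into a compound-by-target matrix for a single mode. The sketch below does this in plain Python. The keys compound, target, mode, and mechanism, the mode and mechanism values, and the use of None for unquantified potency are hypothetical stand-ins for whatever the real columns contain; only outcome_potency_pxc50 is a field name taken from this article.

```python
def pivot_potency(rows, mode):
    """Build {compound: {target: pXC50}} for one mode, skipping unquantified rows."""
    matrix = {}
    for row in rows:
        potency = row["outcome_potency_pxc50"]
        if row["mode"] != mode or potency is None:
            continue
        cells = matrix.setdefault(row["compound"], {})
        # Several mechanisms can share one (compound, target, mode) cell;
        # keeping the highest potency is an illustrative choice, not a rule.
        cells[row["target"]] = max(potency, cells.get(row["target"], potency))
    return matrix


# Made-up long-format rows for illustration only.
rows = [
    {"compound": "cmpd_A", "target": "target_1", "mode": "agonist",
     "mechanism": "m1", "outcome_potency_pxc50": 6.2},
    {"compound": "cmpd_A", "target": "target_1", "mode": "agonist",
     "mechanism": "m2", "outcome_potency_pxc50": 7.1},
    {"compound": "cmpd_A", "target": "target_2", "mode": "antagonist",
     "mechanism": "m1", "outcome_potency_pxc50": 5.4},
    {"compound": "cmpd_B", "target": "target_1", "mode": "agonist",
     "mechanism": "m1", "outcome_potency_pxc50": None},
]

agonist_matrix = pivot_potency(rows, "agonist")
print(agonist_matrix)
```

Cells that are absent from the result are compound-target pairs with no quantified potency in that mode, which keeps the matrix sparse rather than filling it with placeholder values.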

Ready to Get Started?

You can access the dataset using the Hugging Face datasets library:

```python
from datasets import load_dataset

# Login using e.g. `huggingface-cli login` to access this dataset
ds = load_dataset("eve-bio/drug-target-activity")
```

Alternatively, browse the dataset directly on Hugging Face under the ID eve-bio/drug-target-activity.

