
Introducing Differential Features

Enhancing Explainability in the Human Foundation Model (HFM)

A frequent question from our users working with Deepcell’s Human Foundation Model (HFM) is: What does the model “see” to distinguish cells of different morphologies—or even cells with distinct biological characteristics? Dr. Tam, Head of Tech Watch at VIB, aptly summarized the challenge in a recent article:

“AI is both a blessing and a curse. If it confirms your hypothesis, great. If not, it raises more questions, especially when researchers can’t immediately access the underlying logic.”

To address this issue, we’ve introduced a new tool aimed at enhancing the explainability of HFM, which we call the differential features tool. This tool allows users to identify and rank the embedding dimensions that most differentiate one cell group from another. Embedding dimensions are numerical representations of key morphological and biological characteristics encoded by the model. Below, we outline how to use this tool and the intuitive methodology behind it.

 

Differential Features Workflow

The workflow for leveraging the differential features tool is straightforward. It begins with creating a visualization session in Axon – Deepcell’s proprietary data visualization and analysis tool, as illustrated in Figure 1. A visualization session can be created on Axon from a single run or an aggregate of multiple runs on Deepcell’s REM-i platform.

 

Fig 1: An example of an Axon visualization session with a UMAP and three cell groups labeled “Control”, “Drug A”, and “Drug B”.

 

Step 1) Define Cell Groups for Analysis

Use Axon’s visualization session to specify the cell groups you want to compare.

 

Step 2) Generate Top Differential Features

The tool identifies and ranks the embedding dimensions and morphometrics that most distinguish the selected cell groups.

Step 3) [Optional] Visualize Feature Distributions

Explore the distribution of values for each top-ranked feature to gain deeper insights into the differences.

Looking under the hood: Methodology

 

Our approach leverages Jensen-Shannon Divergence (JSD) to quantify the dissimilarity between two sets of cell embeddings. Below, we provide an overview of JSD and how it connects to our analysis.

 

What is Jensen-Shannon Divergence?

Jensen-Shannon Divergence (JSD) is a measure of similarity (or dissimilarity) between two probability distributions. It is based on the Kullback-Leibler (KL) divergence, another measure of how one distribution differs from another, but with some important tweaks that make it more intuitive and useful in practice.

Why is JSD Useful?

JSD is widely used in areas like machine learning, natural language processing, and bioinformatics because it has properties that make it better suited for real-world problems:

  1. Symmetry: Unlike KL divergence, which is not symmetric (comparing P to Q gives a different result than comparing Q to P), JSD is symmetric. This means it doesn’t matter which distribution is first.
  2. Bounded Value: JSD values range from 0 to 1 (or 0 to log(2), depending on the base of the logarithm used). A JSD of 0 means the distributions are identical, while higher values indicate greater dissimilarity.
  3. Smooth Behavior: JSD avoids some of the issues of KL divergence, like infinite values when one distribution assigns zero probability to an event that the other distribution considers likely.
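These properties are easy to check numerically. A minimal sketch using SciPy’s `scipy.spatial.distance.jensenshannon` (which returns the square root of the divergence, so we square it; the example distributions are illustrative):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Two example probability distributions over the same three outcomes
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.8, 0.1, 0.1])

# SciPy returns the Jensen-Shannon *distance* (square root of the divergence);
# squaring recovers JSD. base=2 bounds the result in [0, 1].
jsd_pq = jensenshannon(p, q, base=2) ** 2
jsd_qp = jensenshannon(q, p, base=2) ** 2

print(np.isclose(jsd_pq, jsd_qp))  # symmetry: order does not matter
print(0.0 <= jsd_pq <= 1.0)        # bounded between 0 and 1
```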

How Does It Work?

To compute JSD between two probability distributions P and Q:

a) Find the Average Distribution: Compute the midpoint (or mixture) of P and Q:

M = (P + Q) / 2

This M represents a “reference distribution” that combines both P and Q.

b) Measure Divergence to M:

  • Compute how far P is from M: KL(P ‖ M)

  • Compute how far Q is from M: KL(Q ‖ M)

c) Take the Weighted Average: The JSD is the average of these two divergences:

JSD(P, Q) = ½ KL(P ‖ M) + ½ KL(Q ‖ M)

This process captures how much P and Q deviate from their shared midpoint, providing an intuitive measure of dissimilarity. For a detailed discussion, see the paper by Frank Nielsen of Sony Computer Science Laboratories.
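The three steps above can be sketched directly in NumPy (a from-scratch illustration for clarity, not Deepcell’s implementation):

```python
import numpy as np

def kl_divergence(p, m):
    # KL(P || M); zero-probability bins of P contribute nothing.
    # Log base 2 keeps the resulting JSD in [0, 1].
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / m[mask]))

def jensen_shannon_divergence(p, q):
    m = 0.5 * (p + q)  # a) the mixture distribution M
    # b) + c): average the two divergences to M
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.8, 0.1, 0.1])
print(jensen_shannon_divergence(p, p))  # 0.0: identical distributions
print(jensen_shannon_divergence(p, q))  # positive: dissimilar distributions
```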

 

Connecting JSD to Cell Morphology

In cellular analysis, embeddings (from deep learning models or morphometrics) may represent distinct morphological features, such as:

  • Size: Overall cell dimensions.
  • Shape: Roundness or irregularities.
  • Texture: Internal structures, like nuclear granularity.
  • Complexity: Combined traits of shape and texture.

By treating these embeddings as distributions, JSD identifies the features that differ most between datasets (e.g., healthy vs. diseased cells). High JSD values indicate features that distinguish the groups, while low values highlight shared characteristics.
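In practice, a continuous embedding dimension can be turned into a distribution by histogramming its values for each group over shared bins. A minimal sketch with synthetic data (the group names, means, and bin count are illustrative assumptions, not real measurements):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
# Hypothetical values of one embedding dimension for two groups of cells
healthy = rng.normal(loc=0.0, scale=1.0, size=500)
diseased = rng.normal(loc=1.5, scale=1.0, size=500)

# Shared bin edges so both histograms live on the same support
edges = np.histogram_bin_edges(np.concatenate([healthy, diseased]), bins=30)
p, _ = np.histogram(healthy, bins=edges)
q, _ = np.histogram(diseased, bins=edges)
p = p / p.sum()  # normalize counts into probability distributions
q = q / q.sum()

feature_jsd = jensenshannon(p, q, base=2) ** 2
print(0.0 <= feature_jsd <= 1.0)  # True: a bounded score for this feature
```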

Analyzing Morphotypes with JSD

  1. Represent Morphotypes:
    Represent each cell group (morphotype) as a distribution over embeddings.
  2. Compute JSD for Each Feature:
    Compare the distributions of each embedding dimension between the cell groups (morphotypes).
  3. Rank Features by JSD:
    Identify which features most differentiate the groups, offering critical insights for further analysis: 

    1. A score of 0 to 0.3 indicates that the distributions are very similar (little to no difference between morphotypes for that feature).
    2. A score of 0.6 to 1 suggests highly distinct distributions (the feature strongly differentiates the morphotypes).
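Putting the three steps together, a per-dimension ranking might look like the following sketch (synthetic data; the helper name and bin count are illustrative assumptions, not Axon’s API):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def feature_jsd(a, b, n_bins=30):
    """JSD between two groups' values for one embedding dimension."""
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=n_bins)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    return jensenshannon(p / p.sum(), q / q.sum(), base=2) ** 2

rng = np.random.default_rng(1)
n_cells, n_dims = 400, 5
group_a = rng.normal(size=(n_cells, n_dims))  # e.g. "Control"
group_b = rng.normal(size=(n_cells, n_dims))  # e.g. "Drug A"
group_b[:, 2] += 2.0  # plant a strong difference in dimension 2

scores = np.array([feature_jsd(group_a[:, d], group_b[:, d])
                   for d in range(n_dims)])
ranking = np.argsort(scores)[::-1]  # most differential dimension first
print(ranking[0])  # the planted dimension 2 tops the ranking
```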

A Step Towards Explainable AI

As Yuval Noah Harari eloquently stated:

“I think of AI not as artificial intelligence, but as alien intelligence… It processes information in ways fundamentally different from humans. This is both exciting and alarming.”

While AI’s potential to solve problems in unprecedented ways is thrilling, its lack of transparency can be daunting, especially when critical decisions rely on it. At Deepcell, we believe explainability is not just a feature—it’s a necessity. The Differential Features tool represents a step forward in demystifying AI’s decision-making processes, aligning with our mission to make AI more explainable, transparent, and trustworthy.

By enabling users to identify and interpret the key features driving AI-based insights, we hope to foster greater confidence in HFM’s capabilities and inspire researchers to push the boundaries of discovery.

 

Author: Mahyar Salek, CTO

 
