Mask Analysis Details
This document describes the quality control (QC) metrics calculated by the mask analysis module to assess sample quality in PhenoCycler scans. More technical details beyond this document are written in Overleaf upon request.
Overview
The segmentation analysis module (aegle/segmentation_analysis/segmentation_analysis.py) performs comprehensive quality assessment of cell and nucleus segmentation results. It analyzes both the original segmentation output and the repaired (matched) segmentation to evaluate improvements and identify potential issues.
The module performs three complementary types of analysis:
Segmentation Density & Concordance Analysis (SDCA)
This module implements a set of metrics designed to quantify segmentation mask properties. SDCA provides both global and local density metrics from cell and nucleus masks, and evaluates the concordance between cell and nucleus segmentation before and after repair. Together, these measures serve as sample-level quality indicators and help characterize tissue organization and segmentation performance.
The global metrics are calculated for the entire patch. The local metrics are mean of the statistics calculated by a sliding window approach. By default, the window size is 200 µm.
Recall that segmentation repair is introduced in Segmentation Details.
Inputs
cell_mask: Original cell masknucleus_mask: Original nucleus maskcell_matched_mask: Repaired (matched) cell masknucleus_matched_mask: Repaired (matched) nucleus masknucleus_unmatched_mask: Nuclei present in original segmentation but not matched after repairimage_mpp: Microns per pixel (default = 0.5)window_size_microns: Window size for local density calculation (default = 200 µm)
Outputs
SDCA produces a dictionary of metrics with three main sections for all five masks (cell_mask, nucleus_mask, cell_matched_mask, nucleus_matched_mask, nucleus_unmatched_mask):
-
Global Metrics (
metrics["global"])
Per-mask statistics:n_objects: Number of segmented objectsarea_mm2: Effective tissue area (mm²)density: Objects per mm²
-
Local Metrics (
metrics["local"])
Sliding-window statistics:mean,median,std,quantile_25,quantile_75
-
Local Density List (
metrics["local_density_list"])
Raw per-window statistics in Local Metrics.
Example structure:
{
"global": {
"cell": {"n_objects": 1500, "area_mm2": 0.85, "density": 1764.7},
"nucleus": {...},
"repaired_cell": {...},
"repaired_nucleus": {...},
"unmatched_nucleus": {...}
},
"local": {
"cell": {"mean": 1650.4, "median": 1600.0, "std": 220.5, ...},
"nucleus": {...},
...
},
"local_density_list": {...},
"window_size_microns": 200
}
Intensity Distribution Analysis
Spatial Density Analysis
Key Metrics
1. Matched Fraction
Description: The fraction of nuclei that have been successfully matched with corresponding cell boundaries.
Calculation: matched_fraction = number_of_matched_nuclei / total_number_of_nuclei
Interpretation:
- Higher values (close to 1.0) indicate better segmentation quality
- Low values may indicate segmentation issues or tissue regions with poor cell boundary detection
2. Cell Density Metrics
2.1 Global Density
Description: Overall cell density across the entire patch.
Metrics calculated:
cell_density: Number of cells per mm²nucleus_density: Number of nuclei per mm²matched_cell_density: Number of matched cells per mm²matched_nucleus_density: Number of matched nuclei per mm²unmatched_nucleus_density: Number of unmatched nuclei per mm²
Usage: Helps identify tissue regions and assess overall cellularity.
2.2 Local Density
Description: Cell density calculated at a local neighborhood level for each cell.
Metrics calculated:
local_cell_density: Local density around each celllocal_nucleus_density: Local density around each nucleus- Distribution statistics (mean, std, percentiles)
Usage: Identifies spatial heterogeneity and clustering patterns.
3. Intensity Analysis
3.1 Channel Intensity Bias
Description: Analyzes the intensity differences between original and repaired segmentation masks across all antibody channels.
Purpose:
- Identifies systematic biases introduced by segmentation repair
- Detects channels that may be affected by segmentation artifacts
Visualization: Box plots comparing intensity distributions between:
- Original cell masks vs. matched cell masks
- Original nucleus masks vs. matched nucleus masks
- Unmatched nuclei (nuclei without corresponding cells)
3.2 Intensity Distribution Analysis
Description: Comprehensive visualization of intensity distributions for each antibody channel.
Features:
- Log-scale histograms for better visualization of wide dynamic ranges
- Separate distributions for different mask types
- Channel-by-channel comparison
Configuration
The segmentation analysis can be configured in the main configuration file:
segmentation:
segmentation_analysis: True # Enable/disable the entire analysis
density_analysis:
calculate_global_density: True # Calculate patch-level density
calculate_local_density: True # Calculate cell-level local density
channels_per_figure: 10 # Number of channels per visualization figure
Output Files
1. Patch-level Outputs
Each informative patch generates its own subdirectory containing:
cell_density_visualization/: Density distribution plotschannel_intensity_repair_bias/: Bias analysis visualizationschannel_intensity_distributions_plots/: Intensity distribution plots
2. Dataset-level Outputs
codex_patches_segmentation_analysis.pickle: Complete analysis resultspatches_metadata.csv: Updated with all calculated metrics
Interpretation Guidelines
Red Flags in QC Metrics
- Very low matched fraction (< 0.5): Indicates poor segmentation quality
- Extreme density values: May indicate tissue artifacts or segmentation errors
- Large intensity biases: Suggests the repair process is altering biological signals
- Bimodal density distributions: May indicate distinct tissue regions requiring separate analysis
Best Practices
- Review metrics for outlier patches
- Compare metrics across different tissue types
- Use intensity bias analysis to validate marker expression patterns
- Consider local density patterns when analyzing spatial relationships
Technical Details
Parallel Processing
The analysis uses multiprocessing to handle large datasets efficiently:
- Configurable number of workers (default: 4)
- Each patch is processed independently
- Results are aggregated after all patches complete
Memory Considerations
- Large datasets may require memory management strategies
- Consider processing in batches for very large experiments
- Intensity data is stored per-patch to manage memory usage