Evaluation Details

This step performs automated segmentation quality assessment with the run_seg_evaluation function implemented in evaluation.py. It computes a set of metrics that describe the quality of the cell segmentation results.

Our function is based on single_method_eval from CellSegmentationEvaluator.

Evaluation Workflow

We use concurrent.futures.ProcessPoolExecutor to evaluate patches in parallel, with each worker running _process_single_patch on one patch. The evaluation follows these key steps:

- Patch Filtering: Only evaluates informative patches (marked as is_informative in the metadata)
- Cell Count Validation: Skips patches with fewer than 20 cells to keep the statistics reliable
- Parallel Processing: Uses 2 worker processes to evaluate multiple patches concurrently
- Comprehensive Metrics: Calculates 14 quality metrics for each patch
- Quality Score Generation: Combines the metrics into a single quality score using a PCA-based model
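The filtering and cell-count rules above can be sketched as follows. This is a minimal illustration, not the code in evaluation.py: it assumes each patch's metadata is a dict with an is_informative flag and that cell masks are integer label images with 0 as background. The names select_patches and count_cells are hypothetical.

```python
import numpy as np

MIN_CELLS = 20  # minimum cell count stated in this section


def count_cells(cell_mask: np.ndarray) -> int:
    """Count distinct cell labels in a labeled mask (0 = background)."""
    labels = np.unique(cell_mask)
    return int(np.count_nonzero(labels))  # every label except 0


def select_patches(patch_metadata: list, cell_masks: list) -> list:
    """Return indices of informative patches with at least MIN_CELLS cells.

    Assumes patch_metadata[i] is a dict carrying an "is_informative" flag.
    """
    selected = []
    for idx, (meta, mask) in enumerate(zip(patch_metadata, cell_masks)):
        if not meta.get("is_informative", False):
            continue  # non-informative patches are never evaluated
        if count_cells(mask) < MIN_CELLS:
            continue  # too few cells for reliable statistics
        selected.append(idx)
    return selected
```

Patches that are not selected would then keep a NaN placeholder in the results.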

Key Functions

`run_seg_evaluation()`

Main orchestration function that:

- Extracts repaired segmentation results and image data for informative patches
- Reshapes image arrays from (batch, w, h, c) to (batch, c, w, h) format
- Distributes evaluation tasks across parallel workers using ProcessPoolExecutor
- Stores evaluation results in codex_patches.seg_evaluation_metrics
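The axis reordering done by run_seg_evaluation amounts to a single transpose. The helper name to_channels_first is hypothetical; the sketch only assumes the images are stacked in one NumPy array.

```python
import numpy as np


def to_channels_first(images: np.ndarray) -> np.ndarray:
    """Reorder axes from (batch, w, h, c) to (batch, c, w, h)."""
    if images.ndim != 4:
        raise ValueError(f"expected a 4-D array, got shape {images.shape}")
    # Axis 3 (channels) moves to position 1; the spatial axes keep their order.
    return np.transpose(images, (0, 3, 1, 2))
```

np.transpose returns a view rather than a copy, so wrap the result in np.ascontiguousarray if a later step needs contiguous memory.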

`_process_single_patch()`

Worker function that processes individual patches:

- Validates cell count (minimum 20 cells required)
- Prepares image dictionary with proper formatting
- Calls evaluate_seg_single() for detailed metric calculation
- Returns the quality score and metrics, or NaN for failed evaluations
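A minimal sketch of the worker logic, assuming NumPy label masks and a (c, w, h) image. The real _process_single_patch and evaluate_seg_single signatures are not shown in this section, so the name process_single_patch_sketch, the channel-name image dictionary, and the injected evaluate_fn (standing in for evaluate_seg_single) are all hypothetical.

```python
import logging
import math
from typing import Callable

import numpy as np

MIN_CELLS = 20  # minimum cell count stated in this section
logger = logging.getLogger(__name__)


def process_single_patch_sketch(
    cell_mask: np.ndarray,
    image: np.ndarray,
    channel_names: list,
    evaluate_fn: Callable,
):
    """Hypothetical worker: validate, build an image dict, evaluate, or return NaN."""
    n_cells = int(np.count_nonzero(np.unique(cell_mask)))  # labels other than 0
    if n_cells < MIN_CELLS:
        return math.nan, math.nan  # too few cells for reliable statistics
    # Map each channel name to its (w, h) plane of the (c, w, h) image.
    image_dict = {name: image[i] for i, name in enumerate(channel_names)}
    try:
        score, metrics = evaluate_fn(image_dict, cell_mask)
    except Exception:
        logger.exception("evaluation failed")  # log, then fall back to NaN
        return math.nan, math.nan
    return score, metrics
```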

`evaluate_seg_single()`

Core evaluation function that computes the detailed quality metrics:

- Takes matched cell masks, nucleus masks, and cytoplasm masks
- Processes the original segmentation results for comparison
- Applies image thresholding to separate foreground from background
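evaluate_seg_single receives the cytoplasm mask as an input. If only matched cell and nucleus masks are at hand, one way to derive the "Cell Not Including Nucleus" compartment is to remove nucleus pixels from each cell. The sketch assumes matched cells and nuclei share integer labels (0 = background); split_compartments is a hypothetical helper, not part of evaluation.py or CellSegmentationEvaluator.

```python
import numpy as np


def split_compartments(cell_mask: np.ndarray, nucleus_mask: np.ndarray) -> dict:
    """Derive a 'cell not including nucleus' mask from matched label masks.

    Assumes a matched cell and its nucleus carry the same label (0 = background).
    """
    if cell_mask.shape != nucleus_mask.shape:
        raise ValueError("masks must have the same shape")
    # Keep cell labels only where no nucleus pixel is present.
    cytoplasm_mask = np.where(nucleus_mask > 0, 0, cell_mask)
    return {"cell": cell_mask, "nucleus": nucleus_mask, "cytoplasm": cytoplasm_mask}
```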

Quality metrics:

Cell Density Metrics
- NumberOfCellsPer100SquareMicrons: Cell density normalized by area
Coverage Metrics
- FractionOfForegroundOccupiedByCells: How well cells cover tissue regions
- 1-FractionOfBackgroundOccupiedByCells: Background cleanliness
- FractionOfCellMaskInForeground: Mask accuracy in tissue regions
Cell Size Uniformity
- 1/(ln(StandardDeviationOfCellSize)+1): Consistency of cell sizes
Cell-Nucleus Matching
- FractionOfMatchedCellsAndNuclei: Success rate of cell-nucleus pairing
Foreground Quality
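- 1/(AvgCVForegroundOutsideCells+1): Uniformity of foreground intensity outside cells
- FractionOfFirstPCForegroundOutsideCells: Fraction of variance explained by the first principal component of foreground pixels outside cells
Cell Type Clustering Metrics (for nucleus and cytoplasm compartments)
- 1/(AvgOfWeightedAvgCVMeanCellIntensitiesOver1~10NumberOfClusters+1): Homogeneity of mean cell intensities within clusters, averaged over 1 to 10 clusters
- AvgOfWeightedAvgFractionOfFirstPCMeanCellIntensitiesOver1~10NumberOfClusters: Fraction of variance explained by the first principal component of mean cell intensities within clusters, averaged over 1 to 10 clusters
- AvgSilhouetteOver2~10NumberOfClusters: Average silhouette score (cluster separation) over 2 to 10 clusters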
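To make the density, coverage, and size metrics concrete, here is a simplified NumPy sketch of five of the Matched Cell metrics. It is an approximation written from the metric names, not the CellSegmentationEvaluator implementation: it assumes a boolean foreground mask is already available, normalizes density by the whole patch area, measures cell size in pixels, and takes the pixel size in micrometers. basic_matched_cell_metrics is a hypothetical name.

```python
import math

import numpy as np


def basic_matched_cell_metrics(cell_mask, foreground, pixel_size_um):
    """Simplified versions of five 'Matched Cell' metrics (illustrative only).

    cell_mask: integer label image, 0 = background.
    foreground: boolean tissue mask of the same shape.
    pixel_size_um: pixel edge length in micrometers.
    """
    cells = cell_mask > 0
    fg = foreground.astype(bool)
    bg = ~fg
    labels, sizes = np.unique(cell_mask[cells], return_counts=True)
    patch_area_um2 = cell_mask.size * pixel_size_um ** 2

    # Standard deviation of cell sizes (in pixels) feeds the uniformity term.
    std_size = float(np.std(sizes)) if sizes.size else 0.0
    denom = math.log(std_size) + 1 if std_size > 0 else 0.0
    size_term = 1 / denom if denom != 0 else math.nan

    cells_in_fg = int((cells & fg).sum())
    return {
        "NumberOfCellsPer100SquareMicrons": labels.size / patch_area_um2 * 100,
        "FractionOfForegroundOccupiedByCells": cells_in_fg / max(int(fg.sum()), 1),
        "1-FractionOfBackgroundOccupiedByCells": 1 - int((cells & bg).sum()) / max(int(bg.sum()), 1),
        "FractionOfCellMaskInForeground": cells_in_fg / max(int(cells.sum()), 1),
        "1/(ln(StandardDeviationOfCellSize)+1)": size_term,
    }
```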

Quality Score Calculation

The final quality score is generated using a pre-trained PCA model (2Dv1.5) that:

- Standardizes all 14 metrics using pre-computed mean and scale parameters
- Projects the standardized metrics onto 2 principal components
- Computes an exponentially weighted score: exp(PC1 × variance_ratio_1 + PC2 × variance_ratio_2), where variance_ratio_i is the explained variance ratio of component i
- Returns a single quality score representing overall segmentation quality
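The scoring steps can be written out as below. This sketch assumes the stored model provides a per-metric mean and scale (as scikit-learn's StandardScaler does via mean_ and scale_), a 2 × 14 component matrix, and the two explained variance ratios (components_ and explained_variance_ratio_ on scikit-learn's PCA). How the 2Dv1.5 model is actually stored is not described here, and quality_score is a hypothetical name.

```python
from typing import Optional

import numpy as np


def quality_score(
    metrics: np.ndarray,                    # shape (14,), in the model's metric order
    mean: np.ndarray,                       # per-metric mean, shape (14,)
    scale: np.ndarray,                      # per-metric scale, shape (14,)
    components: np.ndarray,                 # principal axes, shape (2, 14)
    explained_variance_ratio: np.ndarray,   # shape (2,)
    pca_mean: Optional[np.ndarray] = None,  # PCA centering, if the model has one
) -> float:
    """exp(PC1 * ratio_1 + PC2 * ratio_2) after standardizing the metrics."""
    z = (metrics - mean) / scale            # standardize each metric
    if pca_mean is not None:
        z = z - pca_mean                    # mirrors scikit-learn's PCA.transform
    pcs = components @ z                    # project onto the 2 principal axes
    return float(np.exp(pcs @ explained_variance_ratio))
```

Because of the exponential, the score is always positive, and a metric vector equal to the stored means (with no extra PCA centering) scores exactly 1.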

Technical Details

Image Processing

- Uses mean thresholding for foreground/background separation
- Applies morphological operations with disk sizes (1, 2, 20, 10) and area sizes (20000, 1000)
- Converts pixel sizes from micrometers to nanometers (config["data"]["image_mpp"] * 1000)
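A NumPy sketch of the mean-threshold step and the unit conversion. Summing channels before thresholding is an assumption, and the morphological cleanup is left out because this section does not say how each disk and area size is applied. Both helper names are hypothetical.

```python
import numpy as np


def mean_threshold_foreground(image: np.ndarray) -> np.ndarray:
    """Foreground = pixels whose channel-summed intensity exceeds the mean.

    image: (c, w, h) array. Summing over channels is an assumption.
    """
    summed = image.sum(axis=0)
    return summed > summed.mean()


def mpp_to_nm(image_mpp: float) -> float:
    """Micrometers per pixel -> nanometers per pixel (image_mpp * 1000)."""
    return image_mpp * 1000
```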

Parallel Processing

- Uses concurrent.futures.ProcessPoolExecutor with 2 workers
- Maintains result order through indexed futures mapping
- Handles exceptions gracefully with NaN placeholders
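The ordering and error-handling pattern described above can be sketched with the standard library alone. evaluate_in_parallel is a hypothetical name; the executor class is a parameter (defaulting to ProcessPoolExecutor with 2 workers) so the same code can run with a ThreadPoolExecutor in quick tests.

```python
import logging
import math
from concurrent.futures import Executor, ProcessPoolExecutor, as_completed
from typing import Callable, Sequence, Type

logger = logging.getLogger(__name__)


def evaluate_in_parallel(
    tasks: Sequence,
    worker: Callable,
    max_workers: int = 2,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
) -> list:
    """Run worker on every task in parallel, keep input order, NaN on failure."""
    results = [math.nan] * len(tasks)  # NaN placeholder for every slot
    with executor_cls(max_workers=max_workers) as executor:
        # Indexed mapping: each future remembers which task it came from.
        future_to_idx = {executor.submit(worker, task): i for i, task in enumerate(tasks)}
        for future in as_completed(future_to_idx):
            idx = future_to_idx[future]
            try:
                results[idx] = future.result()
            except Exception:
                # Log the failure and keep going; the slot stays NaN.
                logger.exception("task %d failed", idx)
    return results
```

With ProcessPoolExecutor the worker must be a picklable module-level function, and the call should be guarded by `if __name__ == "__main__":` on platforms that start workers with spawn or forkserver.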

Error Handling

- Patches with insufficient cells (< 20) receive NaN quality scores
- Failed evaluations are logged but don't interrupt the overall process
- Results maintain consistent indexing with input patches

Output Format

The evaluation results are stored in codex_patches.seg_evaluation_metrics as a list of dictionaries, where each dictionary contains:

```python
{
    "Matched Cell": {
        "NumberOfCellsPer100SquareMicrons": float,
        "FractionOfForegroundOccupiedByCells": float,
        "1-FractionOfBackgroundOccupiedByCells": float,
        "FractionOfCellMaskInForeground": float,
        "1/(ln(StandardDeviationOfCellSize)+1)": float,
        "FractionOfMatchedCellsAndNuclei": float,
        "1/(AvgCVForegroundOutsideCells+1)": float,
        "FractionOfFirstPCForegroundOutsideCells": float
    },
    "Nucleus (including nucleus membrane)": {
        "1/(AvgOfWeightedAvgCVMeanCellIntensitiesOver1~10NumberOfClusters+1)": float,
        "AvgOfWeightedAvgFractionOfFirstPCMeanCellIntensitiesOver1~10NumberOfClusters": float,
        "AvgSilhouetteOver2~10NumberOfClusters": float
    },
    "Cell Not Including Nucleus (cell membrane plus cytoplasm)": {
        "1/(AvgOfWeightedAvgCVMeanCellIntensitiesOver1~10NumberOfClusters+1)": float,
        "AvgOfWeightedAvgFractionOfFirstPCMeanCellIntensitiesOver1~10NumberOfClusters": float,
        "AvgSilhouetteOver2~10NumberOfClusters": float
    },
    "QualityScore": float
}
```
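Because each entry is a nested dictionary, flattening it makes the results easier to tabulate. flatten_metrics is a hypothetical helper; it uses "::" as a separator because several metric names already contain "/", and it assumes skipped patches are stored as a non-dict placeholder such as NaN.

```python
def flatten_metrics(entry) -> dict:
    """Flatten one seg_evaluation_metrics entry into 'Group::Metric' keys.

    Non-dict entries (assumed placeholders for skipped patches) give {}.
    """
    if not isinstance(entry, dict):
        return {}
    flat = {}
    for group, value in entry.items():
        if isinstance(value, dict):
            # '::' avoids clashing with the '/' already used in metric names.
            for metric, number in value.items():
                flat[f"{group}::{metric}"] = number
        else:
            flat[group] = value  # top-level scalars such as "QualityScore"
    return flat
```

For example, pandas.DataFrame([flatten_metrics(e) for e in codex_patches.seg_evaluation_metrics]) would give one row per patch, with all-NaN rows for skipped patches.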

Together, these metrics allow segmentation quality to be assessed automatically, helping to identify well-segmented patches and to surface problems in the segmentation pipeline. The evaluation results are currently saved with pickle:

```python
import gzip, os, pickle

with gzip.open(os.path.join(args.out_dir, "seg_evaluation_metrics.pkl.gz"), "wb") as f:
    pickle.dump(codex_patches.seg_evaluation_metrics, f)  # gzip-compressed to match the .pkl.gz name
```
