
Outputs

This document provides a comprehensive guide to all outputs generated by the Main pipeline. Understanding these outputs is essential for downstream analysis and quality assessment.

Output Directory Structure

The Main pipeline generates outputs in a structured format:

experiment_output_directory/
├── cell_profiling/ # Single-cell analysis results
│ ├── patch-0-cell_by_marker.csv # Per-cell marker expression
│ ├── patch-0-cell_metadata.csv # Cell morphology and statistics
│ ├── patch-1-cell_by_marker.csv
│ └── patch-1-cell_metadata.csv
├── visualization/ # Optional visualization outputs
│ ├── whole_sample_rgb.png # Full image overview (generated during preprocess)
│ ├── patches_rgb/ # Individual patch visualizations
│ └── segmentation_overlays/ # Segmentation mask overlays
├── channel_stats.csv # Channel-level statistics
├── extracted_channel_patches.npy.zst # Processed image patches
├── original_seg_res_batch.pickle # Raw segmentation results
├── matched_seg_res_batch.pickle # Processed segmentation results
├── patches_metadata.csv # Patch-level metadata and QC
├── copied_config.yaml # Configuration record
├── processing_log.txt # Execution log and timing
└── seg_evaluation_metrics.pkl.gz # Segmentation evaluation results

Core Output Files

Single-Cell Profiling Data

The cell_profiling/ directory contains the primary analytical outputs for single-cell analysis.

patch-{i}-cell_by_marker.csv

Purpose: Per-cell marker expression quantification for downstream analysis

Structure: Each row represents a single cell, with columns for cell identifier and marker intensities.

| Column | Type | Description | Example Values |
|---|---|---|---|
| cell_id | integer | Unique cell identifier within patch | 1, 2, 3, ... |
| DAPI | float | Mean nuclear marker intensity | 145.23, 289.67 |
| Pan-Cytokeratin | float | Mean membrane marker intensity | 78.45, 156.89 |
| CD3 | float | Mean T-cell marker intensity | 12.34, 234.56 |
| CD8 | float | Mean cytotoxic T-cell marker intensity | 5.67, 189.23 |
| ... | float | Additional marker intensities | ... |

File Size: Typically 50KB - 5MB per patch, depending on cell count and number of markers
Usage: Primary input for clustering, phenotyping, and statistical analysis
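Because cell_id is unique only within a patch, combining patches requires carrying the patch index alongside it. The following is a minimal sketch using only the Python standard library. It assumes the files follow the patch-{i}-cell_by_marker.csv naming shown above and that every column other than cell_id is numeric. The helper name load_cell_by_marker and the added patch_id key are illustrative and not part of the pipeline.

```python
import csv
import re
from pathlib import Path

# Matches the per-patch file naming documented above.
_PATCH_RE = re.compile(r"^patch-(\d+)-cell_by_marker\.csv$")


def load_cell_by_marker(profiling_dir):
    """Return one dict per cell, tagged with the patch it came from.

    Hypothetical helper (not part of the pipeline): marker columns are
    converted to float and a "patch_id" key is added to each row.
    """
    rows = []
    for path in Path(profiling_dir).iterdir():
        match = _PATCH_RE.match(path.name)
        if match is None:
            continue  # skip cell_metadata files and anything else
        patch_id = int(match.group(1))
        with path.open(newline="") as handle:
            for record in csv.DictReader(handle):
                cell_id = int(float(record.pop("cell_id")))
                row = {"patch_id": patch_id, "cell_id": cell_id}
                row.update({name: float(value) for name, value in record.items()})
                rows.append(row)
    # Sort numerically so patch-10 does not land before patch-2.
    rows.sort(key=lambda r: (r["patch_id"], r["cell_id"]))
    return rows
```

The pair (patch_id, cell_id) can then serve as a sample-wide cell key for clustering or phenotyping.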

patch-{i}-cell_metadata.csv

Purpose: Comprehensive cell morphology and quality metrics for each cell

Structure: Each row represents a single cell with extensive morphological and statistical features.

| Column Category | Columns | Type | Description |
|---|---|---|---|
| Identity | cell_id | integer | Unique cell identifier matching marker data |
| Basic Morphology | area | float | Cell area in pixels |
| | centroid_x, centroid_y | float | Cell center coordinates |
| | perimeter | float | Cell boundary length in pixels |
| | convex_area | float | Area of convex hull around cell |
| | axis_major_length, axis_minor_length | float | Major/minor axis lengths |
| | eccentricity | float | Shape eccentricity (0=circle, 1=line) |
| Intensity Quality | {marker}_cov | float | Coefficient of variation per marker |
| | {marker}_laplacian_var | float | Local intensity variation measure |
| Overall Quality | cell_entropy | float | Shannon entropy of marker distribution |

File Size: Typically 100KB - 10MB per patch
Usage: Quality filtering, morphological analysis, spatial analysis
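Since cell_id in the metadata file matches cell_id in the marker file for the same patch, the two can be joined before filtering. The sketch below assumes both files were already read into lists of dicts (for example with csv.DictReader, so values are still strings). The function name join_on_cell_id and the min_area threshold are illustrative choices, not pipeline defaults.

```python
def join_on_cell_id(marker_rows, metadata_rows, min_area=0.0):
    """Join marker and metadata rows from ONE patch on cell_id.

    Hypothetical helper: cells without a metadata row, or with an area
    (in pixels) below min_area, are dropped from the result.
    """
    metadata_by_id = {int(row["cell_id"]): row for row in metadata_rows}
    joined = []
    for marker_row in marker_rows:
        cell_id = int(marker_row["cell_id"])
        meta = metadata_by_id.get(cell_id)
        if meta is None or float(meta["area"]) < min_area:
            continue
        # Metadata columns first, then marker columns, with a typed cell_id.
        joined.append({**meta, **marker_row, "cell_id": cell_id})
    return joined
```

The join must be done per patch, because cell identifiers repeat across patches.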

Image and Processing Data

channel_stats.csv

Purpose: Summary statistics for intensity values across all image channels

| Column | Type | Description | Example |
|---|---|---|---|
| Channel | string | Channel/marker name | "DAPI", "CD3" |
| Min | float | Minimum intensity value | 0.0 |
| Median | float | Median intensity value | 45.67 |
| Max | float | Maximum intensity value | 4095.0 |
| 95% | float | 95th percentile intensity | 234.56 |
| Mean | float | Mean intensity value | 78.45 |
| Std Dev | float | Standard deviation | 123.45 |

File Size: Small (< 10KB)
Usage: Quality control, intensity normalization, channel selection
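As one illustration of the intensity normalization use case, the sketch below rescales a raw intensity to the range [0, 1] using a channel's Min and 95% columns. Percentile scaling is a common choice picked here for illustration only; it is not necessarily the normalization the pipeline applies. The function names are hypothetical, and the column headers are assumed to match the table above exactly.

```python
import csv


def load_channel_stats(path):
    """Read channel_stats.csv into {channel_name: {statistic: float}}."""
    with open(path, newline="") as handle:
        return {
            record["Channel"]: {
                name: float(value)
                for name, value in record.items()
                if name != "Channel"
            }
            for record in csv.DictReader(handle)
        }


def scale_to_p95(value, stats):
    """Clip and scale an intensity to [0, 1] between Min and the 95th percentile.

    Illustrative normalization only, not the pipeline's own method.
    """
    low, high = stats["Min"], stats["95%"]
    if high <= low:
        return 0.0  # degenerate channel (e.g. constant intensity)
    return min(max((value - low) / (high - low), 0.0), 1.0)
```

Values above the 95th percentile saturate at 1.0, which limits the influence of bright outliers.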

extracted_channel_patches.npy.zst

Purpose: Processed image patches ready for segmentation
Format: NumPy array with dimensions (num_patches, patch_height, patch_width, num_channels)
Channels: Typically [nucleus, wholecell] channels
Data Type: float32 or uint16
File Size: Large (100MB - 10GB depending on image size and patch count)
Usage: Input for segmentation algorithms, visualization

patches_metadata.csv

Purpose: Quality control and metadata for each image patch

| Column | Type | Description | Typical Range |
|---|---|---|---|
| patch_id | integer | Unique patch identifier | 0, 1, 2, ... |
| height, width | integer | Patch dimensions in pixels | 1000, 2000 |
| nucleus_mean | float | Mean nuclear channel intensity | 10-500 |
| nucleus_std | float | Nuclear channel standard deviation | 5-200 |
| nucleus_non_zero_perc | float | Fraction of non-zero nuclear pixels | 0.0-1.0 |
| wholecell_mean | float | Mean wholecell channel intensity | 5-300 |
| wholecell_std | float | Wholecell channel standard deviation | 3-150 |
| wholecell_non_zero_perc | float | Fraction of non-zero wholecell pixels | 0.0-1.0 |
| is_empty | boolean | Patch marked as empty tissue | true/false |
| is_noisy | boolean | Patch marked as too noisy | true/false |
| is_bad_patch | boolean | Overall patch quality flag | true/false |
| is_informative | boolean | Patch suitable for analysis | true/false |

File Size: Small to medium (10KB - 1MB)
Usage: Patch filtering, quality assessment, debugging
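For patch filtering, the is_informative flag can be used to select which patches to carry into downstream analysis. The sketch below assumes the boolean columns are written as text such as true/false or True/False; the exact serialization is an assumption, so the comparison is case-insensitive. The function name informative_patch_ids is illustrative.

```python
import csv

# Text values treated as "true"; an assumption about how the flags are written.
_TRUE_VALUES = {"true", "1", "yes"}


def informative_patch_ids(metadata_csv):
    """Return the patch_id of every patch flagged is_informative."""
    with open(metadata_csv, newline="") as handle:
        return [
            int(record["patch_id"])
            for record in csv.DictReader(handle)
            if record["is_informative"].strip().lower() in _TRUE_VALUES
        ]
```

The returned identifiers correspond to the {i} in the per-patch file names under cell_profiling/.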

Segmentation Results

original_seg_res_batch.pickle

Purpose: Raw segmentation masks before post-processing
Format: Pickled list of dictionaries, one per patch
Contents per patch:

  • cell: Cell segmentation mask (uint16, shape (height, width))
  • nucleus: Nucleus segmentation mask (uint16, shape (height, width))

File Size: Medium to large (10MB - 1GB)
Usage: Debugging, alternative post-processing, method comparison

matched_seg_res_batch.pickle

Purpose: Processed segmentation masks with cell-nucleus matching
Format: Pickled list of dictionaries, one per patch
Contents per patch:

  • cell_matched_mask: Cells successfully matched to nuclei (uint16)
  • nucleus_matched_mask: Nuclei successfully matched to cells (uint16)
  • cell_outside_nucleus_mask: Cell regions excluding nucleus (uint16)
  • matched_fraction: Fraction of cells successfully matched (float, 0-1)

File Size: Medium to large (10MB - 1GB)
Usage: Final analysis, quality assessment, downstream processing
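For a quick quality check, the matched_fraction value of each patch can be read from the pickle and compared against a cutoff. This is a minimal sketch: the function names and the 0.5 threshold are illustrative, and it assumes the position of each dictionary in the list corresponds to the patch index. Unpickling the real file also requires NumPy to be installed, since the masks are stored as arrays.

```python
import pickle


def matched_fractions(pickle_path):
    """Return the matched_fraction of each patch, in list order."""
    # pickle can execute arbitrary code on load: only open trusted files.
    with open(pickle_path, "rb") as handle:
        batch = pickle.load(handle)
    return [float(patch["matched_fraction"]) for patch in batch]


def low_match_patches(fractions, threshold=0.5):
    """Return list positions whose matched fraction is below threshold.

    The threshold is an illustrative cutoff, not a pipeline default.
    """
    return [index for index, value in enumerate(fractions) if value < threshold]
```

Patches with a low matched fraction are candidates for visual inspection.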

Configuration and Logs

copied_config.yaml

Purpose: Complete configuration record for reproducibility
Content: Exact copy of configuration used for processing
File Size: Small (< 10KB)
Usage: Reproducibility, debugging, parameter tracking

processing_log.txt

Purpose: Detailed execution log with timing information
Content: Timestamped processing steps, warnings, performance metrics
File Size: Small to medium (10KB - 10MB)
Usage: Debugging, performance analysis, quality control

Optional Visualization Outputs

When visualization features are enabled, additional outputs are generated:

visualization/whole_sample_rgb.png

  • Status: Generated by the preprocess overview module; the main pipeline skips this step.
  • Usage: RGB overview of the entire sample when preprocess visualization is enabled.

visualization/patches_rgb/patch-{i}.png

  • Purpose: Individual patch visualizations
  • Content: RGB composite of each patch
  • File Size: 1-10MB per patch
  • Configuration: visualization.visualize_patches: true

visualization/segmentation_overlays/patch-{i}_overlay.png

  • Purpose: Segmentation quality assessment
  • Content: Original image with segmentation masks overlaid
  • File Size: 5-50MB per patch
  • Configuration: visualization.visualize_segmentation: true

File Size Estimates (Placeholder numbers)

| Output Type | Small Image (<10K x 10K) | Medium Image (10K-30K) | Large Image (>30K x 30K) |
|---|---|---|---|
| Cell Profiling | 1-10MB | 10-100MB | 100MB-1GB |
| Image Patches | 100MB-1GB | 1-5GB | 5-20GB |
| Segmentation | 10-100MB | 100MB-1GB | 1-5GB |
| Visualization | 10-100MB | 100MB-1GB | 1-10GB |
| Total | 200MB-2GB | 2-10GB | 10-50GB |