Skip to main content

Usage Guide

This guide walks you through the process of using the experiment configuration system to generate and manage configurations for your CODEX analysis experiments.

Quick Start

1. Set Script Parameters

Modify the following variables in config_generator.py:

# Experiment set name (corresponds to CSV file name)
# It will be the directory name of the generated configuration files.
experiment_set_name = "main_ft"

# Analysis step: "preprocess", "main", "analysis"
# It will select the template file to use.
analysis_step = "main"

# Base path of Experiment Configuration Module
# We have been using absolute path of the `exps` directory in the codebase.
base_dir = "/workspaces/codex-analysis/0-phenocycler-penntmc-pipeline/exps"

2. Create Experiment Design Table (CSV)

Create an experiment design table in the csvs/ directory for each experiment set. Currently, we design the experiments in this Google Sheet and export as .csv files.

The format of the .csv file is as follows:

exp_id,data::file_name,data::antibodies_file,channels::nuclear_channel,channels::wholecell_channel,...
D10_0,/path/to/image.tiff,/path/to/antibodies.tsv,DAPI,"Pan-Cytokeratin,E-cadherin",...
D10_1,/path/to/image2.tiff,/path/to/antibodies2.tsv,DAPI,"Pan-Cytokeratin,E-cadherin",...

Column Name Format Description:

  • Use :: separator to represent nested configuration levels
  • Example: data::file_name corresponds to data.file_name in YAML
  • Example: channels::wholecell_channel corresponds to channels.wholecell_channel

There are three major components of Aegle Pipeline so there are three different types of Design Tables. They are marked as prefix in the tab name of the Google Sheet as in preprocess_X, main_X, analysis_X. Such as preprocess_ft, main_ft, analysis_ft.

3. Check Configuration Templates

Ensure the corresponding template files exist located at exps/:

  • preprocess_template.yaml - Preprocessing step template
  • main_template.yaml - Main analysis step template
  • analysis_template.yaml - Analysis step template

They serve as the default configuration templates for the configuration generator. And also as a reference about the output of the configuration generator.

4. Run the Generator

Go to the base path of Experiment Configuration Module and run the generator.

cd /workspaces/codex-analysis/0-phenocycler-penntmc-pipeline/exps
python config_generator.py

Parameter Types

The configuration generator automatically converts CSV values into appropriate Python types before writing them into YAML configuration files. This ensures consistency between human-readable CSV inputs and machine-readable configuration files.

Type Conversions

Parameter TypeCSV FormatConverted TypeExample
ListComma-separatedList[str]"DAPI,Pan-Cytokeratin"["DAPI", "Pan-Cytokeratin"]
IntegerNumberint"512"512
FloatDecimalfloat"0.5"0.5
BooleanTRUE / FALSEbool"TRUE"true, "FALSE"false
Null"None"null"None"null
Python ExpressionStringEvaluated type"[128, 64]"[128, 64] (via ast.literal_eval)

If none of the above conversions apply, values remain as strings.

Conversions Rules

By default, we transform the string from CSV to float Python and write it into YAML.

The following are the rules for some keys that are explicitly handled with custom logic:

  • List Parameters

    • wholecell_channel: "A,B,C"["A", "B", "C"]
    • assign_sizes: "0.1,0.2"[0.1, 0.2] (list of floats)
  • Integer Parameters

    • patch_width, patch_height, patch_index, n_tissue, downscale_factor, min_area, output_dim
  • List of Integers

    • hidden_dims: evaluated with ast.literal_eval, e.g. "[128, 64]"[128, 64]
  • Boolean Flags

    • generate_channel_stats, visualize_whole_sample (handled in preprocess overview), visualize_patches, save_all_channel_patches,
      visualize_segmentation, save_segmentation_images, save_segmentation_pickle, save_disrupted_patches,
      compute_metrics, skip_viz, enhance_contrast, visualize, segmentation_analysis