# YAML Configuration Guide
The OMERO Annotate AI package uses structured YAML configuration files to define annotation workflows. This ensures reproducibility and allows fine-grained control over all aspects of the annotation process.
## Configuration Overview

The configuration system is built around a Pydantic model that validates all settings and provides sensible defaults. You can create configurations in two ways:

- **Interactive widgets**: generate configurations through Jupyter notebook widgets
- **YAML files**: define configurations directly in YAML for batch processing
## Quick Start Example

Here's a minimal working configuration:

```yaml
# Basic OMERO micro-SAM configuration
schema_version: "1.0.0"
name: "nuclei_segmentation_demo"

# OMERO data source
omero:
  container_type: "dataset"
  container_id: 123
  source_desc: "HeLa cells for nuclei segmentation"

# Spatial processing settings
spatial_coverage:
  channels: [0]        # DAPI channel
  timepoints: [0]      # Single timepoint
  z_slices: [0, 1, 2]  # Process 3 z-slices
  three_d: false       # Process slice-by-slice

# AI model configuration
ai_model:
  name: "micro-sam"
  model_type: "vit_b_lm"

# Processing parameters
processing:
  batch_size: 0        # Process all images
  use_patches: true
  patch_size: [512, 512]
  patches_per_image: 4

# Training data split
training:
  train_n: 5
  validate_n: 2
```
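As a sanity check, it helps to reason about how much data a configuration like this produces: 1 channel × 1 timepoint × 3 z-slices gives 3 planes per image. The arithmetic below is an illustrative, dependency-free sketch (not part of the package API), and it assumes `patches_per_image` patches are drawn from each selected plane:

```python
# Illustrative arithmetic only - the package computes this internally.
spatial_coverage = {"channels": [0], "timepoints": [0], "z_slices": [0, 1, 2]}
processing = {"use_patches": True, "patches_per_image": 4}

# Each (channel, timepoint, z-slice) combination is one 2D plane.
planes = (
    len(spatial_coverage["channels"])
    * len(spatial_coverage["timepoints"])
    * len(spatial_coverage["z_slices"])
)

# With patching enabled, each plane yields patches (assumption: per plane).
units = planes * processing["patches_per_image"] if processing["use_patches"] else planes
print(planes, units)  # 3 planes, 12 patches per image
```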
## Complete Configuration Schema

### Core Identification

```yaml
schema_version: "1.0.0"           # Configuration schema version
name: "workflow_name"             # Unique workflow identifier
version: "1.0.0"                  # Your workflow version
created: "2025-01-14T10:30:00Z"   # Auto-generated timestamp
authors:                          # Optional author information
  - name: "Your Name"
    email: "your.email@institution.edu"
    affiliation: "Your Institution"
```

### OMERO Connection

```yaml
omero:
  container_type: "dataset"    # "dataset", "project", or "plate"
  container_id: 123            # OMERO container ID
  source_desc: "Description"   # Human-readable description
```
### Study Context (MIFA Compatible)

```yaml
study:
  title: "Study title"
  description: "Detailed study description"
  keywords: ["nuclei", "segmentation", "fluorescence"]
  organism: "Homo sapiens"
  imaging_method: "fluorescence microscopy"

dataset:
  source_dataset_id: "S-BIAD123"   # BioImage Archive accession
  source_dataset_url: "https://www.ebi.ac.uk/bioimaging/studies/S-BIAD123"
  source_description: "Dataset description"
  license: "CC-BY-4.0"
```
### Spatial Coverage Settings

The spatial coverage section defines which parts of your images to process:

```yaml
spatial_coverage:
  # Basic spatial selection
  channels: [0, 1]            # Channel indices to process
  timepoints: [0]             # Timepoint indices
  z_slices: [0, 1, 2, 3, 4]   # Z-slice indices

  # Selection modes
  timepoint_mode: "specific"  # "all", "random", "specific"
  z_slice_mode: "specific"    # "all", "random", "specific"

  # 3D processing
  three_d: false              # Enable 3D volumetric processing
  z_range_start: 0            # Start z-slice for 3D (when three_d=true)
  z_range_end: 10             # End z-slice for 3D (when three_d=true)

  spatial_units: "pixels"     # Spatial measurement units
```
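The selection behaves like a Cartesian product over the chosen channels, timepoints, and z-slices. The sketch below illustrates how the `"specific"`, `"all"`, and `"random"` modes could resolve to concrete plane indices; the `resolve` helper and its `n_random` parameter are hypothetical, not the package's actual logic:

```python
import itertools
import random

def resolve(indices, mode, n_available, n_random=2, seed=0):
    """Resolve a selection mode to concrete indices (illustrative helper)."""
    if mode == "all":
        return list(range(n_available))
    if mode == "random":
        return sorted(random.Random(seed).sample(range(n_available), n_random))
    return list(indices)  # "specific": use the indices as given

channels = [0, 1]
timepoints = resolve([0], "specific", n_available=10)
z_slices = resolve([0, 1, 2, 3, 4], "specific", n_available=20)

# Every (channel, timepoint, z-slice) combination is one 2D plane to process.
planes = list(itertools.product(channels, timepoints, z_slices))
print(len(planes))  # 2 channels x 1 timepoint x 5 z-slices = 10 planes
```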
### 2D vs 3D Processing

**2D Mode (Default):**
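In 2D mode each selected z-slice is processed independently, slice by slice. For example:

```yaml
spatial_coverage:
  three_d: false        # default
  z_slices: [0, 1, 2]   # each slice is annotated as a separate 2D image
```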
**3D Volumetric Mode:**

```yaml
spatial_coverage:
  three_d: true
  z_range_start: 0   # Process z-slices 0-10 as one volume
  z_range_end: 10
  # OR alternatively:
  z_slices: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```
### Annotation Methodology

```yaml
annotation_methodology:
  annotation_type: "segmentation_mask"    # "segmentation_mask", "bounding_box", "point", "classification"
  annotation_method: "automatic"          # "manual", "semi_automatic", "automatic"
  annotation_criteria: "Complete nuclei boundaries based on DAPI staining"
  annotation_coverage: "representative"   # "all", "representative", "partial"
```
### AI Model Configuration

```yaml
ai_model:
  name: "micro-sam"        # Model identifier
  version: "latest"        # Model version
  model_type: "vit_b_lm"   # "vit_b_lm", "vit_l_lm", "vit_h_lm"
  framework: "pytorch"     # AI framework
```
### Processing Parameters

```yaml
processing:
  batch_size: 0            # Number of images to process (0 = all)
  use_patches: true        # Extract patches instead of full images
  patch_size: [512, 512]   # Patch dimensions [width, height]
  patches_per_image: 4     # Number of patches per image
  random_patches: true     # Random vs. systematic patch extraction
```
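When `random_patches` is `false`, patches are sampled systematically rather than at random. A simplified sketch of what systematic grid extraction might look like (the `grid_patches` helper is illustrative, not the package's implementation):

```python
def grid_patches(image_w, image_h, patch_w, patch_h, n_patches):
    """Return up to n_patches top-left corners on a non-overlapping grid."""
    corners = [
        (x, y)
        for y in range(0, image_h - patch_h + 1, patch_h)
        for x in range(0, image_w - patch_w + 1, patch_w)
    ]
    return corners[:n_patches]

# A 2048x2048 image tiles into a 4x4 grid of 512x512 patches;
# taking 4 of them yields the first grid row.
corners = grid_patches(2048, 2048, 512, 512, n_patches=4)
print(corners)  # [(0, 0), (512, 0), (1024, 0), (1536, 0)]
```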
### Training Configuration

```yaml
training:
  validation_strategy: "random_split"   # "random_split", "expert_review", "cross_validation"
  train_fraction: 0.7                   # Training data fraction (auto-calculated if train_n specified)
  train_n: 10                           # Explicit number of training images
  validation_fraction: 0.3              # Validation data fraction
  validate_n: 5                         # Explicit number of validation images
  segment_all: false                    # Segment all objects vs. a sample
  quality_threshold: 0.8                # Minimum quality score (optional)
```
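As the comments above note, explicit counts take precedence over fractions. The following is a small illustration of how such a split could be resolved for a set of images (`resolve_split` is a hypothetical helper, not the package API):

```python
def resolve_split(n_images, train_n=None, validate_n=None, train_fraction=0.7):
    """Resolve explicit counts or a fraction into (train, validate) sizes."""
    if train_n is not None and validate_n is not None:
        return train_n, validate_n       # explicit counts win
    train = round(n_images * train_fraction)
    return train, n_images - train       # remainder goes to validation

print(resolve_split(15, train_n=10, validate_n=5))   # (10, 5)
print(resolve_split(10, train_fraction=0.7))         # (7, 3)
```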
### Workflow Control

```yaml
workflow:
  resume_from_table: false   # Resume from existing annotation table
  read_only_mode: false      # Read-only mode for viewing results
```

### Output Configuration

```yaml
output:
  output_directory: "./annotations"   # Output directory path
  format: "ome_tiff"                  # "ome_tiff", "png", "numpy"
  compression: null                   # Compression method (optional)
  resume_from_checkpoint: false       # Resume interrupted workflow
```
### Metadata and Tracking

```yaml
# Workflow metadata (bioimage.io compatible)
documentation: "https://github.com/your-org/your-repo/docs"
repository: "https://github.com/your-org/your-repo"
tags: ["segmentation", "nuclei", "micro-sam", "AI-ready"]

# Annotation tracking (auto-populated during processing)
annotations: []   # List of ImageAnnotation records
```
## Working with Configuration Files

### Loading Configurations

```python
from omero_annotate_ai.core.annotation_config import (
    create_default_config,
    load_config,
)

# From a YAML file
config = load_config("my_config.yaml")

# From a dictionary
config_dict = {...}
config = load_config(config_dict)

# Create a default configuration
config = create_default_config()
```
### Saving Configurations

```python
# Save to a YAML file
config.save_yaml("my_config.yaml")

# Export as a dictionary
config_dict = config.to_dict()

# Export as a YAML string
yaml_string = config.to_yaml()
```
### Configuration Templates

Generate a complete template with all options:

```python
from omero_annotate_ai.core.annotation_config import get_config_template

template = get_config_template()
print(template)
```
## Advanced Configuration Examples

### Multi-channel Time Series

```yaml
name: "multi_channel_timeseries"
spatial_coverage:
  channels: [0, 1, 2]          # DAPI, GFP, RFP
  timepoints: [0, 5, 10, 15]   # Every 5th timepoint
  z_slices: [2]                # Middle focal plane
  timepoint_mode: "specific"
```
### 3D Volumetric Processing

```yaml
name: "3d_organoid_segmentation"
spatial_coverage:
  channels: [0]
  timepoints: [0]
  three_d: true
  z_range_start: 5
  z_range_end: 25   # Process z-slices 5-25 as one volume
```
### Large-scale Patch-based Processing

```yaml
name: "large_image_patches"
processing:
  use_patches: true
  patch_size: [1024, 1024]   # Larger patches
  patches_per_image: 16      # More patches per image
  random_patches: false      # Systematic grid sampling
training:
  train_n: 100               # Process many images
  validate_n: 20
```
### High-throughput Screening

```yaml
name: "hts_plate_processing"
omero:
  container_type: "plate"
  container_id: 456
spatial_coverage:
  channels: [0, 1]     # Nuclei + marker
  timepoints: [0]
  z_slices: [0]        # Single focal plane
processing:
  batch_size: 50       # Process in batches
  use_patches: false   # Full images only
```
## Configuration Validation

The configuration system validates settings when a configuration is loaded:

- **Type checking**: all fields are validated against their expected types
- **Range validation**: numeric fields are checked against valid ranges
- **Dependency validation**: related fields are checked for consistency
- **3D configuration validation**: 3D settings are checked for completeness
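To make these checks concrete, here is a stripped-down, pure-Python sketch of the kinds of rules the model enforces. It is illustrative only: the field names follow this guide, but the real Pydantic model's checks and error messages may differ.

```python
def validate_spatial_coverage(cfg: dict) -> list[str]:
    """Collect human-readable validation errors (illustrative)."""
    errors = []
    # Type checking
    if not isinstance(cfg.get("three_d", False), bool):
        errors.append("three_d must be a boolean")
    # Range validation
    for key in ("channels", "timepoints", "z_slices"):
        if any(i < 0 for i in cfg.get(key, [])):
            errors.append(f"{key} indices must be non-negative")
    # Dependency / 3D completeness validation
    if cfg.get("three_d"):
        start, end = cfg.get("z_range_start"), cfg.get("z_range_end")
        if start is None or end is None:
            errors.append("three_d=true requires z_range_start and z_range_end")
        elif start > end:
            errors.append("z_range_start must not exceed z_range_end")
    return errors

print(validate_spatial_coverage({"three_d": True, "z_slices": [0, 1, 2]}))
# ['three_d=true requires z_range_start and z_range_end']
```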
## Integration with Widgets

Widget-generated configurations are fully compatible with YAML configurations:

```python
from omero_annotate_ai.core.annotation_config import load_config

# Generate a config with widgets (create_workflow_widget comes from the
# package's widget module; see the widget documentation for the import path)
workflow_widget = create_workflow_widget(connection=conn)
workflow_widget.display()
config = workflow_widget.get_config()

# Save the widget config as YAML
config.save_yaml("widget_generated_config.yaml")

# Load and modify the YAML config
config = load_config("widget_generated_config.yaml")
config.processing.batch_size = 10
config.save_yaml("modified_config.yaml")
```
## Best Practices

- **Use descriptive names**: make workflow names meaningful for tracking
- **Document your criteria**: specify clear annotation criteria
- **Version your configs**: update version numbers when making changes
- **Validate before running**: test configurations on small datasets first
- **Keep configs with data**: store configuration files alongside results
- **Use templates**: start from the provided template for completeness
## Troubleshooting

Common configuration issues and solutions:

**3D Configuration Errors:**

```yaml
# ❌ Incorrect - z_range is missing when three_d=true
spatial_coverage:
  three_d: true
  z_slices: [0, 1, 2]

# ✅ Correct - specify z_range for 3D
spatial_coverage:
  three_d: true
  z_range_start: 0
  z_range_end: 2
```
**Training Split Validation:**

```yaml
# ❌ Incorrect - fractions don't sum to 1.0
training:
  train_fraction: 0.8
  validation_fraction: 0.3

# ✅ Correct - use explicit counts or fractions that sum to 1.0
training:
  train_n: 8
  validate_n: 2
```
For detailed workflow guides and further examples, see the tutorial section.