orchestration
Pipeline orchestration and hyperparameter management.
Overview
This module handles:
- Prefect flow orchestration
- Hyperparameter sweeps
- Debug utilities
prefect_utils
pre_flow_prefect_checks
Perform pre-flight checks before running a Prefect flow.
Logs the Hydra output directory, optionally starts the Prefect server, and checks CUDA availability with appropriate warnings.
| PARAMETER | DESCRIPTION |
|---|---|
| prefect_cfg | Prefect configuration with SERVER.autostart setting. |
Source code in src/orchestration/prefect_utils.py
pre_check_server
Check whether the Prefect server is running and start it if it is not.
Attempts to start the Prefect server and logs the status. If port 4200 is already in use, the server is assumed to be running.
Notes
Dashboard is available at http://127.0.0.1:4200/dashboard when running.
Source code in src/orchestration/prefect_utils.py
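The "port already in use" test that pre_check_server relies on can be illustrated with the standard library alone. This is a minimal sketch, not the code in prefect_utils.py: the helper name `port_in_use`, the host, and the timeout are assumptions made for illustration.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: the real module may detect a running server differently.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 4200 is the Prefect server port named in the text above.
    if port_in_use(4200):
        print("Port 4200 is busy; assuming the Prefect server is already running.")
    else:
        print("Port 4200 is free; the server would need to be started.")
```

A busy port only shows that some process is listening, which is why the text above says the server is assumed, not verified, to be running.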
pre_check_workpool
Check for a Prefect work pool and create it if it does not exist.
Attempts to create a process-type work pool named 'my-work-pool'.
Notes
Work pools are used for distributed task execution in Prefect. See: https://docs.prefect.io/3.0/get-started/quickstart#create-a-work-pool
Source code in src/orchestration/prefect_utils.py
post_flow_prefect_housekeeping
Perform cleanup tasks after a Prefect flow completes.
Placeholder for post-flow housekeeping such as stopping servers or cleaning up resources.
| PARAMETER | DESCRIPTION |
|---|---|
| prefect_cfg | Prefect configuration dictionary. |
Notes
Currently a stub - implementation pending.
Source code in src/orchestration/prefect_utils.py
hyperparameter_sweep_utils
flatten_the_nested_dicts
Flatten nested configuration dictionaries to a single level.
Extracts configurations from the first model's nested structure.
| PARAMETER | DESCRIPTION |
|---|---|
| cfgs | Nested dictionary with model names as keys. |
| delimiter | Delimiter for flattened keys (currently unused). Default is '_'. |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Flattened configuration dictionary. |
Source code in src/orchestration/hyperparameter_sweep_utils.py
drop_other_models
Drop all models from the config except the one specified.
Creates a copy of the configuration with only the specified model, ensuring exactly one model per configuration for hyperparameter sweeps.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg_model | Configuration containing multiple models. |
| model | Name of the model to keep. |
| task | Task type determining config key: 'outlier_detection', 'imputation', or 'classification'. |

| RETURNS | DESCRIPTION |
|---|---|
| DictConfig | Configuration with only the specified model. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If task is not recognized. |
| AssertionError | If resulting config doesn't have exactly one model. |
Source code in src/orchestration/hyperparameter_sweep_utils.py
pick_cfg_key
Get the configuration key for models based on task type.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Configuration dictionary (currently unused). |
| task | Task type: 'outlier_detection', 'imputation', or 'classification'. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Configuration key: 'OUTLIER_MODELS', 'MODELS', or 'CLS_MODELS'. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If task is not recognized. |
Source code in src/orchestration/hyperparameter_sweep_utils.py
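The task-to-key lookup and the "keep exactly one model" step can be sketched with plain dictionaries. This is an illustration under assumptions, not the module's implementation: the pairing of tasks to keys is read off the order in which both are listed above, the real functions operate on DictConfig objects, and the names `pick_key` and `keep_one_model` are hypothetical.

```python
from copy import deepcopy

# Assumed pairing, inferred from the order of the task names and keys above.
TASK_TO_CFG_KEY = {
    "outlier_detection": "OUTLIER_MODELS",
    "imputation": "MODELS",
    "classification": "CLS_MODELS",
}


def pick_key(task: str) -> str:
    """Map a task name to its model config key, or raise ValueError."""
    try:
        return TASK_TO_CFG_KEY[task]
    except KeyError:
        raise ValueError(f"Unrecognized task: {task!r}") from None


def keep_one_model(cfg: dict, model: str, task: str) -> dict:
    """Return a copy of cfg whose model section contains only `model`."""
    key = pick_key(task)
    out = deepcopy(cfg)  # leave the caller's config untouched
    out[key] = {model: out[key][model]}
    assert len(out[key]) == 1, "expected exactly one model per config"
    return out


cfg = {"MODELS": {"SAITS": {"d_ffn": 128}, "CSDI": {"layers": 4}}}
single = keep_one_model(cfg, "SAITS", "imputation")
print(list(single["MODELS"]))
```

Working on a copy keeps the original multi-model configuration available for the next model in a sweep.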
define_hyperparameter_search
Define hyperparameter search configurations for all models.
Creates configuration variants for each model based on its defined search space (LIST or GRID method).
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Main configuration with model definitions. |
| task | Task type for naming conventions. |
| cfg_key | Key to access model configurations (e.g., 'MODELS'). |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Dictionary mapping configuration names to their configs. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If search method is unknown or parameters are missing. |
Source code in src/orchestration/hyperparameter_sweep_utils.py
define_hyperparam_group
Define a group of hyperparameter configurations for a task.
Main entry point for hyperparameter configuration generation. Either creates multiple configs for hyperparameter search or a single config for default parameters.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Main configuration with EXPERIMENT.hyperparam_search flag. |
| task | Task type: 'outlier_detection', 'imputation', or 'classification'. |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Dictionary mapping run names to their configurations. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If multiple models are defined without hyperparameter search enabled, or if task is not recognized. |
| NotImplementedError | If classification naming is requested (not yet implemented). |
Source code in src/orchestration/hyperparameter_sweep_utils.py
hyperparamer_list_utils
clean_param
Convert snake_case parameter name to camelCase.
| PARAMETER | DESCRIPTION |
|---|---|
| param | Parameter name in snake_case (e.g., 'd_ffn'). |

| RETURNS | DESCRIPTION |
|---|---|
| str | Parameter name in camelCase (e.g., 'dFfn'). |
Source code in src/orchestration/hyperparamer_list_utils.py
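The snake_case to camelCase conversion can be illustrated in a few lines. This is a sketch of the transformation as documented, not the body of clean_param; the name `snake_to_camel` is hypothetical.

```python
def snake_to_camel(param: str) -> str:
    """Convert 'd_ffn' style names to 'dFfn' style names."""
    head, *rest = param.split("_")
    # Keep the first chunk as written and capitalize each following chunk.
    return head + "".join(part.capitalize() for part in rest)


print(snake_to_camel("d_ffn"))     # dFfn
print(snake_to_camel("n_layers"))  # nLayers
```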
create_hyperparam_name
create_hyperparam_name(
param: str,
value_from_list,
i: int = 0,
j: int = 0,
n_params: int = 1,
value_key_delimiter: str = "",
param_delimiter: str = "_",
)
Create a standardized name string for a hyperparameter value.
| PARAMETER | DESCRIPTION |
|---|---|
| param | Parameter name. |
| value_from_list | Value of the parameter. |
| i | Index in the value list (unused). Default is 0. |
| j | Current parameter index. Default is 0. |
| n_params | Total number of parameters. Default is 1. |
| value_key_delimiter | Delimiter between param name and value. Default is ''. |
| param_delimiter | Delimiter between parameters. Default is '_'. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Formatted parameter key string (e.g., 'dFfn128_'). |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If parameter cleaning fails. |
Source code in src/orchestration/hyperparamer_list_utils.py
create_name_from_model_params
Create a descriptive name from model name and parameters.
| PARAMETER | DESCRIPTION |
|---|---|
| model_name | Base model name (e.g., 'SAITS'). |
| param_cfg | Model parameter configuration. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Combined name with model and parameters (e.g., 'SAITS_dFfn128_nLayers2'). |
Source code in src/orchestration/hyperparamer_list_utils.py
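The naming scheme in the example above ('SAITS_dFfn128_nLayers2') can be reproduced with a short sketch. It assumes a flat parameter dictionary and the default delimiters listed for create_hyperparam_name; the function names `to_camel` and `build_run_name` are hypothetical, and the real function takes a model parameter configuration whose layout is not shown here.

```python
def to_camel(param: str) -> str:
    """'n_layers' -> 'nLayers' (same idea as clean_param)."""
    head, *rest = param.split("_")
    return head + "".join(part.capitalize() for part in rest)


def build_run_name(
    model_name: str,
    params: dict,
    value_key_delimiter: str = "",
    param_delimiter: str = "_",
) -> str:
    """Join the model name with one 'nameValue' chunk per parameter."""
    chunks = [
        f"{to_camel(name)}{value_key_delimiter}{value}"
        for name, value in params.items()
    ]
    return param_delimiter.join([model_name, *chunks])


print(build_run_name("SAITS", {"d_ffn": 128, "n_layers": 2}))
# SAITS_dFfn128_nLayers2
```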
define_list_hyperparam_combos
define_list_hyperparam_combos(
cfg_model: DictConfig,
param_dict: DictConfig,
model: str,
task: str,
cfg_key: str,
) -> dict
Generate configurations from parallel parameter lists.
Creates one configuration per index across all parameter lists, where all lists must have the same length.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg_model | Base model configuration. |
| param_dict | Dictionary mapping parameter names to value lists. |
| model | Model name. |
| task | Task type for naming. |
| cfg_key | Configuration key for model access. |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Dictionary mapping configuration names to configs. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If a parameter is not found in the model config or the lists have different lengths. |
Source code in src/orchestration/hyperparamer_list_utils.py
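The LIST method pairs values by position: index 0 of every list forms the first configuration, index 1 the second, and so on. A minimal sketch with plain dictionaries follows; the name `list_combos` is hypothetical, and the real function additionally merges each combination into a copy of the model config and names it.

```python
def list_combos(param_lists: dict) -> list:
    """Build one parameter dict per index across equally long lists."""
    lengths = {len(values) for values in param_lists.values()}
    if len(lengths) > 1:
        raise ValueError(f"Parameter lists differ in length: {sorted(lengths)}")
    n_configs = lengths.pop() if lengths else 0
    return [
        {name: values[i] for name, values in param_lists.items()}
        for i in range(n_configs)
    ]


combos = list_combos({"d_ffn": [64, 128], "n_layers": [1, 2]})
print(combos)
# [{'d_ffn': 64, 'n_layers': 1}, {'d_ffn': 128, 'n_layers': 2}]
```

Two lists of length two yield two configurations here, whereas a grid over the same lists would yield four.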
define_grid_hyperparam_combos
define_grid_hyperparam_combos(
cfg_model: DictConfig,
param_dict: DictConfig,
model: str,
task: str,
cfg_key: str,
) -> dict
Generate configurations for all combinations in a grid search.
Creates the Cartesian product of all parameter value lists.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg_model | Base model configuration. |
| param_dict | Dictionary mapping parameter names to value lists. |
| model | Model name. |
| task | Task type: 'outlier_detection' or 'imputation'. |
| cfg_key | Configuration key for model access. |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Dictionary mapping configuration names to configs. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If task type is not recognized for naming. |
Source code in src/orchestration/hyperparamer_list_utils.py
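The Cartesian product behind the GRID method can be sketched with `itertools.product`. This is an illustration with plain dictionaries, not the module's code; the name `grid_combos` is hypothetical, and the step that writes each combination into the model config is omitted.

```python
from itertools import product


def grid_combos(param_lists: dict) -> list:
    """Return one parameter dict per element of the Cartesian product."""
    names = list(param_lists)
    value_lists = [param_lists[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


combos = grid_combos({"d_ffn": [64, 128], "n_layers": [1, 2]})
print(len(combos))  # 4
print(combos[0])    # {'d_ffn': 64, 'n_layers': 1}
```

The number of configurations is the product of the list lengths, so grids grow quickly as parameters are added.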
debug_utils
debug_classification_macro
Reduce bootstrap iterations for faster debugging.
Modifies the configuration to use only 50 bootstrap iterations instead of the default (typically 1000).
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Configuration dictionary to modify. |

| RETURNS | DESCRIPTION |
|---|---|
| DictConfig | Modified configuration with reduced bootstrap iterations. |
Source code in src/orchestration/debug_utils.py
pick_one_model
Keep only one model in configuration for testing.
Reduces the model dictionary to contain only the specified model.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Configuration with MODELS dictionary. |
| model_name | Name of the model to keep. Default is 'SAITS'. |

| RETURNS | DESCRIPTION |
|---|---|
| DictConfig | Modified configuration with single model. |
Source code in src/orchestration/debug_utils.py
debug_train_only_for_one_epoch
Reduce all epoch counts to 1 for quick debugging.
Recursively searches configuration for epoch-related keys and sets them to 1 for fast iteration during development.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Configuration dictionary to modify. |

| RETURNS | DESCRIPTION |
|---|---|
| DictConfig | Modified configuration with all epoch counts set to 1. |
Notes
Modifies keys: 'epochs', 'max_epoch', 'train_epochs'.
Source code in src/orchestration/debug_utils.py
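The recursive search for epoch keys can be sketched on a plain nested dictionary. This is an illustration under assumptions: the real function works on a DictConfig and may modify it in place, while this sketch returns a modified copy; the name `set_epochs_to_one` is hypothetical, and the key names are the three listed in the note above.

```python
from copy import deepcopy

EPOCH_KEYS = {"epochs", "max_epoch", "train_epochs"}


def set_epochs_to_one(cfg: dict) -> dict:
    """Return a copy of cfg with every epoch-related key set to 1."""
    out = deepcopy(cfg)

    def _walk(node) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                if key in EPOCH_KEYS:
                    node[key] = 1  # overwrite the value, keys stay unchanged
                else:
                    _walk(value)
        elif isinstance(node, list):
            for item in node:
                _walk(item)

    _walk(out)
    return out


cfg = {"MODELS": {"SAITS": {"epochs": 200, "optimizer": {"lr": 1e-3}}},
       "CLS": {"train_epochs": 50}}
print(set_epochs_to_one(cfg))
```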
fix_tree_learners_for_debug
Reduce tree-based model iterations for debugging.
Specifically handles MissForest by reducing max_iter to 2.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Configuration dictionary. |
| model_name | Name of the model to check. |

| RETURNS | DESCRIPTION |
|---|---|
| DictConfig | Modified configuration if MissForest, otherwise unchanged. |
Source code in src/orchestration/debug_utils.py
tabm_hyperparams
get_metric_of_runs
Extract a specific evaluation metric from multiple runs.
| PARAMETER | DESCRIPTION |
|---|---|
| metrics | Dictionary mapping run indices to their metric results. |
| eval_metric | Name of the evaluation metric to extract (case-insensitive). |
| split | Data split to get metrics from. Default is 'test'. |

| RETURNS | DESCRIPTION |
|---|---|
| ndarray | Array of metric values, one per run. |
Source code in src/orchestration/tabm_hyperparams.py
pick_the_best_hyperparam_metrics
Select the best hyperparameter configuration based on evaluation metric.
| PARAMETER | DESCRIPTION |
|---|---|
| metrics | Dictionary of metrics from each hyperparameter configuration run. |
| hparam_cfg | Hyperparameter search configuration with SEARCH_SPACE.GRID. |
| model_cfgs | List of model configurations corresponding to each run. |
| cfg | Main configuration dictionary. |

| RETURNS | DESCRIPTION |
|---|---|
| tuple | A tuple (best_metrics, best_choice). best_metrics (dict): metrics from the best-performing configuration. best_choice (dict): dictionary with 'choice', 'choice_keys', and 'model_cfg' for the selected configuration. |
Source code in src/orchestration/tabm_hyperparams.py
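Selecting the best run comes down to extracting one metric per run and taking the position of the best value. The sketch below makes several assumptions that the text above does not confirm: metrics are laid out as `metrics[run_index][split][metric_name]`, a higher value is better, and plain lists stand in for the ndarray that get_metric_of_runs returns. The names `metric_per_run` and `pick_best` are hypothetical.

```python
def metric_per_run(metrics: dict, eval_metric: str, split: str = "test") -> list:
    """Collect one value per run, matching the metric name case-insensitively."""
    wanted = eval_metric.lower()
    values = []
    for run_idx in sorted(metrics):
        by_name = {name.lower(): value for name, value in metrics[run_idx][split].items()}
        values.append(by_name[wanted])
    return values


def pick_best(metrics: dict, choices: list, eval_metric: str, split: str = "test"):
    """Return (metrics of the best run, grid choice of the best run)."""
    values = metric_per_run(metrics, eval_metric, split)
    # Assumption: higher is better; use min() instead for loss-like metrics.
    best_pos = max(range(len(values)), key=values.__getitem__)
    best_run = sorted(metrics)[best_pos]
    return metrics[best_run], choices[best_pos]


metrics = {0: {"test": {"AUROC": 0.81}},
           1: {"test": {"AUROC": 0.86}},
           2: {"test": {"AUROC": 0.79}}}
choices = [(64,), (128,), (256,)]
best_metrics, best_choice = pick_best(metrics, choices, "auroc")
print(best_choice)  # (128,)
```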
get_grid_choices
Generate all combinations from a grid search space.
| PARAMETER | DESCRIPTION |
|---|---|
| hparam_cfg | Configuration with SEARCH_SPACE.GRID containing parameter lists. |

| RETURNS | DESCRIPTION |
|---|---|
| tuple | A tuple (choice_keys, choices). choice_keys (list): list of hyperparameter names. choices (list): list of tuples, each representing one parameter combination. |
Source code in src/orchestration/tabm_hyperparams.py
create_tabm_grid_experiment
Create configurations for TabM grid search experiment.
Generates a list of model configurations, one for each combination of hyperparameters in the grid search space.
| PARAMETER | DESCRIPTION |
|---|---|
| hparam_cfg | Hyperparameter configuration with SEARCH_SPACE.GRID. |
| cls_model_cfg | Base classifier model configuration to modify. |

| RETURNS | DESCRIPTION |
|---|---|
| list | List of configuration dictionaries, one per grid combination. |
Source code in src/orchestration/tabm_hyperparams.py
create_tabm_hyperparam_experiment
Create hyperparameter experiment configurations for TabM.
Determines whether to run grid search based on configuration and run name, and generates appropriate configurations.
| PARAMETER | DESCRIPTION |
|---|---|
| run_name | Name of the current run, used to determine whether it is a ground-truth run. |
| hparam_cfg | Hyperparameter configuration with search settings. |
| cls_model_cfg | Base classifier configuration. |

| RETURNS | DESCRIPTION |
|---|---|
| list | List of configurations to run. Returns a single-element list with the original config if grid search is disabled or not applicable. |

| RAISES | DESCRIPTION |
|---|---|
| NotImplementedError | If LIST search space is specified (not yet implemented). |
| ValueError | If search space type is unknown. |