ensemble¶
Ensemble methods for combining multiple models.
Overview¶
Ensemble approaches for:
- Outlier detection voting
- Imputation averaging
- Classification ensembles
ensemble_anomaly_detection¶
Anomaly detection ensemble module.
Provides functionality to combine outlier detection masks from multiple models using majority voting and to compute ensemble metrics across difficulty levels.
Cross-references:
- src/anomaly_detection/outlier_sklearn.py for metric computation
- src/ensemble/ensemble_utils.py for run retrieval
get_anomaly_results_per_model¶
get_anomaly_results_per_model(
    model_name: str,
    outlier_artifacts: dict,
    pred_masks: dict,
    labels: dict,
    idx: int,
)
Extract prediction masks and labels from model artifacts.
Handles the different artifact structures of the various model types (TimesNet, UniTS, MOMENT, sklearn-based).

| PARAMETER | DESCRIPTION |
|---|---|
| `model_name` | Name of the model architecture. TYPE: `str` |
| `outlier_artifacts` | Loaded artifacts from MLflow. TYPE: `dict` |
| `pred_masks` | Accumulated prediction masks (modified in place). TYPE: `dict` |
| `labels` | Accumulated ground truth labels (modified in place). TYPE: `dict` |
| `idx` | Index of the current model (used for logging). TYPE: `int` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Updated prediction masks. |
| `dict` | Updated labels. |
Source code in src/ensemble/ensemble_anomaly_detection.py
write_granular_outlier_metrics¶
Log granular outlier metrics to MLflow.
Writes metrics at different difficulty levels (easy, medium, hard) for both train and test splits.

| PARAMETER | DESCRIPTION |
|---|---|
| `metrics` | Dictionary of metrics per split and granularity level. TYPE: `dict` |
Source code in src/ensemble/ensemble_anomaly_detection.py
get_granular_outlier_metrics¶
Compute outlier metrics at different difficulty levels.

| PARAMETER | DESCRIPTION |
|---|---|
| `data_dict` | Source data containing difficulty level information. TYPE: `dict` |
| `pred_masks` | Prediction masks per split. TYPE: `dict` |
| `idx` | Index of the model to evaluate. TYPE: `int` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Metrics per split and difficulty level. |
Source code in src/ensemble/ensemble_anomaly_detection.py
compute_granular_metrics¶
compute_granular_metrics(
    run_id,
    mlflow_run: Series,
    model_name: str,
    pred_masks: dict,
    idx: int,
    sources: dict,
    debug_verbose: bool = True,
)
Compute and log granular outlier metrics if they have not already been computed.
Checks whether granular metrics exist in MLflow, computes them if not, and logs them to the original run.

| PARAMETER | DESCRIPTION |
|---|---|
| `run_id` | MLflow run ID. |
| `mlflow_run` | MLflow run data. TYPE: `Series` |
| `model_name` | Name of the model. TYPE: `str` |
| `pred_masks` | Prediction masks per split. TYPE: `dict` |
| `idx` | Index of the current model. TYPE: `int` |
| `sources` | Source data. TYPE: `dict` |
| `debug_verbose` | If True, log debug information. TYPE: `bool`, DEFAULT: `True` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Computed metrics. |
Source code in src/ensemble/ensemble_anomaly_detection.py
get_anomaly_masks_and_labels¶
Load and aggregate anomaly masks from all ensemble submodels.

| PARAMETER | DESCRIPTION |
|---|---|
| `ensembled_output` | Dictionary mapping model names to MLflow run data. TYPE: `dict` |
| `sources` | Source data. TYPE: `dict` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Stacked prediction masks (n_models x n_subjects x n_timepoints). |
| `dict` | Ground truth labels (taken from the last model; they should be identical across models). |
Source code in src/ensemble/ensemble_anomaly_detection.py
ensemble_masks¶
Combine prediction masks by averaging and thresholding.

| PARAMETER | DESCRIPTION |
|---|---|
| `preds_3d` | Stacked prediction masks (n_models x n_subjects x n_timepoints). TYPE: `ndarray` |
| `method` | Thresholding method: 'over_0.5' applies majority voting (>50% agreement); 'over_0' accepts any positive vote. TYPE: `str` |

| RETURNS | DESCRIPTION |
|---|---|
| `ndarray` | Ensembled binary mask (n_subjects x n_timepoints). |
Source code in src/ensemble/ensemble_anomaly_detection.py
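The two thresholding modes can be illustrated with a minimal NumPy sketch (an assumed re-implementation, not the project's source):

```python
import numpy as np

def ensemble_masks_sketch(preds_3d: np.ndarray, method: str = "over_0.5") -> np.ndarray:
    """Average stacked binary masks (n_models x n_subjects x n_timepoints),
    then threshold the per-timepoint vote fraction."""
    votes = preds_3d.mean(axis=0)  # fraction of models flagging each timepoint
    if method == "over_0.5":
        return (votes > 0.5).astype(np.uint8)  # strict majority
    if method == "over_0":
        return (votes > 0).astype(np.uint8)    # any single positive vote
    raise ValueError(f"unknown method: {method}")

# Three models, one subject, three timepoints:
preds = np.array([[[1, 0, 1]],
                  [[1, 0, 0]],
                  [[0, 0, 0]]])
majority = ensemble_masks_sketch(preds, "over_0.5")  # [[1 0 0]]
any_vote = ensemble_masks_sketch(preds, "over_0")    # [[1 0 1]]
```

Note that the last timepoint is flagged by only one of three models, so it survives 'over_0' but not the majority vote.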
compute_ensemble_anomaly_metrics¶
Compute anomaly detection metrics for ensemble predictions.
Applies majority voting to combine masks, then evaluates at all difficulty levels.

| PARAMETER | DESCRIPTION |
|---|---|
| `pred_masks` | Stacked 3D prediction masks per split. TYPE: `dict` |
| `labels` | Ground truth labels per split. TYPE: `dict` |
| `sources` | Source data with difficulty level information. TYPE: `dict` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Metrics per split and difficulty level. |

See Also
src.anomaly_detection.anomaly_detection_metrics_wrapper.metrics_per_split
Source code in src/ensemble/ensemble_anomaly_detection.py
ensemble_anomaly_detection¶
ensemble_anomaly_detection(
    ensembled_output: dict,
    cfg: DictConfig,
    experiment_name: str,
    ensemble_name: str,
    sources: dict,
)
Create an anomaly detection ensemble from multiple models.
Main entry point for anomaly detection ensembling. Loads masks from the submodels, combines them via majority voting, and computes metrics.

| PARAMETER | DESCRIPTION |
|---|---|
| `ensembled_output` | Dictionary mapping model names to MLflow run data. TYPE: `dict` |
| `cfg` | Main Hydra configuration. TYPE: `DictConfig` |
| `experiment_name` | MLflow experiment name. TYPE: `str` |
| `ensemble_name` | Name for the ensemble. TYPE: `str` |
| `sources` | Source data. TYPE: `dict` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Ensemble metrics per split and difficulty level. |
| `dict` | Stacked prediction masks from all submodels. |
Source code in src/ensemble/ensemble_anomaly_detection.py
ensemble_classification¶
Classification ensemble module.
Provides functionality to aggregate predictions from multiple classification models and compute ensemble metrics with bootstrap confidence intervals.
Cross-references:
- src/ensemble/ensemble_utils.py for run retrieval
- src/classification/bootstrap_evaluation.py for metric computation
import_model_metrics¶
import_model_metrics(
    run_id: str,
    run_name: str,
    model_name: str,
    subdir: str = "baseline_model",
) -> dict[str, Any]
Load model metrics from MLflow artifact storage.

| PARAMETER | DESCRIPTION |
|---|---|
| `run_id` | MLflow run ID. TYPE: `str` |
| `run_name` | MLflow run name. TYPE: `str` |
| `model_name` | Name of the model (used for artifact path construction). TYPE: `str` |
| `subdir` | Subdirectory within the artifacts to load from. TYPE: `str`, DEFAULT: `"baseline_model"` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Dictionary containing model artifacts and metrics. |
Source code in src/ensemble/ensemble_classification.py
get_preds_and_labels_from_artifacts¶
get_preds_and_labels_from_artifacts(
    artifacts: dict[str, Any],
) -> tuple[
    dict[str, ndarray],
    dict[str, ndarray],
    dict[str, ndarray],
]
Extract predictions and labels from model artifacts.
Retrieves the mean predictions and variances from the subjectwise statistics, which are averaged across bootstrap iterations.

| PARAMETER | DESCRIPTION |
|---|---|
| `artifacts` | Dictionary containing model artifacts with 'subjectwise_stats'. TYPE: `dict[str, Any]` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Mean predicted probabilities per split. |
| `dict` | Variance of predicted probabilities per split. |
| `dict` | Ground truth labels per split. |
Source code in src/ensemble/ensemble_classification.py
import_model_preds_and_labels¶
import_model_preds_and_labels(
    run_id: str,
    run_name: str,
    model_name: str,
    subdir: str = "metrics",
) -> tuple[
    dict[str, ndarray],
    dict[str, ndarray],
    dict[str, ndarray],
]
Load predictions and labels from an MLflow run.

| PARAMETER | DESCRIPTION |
|---|---|
| `run_id` | MLflow run ID. TYPE: `str` |
| `run_name` | MLflow run name. TYPE: `str` |
| `model_name` | Name of the model. TYPE: `str` |
| `subdir` | Subdirectory within the artifacts. TYPE: `str`, DEFAULT: `"metrics"` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Mean predicted probabilities per split. |
| `dict` | Variance of predicted probabilities per split. |
| `dict` | Ground truth labels per split. |
Source code in src/ensemble/ensemble_classification.py
import_metrics_iter¶
import_metrics_iter(
    run_id: str,
    run_name: str,
    model_name: str,
    subdir: str = "metrics",
) -> dict[str, Any]
Load per-iteration metrics from an MLflow run.

| PARAMETER | DESCRIPTION |
|---|---|
| `run_id` | MLflow run ID. TYPE: `str` |
| `run_name` | MLflow run name. TYPE: `str` |
| `model_name` | Name of the model. TYPE: `str` |
| `subdir` | Subdirectory within the artifacts. TYPE: `str`, DEFAULT: `"metrics"` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Dictionary of metrics per bootstrap iteration per split. |
Source code in src/ensemble/ensemble_classification.py
concentate_one_var¶
concentate_one_var(
    array_out: dict[str, ndarray] | None,
    array_per_submodel: dict[str, ndarray],
) -> dict[str, ndarray]
Concatenate arrays from a submodel into the accumulated output.
Stacks submodel arrays along a new first axis.

| PARAMETER | DESCRIPTION |
|---|---|
| `array_out` | Accumulated arrays (None for the first submodel). TYPE: `dict[str, ndarray]` or `None` |
| `array_per_submodel` | Arrays from the current submodel, keyed by split. TYPE: `dict[str, ndarray]` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Updated accumulated arrays. |
Source code in src/ensemble/ensemble_classification.py
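The stacking behavior can be sketched as follows (a hypothetical re-implementation under the documented contract: the accumulator starts as None, and each submodel adds a slice along a new leading "model" axis):

```python
import numpy as np

def concatenate_one_var_sketch(array_out, array_per_submodel):
    """Stack per-split arrays from a submodel onto the accumulator."""
    if array_out is None:
        # First submodel: give every split array a new leading axis.
        return {split: arr[np.newaxis, ...] for split, arr in array_per_submodel.items()}
    # Later submodels: append along the existing model axis.
    return {
        split: np.concatenate([array_out[split], arr[np.newaxis, ...]], axis=0)
        for split, arr in array_per_submodel.items()
    }

acc = None
for _ in range(3):  # three submodels, each with a length-4 "train" array
    acc = concatenate_one_var_sketch(acc, {"train": np.zeros(4)})
print(acc["train"].shape)  # (3, 4)
```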
concatenate_arrays¶
concatenate_arrays(
    preds_out: dict[str, ndarray] | None,
    preds_var_out: dict[str, ndarray] | None,
    _labels_out: dict[str, ndarray] | None,
    y_pred_proba: dict[str, ndarray],
    y_pred_proba_var: dict[str, ndarray],
    label: dict[str, ndarray],
) -> tuple[
    dict[str, ndarray],
    dict[str, ndarray],
    dict[str, ndarray],
]
Concatenate prediction arrays from multiple submodels.

| PARAMETER | DESCRIPTION |
|---|---|
| `preds_out` | Accumulated predictions. TYPE: `dict[str, ndarray]` or `None` |
| `preds_var_out` | Accumulated prediction variances. TYPE: `dict[str, ndarray]` or `None` |
| `_labels_out` | Accumulated labels (not modified, since labels are the same across models). TYPE: `dict[str, ndarray]` or `None` |
| `y_pred_proba` | Predictions from the current submodel. TYPE: `dict[str, ndarray]` |
| `y_pred_proba_var` | Prediction variances from the current submodel. TYPE: `dict[str, ndarray]` |
| `label` | Labels from the current submodel. TYPE: `dict[str, ndarray]` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Updated predictions. |
| `dict` | Updated variances. |
| `dict` | Labels (unchanged). |
Source code in src/ensemble/ensemble_classification.py
check_dicts¶
check_dicts(
    preds_out: dict[str, ndarray],
    preds_var_out: dict[str, ndarray],
    _labels_out: dict[str, ndarray] | None,
    no_submodel_runs: int,
) -> None
Verify that the concatenated arrays have the expected dimensions.

| PARAMETER | DESCRIPTION |
|---|---|
| `preds_out` | Accumulated predictions. TYPE: `dict[str, ndarray]` |
| `preds_var_out` | Accumulated variances. TYPE: `dict[str, ndarray]` |
| `_labels_out` | Labels (unused, kept for API compatibility). TYPE: `dict[str, ndarray]` or `None` |
| `no_submodel_runs` | Expected number of submodels. TYPE: `int` |

| RAISES | DESCRIPTION |
|---|---|
| `AssertionError` | If the array dimensions don't match the expected submodel count. |
Source code in src/ensemble/ensemble_classification.py
compute_stats¶
compute_stats(
    preds_out: dict[str, ndarray],
    preds_var_out: dict[str, ndarray],
) -> tuple[
    dict[str, ndarray],
    dict[str, ndarray],
    dict[str, ndarray],
]
Compute ensemble statistics from stacked predictions.

| PARAMETER | DESCRIPTION |
|---|---|
| `preds_out` | Stacked predictions (n_models x n_subjects). TYPE: `dict[str, ndarray]` |
| `preds_var_out` | Stacked variances (n_models x n_subjects). TYPE: `dict[str, ndarray]` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Mean predictions across models. |
| `dict` | Standard deviation of predictions across models. |
| `dict` | Mean of within-model standard deviations. |
Source code in src/ensemble/ensemble_classification.py
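The three returned statistics can be sketched with NumPy (an illustrative reading of the documented contract, with assumed variable names; in particular, "mean of within-model standard deviations" is taken here as the mean of the square roots of the stacked variances):

```python
import numpy as np

# Predictions stacked as (n_models, n_subjects): 3 models, 2 subjects.
preds_out = {"test": np.array([[0.2, 0.8],
                               [0.4, 0.6],
                               [0.3, 0.7]])}
preds_var_out = {"test": np.array([[0.04, 0.01],
                                   [0.16, 0.09],
                                   [0.04, 0.04]])}

mean_preds = {s: a.mean(axis=0) for s, a in preds_out.items()}   # ensemble mean per subject
std_preds = {s: a.std(axis=0) for s, a in preds_out.items()}     # between-model spread
mean_within_std = {s: np.sqrt(a).mean(axis=0)                    # avg within-model std
                   for s, a in preds_var_out.items()}

print(mean_preds["test"])  # [0.3 0.7]
```

The between-model spread captures disagreement across architectures, while the within-model term carries the bootstrap uncertainty of each submodel.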
aggregate_pred_dict¶
aggregate_pred_dict(
    preds_out: dict[str, dict[str, list[Any]]],
    preds_per_submodel: dict[str, dict[str, list[Any]]],
    ensemble: bool = False,
) -> dict[str, dict[str, list[Any]]]
Aggregate prediction dictionaries from submodels.
Extends the lists of predictions for each subject code.

| PARAMETER | DESCRIPTION |
|---|---|
| `preds_out` | Accumulated predictions keyed by variable and subject code. TYPE: `dict[str, dict[str, list[Any]]]` |
| `preds_per_submodel` | Predictions from the current submodel. TYPE: `dict[str, dict[str, list[Any]]]` |
| `ensemble` | If True, perform additional consistency checks. TYPE: `bool`, DEFAULT: `False` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Updated accumulated predictions. |
Source code in src/ensemble/ensemble_classification.py
aggregate_preds¶
aggregate_preds(
    preds_out: dict[str, ndarray],
    preds_per_submodel: dict[str, ndarray],
) -> dict[str, ndarray]
Concatenate prediction arrays along the iteration axis.

| PARAMETER | DESCRIPTION |
|---|---|
| `preds_out` | Accumulated predictions (n_subjects x n_accumulated_iters). TYPE: `dict[str, ndarray]` |
| `preds_per_submodel` | Predictions from the current submodel. TYPE: `dict[str, ndarray]` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Updated predictions with the new iterations concatenated. |
Source code in src/ensemble/ensemble_classification.py
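A minimal sketch of this concatenation (assumed shapes; note the axis differs from the stacking in concentate_one_var — here new bootstrap iterations extend axis 1, keeping subjects fixed):

```python
import numpy as np

# Two submodels, each contributing 100 bootstrap iterations for 4 subjects.
a = {"test": np.zeros((4, 100))}  # accumulated iterations from submodel 1
b = {"test": np.ones((4, 100))}   # iterations from submodel 2

merged = {split: np.concatenate([a[split], b[split]], axis=1) for split in a}
print(merged["test"].shape)  # (4, 200)
```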
check_metrics_iter_preds_dict¶
Validate that a prediction dictionary has consistent dimensions.
Checks that y_pred_proba, y_pred, and the labels have the same length.

| PARAMETER | DESCRIPTION |
|---|---|
| `dict_arrays` | Dictionary with 'y_pred_proba', 'y_pred', and 'label'/'labels' keys. TYPE: `dict` |

| RAISES | DESCRIPTION |
|---|---|
| `AssertionError` | If the lengths don't match. |
Source code in src/ensemble/ensemble_classification.py
check_metrics_iter_preds¶
Validate that prediction arrays have consistent shapes.
Checks that y_pred_proba, y_pred, and the labels have the same first dimension.

| PARAMETER | DESCRIPTION |
|---|---|
| `dict_arrays` | Dictionary with numpy array values. TYPE: `dict` |

| RAISES | DESCRIPTION |
|---|---|
| `AssertionError` | If the shapes don't match. |
Source code in src/ensemble/ensemble_classification.py
check_metrics_iter_shapes¶
Dispatch shape checking based on the data structure type.

| PARAMETER | DESCRIPTION |
|---|---|
| `iter_split` | Split-level metrics iteration data. |
Source code in src/ensemble/ensemble_classification.py
check_subjects_in_splits¶
Extract and validate subject codes from metrics iteration data.

| PARAMETER | DESCRIPTION |
|---|---|
| `metrics_iter` | Metrics-per-iteration dictionary. TYPE: `dict` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` or `None` | Dictionary mapping splits to sorted subject code lists. |
Source code in src/ensemble/ensemble_classification.py
check_compare_subjects_for_aggregation¶
check_compare_subjects_for_aggregation(
    subject_codes: dict[str, list[str]] | None,
    subject_codes_model: dict[str, list[str]] | None,
    run_name: str,
    i: int,
    split_to_check: str = "train",
) -> list[str]
Compare subject codes between the ensemble and the current submodel.
Verifies that the current submodel used the same subjects as the previous models.

| PARAMETER | DESCRIPTION |
|---|---|
| `subject_codes` | Subject codes from the ensemble (accumulated). TYPE: `dict[str, list[str]]` or `None` |
| `subject_codes_model` | Subject codes from the current submodel. TYPE: `dict[str, list[str]]` or `None` |
| `run_name` | Name of the current run, for error reporting. TYPE: `str` |
| `i` | Index of the current submodel. TYPE: `int` |
| `split_to_check` | Which split to compare. TYPE: `str`, DEFAULT: `"train"` |

| RETURNS | DESCRIPTION |
|---|---|
| `list` | List of run names with mismatched codes (empty if they match). |
Source code in src/ensemble/ensemble_classification.py
aggregate_metric_iter¶
aggregate_metric_iter(
    metrics_iter: dict[str, Any] | None,
    metrics_iter_model: dict[str, Any],
    run_name: str,
    ensemble: bool = False,
) -> dict[str, Any]
Aggregate metrics iteration data from a submodel into the ensemble.
Combines predictions from multiple bootstrap models by extending the iteration arrays.

| PARAMETER | DESCRIPTION |
|---|---|
| `metrics_iter` | Accumulated metrics (None for the first submodel). TYPE: `dict[str, Any]` or `None` |
| `metrics_iter_model` | Metrics from the current submodel. TYPE: `dict[str, Any]` |
| `run_name` | Name of the current run, for logging. TYPE: `str` |
| `ensemble` | If True, perform additional consistency checks. TYPE: `bool`, DEFAULT: `False` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Updated accumulated metrics. |
Source code in src/ensemble/ensemble_classification.py
get_label_array¶
Convert a label dictionary to an array.

| PARAMETER | DESCRIPTION |
|---|---|
| `label_dict` | Dictionary mapping subject codes to label arrays. TYPE: `dict` |

| RETURNS | DESCRIPTION |
|---|---|
| `ndarray` | Array of labels in the same order as the dictionary keys. |
Source code in src/ensemble/ensemble_classification.py
get_preds_array¶
Convert a prediction dictionary to a 2D array.

| PARAMETER | DESCRIPTION |
|---|---|
| `preds_dict` | Dictionary mapping subject codes to prediction arrays. TYPE: `dict` |

| RETURNS | DESCRIPTION |
|---|---|
| `ndarray` | Array of shape (n_subjects, n_bootstrap_iters). |
Source code in src/ensemble/ensemble_classification.py
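The dict-to-array conversion can be sketched as follows (subject codes and values are illustrative):

```python
import numpy as np

# Each subject maps to its per-bootstrap-iteration predictions.
preds_dict = {
    "subj_a": np.array([0.1, 0.2, 0.3]),
    "subj_b": np.array([0.7, 0.8, 0.9]),
}

# Stack rows in dictionary-key order -> (n_subjects, n_bootstrap_iters).
preds_array = np.stack([preds_dict[code] for code in preds_dict], axis=0)
print(preds_array.shape)  # (2, 3)
```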
recompute_ensemble_metrics¶
recompute_ensemble_metrics(
    metrics_iter: dict[str, Any],
    sources: dict[str, Any],
    cfg: DictConfig,
) -> dict[str, Any]
Recompute metrics for the aggregated ensemble predictions.
Takes the combined predictions from all submodels and recomputes bootstrap metrics as if they came from a single model.

| PARAMETER | DESCRIPTION |
|---|---|
| `metrics_iter` | Aggregated predictions from all submodels. TYPE: `dict[str, Any]` |
| `sources` | Source data containing subject information. TYPE: `dict[str, Any]` |
| `cfg` | Main Hydra configuration. TYPE: `DictConfig` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Updated metrics_iter with recomputed metrics. |
Source code in src/ensemble/ensemble_classification.py
get_cls_preds_from_artifact¶
get_cls_preds_from_artifact(
    run: Series,
    i: int,
    no_submodel_runs: int,
    aggregate_preds: bool = False,
) -> dict[str, Any]
Load classification predictions from an MLflow run artifact.

| PARAMETER | DESCRIPTION |
|---|---|
| `run` | MLflow run data. TYPE: `Series` |
| `i` | Index of the current submodel (used for logging). TYPE: `int` |
| `no_submodel_runs` | Total number of submodels (used for logging). TYPE: `int` |
| `aggregate_preds` | If True, log aggregation progress. TYPE: `bool`, DEFAULT: `False` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Metrics-per-iteration dictionary from the run. |
Source code in src/ensemble/ensemble_classification.py
aggregate_submodels¶
aggregate_submodels(
    ensemble_model_runs: DataFrame,
    no_submodel_runs: int,
    aggregate_preds: bool = True,
    split_to_check: str = "train",
    ensemble_codes: DataFrame | None = None,
) -> tuple[dict[str, Any] | None, DataFrame, bool]
Aggregate predictions from multiple classification submodels.

| PARAMETER | DESCRIPTION |
|---|---|
| `ensemble_model_runs` | DataFrame of MLflow runs to aggregate. TYPE: `DataFrame` |
| `no_submodel_runs` | Number of submodels. TYPE: `int` |
| `aggregate_preds` | If True, actually aggregate the predictions; if False, only check codes. TYPE: `bool`, DEFAULT: `True` |
| `split_to_check` | Split to use for code consistency checking. TYPE: `str`, DEFAULT: `"train"` |
| `ensemble_codes` | Pre-computed ensemble codes for validation. TYPE: `DataFrame` or `None` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` or `None` | Aggregated metrics_iter (or None if aggregate_preds=False). |
| `DataFrame` | DataFrame of subject codes per submodel. |
| `bool` | Whether all submodels have the same subject codes. |
Source code in src/ensemble/ensemble_classification.py
get_classification_preds¶
get_classification_preds(
    ensemble_model_runs: DataFrame,
    sources: dict[str, Any],
    cfg: DictConfig,
) -> dict[str, Any] | None
Get aggregated classification predictions from submodels.
Coordinates the full aggregation process: code checking, prediction aggregation, and metric recomputation.

| PARAMETER | DESCRIPTION |
|---|---|
| `ensemble_model_runs` | DataFrame of MLflow runs to ensemble. TYPE: `DataFrame` |
| `sources` | Source data. TYPE: `dict[str, Any]` |
| `cfg` | Main Hydra configuration. TYPE: `DictConfig` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` or `None` | Aggregated metrics_iter with recomputed metrics, or None on failure. |
Source code in src/ensemble/ensemble_classification.py
create_pred_dict¶
Create a prediction dictionary from arrays.

| PARAMETER | DESCRIPTION |
|---|---|
| `split_preds` | Predicted probabilities. TYPE: `ndarray` |
| `y_true` | Ground truth labels. TYPE: `ndarray` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Dictionary with y_pred, y_pred_proba, and labels. |
Source code in src/ensemble/ensemble_classification.py
get_compact_dict_arrays¶
Extract compact arrays of subject codes and labels from the sources.
Used by the bootstrap metric computation functions.

| PARAMETER | DESCRIPTION |
|---|---|
| `sources` | Source data dictionary. TYPE: `dict` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Dictionary with keys like 'subject_codes_train', 'y_train', etc. |
Source code in src/ensemble/ensemble_classification.py
compute_cls_ensemble_metrics¶
compute_cls_ensemble_metrics(
    metrics_iter: dict[str, Any],
    sources: dict[str, Any],
    cfg: DictConfig,
) -> dict[str, Any]
Compute classification ensemble statistics.

| PARAMETER | DESCRIPTION |
|---|---|
| `metrics_iter` | Aggregated predictions from the ensemble. TYPE: `dict[str, Any]` |
| `sources` | Source data. TYPE: `dict[str, Any]` |
| `cfg` | Main Hydra configuration. TYPE: `DictConfig` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Dictionary containing metrics_iter, metrics_stats, subjectwise_stats, and subject_global_stats. |
Source code in src/ensemble/ensemble_classification.py
ensemble_classification¶
ensemble_classification(
    ensemble_model_runs: DataFrame,
    cfg: DictConfig,
    sources: dict[str, Any],
    ensemble_name: str,
) -> dict[str, Any] | None
Create a classification ensemble from multiple models.
Main entry point for classification ensembling. Aggregates predictions from the submodels and computes ensemble metrics.

| PARAMETER | DESCRIPTION |
|---|---|
| `ensemble_model_runs` | DataFrame of MLflow classification runs to ensemble. TYPE: `DataFrame` |
| `cfg` | Main Hydra configuration. TYPE: `DictConfig` |
| `sources` | Source data. TYPE: `dict[str, Any]` |
| `ensemble_name` | Name for the ensemble. TYPE: `str` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` or `None` | Ensemble metrics dictionary, or None if ensembling failed. |
Source code in src/ensemble/ensemble_classification.py
ensemble_imputation¶
Imputation ensemble module.
Provides functionality to combine imputation outputs from multiple models by averaging the reconstructions and computing ensemble statistics.
Cross-references:
- src/metrics/evaluate_imputation_metrics.py for metric computation
- src/ensemble/ensemble_utils.py for run retrieval
get_imputation_results_per_model¶
Extract imputation reconstructions from model artifacts.
Handles the different artifact structures and destandardizes values for proper metric computation.

| PARAMETER | DESCRIPTION |
|---|---|
| `model_name` | Name of the imputation model. TYPE: `str` |
| `outlier_artifacts` | Loaded artifacts from MLflow. TYPE: `dict` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Reconstructions per split (destandardized). |
| `dict` | True pupil values per split (destandardized). |
| `dict` | Imputation-indicating masks per split. |
Source code in src/ensemble/ensemble_imputation.py
get_imputation_preds_and_labels¶
get_imputation_preds_and_labels(
    ensemble_model_runs: DataFrame,
    gt_dict: dict,
    gt_preprocess: dict,
    cfg: DictConfig,
)
Load and stack imputation reconstructions from all submodels.

| PARAMETER | DESCRIPTION |
|---|---|
| `ensemble_model_runs` | DataFrame of MLflow imputation runs. TYPE: `DataFrame` |
| `gt_dict` | Ground truth data dictionary. TYPE: `dict` |
| `gt_preprocess` | Ground truth preprocessing parameters. TYPE: `dict` |
| `cfg` | Main Hydra configuration. TYPE: `DictConfig` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Stacked reconstructions (n_models x n_subjects x n_timepoints). |
| `dict` | True pupil values (from the last model). |
| `dict` | Imputation masks (from the last model). |
Source code in src/ensemble/ensemble_imputation.py
compute_ensemble_imputation_metrics¶
Compute imputation metrics for the ensemble (averaged) predictions.

| PARAMETER | DESCRIPTION |
|---|---|
| `recons` | Stacked reconstructions per split. TYPE: `dict` |
| `true_pupil` | Ground truth values per split. TYPE: `dict` |
| `labels` | Imputation masks per split. TYPE: `dict` |
| `cfg` | Main Hydra configuration. TYPE: `DictConfig` |
| `metadata_dict` | Metadata per split. TYPE: `dict` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Metrics per split. |
| `ndarray` | Ensemble predictions (mean of the reconstructions). |
Source code in src/ensemble/ensemble_imputation.py
get_imputation_stats_dict¶
Compute ensemble statistics from the stacked reconstructions.
Calculates the mean, standard deviation, and confidence intervals across submodels.

| PARAMETER | DESCRIPTION |
|---|---|
| `ensembled_recon` | Stacked reconstructions (n_models x n_subjects x n_timepoints). TYPE: `ndarray` |
| `labels` | Imputation-indicating mask. TYPE: `dict` |
| `p` | Percentile for the confidence interval bounds. |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Dictionary with imputation statistics ready for downstream use. |
Source code in src/ensemble/ensemble_imputation.py
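The statistics across the model axis can be sketched with NumPy (an illustrative reading with assumed key names; `p` is taken as the two-sided confidence level in percent):

```python
import numpy as np

# Stacked reconstructions: 5 submodels, 2 subjects, 10 timepoints.
rng = np.random.default_rng(0)
ensembled_recon = rng.normal(size=(5, 2, 10))
p = 95  # two-sided confidence level

stats = {
    "mean": ensembled_recon.mean(axis=0),                               # ensemble mean
    "std": ensembled_recon.std(axis=0),                                 # between-model spread
    "ci_lower": np.percentile(ensembled_recon, (100 - p) / 2, axis=0),  # 2.5th percentile
    "ci_upper": np.percentile(ensembled_recon, 100 - (100 - p) / 2, axis=0),  # 97.5th
}
print(stats["mean"].shape)  # (2, 10)
```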
add_imputation_dict¶
Create the imputation output dictionary for downstream processing.
Standardizes the reconstructions and computes ensemble statistics in the format expected by the featurization code.

| PARAMETER | DESCRIPTION |
|---|---|
| `recons` | Stacked reconstructions per split. TYPE: `dict` |
| `predictions` | Ensemble mean predictions. TYPE: `ndarray` |
| `labels` | Imputation masks per split. TYPE: `dict` |
| `stdz_dict` | Standardization parameters. TYPE: `dict` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Dictionary with a 'model_artifacts' key containing the imputation data. |

See Also
src.featurization.get_arrays_for_splits_from_imputer_artifacts
Source code in src/ensemble/ensemble_imputation.py
ensemble_imputation¶
ensemble_imputation(
    ensemble_model_runs: DataFrame,
    cfg: DictConfig,
    sources: dict,
    ensemble_name: str,
    recompute_metrics: bool = False,
)
Create an imputation ensemble from multiple models.
Main entry point for imputation ensembling. Loads reconstructions from the submodels, averages them, and computes metrics.

| PARAMETER | DESCRIPTION |
|---|---|
| `ensemble_model_runs` | DataFrame of MLflow imputation runs to ensemble. TYPE: `DataFrame` |
| `cfg` | Main Hydra configuration. TYPE: `DictConfig` |
| `sources` | Source data including the ground truth. TYPE: `dict` |
| `ensemble_name` | Name for the ensemble. TYPE: `str` |
| `recompute_metrics` | If True, only recompute the submodel metrics without creating an ensemble. TYPE: `bool`, DEFAULT: `False` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` or `None` | Ensemble output dictionary with metrics, reconstructions, and model artifacts, or None if recompute_metrics=True. |

See Also
ensemble_anomaly_detection.get_anomaly_masks_and_labels
Source code in src/ensemble/ensemble_imputation.py
ensemble_utils¶
get_unique_models_from_best_runs¶
Extract unique model architectures from MLflow run names.

| PARAMETER | DESCRIPTION |
|---|---|
| `best_runs` | DataFrame containing MLflow runs with a 'tags.mlflow.runName' column. TYPE: `DataFrame` |

| RETURNS | DESCRIPTION |
|---|---|
| `list` | List of unique model architecture names extracted from the run names. |
Source code in src/ensemble/ensemble_utils.py
get_best_run_of_the_model¶
get_best_run_of_the_model(
    best_runs: DataFrame,
    model: str,
    cfg: DictConfig,
    best_metric_cfg: DictConfig,
    task: str,
    include_all_variants: bool = False,
) -> tuple[Optional[Series], Optional[float]]
Get the best-performing run for a specific model architecture.

| PARAMETER | DESCRIPTION |
|---|---|
| `best_runs` | DataFrame containing all MLflow runs to search. TYPE: `DataFrame` |
| `model` | Model architecture name to filter for. TYPE: `str` |
| `cfg` | Hydra configuration object. TYPE: `DictConfig` |
| `best_metric_cfg` | Configuration specifying which metric to use for ranking. TYPE: `DictConfig` |
| `task` | Task type ('anomaly_detection', 'imputation', or 'classification'). TYPE: `str` |
| `include_all_variants` | If True, return all variants of the model instead of just the best. TYPE: `bool`, DEFAULT: `False` |

| RETURNS | DESCRIPTION |
|---|---|
| `Series` | Best run for the specified model. |
| `float` | Best metric value for that run. |
Source code in src/ensemble/ensemble_utils.py
exclude_ensembles_from_mlflow_runs¶
Filter out ensemble runs from an MLflow runs DataFrame.
Removes runs that have 'ensemble' in their run name, keeping only individual submodel runs for ensemble creation.

| PARAMETER | DESCRIPTION |
|---|---|
| `best_runs` | DataFrame containing MLflow runs. TYPE: `DataFrame` |

| RETURNS | DESCRIPTION |
|---|---|
| `DataFrame` or `None` | Filtered DataFrame without ensemble runs, or None if empty. |
Source code in src/ensemble/ensemble_utils.py
exclude_imputation_ensembles_from_mlflow_runs¶
Filter out imputation ensemble runs from MLflow runs.
Parses the run names to identify and exclude imputation ensembles, keeping only single-model imputation runs.

| PARAMETER | DESCRIPTION |
|---|---|
| `best_runs` | DataFrame containing MLflow imputation runs. TYPE: `DataFrame` |

| RETURNS | DESCRIPTION |
|---|---|
| `DataFrame` | Filtered DataFrame without imputation ensemble runs. |
Source code in src/ensemble/ensemble_utils.py
keep_only_imputations_from_anomaly_ensembles¶
Filter to keep only the imputation runs that use anomaly ensemble outputs.

| PARAMETER | DESCRIPTION |
|---|---|
| `best_runs` | DataFrame containing MLflow imputation runs. TYPE: `DataFrame` |

| RETURNS | DESCRIPTION |
|---|---|
| `DataFrame` | Filtered DataFrame with only the runs using an anomaly ensemble as input. |
Source code in src/ensemble/ensemble_utils.py
remove_worst_model¶
remove_worst_model(
    best_unique_models: dict[str, Series],
    best_metrics: list[float],
    best_metric_cfg: DictConfig,
) -> dict[str, Series]
Remove the worst-performing model from the ensemble candidates.
Used to ensure an odd number of models for majority voting in anomaly detection.

| PARAMETER | DESCRIPTION |
|---|---|
| `best_unique_models` | Dictionary mapping model names to their MLflow run data. TYPE: `dict[str, Series]` |
| `best_metrics` | List of metric values corresponding to each model. TYPE: `list[float]` |
| `best_metric_cfg` | Configuration specifying the metric direction ('DESC' or 'ASC'). TYPE: `DictConfig` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Updated dictionary with the worst model removed. |
Source code in src/ensemble/ensemble_utils.py
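A hedged sketch of the idea (model names and metric values are illustrative, and 'DESC' is taken here to mean "higher is better", so the minimum metric marks the worst model):

```python
# Four candidate models would give an even vote; drop the worst one.
best_unique_models = {"TimesNet": "run1", "UniTS": "run2",
                      "MOMENT": "run3", "LOF": "run4"}  # values stand in for run data
best_metrics = [0.91, 0.84, 0.88, 0.79]
direction = "DESC"  # higher metric is better

names = list(best_unique_models)
if direction == "DESC":
    worst = names[best_metrics.index(min(best_metrics))]  # lowest score is worst
else:
    worst = names[best_metrics.index(max(best_metrics))]  # highest score is worst
best_unique_models.pop(worst)
print(sorted(best_unique_models))  # three models remain -> odd majority vote
```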
exclude_pupil_orig_imputed¶
exclude_pupil_orig_imputed(
    best_unique_models: dict[str, Series],
    best_metrics: list[float],
) -> tuple[dict[str, Series], list[float]]
Exclude models trained on original (non-ground-truth) pupil data.
Removes models with 'orig' in their name, keeping only ground-truth-trained models.

| PARAMETER | DESCRIPTION |
|---|---|
| `best_unique_models` | Dictionary mapping model names to their MLflow run data. TYPE: `dict[str, Series]` |
| `best_metrics` | List of metric values corresponding to each model. TYPE: `list[float]` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Filtered dictionary without the 'orig' models. |
| `list` | Correspondingly filtered metrics list. |
Source code in src/ensemble/ensemble_utils.py
get_anomaly_runs¶
get_anomaly_runs(
    best_runs: DataFrame,
    best_metric_cfg: DictConfig,
    cfg: DictConfig,
    task: str,
    return_odd_number_of_models: bool = False,
    exclude_orig_data: bool = True,
    include_all_variants: bool = False,
) -> dict[str, Series]
Get the best anomaly detection runs for ensemble creation.

| PARAMETER | DESCRIPTION |
|---|---|
| `best_runs` | DataFrame containing MLflow anomaly detection runs. TYPE: `DataFrame` |
| `best_metric_cfg` | Configuration for metric selection and thresholding. TYPE: `DictConfig` |
| `cfg` | Main Hydra configuration. TYPE: `DictConfig` |
| `task` | Task type (should be 'anomaly_detection'). TYPE: `str` |
| `return_odd_number_of_models` | If True, ensure an odd number of models for majority voting. TYPE: `bool`, DEFAULT: `False` |
| `exclude_orig_data` | If True, exclude models trained on original (non-GT) data. TYPE: `bool`, DEFAULT: `True` |
| `include_all_variants` | If True, include all model variants instead of just the best. TYPE: `bool`, DEFAULT: `False` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Dictionary mapping model names to their best MLflow run data. |
Source code in src/ensemble/ensemble_utils.py
create_new_keys_for_all_variants¶
create_new_keys_for_all_variants(
    best_unique_models: dict[str, DataFrame],
) -> dict[str, DataFrame]
Expand the model dictionary to have a separate key for each model variant.
When include_all_variants is True, the DataFrames may contain multiple rows. This function creates a new key for each row to support downstream processing.

| PARAMETER | DESCRIPTION |
|---|---|
| `best_unique_models` | Dictionary whose values may be multi-row DataFrames. TYPE: `dict[str, DataFrame]` |

| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Dictionary with a separate key for each model variant. |
Source code in src/ensemble/ensemble_utils.py
get_best_imputation_col_name
¶
Construct MLflow column name for imputation metric.
| PARAMETER | DESCRIPTION |
|---|---|
best_metric_cfg
|
Configuration containing 'split' and 'string' keys.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
MLflow column name in format 'metrics.{split}/{metric_name}'. |
Source code in src/ensemble/ensemble_utils.py
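The documented format is straightforward to sketch; the real helper reads `split` and `string` from a `DictConfig`, while plain strings stand in here:

```python
def best_imputation_col_name(split: str, metric_name: str) -> str:
    """Build the MLflow column name in the documented
    'metrics.{split}/{metric_name}' format."""
    return f"metrics.{split}/{metric_name}"

print(best_imputation_col_name("test", "mae"))  # metrics.test/mae
```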
get_best_imputation_model_per_run_name
¶
Select the best run when multiple runs share the same run name.
| PARAMETER | DESCRIPTION |
|---|---|
runs
|
DataFrame of runs with the same run name.
TYPE:
|
best_metric_cfg
|
Configuration specifying which metric and direction to use.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
Single-row DataFrame with the best run. |
Source code in src/ensemble/ensemble_utils.py
get_best_unique_imputation_models
¶
get_best_unique_imputation_models(
best_runs: DataFrame,
best_metric_cfg: DictConfig,
cfg: DictConfig,
task: str,
) -> DataFrame
Get unique best imputation models, one per run name.
| PARAMETER | DESCRIPTION |
|---|---|
best_runs
|
DataFrame containing all MLflow imputation runs.
TYPE:
|
best_metric_cfg
|
Configuration for metric selection and thresholding.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
task
|
Task type (should be 'imputation').
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
DataFrame with one row per unique model configuration. |
Source code in src/ensemble/ensemble_utils.py
parse_imputation_run_name_for_ensemble
¶
Parse imputation run name to extract model and anomaly source.
Run names follow format: '{model_name}__{anomaly_source}'
| PARAMETER | DESCRIPTION |
|---|---|
run_name
|
MLflow run name for imputation model.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Model name (e.g., 'SAITS', 'MOMENT-finetune'). |
str
|
Anomaly source (e.g., 'pupil_gt_', 'LOF'). |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If run name cannot be parsed. |
Source code in src/ensemble/ensemble_utils.py
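A minimal sketch of the documented `'{model_name}__{anomaly_source}'` convention, including the `ValueError` on unparseable names (the actual parser lives in `ensemble_utils` and may differ in details):

```python
def parse_imputation_run_name(run_name: str) -> tuple[str, str]:
    """Split an imputation run name into (model_name, anomaly_source)."""
    parts = run_name.split("__", 1)
    if len(parts) != 2:
        raise ValueError(f"Cannot parse run name: {run_name!r}")
    return parts[0], parts[1]

print(parse_imputation_run_name("SAITS__LOF"))  # ('SAITS', 'LOF')
```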
filter_runs_for_gt
¶
filter_runs_for_gt(
best_runs: DataFrame,
best_metric_cfg: DictConfig,
cfg: DictConfig,
task: str,
return_best_gt: bool = False,
gt_on: Optional[str] = "anomaly",
) -> DataFrame
Filter runs based on ground truth usage.
| PARAMETER | DESCRIPTION |
|---|---|
best_runs
|
DataFrame containing MLflow runs.
TYPE:
|
best_metric_cfg
|
Configuration for metric selection.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
task
|
Task type ('imputation' or 'classification').
TYPE:
|
return_best_gt
|
If True, return only runs using ground truth. If False, return only runs NOT using ground truth.
TYPE:
|
gt_on
|
For classification, which component should have GT:
- 'anomaly': only anomaly detection uses GT
- 'imputation': only imputation uses GT
- None: both must use GT
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
Filtered runs based on GT criteria. |
Source code in src/ensemble/ensemble_utils.py
filter_for_detection
¶
Filter out runs containing specified string (e.g., 'zeroshot').
| PARAMETER | DESCRIPTION |
|---|---|
detection_filter_reject
|
String to filter out from run names (e.g., 'zeroshot').
TYPE:
|
best_runs_out
|
DataFrame containing MLflow runs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
Filtered runs without rejected string in name. |
Source code in src/ensemble/ensemble_utils.py
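A plausible pandas-based sketch of this filter; `tags.mlflow.runName` is the column that `mlflow.search_runs` uses for run names, but treat the exact column as an assumption:

```python
import pandas as pd

def filter_for_detection(reject: str, runs: pd.DataFrame) -> pd.DataFrame:
    """Drop runs whose run name contains the rejected substring."""
    keep = ~runs["tags.mlflow.runName"].str.contains(reject)
    return runs[keep]

runs = pd.DataFrame({"tags.mlflow.runName":
                     ["MOMENT-finetune__LOF", "MOMENT-zeroshot__LOF"]})
print(len(filter_for_detection("zeroshot", runs)))  # 1
```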
get_non_moment_models
¶
Filter to get only non-MOMENT model runs.
| PARAMETER | DESCRIPTION |
|---|---|
best_runs_out
|
DataFrame containing MLflow runs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
Runs without 'MOMENT' in the model name. |
Source code in src/ensemble/ensemble_utils.py
get_best_moment
¶
Get the best performing MOMENT variant.
| PARAMETER | DESCRIPTION |
|---|---|
best_metric_cfg
|
Configuration specifying metric and sort direction.
TYPE:
|
runs_moment
|
DataFrame containing only MOMENT model runs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame or None
|
Single-row DataFrame with best MOMENT run, or None if no MOMENT models exist. |
Source code in src/ensemble/ensemble_utils.py
get_unique_sources
¶
Extract unique anomaly sources from imputation run names.
| PARAMETER | DESCRIPTION |
|---|---|
best_runs_out
|
DataFrame containing MLflow imputation runs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list
|
List of unique anomaly source names. |
Source code in src/ensemble/ensemble_utils.py
keep_moment_models
¶
Filter to keep only MOMENT model runs.
| PARAMETER | DESCRIPTION |
|---|---|
runs_source
|
DataFrame containing MLflow runs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
Runs with 'MOMENT' in the model name. |
Source code in src/ensemble/ensemble_utils.py
get_best_moments_per_source
¶
get_best_moments_per_source(
best_runs_out: DataFrame, best_metric_cfg: DictConfig
) -> Optional[DataFrame]
Get best MOMENT model for each unique anomaly source.
| PARAMETER | DESCRIPTION |
|---|---|
best_runs_out
|
DataFrame containing MLflow imputation runs.
TYPE:
|
best_metric_cfg
|
Configuration for metric selection.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame or None
|
Best MOMENT run per source, or None if no MOMENT models found. |
Source code in src/ensemble/ensemble_utils.py
get_best_moment_variant
¶
get_best_moment_variant(
best_runs_out: DataFrame,
best_metric_cfg: DictConfig,
return_best_gt: bool,
) -> DataFrame
Get best MOMENT variant while preserving non-MOMENT models.
Handles MOMENT variants (finetune, zeroshot) by selecting the best one per anomaly source, then combines the result with non-MOMENT models.
| PARAMETER | DESCRIPTION |
|---|---|
best_runs_out
|
DataFrame containing all imputation runs.
TYPE:
|
best_metric_cfg
|
Configuration for metric selection.
TYPE:
|
return_best_gt
|
Whether filtering for ground truth runs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
Combined DataFrame with best MOMENT variants and all non-MOMENT models. |
Source code in src/ensemble/ensemble_utils.py
get_imputation_runs
¶
get_imputation_runs(
best_runs: DataFrame,
best_metric_cfg: DictConfig,
cfg: DictConfig,
task: str,
return_best_gt: bool,
detection_filter_reject: str = "zeroshot",
) -> Optional[DataFrame]
Get best imputation runs for ensemble creation.
Applies multiple filters: unique models, GT filtering, variant filtering, and MOMENT variant selection.
| PARAMETER | DESCRIPTION |
|---|---|
best_runs
|
DataFrame containing all MLflow imputation runs.
TYPE:
|
best_metric_cfg
|
Configuration for metric selection and thresholding.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
task
|
Task type (should be 'imputation').
TYPE:
|
return_best_gt
|
If True, return only runs using ground truth anomaly detection.
TYPE:
|
detection_filter_reject
|
String to filter out from run names.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame or None
|
Filtered runs for ensemble, or None if no runs pass filters. |
Source code in src/ensemble/ensemble_utils.py
get_best_unique_classification_models
¶
get_best_unique_classification_models(
best_runs: DataFrame,
best_metric_cfg: DictConfig,
cfg: DictConfig,
task: str,
) -> DataFrame
Get unique best classification models, one per run name.
| PARAMETER | DESCRIPTION |
|---|---|
best_runs
|
DataFrame containing all MLflow classification runs.
TYPE:
|
best_metric_cfg
|
Configuration for metric selection.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
task
|
Task type (should be 'classification').
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
DataFrame with one row per unique model configuration. |
Source code in src/ensemble/ensemble_utils.py
drop_embedding_cls_runs
¶
Filter out classification runs using embedding features.
| PARAMETER | DESCRIPTION |
|---|---|
best_runs_out
|
DataFrame containing classification runs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
Runs without 'embedding' in the run name. |
Source code in src/ensemble/ensemble_utils.py
get_list_of_good_models
¶
Get list of classifier models to include in ensembles.
| RETURNS | DESCRIPTION |
|---|---|
list
|
List of classifier names considered 'good' for ensembling. |
Source code in src/ensemble/ensemble_utils.py
keep_the_good_models
¶
Filter to keep only runs from approved classifier list.
| PARAMETER | DESCRIPTION |
|---|---|
best_runs_out
|
DataFrame containing classification runs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
Runs using classifiers from the approved list. |
Source code in src/ensemble/ensemble_utils.py
keep_cls_runs_when_both_imputation_and_outlier_are_ensemble
¶
keep_cls_runs_when_both_imputation_and_outlier_are_ensemble(
best_runs_out: DataFrame,
) -> DataFrame
Filter to keep classification runs where both preprocessing steps are ensembles.
Used for full-chain ensemble evaluation where both anomaly detection and imputation were done with ensembled models.
| PARAMETER | DESCRIPTION |
|---|---|
best_runs_out
|
DataFrame containing classification runs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
Runs where both imputation and outlier detection used ensembles. |
Source code in src/ensemble/ensemble_utils.py
get_classification_runs
¶
get_classification_runs(
best_runs: DataFrame,
best_metric_cfg: DictConfig,
cfg: DictConfig,
task: str,
return_best_gt: bool = True,
gt_on: Optional[str] = "anomaly",
return_only_ensembled_inputs: bool = False,
) -> DataFrame
Get best classification runs for ensemble creation.
Applies filters for unique models, GT usage, embedding exclusion, and approved classifiers.
| PARAMETER | DESCRIPTION |
|---|---|
best_runs
|
DataFrame containing all classification runs.
TYPE:
|
best_metric_cfg
|
Configuration for metric selection.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
task
|
Task type (should be 'classification').
TYPE:
|
return_best_gt
|
If True, return only runs using ground truth preprocessing.
TYPE:
|
gt_on
|
Which component should use GT ('anomaly', 'imputation', or None for both).
TYPE:
|
return_only_ensembled_inputs
|
If True, only return runs where both preprocessing steps were ensembles.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
Filtered classification runs for ensemble. |
Source code in src/ensemble/ensemble_utils.py
get_used_models_from_mlflow
¶
get_used_models_from_mlflow(
experiment_name: str,
cfg: DictConfig,
task: str = "anomaly_detection",
exclude_ensemble: bool = True,
return_odd_number_of_models: bool = False,
return_best_gt: bool = False,
return_anomaly_ensembles: bool = False,
gt_on: str = None,
include_all_variants: bool = False,
return_all_runs: bool = False,
return_only_ensembled_inputs: bool = False,
) -> Union[dict[str, Series], DataFrame]
Retrieve best models from MLflow for ensemble creation.
Main entry point for getting submodels to ensemble. Queries MLflow and applies task-specific filtering and selection logic.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
MLflow experiment name to query.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
task
|
Task type: 'anomaly_detection', 'imputation', or 'classification'.
TYPE:
|
exclude_ensemble
|
If True, exclude existing ensemble runs from results.
TYPE:
|
return_odd_number_of_models
|
If True, ensure odd number of models (for majority voting).
TYPE:
|
return_best_gt
|
If True, return only runs using ground truth.
TYPE:
|
return_anomaly_ensembles
|
If True, return imputation runs that used anomaly ensembles.
TYPE:
|
gt_on
|
For classification, which component uses GT.
TYPE:
|
include_all_variants
|
If True, include all model variants.
TYPE:
|
return_all_runs
|
If True, return all runs without filtering.
TYPE:
|
return_only_ensembled_inputs
|
If True, only return runs with ensembled preprocessing.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict or DataFrame
|
Dictionary mapping model names to run data, or DataFrame if return_all_runs. |
Source code in src/ensemble/ensemble_utils.py
ensemble_the_imputation_output_dicts
¶
ensemble_the_imputation_output_dicts(
results_per_model: dict,
ensembled_outputs: dict,
i: int,
submodel: str,
) -> dict
Aggregate imputation outputs from multiple submodels.
Stacks imputation arrays from each submodel into a 4D array (subjects x timepoints x features x submodels) for later ensemble statistics.
| PARAMETER | DESCRIPTION |
|---|---|
results_per_model
|
Imputation results from a single submodel.
TYPE:
|
ensembled_outputs
|
Accumulated ensemble outputs (modified in place).
TYPE:
|
i
|
Index of current submodel.
TYPE:
|
submodel
|
Name of current submodel.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Updated ensembled_outputs with new submodel added. |
Source code in src/ensemble/ensemble_utils.py
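The stacking step described above can be sketched with `np.stack` along a new trailing axis; the array shapes are illustrative, not taken from the source:

```python
import numpy as np

def stack_submodel_outputs(outputs: list[np.ndarray]) -> np.ndarray:
    """Stack per-submodel imputation arrays of shape
    (subjects, timepoints, features) into a 4D array
    (subjects, timepoints, features, submodels)."""
    return np.stack(outputs, axis=-1)

a = np.zeros((2, 100, 1))
b = np.ones((2, 100, 1))
print(stack_submodel_outputs([a, b]).shape)  # (2, 100, 1, 2)
```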
compute_ensemble_stats
¶
compute_ensemble_stats(
ensembled_outputs: dict,
ensemble_name: str,
n: int,
cfg: DictConfig,
) -> dict
Compute ensemble statistics (mean, std, CI) from stacked submodel outputs.
| PARAMETER | DESCRIPTION |
|---|---|
ensembled_outputs
|
Dictionary with 4D arrays from stacked submodel predictions.
TYPE:
|
ensemble_name
|
Name of the ensemble.
TYPE:
|
n
|
Number of submodels.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Ensemble outputs with computed statistics (mean, std, CI). |
Source code in src/ensemble/ensemble_utils.py
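An illustrative version of the statistics over the stacked submodel axis, assuming a normal-approximation 95% CI; the real `compute_ensemble_stats` may use a different CI construction:

```python
import numpy as np

def ensemble_stats(stacked: np.ndarray) -> dict[str, np.ndarray]:
    """Compute mean, sample std, and an approximate 95% CI across the
    last (submodel) axis of a stacked prediction array."""
    n = stacked.shape[-1]
    mean = stacked.mean(axis=-1)
    std = stacked.std(axis=-1, ddof=1)
    half_width = 1.96 * std / np.sqrt(n)
    return {"mean": mean, "std": std,
            "ci_lower": mean - half_width, "ci_upper": mean + half_width}

stacked = np.stack([np.full((2, 3), 1.0), np.full((2, 3), 3.0)], axis=-1)
print(ensemble_stats(stacked)["mean"][0, 0])  # 2.0
```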
ensemble_the_imputation_results
¶
ensemble_the_imputation_results(
ensemble_name: str,
mlflow_ensemble: dict[str, Series],
cfg: DictConfig,
) -> dict
Create ensemble from multiple imputation model outputs.
Loads imputation results from each submodel, stacks predictions, and computes ensemble statistics.
| PARAMETER | DESCRIPTION |
|---|---|
ensemble_name
|
Name for the ensemble.
TYPE:
|
mlflow_ensemble
|
Dictionary mapping submodel names to their MLflow run data.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Ensembled imputation output with statistics and metadata. |
Source code in src/ensemble/ensemble_utils.py
get_ensemble_permutations
¶
get_ensemble_permutations(
best_unique_models: dict[str, Series],
_ensemble_cfg: DictConfig,
cfg: DictConfig,
) -> dict[str, dict[str, Series]]
Generate ensemble configurations from available submodels.
Currently creates a single ensemble using all available models. Placeholder for future permutation logic.
| PARAMETER | DESCRIPTION |
|---|---|
best_unique_models
|
Dictionary of available submodels.
TYPE:
|
_ensemble_cfg
|
Ensemble-specific configuration (currently unused).
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary mapping ensemble names to their submodel dictionaries. |
Source code in src/ensemble/ensemble_utils.py
get_imputation_results_from_for_ensembling
¶
get_imputation_results_from_for_ensembling(
experiment_name: str, cfg: DictConfig
) -> dict[str, dict]
Get imputation results and create ensembles from MLflow experiment.
High-level function that retrieves best submodels and creates imputation ensemble(s).
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
MLflow experiment name.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary mapping ensemble names to their ensembled outputs. Empty dict if insufficient models for ensembling. |
Source code in src/ensemble/ensemble_utils.py
get_gt_imputation_labels
¶
Extract ground truth imputation masks from source data.
| PARAMETER | DESCRIPTION |
|---|---|
sources
|
Dictionary containing 'pupil_gt' with ground truth data.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary with train/test split imputation masks as int arrays. |
Source code in src/ensemble/ensemble_utils.py
get_metadata_dict_from_sources
¶
Extract metadata dictionary from source data.
| PARAMETER | DESCRIPTION |
|---|---|
sources
|
Dictionary containing data sources.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary with train/test split metadata. |
Source code in src/ensemble/ensemble_utils.py
combine_ensembles_into_one_df
¶
Combine multiple ensemble DataFrames into a single DataFrame.
| PARAMETER | DESCRIPTION |
|---|---|
best_unique_models
|
Dictionary where values may be DataFrames of model runs.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame or None
|
Combined DataFrame of all models, or None if empty. |
Source code in src/ensemble/ensemble_utils.py
aggregate_codes
¶
Aggregate subject codes from multiple MLflow runs.
Extracts train/test subject codes from each run to verify that all models used the same data splits.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
DataFrame of MLflow runs with 'params.codes_train' and 'params.codes_test'.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary with 'train' and 'test' DataFrames of subject codes. |
Source code in src/ensemble/ensemble_utils.py
are_codes_the_same
¶
Check if all columns in DataFrame have identical values.
Used to verify that all submodels were trained on the same subjects.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
DataFrame where each column represents codes from a model.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if all columns have identical values, False otherwise. |
Source code in src/ensemble/ensemble_utils.py
check_codes_used
¶
Verify all ensemble submodels were trained on the same subjects.
| PARAMETER | DESCRIPTION |
|---|---|
best_unique_models
|
Dictionary of submodels to check.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict or None
|
Input dictionary if checks pass, None if no valid data. |
Source code in src/ensemble/ensemble_utils.py
get_grouped_classification_runs
¶
get_grouped_classification_runs(
best_unique_models: dict,
experiment_name: str,
cfg: DictConfig,
task: str,
) -> dict
Group classification runs by ground truth usage pattern.
Creates groups for:
- pupil_gt: Both anomaly detection and imputation use GT
- anomaly_gt: Only anomaly detection uses GT
- ensembled_input: Both use ensemble outputs
| PARAMETER | DESCRIPTION |
|---|---|
best_unique_models
|
Dictionary to populate with grouped runs.
TYPE:
|
experiment_name
|
MLflow experiment name.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
task
|
Task type (should be 'classification').
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary with runs grouped by GT usage pattern. |
Source code in src/ensemble/ensemble_utils.py
get_results_from_mlflow_for_ensembling
¶
get_results_from_mlflow_for_ensembling(
experiment_name: str,
cfg: DictConfig,
task: str,
recompute_metrics: bool = False,
) -> Optional[dict]
Get MLflow results organized for ensemble creation.
Main entry point for retrieving submodels for ensembling across all tasks. Handles task-specific logic for anomaly detection, imputation, and classification.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
MLflow experiment name.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
task
|
Task type: 'anomaly_detection', 'imputation', or 'classification'.
TYPE:
|
recompute_metrics
|
If True, only retrieve runs for metric recomputation.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict or None
|
Dictionary of grouped submodel runs, or None if no valid runs found. |
Source code in src/ensemble/ensemble_utils.py
ensemble_logging
¶
Ensemble MLflow logging module.
Provides utilities for logging ensemble results to MLflow, including metrics, artifacts, and run naming.
get_ensemble_pickle_name
¶
Generate pickle filename for ensemble results.
| PARAMETER | DESCRIPTION |
|---|---|
ensemble_name
|
Name of the ensemble.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Filename in format 'ensemble_{name}_results.pickle'. |
Source code in src/ensemble/ensemble_logging.py
get_source_runs
¶
get_source_runs(
ensemble_mlflow_runs_per_name: Union[
Dict[str, Dict[str, Any]], DataFrame
],
) -> List[str]
Extract run IDs from ensemble submodel data.
| PARAMETER | DESCRIPTION |
|---|---|
ensemble_mlflow_runs_per_name
|
Submodel data (dict of dicts or DataFrame).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list
|
List of MLflow run IDs. |
Source code in src/ensemble/ensemble_logging.py
get_ensemble_name
¶
get_ensemble_name(
runs_per_name: Union[DataFrame, Dict[str, Any]],
ensemble_name_base: str,
ensemble_prefix_str: str,
sort_name: str = "params.model",
) -> str
Generate ensemble run name from submodel names.
| PARAMETER | DESCRIPTION |
|---|---|
runs_per_name
|
Submodel runs data.
TYPE:
|
ensemble_name_base
|
Base name (e.g., source like 'pupil_gt').
TYPE:
|
ensemble_prefix_str
|
Prefix (e.g., 'ensemble' or 'ensembleThresholded').
TYPE:
|
sort_name
|
Column/key to sort models by.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Ensemble name in format '{prefix}-{model1}-{model2}...{__{base}}'. |
Source code in src/ensemble/ensemble_logging.py
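The documented `'{prefix}-{model1}-{model2}…__{base}'` naming can be sketched as below; in the real helper the model names come from the `sort_name` column of the submodel runs, here a plain list stands in:

```python
def build_ensemble_name(models: list[str], base: str, prefix: str) -> str:
    """Join sorted submodel names under a prefix, with an optional
    '__{base}' suffix naming the anomaly source."""
    name = prefix + "".join(f"-{m}" for m in sorted(models))
    return f"{name}__{base}" if base else name

print(build_ensemble_name(["XGBoost", "CatBoost"], "pupil_gt", "ensemble"))
# ensemble-CatBoost-XGBoost__pupil_gt
```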
log_ensemble_metrics
¶
Log ensemble metrics to MLflow.
Handles different metric structures for anomaly detection and imputation.
| PARAMETER | DESCRIPTION |
|---|---|
metrics
|
Metrics dictionary.
TYPE:
|
task
|
Task type ('anomaly_detection' or 'imputation').
TYPE:
|
Source code in src/ensemble/ensemble_logging.py
log_ensemble_arrays
¶
Save and log ensemble arrays as MLflow artifact.
| PARAMETER | DESCRIPTION |
|---|---|
pred_masks
|
Prediction data to save.
TYPE:
|
task
|
Task type for artifact subdirectory.
TYPE:
|
ensemble_name
|
Name for the pickle file.
TYPE:
|
Source code in src/ensemble/ensemble_logging.py
ensemble_is_empty
¶
ensemble_is_empty(
ensemble_mlflow_runs: Dict[
str, Union[Dict[str, Any], DataFrame]
],
ensemble_name: str,
) -> bool
Check if ensemble has no submodels.
| PARAMETER | DESCRIPTION |
|---|---|
ensemble_mlflow_runs
|
Dictionary of ensemble runs.
TYPE:
|
ensemble_name
|
Name of ensemble to check.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if ensemble is empty. |
Source code in src/ensemble/ensemble_logging.py
get_sort_name
¶
Get parameter name for sorting models by task.
| PARAMETER | DESCRIPTION |
|---|---|
task
|
Task type.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
MLflow parameter column name for model name. |
Source code in src/ensemble/ensemble_logging.py
get_ensemble_quality_threshold
¶
Get quality threshold for ensemble submodel selection.
| PARAMETER | DESCRIPTION |
|---|---|
task
|
Task type.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
float or None
|
Quality threshold value, or None if not applicable. |
Source code in src/ensemble/ensemble_logging.py
get_ensemble_prefix
¶
Get prefix string for ensemble name based on quality thresholding.
| PARAMETER | DESCRIPTION |
|---|---|
task
|
Task type.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
'ensembleThresholded' if threshold set, 'ensemble' otherwise. |
Source code in src/ensemble/ensemble_logging.py
get_mlflow_ensemble_name
¶
get_mlflow_ensemble_name(
task: str,
ensemble_mlflow_runs: Dict[
str, Union[Dict[str, Any], DataFrame]
],
ensemble_name: str,
cfg: DictConfig,
) -> Optional[str]
Generate full MLflow run name for ensemble.
| PARAMETER | DESCRIPTION |
|---|---|
task
|
Task type.
TYPE:
|
ensemble_mlflow_runs
|
Dictionary of ensemble submodel runs.
TYPE:
|
ensemble_name
|
Source name (e.g., 'pupil_gt').
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str or None
|
Full ensemble name (e.g., 'ensemble-CatBoost-XGBoost__pupil_gt'), or None if ensemble is empty. |
Source code in src/ensemble/ensemble_logging.py
get_existing_runs
¶
Check for existing MLflow runs with same name.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
MLflow experiment name.
TYPE:
|
mlflow_ensemble_name
|
Ensemble run name to search for.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
Matching runs. |
bool
|
True if matching runs exist. |
Source code in src/ensemble/ensemble_logging.py
check_for_old_run
¶
check_for_old_run(
experiment_name: str,
mlflow_ensemble_name: str,
cfg: DictConfig,
delete_old_mlflow_run: bool = True,
) -> bool
Check for and optionally delete existing ensemble runs.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
MLflow experiment name.
TYPE:
|
mlflow_ensemble_name
|
Ensemble run name.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
delete_old_mlflow_run
|
If True, delete existing runs with same name.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if logging should continue. |
Source code in src/ensemble/ensemble_logging.py
log_ensembling_to_mlflow
¶
log_ensembling_to_mlflow(
experiment_name: str,
ensemble_mlflow_runs: Dict[
str, Union[Dict[str, Any], DataFrame]
],
ensemble_name: str,
cfg: DictConfig,
task: str,
metrics: Optional[Dict[str, Any]] = None,
pred_masks: Optional[Dict[str, Any]] = None,
output_dict: Optional[Dict[str, Any]] = None,
) -> None
Log ensemble results to MLflow.
Creates a new MLflow run for the ensemble, logs metrics, parameters, and artifacts based on task type.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
MLflow experiment name.
TYPE:
|
ensemble_mlflow_runs
|
Dictionary of ensemble submodel runs.
TYPE:
|
ensemble_name
|
Source name for the ensemble.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
task
|
Task type ('anomaly_detection', 'imputation', or 'classification').
TYPE:
|
metrics
|
Pre-computed metrics (for anomaly detection).
TYPE:
|
pred_masks
|
Prediction masks (for anomaly detection).
TYPE:
|
output_dict
|
Full output dictionary (for imputation/classification).
TYPE:
|
Source code in src/ensemble/ensemble_logging.py
tasks_ensembling
¶
Ensemble task orchestration module.
Provides high-level functions to coordinate ensemble creation and logging across anomaly detection, imputation, and classification tasks.
check_if_for_reprocess
¶
Check whether to reprocess an existing ensemble.
For anomaly detection and imputation, ensembles are fast to compute, so reprocessing is cheap; for classification, reprocessing may be skipped during debugging.
| PARAMETER | DESCRIPTION |
|---|---|
mlflow_ensemble_name
|
Name of the ensemble.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if ensemble should be (re)processed. |
Source code in src/ensemble/tasks_ensembling.py
get_ensembled_prediction
¶
get_ensembled_prediction(
ensemble_mlflow_runs: dict,
experiment_name: str,
cfg: DictConfig,
task: str,
sources: dict,
recompute_metrics: bool = False,
)
Create ensemble predictions for all ensemble configurations.
| PARAMETER | DESCRIPTION |
|---|---|
ensemble_mlflow_runs
|
Dictionary where keys are ensemble names (e.g., 'pupil_gt') and values are dicts/DataFrames of submodel runs.
TYPE:
|
experiment_name
|
MLflow experiment name.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
task
|
Task type ('anomaly_detection', 'imputation', or 'classification').
TYPE:
|
sources
|
Source data.
TYPE:
|
recompute_metrics
|
If True, only recompute submodel metrics.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary mapping ensemble names to their outputs. |
Source code in src/ensemble/tasks_ensembling.py
log_ensemble_to_mlflow
¶
log_ensemble_to_mlflow(
ensemble_mlflow_runs: dict,
experiment_name: str,
cfg: DictConfig,
ensemble_output: dict,
task: str,
recompute_metrics: bool = False,
)
Log all ensemble outputs to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
ensemble_mlflow_runs
|
Dictionary of ensemble submodel runs.
TYPE:
|
experiment_name
|
MLflow experiment name.
TYPE:
|
cfg
|
Main Hydra configuration.
TYPE:
|
ensemble_output
|
Dictionary mapping ensemble names to their outputs.
TYPE:
|
task
|
Task type.
TYPE:
|
recompute_metrics
|
If True, metrics were recomputed (affects logging).
TYPE:
|
Source code in src/ensemble/tasks_ensembling.py
task_ensemble
¶
Main ensemble task: create and log ensembles for a given task.
Orchestrates the full ensembling pipeline:
1. Retrieve submodel runs from MLflow
2. Create ensemble predictions
3. Log results to MLflow
| PARAMETER | DESCRIPTION |
|---|---|
cfg
|
Main Hydra configuration.
TYPE:
|
task
|
Task type ('anomaly_detection', 'imputation', or 'classification').
TYPE:
|
sources
|
Source data.
TYPE:
|
recompute_metrics
|
If True, only recompute submodel metrics without creating ensembles.
TYPE:
|
Source code in src/ensemble/tasks_ensembling.py