# Metrics Module
Evaluation metrics for imputation and classification quality assessment.
## Overview
The metrics module provides utilities for computing evaluation metrics, particularly for imputation quality assessment.
## API Reference
### evaluate_imputation_metrics
#### get_imputation_metric_dict

```python
get_imputation_metric_dict(
    model_name: str,
    imputation_artifacts: Dict,
    cfg: DictConfig,
) -> Dict[str, Dict]
```
Compute imputation metrics for all data splits.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `model_name` | `str` | Name of the imputation model being evaluated. |
| `imputation_artifacts` | `Dict` | Dictionary containing model artifacts and source data with imputation results. |
| `cfg` | `DictConfig` | Hydra configuration object. |
| RETURNS | DESCRIPTION |
|---|---|
| `Dict[str, Dict]` | Dictionary mapping split names to their computed metrics. |
Source code in src/metrics/evaluate_imputation_metrics.py
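A minimal usage sketch; the import path follows the source location above, while the model name and the artifact structure are assumptions (the artifacts come from the upstream imputation step):

```python
from omegaconf import DictConfig

from src.metrics.evaluate_imputation_metrics import get_imputation_metric_dict


def report_split_metrics(imputation_artifacts: dict, cfg: DictConfig) -> None:
    """Illustrative helper: print the computed metrics for every split."""
    metrics = get_imputation_metric_dict(
        model_name="SAITS",  # hypothetical model name
        imputation_artifacts=imputation_artifacts,
        cfg=cfg,
    )
    for split, split_metrics in metrics.items():
        print(split, split_metrics)
```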
#### log_metrics_per_split_as_mlflow_artifact

```python
log_metrics_per_split_as_mlflow_artifact(
    metrics_global: Dict,
    model_name: str,
    split: str,
    model_artifacts: Dict,
    cfg: DictConfig,
) -> None
```
Log global imputation metrics to MLflow as an artifact.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `metrics_global` | `Dict` | Dictionary of global metrics (e.g., MAE, MSE, MRE). |
| `model_name` | `str` | Name of the imputation model. |
| `split` | `str` | Data split name (e.g., 'train', 'val', 'test'). |
| `model_artifacts` | `Dict` | Dictionary containing model artifacts. |
| `cfg` | `DictConfig` | Hydra configuration object. |
| RETURNS | DESCRIPTION |
|---|---|
| `None` | Nothing is returned; the metrics are written to MLflow as an artifact. |
Source code in src/metrics/evaluate_imputation_metrics.py
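This helper automates a pattern like the following hedged sketch: serialize the metrics dict to a file and attach it to the active MLflow run (the file and artifact names here are hypothetical):

```python
import json
import tempfile
from pathlib import Path

import mlflow

metrics_global = {"mae": 0.21, "mse": 0.09, "mre": 0.12}  # illustrative values

with mlflow.start_run():
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "metrics_val_global.json"  # hypothetical file name
        out.write_text(json.dumps(metrics_global, indent=2))
        # The file appears under the run's "metrics/" artifact directory.
        mlflow.log_artifact(str(out), artifact_path="metrics")
```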
#### recompute_submodel_imputation_metrics

```python
recompute_submodel_imputation_metrics(
    run_id: str,
    submodel_mlflow_run: Run,
    model_name: str,
    gt_dict: Dict,
    gt_preprocess: Dict,
    reconstructions_submodel: Dict[str, ndarray],
    cfg: DictConfig,
) -> Dict[str, Dict]
```
Recompute and re-log imputation metrics for a submodel to MLflow.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `run_id` | `str` | MLflow run ID to log metrics to. |
| `submodel_mlflow_run` | `Run` | MLflow run object for the submodel. |
| `model_name` | `str` | Name of the imputation model. |
| `gt_dict` | `Dict` | Ground truth data dictionary with labels and data per split. |
| `gt_preprocess` | `Dict` | Preprocessing parameters used for destandardization. |
| `reconstructions_submodel` | `Dict[str, ndarray]` | Dictionary mapping splits to imputation arrays. |
| `cfg` | `DictConfig` | Hydra configuration object. |
| RETURNS | DESCRIPTION |
|---|---|
| `Dict[str, Dict]` | Dictionary mapping splits to their computed metrics. |
**See Also**

`compute_granular_metrics`: For anomaly detection recomputation.
Source code in src/metrics/evaluate_imputation_metrics.py
#### compute_metrics_by_model

```python
compute_metrics_by_model(
    model_name: str,
    imputation_artifacts: Dict,
    cfg: DictConfig,
    _log_if_improved: bool = True,
    log_mlflow: bool = True,
) -> Dict[str, Dict]
```
Compute and log imputation metrics for a given model across all splits.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `model_name` | `str` | Name of the imputation model being evaluated. |
| `imputation_artifacts` | `Dict` | Dictionary containing model artifacts, source data, and optionally pre-computed metrics. |
| `cfg` | `DictConfig` | Hydra configuration object. |
| `_log_if_improved` | `bool` | Unused parameter reserved for future model registry logging (default `True`). |
| `log_mlflow` | `bool` | Whether to log subjectwise metrics as an MLflow artifact (default `True`). |
| RETURNS | DESCRIPTION |
|---|---|
| `Dict[str, Dict]` | Dictionary mapping splits to their computed metrics (global, subjectwise). |
Source code in src/metrics/evaluate_imputation_metrics.py
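A sketch of consuming the returned structure; the 'global'/'subjectwise' keys follow the description above, while the split names and metric values are illustrative mocks:

```python
# Illustrative output shape; real values come from compute_metrics_by_model.
metrics_by_split = {
    "val": {
        "global": {"mae": 0.21, "mse": 0.09, "mre": 0.12},
        "subjectwise": {"subj_01": {"mae": 0.25, "mse": 0.11, "mre": 0.14}},
    },
}
for split, m in metrics_by_split.items():
    # Find the subject with the worst MAE within each split.
    worst = max(m["subjectwise"], key=lambda s: m["subjectwise"][s]["mae"])
    print(f"{split}: global MAE={m['global']['mae']:.3f}, worst subject={worst}")
```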
#### compute_metrics_by_split

```python
compute_metrics_by_split(
    split_imputation: Dict,
    preprocess_dict: Dict,
    split_data: Dict,
    model_name: str,
    split: str,
    cfg: DictConfig,
) -> Dict
```
Compute imputation metrics for a single data split.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `split_imputation` | `Dict` | Imputation results for the split. |
| `preprocess_dict` | `Dict` | Preprocessing parameters for destandardization. |
| `split_data` | `Dict` | Original data for the split, including metadata. |
| `model_name` | `str` | Name of the imputation model. |
| `split` | `str` | Data split name (e.g., 'train', 'val', 'test'). |
| `cfg` | `DictConfig` | Hydra configuration object. |
| RETURNS | DESCRIPTION |
|---|---|
| `Dict` | Dictionary with 'global', 'subjectwise', and 'subjectwise_arrays' metrics. |
Source code in src/metrics/evaluate_imputation_metrics.py
#### compute_imputation_metrics

```python
compute_imputation_metrics(
    targets: ndarray,
    predictions: ndarray,
    indicating_mask: ndarray,
    cfg: DictConfig,
    metadata_dict: Dict,
    checks_on: bool = False,
) -> Dict
```
Compute global and subjectwise imputation metrics using BenchPOTS methodology.
Uses the BenchPOTS suite for fair evaluation of imputation algorithms. See https://arxiv.org/pdf/2406.12747 and https://github.com/WenjieDu/BenchPOTS.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `targets` | `ndarray` | Ground truth values, shape (n_subjects, n_timepoints, n_features). |
| `predictions` | `ndarray` | Imputed predictions, shape (n_subjects, n_timepoints, n_features). |
| `indicating_mask` | `ndarray` | Binary mask indicating missing values, shape (n_subjects, n_timepoints, n_features). |
| `cfg` | `DictConfig` | Hydra configuration object. |
| `metadata_dict` | `Dict` | Metadata including subject codes for subjectwise metrics. |
| `checks_on` | `bool` | Whether to run prechecks for NaN removal and validation (default `False`). |
| RETURNS | DESCRIPTION |
|---|---|
| `Dict` | Dictionary with 'global', 'subjectwise', and 'subjectwise_arrays' keys. |
Source code in src/metrics/evaluate_imputation_metrics.py
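Conceptually, the metrics are computed only where `indicating_mask` is 1, i.e., over the artificially removed values. A self-contained masked-MAE sketch, not the module's exact implementation (which delegates to the BenchPOTS/PyPOTS utilities):

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_timepoints, n_features = 4, 50, 3
targets = rng.normal(size=(n_subjects, n_timepoints, n_features))
predictions = targets + rng.normal(scale=0.1, size=targets.shape)
indicating_mask = (rng.random(targets.shape) < 0.2).astype(float)  # 1 = was masked out

# Masked MAE: average absolute error over artificially-missing positions only.
mae = np.abs(predictions - targets)[indicating_mask == 1].mean()
print(f"masked MAE: {mae:.4f}")
```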
#### compute_CI_imputation_metrics

```python
compute_CI_imputation_metrics(
    metrics_subjectwise: Dict,
    metrics_global: Dict,
    p: float = 0.05,
) -> Tuple[Dict[str, ndarray], Dict]
```
Compute confidence intervals for imputation metrics from subjectwise values.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `metrics_subjectwise` | `Dict` | Dictionary mapping subject codes to their metric dictionaries. |
| `metrics_global` | `Dict` | Global metrics dictionary to augment with CI values. |
| `p` | `float` | Percentile for CI bounds (default `0.05` for 5th and 95th percentiles). |
| RETURNS | DESCRIPTION |
|---|---|
| `Tuple[Dict[str, ndarray], Dict]` | `(metrics_subjectwise_arrays, metrics_global)`, where the arrays hold per-metric numpy arrays and the global dict is augmented with CI bounds. |
Source code in src/metrics/evaluate_imputation_metrics.py
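With the default `p=0.05`, the CI bounds correspond to the 5th and 95th percentiles of the per-subject values; a minimal numpy sketch (metric name and values illustrative):

```python
import numpy as np

p = 0.05
subject_maes = np.array([0.18, 0.22, 0.25, 0.19, 0.31])  # per-subject MAE values
# Percentile-based confidence interval over subjects.
ci_low, ci_high = np.percentile(subject_maes, [100 * p, 100 * (1 - p)])
print(f"MAE CI: [{ci_low:.3f}, {ci_high:.3f}]")
```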
#### get_arrays_from_subject_dicts
Convert subjectwise metric dictionaries to arrays per metric.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `metrics_subjectwise` | `Dict` | Dictionary mapping subject codes to their metric dictionaries. |
| RETURNS | DESCRIPTION |
|---|---|
| `dict` | Dictionary mapping metric names to numpy arrays of values across subjects. |
Source code in src/metrics/evaluate_imputation_metrics.py
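The conversion is a pivot from per-subject dicts to per-metric arrays; a hedged sketch of the idea:

```python
import numpy as np

metrics_subjectwise = {
    "subj_01": {"mae": 0.25, "mse": 0.11},
    "subj_02": {"mae": 0.19, "mse": 0.08},
}
# Pivot: one numpy array per metric, in subject iteration order.
arrays = {
    metric: np.array([m[metric] for m in metrics_subjectwise.values()])
    for metric in next(iter(metrics_subjectwise.values()))
}
print(arrays["mae"])  # -> [0.25 0.19]
```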
#### subjectwise_metrics_wrapper

```python
subjectwise_metrics_wrapper(
    predictions: ndarray,
    targets: ndarray,
    masks: ndarray,
    cfg: DictConfig,
    metadata_dict: Dict,
    checks_on: bool = False,
) -> Dict[str, Dict]
```
Compute imputation metrics for each subject individually.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `predictions` | `ndarray` | Imputed predictions, shape (n_subjects, n_timepoints, n_features). |
| `targets` | `ndarray` | Ground truth values, shape (n_subjects, n_timepoints, n_features). |
| `masks` | `ndarray` | Binary mask indicating missing values, shape (n_subjects, n_timepoints, n_features). |
| `cfg` | `DictConfig` | Hydra configuration object. |
| `metadata_dict` | `Dict` | Metadata containing subject codes. |
| `checks_on` | `bool` | Whether to run prechecks for NaN handling (default `False`). |
| RETURNS | DESCRIPTION |
|---|---|
| `Dict[str, Dict]` | Dictionary mapping subject codes to their metric dictionaries. |
Source code in src/metrics/evaluate_imputation_metrics.py
#### check_target_pred_ratio
Check for scale mismatch between targets and predictions.
Logs warnings if the ratio between prediction and target means is infinite or NaN, which may indicate standardization issues.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `targets` | `ndarray` | Ground truth values. |
| `predictions` | `ndarray` | Imputed predictions. |
| `subject_code` | `str` | Subject identifier for logging purposes. |
| RETURNS | DESCRIPTION |
|---|---|
| `None` | Nothing is returned; a warning is logged when a non-finite ratio is detected. |
Source code in src/metrics/evaluate_imputation_metrics.py
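A sketch of the described check, assuming a simple mean ratio; the real implementation may compute the ratio differently:

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)


def check_scale(targets: np.ndarray, predictions: np.ndarray, subject_code: str) -> None:
    # A ratio that is inf (zero-mean targets) or NaN hints at a
    # standardization mismatch between predictions and targets.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.mean(predictions) / np.mean(targets)
    if not np.isfinite(ratio):
        logger.warning("Non-finite pred/target mean ratio for %s", subject_code)
```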
#### remove_NaNs_from_triplet

```python
remove_NaNs_from_triplet(
    X: ndarray, Y: ndarray, mask: ndarray
) -> Tuple[ndarray, ndarray, ndarray]
```
Remove NaN values from predictions, targets, and mask arrays by cropping.
Handles NaNs that may occur from padding in models like MOMENT.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `X` | `ndarray` | Predictions array. |
| `Y` | `ndarray` | Targets array. |
| `mask` | `ndarray` | Indicating mask array. |
| RETURNS | DESCRIPTION |
|---|---|
| `Tuple[ndarray, ndarray, ndarray]` | `(X, Y, mask)` with NaN regions cropped out. |
Source code in src/metrics/evaluate_imputation_metrics.py
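A self-contained sketch of the cropping idea, keeping only timepoints where predictions are fully finite; the real cropping rule may differ:

```python
import numpy as np


def crop_nans(X: np.ndarray, Y: np.ndarray, mask: np.ndarray):
    # Keep only timepoints where the predictions are finite for all
    # subjects and features (drops, e.g., trailing patch padding).
    valid_t = ~np.isnan(X).any(axis=(0, 2))  # shape: (n_timepoints,)
    return X[:, valid_t], Y[:, valid_t], mask[:, valid_t]


X = np.ones((2, 5, 3)); X[:, -1] = np.nan  # NaN padding on the last timepoint
Y = np.ones_like(X); mask = np.ones_like(X)
X, Y, mask = crop_nans(X, Y, mask)
print(X.shape)  # (2, 4, 3)
```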
#### check_for_nan_subjects

```python
check_for_nan_subjects(
    X: ndarray,
    Y: ndarray,
    mask: ndarray,
    return_nanfree: bool = False,
) -> Tuple[ndarray, ndarray, ndarray]
```
Check for and optionally remove subjects with NaN values.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `X` | `ndarray` | Predictions array, shape (n_subjects, n_timepoints, n_features). |
| `Y` | `ndarray` | Targets array, shape (n_subjects, n_timepoints, n_features). |
| `mask` | `ndarray` | Indicating mask array, shape (n_subjects, n_timepoints, n_features). |
| `return_nanfree` | `bool` | If True, return arrays with NaN subjects removed (default `False`). |
| RETURNS | DESCRIPTION |
|---|---|
| `Tuple[ndarray, ndarray, ndarray]` | `(X, Y, mask)`, optionally filtered to exclude subjects with NaNs. |
Source code in src/metrics/evaluate_imputation_metrics.py
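Subject-level filtering can be sketched as a boolean mask over the first axis; again an assumption-laden sketch, not the module's code:

```python
import numpy as np


def drop_nan_subjects(X, Y, mask):
    # A subject is kept only if both its predictions and targets are NaN-free.
    keep = ~(np.isnan(X).any(axis=(1, 2)) | np.isnan(Y).any(axis=(1, 2)))
    return X[keep], Y[keep], mask[keep]


X = np.zeros((3, 10, 2)); X[1, 0, 0] = np.nan  # subject 1 has a NaN
Y = np.zeros_like(X); mask = np.ones_like(X)
print(drop_nan_subjects(X, Y, mask)[0].shape)  # (2, 10, 2)
```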
#### imputation_metrics_wrapper

```python
imputation_metrics_wrapper(
    predictions: ndarray,
    targets: ndarray,
    masks: ndarray,
    subject_code: str,
    prechecks: bool = False,
) -> Dict
```
Compute imputation metrics (MAE, MSE, MRE) using PyPOTS utilities.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `predictions` | `ndarray` | Imputed predictions array. |
| `targets` | `ndarray` | Ground truth values array. |
| `masks` | `ndarray` | Binary mask indicating missing values. |
| `subject_code` | `str` | Subject identifier, or 'global' for aggregate metrics. |
| `prechecks` | `bool` | Whether to run NaN removal and validation checks (default `False`). |
| RETURNS | DESCRIPTION |
|---|---|
| `Dict` | Dictionary with 'mae', 'mse', 'mre', and 'missing_rate' keys. |
Source code in src/metrics/evaluate_imputation_metrics.py
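A sketch built on the PyPOTS error utilities this wrapper uses; `calc_mae`/`calc_mse`/`calc_mre` are real PyPOTS functions, though the import path can vary across versions, and the 'missing_rate' definition here is an assumption:

```python
import numpy as np
from pypots.utils.metrics import calc_mae, calc_mre, calc_mse  # path may vary by version

rng = np.random.default_rng(1)
targets = rng.normal(size=(2, 20, 3))
predictions = targets + 0.1
masks = (rng.random(targets.shape) < 0.3).astype(float)  # 1 = evaluate here

metrics = {
    "mae": calc_mae(predictions, targets, masks),
    "mse": calc_mse(predictions, targets, masks),
    "mre": calc_mre(predictions, targets, masks),
    "missing_rate": masks.mean(),  # assumed definition of 'missing_rate'
}
print(metrics)
```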
#### if_recompute_metrics
Determine whether to recompute imputation metrics.
Currently a placeholder that always returns True.
| PARAMETER | TYPE | DESCRIPTION |
|---|---|---|
| `metrics_path` | | Path to existing metrics file. |
| `_metrics_cfg` | | Metrics configuration (unused). |
| RETURNS | DESCRIPTION |
|---|---|
| `bool` | Always returns `True` (recompute metrics). |
Source code in src/metrics/evaluate_imputation_metrics.py
### metrics_utils
#### Key Functions
| Function | Description |
|---|---|
| `evaluate_imputation_metrics` | Compute MAE, RMSE for imputation |
| `compute_reconstruction_error` | Signal reconstruction quality |