anomaly_detection¶
Outlier detection methods for pupillary light reflex (PLR) signals.
Overview¶
This module provides 15 outlier detection methods, grouped into the following categories:
- Ground Truth: Human-annotated masks
- Foundation Models: MOMENT, UniTS, TimesNet
- Traditional: LOF, OneClassSVM, PROPHET
- Ensembles: Voting combinations
Main Entry Point¶
flow_anomaly_detection
¶
flow_anomaly_detection
¶
Run the complete anomaly detection flow for PLR data.
Orchestrates outlier detection across multiple hyperparameter configurations and creates ensembles of the individual models.
| PARAMETER | DESCRIPTION |
|---|---|
cfg
|
Hydra configuration containing model and experiment parameters.
TYPE:
|
df
|
Input PLR data with the following columns:
- 'pupil_orig': Original recording from the pupillometer. The software has already rejected some clear artifacts (set to null), but blink artifacts remain.
- 'pupil_raw': Output of anomaly detection (ground truth for modeling). All outliers are set to null for subsequent imputation.
- 'gt': Ground truth for imputation, produced by manually supervised imputation (manually placed points plus MissForest) and denoised with CEEMD. This signal lacks the high-frequency noise present in the raw signal.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
The input DataFrame, currently returned unchanged; results are logged to MLflow. |
Notes
This function:
1. Runs outlier detection for each hyperparameter configuration
2. Logs results to MLflow
3. Creates ensemble models from individual detectors
Source code in src/anomaly_detection/flow_anomaly_detection.py
Core Functions¶
anomaly_detection
¶
outlier_detection_selector
¶
outlier_detection_selector(
df: DataFrame,
cfg: DictConfig,
experiment_name: str,
run_name: str,
model_name: str,
)
Select and execute outlier detection method.
Dispatches to the appropriate outlier detection implementation based on model_name. Supports foundation models (MOMENT, UniTS, TimesNet), traditional methods (LOF, OneClassSVM), Prophet, and SigLLM.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
Input PLR data with columns: pupil_raw, pupil_gt, etc.
TYPE:
|
cfg
|
Full Hydra configuration.
TYPE:
|
experiment_name
|
MLflow experiment name for tracking.
TYPE:
|
run_name
|
MLflow run name.
TYPE:
|
model_name
|
Name of outlier detection method. One of: 'MOMENT', 'TimesNet', 'UniTS', 'LOF', 'OneClassSVM', 'PROPHET', 'SigLLM'.
TYPE:
|
Notes
SubPCA/EIF (TSB_AD) has been archived and was not used in the final paper; see archived/TSB_AD/.
| RETURNS | DESCRIPTION |
|---|---|
tuple
|
(outlier_artifacts, model) where:
- outlier_artifacts: dict with detection results and metrics
- model: trained outlier detection model |
| RAISES | DESCRIPTION |
|---|---|
NotImplementedError
|
If model_name is not supported. |
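The dispatch described above can be pictured as a lookup from model_name to a runner function. The following is a minimal sketch, not the module's actual code: the runner functions and the RUNNERS table are hypothetical stand-ins, and only two of the supported names are wired up.

```python
from typing import Callable, Dict


def run_lof() -> str:
    """Hypothetical stand-in for the LOF implementation."""
    return "LOF"


def run_prophet() -> str:
    """Hypothetical stand-in for the Prophet implementation."""
    return "PROPHET"


# Hypothetical lookup table; the real selector covers more methods.
RUNNERS: Dict[str, Callable[[], str]] = {
    "LOF": run_lof,
    "PROPHET": run_prophet,
}


def select_runner(model_name: str) -> Callable[[], str]:
    """Return the runner for model_name or raise NotImplementedError."""
    if model_name not in RUNNERS:
        raise NotImplementedError(
            f"Unsupported outlier detection method: {model_name!r}"
        )
    return RUNNERS[model_name]
```

An unknown name such as 'SubPCA' raises NotImplementedError here, which mirrors the documented behaviour for unsupported methods.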
Source code in src/anomaly_detection/anomaly_detection.py
outlier_detection_PLR_workflow
¶
outlier_detection_PLR_workflow(
df: DataFrame,
cfg: DictConfig,
experiment_name: str,
run_name: str,
) -> dict
Run the complete outlier detection workflow for PLR data.
Orchestrates the outlier detection pipeline:
1. Check whether recomputation is needed (as opposed to loading existing results)
2. Run outlier detection if needed
3. Log results to MLflow
| PARAMETER | DESCRIPTION |
|---|---|
df
|
Input PLR data.
TYPE:
|
cfg
|
Full Hydra configuration.
TYPE:
|
experiment_name
|
MLflow experiment name.
TYPE:
|
run_name
|
MLflow run name.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Outlier detection artifacts including masks and metrics. |
Source code in src/anomaly_detection/anomaly_detection.py
anomaly_utils
¶
pick_just_one_light_vector
¶
Extract a single light vector from the dataset.
Since all subjects share the same light stimulus timing, we only need one representative light vector for analysis.
| PARAMETER | DESCRIPTION |
|---|---|
light
|
Series containing light stimulus arrays for each color channel. Each value is a 2D array whose first dimension indexes subjects.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary with the same keys as input, but with 1D arrays (single subject's light vector for each channel). |
Source code in src/anomaly_detection/anomaly_utils.py
get_data_for_sklearn_anomaly_models
¶
get_data_for_sklearn_anomaly_models(
df: DataFrame, cfg: DictConfig, train_on: str
) -> Tuple[
ndarray, ndarray, ndarray, ndarray, Dict[str, ndarray]
]
Prepare data for sklearn-based anomaly detection models.
Extracts and formats training and test data from a Polars DataFrame for use with traditional machine learning outlier detection methods.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
Input PLR data containing pupil signals and labels.
TYPE:
|
cfg
|
Hydra configuration containing data processing parameters.
TYPE:
|
train_on
|
Column name specifying which pupil signal to use for training (e.g., 'pupil_orig', 'pupil_raw').
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
tuple
|
A tuple containing:
- X (np.ndarray): Training data array of shape (n_subjects, n_timepoints).
- y (np.ndarray): Training labels (outlier mask) of the same shape as X.
- X_test (np.ndarray): Test data array.
- y_test (np.ndarray): Test labels (outlier mask).
- light (dict): Light stimulus timing vectors for each color channel. |
Source code in src/anomaly_detection/anomaly_utils.py
sort_anomaly_detection_runs_ensemble
¶
sort_anomaly_detection_runs_ensemble(
mlflow_runs: DataFrame,
best_metric_cfg: DictConfig,
sort_by: str,
task: str,
) -> Series
Sort MLflow runs for ensemble anomaly detection by specified metric.
| PARAMETER | DESCRIPTION |
|---|---|
mlflow_runs
|
DataFrame containing MLflow run information.
TYPE:
|
best_metric_cfg
|
Configuration specifying which metric to use and sort direction.
TYPE:
|
sort_by
|
Sorting strategy. Currently only 'best_metric' is supported.
TYPE:
|
task
|
Task name for metric column lookup (e.g., 'outlier_detection').
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Series
|
The best run according to the specified sorting criteria. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If sort_by is not 'best_metric' or the sort direction is unknown. |
Source code in src/anomaly_detection/anomaly_utils.py
sort_anomaly_detection_runs
¶
Sort MLflow anomaly detection runs by time or loss metric.
| PARAMETER | DESCRIPTION |
|---|---|
mlflow_runs
|
DataFrame containing MLflow run information.
TYPE:
|
best_string
|
Column name for the loss metric when sorting by 'best_loss'.
TYPE:
|
sort_by
|
Sorting strategy: 'start_time' for most recent, 'best_loss' for lowest loss.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Series
|
The best run according to the specified sorting criteria. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If sort_by is not 'start_time' or 'best_loss'. |
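The two sorting strategies can be illustrated with plain dictionaries standing in for rows of the MLflow runs DataFrame. This is a sketch under that assumption; the function name pick_run and the record layout are hypothetical, and the real function operates on a DataFrame and returns a Series.

```python
from typing import Any, Dict, List


def pick_run(
    runs: List[Dict[str, Any]], best_string: str, sort_by: str
) -> Dict[str, Any]:
    """Pick one run: the most recent, or the one with the lowest loss."""
    if sort_by == "start_time":
        # Most recent run first (ISO timestamps compare correctly as strings).
        return max(runs, key=lambda run: run["start_time"])
    if sort_by == "best_loss":
        # Lowest value of the loss column named by best_string.
        return min(runs, key=lambda run: run[best_string])
    raise ValueError(f"Unknown sort_by: {sort_by!r}")


runs = [
    {"run_id": "a", "start_time": "2024-01-01T10:00:00", "best_loss": 0.30},
    {"run_id": "b", "start_time": "2024-01-02T10:00:00", "best_loss": 0.45},
]
latest = pick_run(runs, best_string="best_loss", sort_by="start_time")
lowest = pick_run(runs, best_string="best_loss", sort_by="best_loss")
```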
Notes
To be merged eventually with the newer sort_anomaly_detection_runs_ensemble().
Source code in src/anomaly_detection/anomaly_utils.py
get_anomaly_detection_run
¶
get_anomaly_detection_run(
experiment_name: str,
cfg: DictConfig,
sort_by: str = "start_time",
best_string: str = "best_loss",
best_metric_cfg: Optional[DictConfig] = None,
) -> Optional[Series]
Retrieve a previous anomaly detection run from MLflow.
Searches for existing runs matching the current configuration and returns the best one according to the specified sorting criteria.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
Name of the MLflow experiment to search.
TYPE:
|
cfg
|
Hydra configuration for determining run name.
TYPE:
|
sort_by
|
Sorting strategy: 'start_time' or 'best_loss'. Default is 'start_time'.
TYPE:
|
best_string
|
Column name for loss metric. Default is 'best_loss'.
TYPE:
|
best_metric_cfg
|
Configuration for ensemble metric sorting. Default is None.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Series or None
|
The matching MLflow run as a Series, or None if no matching run found. |
Source code in src/anomaly_detection/anomaly_utils.py
if_remote_anomaly_detection
¶
if_remote_anomaly_detection(
try_to_recompute: bool,
_anomaly_cfg: DictConfig,
experiment_name: str,
cfg: DictConfig,
) -> bool
Determine whether to recompute anomaly detection or use cached results.
| PARAMETER | DESCRIPTION |
|---|---|
try_to_recompute
|
If True, always recompute regardless of cached results.
TYPE:
|
_anomaly_cfg
|
Anomaly detection configuration (currently unused).
TYPE:
|
experiment_name
|
MLflow experiment name to check for existing runs.
TYPE:
|
cfg
|
Full Hydra configuration.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if anomaly detection should be (re)computed, False if cached results should be used. |
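The decision can be sketched as follows. The source only states what the return value means, so the rule that a missing cached run forces computation is an assumption, as are the names should_recompute and find_cached_run (a hypothetical callable standing in for the MLflow lookup).

```python
from typing import Callable, Optional


def should_recompute(
    try_to_recompute: bool,
    find_cached_run: Callable[[], Optional[dict]],
) -> bool:
    """Return True when detection should run, False when a cache can be used."""
    if try_to_recompute:
        # Explicit request: always recompute, ignore any cached results.
        return True
    # Assumption: with no cached run available, computation is the only option.
    return find_cached_run() is None
```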
Source code in src/anomaly_detection/anomaly_utils.py
save_outlier_detection_dataframe_to_mlflow
¶
save_outlier_detection_dataframe_to_mlflow(
df: DataFrame,
experiment_name: str,
_previous_experiment_name: str,
cfg: DictConfig,
copy_orig_db: bool = False,
) -> None
Save outlier detection results as a DuckDB database to MLflow.
Exports the dataframe to DuckDB format and logs it as an MLflow artifact for later retrieval and analysis.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
Polars DataFrame containing outlier detection results.
TYPE:
|
experiment_name
|
MLflow experiment name for logging.
TYPE:
|
_previous_experiment_name
|
Name of the previous experiment (currently unused; reserved for future use).
TYPE:
|
cfg
|
Hydra configuration.
TYPE:
|
copy_orig_db
|
Whether to copy the original database. Default is False.
TYPE:
|
Notes
TODO: Not currently needed for outlier detection, but could be useful for saving results as DuckDB so they can be inspected without re-running.
Source code in src/anomaly_detection/anomaly_utils.py
get_best_run
¶
Get the best (first) run from an MLflow experiment.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
Name of the MLflow experiment to search.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Series
|
The first run found in the experiment. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If no runs are found in the experiment. |
Notes
Currently picks the first run found (often the only one). Add filters if you have multiple dataset versions or different filter requirements.
Source code in src/anomaly_detection/anomaly_utils.py
log_anomaly_model_as_mlflow_artifact
¶
Log a trained anomaly detection model to MLflow as an artifact.
| PARAMETER | DESCRIPTION |
|---|---|
checkpoint_file
|
Path to the model checkpoint file.
TYPE:
|
run_name
|
Name of the MLflow run (for logging purposes).
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
Exception
|
If the artifact cannot be logged to MLflow. |
Notes
This can be slow for large models (e.g., 1.3 GB).
Source code in src/anomaly_detection/anomaly_utils.py
print_available_artifacts
¶
Print available artifacts at the given path.
| PARAMETER | DESCRIPTION |
|---|---|
path
|
Full path to an artifact file.
TYPE:
|
Notes
Currently a stub that only extracts the directory and filename from the path.
Source code in src/anomaly_detection/anomaly_utils.py
get_artifact
¶
get_artifact(
run_id: str,
run_name: str,
model_name: str,
subdir: str = "outlier_detection",
) -> Optional[str]
Download an artifact from MLflow by run ID and subdirectory.
| PARAMETER | DESCRIPTION |
|---|---|
run_id
|
MLflow run ID.
TYPE:
|
run_name
|
Name of the MLflow run.
TYPE:
|
model_name
|
Name of the model (used for filename generation).
TYPE:
|
subdir
|
Artifact subdirectory: 'outlier_detection', 'model', 'imputation', 'baseline_model', or 'metrics'. Default is 'outlier_detection'.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str or None
|
Local path to the downloaded artifact, or None for baseline_model with ensembled input. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If subdir is unknown. |
Exception
|
If artifact cannot be downloaded. |
Source code in src/anomaly_detection/anomaly_utils.py
check_outlier_detection_artifact
¶
Validate the structure of outlier detection artifacts.
| PARAMETER | DESCRIPTION |
|---|---|
outlier_artifacts
|
Dictionary containing outlier detection results with 'outlier_results' key.
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
AssertionError
|
If the artifact structure is invalid. |
Source code in src/anomaly_detection/anomaly_utils.py
check_outlier_results
¶
Validate the structure of outlier results dictionary.
| PARAMETER | DESCRIPTION |
|---|---|
outlier_results
|
Dictionary containing per-split outlier detection results.
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
AssertionError
|
If the results structure is invalid. |
Source code in src/anomaly_detection/anomaly_utils.py
check_split_results
¶
Validate consistency between flat and array representations.
Ensures that the number of samples in flattened arrays matches the total size of the original arrays.
| PARAMETER | DESCRIPTION |
|---|---|
split_results
|
Dictionary containing 'arrays_flat' and 'arrays' with prediction results.
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
AssertionError
|
If sample counts do not match between representations. |
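A minimal sketch of this consistency check, using nested lists in place of NumPy arrays. The dictionary layout (one entry per array name under both 'arrays_flat' and 'arrays') is an assumption made for illustration, and check_flat_matches_arrays is a hypothetical name.

```python
from typing import Dict, List


def check_flat_matches_arrays(
    arrays_flat: Dict[str, List[float]],
    arrays: Dict[str, List[List[float]]],
) -> None:
    """Assert that each flat array holds as many samples as its 2D source."""
    for name, rows in arrays.items():
        total = sum(len(row) for row in rows)  # n_subjects * n_timepoints
        assert len(arrays_flat[name]) == total, (
            f"{name}: {len(arrays_flat[name])} flat samples, expected {total}"
        )


arrays = {"pred_mask": [[0, 1, 0], [1, 0, 0]]}
arrays_flat = {"pred_mask": [0, 1, 0, 1, 0, 0]}
check_flat_matches_arrays(arrays_flat, arrays)  # passes silently
```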
Source code in src/anomaly_detection/anomaly_utils.py
get_no_subjects_in_outlier_artifacts
¶
Get the number of subjects from outlier detection artifacts.
| PARAMETER | DESCRIPTION |
|---|---|
outlier_artifacts
|
Dictionary containing outlier detection results.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
int
|
Number of subjects in the training split. |
Source code in src/anomaly_detection/anomaly_utils.py
outlier_detection_artifacts_dict
¶
outlier_detection_artifacts_dict(
mlflow_run: Series, model_name: str, task: str
) -> Dict[str, Any]
Load outlier detection artifacts from an MLflow run.
| PARAMETER | DESCRIPTION |
|---|---|
mlflow_run
|
MLflow run information containing run_id and run name.
TYPE:
|
model_name
|
Name of the outlier detection model.
TYPE:
|
task
|
Task subdirectory for artifacts.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Loaded outlier detection artifacts dictionary. |
Warnings
Logs a warning if the artifact file size exceeds 2 GB.
Source code in src/anomaly_detection/anomaly_utils.py
get_moment_model_from_mlflow_artifacts
¶
get_moment_model_from_mlflow_artifacts(
run_id: str,
run_name: str,
model: Module,
device: str,
cfg: DictConfig,
task: str,
model_name: str = "MOMENT",
) -> Module
Load a MOMENT model from MLflow artifacts.
Downloads the model checkpoint and loads it into the provided model object, verifying that the weights have changed from the initial state.
| PARAMETER | DESCRIPTION |
|---|---|
run_id
|
MLflow run ID.
TYPE:
|
run_name
|
Name of the MLflow run.
TYPE:
|
model
|
Model object to load weights into.
TYPE:
|
device
|
Device to load the model onto ('cpu' or 'cuda').
TYPE:
|
cfg
|
Hydra configuration.
TYPE:
|
task
|
Task name (e.g., 'outlier_detection', 'imputation').
TYPE:
|
model_name
|
Name of the model. Default is 'MOMENT'.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Module
|
Model with loaded weights. |
| RAISES | DESCRIPTION |
|---|---|
AssertionError
|
If loaded weights are identical to pretrained weights. |
Source code in src/anomaly_detection/anomaly_utils.py
get_anomaly_detection_results_from_mlflow
¶
get_anomaly_detection_results_from_mlflow(
experiment_name: str,
cfg: DictConfig,
run_name: str,
model_name: str,
get_model: bool = False,
) -> Tuple[Dict[str, Any], Optional[Module]]
Retrieve anomaly detection results and optionally the model from MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
MLflow experiment name.
TYPE:
|
cfg
|
Hydra configuration.
TYPE:
|
run_name
|
Name of the MLflow run.
TYPE:
|
model_name
|
Name of the outlier detection model.
TYPE:
|
get_model
|
Whether to also load the trained model. Default is False.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
tuple
|
A tuple containing:
- outlier_artifacts (dict): Loaded outlier detection artifacts.
- model (torch.nn.Module or None): The trained model if get_model=True, otherwise None. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If no matching run is found. |
NotImplementedError
|
If get_model=True for finetuned models (not yet implemented). |
Source code in src/anomaly_detection/anomaly_utils.py
get_source_dataframe_from_mlflow
¶
Import anomaly detection results from MLflow as a Polars DataFrame.
Downloads the DuckDB artifact from MLflow and loads it as a DataFrame.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
MLflow experiment name to retrieve data from.
TYPE:
|
cfg
|
Hydra configuration for data loading.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
Polars DataFrame containing the outlier-detected data with train and test splits. |
| RAISES | DESCRIPTION |
|---|---|
AssertionError
|
If the DataFrame does not contain exactly 2 splits (train and test). |
Notes
Placeholder task for importing anomaly detection results from MLflow.
Source code in src/anomaly_detection/anomaly_utils.py
Traditional Methods¶
outlier_sklearn
¶
subjectwise_LOF
¶
Apply Local Outlier Factor to a single subject's data.
| PARAMETER | DESCRIPTION |
|---|---|
X_subj
|
Time series data for one subject, shape (n_timepoints, 1).
TYPE:
|
clf
|
Configured LOF classifier instance.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ndarray
|
Binary outlier predictions (1 = outlier, 0 = inlier). |
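scikit-learn outlier detectors such as LocalOutlierFactor label inliers as 1 and outliers as -1, whereas the return value here uses 1 for outliers and 0 for inliers. The conversion between the two conventions might look like the sketch below; whether the module performs it exactly this way is an assumption, and to_binary_mask is a hypothetical helper name.

```python
from typing import List, Sequence


def to_binary_mask(sklearn_labels: Sequence[int]) -> List[int]:
    """Map scikit-learn labels (-1 outlier, 1 inlier) to 1 = outlier, 0 = inlier."""
    return [1 if label == -1 else 0 for label in sklearn_labels]


# Labels in the form returned by fit_predict for a five-sample series.
mask = to_binary_mask([1, 1, -1, 1, -1])
```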
Source code in src/anomaly_detection/outlier_sklearn.py
subjectwise_HPO
¶
Apply hyperparameter-optimized outlier detection subject by subject.
| PARAMETER | DESCRIPTION |
|---|---|
X
|
Input data array of shape (n_subjects, n_timepoints).
TYPE:
|
clf
|
Configured outlier detection classifier.
TYPE:
|
model_name
|
Name of the model ('LOF' currently supported).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ndarray
|
Binary outlier predictions of shape (n_subjects, n_timepoints). |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If model_name is not supported. |
Source code in src/anomaly_detection/outlier_sklearn.py
datasetwise_HPO
¶
Apply outlier detection on the entire dataset at once.
Flattens all subjects' data and fits a single model, then reshapes predictions back to original shape.
| PARAMETER | DESCRIPTION |
|---|---|
X
|
Input data array of shape (n_subjects, n_timepoints).
TYPE:
|
clf
|
Configured outlier detection classifier.
TYPE:
|
model_name
|
Name of the model ('LOF' currently supported).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ndarray
|
Binary outlier predictions of shape (n_subjects, n_timepoints). |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If model_name is not supported. |
Source code in src/anomaly_detection/outlier_sklearn.py
get_LOF
¶
get_LOF(
X: ndarray,
params: Dict[str, Any],
subjectwise: bool = True,
) -> Tuple[ndarray, LocalOutlierFactor]
Run Local Outlier Factor outlier detection.
| PARAMETER | DESCRIPTION |
|---|---|
X
|
Input data array of shape (n_subjects, n_timepoints).
TYPE:
|
params
|
Parameters for LocalOutlierFactor (n_neighbors, contamination, etc.).
TYPE:
|
subjectwise
|
If True, fit LOF independently for each subject. If False, fit on all data at once. Default is True (more relevant for deployment).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
tuple
|
A tuple containing:
- preds (np.ndarray): Binary outlier predictions of shape (n_subjects, n_timepoints).
- clf (LocalOutlierFactor): The configured LOF classifier instance. |
Source code in src/anomaly_detection/outlier_sklearn.py
get_outlier_y_from_data_dict
¶
Extract outlier labels from data dictionary by difficulty level.
| PARAMETER | DESCRIPTION |
|---|---|
data_dict
|
Data dictionary containing labels for each split.
TYPE:
|
label
|
Difficulty level: 'all', 'granular', 'easy', or 'medium'.
TYPE:
|
split
|
Data split: 'train' or 'test'.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ndarray
|
Boolean outlier mask array. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If label is not a recognized difficulty level. |
Source code in src/anomaly_detection/outlier_sklearn.py
eval_on_all_outlier_difficulty_levels
¶
eval_on_all_outlier_difficulty_levels(
data_dict: Dict[str, Any], preds: ndarray, split: str
) -> Dict[str, Dict[str, Any]]
Evaluate outlier detection across all difficulty levels.
| PARAMETER | DESCRIPTION |
|---|---|
data_dict
|
Data dictionary containing labels for each split.
TYPE:
|
preds
|
Binary outlier predictions.
TYPE:
|
split
|
Data split: 'train' or 'test'.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary with metrics for each difficulty level ('all', 'easy', 'medium'). Each entry contains 'scalars' (metrics) and 'arrays' (predictions, labels). |
Source code in src/anomaly_detection/outlier_sklearn.py
get_outlier_metrics
¶
get_outlier_metrics(
score: Optional[ndarray],
preds: ndarray,
y: ndarray,
df: Optional[DataFrame] = None,
cfg: Optional[DictConfig] = None,
split: Optional[str] = None,
) -> Dict[str, Any]
Compute outlier detection metrics.
| PARAMETER | DESCRIPTION |
|---|---|
score
|
Anomaly scores (if available).
TYPE:
|
preds
|
Binary outlier predictions.
TYPE:
|
y
|
Ground truth outlier labels.
TYPE:
|
df
|
Original DataFrame (for future use). Default is None.
TYPE:
|
cfg
|
Configuration (for future use). Default is None.
TYPE:
|
split
|
Data split name (for future use). Default is None.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary containing: - 'scalars': Scalar metrics (F1, precision, recall, etc.) - 'arrays': Prediction mask and true labels. |
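For reference, the precision, recall, and F1 scalars named above can be computed from two binary masks as sketched here. This is a plain-Python illustration with a hypothetical function name; the exact set of metrics and the library the module uses to compute them are not stated in the source.

```python
from typing import Dict, Sequence


def binary_outlier_scores(preds: Sequence[int], y: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, and F1 with outliers (label 1) as the positive class."""
    tp = sum(1 for p, t in zip(preds, y) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, y) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, y) if p == 0 and t == 1)
    # Guard against empty denominators when nothing is predicted or labeled.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


scores = binary_outlier_scores(preds=[1, 1, 0, 0], y=[1, 0, 1, 0])
```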
Source code in src/anomaly_detection/outlier_sklearn.py
LOF_wrapper
¶
LOF_wrapper(
X: ndarray,
y: ndarray,
X_test: ndarray,
y_test: ndarray,
best_params: Dict[str, Any],
) -> Dict[str, Dict[str, Any]]
Run LOF outlier detection on train and test splits.
| PARAMETER | DESCRIPTION |
|---|---|
X
|
Training data of shape (n_subjects, n_timepoints).
TYPE:
|
y
|
Training labels (outlier mask).
TYPE:
|
X_test
|
Test data of shape (n_subjects, n_timepoints).
TYPE:
|
y_test
|
Test labels (outlier mask).
TYPE:
|
best_params
|
Optimized LOF parameters (n_neighbors, contamination).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Metrics dictionary with keys 'train', 'test', 'outlier_train', 'outlier_test'. |
Source code in src/anomaly_detection/outlier_sklearn.py
sklearn_outlier_hyperparameter_tuning
¶
sklearn_outlier_hyperparameter_tuning(
model_cfg: DictConfig,
X: ndarray,
y: ndarray,
model_name: str,
contamination: float,
subjectwise: bool = True,
) -> Dict[str, int]
Perform grid search hyperparameter tuning for sklearn outlier detectors.
| PARAMETER | DESCRIPTION |
|---|---|
model_cfg
|
Model configuration containing search space parameters.
TYPE:
|
X
|
Training data of shape (n_subjects, n_timepoints).
TYPE:
|
y
|
Ground truth outlier labels.
TYPE:
|
model_name
|
Name of the model ('LOF' currently supported).
TYPE:
|
contamination
|
Expected proportion of outliers in the data.
TYPE:
|
subjectwise
|
Whether to fit model per subject. Default is True.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Best hyperparameters found (e.g., {'n_neighbors': 200}). |
| RAISES | DESCRIPTION |
|---|---|
NotImplementedError
|
If model_name is 'OneClassSVM' (not yet implemented). |
ValueError
|
If model_name is not supported. |
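A grid search over n_neighbors reduces to scoring each candidate and keeping the best one. The sketch below assumes a caller-supplied scoring function where higher is better; the candidate values, the function names, and the scoring criterion are all hypothetical, since the source does not describe how candidates are scored.

```python
from typing import Callable, Dict, Sequence


def grid_search_n_neighbors(
    candidates: Sequence[int],
    score_fn: Callable[[int], float],
) -> Dict[str, int]:
    """Return the candidate n_neighbors with the highest score."""
    best = max(candidates, key=score_fn)
    return {"n_neighbors": best}


# Toy scoring function that peaks at 200, for demonstration only.
best_params = grid_search_n_neighbors(
    candidates=[50, 100, 200, 400],
    score_fn=lambda k: -abs(k - 200),
)
```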
Source code in src/anomaly_detection/outlier_sklearn.py
subjectwise_OneClassSVM
¶
Apply One-Class SVM outlier detection subject by subject.
| PARAMETER | DESCRIPTION |
|---|---|
X
|
Input data of shape (n_subjects, n_timepoints).
TYPE:
|
params
|
Parameters for OneClassSVM (gamma, kernel, nu).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ndarray
|
Binary outlier predictions of shape (n_subjects, n_timepoints). |
Source code in src/anomaly_detection/outlier_sklearn.py
OneClassSVM_wrapper
¶
OneClassSVM_wrapper(
X: ndarray,
y: ndarray,
X_test: ndarray,
y_test: ndarray,
params: Dict[str, Any],
) -> Dict[str, Dict[str, Any]]
Run One-Class SVM outlier detection on train and test splits.
| PARAMETER | DESCRIPTION |
|---|---|
X
|
Training data of shape (n_subjects, n_timepoints).
TYPE:
|
y
|
Training labels (outlier mask).
TYPE:
|
X_test
|
Test data of shape (n_subjects, n_timepoints).
TYPE:
|
y_test
|
Test labels (outlier mask).
TYPE:
|
params
|
OneClassSVM parameters (gamma, kernel, nu).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Metrics dictionary with keys 'train', 'test', 'outlier_train', 'outlier_test'. |
Source code in src/anomaly_detection/outlier_sklearn.py
mlflow_log_params
¶
Log parameters to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
params
|
Dictionary of parameter names and values to log.
TYPE:
|
Notes
Logs a warning if a parameter cannot be logged.
Source code in src/anomaly_detection/outlier_sklearn.py
log_outlier_pickled_artifact
¶
Save and log outlier detection results as a pickled artifact to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
metrics
|
Outlier detection metrics and predictions to save.
TYPE:
|
model_name
|
Name of the model (used for filename).
TYPE:
|
Source code in src/anomaly_detection/outlier_sklearn.py
log_prophet_model
¶
Log a Prophet model to MLflow (placeholder).
| PARAMETER | DESCRIPTION |
|---|---|
model
|
The Prophet model to log.
TYPE:
|
model_name
|
Name of the model.
TYPE:
|
Notes
Currently a stub function; model logging not implemented.
Source code in src/anomaly_detection/outlier_sklearn.py
log_outlier_mlflow_artifacts
¶
log_outlier_mlflow_artifacts(
metrics: Dict[str, Dict[str, Any]],
model: Optional[Any],
model_name: str,
) -> None
Log outlier detection metrics and artifacts to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
metrics
|
Dictionary containing metrics for each split with 'scalars' and 'arrays'.
TYPE:
|
model
|
The trained model (if applicable).
TYPE:
|
model_name
|
Name of the model.
TYPE:
|
Notes
Logs scalar metrics to MLflow. For array values (confidence intervals), logs separate _lo and _hi metrics.
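The _lo and _hi naming can be sketched as a flattening step that runs before metrics are handed to MLflow. This is an illustration only: it assumes each confidence interval is a two-element (lower, upper) sequence, and flatten_scalars is a hypothetical helper that does not call MLflow itself.

```python
from typing import Any, Dict


def flatten_scalars(scalars: Dict[str, Any]) -> Dict[str, float]:
    """Turn scalar and interval metrics into a flat name -> float mapping."""
    flat: Dict[str, float] = {}
    for name, value in scalars.items():
        if isinstance(value, (list, tuple)):
            # Assumed layout: (lower bound, upper bound) of a confidence interval.
            lower, upper = value
            flat[f"{name}_lo"] = float(lower)
            flat[f"{name}_hi"] = float(upper)
        else:
            flat[name] = float(value)
    return flat


flat_metrics = flatten_scalars({"f1": 0.8, "f1_CI": (0.7, 0.9)})
```

Each key of the resulting dictionary could then be logged as one MLflow metric.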
Source code in src/anomaly_detection/outlier_sklearn.py
outlier_sklearn_wrapper
¶
outlier_sklearn_wrapper(
df: DataFrame,
cfg: DictConfig,
model_cfg: DictConfig,
experiment_name: str,
run_name: str,
model_name: str,
) -> Tuple[Dict[str, Dict[str, Any]], None]
Run sklearn-based outlier detection with hyperparameter tuning.
Supports Local Outlier Factor (LOF) and One-Class SVM methods. For LOF, performs grid search hyperparameter tuning on n_neighbors.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
Input PLR data containing pupil signals.
TYPE:
|
cfg
|
Full Hydra configuration.
TYPE:
|
model_cfg
|
Model-specific configuration.
TYPE:
|
experiment_name
|
MLflow experiment name.
TYPE:
|
run_name
|
MLflow run name.
TYPE:
|
model_name
|
Name of the model: 'LOF' or 'OneClassSVM'.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
tuple
|
A tuple containing:
- metrics (dict): Outlier detection metrics for train and test splits.
- model (None): No model object is returned for sklearn methods. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If model_name is not supported. |
References
Hyperparameter tuning approach: https://github.com/vsatyakumar/automatic-local-outlier-factor-tuning https://arxiv.org/abs/1902.00567
Source code in src/anomaly_detection/outlier_sklearn.py
outlier_prophet
¶
create_ds
¶
Create datetime series for Prophet from sample indices.
Converts sample indices to datetime objects based on the sampling rate, which Prophet requires for time series modeling.
| PARAMETER | DESCRIPTION |
|---|---|
X_subj
|
Subject data array (used for length).
TYPE:
|
fps
|
Sampling rate in frames per second. Default is 30.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
list
|
List of datetime objects representing each timepoint. |
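The index-to-datetime conversion can be sketched as below. The start date is arbitrary and chosen here only for illustration, and make_timestamps is a hypothetical name that takes the number of samples directly instead of the subject array.

```python
from datetime import datetime, timedelta
from typing import List


def make_timestamps(
    n_samples: int,
    fps: float = 30.0,
    start: datetime = datetime(2000, 1, 1),  # arbitrary origin (assumption)
) -> List[datetime]:
    """Return one datetime per sample, spaced 1 / fps seconds apart."""
    return [start + timedelta(seconds=i / fps) for i in range(n_samples)]


timestamps = make_timestamps(31, fps=30.0)
```

At 30 frames per second, sample 30 falls exactly one second after sample 0.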
Source code in src/anomaly_detection/outlier_prophet.py
reject_outliers
¶
Identify outliers based on Prophet prediction uncertainty.
Points where the prediction error exceeds a factor of the uncertainty interval are flagged as outliers.
| PARAMETER | DESCRIPTION |
|---|---|
pred
|
Prophet prediction DataFrame with columns 'y', 'yhat', 'yhat_upper', 'yhat_lower'.
TYPE:
|
model_cfg
|
Model configuration containing 'uncertainty_factor'.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
tuple
|
A tuple containing:
- y (np.ndarray): Original values with outliers set to NaN.
- pred_mask (np.ndarray): Binary mask (1 = outlier, 0 = inlier). |
References
https://medium.com/@reza.rajabi/outlier-and-anomaly-detection-using-facebook-prophet-in-python-3a83d58b1bdf
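One way to express this rule is to compare the absolute prediction error with the width of the uncertainty interval scaled by uncertainty_factor. The sketch below works on plain lists and assumes that exact comparison; the real function reads the same quantities from the Prophet prediction DataFrame and may define the threshold differently.

```python
import math
from typing import List, Sequence, Tuple


def flag_outliers(
    y: Sequence[float],
    yhat: Sequence[float],
    yhat_lower: Sequence[float],
    yhat_upper: Sequence[float],
    uncertainty_factor: float,
) -> Tuple[List[float], List[int]]:
    """Return (values with outliers as NaN, binary mask with 1 = outlier)."""
    cleaned: List[float] = []
    mask: List[int] = []
    for obs, fit, lower, upper in zip(y, yhat, yhat_lower, yhat_upper):
        uncertainty = upper - lower
        # Assumed rule: error larger than factor * interval width is an outlier.
        is_outlier = abs(obs - fit) > uncertainty_factor * uncertainty
        mask.append(1 if is_outlier else 0)
        cleaned.append(math.nan if is_outlier else obs)
    return cleaned, mask


cleaned, mask = flag_outliers(
    y=[1.0, 5.0],
    yhat=[1.1, 1.0],
    yhat_lower=[0.5, 0.5],
    yhat_upper=[1.5, 1.5],
    uncertainty_factor=1.5,
)
```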
Source code in src/anomaly_detection/outlier_prophet.py
pad_input
¶
Pad input array by duplicating the last value.
| PARAMETER | DESCRIPTION |
|---|---|
X_subj
|
Input array of shape (n_timepoints, 1).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
ndarray
|
Padded array of shape (n_timepoints + 1, 1). |
Source code in src/anomaly_detection/outlier_prophet.py
plot_fitted_model
¶
Display Prophet model fit visualization.
| PARAMETER | DESCRIPTION |
|---|---|
model
|
Fitted Prophet model.
TYPE:
|
pred
|
Prophet prediction DataFrame.
TYPE:
|
Source code in src/anomaly_detection/outlier_prophet.py
create_prophet_df
¶
Create Prophet-compatible DataFrame from time series data.
| PARAMETER | DESCRIPTION |
|---|---|
X
|
Input data array of shape (n_timepoints, 1).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
DataFrame with columns 'ds' (datetime) and 'y' (values). |
| RAISES | DESCRIPTION |
|---|---|
NotImplementedError
|
If X has more than one channel (multivariate not supported). |
Source code in src/anomaly_detection/outlier_prophet.py
get_changepoints_from_light
¶
Extract changepoints from light stimulus timing for Prophet.
Identifies key physiological events (light onsets/offsets and maximum constriction points) to use as manual changepoints in Prophet.
| PARAMETER | DESCRIPTION |
|---|---|
light
|
Light stimulus data with 'Red' and 'Blue' channels.
TYPE:
|
df
|
Prophet DataFrame with 'ds' and 'y' columns.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Series
|
Sorted datetime changepoints including light onsets/offsets and maximum constriction times. |
References
https://facebook.github.io/prophet/docs/trend_changepoints.html https://github.com/facebook/prophet/issues/697
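The two kinds of events can be located with simple index arithmetic, as in this simplified sketch. It handles a single light channel, treats the global minimum of the pupil signal as maximum constriction, and returns sample indices, all of which are simplifying assumptions; the real function works with the 'Red' and 'Blue' channels and returns datetimes. Both function names are hypothetical.

```python
from typing import List, Sequence


def light_transition_indices(light: Sequence[float]) -> List[int]:
    """Indices where the stimulus switches between off (0) and on (> 0)."""
    return [
        i
        for i in range(1, len(light))
        if (light[i] > 0) != (light[i - 1] > 0)
    ]


def max_constriction_index(pupil: Sequence[float]) -> int:
    """Index of the smallest pupil size, taken here as maximum constriction."""
    return min(range(len(pupil)), key=lambda i: pupil[i])


light = [0, 0, 1, 1, 0, 0]
pupil = [5.0, 5.0, 4.2, 3.1, 3.6, 4.4]
changepoint_indices = sorted(
    set(light_transition_indices(light)) | {max_constriction_index(pupil)}
)
```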
Source code in src/anomaly_detection/outlier_prophet.py
add_manual_changepoints
¶
add_manual_changepoints(
auto_changepoints: Series,
changepoints: List[Any],
df: DataFrame,
) -> Series
Combine automatic and manual changepoints.
| PARAMETER | DESCRIPTION |
|---|---|
auto_changepoints
|
Changepoints automatically detected by Prophet.
TYPE:
|
changepoints
|
Manually specified changepoints.
TYPE:
|
df
|
Prophet DataFrame to match timestamps.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Series
|
Combined unique changepoints. |
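Merging the two sets of changepoints amounts to a de-duplicated union. The sketch below uses plain datetime lists and sorts the result; the sorting, the function name, and the omission of the df timestamp matching step are assumptions made to keep the example small.

```python
from datetime import datetime
from typing import List, Sequence


def combine_changepoints(
    auto_changepoints: Sequence[datetime],
    manual_changepoints: Sequence[datetime],
) -> List[datetime]:
    """Return the sorted union of both changepoint collections."""
    return sorted(set(auto_changepoints) | set(manual_changepoints))


t0 = datetime(2000, 1, 1, 0, 0, 0)
t1 = datetime(2000, 1, 1, 0, 0, 5)
t2 = datetime(2000, 1, 1, 0, 0, 9)
combined = combine_changepoints([t0, t2], [t1, t2])
```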
Source code in src/anomaly_detection/outlier_prophet.py
get_prophet_model
¶
Fit a Prophet model with optional manual changepoints.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Prophet DataFrame with 'ds' and 'y' columns. |
| light | Light stimulus data for manual changepoint extraction. |
| model_cfg | Model configuration with 'changepoint_prior_scale' and 'manual_changepoints' settings. |

| RETURNS | DESCRIPTION |
|---|---|
| Prophet | Fitted Prophet model. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the manual_changepoints method is not recognized. |
Notes
If manual_changepoints is 'light', extracts changepoints from light stimulus timing and refits the model with combined changepoints.
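Example

The configuration branch described in the note can be sketched as follows. This covers only the decision logic, not the initial fit or the refit with combined changepoints. `changepoint_prior_scale` and `changepoints` are genuine Prophet constructor arguments; the helper `resolve_prophet_kwargs` and the use of `None` to mean "no manual changepoints" are assumptions.

```python
from typing import Any, Dict, Optional, Sequence


def resolve_prophet_kwargs(
    model_cfg: Dict[str, Any],
    light_changepoints: Optional[Sequence[Any]] = None,
) -> Dict[str, Any]:
    """Translate the model config into Prophet constructor keyword arguments."""
    kwargs: Dict[str, Any] = {
        "changepoint_prior_scale": model_cfg["changepoint_prior_scale"],
    }
    method = model_cfg.get("manual_changepoints")
    if method is None:
        # Assumption: None leaves changepoint selection to Prophet.
        return kwargs
    if method == "light":
        if light_changepoints is None:
            raise ValueError("'light' changepoints requested but none supplied")
        kwargs["changepoints"] = list(light_changepoints)
        return kwargs
    raise ValueError(f"Unknown manual_changepoints method: {method!r}")


auto_kwargs = resolve_prophet_kwargs(
    {"changepoint_prior_scale": 0.05, "manual_changepoints": None}
)
light_kwargs = resolve_prophet_kwargs(
    {"changepoint_prior_scale": 0.05, "manual_changepoints": "light"},
    light_changepoints=["2024-01-01 00:00:02", "2024-01-01 00:00:04"],
)
```

The resulting dictionary could then be unpacked into the constructor, for example `Prophet(**light_kwargs)`, before calling `fit`.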
Source code in src/anomaly_detection/outlier_prophet.py
prophet_per_X
¶
prophet_per_X(
X: ndarray,
y: ndarray,
light: Dict[str, Any],
model_cfg: DictConfig,
model: Optional[Prophet] = None,
plot: bool = False,
) -> Tuple[ndarray, ndarray, int, Prophet]
Run Prophet outlier detection on a single time series.
| PARAMETER | DESCRIPTION |
|---|---|
| X | Input time series of shape (n_timepoints, 1). TYPE: ndarray |
| y | Ground truth labels (unused in detection; accepted for interface consistency). TYPE: ndarray |
| light | Light stimulus data for changepoint extraction. TYPE: Dict[str, Any] |
| model_cfg | Model configuration. TYPE: DictConfig |
| model | Pre-fitted model to use. If None, fits a new model. Default is None. TYPE: Optional[Prophet] |
| plot | Whether to display the fit visualization. Default is False. TYPE: bool |

| RETURNS | DESCRIPTION |
|---|---|
| tuple | A tuple containing: y (np.ndarray), original values with outliers set to NaN; pred_mask (np.ndarray), binary outlier mask; no_outliers (int), number of detected outliers; model (Prophet), the fitted Prophet model. |
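Example

A minimal sketch of interval-based outlier masking, assuming the detector flags points that fall outside a lower/upper prediction band (in Prophet, such a band is available as the `yhat_lower` and `yhat_upper` forecast columns). The function name `mask_outside_interval` is hypothetical, and the band here is a fixed toy value rather than a model forecast.

```python
import numpy as np


def mask_outside_interval(y, lower, upper):
    """Flag samples outside [lower, upper] and replace them with NaN."""
    y = np.asarray(y, dtype=float)
    pred_mask = ((y < lower) | (y > upper)).astype(int)
    y_clean = y.copy()
    y_clean[pred_mask == 1] = np.nan
    return y_clean, pred_mask, int(pred_mask.sum())


y = np.array([5.0, 5.1, 9.0, 4.9, 0.5])
lower = np.full_like(y, 4.5)  # stand-in for a forecast lower bound
upper = np.full_like(y, 5.5)  # stand-in for a forecast upper bound
y_clean, pred_mask, no_outliers = mask_outside_interval(y, lower, upper)
```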
Source code in src/anomaly_detection/outlier_prophet.py
prophet_per_split
¶
prophet_per_split(
X: ndarray,
y: ndarray,
light: Dict[str, Any],
model_cfg: DictConfig,
model: Optional[Prophet] = None,
) -> Tuple[ndarray, ndarray, ndarray]
Run Prophet outlier detection on all subjects in a split.
| PARAMETER | DESCRIPTION |
|---|---|
| X | Input data of shape (n_subjects, n_timepoints). TYPE: ndarray |
| y | Ground truth labels of shape (n_subjects, n_timepoints). TYPE: ndarray |
| light | Light stimulus data. TYPE: Dict[str, Any] |
| model_cfg | Model configuration. TYPE: DictConfig |
| model | Pre-fitted model to use for all subjects. Default is None. TYPE: Optional[Prophet] |

| RETURNS | DESCRIPTION |
|---|---|
| tuple | A tuple containing: X_cleaned (np.ndarray), cleaned signals with outliers as NaN; pred_masks (np.ndarray), binary outlier masks for all subjects; no_outliers_per_subject (np.ndarray), number of outliers detected per subject. |
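Example

The per-split pattern amounts to applying a single-series detector to each subject row and stacking the outputs. The sketch below illustrates that loop with a simple median-distance detector standing in for Prophet; both `detect_per_split` and `threshold_detector` are hypothetical names.

```python
from typing import Callable, Tuple

import numpy as np


def threshold_detector(series: np.ndarray, tol: float = 3.0):
    """Stand-in detector (not Prophet): flag points far from the median."""
    mask = (np.abs(series - np.median(series)) > tol).astype(int)
    cleaned = series.astype(float).copy()
    cleaned[mask == 1] = np.nan
    return cleaned, mask, int(mask.sum())


def detect_per_split(
    X: np.ndarray, detect_one: Callable
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Run a single-series detector on every subject row and stack the results."""
    cleaned, masks, counts = [], [], []
    for series in X:
        series_clean, mask, n_outliers = detect_one(series)
        cleaned.append(series_clean)
        masks.append(mask)
        counts.append(n_outliers)
    return np.vstack(cleaned), np.vstack(masks), np.array(counts)


X = np.array(
    [
        [5.0, 5.0, 20.0, 5.0],
        [4.0, 4.0, 4.0, 4.0],
        [3.0, 3.0, 3.0, -10.0],
    ]
)
X_cleaned, pred_masks, no_outliers_per_subject = detect_per_split(X, threshold_detector)
```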
Source code in src/anomaly_detection/outlier_prophet.py
prophet_dataset_per_split
¶
prophet_dataset_per_split(
X: ndarray,
y: ndarray,
X_test: ndarray,
y_test: ndarray,
light: Dict[str, Any],
model_cfg: DictConfig,
) -> Tuple[ndarray, int, ndarray, ndarray, Prophet]
Train Prophet on training data and apply to test set.
| PARAMETER | DESCRIPTION |
|---|---|
| X | Training data. TYPE: ndarray |
| y | Training labels. TYPE: ndarray |
| X_test | Test data. TYPE: ndarray |
| y_test | Test labels. TYPE: ndarray |
| light | Light stimulus data. TYPE: Dict[str, Any] |
| model_cfg | Model configuration. TYPE: DictConfig |

| RETURNS | DESCRIPTION |
|---|---|
| tuple | A tuple containing: pred_mask (np.ndarray), training outlier mask; no_outliers (int), number of training outliers; pred_masks_test (np.ndarray), test outlier masks; no_outliers_test (np.ndarray), number of outliers per test subject; model (Prophet), the trained Prophet model. |
Source code in src/anomaly_detection/outlier_prophet.py
outlier_prophet_wrapper
¶
outlier_prophet_wrapper(
df: DataFrame,
cfg: DictConfig,
model_cfg: DictConfig,
experiment_name: str,
run_name: str,
) -> Tuple[Dict[str, Dict[str, Any]], Optional[Prophet]]
Run Prophet-based outlier detection on PLR data.
Uses Facebook Prophet to model the time series trend and identifies outliers based on prediction uncertainty intervals.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Input PLR data. TYPE: DataFrame |
| cfg | Full Hydra configuration. TYPE: DictConfig |
| model_cfg | Prophet model configuration. TYPE: DictConfig |
| experiment_name | MLflow experiment name. TYPE: str |
| run_name | MLflow run name. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| tuple | A tuple containing: metrics (dict), outlier detection metrics for train and test splits; model (Prophet or None), the trained Prophet model (None for per_subject mode). |

| RAISES | DESCRIPTION |
|---|---|
| NotImplementedError | If train_method is 'datasetwise' (not yet implemented). |
| ValueError | If train_method is not recognized. |
Source code in src/anomaly_detection/outlier_prophet.py
TimesNet Integration¶
timesnet_wrapper
¶
log_mlflow_params
¶
Log TimesNet model parameters to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
| model | TimesNet model. |
| outlier_model_cfg | Model configuration containing PARAMS and MODEL sections. |
Source code in src/anomaly_detection/timesnet_wrapper.py
log_timesnet_mlflow_metrics
¶
Log TimesNet training metrics to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
| metrics | Dictionary of metrics per split containing 'global' scalar values. |
| results_best | Best epoch results containing losses per split. |
| best_epoch | Index of the best training epoch. |
Source code in src/anomaly_detection/timesnet_wrapper.py
timesnet_outlier_wrapper
¶
timesnet_outlier_wrapper(
df: DataFrame,
cfg: DictConfig,
outlier_model_cfg: DictConfig,
experiment_name: str,
run_name: str,
task: str = "outlier_detection",
model_name: str = "TimesNet",
) -> tuple[dict, Module]
Run TimesNet-based outlier detection on PLR data.
TimesNet uses temporal 2D variations for time series analysis. This wrapper handles data preparation, training, and MLflow logging.
| PARAMETER | DESCRIPTION |
|---|---|
| df | Input PLR data. TYPE: DataFrame |
| cfg | Full Hydra configuration. TYPE: DictConfig |
| outlier_model_cfg | TimesNet model configuration. TYPE: DictConfig |
| experiment_name | MLflow experiment name. TYPE: str |
| run_name | MLflow run name. TYPE: str |
| task | Task name. Default is 'outlier_detection'. TYPE: str |
| model_name | Model name for logging. Default is 'TimesNet'. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| tuple | A tuple containing: outlier_artifacts (dict), dictionary with metrics, predictions, and metadata; model (torch.nn.Module), the trained TimesNet model. |
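Example

The "temporal 2D variations" idea behind TimesNet can be illustrated with NumPy: estimate a dominant period from the FFT amplitude spectrum, then fold the 1D series into a (cycles, period) grid. This is a simplified single-period sketch of the concept, not the wrapper's code or the TimesNet network itself; `dominant_period` and `fold_to_2d` are hypothetical helper names.

```python
import numpy as np


def dominant_period(x: np.ndarray) -> int:
    """Estimate the dominant period from the FFT amplitude spectrum."""
    amplitudes = np.abs(np.fft.rfft(x - x.mean()))
    amplitudes[0] = 0.0  # ignore the constant (DC) component
    freq = max(int(np.argmax(amplitudes)), 1)  # guard against a flat signal
    return len(x) // freq


def fold_to_2d(x: np.ndarray, period: int) -> np.ndarray:
    """Zero-pad the series to a multiple of the period and fold it row-wise."""
    n_rows = int(np.ceil(len(x) / period))
    padded = np.zeros(n_rows * period)
    padded[: len(x)] = x
    return padded.reshape(n_rows, period)


t = np.arange(96)
x = np.sin(2 * np.pi * t / 24)  # four full cycles of period 24
period = dominant_period(x)
grid = fold_to_2d(x, period)
```

Each row of `grid` holds one cycle, so variation within a period runs along the columns and variation between periods runs along the rows.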
References
"4.3 Anomaly Detection" https://github.com/thuml/Time-Series-Library/blob/main/tutorial/TimesNet_tutorial.ipynb
Source code in src/anomaly_detection/timesnet_wrapper.py
Logging¶
log_anomaly_detection
¶
log_anomaly_metrics
¶
Log outlier detection metrics to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
| metrics | Dictionary of metrics per split containing 'scalars' with 'global' values. |
| cfg | Hydra configuration (currently unused). |
Notes
For array values (confidence intervals), logs separate _lo and _hi metrics.
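Example

A hedged sketch of the flattening described in the note: scalar metrics keep their name, while two-element arrays are split into `_lo` and `_hi` entries. The `split/name` key format and the helper name `flatten_metrics` are assumptions; only the 'scalars' and 'global' nesting comes from the parameter description.

```python
import numpy as np


def flatten_metrics(metrics: dict) -> dict:
    """Turn nested per-split metrics into a flat name -> float mapping."""
    flat = {}
    for split, split_metrics in metrics.items():
        for name, value in split_metrics["scalars"]["global"].items():
            key = f"{split}/{name}"  # assumed naming scheme
            if np.ndim(value) == 0:
                flat[key] = float(value)
            else:
                # Array values are treated as a (low, high) confidence interval.
                lo, hi = value
                flat[f"{key}_lo"] = float(lo)
                flat[f"{key}_hi"] = float(hi)
    return flat


metrics = {
    "train": {"scalars": {"global": {"f1": 0.8, "f1_CI": np.array([0.7, 0.9])}}}
}
flat = flatten_metrics(metrics)
```

A flat dictionary like this could be passed to `mlflow.log_metrics`, which accepts a mapping of metric names to floats.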
Source code in src/anomaly_detection/log_anomaly_detection.py
log_losses
¶
Log reconstruction losses to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
| best_outlier_results | Results from the best epoch containing per-split loss arrays. |
| cfg | Hydra configuration (currently unused). |
| best_epoch | Index of the best training epoch. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If no losses are found for a split. |
Source code in src/anomaly_detection/log_anomaly_detection.py
log_anomaly_predictions
¶
Log outlier detection predictions to MLflow as CSV files.
| PARAMETER | DESCRIPTION |
|---|---|
| model_name | Name of the model for filename generation. |
| preds | Dictionary of predictions per split containing 'arrays'. |
| cfg | Hydra configuration (currently unused). |
| transpose | Whether to transpose arrays before saving. Default is True. |
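Example

A minimal sketch of writing per-split prediction arrays to CSV with optional transposition, assuming input arrays of shape (n_subjects, n_timepoints). The filename pattern and the helper name `save_predictions_csv` are assumptions, and the MLflow upload step is omitted.

```python
import os
import tempfile

import numpy as np


def save_predictions_csv(model_name, preds, out_dir, transpose=True):
    """Write every array in preds[split]['arrays'] to its own CSV file."""
    paths = []
    for split, split_preds in preds.items():
        for array_name, array in split_preds["arrays"].items():
            # Transposing puts timepoints in rows and subjects in columns.
            data = array.T if transpose else array
            path = os.path.join(out_dir, f"{model_name}_{split}_{array_name}.csv")
            np.savetxt(path, data, delimiter=",")
            paths.append(path)
    return paths


mask = np.zeros((2, 5))
mask[1, 3] = 1.0
preds = {"test": {"arrays": {"pred_mask": mask}}}

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = save_predictions_csv("Prophet", preds, tmp_dir)
    loaded = np.loadtxt(paths[0], delimiter=",")
```

Each written path could then be uploaded with `mlflow.log_artifact`.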
Source code in src/anomaly_detection/log_anomaly_detection.py
check_debug_n_subjects_outlier_artifacts
¶
Validate subject count in debug mode.
| PARAMETER | DESCRIPTION |
|---|---|
| outlier_artifacts | Outlier detection artifacts to validate. |
| cfg | Hydra configuration with debug settings. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the subject count does not match the expected debug count. |
Source code in src/anomaly_detection/log_anomaly_detection.py
log_outlier_artifacts_dict
¶
Save and log outlier detection artifacts to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
| model_name | Name of the model for filename generation. |
| outlier_artifacts | Complete outlier detection results to save. |
| cfg | Hydra configuration. |
| checks_on | Whether to run validation checks. Default is True. |
Source code in src/anomaly_detection/log_anomaly_detection.py
log_anomaly_detection_to_mlflow
¶
log_anomaly_detection_to_mlflow(
model_name: str,
run_name: str,
outlier_artifacts: dict,
cfg: DictConfig,
)
Log complete anomaly detection results to MLflow.
Orchestrates logging of metrics, losses, predictions, and artifacts.
| PARAMETER | DESCRIPTION |
|---|---|
| model_name | Name of the model. TYPE: str |
| run_name | MLflow run name. TYPE: str |
| outlier_artifacts | Complete outlier detection results containing 'metadata' (with 'best_epoch'), 'outlier_results' (per-epoch results), 'metrics' (evaluation metrics), and 'preds' (predictions). TYPE: dict |
| cfg | Hydra configuration. TYPE: DictConfig |
Notes
Ends the MLflow run after logging all artifacts.
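Example

The expected shape of `outlier_artifacts` can be sketched as a plain dictionary. Only the four top-level keys and 'best_epoch' come from the parameter description; the nested contents, the integer epoch indexing, and the helper `select_best_epoch` are illustrative assumptions.

```python
from typing import Any, Dict

REQUIRED_KEYS = ("metadata", "outlier_results", "metrics", "preds")


def select_best_epoch(outlier_artifacts: Dict[str, Any]) -> Dict[str, Any]:
    """Validate the top-level keys and return the best epoch's results."""
    missing = [key for key in REQUIRED_KEYS if key not in outlier_artifacts]
    if missing:
        raise KeyError(f"outlier_artifacts is missing keys: {missing}")
    best_epoch = outlier_artifacts["metadata"]["best_epoch"]
    # Assumption: per-epoch results are indexed by the integer epoch number.
    return outlier_artifacts["outlier_results"][best_epoch]


outlier_artifacts = {
    "metadata": {"best_epoch": 1},
    "outlier_results": {
        0: {"train": {"loss": 0.42}, "test": {"loss": 0.51}},
        1: {"train": {"loss": 0.31}, "test": {"loss": 0.40}},
    },
    "metrics": {"test": {"scalars": {"global": {"f1": 0.8}}}},
    "preds": {"test": {"arrays": {"pred_mask": [[0, 1, 0]]}}},
}
best_results = select_best_epoch(outlier_artifacts)
```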