log_helpers
Logging, MLflow, and utility functions.
Overview
Centralized utilities for:
- MLflow experiment tracking
- Hydra configuration
- Artifact management
- System utilities
MLflow Integration
mlflow_utils
init_mlflow
Initialize MLflow tracking URI from configuration.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Configuration containing SERVICES.mlflow_tracking_uri. |
Notes
If no URI is specified, MLflow uses a local 'mlruns' directory.
Source code in src/log_helpers/mlflow_utils.py
init_mlflow_experiment
init_mlflow_experiment(
mlflow_cfg: Optional[DictConfig] = None,
experiment_name: str = "PLR_imputation",
override_default_location: bool = False,
_permanent_delete: bool = True,
) -> None
Initialize or get an MLflow experiment.
| PARAMETER | DESCRIPTION |
|---|---|
| mlflow_cfg | MLflow configuration (currently unused). TYPE: Optional[DictConfig] |
| experiment_name | Name of the experiment to create/get. TYPE: str |
| override_default_location | If True, use a custom artifact location. TYPE: bool |
| _permanent_delete | Permanent deletion flag (currently unused). TYPE: bool |

| RAISES | DESCRIPTION |
|---|---|
| Exception | If experiment creation fails (e.g., permission issues). |
Source code in src/log_helpers/mlflow_utils.py
set_artifact_store_location
Set MLflow artifact store location.
Currently a placeholder for future remote storage (e.g., S3) configuration.
| RETURNS | DESCRIPTION |
|---|---|
| None | No artifact store location is set currently. |
Source code in src/log_helpers/mlflow_utils.py
init_mlflow_run
init_mlflow_run(
mlflow_cfg: DictConfig,
run_name: str,
cfg: DictConfig,
experiment_name: str,
) -> None
Start a new MLflow run.
| PARAMETER | DESCRIPTION |
|---|---|
| mlflow_cfg | MLflow configuration with 'log_system_metrics' flag. TYPE: DictConfig |
| run_name | Name for the MLflow run. TYPE: str |
| cfg | Full Hydra configuration to log. TYPE: DictConfig |
| experiment_name | Name of the MLflow experiment. TYPE: str |

| RAISES | DESCRIPTION |
|---|---|
| Exception | If run creation fails. |
Source code in src/log_helpers/mlflow_utils.py
log_hydra_cfg_to_mlflow
Log Hydra configuration to MLflow as a YAML artifact.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Hydra configuration to log. |
Source code in src/log_helpers/mlflow_utils.py
get_mlflow_info
Get current MLflow run information as a dictionary.
Collects tags, run info, and experiment info from the active MLflow run. Useful for storing MLflow metadata alongside model artifacts for later reference when logging metrics or additional artifacts.
| RETURNS | DESCRIPTION |
|---|---|
| dict | Dictionary with 'run_tags', 'run_info', and 'experiment' keys. |
Source code in src/log_helpers/mlflow_utils.py
log_metrics_as_mlflow_artifact
log_metrics_as_mlflow_artifact(
metrics_subjectwise: Dict[str, Any],
model_name: str,
model_artifacts: Dict[str, Any],
cfg: DictConfig,
) -> None
Log subject-wise metrics as a pickled MLflow artifact.
| PARAMETER | DESCRIPTION |
|---|---|
| metrics_subjectwise | Dictionary containing per-subject metrics. TYPE: Dict[str, Any] |
| model_name | Name of the model for filename generation. TYPE: str |
| model_artifacts | Model artifacts containing MLflow info. TYPE: Dict[str, Any] |
| cfg | Configuration object (currently unused). TYPE: DictConfig |
Source code in src/log_helpers/mlflow_utils.py
mlflow_imputation_metrics_logger
Log global imputation metrics to MLflow.
Handles both scalar metrics and array metrics (e.g., confidence intervals).
| PARAMETER | DESCRIPTION |
|---|---|
| metrics_global | Dictionary of metric names to values. |
| split | Data split name for metric naming. |
Source code in src/log_helpers/mlflow_utils.py
log_mlflow_imputation_metrics
log_mlflow_imputation_metrics(
metrics_global: Dict[str, Any],
model_name: str,
split: str,
model_artifacts: Dict[str, Any],
cfg: DictConfig,
) -> None
Log imputation metrics and Hydra log to MLflow for an existing run.
| PARAMETER | DESCRIPTION |
|---|---|
| metrics_global | Global metrics dictionary. TYPE: Dict[str, Any] |
| model_name | Name of the imputation model (currently unused). TYPE: str |
| split | Data split name. TYPE: str |
| model_artifacts | Model artifacts with MLflow info. TYPE: Dict[str, Any] |
| cfg | Configuration object (currently unused). TYPE: DictConfig |
Source code in src/log_helpers/mlflow_utils.py
log_system_params_to_mlflow
Log system parameters (hardware, library versions) to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
| prefix | Prefix for parameter names in MLflow. |
Source code in src/log_helpers/mlflow_utils.py
log_mlflow_params
log_mlflow_params(
mlflow_params: Dict[str, Any],
model_name: Optional[str] = None,
run_name: Optional[str] = None,
) -> None
Log model parameters and system info to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
| mlflow_params | Dictionary of parameters to log. TYPE: Dict[str, Any] |
| model_name | Model name to log as 'model' parameter. TYPE: Optional[str] |
| run_name | Run name (currently unused). TYPE: Optional[str] |
Source code in src/log_helpers/mlflow_utils.py
save_pypots_model_to_mlflow
save_pypots_model_to_mlflow(
entry: DirEntry,
model: Any,
cfg: DictConfig,
as_artifact: bool = False,
) -> None
Save PyPOTS model to MLflow as artifact or registered model.
| PARAMETER | DESCRIPTION |
|---|---|
| entry | Directory entry for the model file. TYPE: DirEntry |
| model | PyPOTS model object. TYPE: Any |
| cfg | Configuration object. TYPE: DictConfig |
| as_artifact | If True, log as simple artifact; if False, use MLflow model logging. TYPE: bool |
Source code in src/log_helpers/mlflow_utils.py
mlflow_log_pytorch_model
Log PyTorch model to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
| model | PyTorch model to log. |
| path | Artifact path for the model. |
| cfg | Configuration object (currently unused). |
Notes
This is a basic implementation without model signature. PyPOTS models may require special handling as they are not standard torch.nn.Module.
Source code in src/log_helpers/mlflow_utils.py
pytpots_artifact_wrapper
pytpots_artifact_wrapper(
pypots_dir: str,
model: Any,
cfg: DictConfig,
model_ext: str = ".pypots",
as_artifact: bool = True,
) -> None
Log all PyPOTS artifacts from a directory to MLflow.
Iterates through the PyPOTS output directory and logs directories, model files, and other artifacts appropriately.
| PARAMETER | DESCRIPTION |
|---|---|
| pypots_dir | Path to PyPOTS output directory. TYPE: str |
| model | PyPOTS model object. TYPE: Any |
| cfg | Configuration object. TYPE: DictConfig |
| model_ext | File extension for model files. TYPE: str |
| as_artifact | If True, log model as artifact; if False, use MLflow model logging. TYPE: bool |

| RAISES | DESCRIPTION |
|---|---|
| Exception | If artifact logging fails. |
Source code in src/log_helpers/mlflow_utils.py
log_mlflow_artifacts_after_pypots_model_train
log_mlflow_artifacts_after_pypots_model_train(
results_path: str,
pypots_dir: str,
model: Any,
cfg: DictConfig,
) -> None
Log results and PyPOTS artifacts to MLflow after training.
| PARAMETER | DESCRIPTION |
|---|---|
| results_path | Path to results pickle file. TYPE: str |
| pypots_dir | Path to PyPOTS output directory. TYPE: str |
| model | PyPOTS model object. TYPE: Any |
| cfg | Configuration object. TYPE: DictConfig |
Source code in src/log_helpers/mlflow_utils.py
log_imputation_db_to_mlflow
log_imputation_db_to_mlflow(
db_path: str,
mlflow_cfg: Dict[str, Any],
model: str,
cfg: DictConfig,
) -> None
Log imputation DuckDB database to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
| db_path | Path to DuckDB file. TYPE: str |
| mlflow_cfg | MLflow configuration with run_info. TYPE: Dict[str, Any] |
| model | Model name (currently unused). TYPE: str |
| cfg | Configuration object (currently unused). TYPE: DictConfig |
Source code in src/log_helpers/mlflow_utils.py
post_imputation_model_training_mlflow_log
post_imputation_model_training_mlflow_log(
metrics_model: Dict[str, Any],
model_artifacts: Dict[str, Any],
cfg: DictConfig,
) -> None
Check if current model improved over previous best and log accordingly.
Compares current model metrics against previously logged best model and logs to MLflow Model Registry if improved.
| PARAMETER | DESCRIPTION |
|---|---|
| metrics_model | Current model metrics. TYPE: Dict[str, Any] |
| model_artifacts | Model artifacts with MLflow info. TYPE: Dict[str, Any] |
| cfg | Configuration object. TYPE: DictConfig |
Source code in src/log_helpers/mlflow_utils.py
check_if_improved_with_direction
check_if_improved_with_direction(
metric_string: str,
metric_direction: str,
current_metric_value: float,
best_metric_value: float,
) -> bool
Check if current metric is better than previous best based on direction.
| PARAMETER | DESCRIPTION |
|---|---|
| metric_string | Name of the metric for logging. TYPE: str |
| metric_direction | 'ASC' if lower is better, 'DESC' if higher is better. TYPE: str |
| current_metric_value | Current model's metric value. TYPE: float |
| best_metric_value | Previous best metric value. TYPE: float |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if current is better than previous best. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If metric_direction is not 'ASC' or 'DESC'. |
Source code in src/log_helpers/mlflow_utils.py
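The direction rule described above can be sketched in a few lines. This is a minimal illustration, not the code in mlflow_utils.py: the function name `is_improved` is hypothetical, and the use of strict inequalities (a tie does not count as an improvement) is an assumption.

```python
def is_improved(
    metric_string: str,
    metric_direction: str,
    current_metric_value: float,
    best_metric_value: float,
) -> bool:
    """Return True if the current value beats the previous best."""
    if metric_direction == "ASC":
        # 'ASC': lower is better (e.g., an error metric such as MAE).
        return current_metric_value < best_metric_value
    if metric_direction == "DESC":
        # 'DESC': higher is better (e.g., an accuracy-like score).
        return current_metric_value > best_metric_value
    raise ValueError(
        f"Unknown direction {metric_direction!r} for metric {metric_string!r}; "
        "expected 'ASC' or 'DESC'"
    )


print(is_improved("mae", "ASC", 0.10, 0.25))
print(is_improved("auroc", "DESC", 0.80, 0.90))
```

Whether a tie should count as an improvement is a design choice; switching to `<=` and `>=` would re-register models on equal scores.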
is_current_better_than_previous
is_current_better_than_previous(
metrics_model: Dict[str, Any],
model_dict: Dict[str, Any],
best_previous_run: Dict[str, Any],
cfg: DictConfig,
) -> bool
Determine if current model outperforms the previous best.
| PARAMETER | DESCRIPTION |
|---|---|
| metrics_model | Current model metrics. TYPE: Dict[str, Any] |
| model_dict | Model artifacts with MLflow info. TYPE: Dict[str, Any] |
| best_previous_run | Previous best run data. TYPE: Dict[str, Any] |
| cfg | Configuration object. TYPE: DictConfig |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if current model is better. |
Source code in src/log_helpers/mlflow_utils.py
mlflow_artifacts
get_mlflow_run_ids_from_imputation_artifacts
get_mlflow_run_ids_from_imputation_artifacts(
imputation_artifacts: Dict[str, Any],
) -> Dict[str, str]
Extract MLflow run IDs from imputation artifacts dictionary.
| PARAMETER | DESCRIPTION |
|---|---|
| imputation_artifacts | Dictionary containing 'artifacts' key with model-specific MLflow info. TYPE: Dict[str, Any] |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Mapping of model names to their MLflow run IDs. |
Source code in src/log_helpers/mlflow_artifacts.py
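To make the expected input shape concrete, here is a small sketch of the extraction. The nesting below 'artifacts' (a per-model 'mlflow' entry holding 'run_info' with a 'run_id') is an assumption pieced together from the other descriptions on this page, and `run_ids_by_model` is a hypothetical name.

```python
from typing import Any, Dict


def run_ids_by_model(imputation_artifacts: Dict[str, Any]) -> Dict[str, str]:
    """Map each model name to the run ID stored in its MLflow info."""
    run_ids: Dict[str, str] = {}
    for model_name, model_dict in imputation_artifacts["artifacts"].items():
        # Assumed layout: model_dict["mlflow"]["run_info"]["run_id"].
        run_ids[model_name] = model_dict["mlflow"]["run_info"]["run_id"]
    return run_ids


example = {
    "artifacts": {
        "SAITS": {"mlflow": {"run_info": {"run_id": "run-a"}}},
        "CSDI": {"mlflow": {"run_info": {"run_id": "run-b"}}},
    }
}
print(run_ids_by_model(example))
```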
get_mlflow_metric_params
get_mlflow_metric_params(
metrics: Dict[str, Any],
cfg: DictConfig,
splitkey: str = "gt",
metrictype: str = "global",
metricname: str = "mae",
) -> Dict[str, Any]
Extract specific metric parameters from nested metrics dictionary for MLflow logging.
Filters metrics by split key, metric type, and metric name to keep the MLflow dashboard clean while still allowing programmatic access to all metrics.
| PARAMETER | DESCRIPTION |
|---|---|
| metrics | Nested metrics dictionary with structure: {model_name: {split: {split_key: {metric_type: {metric: value}}}}}. TYPE: Dict[str, Any] |
| cfg | Configuration object (currently unused). TYPE: DictConfig |
| splitkey | Split key to filter (e.g., 'gt' for ground truth). TYPE: str |
| metrictype | Metric type to filter (e.g., 'global', 'per_subject'). TYPE: str |
| metricname | Specific metric name to extract. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Dictionary with model name and filtered metrics suitable for MLflow logging. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If more than one model is found in the metrics dictionary. |
Source code in src/log_helpers/mlflow_artifacts.py
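The filtering over the nested structure can be illustrated as follows. This is a sketch under assumptions: the function name `select_metric_for_logging`, the 'model' key, and the flat '{split}/{metricname}' output keys are all hypothetical, since the exact layout of the returned dictionary is not documented here.

```python
from typing import Any, Dict


def select_metric_for_logging(
    metrics: Dict[str, Any],
    splitkey: str = "gt",
    metrictype: str = "global",
    metricname: str = "mae",
) -> Dict[str, Any]:
    """Pick one metric per split from {model: {split: {split_key: {type: {metric: value}}}}}."""
    if len(metrics) > 1:
        raise ValueError(f"Expected a single model, found {len(metrics)}")
    model_name, per_split = next(iter(metrics.items()))
    selected: Dict[str, Any] = {"model": model_name}
    for split, by_splitkey in per_split.items():
        # Hypothetical flat key; the real naming scheme is an assumption.
        selected[f"{split}/{metricname}"] = by_splitkey[splitkey][metrictype][metricname]
    return selected


metrics = {
    "SAITS": {
        "train": {"gt": {"global": {"mae": 0.1, "mse": 0.02}}},
        "test": {"gt": {"global": {"mae": 0.2, "mse": 0.05}}},
    }
}
print(select_metric_for_logging(metrics))
```

Only the selected metric reaches the flat dictionary; everything else stays available in the nested structure for programmatic use.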
get_mlflow_params
Extract and set MLflow experiment and run ID from info dictionary.
| PARAMETER | DESCRIPTION |
|---|---|
| mlflow_info | Dictionary containing 'experiment' and 'run_info' keys with MLflow metadata. |

| RETURNS | DESCRIPTION |
|---|---|
| tuple of str | Tuple of (experiment_id, run_id). |
Notes
Also sets the MLflow experiment as a side effect.
Source code in src/log_helpers/mlflow_artifacts.py
get_mlflow_info_from_model_dict
Extract MLflow info dictionary from model artifacts dictionary.
| PARAMETER | DESCRIPTION |
|---|---|
| model_dict | Model artifacts dictionary containing 'mlflow' key with run/experiment info. |

| RETURNS | DESCRIPTION |
|---|---|
| dict | MLflow info dictionary with run_info, experiment, and artifact_uri. |

| RAISES | DESCRIPTION |
|---|---|
| Exception | If 'mlflow' key is missing from model_dict. |
Source code in src/log_helpers/mlflow_artifacts.py
get_duckdb_from_mlflow
Download and locate DuckDB file from MLflow artifacts.
| PARAMETER | DESCRIPTION |
|---|---|
| artifact_uri | MLflow artifact URI to search. |
| dir_name | Directory name within artifacts containing the database. |
| wildcard | File extension to match. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Local path to downloaded DuckDB file. |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If no DuckDB artifact is found. |
Source code in src/log_helpers/mlflow_artifacts.py
write_new_col_to_mlflow
Write a new metric column to MLflow runs.
Used for harmonizing column names by writing values under a new metric name.
| PARAMETER | DESCRIPTION |
|---|---|
| model_best_runs | DataFrame containing run_id and the column to write. |
| col_name | Source column name in the DataFrame. |
| col_name_init | Target metric name for MLflow (will have 'metrics.' prefix stripped). |
Source code in src/log_helpers/mlflow_artifacts.py
get_col_for_for_best_anomaly_detection_metric
Get DataFrame column name for best metric based on task type.
| PARAMETER | DESCRIPTION |
|---|---|
| best_metric_cfg | Configuration with 'string' (metric name) and 'split' keys. |
| task | Task type: 'anomaly_detection', 'outlier_detection', or 'imputation'. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Column name in format 'metrics.{split}/{metric}' or direct string. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If task type is not recognized. |
Source code in src/log_helpers/mlflow_artifacts.py
harmonize_anomaly_col_name
harmonize_anomaly_col_name(
col_name: str,
model_best_runs: DataFrame,
best_metric_cfg: DictConfig,
model: str,
) -> str
Harmonize metric column name if not found in DataFrame.
Falls back to 'test' split if the specified column is missing, and writes the harmonized values back to MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
| col_name | Expected column name. TYPE: str |
| model_best_runs | DataFrame with MLflow run data. TYPE: DataFrame |
| best_metric_cfg | Best metric configuration. TYPE: DictConfig |
| model | Model name for logging. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| str | Harmonized column name that exists in the DataFrame. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If harmonized column contains only NaN values. |
Source code in src/log_helpers/mlflow_artifacts.py
threshold_filter_run
threshold_filter_run(
best_run: Union[Series, DataFrame],
col_name: str,
best_metric_cfg: DictConfig,
) -> Optional[Union[Series, DataFrame]]
Filter run based on ensemble quality threshold.
Returns None if the run's metric does not meet the threshold requirement.
| PARAMETER | DESCRIPTION |
|---|---|
| best_run | Run data to filter. TYPE: Union[Series, DataFrame] |
| col_name | Column name containing the metric to check. TYPE: str |
| best_metric_cfg | Configuration with 'ensemble_quality_threshold' and 'direction' keys. TYPE: DictConfig |

| RETURNS | DESCRIPTION |
|---|---|
| pd.Series, pd.DataFrame, or None | Original run data if threshold is met, None otherwise. |
Source code in src/log_helpers/mlflow_artifacts.py
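A threshold gate of this kind might look like the sketch below. It uses a plain mapping in place of a pandas Series to stay dependency-free. The name `keep_if_threshold_met`, the reading of 'ASC' as "lower is better" (borrowed from check_if_improved_with_direction), and the inclusive comparison are assumptions.

```python
from typing import Any, Mapping, Optional


def keep_if_threshold_met(
    best_run: Mapping[str, Any],
    col_name: str,
    best_metric_cfg: Mapping[str, Any],
) -> Optional[Mapping[str, Any]]:
    """Return the run unchanged if its metric passes the threshold, else None."""
    threshold = best_metric_cfg["ensemble_quality_threshold"]
    direction = best_metric_cfg["direction"]
    value = best_run[col_name]
    if direction == "ASC":
        # Lower is better: the metric must not exceed the threshold.
        meets = value <= threshold
    elif direction == "DESC":
        # Higher is better: the metric must reach the threshold.
        meets = value >= threshold
    else:
        raise ValueError(f"Unknown direction: {direction!r}")
    return best_run if meets else None


cfg = {"ensemble_quality_threshold": 0.5, "direction": "ASC"}
print(keep_if_threshold_met({"run_id": "r1", "metrics.test/mae": 0.3}, "metrics.test/mae", cfg))
print(keep_if_threshold_met({"run_id": "r2", "metrics.test/mae": 0.9}, "metrics.test/mae", cfg))
```

Returning None rather than raising lets a caller simply skip weak runs when assembling an ensemble.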
get_best_run_of_pd_dataframe
get_best_run_of_pd_dataframe(
model_best_runs: DataFrame,
cfg: DictConfig,
best_metric_cfg: DictConfig,
task: str,
model: str,
include_all_variants: bool = False,
) -> Tuple[
Optional[Union[Series, DataFrame]], Optional[float]
]
Find the best MLflow run from a DataFrame based on metric configuration.
| PARAMETER | DESCRIPTION |
|---|---|
| model_best_runs | DataFrame containing MLflow runs for the model. TYPE: DataFrame |
| cfg | Full configuration object. TYPE: DictConfig |
| best_metric_cfg | Configuration specifying best metric, direction, and threshold. TYPE: DictConfig |
| task | Task type for determining column name format. TYPE: str |
| model | Model name for logging. TYPE: str |
| include_all_variants | If True, return all runs sorted; if False, return only the best run. TYPE: bool |

| RETURNS | DESCRIPTION |
|---|---|
| tuple | Tuple of (best_run, best_metric) where best_run is a Series/DataFrame and best_metric is the metric value (or None if all variants returned). |
Source code in src/log_helpers/mlflow_artifacts.py
get_imputation_results_from_mlflow
get_imputation_results_from_mlflow(
mlflow_run: Series,
model_name: str,
cfg: DictConfig,
dir_name: str = "imputation",
) -> Dict[str, Any]
Download imputation results from MLflow artifact store.
| PARAMETER | DESCRIPTION |
|---|---|
| mlflow_run | MLflow run data containing run_id and tags. TYPE: Series |
| model_name | Name of the imputation model. TYPE: str |
| cfg | Configuration object (currently unused). TYPE: DictConfig |
| dir_name | Artifact subdirectory name. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Loaded imputation results dictionary with 'mlflow_run' key added. |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If imputation results cannot be found or downloaded. |
Source code in src/log_helpers/mlflow_artifacts.py
get_mlflow_artifact_uri_from_run
Get artifact URI from MLflow run.
| PARAMETER | DESCRIPTION |
|---|---|
| best_run | Run data containing 'run_id'. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Artifact URI for the run. |
Source code in src/log_helpers/mlflow_artifacts.py
get_best_metric_from_current_run
get_best_metric_from_current_run(
metrics_model: dict, split_key: str, metric_string: str
) -> float
Extract specific metric value from current run's metrics dictionary.
| PARAMETER | DESCRIPTION |
|---|---|
| metrics_model | Metrics dictionary with structure {split_key: {global: {metric: value}}}. TYPE: dict |
| split_key | Data split key (e.g., 'test', 'val'). TYPE: str |
| metric_string | Name of the metric to extract. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| float | The metric value. |
Source code in src/log_helpers/mlflow_artifacts.py
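The documented structure implies a three-level lookup. A minimal sketch, assuming the middle key is the literal string 'global' and using the hypothetical name `metric_from_current_run`:

```python
def metric_from_current_run(metrics_model: dict, split_key: str, metric_string: str) -> float:
    """Look up one metric in {split_key: {'global': {metric: value}}}."""
    # A missing split or metric surfaces as a KeyError in this sketch.
    return metrics_model[split_key]["global"][metric_string]


metrics_model = {
    "val": {"global": {"mae": 0.12, "mse": 0.03}},
    "test": {"global": {"mae": 0.15, "mse": 0.04}},
}
print(metric_from_current_run(metrics_model, "test", "mae"))
```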
get_best_previous_mlflow_logged_model
get_best_previous_mlflow_logged_model(
model_dict: Dict[str, Any], cfg: DictConfig
) -> Optional[Dict[str, Any]]
Find the best previously logged MLflow model matching current configuration.
| PARAMETER | DESCRIPTION |
|---|---|
| model_dict | Model artifacts dictionary containing MLflow info. TYPE: Dict[str, Any] |
| cfg | Configuration for determining search parameters. TYPE: DictConfig |

| RETURNS | DESCRIPTION |
|---|---|
| dict or None | Best previous run data, or None if no matching runs are found. |
Source code in src/log_helpers/mlflow_artifacts.py
iterate_through_mlflow_run_artifacts
iterate_through_mlflow_run_artifacts(
run_artifacts: List[FileInfo],
fname: str,
run_id: str,
dir_download: str,
artifacts_string: str = "imputation",
) -> Optional[Dict[str, Any]]
Iterate through MLflow artifacts to find and download a specific file.
| PARAMETER | DESCRIPTION |
|---|---|
| run_artifacts | List of MLflow artifact objects. TYPE: List[FileInfo] |
| fname | Filename to find and download. TYPE: str |
| run_id | MLflow run ID. TYPE: str |
| dir_download | Local directory for downloads (currently unused). TYPE: str |
| artifacts_string | Artifact path to match. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| dict or None | Loaded results dictionary, or None if not found. |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If the specified artifact cannot be found. |
Source code in src/log_helpers/mlflow_artifacts.py
download_mlflow_artifacts
download_mlflow_artifacts(
run_id: str, fname: str, run_artifacts: List[FileInfo]
) -> Optional[Dict[str, Any]]
Download MLflow artifacts for a specific run.
| PARAMETER | DESCRIPTION |
|---|---|
| run_id | MLflow run ID. TYPE: str |
| fname | Filename to download. TYPE: str |
| run_artifacts | List of available artifacts. TYPE: List[FileInfo] |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Loaded artifacts dictionary. |
Source code in src/log_helpers/mlflow_artifacts.py
retrieve_mlflow_artifacts_from_best_run
retrieve_mlflow_artifacts_from_best_run(
best_run: Dict[str, Any],
cfg: DictConfig,
model_name: str,
) -> Tuple[Dict[str, Any], List[FileInfo]]
Retrieve imputation artifacts from the best MLflow run.
| PARAMETER | DESCRIPTION |
|---|---|
| best_run | Best run data containing 'run_id'. TYPE: Dict[str, Any] |
| cfg | Configuration object (currently unused). TYPE: DictConfig |
| model_name | Name of the model for filename generation. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| tuple | Tuple of (imputer_artifacts, run_artifacts). |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If no results are found in the best run. |
Source code in src/log_helpers/mlflow_artifacts.py
get_mlflow_artifact_from_run_name
get_mlflow_artifact_from_run_name(
run_name: str, filter_for_finished: bool = True
) -> Optional[Dict[str, str]]
Find MLflow artifact info by run name across all experiments.
| PARAMETER | DESCRIPTION |
|---|---|
| run_name | Name of the run to find. TYPE: str |
| filter_for_finished | If True, only search finished runs. TYPE: bool |

| RETURNS | DESCRIPTION |
|---|---|
| dict or None | Dictionary with run_id, experiment_id, and artifact_uri if found. |
Source code in src/log_helpers/mlflow_artifacts.py
return_best_mlflow_run
return_best_mlflow_run(
current_experiment: Dict[str, Any],
metric_string: str,
split_key: str,
metric_direction: str,
run_name: str,
) -> Optional[Dict[str, Any]]
Find the best MLflow run matching the given criteria.
Searches for runs with the specified name, filters out NaN metrics, and returns the best run based on metric direction.
| PARAMETER | DESCRIPTION |
|---|---|
| current_experiment | Experiment dictionary with 'experiment_id'. TYPE: Dict[str, Any] |
| metric_string | Metric name to optimize. TYPE: str |
| split_key | Data split for the metric. TYPE: str |
| metric_direction | 'ASC' for minimization, 'DESC' for maximization. TYPE: str |
| run_name | Exact run name to match. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| dict or None | Best run as dictionary, or None if no valid runs found. |
Source code in src/log_helpers/mlflow_artifacts.py
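The selection logic (drop runs with NaN metrics, then take the minimum or maximum by direction) can be sketched without MLflow by treating runs as plain dictionaries. The name `pick_best_run` and the 'metrics.{split}/{metric}' column key used in the example are assumptions; the actual function queries MLflow for runs matching the name first.

```python
import math
from typing import Any, Dict, List, Optional


def pick_best_run(
    runs: List[Dict[str, Any]],
    metric_col: str,
    metric_direction: str,
) -> Optional[Dict[str, Any]]:
    """Return the run with the best non-NaN metric, or None if none qualify."""
    valid = [
        run
        for run in runs
        if run.get(metric_col) is not None and not math.isnan(run[metric_col])
    ]
    if not valid:
        return None
    if metric_direction == "ASC":
        # Minimization: the smallest metric value wins.
        return min(valid, key=lambda run: run[metric_col])
    if metric_direction == "DESC":
        # Maximization: the largest metric value wins.
        return max(valid, key=lambda run: run[metric_col])
    raise ValueError(f"Unknown direction: {metric_direction!r}")


runs = [
    {"run_id": "a", "metrics.test/mae": 0.30},
    {"run_id": "b", "metrics.test/mae": float("nan")},
    {"run_id": "c", "metrics.test/mae": 0.12},
]
print(pick_best_run(runs, "metrics.test/mae", "ASC"))
```

Filtering NaN values before comparing matters because comparisons with NaN are always false, which would otherwise make the result depend on run order.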
what_to_search_from_mlflow
what_to_search_from_mlflow(
run_name: str,
cfg: DictConfig,
model_type: Optional[str] = None,
) -> Tuple[
Optional[Dict[str, Any]],
Optional[str],
Optional[str],
Optional[str],
]
Determine MLflow search parameters from run name and configuration.
| PARAMETER | DESCRIPTION |
|---|---|
| run_name | Name of the MLflow run. TYPE: str |
| cfg | Configuration containing IMPUTATION_METRICS settings. TYPE: DictConfig |
| model_type | Model type (currently unused). TYPE: Optional[str] |

| RETURNS | DESCRIPTION |
|---|---|
| tuple | Tuple of (current_experiment, metric_string, split_key, metric_direction), or (None, None, None, None) if run not found. |
Source code in src/log_helpers/mlflow_artifacts.py
check_if_run_exists
Check if an MLflow run with the given name exists in the experiment.
| PARAMETER | DESCRIPTION |
|---|---|
| experiment_name | Name of the MLflow experiment. |
| run_name | Run name to search for. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if run exists, False otherwise. |
Source code in src/log_helpers/mlflow_artifacts.py
Logging
log_utils
define_run_name
Define run name from configuration name and version.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Configuration with 'NAME' and 'VERSION' keys. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Run name in format '{name}_v{version}'. |
Source code in src/log_helpers/log_utils.py
define_suffix_to_run_name
Generate suffix for run name based on model name.
| PARAMETER | DESCRIPTION |
|---|---|
| model_name | Name of the model. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Suffix in format '_{model_name}_ph1'. |
Notes
This is a placeholder implementation.
Source code in src/log_helpers/log_utils.py
update_run_name
Append base run name to existing run name.
| PARAMETER | DESCRIPTION |
|---|---|
| run_name | Existing run name. |
| base_run_name | Base name to append. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Combined run name with underscore separator. |
Source code in src/log_helpers/log_utils.py
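The three run-name helpers above reduce to string formatting. The sketch below restates the documented formats with a plain dictionary standing in for the Hydra configuration; the function names carry a `_sketch` suffix to mark them as illustrations, and the argument order in the combined name is an assumption.

```python
def define_run_name_sketch(cfg: dict) -> str:
    """Build '{name}_v{version}' from the 'NAME' and 'VERSION' keys."""
    return f"{cfg['NAME']}_v{cfg['VERSION']}"


def define_suffix_sketch(model_name: str) -> str:
    """Build the documented '_{model_name}_ph1' suffix."""
    return f"_{model_name}_ph1"


def update_run_name_sketch(run_name: str, base_run_name: str) -> str:
    """Append the base name to the existing name with an underscore."""
    return f"{run_name}_{base_run_name}"


base = define_run_name_sketch({"NAME": "baseline", "VERSION": 2})
print(base)
print(define_suffix_sketch("SAITS"))
print(update_run_name_sketch("SAITS", base))
```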
setup_loguru
Configure loguru logger for console and file output.
Sets up logging to stderr with color and to a file in the artifacts directory. Removes any existing log file before starting.
| RETURNS | DESCRIPTION |
|---|---|
| str | Path to the log file. |
Source code in src/log_helpers/log_utils.py
log_loguru_log_to_prefect
Log contents of loguru log file as Prefect markdown artifact.
| PARAMETER | DESCRIPTION |
|---|---|
| filepath | Path to the log file. |
| description | Description for the Prefect artifact. |
Source code in src/log_helpers/log_utils.py
get_datetime_as_string
Get current datetime as formatted string.
| PARAMETER | DESCRIPTION |
|---|---|
| use_gmt_time | If True, use UTC time; otherwise use local time. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Datetime string in format 'YYYYMMDD-HHMMSS'. |
Source code in src/log_helpers/log_utils.py
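The 'YYYYMMDD-HHMMSS' format corresponds to the strftime pattern '%Y%m%d-%H%M%S'. A minimal sketch using only the standard library follows; the name `datetime_string` and the default value of `use_gmt_time` are assumptions.

```python
from datetime import datetime, timezone


def datetime_string(use_gmt_time: bool = False) -> str:
    """Format the current time as 'YYYYMMDD-HHMMSS'."""
    # UTC when requested, otherwise the machine's local time.
    now = datetime.now(timezone.utc) if use_gmt_time else datetime.now()
    return now.strftime("%Y%m%d-%H%M%S")


print(datetime_string(use_gmt_time=True))
```

Because the fields run from most to least significant, strings in this format sort chronologically, which is convenient for timestamped output directories.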
Hydra Utilities
hydra_utils
update_hydra_ouput_dir
Generate Hydra CLI argument for custom output directory.
Creates a timestamped output directory path for Hydra runs.
| PARAMETER | DESCRIPTION |
|---|---|
| use_gmt_time | If True, use GMT time for timestamp (currently unused). |

| RETURNS | DESCRIPTION |
|---|---|
| str | Hydra CLI argument string in format 'hydra.run.dir={path}'. |
Source code in src/log_helpers/hydra_utils.py
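As an illustration of the returned string, the sketch below builds a 'hydra.run.dir={path}' override from a timestamp. The base folder 'outputs', the function name `hydra_run_dir_arg`, the injectable `now` argument, and the reuse of the 'YYYYMMDD-HHMMSS' format are all assumptions; the real directory layout is not documented here.

```python
from datetime import datetime
from typing import Optional


def hydra_run_dir_arg(base_dir: str = "outputs", now: Optional[datetime] = None) -> str:
    """Build a 'hydra.run.dir={path}' override with a timestamped path."""
    # Allow a fixed timestamp to be injected so the result is reproducible.
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return f"hydra.run.dir={base_dir}/{stamp}"


print(hydra_run_dir_arg(now=datetime(2024, 1, 2, 3, 4, 5)))
```

An override of this form can be appended to the command line so that Hydra writes each run's outputs to its own directory.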
get_hydra_output_dir
Get Hydra output directory from runtime config or fallback.
| RETURNS | DESCRIPTION |
|---|---|
| str | Path to Hydra output directory. |
Notes
Falls back to default artifacts directory if Hydra runtime config is not available (e.g., when using Hydra Compose API).
Source code in src/log_helpers/hydra_utils.py
get_intermediate_hydra_log_path
Get path to intermediate Hydra log file.
| RETURNS | DESCRIPTION |
|---|---|
| str or None | Path to the log file, or None if not found. |

| RAISES | DESCRIPTION |
|---|---|
| NotImplementedError | If multiple log files are found in the output directory. |
Source code in src/log_helpers/hydra_utils.py
save_hydra_cfg_as_yaml
Save Hydra configuration as YAML file.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Hydra configuration to save. |
| dir_output | Output directory path. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Path to saved YAML file. |
Source code in src/log_helpers/hydra_utils.py
get_cfg_HydraCompose
Load Hydra configuration using Compose API.
Uses the Hydra Compose API instead of the decorator-based approach for more flexible configuration loading.
| PARAMETER | DESCRIPTION |
|---|---|
| args | Arguments with 'config_file' attribute. |
| config_dir | Directory containing configuration files. |

| RETURNS | DESCRIPTION |
|---|---|
| DictConfig | Loaded Hydra configuration. |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If the configuration file does not exist. |
Source code in src/log_helpers/hydra_utils.py
add_hydra_cli_args
Add Hydra CLI arguments to sys.argv.
Appends config path, config name, and custom output directory arguments to sys.argv for Hydra decorator-based initialization.
| PARAMETER | DESCRIPTION |
|---|---|
| args | Arguments with 'config_path' and 'config_name' attributes. |
Source code in src/log_helpers/hydra_utils.py
log_the_hydra_log_as_mlflow_artifact
log_the_hydra_log_as_mlflow_artifact(
hydra_log,
suffix: str = "_train",
intermediate: bool = False,
)
Log Hydra log file as MLflow artifact with optional suffix.
Creates a copy of the log file with a suffix and logs it to MLflow. The copy is removed after logging.
| PARAMETER | DESCRIPTION |
|---|---|
| hydra_log | Path to Hydra log file. |
| suffix | Suffix to append to log filename. TYPE: str |
| intermediate | If True, log to 'hydra_logs/intermediate' path. TYPE: bool |
Source code in src/log_helpers/hydra_utils.py
log_hydra_artifacts_to_mlflow
Log Hydra artifacts to MLflow for imputation runs.
| PARAMETER | DESCRIPTION |
|---|---|
| artifacts_dir | Artifacts directory path (currently unused). |
| model_name | Model name (currently unused). |
| cfg | Configuration object (currently unused). |
| run_name | Run name (currently unused). |
Source code in src/log_helpers/hydra_utils.py
Local Artifacts
local_artifacts
if_dicts_match
Check if two dictionaries match (placeholder implementation).
| PARAMETER | DESCRIPTION |
|---|---|
| _dict1 | First dictionary (unused in placeholder). |
| _dict2 | Second dictionary (unused in placeholder). |

| RETURNS | DESCRIPTION |
|---|---|
| bool | Always returns True (placeholder; TODO: implement actual comparison). |
Source code in src/log_helpers/local_artifacts.py
pickle_save
Save results to pickle file with optional verification.
| PARAMETER | DESCRIPTION |
|---|---|
| results | Data to save. |
| results_path | Path to save the pickle file. |
| debug_load | If True, reload and verify the saved file. |
Source code in src/log_helpers/local_artifacts.py
save_results_dict
save_results_dict(
results_dict: dict,
results_path: str,
name: str = None,
debug_load: bool = True,
) -> None
Save results dictionary to pickle file.
Removes existing file if present before saving.
| PARAMETER | DESCRIPTION |
|---|---|
| results_dict | Dictionary to save. TYPE: dict |
| results_path | Path for the pickle file (must have .pickle extension). TYPE: str |
| name | Name for logging purposes. TYPE: str |
| debug_load | If True, verify saved file by reloading. TYPE: bool |

| RAISES | DESCRIPTION |
|---|---|
| NotImplementedError | If results_path does not have .pickle extension. |
Source code in src/log_helpers/local_artifacts.py
pickle_load
Load data from pickle file.
| PARAMETER | DESCRIPTION |
|---|---|
| results_path | Path to pickle file. |

| RETURNS | DESCRIPTION |
|---|---|
| object | Loaded data. |

| RAISES | DESCRIPTION |
|---|---|
| Exception | If loading fails, often due to NumPy version mismatch. |
Source code in src/log_helpers/local_artifacts.py
load_results_dict
Load results dictionary from file.
| PARAMETER | DESCRIPTION |
|---|---|
| results_path | Path to results file (must be .pickle). |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Loaded results dictionary. |

| RAISES | DESCRIPTION |
|---|---|
| NotImplementedError | If file is not a pickle file. |
Source code in src/log_helpers/local_artifacts.py
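The save and load behaviour described for the results dictionary (extension check, removal of an existing file, optional reload) can be sketched with the standard library alone. The names `save_results_sketch` and `load_results_sketch` are hypothetical, and the reload check here simply compares the reloaded object with the original, which is an assumption about what verification means.

```python
import os
import pickle
import tempfile


def load_results_sketch(results_path: str) -> dict:
    """Load a results dictionary, accepting only .pickle files."""
    if not results_path.endswith(".pickle"):
        raise NotImplementedError(f"Only .pickle files are supported: {results_path}")
    with open(results_path, "rb") as handle:
        return pickle.load(handle)


def save_results_sketch(results_dict: dict, results_path: str, debug_load: bool = True) -> None:
    """Save a results dictionary, replacing any existing file."""
    if not results_path.endswith(".pickle"):
        raise NotImplementedError(f"Only .pickle files are supported: {results_path}")
    if os.path.exists(results_path):
        os.remove(results_path)
    with open(results_path, "wb") as handle:
        pickle.dump(results_dict, handle)
    if debug_load:
        # Reload immediately so a corrupt write fails here, not downstream.
        assert load_results_sketch(results_path) == results_dict


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_results.pickle")
    save_results_sketch({"mae": 0.12}, path)
    reloaded = load_results_sketch(path)
print(reloaded)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.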
save_object_to_pickle
Save any object to pickle file.
| PARAMETER | DESCRIPTION |
|---|---|
| obj | Object to save. |
| path | Output file path. |
Source code in src/log_helpers/local_artifacts.py
save_array_as_csv
Save NumPy array as CSV file.
| PARAMETER | DESCRIPTION |
|---|---|
| array | Array to save. |
| path | Output CSV file path. |
Source code in src/log_helpers/local_artifacts.py
Naming and URIs
log_naming_uris_and_dirs
get_feature_pickle_artifact_uri
get_feature_pickle_artifact_uri(
run: Dict[str, Any],
source: str,
cfg: DictConfig,
subdir: str = "features",
) -> str
Construct MLflow artifact URI for feature pickle files.
| PARAMETER | DESCRIPTION |
|---|---|
| run | MLflow run dictionary containing 'run_id'. TYPE: Dict[str, Any] |
| source | Data source name used for filename generation. TYPE: str |
| cfg | Configuration object (currently unused but kept for API consistency). TYPE: DictConfig |
| subdir | Subdirectory within the MLflow artifact store. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| str | MLflow artifact URI in format 'runs:/{run_id}/{subdir}/{filename}'. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
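The URI format can be illustrated with a short sketch. The 'runs:/{run_id}/{subdir}/{filename}' layout comes from the description above; the filename rule '{source}.pickle' and the function name `feature_pickle_uri` are assumptions, since the actual filename is produced by a separate helper.

```python
from typing import Any, Dict


def feature_pickle_uri(run: Dict[str, Any], source: str, subdir: str = "features") -> str:
    """Build 'runs:/{run_id}/{subdir}/{filename}' for a feature pickle."""
    # Hypothetical filename rule; the real helper may add more to the name.
    filename = f"{source}.pickle"
    return f"runs:/{run['run_id']}/{subdir}/{filename}"


print(feature_pickle_uri({"run_id": "abc123"}, "my_source"))
```

A 'runs:/' URI is resolved by MLflow relative to the run's artifact root, so the same string works regardless of where the artifact store physically lives.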
get_feature_pickle_base
Generate base filename for feature pickle files.
| PARAMETER | DESCRIPTION |
|---|---|
| run_name | Name of the run to use as the base filename. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Filename with .pickle extension. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_features_pickle_fname
Generate pickle filename for feature data.
| PARAMETER | DESCRIPTION |
|---|---|
| data_source | Name of the data source. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Filename with .pickle extension. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_baseline_names
¶
Get list of baseline method names for PLR preprocessing.
| RETURNS | DESCRIPTION |
|---|---|
list of str
|
Baseline method names: denoised ground truth and outlier-removed raw. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_feature_name_from_cfg
¶
Extract feature name and version from configuration.
| PARAMETER | DESCRIPTION |
|---|---|
cfg
|
Configuration containing PLR_FEATURIZATION.FEATURES_METADATA with 'name' and 'version' keys.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Combined feature name and version string. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
define_featurization_run_name_from_base
¶
Construct featurization run name from base name and configuration.
| PARAMETER | DESCRIPTION |
|---|---|
base_name
|
Base name to append to the run name.
TYPE:
|
cfg
|
Configuration containing feature metadata.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Run name in format 'features-{feature_name}{version}_{base_name}'. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
xgboost_variant_run_name
¶
xgboost_variant_run_name(
run_name: str,
xgboost_cfg: DictConfig,
model_name: str = "XGBOOST",
) -> str
Modify run name to include XGBoost variant suffix.
| PARAMETER | DESCRIPTION |
|---|---|
| run_name | Original run name containing the model name. TYPE: str |
| xgboost_cfg | XGBoost configuration containing 'variant_name'. TYPE: DictConfig |
| model_name | Model name string to find and replace in run_name. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| str | Modified run name with variant suffix, or original if no variant. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_pypots_model_path
¶
Convert results path to PyPOTS model path.
| PARAMETER | DESCRIPTION |
|---|---|
results_path
|
Path to results file.
TYPE:
|
ext_out
|
Extension for the output model file.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Path to PyPOTS model file with 'results' replaced by 'model'. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_mlflow_metric_name
¶
Construct MLflow metric name from split and metric key.
| PARAMETER | DESCRIPTION |
|---|---|
split
|
Data split name (e.g., 'train', 'test', 'val').
TYPE:
|
metric_key
|
Metric identifier (e.g., 'auroc', 'mae').
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
MLflow metric name in format '{split}/{metric_key}'. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
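As a sketch of the documented '{split}/{metric_key}' format (the function name is hypothetical):

```python
def mlflow_metric_name_sketch(split: str, metric_key: str) -> str:
    """Join split and metric key with a slash, e.g. for use as an MLflow metric key."""
    return f"{split}/{metric_key}"


metric_name = mlflow_metric_name_sketch("test", "auroc")
```

Keeping the split as a prefix means metrics from the same split share a common key prefix, which makes them easy to filter by name.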
get_outlier_pickle_name
¶
Generate pickle filename for outlier detection results.
| PARAMETER | DESCRIPTION |
|---|---|
model_name
|
Name of the outlier detection model.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Filename in format 'outlierDetection_{model_name}.pickle'. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_outlier_csv_name
¶
Generate CSV filename for outlier detection data export.
| PARAMETER | DESCRIPTION |
|---|---|
model_name
|
Name of the outlier detection model.
TYPE:
|
split
|
Data split name (e.g., 'train', 'test').
TYPE:
|
key
|
Data key identifier.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Filename in format 'outlierDetection_{model_name}{split}.csv'. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_duckdb_file
¶
get_duckdb_file(
data_cfg: DictConfig,
use_demo_data: bool = False,
demo_db_file: str = "PLR_demo_data.db",
use_synthetic_data: bool = False,
) -> str
Get path to DuckDB database file.
| PARAMETER | DESCRIPTION |
|---|---|
| data_cfg | Data configuration containing 'data_path' and 'filename_DuckDB'. TYPE: DictConfig |
| use_demo_data | If True, use demo database for testing. TYPE: bool |
| demo_db_file | Filename of demo database. TYPE: str |
| use_synthetic_data | If True, use synthetic database (SYNTH_PLR_DEMO.db) for CI/testing. This takes precedence over use_demo_data. TYPE: bool |

| RETURNS | DESCRIPTION |
|---|---|
| str | Absolute path to the DuckDB file. |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If the database file does not exist. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
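The precedence rule (synthetic over demo over the configured database) can be sketched as below. This is an illustration, not the module's code: the function name is hypothetical, the configuration is flattened into plain arguments, and it assumes all three databases live in the same `data_path` directory, which the documentation does not state.

```python
import os
import tempfile


def duckdb_file_sketch(
    data_path: str,
    filename_duckdb: str,
    use_demo_data: bool = False,
    demo_db_file: str = "PLR_demo_data.db",
    use_synthetic_data: bool = False,
    synthetic_db_file: str = "SYNTH_PLR_DEMO.db",
) -> str:
    """Pick the database file by precedence and verify that it exists."""
    if use_synthetic_data:  # highest precedence
        filename = synthetic_db_file
    elif use_demo_data:
        filename = demo_db_file
    else:
        filename = filename_duckdb
    db_path = os.path.abspath(os.path.join(data_path, filename))
    if not os.path.exists(db_path):
        raise FileNotFoundError(f"DuckDB file not found: {db_path}")
    return db_path


# Demo with empty placeholder files; both flags set, so the synthetic file wins.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("PLR_demo_data.db", "SYNTH_PLR_DEMO.db"):
        open(os.path.join(tmp_dir, name), "wb").close()
    chosen = duckdb_file_sketch(
        tmp_dir, "real.db", use_demo_data=True, use_synthetic_data=True
    )
    chosen_name = os.path.basename(chosen)
```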
update_outlier_detection_run_name
¶
Generate descriptive run name for outlier detection based on configuration.
Creates a run name that encodes the model type, detection method, variant, and training data source. For MOMENT models, includes finetune/zeroshot mode, model size (large/base/small), and training data type.
| PARAMETER | DESCRIPTION |
|---|---|
cfg
|
Configuration containing OUTLIER_MODELS with model-specific settings.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Descriptive run name encoding model configuration. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If more than one model is specified in OUTLIER_MODELS. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
update_imputation_run_name
¶
Generate descriptive run name for imputation based on configuration.
Creates a run name that encodes the model type, detection method, variant, and training data source. For MOMENT models, includes finetune/zeroshot mode, model size (large/base/small), and training data type.
| PARAMETER | DESCRIPTION |
|---|---|
cfg
|
Configuration containing MODELS with model-specific settings.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Descriptive run name encoding model configuration. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If more than one model is specified in MODELS. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_torch_model_name
¶
Generate PyTorch model filename from run name.
| PARAMETER | DESCRIPTION |
|---|---|
run_name
|
Name of the training run.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Model filename with .pth extension (e.g., 'MOMENT_finetune_large_model.pth'). |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_debug_string_to_add
¶
Get prefix string for debug experiment names.
| RETURNS | DESCRIPTION |
|---|---|
str
|
Debug prefix '__DEBUG_'. |
get_demo_string_to_add
¶
Get prefix string for demo data experiment names.
| RETURNS | DESCRIPTION |
|---|---|
str
|
Demo data prefix '__DEMODATA_'. |
get_synthetic_string_to_add
¶
Get prefix string for synthetic data experiment names.
Part of the 4-gate isolation architecture. See src/utils/data_mode.py.
| RETURNS | DESCRIPTION |
|---|---|
str
|
Synthetic data prefix 'synth_'. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
if_runname_is_debug
¶
Check if run name indicates a debug run.
| PARAMETER | DESCRIPTION |
|---|---|
run_name
|
Name of the run to check.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if run name contains the debug prefix. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
experiment_name_wrapper
¶
Add prefixes to experiment name based on configuration flags.
Prepends demo data, debug, and/or synthetic prefixes to the experiment name if the corresponding configuration flags are set.
Part of the 4-gate isolation architecture. See src/utils/data_mode.py.
Priority order (applied in reverse so the first prefix appears first):
1. synthetic (synth_) - from EXPERIMENT.is_synthetic or data_mode detection
2. demo data (__DEMODATA_) - from EXPERIMENT.use_demo_data
3. debug (__DEBUG_) - from EXPERIMENT.debug
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
Base experiment name.
TYPE:
|
cfg
|
Configuration with EXPERIMENT.use_demo_data, EXPERIMENT.debug, and EXPERIMENT.is_synthetic flags.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Experiment name with appropriate prefixes. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
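The prefixing order described above can be sketched as follows. The sketch assumes the flags arrive as plain booleans and that prefixes are simply concatenated in front of the name; both function names are hypothetical, and the real function reads its flags from `cfg` and may also consult data_mode detection.

```python
DEBUG_PREFIX = "__DEBUG_"
DEMO_PREFIX = "__DEMODATA_"
SYNTH_PREFIX = "synth_"


def experiment_name_sketch(
    experiment_name: str,
    is_synthetic: bool = False,
    use_demo_data: bool = False,
    debug: bool = False,
) -> str:
    """Prepend prefixes in reverse priority so the top-priority one ends up first."""
    if debug:
        experiment_name = DEBUG_PREFIX + experiment_name
    if use_demo_data:
        experiment_name = DEMO_PREFIX + experiment_name
    if is_synthetic:
        experiment_name = SYNTH_PREFIX + experiment_name
    return experiment_name


def is_debug_name_sketch(name: str) -> bool:
    """Mirror of the documented debug check: the name contains the debug prefix."""
    return DEBUG_PREFIX in name


plain = experiment_name_sketch("PLR_imputation")
all_flags = experiment_name_sketch(
    "PLR_imputation", is_synthetic=True, use_demo_data=True, debug=True
)
```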
get_outlier_detection_experiment_name
¶
Get experiment name for outlier detection from configuration.
| PARAMETER | DESCRIPTION |
|---|---|
cfg
|
Configuration containing PREFECT.FLOW_NAMES.OUTLIER_DETECTION.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Experiment name with appropriate prefixes applied. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_model_name_from_run_name
¶
Extract model name and key from run name.
For MOMENT models, strips version and size information to create a normalized key. For other models, the key equals the model name.
| PARAMETER | DESCRIPTION |
|---|---|
run_name
|
Full run name containing model information.
TYPE:
|
task
|
Task type (currently unused, reserved for future use).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
tuple of str
|
Tuple of (model_name, model_key) where model_key is normalized. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_foundation_model_names
¶
Get list of supported foundation model names.
| RETURNS | DESCRIPTION |
|---|---|
list of str
|
Names of foundation models: MOMENT and UniTS. |
get_simple_outlier_detectors
¶
Get list of traditional outlier detection method names.
| RETURNS | DESCRIPTION |
|---|---|
list of str
|
Names of simple outlier detectors: LOF, OneClassSVM, PROPHET. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_eval_metric_name
¶
Extract evaluation metric name from classifier configuration.
Looks for metric_val in HYPERPARAMS (XGBoost, CatBoost, TabM) or fit_params.scoring (Logistic Regression).
| PARAMETER | DESCRIPTION |
|---|---|
cls_model_name
|
Name of the classifier model.
TYPE:
|
cfg
|
Configuration containing CLS_HYPERPARAMS for the model.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Name of the evaluation metric. |
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If eval_metric cannot be found in the configuration. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_train_loss_name
¶
Get training loss function name from configuration.
| PARAMETER | DESCRIPTION |
|---|---|
cfg
|
Configuration containing CLASSIFICATION_SETTINGS.loss.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Name of the loss function. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
update_cls_run_name
¶
update_cls_run_name(
cls_model_name: str,
source_name: str,
model_cfg: DictConfig,
hparam_cfg: DictConfig,
cfg: DictConfig,
) -> str
Construct classification run name from model and source information.
| PARAMETER | DESCRIPTION |
|---|---|
| cls_model_name | Name of the classifier model. TYPE: str |
| source_name | Name of the data source/preprocessing pipeline. TYPE: str |
| model_cfg | Model configuration (currently unused). TYPE: DictConfig |
| hparam_cfg | Hyperparameter configuration (currently unused). TYPE: DictConfig |
| cfg | Full configuration for extracting eval metric. TYPE: DictConfig |

| RETURNS | DESCRIPTION |
|---|---|
| str | Run name in format '{model}eval-{metric}_'. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_embedding_npy_fname
¶
Generate filename for embedding numpy array.
| PARAMETER | DESCRIPTION |
|---|---|
model_name
|
Name of the model that generated embeddings.
TYPE:
|
split
|
Data split name (e.g., 'train', 'test').
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Filename in format '{model_name}embedding.npy'. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_moment_cls_run_name
¶
Generate classification run name for MOMENT model.
Encodes model variant, detection type, and loss weighting in the name.
| PARAMETER | DESCRIPTION |
|---|---|
cls_model_name
|
Base classifier model name.
TYPE:
|
cls_model_cfg
|
MOMENT model configuration with MODEL settings.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Run name in format '{model}-{variant}_{detection_type}[_w]'. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_imputation_pickle_name
¶
Generate pickle filename for imputation results.
| PARAMETER | DESCRIPTION |
|---|---|
model_name
|
Name of the imputation model.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Filename in format 'imputation_{model_name}.pickle'. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_summary_fname
¶
Generate summary database filename from experiment name.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
Name of the experiment.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Filename with 'PLR_' prefix removed and .db extension. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_summary_fpath
¶
Get full path for summary database, removing existing file if present.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
Name of the experiment.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Full path to summary database file. |
Notes
Deletes existing file at the path before returning.
Source code in src/log_helpers/log_naming_uris_and_dirs.py
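Because this helper has a side effect (it deletes any existing file), a small sketch may make the behaviour easier to reason about. The function names and the `output_dir` argument are hypothetical; the documented function takes only `experiment_name`, and where it resolves the directory from is not shown here. Stripping 'PLR_' only when it leads the name is also an assumption.

```python
import os
import tempfile


def summary_fname_sketch(experiment_name: str) -> str:
    """Drop a leading 'PLR_' and append the .db extension."""
    if experiment_name.startswith("PLR_"):
        experiment_name = experiment_name[len("PLR_"):]
    return experiment_name + ".db"


def summary_fpath_sketch(output_dir: str, experiment_name: str) -> str:
    """Return the summary path, deleting any stale file found there."""
    fpath = os.path.join(output_dir, summary_fname_sketch(experiment_name))
    if os.path.exists(fpath):
        os.remove(fpath)
    return fpath


with tempfile.TemporaryDirectory() as tmp_dir:
    stale = os.path.join(tmp_dir, "imputation.db")
    open(stale, "wb").close()  # simulate a summary left over from an earlier run
    fpath = summary_fpath_sketch(tmp_dir, "PLR_imputation")
    stale_removed = not os.path.exists(stale)
```

Callers that need the previous summary should copy it before requesting the path, since the file is gone once the function returns.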
get_summary_artifacts_fname
¶
Generate summary artifacts pickle filename from experiment name.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
Name of the experiment.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Filename with 'PLR_' prefix removed and .pickle extension. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
get_summary_artifacts_fpath
¶
Get full path for summary artifacts pickle, removing existing file if present.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
Name of the experiment.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Full path to summary artifacts pickle file. |
Notes
Deletes existing file at the path before returning.
Source code in src/log_helpers/log_naming_uris_and_dirs.py
parse_task_from_exp_name
¶
Parse task type from experiment name string.
| PARAMETER | DESCRIPTION |
|---|---|
experiment_name
|
Name of the experiment containing task identifier.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str
|
Task type: 'outlier_detection', 'imputation', 'classification', or 'featurization'. |
Source code in src/log_helpers/log_naming_uris_and_dirs.py
Model Retraining¶
retrain_or_not
¶
check_if_imputation_model_trained_already_from_mlflow
¶
check_if_imputation_model_trained_already_from_mlflow(
cfg: DictConfig, run_name: str, model_type: str
) -> dict | None
Check if an imputation model with matching configuration exists in MLflow.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Configuration for determining search parameters. TYPE: DictConfig |
| run_name | Name of the run to search for. TYPE: str |
| model_type | Type of model to search for. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| dict or None | Best matching run data if found, None otherwise. |
Source code in src/log_helpers/retrain_or_not.py
if_retrain_the_imputation_model
¶
if_retrain_the_imputation_model(
cfg: DictConfig,
run_name: str | None = None,
model_type: str = "imputation",
) -> tuple[bool, dict]
Determine whether to retrain an imputation model.
Checks configuration flag and MLflow history to decide if retraining is needed.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Configuration with IMPUTATION_TRAINING.retrain_models flag. TYPE: DictConfig |
| run_name | Name of the run to check. TYPE: str or None |
| model_type | Type of model. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| tuple | Tuple of (should_retrain: bool, best_run: dict). |
Source code in src/log_helpers/retrain_or_not.py
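The decision described above (configuration flag first, then MLflow history) might look roughly like the sketch below. It is written under several assumptions: the function name is hypothetical, the MLflow lookup is replaced by an injected callable so the example runs without a tracking server, and the value returned as `best_run` when retraining is forced is not documented, so `None` is used here.

```python
from typing import Callable, Optional, Tuple


def should_retrain_sketch(
    retrain_models: bool,
    find_best_run: Callable[[], Optional[dict]],
) -> Tuple[bool, Optional[dict]]:
    """Return (should_retrain, best_run) from a flag and a history lookup."""
    if retrain_models:
        # Explicit request in the configuration: skip the history lookup.
        return True, None
    best_run = find_best_run()
    # Retrain only when no earlier matching run was found.
    return best_run is None, best_run


# Placeholder lookups standing in for an MLflow search.
found = should_retrain_sketch(False, lambda: {"run_id": "abc123"})
missing = should_retrain_sketch(False, lambda: None)
forced = should_retrain_sketch(True, lambda: {"run_id": "abc123"})
```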
check_if_imputation_source_featurized_already_from_mlflow
¶
check_if_imputation_source_featurized_already_from_mlflow(
cfg: DictConfig, experiment_name: str, run_name: str
) -> bool
Check if features have already been extracted for an imputation source.
| PARAMETER | DESCRIPTION |
|---|---|
| cfg | Configuration object (currently unused). TYPE: DictConfig |
| experiment_name | MLflow experiment name. TYPE: str |
| run_name | Run name to search for. TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| bool | True if featurization run exists, False otherwise. |
Source code in src/log_helpers/retrain_or_not.py
if_refeaturize_from_imputation
¶
Determine whether to re-extract features from imputation results.
| PARAMETER | DESCRIPTION |
|---|---|
run_name
|
Run name to check.
TYPE:
|
experiment_name
|
MLflow experiment name.
TYPE:
|
cfg
|
Configuration with PLR_FEATURIZATION.re_featurize flag.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if re-featurization is needed. |
Source code in src/log_helpers/retrain_or_not.py
if_recompute_and_viz_imputation_metrics
¶
Determine whether to recompute and visualize imputation metrics.
| PARAMETER | DESCRIPTION |
|---|---|
_recompute
|
Input flag (currently unused; placeholder implementation).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
bool
|
Always returns True in current implementation. |
Notes
This is a placeholder function. Future implementation should check for previously computed metrics to avoid redundant computation.
Source code in src/log_helpers/retrain_or_not.py
if_recreate_ensemble
¶
Determine whether to recreate an ensemble model.
| PARAMETER | DESCRIPTION |
|---|---|
ensemble_name
|
Name of the ensemble.
TYPE:
|
experiment_name
|
MLflow experiment name.
TYPE:
|
cfg
|
Configuration object (currently unused).
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
bool
|
True if no previous runs found, False otherwise. |
Source code in src/log_helpers/retrain_or_not.py
System Utilities¶
system_utils
¶
get_commit_id
¶
Get current git commit ID.
| PARAMETER | DESCRIPTION |
|---|---|
return_short
|
If True, return short hash; otherwise return full hash.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
str or nan
|
Git commit hash, or np.nan if git is not available. |
Source code in src/log_helpers/system_utils.py
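One way to obtain the commit hash with a NaN fallback is sketched below. This is an assumption about the approach, not the module's code: the function name is hypothetical, the default for `return_short` is a guess, and `float("nan")` stands in for `np.nan` to avoid a NumPy dependency in the example.

```python
import subprocess
from typing import Union


def commit_id_sketch(return_short: bool = True) -> Union[str, float]:
    """Return the current commit hash, or NaN when git cannot provide one."""
    cmd = ["git", "rev-parse"]
    if return_short:
        cmd.append("--short")
    cmd.append("HEAD")
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or the working directory is not a repository.
        return float("nan")
    return out.decode("ascii").strip()


commit = commit_id_sketch()
```

Because the result may be a float NaN rather than a string, callers that log it as a tag or parameter may want to convert it with `str()` first.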
get_processor_info
¶
Get CPU model name from system.
| RETURNS | DESCRIPTION |
|---|---|
str or nan
|
CPU model name, or np.nan if detection fails. |
Notes
Currently only fully implemented for Linux. Windows and macOS have placeholder implementations.
Source code in src/log_helpers/system_utils.py
get_system_params
¶
Get system hardware parameters.
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary with 'CPU' (model name) and 'RAM_GB' (total RAM in GB). |
Source code in src/log_helpers/system_utils.py
get_library_versions
¶
Get versions of key Python libraries.
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary with version strings for Python, NumPy, Polars, OS, PyTorch, CUDA, and cuDNN. |
Source code in src/log_helpers/system_utils.py
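A standard-library sketch of collecting such version strings is shown below. The function names, dictionary keys, and the "not installed" fallback are assumptions for illustration. CUDA and cuDNN are omitted so the example runs without PyTorch; with PyTorch installed they are available as `torch.version.cuda` and `torch.backends.cudnn.version()`.

```python
import platform
from importlib import metadata


def package_version_sketch(name: str) -> str:
    """Look up an installed distribution's version without importing it."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def library_versions_sketch() -> dict:
    """Gather interpreter, OS, and selected package versions."""
    return {
        "Python": platform.python_version(),
        "OS": f"{platform.system()} {platform.release()}",
        "NumPy": package_version_sketch("numpy"),
        "Polars": package_version_sketch("polars"),
        "PyTorch": package_version_sketch("torch"),
    }


versions = library_versions_sketch()
```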
get_system_param_dict
¶
Get comprehensive system parameters dictionary.
Collects hardware info, library versions, and git commit for reproducibility logging.
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Dictionary with 'system', 'libraries', and 'git_commit' keys. |
Source code in src/log_helpers/system_utils.py
Visualization Logging¶
viz_log_utils
¶
get_run_ids_from_infos
¶
Extract run IDs from MLflow info dictionaries.
| PARAMETER | DESCRIPTION |
|---|---|
mlflow_infos
|
Dictionary mapping names to MLflow info with 'run_info' containing 'run_id'.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dict
|
Mapping of names to run IDs. |
Source code in src/log_helpers/viz_log_utils.py
export_viz_as_artifacts
¶
export_viz_as_artifacts(
fig_paths: dict,
flow_type: str,
cfg: DictConfig,
mlflow_run_ids: dict = None,
mlflow_infos: dict = None,
)
Export visualization files as MLflow artifacts.
Logs figure files to all relevant MLflow runs. Useful for aggregated visualizations that span multiple model runs.
| PARAMETER | DESCRIPTION |
|---|---|
| fig_paths | Dictionary mapping figure names to file paths. TYPE: dict |
| flow_type | Type of flow for logging context. TYPE: str |
| cfg | Configuration object (currently unused). TYPE: DictConfig |
| mlflow_run_ids | Pre-computed mapping of model names to run IDs. TYPE: dict |
| mlflow_infos | MLflow info dictionaries to extract run IDs from. TYPE: dict |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If neither mlflow_run_ids nor mlflow_infos is provided. |
Source code in src/log_helpers/viz_log_utils.py
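The fan-out of figures to several runs, including the documented `ValueError`, can be sketched without an MLflow dependency by injecting the logging call. The function name and the `log_artifact` parameter are hypothetical, and the run IDs and figure path are placeholders; the nested 'run_info' / 'run_id' layout follows the description of `get_run_ids_from_infos`.

```python
from typing import Callable, Dict, Optional


def export_viz_sketch(
    fig_paths: Dict[str, str],
    log_artifact: Callable[[str, str], None],
    mlflow_run_ids: Optional[Dict[str, str]] = None,
    mlflow_infos: Optional[dict] = None,
) -> int:
    """Log every figure to every run; return the number of log calls made."""
    if mlflow_run_ids is None:
        if mlflow_infos is None:
            raise ValueError("Provide either mlflow_run_ids or mlflow_infos")
        # Derive run IDs from the nested info dictionaries.
        mlflow_run_ids = {
            name: info["run_info"]["run_id"] for name, info in mlflow_infos.items()
        }
    n_logged = 0
    for run_id in mlflow_run_ids.values():
        for fig_path in fig_paths.values():
            log_artifact(run_id, fig_path)
            n_logged += 1
    return n_logged


# Record calls in a list instead of contacting a tracking server.
logged = []
infos = {
    "model_a": {"run_info": {"run_id": "run-1"}},
    "model_b": {"run_info": {"run_id": "run-2"}},
}
count = export_viz_sketch(
    {"summary": "figures/summary.png"},
    lambda run_id, path: logged.append((run_id, path)),
    mlflow_infos=infos,
)
```

With MLflow installed, `mlflow.tracking.MlflowClient().log_artifact`, which takes a run ID and a local path, could be passed as `log_artifact`.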
Polars Utilities¶
polars_utils
¶
cast_numeric_polars_cols
¶
Cast all numeric columns in Polars DataFrame to specified type.
Useful for avoiding schema errors when combining DataFrames with different numeric precision.
| PARAMETER | DESCRIPTION |
|---|---|
df
|
Input DataFrame.
TYPE:
|
cast_to
|
Target numeric type.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
DataFrame
|
DataFrame with numeric columns cast to specified type. |
| RAISES | DESCRIPTION |
|---|---|
NotImplementedError
|
If cast_to is not "Float64". |