Classification

Stage 4 of the pipeline: Training and evaluating classifiers.

Default Classifier: CatBoost

Fixed Classifier

CatBoost is the default and recommended classifier. It is held fixed because the research question concerns the effect of preprocessing, not a comparison of classifiers.

Why CatBoost?

Property	Value
Mean AUROC	0.878
Best AUROC	0.913
Handles categorical	Yes
GPU support	Yes
Overfitting protection	Built-in

Bootstrap Evaluation

All results use bootstrap validation:

CLS_EVALUATION:
  BOOTSTRAP:
    n_iterations: 1000
    alpha_CI: 0.95

This provides:

Robust confidence intervals
Per-iteration metrics for statistical tests
Subject-wise stability analysis
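To make the bootstrap procedure concrete, here is a minimal, dependency-free sketch of a percentile bootstrap for AUROC. It is not the pipeline's implementation: the function names `auroc` and `bootstrap_auroc` are hypothetical, and it assumes that `alpha_CI: 0.95` denotes the confidence level (so the interval spans the 2.5th to 97.5th percentiles of the resampled scores).

```python
import random


def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc(y_true, y_score, n_iterations=1000, ci_level=0.95, seed=0):
    """Return (point estimate, CI lower, CI upper, per-iteration AUROCs).

    ci_level is assumed to correspond to alpha_CI in the config above.
    """
    rng = random.Random(seed)
    n = len(y_true)
    samples = []
    while len(samples) < n_iterations:
        # Resample subjects with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            # The resample contains a single class; AUROC is undefined, redraw.
            continue
        samples.append(auroc(ys, [y_score[i] for i in idx]))

    # Percentile interval from the sorted bootstrap distribution.
    ordered = sorted(samples)
    tail = (1.0 - ci_level) / 2.0
    lower = ordered[int(tail * (n_iterations - 1))]
    upper = ordered[int((1.0 - tail) * (n_iterations - 1))]
    return auroc(y_true, y_score), lower, upper, samples
```

The returned `samples` list holds one AUROC per iteration, which is the kind of per-iteration output that paired statistical tests between preprocessing configurations would consume.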

STRATOS Metrics

Classification is evaluated with all STRATOS-compliant metrics:

Discrimination

AUROC: Area Under ROC Curve (95% CI)

Calibration

Calibration slope: Should be ~1.0
Calibration intercept: Should be ~0.0
O:E ratio: Observed/Expected event ratio; should be ~1.0
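The three calibration metrics can be sketched as follows. This is an illustrative, dependency-free version, not the pipeline's code: the function names are hypothetical, and the slope and intercept are fitted jointly by logistic regression of the outcome on the logit of the predicted risk. Some definitions instead estimate the intercept with the slope fixed at 1 (calibration-in-the-large), so treat the intercept here as one possible convention.

```python
import math


def _logit(p, eps=1e-12):
    """Log-odds of p, clipped away from 0 and 1 to stay finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def oe_ratio(y_true, y_prob):
    """Observed events divided by expected events (sum of predicted risks)."""
    return sum(y_true) / sum(y_prob)


def calibration_slope_intercept(y_true, y_prob, n_steps=50, tol=1e-10):
    """Fit logit(P(y=1)) = a + b * logit(p) by Newton-Raphson.

    Returns (slope b, intercept a). Perfect calibration gives b = 1, a = 0.
    """
    x = [_logit(p) for p in y_prob]
    a, b = 0.0, 1.0  # start at the perfectly calibrated solution
    for _ in range(n_steps):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g_a += yi - mu          # gradient w.r.t. intercept
            g_b += (yi - mu) * xi   # gradient w.r.t. slope
            h_aa += w               # entries of X^T W X
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        if det <= 0.0:
            raise ValueError("singular fit: predictions need at least two distinct values")
        # Newton step: solve the 2x2 system (X^T W X) * step = gradient.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return b, a
```

A slope below 1 indicates predictions that are too extreme (overconfident), and a slope above 1 indicates predictions that are too conservative. The fit does not converge when the predictions separate the classes perfectly.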

Overall Performance

Brier score: Proper scoring rule
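Scaled Brier (IPA): Index of Prediction Accuracy, the Brier score rescaled against a null model that always predicts the prevalence (1 is perfect, 0 is no better than the null model)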
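As a small worked sketch of these two scores (the function names are hypothetical, not the pipeline's API), the scaled Brier score is one minus the ratio of the model's Brier score to that of a null model that predicts the observed prevalence for every subject:

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and binary outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def scaled_brier(y_true, y_prob):
    """IPA = 1 - Brier(model) / Brier(null), null = constant prevalence."""
    prevalence = sum(y_true) / len(y_true)
    null_brier = brier_score(y_true, [prevalence] * len(y_true))
    if null_brier == 0.0:
        raise ValueError("scaled Brier is undefined when all outcomes are identical")
    return 1.0 - brier_score(y_true, y_prob) / null_brier
```

Because the null model's Brier score depends on prevalence, the scaled version is easier to compare across datasets with different class balance than the raw Brier score.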

Clinical Utility

Net Benefit: At clinical threshold
DCA curves: Decision Curve Analysis
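Net benefit and a decision curve can be sketched as below. This is an illustrative version with hypothetical function names, using the standard definition in which false positives are weighted by the odds of the threshold probability; the threshold values themselves are placeholders, not the clinical threshold used in this project.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating subjects whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are penalised by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def decision_curve(y_true, y_prob, thresholds):
    """Rows of (threshold, model net benefit, treat-all net benefit).

    The treat-none strategy has net benefit 0 at every threshold.
    """
    return [
        (t, net_benefit(y_true, y_prob, t), net_benefit_treat_all(y_true, t))
        for t in thresholds
    ]
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero (treat-none).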

Running Classification

# Default (CatBoost with best preprocessing)
python -m src.classification.flow_classification

# With specific preprocessing
python -m src.classification.flow_classification \
    outlier_method=MOMENT-gt-finetune \
    imputation_method=SAITS

API Reference

flow_classification

flow_classification(cfg: DictConfig) -> None

Main classification flow for glaucoma screening from PLR features.

Orchestrates the classification pipeline including feature-based and time-series classification approaches. Initializes MLflow experiment and delegates to subflows.

PARAMETER	DESCRIPTION
`cfg`	Hydra configuration with PREFECT flow names and settings. TYPE: `DictConfig`

Notes

Time-series classification is currently disabled: the refactoring introduced a bug, and the approach had not seemed promising.

Source code in src/classification/flow_classification.py

def flow_classification(cfg: DictConfig) -> None:
    """
    Main classification flow for glaucoma screening from PLR features.

    Orchestrates the classification pipeline including feature-based
    and time-series classification approaches. Initializes MLflow
    experiment and delegates to subflows.

    Parameters
    ----------
    cfg : DictConfig
        Hydra configuration with PREFECT flow names and settings.

    Notes
    -----
    Time-series classification is currently disabled: the refactoring
    introduced a bug, and the approach had not seemed promising.
    """
    experiment_name = experiment_name_wrapper(
        experiment_name=cfg["PREFECT"]["FLOW_NAMES"]["CLASSIFICATION"], cfg=cfg
    )
    logger.info("FLOW | Name: {}".format(experiment_name))
    logger.info("=====================")
    prev_experiment_name = experiment_name_wrapper(
        experiment_name=cfg["PREFECT"]["FLOW_NAMES"]["FEATURIZATION"], cfg=cfg
    )

    # Init the MLflow experiment
    init_mlflow_experiment(experiment_name=experiment_name)

    # Classify from hand-crafted features/embeddings
    flow_feature_classification(cfg, prev_experiment_name)

    # Classify from time series
    ts_cls = False
    if ts_cls:
        raise NotImplementedError(
            "Time-series classification is unfinished: the refactoring "
            "introduced a bug, and the approach did not seem promising"
        )