Usage ====== Run ---- .. code-block:: bash git clone https://github.com/FaramirHurin/ADV-O.git cd ADV-O pip install -r requirements.txt python main.py Output ------ .. code-block:: bash Table 6: Synthetic data: R2 scores for the predicted features for various regressors. x_terminal_id y_terminal_id TX_AMOUNT MLPRegressor(max_iter=2000, random_state=42) 0.85 0.59 0.94 Ridge(random_state=42) 0.85 0.58 0.93 RandomForestRegressor(random_state=42) 0.85 0.59 0.90 Naive 0.39 0.54 0.91 Table 7: Synthetic data: accuracy of oversampling algorithms. All oversampling algorithms have been tested using a Balanced Random Forest. No oversampling has been tested with a classic Random Forest ('Baseline'), and a Balanced Random Forest ('Baseline balanced'). Baseline Baseline_balanced SMOTE Random KMeansSMOTE ADVO PRAUC 0.32 0.37 0.36 0.37 0.36 0.37 PRAUC_Card 0.45 0.50 0.46 0.49 0.48 0.48 Precision 0.34 0.23 0.27 0.26 0.25 0.27 Recall 0.29 0.89 0.68 0.72 0.73 0.69 F1 score 0.31 0.36 0.39 0.38 0.37 0.39 PK50 0.76 0.36 0.56 0.30 0.40 0.42 PK100 0.78 0.37 0.52 0.38 0.39 0.45 PK200 0.74 0.38 0.50 0.44 0.36 0.55 PK500 0.61 0.40 0.50 0.40 0.40 0.55 PK1000 0.48 0.42 0.46 0.44 0.40 0.48 PK2000 0.36 0.38 0.40 0.39 0.38 0.41 Table 8: Synthetic data: AUC of absolute differences between kde x_terminal_id y_terminal_id TX_AMOUNT SMOTE 0.11 0.10 0.18 Random 0.05 0.11 0.02 KMeansSMOTE 0.05 0.10 0.02 ADVO 0.09 0.12 0.03