A Hybrid Oversampling Framework for NIDS Class Imbalance
SMOTE interpolates between a minority sample and one of its nearest minority neighbors, regardless of which cluster each belongs to. When the two samples come from different clusters, the synthetic point lands in the empty space between them, adding noise that confuses the classifier.
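The effect is easy to see in a toy case. The sketch below is illustrative only: the two clusters, their coordinates, and the helper name `interpolate` are made up for the example. It pairs one sample from each cluster and interpolates halfway between them.

```python
def interpolate(x_i, x_j, lam):
    """Return x_i + lam * (x_j - x_i), with lam in [0, 1]."""
    return [a + lam * (b - a) for a, b in zip(x_i, x_j)]

# Two hypothetical minority clusters, far apart on the first axis.
cluster_a = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3]]
cluster_b = [[10.0, 0.0], [10.2, 0.1], [9.9, 0.2]]

# Pair one sample from each cluster, ignoring cluster membership.
x_i, x_j = cluster_a[0], cluster_b[0]
midpoint = interpolate(x_i, x_j, 0.5)  # lam fixed at 0.5 for illustration

# Squared distance from the synthetic point to its closest real sample.
nearest_sq = min(
    sum((a - b) ** 2 for a, b in zip(midpoint, p))
    for p in cluster_a + cluster_b
)
print(midpoint, nearest_sq)
```

The synthetic point sits several units away from every real sample, in a region where no minority data was observed.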
Each step in DSMOTE solves a specific problem.
Instead of treating all minority samples as one blob, DSMOTE uses KMeans to discover sub-groups. Synthetic samples are then generated within each cluster — not across them.
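A minimal sketch of cluster-wise generation follows. It is not the DSMOTE implementation. The tiny `kmeans` function is a stand-in for a library KMeans, and the function names, the uniform choice of cluster, and the toy data are all assumptions made for illustration.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Tiny Lloyd's k-means (stand-in for a library KMeans); returns labels."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its closest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def oversample_within_clusters(points, labels, n_new, seed=0):
    """Interpolate only between two samples that share a cluster label."""
    rng = random.Random(seed)
    by_cluster = {}
    for p, l in zip(points, labels):
        by_cluster.setdefault(l, []).append(p)
    # A cluster needs at least two members to interpolate between.
    eligible = [m for m in by_cluster.values() if len(m) >= 2]
    if not eligible:
        return []
    synthetic = []
    for _ in range(n_new):
        members = rng.choice(eligible)      # ASSUMPTION: clusters picked uniformly
        x_i, x_j = rng.sample(members, 2)   # both from the same cluster
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(x_i, x_j)])
    return synthetic

# Hypothetical minority samples forming two well-separated sub-groups.
points = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
          [10.0, 0.0], [10.2, 0.1], [9.9, 0.2]]
labels = kmeans(points, k=2)
synthetic = oversample_within_clusters(points, labels, n_new=50)
```

Because both endpoints of every interpolation share a label, each synthetic point stays on the line segment between two members of the same cluster and never falls in the gap between clusters.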
A synthetic sample x_new is accepted only if ‖x_new − x_i‖ ≤ d_mean. This density gate keeps new points inside the safe zone — preventing noise, overfitting, and cluster bleeding.
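The acceptance test can be sketched in a few lines. The text above does not say how d_mean is computed, so this sketch assumes it is the mean Euclidean distance from x_i to the other members of its cluster. That definition, the function names, and the sample coordinates are assumptions for illustration.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def mean_distance(members, i):
    """ASSUMPTION: d_mean is the mean distance from members[i] to the others."""
    others = [m for j, m in enumerate(members) if j != i]
    return sum(euclidean(members[i], m) for m in others) / len(others)

def accept(x_new, x_i, d_mean):
    """Density gate: keep x_new only if ||x_new - x_i|| <= d_mean."""
    return euclidean(x_new, x_i) <= d_mean

# Hypothetical cluster with three members; x_i is the first one.
members = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
x_i = members[0]
d_mean = mean_distance(members, 0)       # (5 + 10) / 2

near = accept([3.0, 4.0], x_i, d_mean)   # distance 5
far = accept([6.0, 8.0], x_i, d_mean)    # distance 10
print(d_mean, near, far)
```

A candidate that fails the gate would simply be discarded and another one drawn. How rejected candidates are replaced is not specified here.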
Minority classes such as pod (264 samples) and warezclient (1,020 samples) are boosted to roughly 256K–264K samples each, which brings them to near-parity with the majority class once that class has undergone controlled reduction.
Under SMOTE, the Random Forest (RF) classifier learns to predict every sample as "Benign", which yields 91.9% accuracy without detecting a single attack. DSMOTE forces the model to learn the minority attack classes (Exploits, Fuzzers, Backdoor, Shellcode) that matter for security.
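The gap between accuracy and detection can be made concrete with a toy calculation. The label counts below are hypothetical, chosen only so that the benign share matches the 91.9% figure quoted above. They are not the dataset's real class counts.

```python
# Hypothetical label distribution: 919 benign and 81 attack samples.
y_true = ["Benign"] * 919 + ["Attack"] * 81

# A degenerate classifier that predicts "Benign" for everything.
y_pred = ["Benign"] * len(y_true)

# Fraction of all samples labeled correctly.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Fraction of true attacks that were predicted as attacks.
attack_total = y_true.count("Attack")
attack_hits = sum(t == p == "Attack" for t, p in zip(y_true, y_pred))
attack_recall = attack_hits / attack_total

print(f"accuracy={accuracy:.3f} attack_recall={attack_recall:.3f}")
```

Accuracy alone rewards the degenerate model, which is why per-class recall on the attack classes is the more informative number for an intrusion detector.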