COBRA: Counterfactual oversampling framework for imbalanced structured data classification
Recently, cyberattacks have grown more frequent and sophisticated, placing sustained pressure on digital infrastructure.
Network intrusion detection remains a first line of defense, and Deep Learning (DL) enables large-scale
identification of malicious activity in network flows. A persistent challenge, however, is severe class imbalance:
real traffic is dominated by benign flows, while rare but consequential attacks are underrepresented. DL models
trained on such data typically favor the majority class and systematically overlook the very attacks of interest.
A common remedy is data rebalancing through oversampling, yet its effects on what models learn and which
features they depend on remain underexplored. In this work, we propose COBRA ( COunterfactual BAlancing for
Rare-class ) method, where we examine oversampling through counterfactual explanations: we synthesize realistic
?what-if? variants that minimally transform network flows into particular attack classes, use them to rebalance
the training set, and analyze how feature relevance changes once these counterfactuals are introduced using
eXplainable AI (XAI). Our study focuses on deep neural networks (DNNs) for multi-class intrusion detection,
evaluating both detection effectiveness and explanation stability. We conduct a systematic evaluation on two
widely used benchmarks, NSL-KDD and CICIDS17. Furthermore, we assess the robustness of these models under
adversarial attacks.