A Study of the Effect of Feature Selection Techniques on the Performance of the Immune Dendritic Cell Algorithm (DCA) in Classifying Attacks in Computer Networks
Keywords:
Feature Selection Algorithms, Computational Networks, Dendritic Cell Algorithm, UNSW-NB15 Dataset, RFE.
Abstract
The exponential growth in network traffic and associated security threats necessitates the development of robust systems for detecting and classifying network attacks. This is crucial for mitigating their impact on individuals, organizations, and societies, thereby ensuring cybersecurity.
Classification systems based on artificial intelligence techniques and algorithms rely on massive amounts of network data, which makes selecting an appropriate feature subset a critical step in enhancing system efficiency, accuracy, and interpretability. Feature selection achieves this by reducing the dimensionality of the dataset, mitigating noise, and eliminating redundant or irrelevant features that do not help distinguish between the classification categories.
This research paper investigates a set of machine learning-based feature selection techniques: Recursive Feature Elimination (RFE) variants, namely RF-RFE, SVM-RFE, and LR-RFE, and Genetic Algorithms (GA). These techniques are applied to the benchmark dataset UNSW-NB15, which comprises 45 features related to network traffic and connection behavior in computer networks. The effectiveness of each technique in selecting an optimal subset of 8 to 14 features was analyzed, and their contribution to classification accuracy was compared by passing the selected subsets to the immune-inspired Dendritic Cell Algorithm (DCA), which identifies malicious activity and detects threats in networks.
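As a rough illustration of how an RFE-style selector of this kind operates, the following minimal sketch uses scikit-learn's `RFE` wrapper around a logistic regression estimator (an LR-RFE-like setup). It is not the paper's implementation: the data is synthetic rather than UNSW-NB15, and the helper name `select_features`, the sample sizes, and the choice of 8 features are assumptions made for the example. A random forest or a linear-kernel SVM could be passed as the estimator in the same way to approximate RF-RFE or SVM-RFE.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for network-flow records (assumption: NOT the UNSW-NB15 data).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=6, n_redundant=4, random_state=0
)
X = StandardScaler().fit_transform(X)  # scale so coefficient magnitudes are comparable


def select_features(X, y, n_features):
    """Hypothetical helper: run RFE and return the boolean mask and the ranking."""
    estimator = LogisticRegression(max_iter=1000)
    # RFE repeatedly fits the estimator and drops the weakest feature (step=1)
    # until only n_features remain.
    selector = RFE(estimator, n_features_to_select=n_features, step=1)
    selector.fit(X, y)
    return selector.support_, selector.ranking_


support, ranking = select_features(X, y, n_features=8)
selected_indices = [i for i, keep in enumerate(support) if keep]
print("Selected feature indices:", selected_indices)
```

Features that survive elimination receive rank 1 in `ranking`; higher ranks indicate features that were removed earlier.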
The final results of this study demonstrated strong performance of the classification model when feature selection techniques were applied, which confirms their effectiveness in dimensionality reduction, noise mitigation, and redundancy elimination. The Genetic Algorithm achieved the best results, with an accuracy of 99% using only 8 features and a very low false alarm rate (FAR) of 1.61%. It outperformed the other machine learning-based algorithms, among which the Random Forest algorithm reached an accuracy of 97.6%.
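For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute them from binary predictions. It assumes that label 1 means attack and 0 means normal traffic, and that FAR is defined as FP / (FP + TN); the abstract does not state the exact definition used, and some intrusion detection studies define FAR differently. The function names and the toy labels are illustrative only and are not results from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumption: 1 = attack, 0 = normal)."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of records classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def false_alarm_rate(y_true, y_pred):
    """Fraction of normal records wrongly flagged as attacks: FP / (FP + TN)."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return fp / (fp + tn)


# Toy labels for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
print(accuracy(y_true, y_pred))          # 0.8
print(false_alarm_rate(y_true, y_pred))  # 0.25
```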