Spearman Correlation Heatmaps After Feature Selection

Published: 20 November 2024 | Version 1 | DOI: 10.17632/hxd7gmrvth.1
Contributors:
Abdulkader Hajjouz

Description

This figure is a Spearman correlation heatmap of the 32 features used to train machine learning and deep learning models for cybersecurity. The diagonal cells show perfect self-correlation (value = 1), and the off-diagonal cells show the pairwise correlations between features. Redundant and irrelevant features were removed during feature selection, so no strong correlations (close to 1 or -1) remain among the selected features, and each one contributes largely non-redundant information to the model.

Feature selection is a key step in building cyber intrusion detection systems: it reduces computational overhead, simplifies the model, and can improve accuracy and robustness. The heatmap is part of a systematic feature engineering process that optimizes datasets for anomaly detection, network traffic analysis and intrusion detection. Researchers working on AI for cybersecurity can use it to build more interpretable and efficient models for detecting intrusions in large-scale networks. The figure also illustrates the value of correlation analysis for high-dimensional datasets in cybersecurity, data science and machine learning.

Why It Matters:
Reduces overfitting in machine learning models.
Improves computational efficiency for large-scale datasets.
Enhances feature interpretability for robust cybersecurity solutions.

Keywords: Spearman Correlation Heatmap, Feature Selection, Intrusion Detection System, Cybersecurity, Machine Learning, Deep Learning, Anomaly Detection, Network Traffic Analysis, Artificial Intelligence in Cybersecurity, Dataset Optimization, Feature Engineering for Cyber Threats

References: This file pertains to our research study, which has been accepted for publication in the Scientific and Technical Journal of Information Technologies, Mechanics and Optics. The study is titled "Enhancing and Extending CatBoost for Accurate Detection and Classification of DoS and DDoS Attack Subtypes in Network Traffic."
https://doi.org/10.1109/ICSIP61881.2024.10671552
https://doi.org/10.24143/2072-9502-2024-3-65-74
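As an illustration of the correlation-based pruning the description refers to, the following is a minimal sketch rather than the authors' actual procedure. The function name `drop_correlated`, the 0.9 threshold and the synthetic demonstration data are all assumptions made for this example; the study's own selection criteria are not stated on this page.

```python
import numpy as np
import pandas as pd


def drop_correlated(X: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop the later feature of every pair whose |Spearman rho| exceeds threshold.

    Hypothetical helper for illustration; the threshold is an assumed value.
    """
    corr = X.corr(method="spearman").abs()
    # Keep only the strict upper triangle so each pair is examined once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)


# Synthetic demonstration data (not the study's dataset).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "a_copy": a * 2.0 + 1.0,    # monotone transform of "a", so rho = 1
    "b": rng.normal(size=200),  # generated independently of "a"
})
reduced = drop_correlated(demo)
print(list(reduced.columns))
```

This greedy rule keeps the first feature of each highly correlated pair. Which member of a pair to keep is a design choice that could also depend on domain knowledge or model-based importance.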

Steps to reproduce

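import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy.cluster import hierarchy
from scipy.stats import spearmanr

# X is assumed to be a pandas DataFrame of candidate features; it is not defined in this snippet.
corr = spearmanr(X).correlation
# Ward linkage computed on the rows of the correlation matrix.
corr_linkage = hierarchy.ward(corr)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(25, 40))

# Annotated heatmap of the pairwise Spearman correlations.
sns.heatmap(corr, xticklabels=X.columns, yticklabels=X.columns, annot=True, fmt=".2f", linewidths=.5, cmap="coolwarm", ax=ax1)

# Dendrogram of the hierarchical clustering; labels are passed as a plain list.
dendro = hierarchy.dendrogram(corr_linkage, labels=X.columns.to_list(), ax=ax2, leaf_rotation=90)
# Leaf positions in dendrogram order (not used further in this snippet).
dendro_idx = np.arange(0, len(dendro['ivl']))
# Dotted red reference line at linkage distance 1.
ax2.plot([0, 1000], [1, 1], ':r')
plt.show()

# Heatmap after feature selection. The selection step itself is not shown,
# so X is assumed to hold only the selected features at this point.
corr_updated = spearmanr(X).correlation
plt.figure(figsize=(25, 20))
sns.heatmap(corr_updated, xticklabels=X.columns, yticklabels=X.columns, linewidths=.5, cmap=sns.diverging_palette(620, 10, as_cmap=True))
plt.show()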
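The steps above draw a dotted reference line at linkage distance 1 but do not show how features were chosen from the dendrogram. One common way to use such a line, sketched here as an assumption and not as the authors' confirmed method, is to cut the tree at that height with `scipy.cluster.hierarchy.fcluster` and keep one feature per cluster. The helper name `select_by_cluster`, the rule of keeping the first member of each cluster, and the synthetic data are all hypothetical.

```python
from collections import defaultdict

import numpy as np
import pandas as pd
from scipy.cluster import hierarchy
from scipy.stats import spearmanr


def select_by_cluster(X: pd.DataFrame, cut: float = 1.0) -> list[str]:
    """Keep the first feature of every cluster below the given linkage height.

    Hypothetical helper; X needs at least three columns so that
    spearmanr returns a full correlation matrix.
    """
    corr = spearmanr(X).correlation
    corr_linkage = hierarchy.ward(corr)
    # Flat cluster id for each feature after cutting the tree at `cut`.
    cluster_ids = hierarchy.fcluster(corr_linkage, t=cut, criterion="distance")
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    # Assumed rule: keep the first column that falls into each cluster.
    return [X.columns[idxs[0]] for idxs in members.values()]


# Synthetic demonstration data (not the study's dataset).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "a_copy": a * 2.0 + 1.0,    # same ranks as "a", so it clusters with "a"
    "b": rng.normal(size=200),
    "c": rng.normal(size=200),
})
selected = select_by_cluster(demo)
print(selected)
```

Raising `cut` merges more features into each cluster and keeps fewer of them; lowering it keeps more.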

Institutions

ITMO University, Megafaculty of Computer Technologies and Control

Categories

Computer Science, Cybersecurity, Data Science, Machine Learning, Feature Selection, Big Data Analytics, Brain Anomaly, Network Analysis

Licence