Advanced Machine Learning Pipeline Utilizing Generative and Explainable Artificial Intelligence for Reliable Intrusion Detection in Internet of Things

Chandrani Mukherjee

doi:10.64149/J.Carcinog.25.1.67-88

Keywords:

generative artificial intelligence, internet of things, intrusion detection system, local interpretable model-agnostic explanations, principal component analysis, Shapley additive explanations, transmission control protocol

Abstract

Recent research on intrusion detection systems (IDSs) for the internet of things (IoT) has explored various models, including deep neural networks, classical classifiers, explainable artificial intelligence (XAI), and dimensionality reduction methods such as principal component analysis (PCA). However, few studies offer a comprehensive AI pipeline that systematically integrates data preprocessing, class imbalance handling (e.g., the synthetic minority oversampling technique (SMOTE)), advanced feature engineering (e.g., PCA and linear discriminant analysis), multimodel selection paradigms, and modern XAI techniques. This study fills that gap by proposing a unified IDS framework that combines these elements and introduces generative AI and large language models (LLMs), such as Gemini, to automate dynamic feature extraction from unstructured network logs. Tools such as t-distributed stochastic neighbor embedding (t-SNE) and Shapley additive explanations (SHAP) are employed to visualize and analyze feature distributions before and after dimensionality reduction. Experimental results confirm that SMOTE significantly improves model accuracy, whereas dimensionality reduction has a limited effect on model performance. Among the evaluated classifiers, XGBoost achieves the highest accuracy (99.99%). For explainability, TreeSHAP is preferred because of its computational efficiency, and t-SNE visualizations of SHAP values reveal distinct clusters of benign and malicious network traffic. This integration of data processing, LLM-based automated feature extraction, model selection, and interpretable machine learning offers a novel approach to IoT security. Beyond advancing IDS methodology through robust and transparent decision-making, this study exemplifies the potential of integrating automated data engineering and XAI in cyber-physical system research.
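The abstract attributes the largest accuracy gain to SMOTE. As an illustration only, the sketch below shows the standard SMOTE interpolation rule in plain Python: pick a minority sample x, pick one of its k nearest minority neighbours x_nn, and emit x + λ(x_nn − x) with λ drawn uniformly from [0, 1). It is not the author's implementation, which the abstract does not describe; in practice a library class such as `SMOTE` from `imblearn.over_sampling` (imbalanced-learn) would typically be used. The function name `smote`, its parameters, and the toy "benign"/"attack" feature vectors are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Hypothetical helper: generate n_new synthetic minority samples.

    Each synthetic point is x + lam * (x_nn - x), where x is a random
    minority sample, x_nn is one of its k nearest minority neighbours,
    and lam is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)  # seeded for reproducible oversampling
    k = min(k, len(minority) - 1)  # a point cannot have more neighbours than other points

    # Precompute the k nearest minority neighbours of each sample (excluding itself).
    neighbours: List[List[int]] = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(others[:k])

    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))          # random base sample
        x = minority[i]
        x_nn = minority[rng.choice(neighbours[i])]  # random near neighbour
        lam = rng.random()                         # interpolation weight in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, x_nn)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy data: 2-D feature vectors for 6 benign and 3 attack flows.
    benign = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.2], [0.1, 0.1], [0.2, 0.3]]
    attack = [[5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
    new_attack = smote(attack, n_new=len(benign) - len(attack), k=2)
    balanced_attack = attack + new_attack
    print(len(benign), len(balanced_attack))  # both classes now have 6 samples
```

Because every synthetic point lies on a segment between two existing minority samples, SMOTE densifies the minority region rather than duplicating rows. It is normally applied to the training split only, so that synthetic points do not leak into the evaluation set.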