Adaptive AutoML Pipelines for Large-Scale Data Streams Under Concept Drift
International Journal of Development Research
Received 19th January, 2025; Received in revised form 17th February, 2025; Accepted 24th March, 2025; Published online 30th April, 2025
Copyright © 2025, Akash Vijayrao Chaudhari and Pallavi Ashokrao Charate. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Data stream mining in non-stationary environments presents the twin challenges of automated model selection and concept drift adaptation. This paper proposes a framework for Adaptive AutoML Pipelines capable of continuous learning from large-scale streaming data under evolving distributions. We integrate Automated Machine Learning (AutoML) with online learning to dynamically optimize full model pipelines – including preprocessing, feature selection, and classification – as new data arrive and concepts change. A drift detection mechanism triggers rapid pipeline reconfiguration or an incremental update when the statistical properties of the target variable shift over time. Experiments on real and synthetic data streams with sudden, gradual, and recurring drift demonstrate that the proposed adaptive pipelines significantly outperform static AutoML solutions and classical stream-learning baselines in both accuracy and time to recovery after drift. We present the methodology in detail, including a high-level pipeline architecture diagram and concept drift handling strategies, and we report results in tables and figures for multiple benchmark streams. The findings underscore the importance of continuous pipeline (re)optimization for maintaining robust performance in dynamic environments. Finally, we discuss scalability considerations – such as asynchronous model search and distributed deployment – that enable our approach to handle high-velocity data streams in real-world applications such as fraud detection and IoT sensor networks.
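The drift-triggered reconfiguration described above can be illustrated with a minimal sketch. The abstract does not name the detector used, so the Page-Hinkley test stands in here as one standard choice; the `monitor` function, the `on_drift` callback, the toy error stream, and all parameter values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""
    delta: float = 0.005      # tolerated change magnitude (illustrative value)
    threshold: float = 50.0   # alarm level (illustrative value)
    min_samples: int = 30     # warm-up period before alarms are allowed
    n: int = 0
    mean: float = 0.0
    cum: float = 0.0          # cumulative deviation from the running mean
    cum_min: float = 0.0      # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.n >= self.min_samples
                and self.cum - self.cum_min > self.threshold)

    def reset(self) -> None:
        self.n, self.mean, self.cum, self.cum_min = 0, 0.0, 0.0, 0.0


def monitor(errors: Iterable[int],
            on_drift: Callable[[int], None]) -> List[int]:
    """Feed 0/1 prediction errors to the detector; call on_drift at each alarm.

    on_drift is a placeholder for whatever the pipeline does on drift,
    e.g. launching a new pipeline search or an incremental update.
    """
    detector = PageHinkley()
    alarms: List[int] = []
    for i, err in enumerate(errors):
        if detector.update(float(err)):
            alarms.append(i)
            on_drift(i)
            detector.reset()  # start monitoring the new concept from scratch
    return alarms


def toy_error_stream(n: int = 2000, change_at: int = 1000) -> Iterator[int]:
    """Deterministic toy stream: 10% errors, then 50% errors after change_at."""
    for i in range(n):
        period = 10 if i < change_at else 2
        yield 1 if i % period == 0 else 0


alarms = monitor(toy_error_stream(), on_drift=lambda i: None)
print("drift flagged at stream positions:", alarms)
```

In this sketch the detector watches the model's error indicator rather than the raw features, so it reacts only to changes that actually degrade predictions; resetting it after an alarm keeps a sustained higher error rate under the new concept from raising repeated alarms.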