Embedded Online Machine Learning
Nov 24, 2021
Nikita Yudin

Dmitry Kamzolov
Vadim Sinolits
Pavel Golovkin
Alexey Erchenko
Abstract
The paper presents research on a set of “classical” machine learning algorithms for embedded online machine learning on microbatches (batch size of 128 or less) on ARM processor boards with hard memory limits and a tiny memory footprint, running on a single CPU without multithreading. We propose both mathematical improvements to the algorithms and other programming optimizations. For evolving data streams, we present adaptations of the Gradient Boosting Decision Trees (GBDT) learning algorithm for classification tasks, the eXtreme Gradient Boosting (XGBoost, XGB) and Random Forest (RF) learning algorithms for supervised anomaly detection tasks, and the Extended Isolation Forest (EIF) learning algorithm for unsupervised anomaly detection tasks. In this scenario, as new data arrives over time, the relationship between the class and the features may shift, resulting in concept drift. For each algorithm, the proposed technique generates new ensemble members from microbatches and/or batches of data as new data becomes available. A maximum ensemble size is specified, but learning does not stop once it is reached: the ensemble is continually updated with new data to stay consistent with the current concept. We tested our technique on real-world data and compared it with the original batch-incremental learning algorithms for data streams. Our implementations speed up inference by up to several times and, in some cases, also improve prediction quality by 0.1-0.3 in terms of the F1 measure.
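The batch-incremental scheme described above can be sketched in a few lines: each microbatch trains a fresh ensemble member, and once the ensemble reaches its maximum size the oldest member is evicted so the model keeps tracking the current concept under drift. The class names, the trivial majority-class base learner, and the unweighted-vote combiner below are illustrative assumptions, not the paper's implementation.

```python
from collections import deque


class MajorityClassLearner:
    """Toy base learner (illustrative stand-in for GBDT/XGB/RF/EIF members):
    predicts the most frequent label seen in its training microbatch."""

    def fit(self, X, y):
        self.label = max(set(y), key=list(y).count)

    def predict(self, x):
        return self.label


class BoundedBatchIncrementalEnsemble:
    """Sketch of bounded batch-incremental learning: one new member per
    microbatch, with the oldest member dropped once max_size is reached."""

    def __init__(self, base_learner_factory, max_size=10):
        self.factory = base_learner_factory
        # deque(maxlen=...) evicts the oldest member automatically
        self.members = deque(maxlen=max_size)

    def partial_fit(self, X, y):
        model = self.factory()
        model.fit(X, y)          # train a fresh member on this microbatch only
        self.members.append(model)

    def predict(self, x):
        # unweighted majority vote over the current members
        votes = [m.predict(x) for m in self.members]
        return max(set(votes), key=votes.count)


# After a concept drift (labels flip from 0 to 1), old members are
# gradually replaced and the ensemble's vote follows the new concept.
ens = BoundedBatchIncrementalEnsemble(MajorityClassLearner, max_size=2)
ens.partial_fit([[0], [1]], [0, 0])   # pre-drift microbatch
ens.partial_fit([[0], [1]], [1, 1])   # post-drift microbatch
ens.partial_fit([[0], [1]], [1, 1])   # oldest (pre-drift) member evicted
```

The bounded deque is what keeps the memory footprint constant on the embedded target: the cost of the ensemble never exceeds `max_size` models regardless of how long the stream runs.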
Type: Publication
Publication: In 2021 International Conference Engineering and Telecommunication (En&T)