The research goal of my master's thesis was to find real-time capable solutions for automatically detecting anomalies in time series data streams, which is especially useful for monitoring servers. I evaluated several algorithms and finally assembled my own algorithm, which meets almost all of the previously gathered requirements.
In the figure, you can see a collection of outliers. The algorithm is so good that it is even hard to see the anomaly behind all the true positives (green dots) ;-) At the beginning of the measurements, there are two false positives (red dots).
Very strong trends in the data set are still tricky to handle, especially at the beginning of the measurements, because it is difficult to distinguish between a normal and an anomalous change. Welcome to the topic of anomaly detection! ;-)
I took enough time to dive deep into the topic (but it is still a huge topic!) and came up with a good algorithm, which is very resource-friendly (no loops over the whole dataset, just incremental updates).
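To give an idea of what "just incremental updates" can look like, here is a minimal sketch. It is not the algorithm from the thesis: it swaps in a plain running z-score on top of Welford's online mean/variance update, and the names StreamingZScoreDetector, detect, threshold and warmup are all made up for illustration. The point is only that each new measurement is handled with a constant amount of work and state, without ever revisiting older data.

```python
import math


class StreamingZScoreDetector(object):
    """Illustrative only: flags points far from the running mean."""

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # assumed cutoff, in standard deviations
        self.warmup = warmup        # assumed number of points seen before flagging starts
        self.n = 0                  # number of points seen so far
        self.mean = 0.0             # running mean
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the statistics so far, then fold x into them."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's incremental update: constant work per point, no history kept
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


def detect(stream, **kwargs):
    """Return the indices of all points flagged while consuming the stream."""
    detector = StreamingZScoreDetector(**kwargs)
    return [i for i, x in enumerate(stream) if detector.update(x)]


if __name__ == "__main__":
    # Made-up data: a calm signal with one obvious spike
    values = [10.0, 10.5, 9.5, 10.2, 9.8] * 4 + [50.0, 10.0]
    print(detect(values))
```

One simplification to be aware of: the sketch also folds flagged points into the running statistics, so a big spike inflates the variance for a while afterwards. It also has no notion of trend, which is exactly where things get tricky.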
During my studies, I messed up my Python installation, and only the macOS built-in Python 2 worked ¯\_(ツ)_/¯
As the topic was bigger than expected, the chapter about the production use case (e.g. using InfluxDB and Kapacitor) was neglected.
python, tensorflow, keras, docker, influxDB, kapacitor