I haven't written much here lately because I've been busy elsewhere. Some of that time has gone into working extensively with Prometheus to monitor various services. Along the way, I wrote a post for the Grafana Labs blog about working with noisy data. The beginning of it is below; the full post is on the Grafana blog.
Most of us have learned the hard way that it’s usually cheaper to fix something before it breaks and needs an expensive emergency repair. Because of that, I like to keep track of what’s happening in my house so I know as early as possible if something is wrong.
As part of that effort, I have a temperature sensor in my attic attached to a Raspberry Pi, which Prometheus scrapes every 15 seconds so I can view the data in Grafana. This way, I know how things look over time, and I can get alerts if my house is getting too hot or too cold.
Unfortunately, my temperature sensor is a bit flaky. It works most of the time, but occasionally it gives me wildly inaccurate readings. Here’s an example that looks at a few hours of data:
Even though the weather can be unpredictable here in New England, it’s pretty unlikely that the room temperature dropped from 20°C to -10°C for 15 seconds before returning to normal!
Still, other than these occasional single readings that are obviously wrong, most of the data looks good. So my first thought when I saw this glitch was to replace the sensor with one that works more consistently. After all, having good data helps with everything.
But I started to think about the problem more . . .
What would I do if I had a sensor like this that couldn’t easily be replaced? If my sensor were on top of a mountain or deep under the ocean, it would be difficult and expensive to fix. And if it were on a satellite or in a rover on Mars, it would be impossible.
I couldn’t help but wonder: If most of the data is good, is there a way to keep the good bits and throw out the bad?
After speaking with some data scientist friends, I learned that the answer is yes. Even better, Prometheus has a function to do exactly that!
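To make the idea of keeping the good readings and discarding the bad ones concrete, here is a minimal sketch in Python. It is not the Prometheus function mentioned above, which this excerpt does not name. It assumes one common approach, a rolling median, in which each reading is replaced by the median of the last few readings so that a single wild value cannot pull the result far. The `rolling_median` helper, the window size, and the sample temperatures are all hypothetical.

```python
from statistics import median


def rolling_median(readings, window=5):
    """Replace each reading with the median of the trailing window.

    The window covers the current reading plus up to (window - 1)
    earlier ones, so a single outlier is outvoted by its neighbors.
    """
    smoothed = []
    for i in range(len(readings)):
        start = max(0, i - window + 1)
        smoothed.append(median(readings[start:i + 1]))
    return smoothed


# Hypothetical attic readings in °C, one per 15-second scrape,
# with a single glitch of -10.0 in the middle.
readings = [20.0, 20.1, 20.0, -10.0, 20.1, 20.2, 20.1]
smoothed = rolling_median(readings, window=5)

for raw, clean in zip(readings, smoothed):
    print(f"raw={raw:6.1f}  smoothed={clean:6.2f}")
```

With these sample values, the -10.0 glitch never appears in the smoothed series, while the ordinary readings pass through almost unchanged. The trade-off is that a real temperature change is also reported only once enough readings in the window reflect it, so a wider window means a longer delay.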