Automated Freshness & Volume Monitors

This video details how Elementary's automated monitors work for detecting anomalies in data freshness and volume without manual configuration. It explains how metadata from information schemas and query histories is used to track patterns in table updates and identify anomalies, such as missing updates or low data volumes. The video also covers how to fine-tune settings, provide feedback on results, and adjust detection parameters for more accurate monitoring over time.

We collect metadata from the information schemas and query or job histories available in the data warehouses in order to provide you with automated monitors, specifically for freshness and volume, which are not coupled to your pipelines.

This means we can automatically detect the update patterns of your tables and understand whether a table has stopped updating or whether significantly lower volumes are being ingested into it. This way you get wide coverage of volume and freshness without needing to configure it manually, table by table.
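As a rough illustration of the kind of metadata involved, the sketch below pulls row counts and last-update timestamps from a Snowflake-style information schema. The column names and the DB-API cursor are assumptions for the example and differ between warehouses; this is not Elementary's actual collection code.

```python
def collect_table_metadata(cursor):
    """Fetch (schema, table, row_count, last_altered) for every table.

    `cursor` is any DB-API cursor for your warehouse connection; the
    information_schema columns shown here follow Snowflake's naming and
    vary between warehouses.
    """
    cursor.execute(
        "SELECT table_schema, table_name, row_count, last_altered "
        "FROM information_schema.tables"
    )
    return cursor.fetchall()
```

Tracking these values over time is what makes it possible to learn each table's usual update cadence and volume.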

This is why they are called automated monitors: they don't require any configuration on your side. It's also important that these monitors are implemented this way, because if there is an issue with your pipeline, someone changes the schedule or the pipeline itself fails, you are still covered, since the monitors are fully decoupled from the pipeline. As mentioned, the automated monitors work out of the box. By default, when you onboard to Elementary, you will see a dramatic increase in the number of volume and freshness validations, because once you onboard we automatically pull all the metadata from the query history.

This gives you automated volume and freshness coverage for all your tables. It looks as follows: you see a graph representing the incremental updates to the table, where each point is a metric for the number of rows at that update time. In this case, you can see that the table suddenly stopped growing.

This is why it was marked as an anomaly. You can also tweak the settings and fine-tune the automated monitor: you can say that you only care about drops or spikes, you can relax or tighten the expectations, and you can increase the detection period. That is useful because with marketing data, for example, it's very common for the data to be backfilled.

In that case you'd want the detection period to be longer. Once you are confident in the changes, you can simulate them and see how they will impact the test results, and then just save. From then on, the test will run with these settings, and saving also triggers another run of this specific test.
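To make these knobs concrete, here is a minimal, hypothetical sketch of such a volume check: it builds a baseline from recent row-count increments and flags the latest one when it deviates too much. The parameter names (`sensitivity`, `training_points`, `drops_only`) are stand-ins for the settings described above, not Elementary's actual algorithm or API.

```python
import statistics

def is_volume_anomaly(
    row_counts: list[int],        # total row count measured at each update, oldest first
    sensitivity: float = 3.0,     # higher = more relaxed expectations, lower = tighter
    training_points: int = 14,    # rough analogue of the detection period used as baseline
    drops_only: bool = True,      # alert only on drops, ignore spikes
) -> bool:
    """Flag the latest update if the volume it added deviates from the baseline."""
    deltas = [b - a for a, b in zip(row_counts, row_counts[1:])]
    if len(deltas) <= training_points:
        return False  # not enough history to build a baseline yet

    baseline = deltas[-(training_points + 1):-1]   # increments before the latest one
    latest = deltas[-1]
    mean = statistics.mean(baseline)
    stdev = statistics.pstdev(baseline) or 1.0     # avoid division by zero on flat history

    z = (latest - mean) / stdev
    return z < -sensitivity if drops_only else abs(z) > sensitivity
```

In this simplified picture, simulating a settings change just means re-running the check over the existing history with the new parameters and seeing which points would have been flagged.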

In addition, you can provide us with feedback: you can mark a result as a false positive or a true positive and add some context about it. Our algorithms take these tagged results and use them to improve the model; if there are many false positives, the monitor becomes less sensitive.
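As a toy illustration of that idea, and under the assumption that feedback arrives as a list of tagged results, something like the following could relax a monitor that produces too many false positives; the real model is more sophisticated than this.

```python
def adjust_sensitivity(current: float, tagged_results: list[str],
                       step: float = 0.5, fp_threshold: float = 0.5) -> float:
    """Relax the monitor when most tagged results were false positives.

    `tagged_results` holds feedback tags such as "false_positive" or
    "true_positive"; a higher return value means a less sensitive monitor.
    """
    if not tagged_results:
        return current
    fp_ratio = tagged_results.count("false_positive") / len(tagged_results)
    return current + step if fp_ratio >= fp_threshold else current
```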

We also have a human in the loop, an analyst who reviews this feedback and makes sure the algorithm adjusted correctly to it. So we really recommend working with both the settings and the feedback to fine-tune the results. Usually, a few days after you connect your data warehouse, the automated monitors stabilize and you get accurate results.
It usually takes a few days until all the backfilling is done, you have worked with it a little, and you have done some fine-tuning. But after that you don't need to configure the monitors anymore and they work well. Let's look at an example of automated freshness. For automated freshness, we show you a graph that represents updates.

Each line you see is an actual update to the underlying monitored table, so the time between two lines is time without updates, the gap between two consecutive updates. This way you can see the cadence at which the table is usually updated, for example every day or so. The monitor runs on the metadata collected from these updates every hour by default, and it can be configured to be triggered more or less often.

That depends on the use case, but usually it runs within the hour, and it validates whether more time has passed since the last update than expected. When we compare the current gap to the historical updates, we check whether it is wider than the range we use as a baseline.
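A minimal sketch of this logic, assuming the update timestamps have already been collected, might compare the time since the last update against the widest historical gap. The function name and the sensitivity factor here are illustrative assumptions, not Elementary's implementation.

```python
from datetime import datetime

def is_freshness_anomaly(
    update_times: list[datetime],   # timestamps of past updates, oldest first
    now: datetime,
    sensitivity: float = 1.5,       # how much wider than the usual gap is tolerated
) -> bool:
    """Flag the table when the time since its last update exceeds the usual cadence."""
    if len(update_times) < 2:
        return False  # no historical gaps to compare against yet

    gaps = [later - earlier for earlier, later in zip(update_times, update_times[1:])]
    widest_usual_gap = max(gaps)                 # baseline from the historical cadence
    time_since_last_update = now - update_times[-1]
    return time_since_last_update > widest_usual_gap * sensitivity
```

In practice you would want a more robust baseline than the single widest gap (for example a high percentile of the historical gaps), but the comparison stays the same: the current gap versus the historical range.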

So this is how the automated freshness test works.