It’s all Amazon’s fault.
Today’s consumers expect products to be:
- easy to find and order
- available to the exact specs they want
- top quality
- delivered the same day
They would also prefer to go through the order process without needing to speak to a human being. Except, of course, if something goes wrong, in which case ideally a human representative should be reachable immediately.
Data consumers are no different. They expect data analytics products to be self-serve and easy to find, and to deliver accurate, reliable results that can power smart business decisions. If there’s an insight that the existing data products in stock don’t offer, requests are made to develop (ASAP, please) a new data product that fits the bill.
It sounds lovely, and of course, there’s nothing you want more than to give your data users exactly what they need, but still…
Isn’t that expecting a little too much of you?
From data asset managers to data product manufacturers
If you felt your blood pressure starting to rise and your head beginning to throb when you read the above, you’re not alone in the overwhelm.
The shift from thinking about data as “an asset that it’s your responsibility to manage” to data as “a product that it’s your responsibility to produce and get into the hands of users” is a biggie.
Work becomes about customer expectations and customer support. It’s about making the resources you have practically – not just theoretically – available for use.
In a way, it’s exciting. As a data analyst or engineer, you can make a significantly more direct impact on your company’s business success. You can work hand-in-hand with your data consumers in fruitful collaborations, harnessing your skills and ingenuity to build much-needed tools.
But it can also be overwhelming. How do you increase the quantity and availability of data product releases while simultaneously maintaining – or even improving – the quality of those products? With more direct responsibility for the data end-product, what toll will it take on you and your team if the product breaks or turns out unhelpful results (often through no fault of yours!)?
The emerging field of DataOps – and one of its chief components, data observability – deals with exactly this dilemma.
What is data observability?
Data observability is the capability and practice of monitoring your data pipelines closely enough to catch, identify and resolve issues in a timely manner.
A DataOps view prescribes that the purpose of having data is to use the data. Data analytics teams take the perspective that they are delivering data analytics products – not just services – to data consumers. Products should be ready-to-consume, easily accessible and responsive to the consumers’ needs.
In addition – and pardon us if this sounds obvious – the product should WORK. It shouldn’t stop functioning, and it shouldn’t function incorrectly. If something breaks, it should be fixed immediately, with the least possible impact on the consumers’ product use.
Achieving the former goal (i.e. making your data product development fast, agile and responsive to your consumers’ needs) requires collaboration between data managers, developers and consumers; a development environment conducive to experimentation; and automated testing.
Achieving the latter goal (i.e. the uninterrupted smooth functioning of developed data products) requires data observability tools and solutions.
What do data observability tools do?
A comprehensive data observability platform will keep you abreast of:
- Suspicious anomalies in the freshness, distribution or volume of incoming or outgoing data
- Leading indicators of known data processing or accuracy issues
- The root cause of the problem upstream
- The impact of the problem downstream
A data observability solution will send out alerts for anomalies and leading indicators so that your data engineering team can look into an issue right away, hopefully before it has become a bona fide problem. Ideally, data observability technology would be powered by machine learning capabilities, enabling it to figure out on its own what qualifies as a potential issue or an anomaly.
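A full platform would learn what “anomalous” means with machine learning, but the core idea behind a volume check can be sketched with simple statistics. Everything here – the function name, the table metric, the numbers – is illustrative, not any particular product’s API:

```python
from statistics import mean, stdev

def detect_volume_anomaly(history, today, z_threshold=3.0):
    """Flag today's row count if it deviates more than z_threshold
    standard deviations from the recent history."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold

# Hypothetical daily row counts for an ingestion table over two weeks.
history = [10_120, 9_980, 10_340, 10_050, 9_870, 10_200, 10_110,
           10_290, 9_950, 10_180, 10_060, 10_230, 9_990, 10_140]

print(detect_volume_anomaly(history, today=10_100))  # False: a normal day
print(detect_volume_anomaly(history, today=2_500))   # True: likely a broken load
```

A real system would run checks like this per table and per metric (freshness lag, null rates, distribution drift) and route any `True` result to an alerting channel rather than a print statement.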
When it comes to enabling your team to quickly track down the source of the data issue and fix it, strong data lineage capabilities are key in a data observability platform. Some issues, however, are straightforward and defined enough to be fully or partially dealt with by automated workflows, and a strong data observability platform will offer that.
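The lineage idea above boils down to a dependency graph you can walk in both directions: downstream to assess impact, upstream to hunt for the root cause. Here is a minimal sketch; the table names and graph shape are invented for illustration:

```python
# Hypothetical lineage graph: each key maps to the tables
# directly derived from it (its downstream dependents).
LINEAGE = {
    "raw_orders":         ["clean_orders"],
    "raw_customers":      ["clean_customers"],
    "clean_orders":       ["daily_revenue", "churn_features"],
    "clean_customers":    ["churn_features"],
    "daily_revenue":      [],
    "churn_features":     ["churn_model_scores"],
    "churn_model_scores": [],
}

def downstream_impact(table, lineage):
    """Every table that transitively depends on `table`."""
    impacted, stack = set(), [table]
    while stack:
        for child in lineage.get(stack.pop(), []):
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return impacted

def upstream_sources(table, lineage):
    """Every table `table` transitively depends on -- the
    candidates to inspect when hunting for a root cause."""
    parents = {t: [p for p, kids in lineage.items() if t in kids]
               for t in lineage}
    sources, stack = set(), [table]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in sources:
                sources.add(parent)
                stack.append(parent)
    return sources

# If clean_orders breaks, these products are at risk:
print(sorted(downstream_impact("clean_orders", LINEAGE)))
# ['churn_features', 'churn_model_scores', 'daily_revenue']
```

Calling `upstream_sources("churn_model_scores", LINEAGE)` runs the same walk in reverse, yielding the full chain back to the raw tables – exactly the candidate list an on-call engineer wants when a dashboard goes stale.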
Signs you may need to up your data observability game
Your data products rely heavily on external data sources
You may be fastidious when it comes to your data, but as soon as your data products aren’t using just your data, you’re in trouble. External data sources are a wildcard, and it’s best to keep a close eye on wildcards when you want tame, predictable results.
There are too many cooks in the data pipeline
Why do too many cooks spoil the broth? Because there will be an inevitable lack of communication between said cooks, and one will dump in chicken broth while another stirs in broccoli and a third adds some pureed corn. Or each of the nine cooks thinks they personally need to add the hot pepper, and the resulting soup tastes… yeah.
The more teams and individuals that can affect, influence and change a data pipeline, the more prone it is to getting messed up. If you have pipelines with many contributors, you’re going to want to be on top of what’s happening in them.
You’re using open-source data observability tools and it’s not enough
Open-source data observability tools like Prometheus, OpenTelemetry or OpsTrace are helpful for specific aspects of observability, but at this point, no single tool gives you an end-to-end system. If you’re spending more time than you want connecting and managing all your data observability moving parts, it’s worth looking into a complete, end-to-end data observability platform that will connect easily with all your data systems and give you everything you need.
You develop in a cloud-native environment
Cloud native is wonderful for moving fast. Distributed computing and automated scaling give you flexibility and agility. But as any driver knows, the faster you’re going, the longer it takes to stop, and the more damage is done if you don’t stop in time.
The faster data and changes in data products flow through your environment, the more you can accomplish… until something crashes. Then your speed is a liability. The way to keep your speed an asset is to make sure that the speed at which you can observe and respond to changes in your data environment matches the speed at which those changes occur.
Just sit back and observe
You want to deliver the best, most effective data products at speed and scale. You really do. But data consumer happiness is not the hill you want to die on.
Fortunately, comprehensive, machine learning-enabled data monitoring and observability tools let you stay on top of your data, respond to data consumer needs as soon as they arise (and even before they occur!) – and enjoy life, too.