We’ve all been there, right?
Choosing which flavor of ice cream to get.
Picking which outfit to wear to tonight’s party.
Deciding which show to binge-watch.
And, of course, selecting which data catalog software to invest in. The array of available options – and of features within those options – can make your head spin.
While we can’t relieve your analysis paralysis when it comes to Netflix, frozen desserts or your wardrobe, we’d love to help you out when it comes to data catalog analysis.
In our opinion and experience (and we’ve got quite a bit under our belt), the following are three of the most important features of data catalog software – and why.
1) Automation (self-creating and self-updating)
The average company has over 100K data assets. If you were to try to harvest all that metadata manually, you would need to live on espresso (or maybe Red Bull) for the next few years.
Fortunately, just as no one harvests a 100K-acre farm by hand, no one needs to harvest metadata for an enterprise data catalog by hand.
An automated data catalog solution will automatically harvest data assets from across the different tools in your BI environment, including ETLs, databases, and reporting tools. An automated solution can collect, analyze, tag, and organize – and then build and populate the data catalog by itself. Manual work is restricted to reviewing and enriching the catalog entries with missing descriptions, ratings, data owners, tags, etc.
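To make "harvesting" a little more concrete, here is a minimal sketch of the idea, assuming a single SQLite database as the source. The function name `harvest_sqlite_metadata` and the shape of the catalog entries are our own illustrative choices, not any particular product's API; a real harvester would connect to many more tools than this.

```python
import sqlite3


def harvest_sqlite_metadata(conn):
    """Return one catalog entry per table, built from the database's own metadata."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        entries.append({
            "asset": table_name,
            "columns": [{"name": col[1], "type": col[2]} for col in columns],
            # Hypothetical enrichment fields, left empty for people to fill in later.
            "description": None,
            "owner": None,
            "tags": [],
        })
    return entries


# Demo against a throwaway in-memory database (table names are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

catalog = harvest_sqlite_metadata(conn)
for entry in catalog:
    print(entry["asset"], [col["name"] for col in entry["columns"]])
```

The machine fills in what it can read for itself (names, columns, types), and the empty `description`, `owner`, and `tags` fields are exactly the review-and-enrich work that stays with people.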
Automation for the initial build is important, but data catalog automation for ongoing maintenance is even more so. Even the Oxford English Dictionary adds and changes hundreds of word entries every three months. And we’re betting your data environment changes much faster than the English language.
If you needed employees to remember to manually update the data catalog every time they created, manipulated, or altered data assets, your catalog would be outdated from Day 1. (Okay, maybe Day 2.) An automated data catalog that periodically self-updates keeps your catalog relevant, accurate and up-to-date.
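One simple way to picture self-updating: take a fresh snapshot of your assets on a schedule and compare it with the previous one. The sketch below assumes each snapshot is a plain dictionary mapping an asset name to its list of columns; the function `diff_catalog` and the sample data are hypothetical, and real tools track far more than column names.

```python
def diff_catalog(previous, current):
    """Compare two snapshots, each a dict of asset name -> list of column names."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    changed = sorted(
        name
        for name in set(previous) & set(current)
        if previous[name] != current[name]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Made-up snapshots from two consecutive scheduled runs.
yesterday = {
    "customers": ["id", "email"],
    "orders": ["id", "total"],
}
today = {
    "customers": ["id", "email", "region"],
    "orders": ["id", "total"],
    "refunds": ["id", "order_id"],
}

changes = diff_catalog(yesterday, today)
print(changes)
```

Run on a schedule, a comparison like this tells the catalog which entries to create, retire, or flag for review, without anyone having to remember to do it.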
If you have enough data assets to need a data catalog in the first place, then automation is a must for building that catalog. And the more assets you have, the more you need advanced automation capabilities to streamline the process and free up your data team’s time and skills for other work.
2) Built-in Collaboration
Ever had the unenviable experience of having a question about a data asset – and finding out that seemingly the only person who knows the answer is no longer with the company and hasn’t left a forwarding address?
Hello, brick wall.
To avoid head-on collisions with opaque data assets, a data catalog tool should have built-in collaborative features. In-catalog collaboration enables all users to communicate about any catalog entry right there IN the entry, with direct access to and involvement of the data owner, steward and subject matter expert.
Not only does this in-catalog collaboration make it crystal clear who to ask, eliminating time-wasting “who in the company knows this” searches, but the communication is recorded and saved for all future users, eliminating the need to ask the same questions over and over and over… With these collaborative features, tribal knowledge is simply and clearly preserved.
So even if the original data owner moves to the South Pole, you don’t need to mount an Antarctic expedition to get your mission-critical question answered.
Beyond giving you access to “authoritative” information about the data asset in question, collaboration tools also surface equally valuable “crowdsourced” information, such as ratings and other users’ working knowledge of the asset. This user-generated information helps you evaluate how relevant the data asset is to your prospective use.
3) Integrated Data Lineage
An ideal data catalog serves as the hub for an organization’s data activities.
If a business user wants to find a report on an aspect of the business, the data catalog could provide him with all relevant reports in short order.
If a self-service BI user is searching for the right data for a report she is creating, the data catalog could show her all relevant data assets and give her the ratings, reviews and tribal knowledge necessary to choose the best assets for the job.
If a BI team member is questioned about a data asset’s accuracy, the data catalog can be a one-stop solution, provided data lineage capabilities are integrated into the catalog. Finding the asset in question is simple, and then a single click opens the end-to-end lineage, providing complete visibility into what happened to the data as it flowed through the BI landscape. It takes far less time to get accurate answers and correct errors, and to understand where the data originated and where it is being used, which builds trust in the data and in BI as a whole.
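Under the hood, that "one click" is essentially a walk through a graph of dependencies. Here is a minimal sketch, assuming lineage is stored as a simple mapping from each asset to the assets it is built from. The asset names and the `trace_upstream` function are invented for illustration and do not reflect how any specific catalog stores lineage.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets it is built from.
UPSTREAM = {
    "revenue_report": ["sales_mart"],
    "sales_mart": ["orders_staging", "customers_staging"],
    "orders_staging": ["erp.orders"],
    "customers_staging": ["crm.customers"],
}


def trace_upstream(asset, upstream=UPSTREAM):
    """Breadth-first walk from an asset back through everything it depends on."""
    found = []
    visited = {asset}
    queue = deque([asset])
    while queue:
        node = queue.popleft()
        for parent in upstream.get(node, []):
            if parent not in visited:  # guard against revisiting shared sources
                visited.add(parent)
                found.append(parent)
                queue.append(parent)
    return found


lineage = trace_upstream("revenue_report")
print(lineage)
```

Flip the direction of the edges and the same walk answers the opposite question: which downstream reports are affected if a source table changes.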
The Power of Data
We’re hoping you’re feeling less paralyzed and more empowered when it comes to selecting a data catalog platform.
And on second thought, maybe we CAN help you out when it comes to your other areas of analysis paralysis.
Just search in your data catalog tool for a “most popular ice cream flavors” data asset and run a predictive, bivariate analysis with whipped cream and a cherry on top.
See? Data will set you free.