Your CFO finally gave the okay to purchase data catalog software. Wa-hoo!
When you’ve cleaned up the confetti and the cocktail glasses, it’s time to get down to business. This is a major investment. How will you choose the best data catalog software for your company?
Lest the proliferation of data catalog features and options leave you groping for someone’s leftover margarita, here’s a guide to the questions that will cut through the overwhelm and bring clarity.
Does it support automatic harvesting from your other data/BI software?
If you’re in the market for a data catalog, you likely have at least one tool each for your ETL, database, analysis, and analytics/reporting functions. As your data catalog is supposed to serve as a window and access point for all those data assets, everything will run a lot smoother if your data catalog can automatically harvest and continuously update its data assets from your entire BI landscape.
If the data catalog in question lacks built-in support for ingesting data assets from one or more of the tools in your BI stack, but you can still rig up a connection with external integration technology and a bunch of cable ties, that could work too, although it’s less than ideal.
If you’re going to be manually passing information from these systems to the data catalog (like exporting and importing spreadsheets – ACK!), it’s usually not worth the effort.
Check how easily the data catalog integrates with each tool in your BI environment.
How simple is the data catalog to set up/maintain?
And while we’re on the topic of manually gathering, centralizing, and inserting information, we’ll tip you off that unless you have access to a whole team of indentured servants, you’re going to want a data catalog that can be constructed as automatically as possible.
With tens of thousands of data assets at a minimum, and upwards of a million in large organizations, building the catalog by hand is neither cost-effective nor realistic, and a manually maintained catalog quickly becomes stale and irrelevant.
Automation is the name of the game.
An automated data catalog solution will automatically extract data assets and their metadata from across the different software in your BI landscape, including ETLs, databases, and reporting tools. It can collect, analyze, and organize that information, and then build or populate the data catalog by itself. Manual work is limited to reviewing catalog entries and enriching them with descriptions, ratings, data owners, tags, and other details that have yet to be documented in any of the BI tools.
Automation simplifies both the initial creation of the data catalog and its inevitable ongoing maintenance. Your data ecosystem changes constantly as your users create, manipulate, or alter data assets, and relying on users to manually update the catalog for every change will leave it perpetually out of date. Data catalog tools that use automation routinely review all metadata within your BI landscape and update the catalog accordingly.
Check to what extent the data catalog software is automated – both for initial catalog creation and for ongoing updates.
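What does “routinely review all metadata and update the catalog” look like under the hood? Every vendor does it differently, but here is a minimal Python sketch of one common approach. It assumes each harvest produces a snapshot mapping an asset ID to a fingerprint of that asset’s metadata (for example, a hash of a table’s column list). Comparing the fresh snapshot with the catalog’s last-known state shows what to add, retire, or flag for review. The snapshot format, asset IDs, and `diff_snapshots` function are hypothetical, not any product’s API.

```python
from dataclasses import dataclass

# Hypothetical snapshot format: asset ID -> fingerprint of that asset's metadata
# (for example, a hash of a table's column names and types).
Snapshot = dict[str, str]


@dataclass(frozen=True)
class SyncDiff:
    added: list[str]    # assets found in the BI tools but not yet in the catalog
    removed: list[str]  # catalog entries whose source asset no longer exists
    changed: list[str]  # assets whose metadata changed since the last harvest


def diff_snapshots(catalog: Snapshot, harvested: Snapshot) -> SyncDiff:
    """Compare the catalog's last-known state with a fresh harvest."""
    known, found = set(catalog), set(harvested)
    return SyncDiff(
        added=sorted(found - known),
        removed=sorted(known - found),
        changed=sorted(a for a in known & found if catalog[a] != harvested[a]),
    )


if __name__ == "__main__":
    catalog = {"db.orders": "a1", "db.customers": "b7", "report.q3_sales": "c2"}
    harvested = {"db.orders": "a9", "db.customers": "b7", "etl.load_orders": "d4"}
    print(diff_snapshots(catalog, harvested))
    # SyncDiff(added=['etl.load_orders'], removed=['report.q3_sales'], changed=['db.orders'])
```

Because only the difference is processed, the catalog can be refreshed as often as the harvest runs, and human reviewers only need to look at what actually changed.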
How user-friendly is the UI?
All users benefit from a user-friendly UI. Your business and self-service BI users need a catalog that makes it easy to locate assets, your data owners and stewards need one that is easy to navigate and curate, and all data citizens need an easy, intuitive way to collaborate about the data (more about that ahead).
How user-friendly does it need to be?
Ideally the UI should be as intuitive to users as an online shopping site. If the data catalog is designed well, the ability to find, search, filter, curate and use the data asset of your dreams in the data catalog will be as easy as it is to search, filter, find and order the headphones of your dreams on Amazon.
Search and filter are the average user’s portal into the catalog, so how search results are displayed and marked up is particularly important.
Check the ease with which users can navigate and understand the UI. If you are technically inclined, find a non-techie business user to sit down with you and watch them navigate the UI.
How well does the data catalog help you actively evaluate and get answers about the data?
A powerful search function is a great start. But once you click through from the search results, do you have the information and resources you need to effectively use your company’s data assets? Possible resources to check for are:
- Ratings and reviews: What do other users have to say about this data asset? Is it reliable? Is it useful, and what is it useful for? Enterprise data catalog tools that let users rate and review catalog entries give everyone real-time, real-life insight into the available data assets.
- Sensitivity and status: Is this asset sensitive, and has it been approved for use?
- Roles and responsibilities: Who is the data owner? Who is the data steward? Is there a subject matter expert?
- Other relevant assets: What other assets are relevant to the one I am looking at?
- Data lineage: Where is the data coming from? What processes are involved in creating and delivering it? Where is the data asset being used?
- Collaboration: In-catalog collaboration lets all users communicate about any catalog entry right there IN the entry, with direct access to and involvement of the data owner, steward, and subject matter expert. The conversation is recorded and saved, preserving tribal knowledge and saving valuable time when a later user asks a question that has already been answered.
Check the data catalog software’s built-in information and collaboration resources. What information and capabilities appear when you view a data catalog entry?
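If it helps to picture what an entry page draws on, here is a hypothetical Python model of a catalog entry that holds the resources listed above. The field names, the 1 to 5 star scale, and the rule that sensitive assets require extra clearance are illustrative assumptions, not how any particular product stores its metadata.

```python
from dataclasses import dataclass, field
from enum import Enum
from statistics import fmean
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    DEPRECATED = "deprecated"


@dataclass
class CatalogEntry:
    name: str
    owner: str                      # roles and responsibilities
    steward: str
    expert: Optional[str] = None    # subject matter expert, if any
    sensitive: bool = False         # sensitivity flag
    status: Status = Status.DRAFT   # approval status
    ratings: list[int] = field(default_factory=list)   # hypothetical 1-5 stars
    related: list[str] = field(default_factory=list)   # other relevant assets
    comments: list[str] = field(default_factory=list)  # in-catalog collaboration

    def average_rating(self) -> Optional[float]:
        """Mean star rating, or None if nobody has rated the asset yet."""
        return fmean(self.ratings) if self.ratings else None

    def ready_for_use(self, has_clearance: bool) -> bool:
        """Hypothetical policy: must be approved, and sensitive data needs clearance."""
        return self.status is Status.APPROVED and (not self.sensitive or has_clearance)


if __name__ == "__main__":
    entry = CatalogEntry(
        name="dwh.fact_sales",
        owner="Finance",
        steward="Dana",
        sensitive=True,
        status=Status.APPROVED,
        ratings=[4, 5, 3],
    )
    entry.comments.append("Q: Does amount include tax? A: No.")
    print(entry.average_rating())                    # 4.0
    print(entry.ready_for_use(has_clearance=False))  # False
```

When you sit down with a vendor demo, a quick way to use this is as a checklist: for each field, ask where on the entry page that information shows up.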
How good is the catalog’s data lineage capability?
A data catalog can be a time-saving resource and one-stop shop for the data, as long as data lineage is integrated into the catalog. If, for example, a BI team member is asked whether a data asset is accurate, the data catalog makes finding the asset simple, and integrated data lineage puts the end-to-end lineage only one click away, providing complete visibility into the data’s flow through the BI landscape and what happened to it along the way.
The quality and depth of the data lineage is paramount. At Octopai, for example, we integrated three different layers of data lineage into our platform:
- cross-system lineage, providing high-level visualization of data flow across the entire data landscape
- end-to-end column lineage, providing column-to-column-level lineage between systems from the entry point into the BI landscape, all the way through to reporting and analytics
- inner-system lineage, enabling you to dive deep into the logic of a report, ETL, or database object data flow
By dramatically shortening the time it takes to get accurate answers and correct errors, integrated data lineage builds trust in the data and in BI as a whole.
Check the presence, quality and depth of the data catalog’s data lineage capabilities.
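To make “one click away” a little more concrete: column-level lineage can be modeled as a directed graph in which each column points to the columns it is derived from, so tracing a report figure back to its sources is a graph traversal. The minimal Python sketch below assumes a hypothetical graph already built from harvested metadata and uses breadth-first search; flipping the edges answers the reverse question, “where is this data used?” The column names are invented, and real lineage engines are far more elaborate.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage harvested from ETL and report metadata:
# each column maps to the columns it is directly derived from (its upstream).
LineageGraph = dict[str, list[str]]


def trace(graph: LineageGraph, column: str) -> list[str]:
    """Breadth-first walk: every column reachable from `column`, nearest first."""
    seen = {column}
    reached: list[str] = []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, []):
            if neighbor not in seen:  # guards against revisits and cycles
                seen.add(neighbor)
                reached.append(neighbor)
                queue.append(neighbor)
    return reached


def invert(graph: LineageGraph) -> LineageGraph:
    """Flip every edge so each column maps to the columns derived from it (downstream)."""
    downstream: dict[str, list[str]] = defaultdict(list)
    for child, parents in graph.items():
        for parent in parents:
            downstream[parent].append(child)
    return dict(downstream)


if __name__ == "__main__":
    upstream = {
        "report.revenue_by_region.revenue": ["dwh.fact_sales.amount"],
        "dwh.fact_sales.amount": ["staging.orders.price", "staging.orders.qty"],
        "staging.orders.price": ["crm.orders.unit_price"],
    }
    # Where did this report figure come from?
    print(trace(upstream, "report.revenue_by_region.revenue"))
    # ['dwh.fact_sales.amount', 'staging.orders.price', 'staging.orders.qty', 'crm.orders.unit_price']

    # Where is this staging column used?
    print(trace(invert(upstream), "staging.orders.price"))
    # ['dwh.fact_sales.amount', 'report.revenue_by_region.revenue']
```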
Does the catalog software match your multi-vendor, hybrid BI environment?
The average BI environment has grown more and more complex, with most companies using tools from several vendors. Hybrid environments are common, with some BI tools located on-premises and others in the cloud. In addition, users may need to access the data catalog remotely.
Now is the time to make sure the data catalog software you are considering can perform where, when and how you need it – both now and as your company grows and progresses.
Check that the data catalog software supports both on-premises and cloud-based BI tools, and that it is securely accessible when and where you need it.
Does the data catalog software match your security or compliance needs?
GDPR. HIPAA. CCPA. FRTB. IFRS.
All those little letters make big demands of your organization and the way you handle data. Your data catalog, as the central hub around which your data activities will be focused, should make compliance easier, not harder.
If you are subject to security or compliance requirements, the data catalog should have built-in capabilities for data tagging, sensitive-data indicators, and approval statuses.
Check the compliance, privacy and governance features the data catalog software provides, and whether they meet your needs for managing sensitive data.
How good is the support?
Learning how to use your data catalog software is not like learning how to use a can opener. There are going to be questions. There are going to be frustrations. There are going to be times when you scratch your head and say, “How do I DO that?”
You need somebody to lean on.
If you are seriously considering a data catalog tool, spend the time to act like a confused user. Come up with a reasonable question or difficulty and try to get help from their customer support staff. Keep track of how long it took to get your question answered or issue resolved, and the overall feel of the support experience.
Check how smoothly and helpfully the vendor’s support team answers your questions about the catalog software.
The Numbers Game
So you’re done evaluating how the data catalog software could contribute to your organization.
Now it’s time to run some numbers and evaluate if it will contribute.
Total cost of ownership (TCO)
How much will you pay over your term of use for the privilege of owning this data catalog software? Figures and points to take into account:
- Onboarding costs
- Ongoing costs (Are the costs straightforward? Is pricing based on the number of assets, usage, users, or another model? Are these costs likely to change over time?)
- Employee hours (especially for initial setup, if you’ll be handling that in-house)
- Integrator fee (if you’ll be using an integrator)
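Once you have the figures, the arithmetic is simple. Here is a minimal Python sketch that splits the items above into one-time and recurring costs over a multi-year term. It assumes a flat annual subscription and a single loaded hourly rate for staff time; if the vendor prices by assets, users, or usage, substitute your own projection for `annual_subscription`. The class, field names, and all numbers in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CatalogCosts:
    # One-time costs (hypothetical figures, all in the same currency)
    onboarding: float
    integrator_fee: float           # 0 if you handle setup in-house
    setup_hours: float              # internal staff hours for the initial setup
    # Recurring costs, per year
    annual_subscription: float      # assumes flat pricing; adjust for per-asset/per-user models
    annual_maintenance_hours: float
    # Loaded hourly cost of the staff doing the work
    hourly_rate: float


def total_cost_of_ownership(costs: CatalogCosts, years: int) -> float:
    """One-time costs plus recurring costs over the full term of use."""
    one_time = costs.onboarding + costs.integrator_fee + costs.setup_hours * costs.hourly_rate
    recurring = costs.annual_subscription + costs.annual_maintenance_hours * costs.hourly_rate
    return one_time + years * recurring


if __name__ == "__main__":
    costs = CatalogCosts(
        onboarding=10_000, integrator_fee=15_000, setup_hours=200,
        annual_subscription=50_000, annual_maintenance_hours=100, hourly_rate=80,
    )
    print(total_cost_of_ownership(costs, years=3))  # 215000
```

Keeping one-time and recurring costs separate makes it easy to compare a vendor with high onboarding fees against one with a higher subscription.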
Return on investment (ROI)
How much will the data catalog enhance the company’s finances? Figures and points to take into account:
- How much time will it save your employees in finding and using data?
- Will greater data visibility and accessibility contribute to BI insights that will improve your bottom line – and by how much?
- Will the data catalog help your data citizens maximize the value of your enterprise’s data by increasing its use and improving both output and accuracy, and how would this affect your bottom line?
- Will the data catalog enable/enhance company-wide self-service BI and how much money would that save your company?
- Will it enable data science projects by making data scientists self-sufficient in locating and understanding the data, and what is the financial potential of those projects?
- Will it increase the share of business decisions based on accurate, trustworthy data, because that data is simply more available and transparent to the business, and how much money could that save your company (e.g. bidding on the correct ads or improving customer recommendations)?
- How much time will it save by preserving tribal knowledge and shortening collaboration cycles?
- How much time will the traceability save data citizens that need to understand where the data came from, what happened to it and where it is being used?
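Many of these benefits are hard to pin down, but the time-savings items can at least be estimated, and the standard ROI formula (benefit minus cost, divided by cost) turns that estimate into a single figure you can set against your TCO. The Python sketch below is a rough illustration under assumed inputs: number of users, hours saved per user per week, working weeks per year, and hourly rate. Any other benefit you can credibly quantify goes into `other_benefits`. None of these figures come from real deployments.

```python
def time_savings_value(
    users: int,
    hours_saved_per_user_per_week: float,
    working_weeks_per_year: int,
    hourly_rate: float,
    years: int,
) -> float:
    """Estimated monetary value of staff time the catalog saves over the term."""
    return users * hours_saved_per_user_per_week * working_weeks_per_year * hourly_rate * years


def return_on_investment(total_benefit: float, total_cost: float) -> float:
    """Classic ROI ratio: net gain divided by cost (0.5 means a 50% return)."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


if __name__ == "__main__":
    savings = time_savings_value(
        users=40, hours_saved_per_user_per_week=1.5,
        working_weeks_per_year=46, hourly_rate=80, years=3,
    )
    other_benefits = 0.0  # add any other gains you can credibly quantify
    roi = return_on_investment(savings + other_benefits, total_cost=200_000)
    print(f"Time savings: {savings:,.0f}")  # Time savings: 662,400
    print(f"ROI: {roi:.0%}")                # ROI: 231%
```

The estimate is only as good as its inputs, so it is worth running it with pessimistic as well as optimistic assumptions before presenting it to your CFO.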
Making Smart Choices
We hope that this detailed guide can smooth the path to selecting the right data catalog software for your business intelligence.
And if you’re still stressed, drop the cocktails and try some meditation.