The Data Catalog is the talk of the town. Everyone wants one. But what is a data catalog and do you actually need one? Moreover, if you do need one, how can you get it up and running quickly and accurately?
The M&A Use Case for a Data Catalog
Imagine you work for a company—let’s call it Perfect World. In Perfect World, everything is perfect, even its BI environment: There are no data silos, all systems integrate with one another, the data is clean throughout, and employees and management can rely on a single, comprehensive source of truth for all their decisions. (Yes, it’s far-fetched…just roll with it.)
Over time, Perfect World enjoys considerable success and amasses a large cash position. One morning the CEO wakes up and decides it’s time to acquire a larger competitor, Real World.
Sadly, Real World does not sport the same perfect BI environment that Perfect World has. Real World has multiple, overlapping systems (accumulated through acquisitions of its own) that aren’t integrated, and it’s really hard to find one’s way around because the metadata – the description of the data – varies from system to system. Depending on which tool you’re in, you’ll find customer identification fields called customer ID, cust ID, C_ID, cust_number and more. And it’s the same with phone number fields – you’ve got phone, p.number, tel, telno, etc. When your metadata looks like this, it is close to impossible to find and understand your data. Does this sound familiar?
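To make the naming problem concrete, here is a minimal Python sketch of how those inconsistent field names might be mapped to one agreed-upon name. The variants are the ones listed above; the canonical names (customer_id, phone_number) and the helper functions are assumptions made for illustration, not part of any particular product.

```python
import re

# Hypothetical alias table: the variants come from the examples in the text;
# the canonical names on the left are an assumption made for this sketch.
ALIASES = {
    "customer_id": ["customer ID", "cust ID", "C_ID", "cust_number"],
    "phone_number": ["phone", "p.number", "tel", "telno"],
}


def normalize(name):
    """Lowercase a field name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Reverse lookup: normalized variant -> canonical name.
LOOKUP = {
    normalize(alias): canonical
    for canonical, aliases in ALIASES.items()
    for alias in aliases
}


def canonical_field(name):
    """Return the canonical name for a field, or None if it is unknown."""
    return LOOKUP.get(normalize(name))


for raw in ["CUST_ID", "C_ID", "TelNo.", "email"]:
    print(raw, "->", canonical_field(raw))
```

A hand-maintained table like this only works for a handful of fields. With thousands of tables it quickly becomes unmanageable, which is the point the rest of this article makes.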
Suddenly, Perfect World’s perfect BI landscape is turned on its head.
So now what?
In the not-so-distant past, Perfect World’s strategy would likely have been to attempt to manually move all of the data from Real World’s BI environment over to Perfect World’s. This would have required months, if not years, and millions of dollars, and might have caused the combined companies to lose focus on actually operating the business while waiting for the data environment to be perfect again. Meanwhile, there might be another acquisition, starting the cycle of trouble all over again. Egads.
Today there are tools available that render much of that mind-numbing, tedious, frustrating manual work unnecessary. Among the latest tools that can help is the data catalog.
What Is a Data Catalog?
As the name implies, a data catalog is a store of information about an organization’s data assets. It gives users a mechanism for identifying disparate data sources that can be used together to extract actionable intelligence. TDWI’s Dave Stodder writes that “an up-to-date, comprehensive data catalog can make it easier for users to collaborate on data because it offers agreed-upon data definitions they can use to organize related data and build analytics models.”
Although not a new concept, data catalogs have become practical only lately, thanks to advanced technologies such as machine learning. As a result, data catalogs are becoming increasingly popular, especially in large organizations with many different data assets.
There’s a catch, however. A big one.
The Data Catalog Catch
What’s the catch? Well, it takes time and effort to construct a data catalog. A lot of time and effort. The process of tracking down and analyzing one’s metadata in order to construct the data catalog can take weeks or months, depending on the number, size, and variety of data assets. Why? Because most organizations are managing their metadata using manual methods and silo-based tools, which tend to just complicate the process. What happens when you throw metadata management automation into the mix? A lot.
Metadata Management Automation Is a Must for Data Catalogs
Among the most important tasks when building a data catalog is compiling the metadata from all your data assets. This is important because it forms the basis for understanding what you have and the relationships between different systems. Gathering the metadata is no easy task, especially from large, complex systems such as SAP, which can have thousands of tables and tens of thousands of columns. An automated tool for discovering and managing metadata is essential to building a good data catalog.
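As a rough illustration of what compiling metadata means in practice, the Python sketch below reads table and column definitions out of a database and turns them into catalog entries. It uses an in-memory SQLite database as a stand-in for a real source system; the table names, column names, and the harvest_metadata helper are all hypothetical, and a real tool would have to do this across many different systems, each with its own way of exposing metadata.

```python
import sqlite3


def harvest_metadata(conn):
    """Return one catalog entry (table, column, declared type) per column."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    entries = []
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        rows = conn.execute(f'PRAGMA table_info("{table}")')
        for _cid, column, col_type, *_rest in rows:
            entries.append({"table": table, "column": column, "type": col_type})
    return entries


# Hypothetical stand-ins for two unintegrated source systems.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crm_customers (cust_number INTEGER, telno TEXT)")
conn.execute("CREATE TABLE billing_accounts (C_ID INTEGER, phone TEXT)")

catalog = harvest_metadata(conn)
for entry in catalog:
    print(entry)
```

Even in this toy case the two systems describe the same customer and phone data under different names, so collecting the metadata is only the first step; relating the entries to one another is where most of the effort goes.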
Data catalog companies often reach out to Octopai for partnership, because they know our technology enables their customers to build data catalogs much more quickly, accurately, and easily. They know that Octopai is in effect the ultimate data catalog enabler. Without metadata management automation to help discover and analyze metadata from across systems, the actual implementation of a data catalog is complicated, painful, and sometimes impossible. When embarking on a data catalog project, automated metadata management is an absolute must. Attempting to set up a data catalog without it is a recipe for months and months of manual data mapping that could lead nowhere.