Many years ago in households across America, the annual arrival of the Sears and Roebuck catalog in the mailbox was a major event. Families could browse its pages and, finances permitting, go to the local Sears store to buy what they wanted. They could also complete the attached paper order form, mail it to Sears with payment, then wait for delivery.
Store catalogs have since moved online, but the concept is the same: a catalog is a categorized reference of goods or services that are available for purchase.
A data catalog is similar in that it provides consumers (typically developers, database analysts, BI professionals, and data scientists) with information about the various data assets available within a given organization.
The information a data catalog contains can include:
– Identity of the individual, group, or department that owns or administers the data asset
– Status of the asset
-Is it still in production use?
-Is it a legacy data asset that is kept only for historical purposes?
– Technical information about the asset
– The lineage, or where the data in that asset comes from
-Transactional data from one or more applications
-Sensor data from industrial machinery
-Video data from security cameras, etc.
– Where the data asset resides
-Magnetic/optical media, etc.
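To make the list above concrete, here is a minimal sketch of how a single catalog entry might be modeled in Python. The class name, field names, and sample values are hypothetical and chosen only for illustration; a real catalog tool would define its own schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PRODUCTION = "production"  # still in production use
    LEGACY = "legacy"          # kept only for historical purposes


@dataclass
class CatalogEntry:
    """One data asset as described in a catalog (hypothetical schema)."""
    name: str
    owner: str       # individual, group, or department that owns the asset
    status: Status
    technical_info: dict = field(default_factory=dict)
    lineage: list = field(default_factory=list)  # where the data comes from
    location: str = ""                           # where the asset resides


# Illustrative entry; every value here is made up.
entry = CatalogEntry(
    name="orders",
    owner="Sales Operations",
    status=Status.PRODUCTION,
    technical_info={"format": "database table"},
    lineage=["transactional data from the order-entry application"],
    location="magnetic media",
)
```

Most of these fields describe business context rather than anything that could be read from the asset itself, so someone has to supply them.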
How is a data catalog useful?
Perusing the data catalog is often the first step for data consumers who need to identify data resources for a given project or task, such as building a report or dashboard. For example, a BI (business intelligence) team member might need to locate every data asset (database table, flat file, or some other resource) that contains customer data to ensure that all of a company’s customers are represented. Once the consumer has identified potentially relevant data assets, the process of delving into the details can begin.
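The customer-data lookup described above can be sketched as a simple filter over catalog entries. This is only an illustration: it assumes each entry carries a set of subject tags, and the `Asset` class, the `find_assets` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A pared-down catalog entry (hypothetical) with subject tags."""
    name: str
    kind: str                                   # e.g. "database table", "flat file"
    subjects: set = field(default_factory=set)  # what the data is about


def find_assets(catalog, subject):
    """Return the sorted names of assets tagged with the given subject."""
    return sorted(a.name for a in catalog if subject in a.subjects)


# Made-up catalog contents for illustration.
catalog = [
    Asset("crm_contacts", "database table", {"customer", "contact"}),
    Asset("web_signups.csv", "flat file", {"customer"}),
    Asset("press_sensor_log", "flat file", {"sensor"}),
]

customer_assets = find_assets(catalog, "customer")
print(customer_assets)
```

The search only narrows the field. The consumer still has to inspect each candidate asset to confirm that it really holds the data the project needs.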
Although data catalog tools are available, data catalogs are still largely populated and maintained by hand, because much of the useful information about a data asset is business context rather than an inherent property of the asset. Keeping the data catalog accurate and up to date therefore takes organizational discipline.