How to automate metadata collection?

Automated metadata collection is built into many metadata management tools. Which tool you choose and how you collect the metadata depend on the metadata management use case you need it for.

Why would you want to collect and manage metadata?

Metadata keeps track of the who, what, where, when and why of your data. Data without efficient metadata management has been compared to a library without a card catalog. Even if you know exactly what data you’re looking for, it may take a while to locate it. And if you only have a vague idea of what data you want, then good luck. You’ll need it.

In contrast, when you’re on top of your metadata, you can use it to power data catalogs, data lineage, data governance and data search and discovery.

But the first step in metadata management is metadata collection: identifying and pulling out the metadata from your data systems.
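As a rough illustration of what "pulling out the metadata" can look like, here is a minimal Python sketch that reads table and column metadata from a SQLite database using the standard-library `sqlite3` module. The demo tables (`customers`, `orders`) and the function name `collect_metadata` are assumptions made up for this example; commercial tools cover many more systems and far richer metadata than this.

```python
import sqlite3


def collect_metadata(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for each user table."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    metadata = {}
    for (table,) in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = [(col[1], col[2]) for col in columns]
    return metadata


# Hypothetical in-memory database standing in for a real data system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

catalog = collect_metadata(conn)
for table, columns in catalog.items():
    print(table, columns)
```

The point of the sketch is that the database already describes itself; collection is a matter of asking it in a repeatable way rather than writing the answers down by hand.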

What’s wrong with manual metadata collection?

In the olden days (read: the 1990s), we created and manipulated less data. We therefore had less metadata (which is generated every time data assets are created or changed), and manual metadata collection and management were realistic options.

Now the average enterprise has terabytes upon terabytes (or even petabytes) of data, with more being created every second. Manual metadata collection and management has become a Sisyphean task. 

How to collect metadata effectively

1. Identify the intended purpose of your metadata collection and management. 

How do you want your metadata to help your enterprise? Making more accurate business decisions? Improving use of resources? Streamlining regulatory compliance? 

2. Identify the appropriate metadata management tool. 

If you want to streamline regulatory compliance, for example, data lineage and/or data governance tools would be appropriate. If your goal is empowering business users to use your enterprise’s data resources more effectively, a data catalog would be the tool of choice. 

3. Check whether the tool will automatically collect the metadata from your data systems.

Even if it can, how much instruction or direction does it need? Will it periodically run itself to update the existing metadata it already has – or do you need to manually tell it to run a collection/update?

4. Check how the tool automatically collects metadata, and whether that method is as comprehensive as you need.

Can it only identify metadata stored or represented in a certain way? Can it read and understand the algorithms used to process, transform and transmit the data, thereby inferring non-explicit metadata?

5. Check whether collected metadata transfers seamlessly to its point of use.

Ideally, a data catalog that can harvest metadata by itself will then be able to use it to self-update, without any involvement on your part. The same goes for any other tool: does your automated data lineage update its visualizations? If a data governance system detects a significant change in metadata, will it automatically notify the data owner or steward?
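The questions in steps 3 and 5 (does the tool refresh its metadata, and does a change reach the right person?) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two snapshots, the `stewards` mapping and the function name `diff_metadata` are hypothetical, and a real tool would run the comparison on a schedule and send an actual notification instead of printing.

```python
def diff_metadata(old, new):
    """Compare two {table: [(column, type), ...]} snapshots and list the changes."""
    changes = []
    for table in sorted(set(old) | set(new)):
        if table not in old:
            changes.append((table, "table added"))
        elif table not in new:
            changes.append((table, "table removed"))
        else:
            old_cols, new_cols = dict(old[table]), dict(new[table])
            for col in sorted(set(old_cols) | set(new_cols)):
                if col not in old_cols:
                    changes.append((table, f"column added: {col}"))
                elif col not in new_cols:
                    changes.append((table, f"column removed: {col}"))
                elif old_cols[col] != new_cols[col]:
                    changes.append(
                        (table, f"type changed: {col} {old_cols[col]} -> {new_cols[col]}")
                    )
    return changes


# Hypothetical snapshots from two collection runs.
yesterday = {"orders": [("id", "INTEGER"), ("total", "REAL")]}
today = {
    "orders": [("id", "INTEGER"), ("total", "TEXT"), ("currency", "TEXT")],
    "refunds": [("id", "INTEGER")],
}

# Hypothetical mapping from table to the steward who should hear about changes.
stewards = {"orders": "orders-steward@example.com"}

changes = diff_metadata(yesterday, today)
for table, change in changes:
    recipient = stewards.get(table, "unassigned")
    print(f"notify {recipient}: {table}: {change}")
```

When you evaluate a tool, the equivalent of the `unassigned` case is worth asking about: what happens to a detected change when nobody is registered as the owner of the affected asset?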

Automated metadata collection is the first step toward efficient metadata management and effective data management.