
A Close Look at Data Mapping Automation Using Machine Learning Approaches

A lone datum stands at the side of an information highway, its satchel slung over its back and its thumb in the air. 


A passing truck rumbles to a stop and the trucker motions for the datum to get in. “Where are you heading?” the trucker asks, once he starts moving again.


“Brazil,” comes the datum’s answer.


The trucker lets out a low whistle. “That’s a good way away.”


A few minutes go by, the trucker shooting a few surreptitious glances over at his hitchhiker. “You got Brazilian citizenship?” he finally asks.


The datum shakes its head. “No.” 


The trucker raises an eyebrow. “You got a passport, then, I betcha.”


“No.”


“You speak Portuguese?”


“Nope.”


“Do you even have a map?!” the trucker nearly explodes.


The datum furrows its brow. “Why would I need any of those things? If I know where I want to go, I’ll get there eventually, right?”



Culture shock

If you’re planning on taking a trip that crosses international borders, you would be wise to plan for inevitable differences, like:


Language

At best, your bright and cheery “good morning!” will be met with a blank stare – because your conversation partner has no English and therefore no way of processing what you just said. 

Worse is if your “good morning” sounds like something else in the local language, and your addressee assumes he did process what you said. (Childhood memories of the Sesame Street film “Big Bird in Japan” spring to mind: numerous passing locals wish Big Bird “good morning,” which is “ohayō” in Japanese, and he comments, “Wow, there sure are a lot of people here from Ohio!”)


Currency

If you see an item priced at “99.90” and whip out a hundred-dollar bill to pay for it, you’re out of luck if you’re in Kuwait, where one Kuwaiti dinar is worth more than 3 US dollars. What monetary standard are you coming from? What monetary standard is accepted where you currently are? How do you bridge the gap between the two?


Etiquette

Had a great restaurant experience and want to show the waiter your appreciation? Tipping would be a great gesture… except where it isn’t. In Japan and some other East Asian countries, it’s insulting to tip wait staff. That American etiquette standard would not cross the border and be interpreted successfully.  



What’s that got to do with data mapping?

Enterprise data crosses borders like nobody’s business. From source system to ETL to database to analysis to reporting – the journey is long, the border crossings are many, and standards may change at each crossing.


What’s a globetrotting datum to do?


Don’t leave home without your data map

Data mapping lays out the paths of cross-border data transfers, along with any information or instructions needed for the data to transfer successfully. 


For any given data movement from source to target, the data map would include:

  • Attributes of the data elements in the source system and which attributes they correspond to in the destination system (e.g. “First Name” in source system maps to “F Nm” in destination system)
  • Differences between the source schema and the destination schema and what transformation rule to apply (e.g. state names are encoded in the source system by their full names – “New York” – and in the destination system by their two-letter code – “NY”)
  • Business and validation rules for the destination system (e.g. destination fields may contain no more than 10 characters)
  • Frequency of transfer for data integration cases (e.g. transfer data from source to target every 12 hours)
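
To make those four ingredients concrete, here is a minimal sketch of a data map expressed in Python. It reuses the examples from the list above; the names `FieldMapping`, `DATA_MAP`, `STATE_CODES`, `apply_data_map` and the "St" target field are hypothetical, chosen for illustration, and do not come from any particular data mapping tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical lookup table for the state-name transformation rule.
STATE_CODES = {"New York": "NY", "California": "CA"}


def identity(value: str) -> str:
    """Default transformation: pass the value through unchanged."""
    return value


@dataclass
class FieldMapping:
    source_field: str                             # attribute name in the source system
    target_field: str                             # attribute name in the destination system
    transform: Callable[[str], str] = identity    # transformation rule
    max_length: Optional[int] = None              # validation rule for the destination


# The data map itself: one entry per field that crosses the border.
DATA_MAP = [
    FieldMapping("First Name", "F Nm", max_length=10),
    FieldMapping("State", "St", transform=lambda name: STATE_CODES[name]),
]

# Frequency of transfer, kept alongside the map (assumed to be read by a scheduler).
TRANSFER_INTERVAL_HOURS = 12


def apply_data_map(record: dict, data_map: list) -> dict:
    """Build a destination record by following the data map."""
    target = {}
    for mapping in data_map:
        value = mapping.transform(record[mapping.source_field])
        if mapping.max_length is not None and len(value) > mapping.max_length:
            raise ValueError(
                f"{mapping.target_field!r} allows at most {mapping.max_length} characters"
            )
        target[mapping.target_field] = value
    return target


print(apply_data_map({"First Name": "Ada", "State": "New York"}, DATA_MAP))
# {'F Nm': 'Ada', 'St': 'NY'}
```

A real data map would cover far more fields and rules, but the shape stays the same: correspondences, transformations, validations and a schedule.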


If you’re aiming for uninterrupted data flow and accurate data, thorough data mapping is a critical piece of the puzzle.


Where data mapping automation enters the picture

How long would it take you to construct an instructional map that could direct commuter flow through a subway system during an average morning rush hour? 


It depends.


If we’re talking about the Catania Metro of Sicily, with one line and about 10 stations, you might be able to manage with a few people standing at the entrances and exits to all the stations counting heads and polling people as to their routes. 


If, on the other hand, we’re discussing the New York City Subway system, with 25 lines, 424 stations and several million daily riders, you’d be counting heads and routes from today until next decade (at which point, of course, your map will be long outdated). Manual mapping doesn’t scale for huge systems with constantly moving parts.



When it comes to mapping large, dynamic systems, automation is the name of the game.


And enterprise data environments definitely qualify as large, dynamic systems.


The different faces of automated data mapping

Automated data mapping tools come in several varieties:

  • Data mapping software that provides a drag-and-drop interface with reusable parts, enabling data engineers or citizen integrators to speed up their process of matching fields and coding transformations
  • Automated tools that independently suggest data mapping using machine learning techniques, leaving it to the data engineer/citizen integrator to review and approve or correct


Even within the category of tools that use machine learning and artificial intelligence, there is a range of approaches:

  • ML/AI data mapping tools that “learn” from pre-existing data maps you provide, and apply the conclusions to new datasets and pathways
  • ML/AI data mapping tools that analyze the connections between fields/columns to draw conclusions about connections to other fields
  • ML/AI data mapping tools that analyze the content of fields/columns to draw conclusions about attributes or connections to other fields
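
As a rough illustration of the content-based idea, the sketch below suggests mappings by comparing the values found in source and destination columns. It uses plain Jaccard overlap as a simple stand-in for a trained model, so treat it as a toy heuristic rather than how any specific product works. The function names, the sample columns and the 0.5 threshold are all assumptions made for the example.

```python
def jaccard(a, b):
    """Share of distinct values that two columns have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_mappings(source_columns, target_columns, threshold=0.5):
    """For each source column, suggest the target column with the most similar content.

    Columns are passed as {column_name: list_of_values}. Suggestions scoring
    below the threshold are dropped, leaving those fields for a human to map.
    """
    suggestions = {}
    for source_name, source_values in source_columns.items():
        best_name, best_score = None, 0.0
        for target_name, target_values in target_columns.items():
            score = jaccard(source_values, target_values)
            if score > best_score:
                best_name, best_score = target_name, score
        if best_name is not None and best_score >= threshold:
            suggestions[source_name] = (best_name, best_score)
    return suggestions


# Hypothetical sample data from a source and a destination system.
source = {"First Name": ["Ada", "Grace", "Alan"], "State": ["NY", "CA", "TX"]}
target = {"F Nm": ["Ada", "Alan", "Linus"], "St Cd": ["NY", "TX", "CA", "WA"]}

for source_name, (target_name, score) in suggest_mappings(source, target).items():
    print(f"{source_name} -> {target_name} (score {score:.2f})")
```

The output is only a list of suggestions with scores. As described above, the data engineer or citizen integrator still reviews each one and approves or corrects it.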


Each approach or model has situations in which it works better than the alternatives. When considering software to automate data mapping, identify its approach and keep the strong and weak areas of that approach in mind as you evaluate your data environment and its data mapping needs.


So what is the difference between data mapping and data lineage?

You’ve probably noticed that we talk a lot about data lineage on this blog. And so, you ask, what exactly is the difference between data mapping and data lineage?


Glad you asked.


In a nutshell:


Data mapping tells the data where to go.


Data lineage tells you where the data is going (or has gone).


A visualized data map and a data lineage visualization may look very similar. Both show you the source system, the destination system and what transformations happened to the data along the way. But the data map is meant for the hitchhiking datum, and data lineage is meant for you, as you try to track the hitchhiking datum on its journey.
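
One way to picture the distinction in code: the data map exists before anything moves, while lineage is the record left behind by the move. The sketch below is a simplified illustration under that assumption. The `transfer` function, the "crm" and "warehouse" system names and the shape of the lineage entries are hypothetical.

```python
# The data map: instructions written before any data moves.
DATA_MAP = {"First Name": "F Nm", "State": "St"}


def transfer(record, data_map, source_system, target_system, lineage_log):
    """Move one record according to the data map and log where each field went."""
    target_record = {}
    for source_field, target_field in data_map.items():
        target_record[target_field] = record[source_field]
        # The lineage entry describes what actually happened, for people to inspect later.
        lineage_log.append({
            "from": f"{source_system}.{source_field}",
            "to": f"{target_system}.{target_field}",
        })
    return target_record


lineage_log = []
moved = transfer({"First Name": "Ada", "State": "NY"}, DATA_MAP, "crm", "warehouse", lineage_log)
print(moved)
for entry in lineage_log:
    print(entry["from"], "->", entry["to"])
```

The map and the log end up describing the same path, which is why the two visualizations look so alike. They differ in who reads them and when.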



As with data mapping, data lineage tools usually require automation to deliver effective results for modern enterprise data systems.


Let’s go, data

Smooth cross-border data transfers. Uninterrupted data flow. Preservation of data accuracy. 


Data mapping automation is a key piece in helping your enterprise data get where it needs to go, without any… hitches.

