What is DataOps? Principles and Benefits


Data analytics ain’t what it used to be. 


As a data analyst, you’re no longer just providing data analytics services. You’re providing data analytics products.


It may not sound like such a big difference, but that switch affects your users’ expectations – and, therefore, what makes your data analytics team a productivity and profitability success. 


Let’s take a look at the average person’s expectations about paying for products:


You order a car directly from the manufacturer. When you come to pick it up, the seller says, “So, it actually took longer than expected to build this car. Something went wrong in the construction of the engine and we had to take it apart and put it back together. That took two of my workers another five hours each. So we’ll have to charge you a bit more.”



If you don’t just gape at him with your mouth hanging open – or punch him in the nose – then you would probably say something like, “I’m not paying you for the time it takes you to build me a car! I’m paying for the car!”   


A successful product manufacturer is one who can increase the number of products manufactured in a given time while ensuring (and ideally, improving) the quality customers expect.


Today, your business users have the same perspective on data analytics. Your dashboards, charts, visualizations… they’re all products. 


A successful data analytics team is one that can increase the quantity of data analytics products they develop in a given time while ensuring (and ideally, improving) the level of data quality.


Enter DataOps.


What is DataOps?

DataOps is an approach to data management that increases the quantity of data analytics products a data team can develop and deploy in a given time while drastically improving the level of data quality.


Common elements of DataOps strategies include:

  • Collaboration between data managers, developers and consumers
  • A development environment conducive to experimentation
  • Rapid deployment and iteration
  • Automated testing
  • Very low error rates


The term “DataOps” was coined by Lenny Liebmann in 2014, both on his own blog and in a well-publicized (but no longer extant) article on the IBM Big Data & Analytics Hub. But the approaches and principles that form the basis of DataOps have been around for decades.


What are the principles and benefits of DataOps?

The three main approaches that form the basis of DataOps are:

  • Lean manufacturing
  • Agile development
  • DevOps


Let’s take a look at them one by one and see what they contribute to our understanding of DataOps principles and practical benefits.


Lean manufacturing

Created by Toyota (which calls it “the Toyota Production System”), a lean manufacturing system quickly and efficiently produces products of sound quality – in Toyota’s case, cars. How do they do it – and what’s the application to data analytics products?


Lean manufacturing principle #1: jidoka. For those of us who don’t speak Japanese, it’s loosely translated as “automation with a human touch.” Any automated part of the manufacturing process is designed so that it can detect errors or malfunctions – and immediately halt production. (The application of jidoka goes back to 1896, when Sakichi Toyoda equipped the Toyoda Power Loom with a weft-breakage automatic stopping device.) Through jidoka, quality problems are stopped in their tracks and prevented from reaching the consumer.



When it comes to data analytics products, you really, really don’t want them to be populated by bad data. DataOps automation prevents that by using automated tests and statistical process control on your data pipelines. Issue detected? Halt! The problematic data is prevented from feeding your data analytics products, ensuring a very low error rate. 
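
To make that concrete, here is a minimal Python sketch (not taken from any particular DataOps tool) of a jidoka-style gate on a pipeline. The record schema, field names and checks are hypothetical, invented purely for illustration:

# Minimal sketch of a jidoka-style data quality gate (illustrative only).
# Each batch is validated before it may flow downstream; any failed check
# halts the pipeline instead of silently passing bad data to a dashboard.

class DataQualityError(Exception):
    """Raised to halt the pipeline when a batch fails validation."""

def check_batch(rows):
    """Validate a batch of order records (hypothetical schema)."""
    errors = []
    for i, row in enumerate(rows):
        if row.get("order_id") is None:
            errors.append(f"row {i}: missing order_id")
        amount = row.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            errors.append(f"row {i}: invalid amount {amount!r}")
    if errors:
        # Jidoka: stop the line and surface the defect; don't ship it.
        raise DataQualityError("; ".join(errors))
    return rows

def load_to_dashboard(rows):
    print(f"Loaded {len(rows)} clean rows into the analytics product.")

if __name__ == "__main__":
    batch = [
        {"order_id": 1, "amount": 19.99},
        {"order_id": None, "amount": -5},  # defective record
    ]
    try:
        load_to_dashboard(check_batch(batch))
    except DataQualityError as err:
        print(f"Pipeline halted: {err}")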


Lean manufacturing principle #2: “Just-in-Time.” Each manufacturing process produces only what is needed for the next process in a continuous flow. Otherwise, you end up with wastes of time, materials, manpower and machinery. “Just-in-Time” manufacturing increases production while optimizing resources.


The application of “Just-in-Time” to a DataOps pipeline is well expressed by Lenny Liebmann: your infrastructure needs to match your workload.


Agile development

The Agile development methodology says: if you split a development process into small, discrete chunks, you decrease the chances that the process will get stuck or go wildly off course. This means that more focused development can happen faster.



DataOps applies the Agile approach in its construction of supportive, safe test environments for data analytics product innovation. Strong collaboration and communication between data analyst teams and data consumers facilitate rapid iteration and deployment of better analytics products sooner.


DevOps

“Development” is the planning and production of a product or system. “Operations” is making sure that what was produced works as planned. 


The DevOps approach says: don’t wait until the last minute to check if everything works. Make Operations (i.e., will it work right?) an integral part of the Development process.



So what is DataOps vs. DevOps?


DataOps applies the same approach to data analytics product development. Data is automatically prepared, checked and cleaned before it populates your analytics products, and often at even earlier stages of processing. Self-serve development sandboxes give your data analysts a safe environment in which to experiment – and then to test their experiments before sending them on as ideas or builds. Instead of an impact review board, hundreds of automated verification tests can be performed for each build. 
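
As a rough illustration of what those automated verification tests might look like, here is a hedged Python sketch: a handful of checks run against a build’s output before it can be promoted. The column names and thresholds are hypothetical, chosen only to show the pattern:

# Illustrative only: a few of the automated verification tests that might
# run against every build's output before it is promoted to production.

def test_no_null_keys(rows):
    assert all(r["customer_id"] is not None for r in rows), "null customer_id found"

def test_revenue_non_negative(rows):
    assert all(r["revenue"] >= 0 for r in rows), "negative revenue found"

def test_row_count_within_expected_range(rows, low=1, high=1_000_000):
    assert low <= len(rows) <= high, f"unexpected row count: {len(rows)}"

if __name__ == "__main__":
    build_output = [
        {"customer_id": 42, "revenue": 310.0},
        {"customer_id": 7, "revenue": 0.0},
    ]
    for check in (test_no_null_keys,
                  test_revenue_non_negative,
                  test_row_count_within_expected_range):
        check(build_output)
    print("All verification tests passed; the build can be promoted.")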


DataOps-directed environments also often enable and promote the turning of processes into reusable components available to everyone involved. This push toward standardization increases the chances that someone’s new idea will work well with everyone else’s new (and old) ideas. It also means that your development and operations won’t be up a creek if an important staff member leaves, because their tribal knowledge has been taken out of their head and preserved in your data systems.


Who leads DataOps?

Data engineers are the most likely heads of your DataOps implementation, execution and management. In the role of DataOps Engineer, they are responsible for:

  • Automation of data workflows, processes and tools
  • Aligning the different development environments
  • Creating collaborative processes
  • Making sure hardware, software and data resources are always available when needed


When a DataOps framework is implemented effectively, one data engineer can easily support many data analysts – and the productivity of your enterprise data operations goes up exponentially.


Is it time for DataOps?

Do your business users expect rapid responsiveness to their requests? Are they impatient with your team’s pace and perceived productivity levels?



Do errors creep into your data pipelines, processes and finished products? Does your team spend a significant amount of its time checking and cleaning data – or tracking down the source of errors?


Is your team brimming with great ideas for dashboards, visualizations and other insight-laden data analytics products? Is the percentage of those ideas that actually make it into development – and then into deployment – frustratingly small?


It may be time for a change. A lean, agile, operational change. A change that will increase both the quantity and the quality of the insight you provide to your users. 


DataOps, here you come!

