DataOps: Adjusting DevOps for Analytics Product Development

What is DataOps: a brief introduction

How DataOps relates to Agile, DevOps, and MLOps.

DataOps vs DevOps

The table shows how DevOps helps tech giants accelerate the deployment of updates with no loss in quality. Source: The Phoenix Project

DataOps vs MLOps

What MLOps has in common with DataOps.

Shared Ops principles

  • cross-functional collaboration,
  • shared responsibility for outcomes,
  • component reusability,
  • pervasive automation,
  • observability, or the ability to track and measure results,
  • learning from mistakes, and
  • continuous improvement through multiple iterations.

DataOps process structure

The data analytics pipeline exists within a CI/CD framework.

Data analytics pipeline: key stages

CI/CD for data operations

DataOps people

Data stakeholders uniting around business requirements.
  • data managers — data architects, data engineers, ETL developers, and data stewards;
  • data consumers — data analysts, data scientists, dashboard developers, BI teams, machine learning engineers, and others who use data to deliver results via visualizations, APIs, ML models, applications, or other mediums; and
  • a DevOps team — software developers and IT operations professionals.

Best practices to support DataOps

Treat data as code
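One way to put this principle into practice is to keep a dataset's expected schema in a version-controlled module and validate incoming records against it, so that schema changes go through code review and automated testing like any other change. The sketch below illustrates the idea; the table and column names are hypothetical.

```python
# schema.py -- a dataset contract kept under version control,
# reviewed and tested like any other piece of code.
# All column names here are hypothetical.
ORDERS_SCHEMA = {
    "order_id": int,
    "customer_id": int,
    "amount": float,
    "currency": str,
}

def validate(record: dict, schema: dict = ORDERS_SCHEMA) -> list[str]:
    """Return a list of violations for one record (empty list = valid)."""
    errors = []
    for column, expected_type in schema.items():
        if column not in record:
            errors.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            errors.append(f"{column}: expected {expected_type.__name__}, "
                          f"got {type(record[column]).__name__}")
    return errors

if __name__ == "__main__":
    bad_record = {"order_id": 1, "customer_id": "42", "amount": 9.99}
    print(validate(bad_record))
    # ['customer_id: expected int, got str', 'missing column: currency']
```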

Create a data catalog
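At its simplest, a catalog entry is structured metadata that tells consumers what a dataset contains, who owns it, where it lives, and how often it is refreshed. The sketch below is a toy illustration rather than the schema of any particular catalog product; all field values are made up.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One dataset's card in a minimal, code-managed data catalog."""
    name: str
    owner: str              # team or data steward responsible for the dataset
    location: str           # where the data physically lives
    description: str
    update_frequency: str   # how often the dataset is refreshed
    tags: list = field(default_factory=list)

catalog = [
    CatalogEntry(
        name="orders",
        owner="data-engineering",
        location="warehouse.analytics.orders",
        description="One row per customer order, loaded daily from the OLTP database.",
        update_frequency="daily",
        tags=["sales", "core"],
    ),
]

# A consumer can discover datasets by tag instead of asking around.
print([e.name for e in catalog if "sales" in e.tags])  # ['orders']
```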

Consider ELT

  • ELT is faster and cheaper to run.
  • ELT creates rich, easy-to-access pools of historical data, with no details lost during transformations. Businesses can tap into it anytime for analysis and BI generation.
  • With ETL, transformations are owned by data engineers. With ELT, transformations happen in the warehouse, where data analysts can also contribute by writing them in SQL (see the sketch below).
  • You can design new data models multiple times without revising the entire pipeline.
ETL and ELT approaches to moving data.
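To make the contrast concrete, the sketch below walks through a bare-bones ELT flow in Python, using an in-memory SQLite database as a stand-in for a cloud warehouse (the data, table, and column names are invented): raw records are landed untouched, and the transformation runs afterwards, in SQL, inside the warehouse.

```python
import sqlite3

# Extract: raw records as they arrive from a source system (hypothetical data).
raw_orders = [
    ("o-1", "2024-03-01", "9.99", "USD"),
    ("o-2", "2024-03-01", "14.50", "usd"),
]

conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

# Load: land the data untouched, so no detail is lost for future analyses.
conn.execute("CREATE TABLE raw_orders (id TEXT, order_date TEXT, amount TEXT, currency TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", raw_orders)

# Transform: done inside the warehouse, in SQL, after loading --
# this is the step analysts can own and rewrite without touching the pipeline.
conn.execute("""
    CREATE TABLE orders AS
    SELECT id,
           order_date,
           CAST(amount AS REAL) AS amount,
           UPPER(currency)      AS currency
    FROM raw_orders
""")

print(conn.execute("SELECT * FROM orders").fetchall())
# [('o-1', '2024-03-01', 9.99, 'USD'), ('o-2', '2024-03-01', 14.5, 'USD')]
```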

Build Directed Acyclic Graphs (DAGs) for orchestration

Visualization of DAG dependencies in Apache Airflow.
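In Airflow, a DAG is ordinary Python code that declares tasks and the dependencies between them. The sketch below assumes Airflow 2.x; the DAG id, task names, and callables are placeholders.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables -- in a real pipeline these would extract from a
# source, load into the warehouse, and run in-warehouse transformations.
def extract():
    print("pulling raw data")

def load():
    print("landing raw data in the warehouse")

def transform():
    print("building analytics tables")

with DAG(
    dag_id="daily_elt",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_load = PythonOperator(task_id="load", python_callable=load)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)

    # These dependencies form the directed acyclic graph Airflow visualizes.
    t_extract >> t_load >> t_transform
```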

Technologies to run DataOps

Data pipeline tools

  • Piperr, a suite of pre-built pipelines to run data integration, cleaning, and enrichment;
  • Genie, an open-source engine designed by Netflix to orchestrate Big Data jobs;
  • Apache Oozie, a workflow scheduler for Hadoop jobs;
  • Prefect, a platform for data flow automation and monitoring;
  • Pachyderm for data version control and data lineage; and
  • dbt (data build tool), a development environment to write and execute data transformation jobs in SQL. In essence, dbt is the T in the ELT (extract, load, transform) process.
Here’s how dbt fits into data workflows. Source: Medium

Automated testing and monitoring tools

  • initial data testing (see the sketch below),
  • data structures testing (validating database objects, tables, columns, data types, etc.),
  • ETL testing (if a company opts for ETL),
  • integration testing, verifying that all pipeline components work well together, and
  • BI/report testing.
ETL testing with iCEDQ.
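To illustrate the first category above, here is a tool-agnostic sketch of two common data tests, a not-null check and a uniqueness check, run against a throwaway SQLite table (all names are invented). In a real setup, such checks would run automatically as part of the pipeline or a CI job, with failures blocking downstream tasks.

```python
import sqlite3

def check_not_null(conn, table, column):
    """Data test: fail if any row has a NULL in the given column."""
    nulls = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()[0]
    assert nulls == 0, f"{table}.{column}: {nulls} NULL values found"

def check_unique(conn, table, column):
    """Data test: fail if the given column contains duplicate values."""
    dupes = conn.execute(
        f"SELECT COUNT(*) - COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()[0]
    assert dupes == 0, f"{table}.{column}: {dupes} duplicate values found"

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("o-1", 9.99), ("o-2", 14.5)])
    check_not_null(conn, "orders", "id")
    check_unique(conn, "orders", "id")
    print("all data tests passed")
```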

DataOps platforms

StreamSets DataOps platform structure.

DataOps implementation tips

Choose the right time

Make sure your staff has core competencies

Keep in mind that success may not come quickly
