Why and how to enable DataOps in an organization?

It can be a daunting task to drive a DataOps initiative forward in an organization. By understanding it's implications, you will increase your odds to succeed.

When talking with my colleague about the introductory post to the blog series I was asked if we can already jump into the technological side of DataOps in the second part. Unfortunately not. Technology is an important part of the phenomenon, but the soft side is even more important.

Defining the undefined

There still are no official standards or frameworks regarding DataOps. So how can we then even talk about enabling DataOps in an organization if we only understand it on a high level? To start get things started, we have to break up the concept.

The DataOps Manifesto [1] that DataKitchen has brewed together does a good job describing the principles that are part of DataOps. However, it is somewhat limited to analytics/machine learning point of view. Modern data platforms can be much more than just that. Gartner [2] has the same spirit in their definition but it is not as tightly scoped to analytics. The focus is more in thinking DataOps as a tool for organizational change regarding the data domain. CAMS-model which was coined by Damon Edwards and John Willis [3] for describing DevOps does work fine also with DataOps. CAMS stands for Culture, Automation, Measurement and Sharing. As you can see, automation is only one of the four elements. Today we will dive into the cultural aspect.

DataOps culture everywhere

How to build DataOps culture? One does not simply build culture. Culture is a puzzle that is put together piece-by-piece. Culture will savour your beloved DataOps initiative as breakfast if you try to drive it forward as a technological project. You can’t directly change culture. But you can change behavior, and behavior becomes culture [3].

Let’s take an example of what the implications of cultural aspects can be. I like the “Make it work, make it fast, make it last” mentality which prioritizes delivering value fast and making things last once business already benefits from the solution. The problem is that culture seldom supports the last of the three.

Once value has been delivered, no one prioritizes the work related to making lasting solutions as it does not produce imminent business value.

By skipping the last part, you slowly add technical dept which means that larger part of development time goes to unplanned work instead of producing new features to the business. The term death spiral (popularized in the book Phoenix Project [4]) describes the phenomenon well.

改善

Important part of DataOps is that the organization makes a collective commitment to high quality. By compromising this the maintenance cost of your data platform slowly starts to rise and new development will also get slower. Related to this we also need some Kaizen mentality. Kaizen (kai/改=change, zen/善=good) means continuous improvement that involves everyone. In the data development context this means that we continuously try to find inefficiencies in our development processes and also prioritize the work that removes that waste. But can you delimit the effects there? Not really. This can affect all stakeholders that are involved with your data, meaning you should understand your data value streams in order to control the change.

As Gartner [2] states “focus and benefit of DataOps is as a lever for organizational change, to steer behaviour and enable agility”. DataOps could be utilized as a tool for organizational transformation. Typical endgame goals for DataOps are faster lead time from idea to business value, reduced total cost of ownership and empowered developers and users.

Faster time to value

This usually is the main driver for DataOps. The time from idea to business value is crucial for an organization to flourish. Lead time reduction comes from faster development process and less waiting between different phases but also from the fact that building and making releases in smaller fragments makes it possible to take the solutions in use gradually. Agile methodology and lean thinking play big part in this and the technology is there to support.

If your data development cycle is too slow it tends to lead to shadow IT meaning each business will build their own solution as they feel they have no other choice. Long development cycles also mean that you will build solutions no one uses. Faster you get feedback better you can steer your development and build a solution the customer needs instead of what the initial request was (let’s face it, usually at the beginning you have no clue about all the details needed to get the solution done).

All in all faster time to value should be quite universal goal because of its positive effects on the business.

Reduced Total Cost of Ownership (TCO)

Reduced TCO is a consequence of many drivers. The hypothesize is that the quality of solutions is better resulting in less error fixing, faster recovery times and less unplanned work in general.

Many cloud data solutions have started small and gradually grown larger and larger. By the time you realize that you might need some sort of governance and practices the environment can already be challenging to manage. By utilizing DataOps you can make the environment a lot more manageable, secure and easier to develop to.

Empowered developers and users

One often overlooked factor is that how does DataOps affects the people that are responsible for building the solutions. Less manual work and more automation means that developers can focus more on the business problems and less on doing monkey work. This can lead to more sensible content of work. But at the same time it can lead to skill caps that can be agonizing for the individual and also a challenge for the organization on how to organize the work. Development work can actually also feel more stressful as there will be less waiting (for loads to complete, etc.) and more of the actual concentrated development.

Some definitions of DataOps [5] emphasize collaborational and communicational side of DataOps. Better collaboration builds trust towards the data and between different stakeholders. Faster development cycles can in part bring developers and users closer to each other and engage the user to take part in the development itself. This can raise enthusiasm among end users and break the disbelief that data development can’t support business processes fast enough.

Skills and know-how

One major reason DataOps is hard to approach is that doing it technically requires remarkably different skillset than traditional ETL-/Data Warehouse -development. You still need to model your data (There seems to be a common misconception that you just put all your data to a data lake and then you utilise it. Believe me database management systems (DBMS) were not invented by accident and there really is a need for curated data models etc. But this is a different story.).

You also need to understand the business logic behind your data. This remains to be the trickiest part of data integrations as it requires a human to interpret and integrate the logic.

So on higher level you are still doing the same things, integrating and modelling your data. But the technologies and development methods used are different.

Back in the day as a ETL/DW-developer you could have done almost all of your work with one GUI-oriented tool be it Informatica, SSIS or Data Stage for example. This changes in the DataOps world. As a developer you should know cloud ecosystems and their components, be able to code (Python, NodeJS and C# are a good start), understand serverless and its implications, be committed to continuous integration and things it requires.

And the list goes on. It’s overwhelming! Well it can be if you try to convert your developers to DataOps without help. There are ways to make the change easier by using automation and prebuilt modular components, but I still tease you a bit on this one and come to the solutions later as this is a big and important subject.

Yesterday’s news

One could argue that data engineers and organizations have been doing this DataOps stuff for years but at least from what I have seen the emphasis has been on the data development side and the operations part has been an afterthought.

This has led to “data platforms” that are technologically state of the art but when you show the environment to an experienced data professional she is horrified. Development has been done to produce quick business value, but the perceived value is rigged. Value will start to fade as the data platform is a burden to maintain and point solutions created all live their own lives instead of producing cumulative value on top of each other.

Success factors for enabling DataOps

In the end I would like to share a few pointers on how to improve your chances in succeeding if you dare to embark on your DataOps journey. Unfortunately, by blindly adopting “best practices” you can fall a victim of cargo cult meaning that you try to adopt practices that do not work in your organization. But still there are some universal things that can help you in making DataOps run in your organization.

Start small and show results early

You need to build trust towards what you are building. What has worked in organisations is that you utilise vertical slicing (building a narrow end-to-end solution) and delivering value as soon as possible. This can prove that new ways of working bring the promised benefits and you’ll get mandate to go forward with your initiative.

Support from senior executives

Oh, the classic. Even if it might sound a bit clichéd you still can’t get away from it. You will need a high-level sponsor in order to improve your odds to succeed. Organizational changes are built bottom-up, but it will even your way if you get support from someone high up. As DataOps is not only a technological initiative you need to break a few established processes along the way. People will question you and by having someone backing you up can help you great deal!

Build cross-functional teams

If you don’t have a team full of unicorns, you will be better off mixing different competences in your development team and not framing their roles too tightly. Also mix your own people with consultants that have deep knowledge in certain fields. You need expertise in data development and also in operating the cloud. The key is to enable close collaboration so that people learn from each other and the flow of information is seamless.

But remember the most important thing! Without it you will not succeed. The thing is to actually start and commit to the change. Like all initiatives that affect people’s daily routines, this will also be protested against. Show the value, engage people and you will prevail!

REFERENCES:

[1] The DataOps Manifesto. http://dataopsmanifesto.org/dataops-manifesto.html
[2] Nick Heudecker, Ted Friedman, Alan Dayley. Innovation Insight for DataOps. 2018. https://www.gartner.com/document/3896766
[3] John Willis. DevOps Culture (Part 1). 2012. https://itrevolution.com/devops-culture-part-1/
[4] Gene Kim, Kevin Behr, George Spafford. The Phoenix Project: A Novel about IT, DevOps, and Helping Your Business Win. 2013. IT Revolution Press.
[5] Andy Palmer. From DevOps to DataOps. 2015. https://www.tamr.com/from-devops-to-dataops-by-andy-palmer/