Zero Outage and Agile Development

Motivation

Whenever I introduce ZOIS to customers and colleagues, I am asked how it relates to other standards, frameworks and methodologies. Agile is always high on the list, especially since many people seem to think that the two concepts contradict each other.

Agile development has become the de facto standard for software development, and software will always be a critical component of any Zero Outage service. It is therefore essential to understand the touchpoints and requirements.

Terminology clarification

Before digging into the subject, it helps to establish common ground on terms:

  • ZOIS: the guarantee of zero business outage, of which software is one component. Resilience can be provided by different means at different levels of the service architecture.
  • Agile software development describes a set of principles for software development under which requirements and solutions evolve through the collaborative effort of self-organizing cross-functional teams. It advocates adaptive planning, evolutionary development, early delivery, and continuous improvement, and it encourages rapid and flexible response to change.
  • DevOps (a clipped compound of “software DEVelopment” and “information technology OPerationS”) is a term used to refer to a set of practices that emphasize the collaboration and communication of both software developers and information technology (IT) professionals while automating the process of software delivery and infrastructure changes. It aims at establishing a culture and environment where building, testing, and releasing software can happen rapidly, frequently, and more reliably.

To simplify: on the one hand we are striving for the highest level of quality in service delivery, of which software is a key component providing functional business value. On the other hand we are developing software iteratively and as fast as possible.

Is there an inherent problem?

I personally believe that there is no problem. On the contrary, I want to argue that the principles of agile and ZOIS complement each other nicely. This is supported in particular by the research on DevOps (e.g. The DevOps Handbook by Gene Kim and others), which scientifically demonstrates that, when applied correctly, DevOps can optimise time, cost and quality.

However, anecdotal feedback also reports the contrary: that agile development often compromises quality. This contradicts the principles of Zero Outage and requires consideration.

Admittedly, I have not done a statistically significant analysis. However, I have had the chance to discuss agile experiences with many customers through the IT4IT work. The alignment of IT4IT™ and agile is a very similar question. It should be noted that the audience consisted primarily of IT and software architects in large, global enterprises. These discussions revealed a number of very interesting experiences:

  • Agile software development, as in Scrum and the related R&D toolchain, typically demonstrates a high level of maturity with a fairly low number of defects on the functional value side.
  • The level of collaboration within R&D, between team members and even between teams, also seems very mature.
  • This is not necessarily true, though, for collaboration across functions, in particular with strategy and product management. In a number of cases agile has not led to faster time to market, even though R&D runs like a well-oiled machine. In these cases planning was not well integrated: agile execution and waterfall planning do not fit together well. SAFe tackles exactly this problem of maturing portfolio and product management, and it aligns well with IT4IT.
  • Furthermore, collaboration across organizations, such as business and IT, often does not work well either. Examples show that agile innovation in the business delivers new customer value quickly and well (e.g. a financial app on a smartphone) but neglects the proper integration of core IT value (e.g. a legacy customer database), leading to critical issues (e.g. security and scaling).
  • Consequently, quality issues typically occur in non-functional areas, such as availability, performance and security, which are critical for Zero Outage.

Do these experiences reveal an inherent problem? I think not, but they point to an obvious lack of professionalism in articulating non-functional requirements and translating them into appropriate and sustainable architectural consequences.

We should not forget that there are great examples of using DevOps successfully to get innovations out fast and well built, primarily at start-ups and with products that are fairly stand-alone in nature. But it seems that the industry has not yet been able to extrapolate this to mainstream enterprise IT, which is why I think that both ZOIS and IT4IT can potentially provide value.

Building the right thing right

To recap, it looks like there are two major root causes driving the lack of collaboration and non-functional quality issues in large enterprise IT development efforts:

  1. Making agile successful in delivering quality customer value with fast time to market requires an integrated end-to-end approach, an agile operating model with clear roles and responsibilities along the value chain.
  2. Agile is all about getting a Minimal Viable Product (MVP) into customers' hands as quickly as possible and then improving it iteratively. What we often see, though, is that the MVP is defined mainly in terms of functional value, and that it is typically difficult and costly to add non-functional value at a later stage.

DevOps articulates key operating model concepts (encompassing people, process and technology) for applying agile principles well (e.g. flow, feedback, continual learning and experimentation). The IT value chain concept, articulated by IT4IT and leveraged and expanded in the Zero Outage Map, also serves as an operating model foundation. It gives IT a meaningful basis to structure and guide an integrated end-to-end approach of doing the right thing and doing it right.

Therefore, I think the different methodologies can complement each other in driving more effective collaboration across IT and higher value for the customer, but we need to determine the touchpoints driving the required synergy.

Building Zero Outage services is likely to be an iterative approach as well. Services are typically not created with Zero Outage quality from the start; they reach it as a final stage, depending on evolving business requirements. Aiming for it from the start may result in a rocket-science project that delivers the wrong value at a time when it is no longer needed.

While it is fairly easy to add functional value iteratively, this is much less obvious on the non-functional side, as it typically results in architectural requirements and considerations that add time and cost to the project. Adding non-functional requirements later may well result in re-designing, if not re-writing, major parts, which is even more costly and time-consuming in the long run.

I think it is appropriate and critical for ZOIS to prescribe a practical way to incrementally grow the architecture of a service towards the Zero Outage quality level. We not only need to look at Zero Outage characteristics, but also at a methodology to get there in a feasible and affordable manner.

Minimal Viable Architecture

That discussion consequently raises another question: what is the Minimal Viable Architecture for the Minimal Viable Product? And how can an architecture be designed in a sustainable way that allows the maturity of non-functional quality to grow iteratively? I think that is exactly the question that ZOIS should tackle, by providing architecture policies for classes of non-functional requirements in order to develop high-quality software for IT services.

Googling “Minimal Viable Architecture” reveals not only a wonderfully intuitive pictorial explanation but also a good number of consulting firms offering services, yet neither a generally accepted definition nor an approach to get there. One also finds statements that require consideration, especially the more radical opinions:

  1. “First find business model and market fit”
  2. “The best code you can write now is the code you will discard in a couple of years’ time (Martin Fowler)”
  3. “If you don’t end up regretting your early technology decisions, you probably over engineered …Re-architecting is a sign of success; if you never need to, either you overbuilt or nobody cares”
  4. “Just enough architecture, Do things that don’t scale (Paul Graham)”
  5. “Modularity discipline, detailed logging, segregate business logic”
  6. “Understand the non-functional requirements of your product”

I certainly agree with finding the right strategy first (#1) and with the architectural excellence in comment #5. I also agree with #6, but that is easy to say and difficult to explain how to do. It is a clear area of immaturity in software development, and always has been; it is not a particularly agile problem.

But I tend to question #2 to #4, especially when looking at them from the ZOIS perspective. My own experience of more than 20 years of software development, as well as that of others, suggests that we always have good intentions to add non-functional value and architectural clean-up later, but in reality we never get to it. The main reason is simply cost and time. This is especially true of larger portfolios of related products or services, rather than distinct, stand-alone applications.

Zero Outage Maturity

This still leaves us with the question of how to define the Minimal Viable Architecture for software that is part of Zero Outage services. I do not have an answer to this, and neither does the industry at large, but I suggest that we create a focused software work-stream within the ZOIS association to develop a structure and methodology to

  • Determine non-functional requirements for software components of Zero Outage services, starting from the ZO map and layered model as guidance. The following areas seem to be of high importance and relevance:
    • Availability and Resilience
    • Performance
    • Security
    • Reliability and Manageability
    • Integration: not a classical NFR, and some may call it functional, but it is similar enough in its impact and architectural relevance to warrant being included
  • Investigate and develop software architecture principles enabling those
  • Create a modular software reference architecture that structures the principles and articulates the dependencies
  • Articulate a maturity model that prescribes the recommended way to successively mature software architecture towards the Zero Outage quality level
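To make the idea of a maturity model a little more tangible, here is a minimal sketch of how such an assessment could be recorded per software component. It is an illustration only, not a ZOIS definition: the three level names, the class `ComponentAssessment`, the component name and the rule that the weakest area determines the overall level are all my own assumptions. Only the non-functional areas are taken from the list above.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Maturity(IntEnum):
    """Hypothetical maturity levels; names and number of levels are assumptions."""
    MINIMAL_VIABLE = 1
    HARDENED = 2
    ZERO_OUTAGE = 3


# Non-functional areas taken from the list above.
NFR_AREAS = (
    "availability_resilience",
    "performance",
    "security",
    "reliability_manageability",
    "integration",
)


@dataclass
class ComponentAssessment:
    """Maturity of one software component, recorded per non-functional area."""
    name: str
    levels: dict = field(default_factory=dict)

    def level(self, area):
        # An area that has not been assessed counts as the lowest level.
        return self.levels.get(area, Maturity.MINIMAL_VIABLE)

    def overall(self):
        # Assumption: a component is only as mature as its weakest area.
        return min(self.level(area) for area in NFR_AREAS)

    def gaps(self, target):
        # Areas that still need work before the target level is reached.
        return [area for area in NFR_AREAS if self.level(area) < target]


billing = ComponentAssessment(
    "billing-api",  # hypothetical component name
    {"security": Maturity.ZERO_OUTAGE, "performance": Maturity.HARDENED},
)
print(billing.overall().name)
print(billing.gaps(Maturity.HARDENED))
```

A structure like this would let a team plan the next iteration around the listed gaps instead of trying to reach the Zero Outage level in every area at once.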

When thinking about the non-functional requirements, we also need to think about the relevant technologies. For example, resilience is key to ZOIS; it is a requirement not only on a component but also on its relationships to other components. Resilience of software is different in a cloud-native deployment than in a traditional on-premises data centre, hence the architecture policy needs to be specific to the technology. On the other hand, the use of a certain technology can also help to standardize an architectural policy: for example, the use of containers encourages defining and using APIs in a sustainable way and makes it almost impossible to create “spaghetti integrations”.
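As a small illustration of resilience being a property of the relationship between components, the sketch below wraps a call to a dependency so that the caller degrades gracefully when the dependency fails. The function `call_with_fallback`, the retry count and the two example functions are hypothetical; a real architecture policy would be specific to the technology in use.

```python
def call_with_fallback(primary, fallback, retries=2):
    """Call primary(); after retries + 1 failed attempts, use fallback()."""
    for _ in range(retries + 1):
        try:
            return primary()
        except Exception:
            # Sketch only: a real policy would name the exception types it
            # tolerates and wait between attempts.
            continue
    return fallback()


def lookup_customer():
    # Hypothetical dependency that is currently unavailable.
    raise ConnectionError("legacy customer database unreachable")


def cached_customer():
    # Hypothetical degraded answer served from a local cache.
    return {"id": 42, "source": "cache"}


print(call_with_fallback(lookup_customer, cached_customer))
```

The point of the design is that the calling component stays available, with reduced value, while the component it depends on is down.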

I think this is highly critical to making ZOIS successful, and highly innovative and fun to work on.

 


Footnotes

(1) See Zero Outage Value Proposition
(2) See “agile software development” and “DevOps” definitions in Wikipedia
(3) IT4IT™ is a trademark of The Open Group
(4) Scaled Agile Framework for the Enterprise, see www.scaledagileframework.com

Disclaimer

The information contained in this document is contributed and shared as thought leadership in order to evolve the Zero Outage Best Practices. It represents the personal view of the author and not the view of the Zero Outage Industry Standard Association.
