Introduction and Overall Picture
Outages of IT services are mostly associated with the misbehavior or errors of operating personnel, or with technical problems which may, for example, be attributed to errors in the design. Measures in the areas “people”, “platforms” and “processes” are therefore obviously necessary to prevent such outages. So how does the fourth area of the Zero Outage Industry Standard, security, come into play?
So-called DDoS attacks are a well-known example of outages of IT services for which the dark side of society is responsible. IT security measures are put in place to prevent such attacks or to mitigate their impact. But there are other cases that demonstrate a rather close relationship between outages and security.
- Wouldn’t it be an outage if a business process is in a serious mess and not reliable at all because access rights are not properly assigned?
- Wouldn’t it be an outage if hackers steal a large amount of money, causing a business breakdown, because the IT has not been properly protected from outside attacks?
- Wouldn’t it be considered an outage if the IT fails to protect the company’s most critical assets (“crown jewels”) from illegal access, which in this case means that intellectual property in the form of research results worth billions of euros is lost?
If such examples are not sufficient to consider security a topic of “zero outage”, one may consider this last example: an IT service provider operates an application for a user organization using different components and services from its suppliers. An incident occurs that results in people being seriously hurt or even killed. How could this happen? The IT controls robots in an assembly line. They get out of control. Why? The IT controlling them is hacked or abused due to poor security standards. How could this happen? The IT service provider was not aware that the application was critical. The suppliers were not aware that they must take care of security too. The list is far from complete, but poor communication and collaboration are one reason why IT systems may fail to be properly secured.
Division of labor and industrialization are underestimated but indeed relevant with respect to security:
- Today’s IT service provision is characterized by a high degree of division of labor, specialization and standardization. An apparent element of this division of labor is the fact that user organizations draw on the IT services of specialized IT service providers.
- This process, referred to as “IT outsourcing” and “cloud”, transfers the responsibility for service provision, including security, to the IT service provider.
- However, there are other elements. Division of labor can be observed in the internal organizations of large-scale, industrialized IT production, but it expands to the whole supply chain, since partners and suppliers are decisively involved in modern IT service provisioning (“industrialization of IT”).
A chain is only as strong as its weakest link.
- This means that communication and collaboration are key elements for the IT industry to provide secure IT services. As a minimum, one needs to have a representation of the common context, explaining the scope and dependency of the service and its components.
- Security must be a joint effort requiring the definition of common practices tailored to the circumstances typical in today’s IT industry. As a minimum, security management must overcome the limitation to a single organization. Moreover, the provisioning processes of modern IT and the organization of large-scale IT production should be considered.
- Existing security standards, practices and norms do not reflect this situation and do not meet the challenges sufficiently. This is why the Zero Outage Industry Standard association has set itself the goal of gathering and enhancing existing concepts in order to create a de-facto industry standard which fills the gaps.
The association will not reinvent the wheel. Instead, it aims to focus on aspects which are underestimated or not considered at all although they are essential for reaching the level of security demanded by today’s business applications. But there is also the need to think ahead about how the world is changing and whether a different “wheel” is required to meet the challenges expected in the future. Here are examples of “enemies of good security” and ways to “combat” them.
- Complexity: The Zero Outage Industry Standard association wants to provide the means to help to manage complexity.
- “Not my business”, or “we’ll take care of security later”: The Zero Outage Industry Standard association wants to standardize a “secured by definition” approach which makes security a genuine, indisputable part of any IT service management activity.
- Lack of cooperation, integration and binding of security functionality and action: The Zero Outage Industry Standard association wants to develop methods helping to actually consider the whole picture and leave the limits of special topics, isolated teams and technical silos. This shall foster cooperation while still taking the economic reality into account.
- Barriers and interfaces between companies: The Zero Outage Industry Standard association wants to help improve transparency in the supply chain and make contractual agreements on security part of business as usual.
- Lacking due care: The Zero Outage Industry Standard association wants to foster the standardization of security measures to meet the expectations of industrialized IT production and today’s market economy.
- Unknown loopholes: The Zero Outage Industry Standard association plans to specify, for example, rules and practices for third party access to IT infrastructures (example: privileged supplier access to components in an IT service provider’s IT stack).
These are examples only. The Zero Outage Industry Standard association will continue to phrase and develop such cases and elaborate solutions for them.
As a result, Zero Outage is based on four pillars. Security is one of them (Figure 1).
Information security is gaining ever more significance. This development is due to the constant increase and widening use of IT – also in areas that had no former IT support or simply did not even exist. As a result, the number of threats to IT and data is on the rise as well. Some of these have only become possible because of the way we use IT and what we do with it. The business world strongly depends on having reliable IT services.
Zero Outage uses architectures to translate the mission of building IT services with zero business disruption into real deployments and delivered services that provide consumers with the corresponding experience. Refer to the “Zero Outage Architecture, Executive Summary” for more detail. The Zero Outage Industry Standard work stream “security” has published architectural models as a means of, for example, managing complexity. This chapter summarizes both approaches and shows how the two relate to each other. Though different, they follow a similar philosophy and provide proven guidelines that are unique, specific and applicable in their context.
The main goal of the Zero Outage Industry Standard is to strive for zero business outages. Companies following this standard will benefit from the combined industry experience of providers as well as consumers of highly available and reliable IT solutions. It contains recommendations to find the right balance between reactive and proactive activities.
The Zero Outage collection of best practices offers specific guidance to enable IT professionals to plan, build, deliver and run end-to-end IT solutions suited for the most critical business functions and processes.
The commonality with the IT4IT® Value Chain and its applicability to the Zero Outage problem are obvious. All phases significant in the evolution of the Zero Outage interpretation of the value chain are articulated by the Zero Outage Map (see Figure 2).
The Zero Outage Industry Standard uses the ESARIS Security Taxonomy as its classification and organization schema. In Figure 3, it is shown outlined by an amber box. A synopsis of the Capability View of the Zero Outage Map (Figure 2) and the Taxonomy (Figure 3) will follow in section 2.3. Prior to this, more detail about the Taxonomy will be provided.
The details are described in the document “ESARIS Security Taxonomy…”, also released as a Zero Outage Industry Standard. The Taxonomy comprises 31 areas where security standards and measures need to be defined and implemented. The following list gives some of the reasons why this Taxonomy has been introduced.
- Complexity: IT security is a multifaceted and sometimes intricate subject. That’s why a single schematic is required that helps to manage this complexity and to digest the large number of security aspects and standards it classifies and organizes.
- Specialization: Not everybody, not every team, not every company needs to observe all possible security aspects. The Taxonomy helps to identify the content which is relevant in a given context. Only when it comes to customer-facing (end-user) services does the whole picture need to be observed, provided that the security standards for all areas are defined in a way that takes interdependencies into account.
- Definition of scope and security target: The selection of relevant areas, topics and measures from the Taxonomy helps to specify the security requirements or features of the IT service, product or component under consideration. The selection method is called Provider Scope of Control and is the basis for the next topic.
- Supply chain management and contracting: User organizations must rely on IT service providers. IT service providers must rely on their partners and suppliers, and even those have their suppliers and vendors. Contracts are agreements which should describe all essential deliverables including security. The Taxonomy and the specification concept behind it serve as the basis for such contracting.
- Speed and flexibility: IT services are produced on a large scale to meet market expectations on cost and quality. Cloud computing is a synonym for this. An almost comprehensive security specification of this environment would fill thousands of pages. A library is required comprising a large number of small documents. This approach scales but requires a Security Taxonomy. Monolithic documents cannot deliver the required information.
- Language and integration: Many frameworks are easy to understand either for IT experts or for IT security specialists. Secure IT service provisioning requires the cooperation of the two. That’s why the Taxonomy uses IT slang. – Additionally, the Taxonomy fully integrates IT service management and IT security management for the first time. In this way, it paves the way to “secured by definition”.
- Technical progress: Let’s consider this as the last example. One network security standard, for example, is useless. An IT service provider usually deals with about four types of networks which are totally different with respect to technology and security. Therefore, the Taxonomy splits them – an approach which is not common to many standards.
A more comprehensive rationale and more details about the structure can be found in the literature. The Taxonomy appears clearly structured and understandable for any IT-aware person. This is why IT-related terms are preferably used.
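The selection idea behind the Provider Scope of Control can be illustrated with a minimal sketch. It is not the method defined by the standard: the `Area` type, the `scope_of_control` function and the two technology area names are hypothetical, and only a handful of the 31 areas are listed (the activity area names are those mentioned elsewhere in this text).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    name: str
    half: str  # "activity" (upper half) or "technology" (lower half)


# Illustrative excerpt only; the real Taxonomy comprises 31 areas.
TAXONOMY = [
    Area("Incident Management", "activity"),
    Area("Change Management", "activity"),
    Area("Patch Management", "activity"),
    Area("Asset and Configuration Management", "activity"),
    Area("Example network area", "technology"),      # hypothetical name
    Area("Example data center area", "technology"),  # hypothetical name
]


def scope_of_control(taxonomy, selected_names):
    """Return the areas selected for a service, rejecting unknown names."""
    known = {area.name: area for area in taxonomy}
    unknown = sorted(set(selected_names) - set(known))
    if unknown:
        raise ValueError(f"not in taxonomy: {unknown}")
    return [known[name] for name in selected_names]


# A supplier of a network component selects only what is relevant to it.
scope = scope_of_control(TAXONOMY, ["Patch Management", "Example network area"])
out_of_scope = [area.name for area in TAXONOMY if area not in scope]
```

Rejecting names that are not in the Taxonomy keeps the selection tied to the shared classification, which is what makes it usable as a basis for contracting.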
Now the Capability View of the Zero Outage Map (Figure 2) and the ESARIS Security Taxonomy (Figure 3) are compared.
Both models comprise a life-cycle model, though this is obvious for the Zero Outage Map only. The idea that security must be considered right from the start is not new. Hence, the ESARIS Security Taxonomy organizes and specifies security measures relating to all phases through to Operations. The Taxonomy integrates IT security management on the one hand with IT service management on the other. ISO/IEC 20000 and ITIL organize IT service management into several processes. The most widely known are Incident Management, Change Management, Problem Management, Release Management and Asset Management. The ESARIS Security Taxonomy takes such processes as the basis for its mission of “secured by definition”, which is more than “secure by design” since the whole life-cycle is covered. In particular, Operations (“Run”) is explicitly addressed.
Now we take a closer look at each of the four phases Plan, Build, Deliver and Run and compare both models by means of Figure 4.
In the Zero Outage Map the Plan phase covers the development of a service strategy, the enterprise architecture, the management of market demands and the elaboration of a conceptual model of desired IT services in the form of a portfolio. With respect to security, these activities find their expression in the Hierarchy of Security Standards (refinement pyramid) shown on the left-hand side of Figure 4 (outside the ESARIS Security Taxonomy). The development of security standards starts with elaborating overarching guiding principles and policies. In the first step, they are refined with the definition of the ESARIS Security Taxonomy (refer to the light blue rectangle surrounding the Taxonomy). The Taxonomy is introduced in the Orchestration Layer of the pyramid of stepwise refinement and used as the means for conceptual planning. It defines 31 areas, of which about one half relates to activities in the life-cycle and one half to technology areas. Their introduction (or the decision to disregard some of them) is part of the planning phase. This becomes apparent when considering the enterprise architecture development in the planning phase. The design of enterprise business processes (in the upper half of the Taxonomy) is a strategic issue. The portfolio topic is another case. The corporation decides which technology areas (lower half of the Taxonomy) are necessary and which are not required and can be discarded since the services portfolio does not need them. Note that the Taxonomy covers everything. Corporations select those areas which are relevant for the business they are in. In addition, the ESARIS Security Taxonomy comprises one area (uppermost on the left) which contains results relating to IT service security in a concrete way. This area is about culture and commitment, management principles, treatment of industry practices and standards on security, transparency and certification by third parties and similar topics.
The Build phase is much easier to compare. It covers design, development, testing and Release Management in the Zero Outage Map. Basically, there are three areas in the ESARIS Security Taxonomy about this. The Taxonomy distinguishes between products and services from third parties (partners, vendors) and products and services which are developed by the corporation itself. Both are treated differently with respect to security. The first case and area concentrates on the selection of suppliers, a clear definition of security requirements for the product or service, their consideration in contracts, the verification of whether they are met by the delivered product or service, and similar topics. The second case and area defines a systematic process of developing software and IT systems. For an IT system, it covers initiation, development, implementation, transition to operations and planning of retirement (end-of-life). A final IT service usually comprises both types of elements, some being purchased and others (including the final IT service itself) developed by the corporation itself. Such IT services are finally tested and completed as a new Release, which is covered by the third area in the Taxonomy.
In the Zero Outage Map the Deliver phase starts with the publishing of the IT service in the service catalogue. Ensuring that all relevant security measures are implemented is an overarching aspect “above the Taxonomy”, which contains all security measures. Hence, ESARIS defines the so-called ESARIS Attainment Model, which explicitly requires the integration into the service catalogue, including the specification of security characteristics in the service specification elaborated to inform the consuming party (customer). But going back to the ESARIS Security Taxonomy, there is one area on the upper left which introduces the interface between the IT service provider and the user organization. This interface becomes important here because the customer enters the stage. (ITIL and ISO/IEC 20000 ignore this.) After contract signing, the IT service is actually instantiated for the customer. Then, the Zero Outage Map mentions the internal and external ordering process and the actual instantiation and activation of components and the IT service. With respect to security, two areas in the Taxonomy are primarily important (refer to the upper right in the Taxonomy): the maintenance of an inventory (Asset and Configuration Management) and the practical disciplines of hardening, configuration, provisioning and preparation for operations. Note that the ESARIS Security Taxonomy is not a pure life-cycle model: half of it is about technology and technical security measures. That’s why the lower part of the Taxonomy in Figure 4 also has a grey background, since the Deliver phase is the one in which these components really come into play.
In the Capability View of the Zero Outage Map (Figure 2), the Run phase, also known as Operations, comprises monitoring and predictive analysis, service assurance or maintenance, systematization and automation, and configuration and change management. The ESARIS Security Taxonomy follows very similar thinking since it focuses on transparency and analysis. The first area, which does not exist in ITIL and ISO/IEC 20000, is about continuous identification and assessment of threats and vulnerabilities and therefore deals with predicting and identifying the causes of security breaches. The next area deals with the observation and analysis of what is actually happening in the IT environment. Log data, events, alerts etc. are collected and evaluated and the results are reported to stakeholders. Then Incident, Change and Problem Management are adopted from the IT service management principles as areas in the Taxonomy since they are also essential with respect to security and require continuation and advancement. The Taxonomy also considers Patch Management a major issue requiring specific consideration, though this is not the case in ITIL and ISO/IEC 20000. The last area to mention is Business or Service Continuity Management, which is marked as belonging to the Run phase: most of its activities are preparatory, but the real effect comes during Run.
Important: The last example has already shown that assigning areas of the Taxonomy to the phases Plan, Build, Deliver and Run is not always easy and not always one to one. The ESARIS Security Taxonomy is not structured along a life-cycle (though it comprises all ingredients). Some of its areas include aspects from more than one life-cycle phase. This was necessary since the Taxonomy’s primary classification principles are the support of specialization and division of labor and the consideration of the structure of today’s IT production and industry. It is the nature of architecture to look at a topic from different perspectives, thus creating different views. The above comparison has demonstrated, using Figure 4, that the Capability View of the Zero Outage Map (Figure 2) and the ESARIS Security Taxonomy (Figure 3) do not contradict each other. They follow a common philosophy but provide different views to serve unique purposes.
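The point that the mapping between Taxonomy areas and phases is not one to one can be made concrete with a small sketch. The area names are those mentioned in the text above; the dictionary layout and function names are hypothetical. Listing “Plan” as the preparatory phase of Business Continuity Management is an illustrative assumption, since the text only says that most of its activities are preparatory.

```python
PHASES = ("Plan", "Build", "Deliver", "Run")

# Each area maps to a set of phases rather than to exactly one phase.
AREA_PHASES = {
    "Release Management": {"Build"},
    "Asset and Configuration Management": {"Deliver"},
    "Incident Management": {"Run"},
    "Patch Management": {"Run"},
    # Marked as Run, although most activities are preparatory;
    # "Plan" is an assumption made for this illustration.
    "Business Continuity Management": {"Plan", "Run"},
}


def areas_in_phase(phase):
    """Areas that contribute to the given life-cycle phase."""
    return sorted(name for name, phases in AREA_PHASES.items() if phase in phases)


def multi_phase_areas():
    """Areas that cannot be assigned to a single phase."""
    return sorted(name for name, phases in AREA_PHASES.items() if len(phases) > 1)
```

Modelling the relation as a set per area, instead of a single phase per area, lets both views coexist: the phase view is derived from the Taxonomy rather than imposed on it.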
Technology areas: The lower half of the Taxonomy is based on simple principles of IT architecture. It uses a model formerly called the “client-server model”: The users’ equipment is shown on the left, the IT components residing in data centers are shown on the right-hand side, and the network elements are in between. The more complex data center infrastructure is organized into elements of a primary IT stack and a second, supporting one. Boundaries (especially the interface of the data center to the outer networks) are important in terms of security and therefore defined as extra areas. Note that the lower half contains four areas for four different types of networks because their nature and protection requirements are quite different.
The separation of technical standards (lower half) on the one hand and IT service management standards on the other (upper half) is a major requirement for using the Taxonomy in the context of an industrialized IT production. All IT services and technical components shall be developed, implemented and managed according to the same unique processes and practices. Moreover, IT services comprise several service models (such as IaaS, PaaS, and SaaS) and differ with respect to the inclusion of management activities (managed, partly managed, and unmanaged). They are also provided in different deployment models (such as classic, private, virtual private, hybrid, and public). The decomposition into different technical and procedural areas helps to understand and specify the differences between the IT services and models just mentioned as they are delivered by the IT service provider to the user organization. The decisions are taken in the Deliver phase (grey in Figure 4). In the Run phase (amber in Figure 4) the IT stack is always complete, even though the user organization provides the application, for example in the case of the IaaS service model. Zero Outage security requires completing the upper half of the Taxonomy too. No technical part can be left “unmanaged”. The Taxonomy helps to assign the activities (upper half) to one party involved in the business. This assignment to the IT service provider, the user organization or to both defines the division of labor necessary to ensure that security management is comprehensive.
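The rule that no technical part may be left “unmanaged” amounts to a completeness check over the combination of technical parts and activities. The following is a minimal sketch of that check under stated assumptions: the names `find_gaps`, `parts` and `assignment`, the two technical parts and the example split of responsibilities are hypothetical and not taken from the standard.

```python
from itertools import product

# The three possible assignments named in the text.
PARTIES = {"provider", "user organization", "both"}


def find_gaps(technical_parts, activities, assignment):
    """List (part, activity) pairs that lack a valid responsible party."""
    return [
        pair
        for pair in product(technical_parts, activities)
        if assignment.get(pair) not in PARTIES
    ]


# Hypothetical IaaS-style split: the user organization brings the application.
parts = ["infrastructure", "application"]
activities = ["Patch Management", "Incident Management"]
assignment = {
    ("infrastructure", "Patch Management"): "provider",
    ("infrastructure", "Incident Management"): "provider",
    ("application", "Incident Management"): "both",
    # ("application", "Patch Management") is deliberately left unassigned.
}

gaps = find_gaps(parts, activities, assignment)
```

In this example `gaps` contains the single pair `("application", "Patch Management")`, which is exactly the kind of unassigned responsibility that the division of labor has to close.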
Today’s IT systems are complex and IT production on a large scale is a complex undertaking as well. Technologies and tasks are distributed amongst many parties in the supplier network, on a global basis. At the same time, our society is more dependent than ever on the reliable, continuous provisioning of IT services. IT security is one important element in making IT services reliable. Implementing security measures in technical platforms and processes and enabling the people running them to act suitably reduces risks for the business and our global community. A down-to-earth analysis, however, reveals that volumes of IT security standards and best practice were not sufficient to bring the susceptibility to attacks, abuse and misbehavior close to zero. Whereas other industries like aviation succeeded in solving safety issues, to this day the IT industry has failed to really solve the IT security problem which starts to affect safety as well. The security working group of the Zero Outage Industry Standard association takes this seriously. It did not propose a rigorous testing and certification schema as it exists in today’s aviation industry, but has started developing methodologies to help organizations tackle the real issue.
In principle we know what it would take to prevent damage as a result of security breaches. We know what it would take to detect suspicious activities and react appropriately, and yet we still see breaches and security issues occurring. It is time to look behind the curtain and consider what we may have missed. The security work stream of the Zero Outage Industry Standard has identified, and continues to identify, areas of concern.
It is our mission to propose solutions for zero outage of security and also to prevent outages that result from security issues.
A References and Applicable Documents
ISO/IEC 27001 – Information technology – Security techniques – Information security management systems – Requirements
ISO/IEC 20000 – Information technology – Service management – Part 1: Service management system requirements, Part 2: Guidance on the application of service management systems
ESARIS Security Taxonomy – Synopsis, Scope and Content; Zero Outage Industry Standard, Release 1 about Security, February 2017, zero-outage.com/security
Managing security in the supplier network – Third Party Integration Model; Zero Outage Industry Standard, Release 2 about Security, August 2017, zero-outage.com/security
Eberhard von Faber and Wolfgang Behnsen: Secure ICT Service Provisioning for Cloud, Mobile and Beyond, ESARIS: The Answer to the Demands of Industrialized IT Production Balancing Between Buyers and Providers, 2017, ISBN 978-3-658-16481-2
Information Security Forum (ISF): Security Architecture, Navigating Complexity; March 2016