Monday, 9 Feb 2026
Subscribe
logo
  • Global
  • AI
  • Cloud Computing
  • Edge Computing
  • Security
  • Investment
  • Sustainability
  • More
    • Colocation
    • Quantum Computing
    • Regulation & Policy
    • Infrastructure
    • Power & Cooling
    • Design
    • Innovations
    • Blog
Font ResizerAa
Data Center NewsData Center News
Search
  • Global
  • AI
  • Cloud Computing
  • Edge Computing
  • Security
  • Investment
  • Sustainability
  • More
    • Colocation
    • Quantum Computing
    • Regulation & Policy
    • Infrastructure
    • Power & Cooling
    • Design
    • Innovations
    • Blog
Have an existing account? Sign In
Follow US
© 2022 Foxiz News Network. Ruby Design Company. All Rights Reserved.
Data Center News > Blog > Security > A Guide to Cloud Resilience: Maximize Security, Minimize Downtime | DCN
Security

A Guide to Cloud Resilience: Maximize Security, Minimize Downtime | DCN

Last updated: January 24, 2024 3:36 am
Published January 24, 2024
Share
cloud with binary code and padlock
SHARE

It comes as no surprise that cloud resilience is a top IT buzzword of the 2020s. Ensuring resilience against cyber-attacks and ransomware extortion, as well as the ability to recover from IT disruptions quickly, are critical imperatives for organizations today. Without a resilient IT and application infrastructure, operational business processes are susceptible to breakdowns.

All the big cloud providers offer services and features for resilience. However, no CIO or IT professional should assume that shifting all workloads to the cloud guarantees complete resilience. The clouds offer building blocks, not ready-to-play-with fairytale castles. Instead, security architects and business continuity management experts must combine features and services cleverly.

Related: How to Combat Runaway Cloud Costs and ‘Cloud-flation’ in 2024

Figure 1 provides guiding roadmap that highlights four central scenarios for cloud resilience:

HallerCloud resilience scenarios

Figure 1 – Cloud resilience scenarios

  • Solution-internal resilience (1) looks at the challenges of applications or databases crashing without the impact of external events or issues in the underlying infrastructure or any impact from other components.
  • Infrastructure resilience (2) addresses problems in the underlying hardware or technology layers and the network.
  • Crash cascade resilience (3) aims to suppress domino effects, i.e., that the crash of one application impacts others.
  • Cyber-attack resilience (4) for dealing with external attackers who break into a data center cloud tenant.

Scenario 1: Solution-Internal Resilience

Related: How ISO, CIS, MITRE, and CSA Impact Your Cloud Security Architecture

The main risks solution-internal resilience must cover are coding and configuration errors, unexpected data constellations, and peaking resource requirements.

Resilience regarding workload peaks is much easier to achieve in the cloud. First, platform-as-a-service (PaaS) packages such as Cosmos DB have autoscaling features. Second, in the infrastructure-as-a-service (IaaS) cloud world, load balancers combined with groups of VMs (e.g. Azure Virtual Machine Scale Sets or Amazon EC2 Auto Scaling) is an e easy-to-implement solution. This approach guarantees that there are always sufficient VMs by scaling up and down depending on demand and replacing crashed VMs with new ones.

With such powerful preventative features, the classic corrective pattern – increasing or replacing the hardware and resources, restoring the backup, and restarting the application – moves to the background.

See also  The DOJ indicts Iranians for alleged Trump campaign ‘hack-and-leak’ scheme

The main preventive measures to increase resilience for coding, configuration, or data constellation issues are more testing and better software design. If bugs slip into production, causing a crash, fixing the bug and redeploying the code is the university textbook corrective measure. While a necessity for repeated crashes, restarting the application – “Have you tried turning it off and on again?” – is an immediate tactical measure to bring the application back online. Scale Sets and similar services automate these self-healing restarts, though application teams should investigate frequent crashes. Finally, as always, restoring a backup is the last option, be it the configuration, the data, or the application code.

HallerCloud resilience strategies: Prevent, contain and recover

Figure 2 – Cloud resilience strategies of prevent, contain, and recover

Scenario 2: Infrastructure Resilience

Failures in the hardware or network layer sound like something from the 1980s but are still an issue today. In the IaaS world, the application teams must handle VM and disk failures. Manual restart is the default (recovery) option. However, the already mentioned Scale Sets (and similar services) are convenient preventive measures in the cloud to minimize the likelihood of outages.

The approach differs for PaaS services such as storage accounts, Amazon S3 buckets, DBaaS, or Lambda Functions. Many offer various redundancy options for customers to choose from. Ideally, an organization’s cloud platform team defines and enforces minimal requirements for production environments. Then, all operational responsibilities are with the cloud provider.

The network layer has more facets. Customers decide how to set up the connectivity between clouds (e.g., their AWS and Google Cloud tenants) and between on-prem data centers and the cloud. Does an organization connect with GCP via the internet or the more reliable GCP Cloud Interconnect service? And if with Cloud Interconnect, does the organization rely on one network carrier, or do they partner with two or more? The customer decides. They also set up their routings and DNS services. However, they rely entirely on the cloud provider regarding the lower layers of the network backbone and connectivity within the data centers.

Scenario 3: Crash Cascades Resilience

Crash cascade resilience addresses the necessity that a crash of one application should not impact other applications, thereby causing domino-style cascading application crashes. For example, a bank should ensure that issues in the core banking system do not affect the ATM solution, which approves (or declines) money withdrawals from customers around the globe in real-time, 24/7. However, architects and managers must understand that there are clear limitations.

See also  The AI Advantage of Moving Away from Data Center NEMs | DCN

Resilience patterns can buy some time in this context – maybe five minutes, five hours, or five days. The bet is that the application comes back online before there is any impact on others. As with the money withdrawal example, such patterns can only be temporary solutions. No ATM application can operate for weeks without updates of the customer account balances and credit scoring changes.

One implementation pattern is straightforward: asynchronous integration patterns for application interactions, i.e., batch processes, messaging queues, and pub-sub. In contrast, (Rest-)API calls are simply evil. They cause applications to fail even if the counterparty system is down just for a second (or applications must implement complex failure handling logic). There is just one crucial footnote for asynchronous integration patterns. They rely typically on (messaging) middleware. The availability of this middleware is vital for the overall application landscape.

HallerCascading failures and the ATM example

Figure 3 – Cascading failures and the ATM example

In the end, the cloud is not a game changer for this resilience scenario, though the clouds provide ready-to-use middleware and ease restrictions on unwanted, direct inter-application connectivity, which forces applications to use middleware gateways. Furthermore, resilience against crash cascades is application-specific and even only partially an IT topic and more a business design topic. Does the business allow the ATM solutions to approve cash withdrawals based on yesterday’s data if the core banking system is down? Are limited withdrawals even possible if an ATM cannot reach the ATM solution? Only the business, in collaboration with IT, can define such business logic, which can contribute massively to the overall stability of the application ecosystem.

Scenario 4: Cyber-Attack Resilience

Withstanding cyber-attacks is the fourth and last scenario here. Cybersecurity specialists and CISOs have worked on this problem for decades. Thus, many organizations already have mature tools and processes in place.

See also  A new iOS 18 security feature makes it harder for police to unlock iPhones

Preventing and detecting cyber-attacks involves system hardening, penetration testing, access control, malware protection, and intrusion detection systems. The clouds have various features customers can activate quickly, speeding up the implementation of security controls compared to the old-fashioned on-prem world.

For containment, two complementary approaches exist: zone separation and E endpoint detection and response (EDR). EDR tools isolate and quarantine single, infected laptops, servers, and VMs. In contrast, separating network zones is a fire-division wall approach aiming to prevent lateral movement by shutting down connectivity. So, if a company’s network in Australia is compromised, they cut the connectivity to the Singaporean and Swiss network zones. Then, engineers clean up the servers in Australia before reestablishing connectivity with Singapore and Switzerland. This is a solid approach, but only if applications and business are not too interweaved.

After the containment comes recovery, i.e., restoring a pre-attack state from backups or redeploying applications with CI/CD pipelines. However, companies must be aware that attackers know about backups and try to delete them. So, immutable backups are a necessity, i.e., backups nobody can delete – not even admins. To further complicate matters, while containment and recovery tools are ‘mature’, coverage for non-VM workloads – containers and cloud-native services such as AWS Lambda or AWS S3 Buckets – can be limited.

Conclusion

Our exploration of the four critical scenarios – solution-internal resilience, infrastructure resilience, crash cascades resilience, and cyberattack resilience – reveals the multifaceted way to implement truly resilient IT and application landscapes.

While the public cloud brings relief when aiming for redundancy and quick-to-activate security tools, preventing domino-style cascading application crashes remains with individual application architectures. Their application design and business processes decide whether temporary decoupling from other applications and shielding them from external crashes is possible – a nightmare for managers hoping for quick solutions, a dream for ambitious architects loving to work on real challenges.

Source link

Contents
Scenario 1: Solution-Internal ResilienceScenario 2: Infrastructure ResilienceScenario 3: Crash Cascades ResilienceScenario 4: Cyber-Attack ResilienceConclusion
TAGGED: cloud, DCN, Downtime, Guide, Maximize, Minimize, Resilience, security
Share This Article
Twitter Email Copy Link Print
Previous Article Protesters outside the 2022 AWS Summit in New York City DCK’s editor in chief predicts the biggest headaches for 2024 | DCN
Next Article A New Year’s resolution for tech companies: knock it off with the CAPTCHAs A New Year’s resolution for tech companies: knock it off with the CAPTCHAs
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Your Trusted Source for Accurate and Timely Updates!

Our commitment to accuracy, impartiality, and delivering breaking news as it happens has earned us the trust of a vast audience. Stay ahead with real-time updates on the latest events, trends.
FacebookLike
TwitterFollow
InstagramFollow
YoutubeSubscribe
LinkedInFollow
MediumFollow
- Advertisement -
Ad image

Popular Posts

DeepSeek ban? China data transfer boosts security concerns

US lawmakers are pushing for a DeepSeek ban after safety researchers discovered the app transferring…

February 8, 2025

Data Center, AI Load Growth Could Threaten Grid Reliability – Report

Electrical energy demand from information facilities may greater than double over the subsequent decade, threatening…

June 21, 2024

Amex CISO fights threats at machine speed With AI

Be a part of our every day and weekly newsletters for the newest updates and…

April 15, 2025

Amazon pours $150B into data centers to handle expected AI boom

Amazon is reportedly planning to spend a whopping $150 billion throughout the subsequent 15 years…

March 28, 2024

Transforming data centre branding: Latos leads with vision and innovation

In a daring transfer, two modern North East corporations are collaborating to make their mark…

July 25, 2025

You Might Also Like

Alphabet boosts cloud investment to meet rising AI demand
Cloud Computing

Alphabet boosts cloud investment to meet rising AI demand

By saad
Snowflake and OpenAI push AI into everyday cloud data work
Cloud Computing

Snowflake and OpenAI push AI into everyday cloud data work

By saad
Image of digital globe, with connected data points
Global Market

AI, security tailwinds signal promising 2026 for Cisco

By saad
Nationwide is deepening its use of cloud services with AWS
Cloud Computing

Nationwide is deepening its use of cloud services with AWS

By saad
Data Center News
Facebook Twitter Youtube Instagram Linkedin

About US

Data Center News: Stay informed on the pulse of data centers. Latest updates, tech trends, and industry insights—all in one place. Elevate your data infrastructure knowledge.

Top Categories
  • Global Market
  • Infrastructure
  • Innovations
  • Investments
Usefull Links
  • Home
  • Contact
  • Privacy Policy
  • Terms & Conditions

© 2024 – datacenternews.tech – All rights reserved

Welcome Back!

Sign in to your account

Lost your password?
We use cookies to ensure that we give you the best experience on our website. If you continue to use this site we will assume that you are happy with it.
You can revoke your consent any time using the Revoke consent button.