Data Center News

A Guide to Cloud Resilience: Maximize Security, Minimize Downtime | DCN

Last updated: January 24, 2024 3:36 am
Published January 24, 2024

It comes as no surprise that cloud resilience is a top IT buzzword of the 2020s. Ensuring resilience against cyber-attacks and ransomware extortion, as well as the ability to recover from IT disruptions quickly, are critical imperatives for organizations today. Without a resilient IT and application infrastructure, operational business processes are susceptible to breakdowns.

All the big cloud providers offer services and features for resilience. However, no CIO or IT professional should assume that shifting all workloads to the cloud guarantees complete resilience. The clouds offer building blocks, not ready-to-play-with fairytale castles. Instead, security architects and business continuity management experts must combine features and services cleverly.

Figure 1 provides a guiding roadmap that highlights four central scenarios for cloud resilience:

Figure 1 – Cloud resilience scenarios (image credit: Haller)

  • Solution-internal resilience (1) covers applications or databases that crash on their own, without external events, issues in the underlying infrastructure, or impact from other components.
  • Infrastructure resilience (2) addresses problems in the underlying hardware or technology layers and the network.
  • Crash cascade resilience (3) aims to suppress domino effects, i.e., the crash of one application impacting others.
  • Cyber-attack resilience (4) deals with external attackers who break into a data center or cloud tenant.

Scenario 1: Solution-Internal Resilience

The main risks solution-internal resilience must cover are coding and configuration errors, unexpected data constellations, and peaking resource requirements.

Resilience regarding workload peaks is much easier to achieve in the cloud. First, platform-as-a-service (PaaS) offerings such as Cosmos DB have autoscaling features. Second, in the infrastructure-as-a-service (IaaS) world, load balancers combined with groups of VMs (e.g., Azure Virtual Machine Scale Sets or Amazon EC2 Auto Scaling) are an easy-to-implement solution. This approach keeps enough VMs running by adding and removing VMs depending on demand and by replacing crashed VMs with new ones.
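
The scale-and-replace behavior of such VM groups can be illustrated with a small simulation. This is a minimal sketch, not the logic of Azure Virtual Machine Scale Sets or Amazon EC2 Auto Scaling: the `Instance` class, the `reconcile` function, and the assumed capacity of 100 requests per second per VM are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from itertools import count

_ids = count(1)  # simple ID generator for simulated VMs


@dataclass
class Instance:
    """Hypothetical stand-in for a VM in a scaling group."""
    name: str
    healthy: bool = True


def new_instance() -> Instance:
    return Instance(name=f"vm-{next(_ids)}")


def desired_count(load_rps: float, rps_per_vm: float, minimum: int, maximum: int) -> int:
    """Number of VMs needed for the load, clamped to the group limits."""
    needed = math.ceil(load_rps / rps_per_vm)
    return max(minimum, min(maximum, needed))


def reconcile(group: list[Instance], load_rps: float, rps_per_vm: float = 100.0,
              minimum: int = 2, maximum: int = 10) -> list[Instance]:
    """Drop crashed VMs, then add or remove VMs to match the desired count."""
    survivors = [vm for vm in group if vm.healthy]
    target = desired_count(load_rps, rps_per_vm, minimum, maximum)
    while len(survivors) < target:
        survivors.append(new_instance())
    return survivors[:target]  # remove surplus VMs if there are too many


group = [new_instance() for _ in range(2)]
group = reconcile(group, load_rps=450)  # peak load: the group grows
group[0].healthy = False                # one VM crashes
group = reconcile(group, load_rps=450)  # the crashed VM is replaced
group = reconcile(group, load_rps=50)   # quiet period: back to the minimum
```

The point of the sketch is that one reconciliation step handles both concerns from the paragraph above: sizing the group for demand and replacing crashed VMs.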

With such powerful preventative features, the classic corrective pattern – increasing or replacing the hardware and resources, restoring the backup, and restarting the application – moves to the background.

The main preventive measures to increase resilience against coding, configuration, or data constellation issues are more testing and better software design. If a bug slips into production and causes a crash, fixing the bug and redeploying the code is the textbook corrective measure. While such a fix is a necessity for repeated crashes, restarting the application (“Have you tried turning it off and on again?”) is the immediate tactical measure to bring the application back online. Scale Sets and similar services automate these self-healing restarts, though application teams should investigate frequent crashes. Finally, as always, restoring a backup is the last option, be it of the configuration, the data, or the application code.
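
The restart-first, investigate-later idea can be sketched in a few lines. The following is an illustrative supervisor loop under stated assumptions, not how any cloud service implements self-healing: `run_with_restarts`, `flaky_app`, and the alert threshold of two crashes are hypothetical names and values.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_restarts(task: Callable[[], T], max_restarts: int = 3) -> tuple[T, int]:
    """Run task, restarting it after a crash; return its result and the crash count."""
    crashes = 0
    while True:
        try:
            return task(), crashes
        except Exception as exc:
            crashes += 1
            if crashes > max_restarts:
                # Restarting no longer helps: escalate so the bug gets fixed.
                raise RuntimeError(f"gave up after {crashes} crashes") from exc


# Hypothetical flaky application: crashes twice, then runs normally.
attempts = {"n": 0}


def flaky_app() -> str:
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise ValueError("unexpected data constellation")
    return "serving"


result, crashes = run_with_restarts(flaky_app)
needs_investigation = crashes >= 2  # assumed threshold for alerting the application team
```

Returning the crash count alongside the result keeps the tactical restart and the follow-up investigation separate: the application comes back online, and the team still learns that it crashed.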

Figure 2 – Cloud resilience strategies of prevent, contain, and recover (image credit: Haller)

Scenario 2: Infrastructure Resilience

Failures in the hardware or network layer sound like something from the 1980s but are still an issue today. In the IaaS world, the application teams must handle VM and disk failures. Manual restart is the default (recovery) option. However, the already mentioned Scale Sets (and similar services) are convenient preventive measures in the cloud to minimize the likelihood of outages.

The approach differs for PaaS services such as storage accounts, Amazon S3 buckets, DBaaS, or Lambda Functions. Many offer various redundancy options for customers to choose from. Ideally, an organization’s cloud platform team defines and enforces minimal requirements for production environments. Then, all operational responsibilities are with the cloud provider.

The network layer has more facets. Customers decide how to set up the connectivity between clouds (e.g., their AWS and Google Cloud tenants) and between on-prem data centers and the cloud. Does an organization connect with GCP via the internet or the more reliable GCP Cloud Interconnect service? And if with Cloud Interconnect, does the organization rely on one network carrier, or does it partner with two or more? The customer decides. Customers also set up their own routing and DNS services. However, they rely entirely on the cloud provider regarding the lower layers of the network backbone and connectivity within the data centers.
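
The one-carrier-versus-two question can be made concrete with a little arithmetic. The sketch below assumes that carrier failures are independent and uses an illustrative availability of 99.9% per link; both are assumptions for the example, not figures from any provider or carrier. In practice, carriers may share physical routes, so the real gain can be smaller.

```python
HOURS_PER_YEAR = 365 * 24


def combined_availability(link_availabilities: list[float]) -> float:
    """Availability of redundant links, assuming their failures are independent."""
    all_down = 1.0
    for availability in link_availabilities:
        all_down *= 1.0 - availability  # probability that this link is down too
    return 1.0 - all_down


def expected_downtime_hours(availability: float) -> float:
    """Expected hours per year without connectivity."""
    return (1.0 - availability) * HOURS_PER_YEAR


one_carrier = combined_availability([0.999])
two_carriers = combined_availability([0.999, 0.999])

print(f"one carrier:  {expected_downtime_hours(one_carrier):.2f} h/year")
print(f"two carriers: {expected_downtime_hours(two_carriers) * 3600:.0f} s/year")
```

Under these assumptions, a second carrier reduces the expected connectivity loss from several hours per year to well under a minute.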

Scenario 3: Crash Cascades Resilience

Crash cascade resilience addresses the necessity that a crash of one application should not impact other applications, thereby causing domino-style cascading application crashes. For example, a bank should ensure that issues in the core banking system do not affect the ATM solution, which approves (or declines) money withdrawals from customers around the globe in real-time, 24/7. However, architects and managers must understand that there are clear limitations.

Resilience patterns can buy some time in this context – maybe five minutes, five hours, or five days. The bet is that the application comes back online before there is any impact on others. As with the money withdrawal example, such patterns can only be temporary solutions. No ATM application can operate for weeks without updates of the customer account balances and credit scoring changes.

One implementation pattern is straightforward: asynchronous integration patterns for application interactions, i.e., batch processes, messaging queues, and pub-sub. In contrast, synchronous (REST) API calls are simply evil. They cause the calling application to fail even if the counterparty system is down for just a second (unless the application implements complex failure-handling logic). There is just one crucial footnote for asynchronous integration patterns: they typically rely on (messaging) middleware, so the availability of this middleware is vital for the overall application landscape.
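
The difference between the two integration styles can be shown with a toy model. This is a minimal sketch in which an in-memory `deque` stands in for real messaging middleware; `CoreBanking`, `send_sync`, `send_async`, and `drain` are hypothetical names, and a production setup would need durable queues, retries, and idempotent consumers.

```python
from collections import deque


class CoreBankingDown(Exception):
    """Raised when the downstream system is offline."""


class CoreBanking:
    """Hypothetical downstream system that can be offline."""

    def __init__(self) -> None:
        self.online = True
        self.booked: list[dict] = []

    def book(self, txn: dict) -> None:
        if not self.online:
            raise CoreBankingDown("core banking unavailable")
        self.booked.append(txn)


def send_sync(core: CoreBanking, txn: dict) -> None:
    """Synchronous call: the caller fails whenever the counterparty is down."""
    core.book(txn)


def send_async(queue: deque, txn: dict) -> None:
    """Asynchronous hand-off: the caller only needs the queue to be available."""
    queue.append(txn)


def drain(queue: deque, core: CoreBanking) -> int:
    """Deliver queued messages in order; stop at the first failure and keep the rest."""
    delivered = 0
    while queue:
        try:
            core.book(queue[0])
        except CoreBankingDown:
            break
        queue.popleft()
        delivered += 1
    return delivered


core = CoreBanking()
queue: deque = deque()  # in-memory stand-in for messaging middleware
core.online = False
send_async(queue, {"account": "A-1", "amount": -50})
send_async(queue, {"account": "A-2", "amount": -20})
drain(queue, core)      # nothing delivered, nothing lost
core.online = True
drain(queue, core)      # both messages arrive once the system is back
```

The sender never notices the outage, which is exactly the time-buying effect described above. The sketch also shows the footnote: if the queue itself were unavailable, every sender would fail.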

Figure 3 – Cascading failures and the ATM example (image credit: Haller)

In the end, the cloud is not a game changer for this resilience scenario, though the clouds provide ready-to-use middleware and make it easier to restrict unwanted, direct inter-application connectivity, which forces applications to use middleware gateways. Furthermore, resilience against crash cascades is application-specific and only partially an IT topic; it is more a business design topic. Does the business allow the ATM solution to approve cash withdrawals based on yesterday’s data if the core banking system is down? Are limited withdrawals even possible if an ATM cannot reach the ATM solution? Only the business, in collaboration with IT, can define such business logic, which can contribute massively to the overall stability of the application ecosystem.
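
Such a business decision eventually ends up as code. The following sketch shows one possible shape of a degraded-mode rule; the one-day maximum snapshot age, the offline limit of 200, and all names (`BalanceSnapshot`, `approve_withdrawal`) are hypothetical and would have to be defined by the business, not by IT alone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BalanceSnapshot:
    """Last known account balance, e.g. from the previous day's batch run."""
    balance: float
    as_of: datetime


# Assumed business rules, for illustration only.
MAX_SNAPSHOT_AGE = timedelta(days=1)
OFFLINE_WITHDRAWAL_LIMIT = 200.0


def approve_withdrawal(amount: float, live_balance: Optional[float],
                       snapshot: Optional[BalanceSnapshot], now: datetime) -> bool:
    """Approve against live data when possible, else against a recent snapshot with a cap."""
    if live_balance is not None:  # core banking system reachable
        return amount <= live_balance
    if snapshot is None or now - snapshot.as_of > MAX_SNAPSHOT_AGE:
        return False  # no usable data: decline
    return amount <= min(snapshot.balance, OFFLINE_WITHDRAWAL_LIMIT)


now = datetime(2024, 1, 24, 12, 0)
snapshot = BalanceSnapshot(balance=1500.0, as_of=now - timedelta(hours=14))
small = approve_withdrawal(100.0, None, snapshot, now)  # core banking down, small amount
large = approve_withdrawal(500.0, None, snapshot, now)  # core banking down, above the cap
```

Passing `live_balance=None` models the core banking system being down. The cap limits the loss the bank accepts when it approves withdrawals on stale data.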

Scenario 4: Cyber-Attack Resilience

Withstanding cyber-attacks is the fourth and last scenario here. Cybersecurity specialists and CISOs have worked on this problem for decades. Thus, many organizations already have mature tools and processes in place.

Preventing and detecting cyber-attacks involves system hardening, penetration testing, access control, malware protection, and intrusion detection systems. The clouds have various features customers can activate quickly, speeding up the implementation of security controls compared to the old-fashioned on-prem world.

For containment, two complementary approaches exist: zone separation and endpoint detection and response (EDR). EDR tools isolate and quarantine single, infected laptops, servers, and VMs. In contrast, separating network zones is a fire-division wall approach aiming to prevent lateral movement by shutting down connectivity. So, if a company’s network in Australia is compromised, they cut the connectivity to the Singaporean and Swiss network zones. Then, engineers clean up the servers in Australia before reestablishing connectivity with Singapore and Switzerland. This is a solid approach, but only if applications and business are not too interwoven.
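
The fire-division wall idea can be modeled as a small graph problem: zones are nodes, network links are edges, and containment removes every edge that touches the compromised zone. This is a conceptual sketch with hypothetical function names (`reachable`, `isolate`); real zone separation is enforced with firewalls, routing, and network policies rather than application code.

```python
from collections import deque


def reachable(links: set[frozenset[str]], start: str) -> set[str]:
    """Zones reachable from start over bidirectional links (breadth-first search)."""
    seen = {start}
    todo = deque([start])
    while todo:
        zone = todo.popleft()
        for link in links:
            if zone in link:
                (other,) = link - {zone}  # each link joins exactly two zones
                if other not in seen:
                    seen.add(other)
                    todo.append(other)
    return seen


def isolate(links: set[frozenset[str]], zone: str) -> set[frozenset[str]]:
    """Cut every link that touches the compromised zone."""
    return {link for link in links if zone not in link}


links = {
    frozenset({"australia", "singapore"}),
    frozenset({"australia", "switzerland"}),
    frozenset({"singapore", "switzerland"}),
}

before = reachable(links, "australia")    # attacker could move to every zone
contained = isolate(links, "australia")
after = reachable(contained, "australia")  # attacker is confined to one zone
```

After isolation, Singapore and Switzerland can still reach each other, which is why the approach works only when those zones can keep operating without the isolated one.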

After containment comes recovery, i.e., restoring a pre-attack state from backups or redeploying applications with CI/CD pipelines. However, companies must be aware that attackers know about backups and try to delete them. So, immutable backups are a necessity, i.e., backups nobody can delete, not even admins. To further complicate matters, while containment and recovery tools are ‘mature’, coverage for non-VM workloads (containers and cloud-native services such as AWS Lambda or Amazon S3 buckets) can be limited.
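
The defining property of an immutable backup, that no role can delete or overwrite it before a retention period ends, can be captured in a short model. This is an in-memory sketch with hypothetical names (`ImmutableBackupStore`, `RetentionLockError`) and an assumed 30-day retention period; it does not reproduce the API of any cloud provider's retention-lock feature.

```python
from datetime import datetime, timedelta


class RetentionLockError(Exception):
    """Raised when a locked backup would be deleted or overwritten."""


class ImmutableBackupStore:
    """In-memory model of write-once backups with a retention lock."""

    def __init__(self, retention: timedelta) -> None:
        self._retention = retention
        self._backups: dict[str, tuple[bytes, datetime]] = {}

    def write(self, name: str, data: bytes, now: datetime) -> None:
        if name in self._backups:
            raise RetentionLockError(f"{name} already exists and cannot be overwritten")
        self._backups[name] = (data, now + self._retention)

    def delete(self, name: str, now: datetime, is_admin: bool = False) -> None:
        _, locked_until = self._backups[name]
        if now < locked_until:
            # The admin flag is deliberately ignored: no role can bypass the lock.
            raise RetentionLockError(f"{name} is locked until {locked_until.isoformat()}")
        del self._backups[name]

    def read(self, name: str) -> bytes:
        return self._backups[name][0]


store = ImmutableBackupStore(retention=timedelta(days=30))
created = datetime(2024, 1, 1)
store.write("db-2024-01-01", b"snapshot-bytes", now=created)

try:
    # An attacker with admin rights tries to wipe the backup a day later.
    store.delete("db-2024-01-01", now=created + timedelta(days=1), is_admin=True)
except RetentionLockError:
    pass  # the backup survives and remains available for recovery
```

The `is_admin` parameter exists only to make the point explicit: the lock is enforced by time, not by privilege.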

Conclusion

Our exploration of the four critical scenarios (solution-internal resilience, infrastructure resilience, crash cascade resilience, and cyber-attack resilience) shows that implementing truly resilient IT and application landscapes is a multifaceted task.

While the public cloud brings relief when aiming for redundancy and quick-to-activate security tools, preventing domino-style cascading application crashes remains with individual application architectures. Their application design and business processes decide whether temporary decoupling from other applications and shielding them from external crashes is possible – a nightmare for managers hoping for quick solutions, a dream for ambitious architects loving to work on real challenges.
