By Josh Stella, co-founder and CTO of Fugue
“Cloud misconfiguration” may not seem like an adequate term for the leading cause of cloud data breaches. It connotes a small, innocent mistake that is easy to fix. However, the recent Capital One data breach teaches three lessons about the vulnerabilities that cloud misconfigurations create: attackers can exploit them quickly without being detected, it’s become very difficult for enterprise security teams to find them before the bad guys do, and the consequences of losing that race can be devastating.
Migrating IT systems from the data center to platforms like AWS and Microsoft Azure can improve collaboration and productivity among employees, even when they’re scattered across remote locations, and relieve IT teams of the dual financial and time management burdens of installing, maintaining and upgrading on-premises systems. Just as the cloud has revolutionized how people get work done every day, it has also transformed the responsibilities of the security, risk management, and DevOps teams. Cloud service providers like Amazon, Microsoft and Google clearly explain the shared responsibility model: they’re responsible for the security of the cloud, but the customer is responsible for security in the cloud, including the secure configuration of the cloud services they use. Ignoring this responsibility is a recipe for disaster.
New thinking, new strategies, new tools
It’s critical to understand that everything in the cloud (servers, databases, the network, security) is defined through software, specifically via Application Programming Interfaces (APIs) defined by the cloud providers. This provides tremendous flexibility, agility, and power, including the power to know the state of all infrastructure at any point in time. However, it also means there are great risks and potential vulnerabilities stemming from what are effectively software errors: misconfigurations of the resources that make up the cloud infrastructure.
The traditional approach of securing the network perimeter with antivirus, firewalls and other outward-facing solutions is not adequate in the cloud because there is no perimeter (if there ever was one). Instead of restricting inbound traffic, the focus must be on mitigating cloud infrastructure misconfiguration through the entire stack, whether due to human error, a lack of policy controls in CI/CD pipelines, or bad actors.
That’s easier said than done. Today’s hackers use automation to find and exploit these misconfiguration vulnerabilities before traditional manual remediation methods can fix them. In order to become more proactive and prevent these threats from doing any damage, organizations need to simulate real-world misconfigurations to identify security gaps before they are exploited.
Information on the breach that impacted Capital One (and likely dozens of other organizations), drawn from the FBI complaint and the alleged attacker’s social media posts, indicates that she discovered a misconfigured firewall in the Capital One Amazon Web Services (AWS) environment and used it to access more than 100 million Capital One customers’ accounts in one of the biggest data breaches ever.
It’s just the latest example of how the nature of the threat landscape has changed, in large part because the bad guys have grown so adept at using automation technologies to find and exploit vulnerabilities. The process takes mere minutes, making traditional manual remediation methods too slow to be effective.
Consider the amount of time it takes, once you’ve found a vulnerability in your cloud configuration, to create a ticket, get it assigned to an engineer and then have them fix it. Hours or even days could go by before the issue is fixed. We call this “Time to Remediation,” and your “Mean Time to Remediation” (MTTR) needs to be on the order of minutes.
That’s why your organization also needs to leverage security automation for the cloud. Yes, past issues caused by security bots and other security automation tools that inadvertently brought down production systems have bred an understandable aversion to them among application and IT teams. But we’ve reached a tipping point where the risks of potential harm are so great, and automation has advanced so far, that it is the only viable solution.
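To make the MTTR idea concrete, here is a minimal sketch of how it could be computed. The timestamps are invented for illustration, and the function name `mean_time_to_remediation` is our own rather than part of any product or standard.

```python
from datetime import datetime, timedelta

# Hypothetical (detected_at, remediated_at) pairs for three findings.
findings = [
    (datetime(2019, 8, 1, 9, 0), datetime(2019, 8, 1, 9, 4)),     # fixed by automation
    (datetime(2019, 8, 1, 10, 30), datetime(2019, 8, 1, 16, 30)), # same-day ticket
    (datetime(2019, 8, 2, 8, 0), datetime(2019, 8, 3, 8, 0)),     # next-day ticket
]

def mean_time_to_remediation(pairs):
    """Average the detected-to-remediated interval across findings."""
    if not pairs:
        raise ValueError("need at least one finding")
    total = sum((fixed - found for found, fixed in pairs), timedelta())
    return total / len(pairs)

mttr = mean_time_to_remediation(findings)
print(f"MTTR: {mttr.total_seconds() / 60:.0f} minutes")
```

In this made-up sample, the two ticket-driven fixes pull the average to roughly ten hours even though one finding was fixed in four minutes.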
As a best practice, look for cloud security tooling that provides true automated remediation “out of the box.” Otherwise, your engineers will have to write lots of tedious and error-prone code that, without the right application context, can cause destructive changes that lead to costly downtime events. Additionally, implement regular testing to determine whether security automation is working. Do not focus on whether compute resources reappear on deletion; rather, examine what happens if an IAM policy or Security Group definition is changed. The list doesn’t stop there. Other things you should test are S3 bucket configurations and VPC network configurations. Resilient security demands covering all vulnerabilities an attacker may try to exploit.
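The kind of testing described above might be sketched as follows. This is an illustration under stated assumptions, not a real tool: the environment is a plain in-memory dictionary, the mutation functions and `run_remediation_tests` are hypothetical names, and `partial_remediation` is a deliberately incomplete stand-in for the automation under test. A real test would change actual resources in a non-production account.

```python
import copy

# Hypothetical in-memory model of a known-good environment.
BASELINE = {
    "security_group": {"ingress": [{"port": 443, "cidr": "10.0.0.0/8"}]},
    "iam_policy": {"actions": ["s3:GetObject"], "resources": ["arn:aws:s3:::app-data/*"]},
    "s3_bucket": {"public_read": False},
}

# Each mutation simulates one misconfiguration a mistake or an attacker could introduce.
def open_ssh_to_world(env):
    env["security_group"]["ingress"].append({"port": 22, "cidr": "0.0.0.0/0"})

def widen_iam_policy(env):
    env["iam_policy"]["actions"] = ["*"]

def make_bucket_public(env):
    env["s3_bucket"]["public_read"] = True

MUTATIONS = [open_ssh_to_world, widen_iam_policy, make_bucket_public]

def run_remediation_tests(remediate):
    """Apply each mutation to a fresh copy and report the ones that were not reverted."""
    failures = []
    for mutate in MUTATIONS:
        env = copy.deepcopy(BASELINE)
        mutate(env)
        remediate(env)
        if env != BASELINE:
            failures.append(mutate.__name__)
    return failures

# Stand-in for the automation under test: it only knows about security groups.
def partial_remediation(env):
    env["security_group"] = copy.deepcopy(BASELINE["security_group"])

print(run_remediation_tests(partial_remediation))
```

Because the stand-in only restores the security group, the harness reports the IAM and S3 mutations as gaps, which is exactly the kind of blind spot this testing is meant to surface.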
Security’s “Shift Left”
Developers use the term “shift left” to describe moving a particular function to earlier phases of their processes to make identifying and fixing bugs and other errors easier and less time-consuming. Security teams should embrace shift left and work with DevOps to implement procedures for identifying and remediating cloud misconfigurations early in the software development life cycle when making a corrective change is faster and less expensive.
This is not only a procedural change but also a cultural one. Developers typically treat security and compliance considerations as afterthoughts, implemented as a gate during the test phase. Then they grow frustrated when security forces them to perform rework in design, development, and testing, and they blame the security team for delays in moving applications into production. Automating the shift left of compliance and security into the design and development phases can eliminate these delays and frustrations, and produce better systems.
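One way to picture shift left is a policy check that runs in the pipeline before anything is deployed. The sketch below is only an assumption about what such a gate could look like: the resource format, the specific rules, and the names `check_resource` and `policy_gate` are hypothetical, and real policies would come from your own compliance requirements.

```python
# Hypothetical declarative description of resources a pipeline is about to deploy.
planned_resources = [
    {"type": "security_group", "name": "web", "ingress": [{"port": 443, "cidr": "0.0.0.0/0"}]},
    {"type": "security_group", "name": "admin", "ingress": [{"port": 22, "cidr": "0.0.0.0/0"}]},
    {"type": "s3_bucket", "name": "logs", "public_read": False, "encrypted": False},
]

def check_resource(resource):
    """Return a list of human-readable policy violations for one resource."""
    violations = []
    if resource["type"] == "security_group":
        for rule in resource.get("ingress", []):
            # Illustrative rule: only HTTPS may be open to the whole internet.
            if rule["cidr"] == "0.0.0.0/0" and rule["port"] != 443:
                violations.append(f"{resource['name']}: port {rule['port']} open to the internet")
    elif resource["type"] == "s3_bucket":
        if resource.get("public_read"):
            violations.append(f"{resource['name']}: bucket is publicly readable")
        if not resource.get("encrypted"):
            violations.append(f"{resource['name']}: bucket is not encrypted")
    return violations

def policy_gate(resources):
    """Collect violations across all resources; an empty list means the build may proceed."""
    return [v for r in resources for v in check_resource(r)]

violations = policy_gate(planned_resources)
for v in violations:
    print("VIOLATION:", v)
# A CI job would fail the build at this point when violations is non-empty.
```

Run against the sample resources, the gate flags the admin security group and the unencrypted bucket while the developer is still working on the change, rather than in a late test phase.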
Shared security responsibility
Another important difference in the cloud is that security teams do not have direct access to all network traffic to monitor for intrusions. This is something cloud providers do as part of the shared responsibility model. Therefore, the security team’s chief responsibility becomes protecting the service configuration layer.
Cloud services talk to each other via APIs, and the newer ones use identity to configure access, as opposed to the older IP address space confirmation method. The network perimeter is defined via SDN and security group configurations. Unlike in the data center, changes to your basic security posture are made via API and happen frequently, for many reasons. IT’s goal is to establish a more resilient configuration of these services.
This requires a mechanism to revert damaging changes to your cloud configurations back to the healthy ones. The most effective option is to implement self-healing configuration, i.e., capturing a known-good baseline and leveraging an engine that knows how to revert all mutable changes. Automating the process relieves the security team of the burden of manually monitoring for and remedying any potentially damaging changes to the environment.
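The baseline-and-revert idea can be sketched in a few lines. This is a simplified illustration, not how any particular product works: configurations are modeled as plain dictionaries, and `capture_baseline`, `find_drift`, and `revert_drift` are hypothetical names. A real engine would read and write state through the cloud provider’s APIs.

```python
import copy

def capture_baseline(env):
    """Snapshot a known-good configuration."""
    return copy.deepcopy(env)

def find_drift(baseline, current):
    """Return {resource: (expected, actual)} for every resource that differs from the baseline."""
    names = set(baseline) | set(current)
    return {
        name: (baseline.get(name), current.get(name))
        for name in names
        if baseline.get(name) != current.get(name)
    }

def revert_drift(baseline, current):
    """Restore drifted resources and delete resources the baseline does not contain."""
    drift = find_drift(baseline, current)
    for name, (expected, _actual) in drift.items():
        if expected is None:
            del current[name]  # resource was added after the baseline was captured
        else:
            current[name] = copy.deepcopy(expected)
    return sorted(drift)

# Hypothetical environment with one security group and one bucket.
env = {
    "sg-web": {"ingress": [{"port": 443, "cidr": "0.0.0.0/0"}]},
    "bucket-logs": {"public_read": False},
}
baseline = capture_baseline(env)

# Simulated damaging changes: a bucket made public and an unexpected security group.
env["bucket-logs"]["public_read"] = True
env["sg-debug"] = {"ingress": [{"port": 22, "cidr": "0.0.0.0/0"}]}

reverted = revert_drift(baseline, env)
print(reverted)         # names of the resources that were reverted
print(env == baseline)  # whether the environment matches the baseline again
```

The snapshot is a deep copy so that later changes to the live configuration cannot silently alter the baseline it is compared against.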
Better security, fewer tradeoffs
The good news is that your cloud infrastructure can be more secure than your data center ever was. The data centers run by cloud service providers like Amazon, Microsoft and Google are likely more secure and more reliably operated than the data centers you are responsible for operating and securing. Additionally, security and compliance are fully programmable, which provides you with complete, real-time visibility into your cloud environments, down to every configuration detail. That was not possible in the on-premises data center with its enormous collection of “black boxes” that require manual configuration.
You no longer need to trade speed and agility for security and compliance. In the cloud, you can have both! Equipped with the right tools, developers can move faster and more securely than ever before.
About the Author
Josh Stella is co-founder and CTO of Fugue, the cloud infrastructure automation and security company. Fugue identifies security and compliance violations in cloud infrastructure and ensures they are never repeated. Previously, Josh was a Principal Solutions Architect at Amazon Web Services, where he supported customers in the area of national security. He has served as CTO for a technology startup and in numerous other IT leadership and technical roles over the past 25 years.