There is a widespread perception that the cloud is expensive.
But is the cloud really at fault? Or is it plain inefficiency?
Working with several large enterprises, I’ve seen everything from poor governance to bad design choices leading to huge cloud spending. I’ve also seen well-defined cloud strategies and ongoing optimization practices helping enterprises keep their cloud costs under control.
I’d like to share some common costly cloud practices that I've seen and offer practical solutions here. By the end of the article, I’ll also touch on how adopting a DevSecFinOps mindset—where development, security, and financial considerations are integrated—could help balance cost efficiency and operational effectiveness.
Please note that this is a highly opinionated take based on real-world experiences, and I welcome discussions and differing perspectives. The goal here isn’t to prescribe a one-size-fits-all solution but to shed light on how mindful decision-making can curb unnecessary cloud spend. Let's challenge the assumption that the cloud is the problem and look at the practices that may drive up expenses.
Uniform Security for All Applications vs. a Tailored Approach
One of the most costly practices I’ve seen is the tendency to apply the same level of security to every application, regardless of its sensitivity or risk. At the organizational level, there are often strict security frameworks that cover multiple layers of protection. These are set in stone and applied to every single application, whether it's a mission-critical system handling sensitive data or a simple internal tool. This one-size-fits-all approach inevitably drives up operational costs by forcing low-risk applications to adhere to premium, resource-intensive security measures they don’t need.
A more flexible, tiered security model, implemented with due diligence, would be far more effective. By categorizing applications into different tiers based on their purpose, sensitivity, and risk, organizations can optimize both security and cost efficiency.
Note: The tiered approach outlined below is not meant to downplay or overlook the importance of security. Security remains a top priority. These decisions should always be made with due diligence, following thorough security assessments and risk calculations. The goal is not to compromise on security, but to apply it thoughtfully and proportionately based on the specific needs of each application.
- Tier 1: High-security applications (such as external-facing apps or those handling sensitive data) should have premium services that support their security requirements and the strict security policies in place.
- Tier 2: Medium-risk applications (such as internal systems with moderately sensitive data) can adopt a middle-ground security model that balances protection with cost.
- Tier 3: Low-risk applications (such as internal systems with non-sensitive data, POCs, prototypes, DEV region, etc.) could utilize more cost-effective services without the burden of unnecessary security measures.
These application tiers should be isolated at the network and IAM levels, and at every other level possible. By isolating the network boundaries between tiers, organizations can further reduce overall security risk while still maintaining a cost-effective approach. To implement this model effectively, a security risk assessment should be integrated into every stage of the development process, following a true DevSecOps approach.
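To make the tiering idea concrete, here is a minimal sketch of how tier assignment could be codified as part of a review workflow. The attribute names and the rules themselves are hypothetical placeholders; real rules would come out of your organization's own security assessments:

```python
from dataclasses import dataclass

@dataclass
class AppProfile:
    name: str
    external_facing: bool        # reachable from outside the network?
    handles_sensitive_data: bool
    is_production: bool

def security_tier(app: AppProfile) -> int:
    """Map an application's risk profile to a security tier (1 = strictest)."""
    if app.external_facing or app.handles_sensitive_data:
        return 1  # premium services, strictest policies
    if app.is_production:
        return 2  # middle-ground controls
    return 3      # POCs, prototypes, DEV: cost-effective services

print(security_tier(AppProfile("internal-poc", False, False, False)))  # -> 3
```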
To illustrate how applying uniform security policies can drive up costs, let’s take the example of an Azure Service Bus implementation in a Tier 3 application, where the data is non-sensitive and there is no need for advanced disaster recovery (DR) features or special networking requirements. Here, the one-size-fits-all approach forces the team onto Azure Service Bus Premium due to stringent security policies. The problem? The Premium tier can be up to 50 times more expensive than the Standard tier.
For low-risk applications, this kind of over-provisioning can be a significant waste of resources. Instead of defaulting to the Premium tier for all applications, organizations can enhance security in low-tier applications in more cost-effective ways, such as encrypting messages so that only the intended receiver can read them, or deploying a secure internal message broker within the network, shared across trusted internal applications. This approach provides a solid level of security while avoiding the high costs of premium services, allowing businesses to protect data where it’s needed without inflating costs.
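As a minimal sketch of the encryption option, using the `azure-servicebus` and `cryptography` packages: the queue name and connection string are placeholders, and in practice the shared key would live in a secret store such as Azure Key Vault rather than being generated inline.

```python
from azure.servicebus import ServiceBusClient, ServiceBusMessage
from cryptography.fernet import Fernet

# In practice, load this shared key from a secret store (e.g., Key Vault);
# it is generated inline here only to keep the sketch self-contained.
key = Fernet.generate_key()
fernet = Fernet(key)

# Encrypt the payload before it leaves the application, so only the
# intended receiver (holding the key) can read it off the Standard tier.
ciphertext = fernet.encrypt(b'{"order_id": 42, "status": "shipped"}')

with ServiceBusClient.from_connection_string("<connection-string>") as client:
    with client.get_queue_sender("orders") as sender:  # placeholder queue name
        sender.send_messages(ServiceBusMessage(ciphertext))
```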
Security as an Afterthought
In some organizations, security checks are performed only at the final stages, usually when an application reaches pre-production or production after it has already passed through the design, development, and QA stages. By the time security policies are enforced, any necessary changes to align with standards can require extensive redesign. This results in additional development cycles and re-testing, both costly and time-consuming. This is exactly where DevSecOps proves invaluable: by embedding security into the design phase from the start, security becomes an integral part of the process, not a last-minute hurdle.
For example, consider a case where Azure Service Bus is chosen as the messaging backbone of a platform. During the design phase, the development team may have opted for the Standard tier of Azure Service Bus, and cost projections would have been done accordingly. However, when the application hits the final security checks in pre-prod or production, the security policies may mandate Azure Service Bus Premium due to stricter requirements, significantly increasing operational costs. If a DevSecOps approach had been in place, with the security team involved from the design stage, they would have flagged this issue early on. This would have allowed the team to either account for the higher OPEX upfront or explore alternative solutions, such as leveraging an existing, secure message-broker cluster within the organization.
Another common issue is developers overlooking proper network isolation and security policies during design and development. When the application is deployed into a secured infrastructure, misconfigurations or lack of alignment with security policies can lead to deployment failures or security vulnerabilities. Again, a DevSecOps approach addresses this by ensuring that network isolation, firewall rules, and other security considerations are baked into the architecture and development phases, preventing costly fixes and rework later.
By integrating security into every stage of the application lifecycle, DevSecOps allows organizations to avoid expensive last-minute changes while ensuring that security standards are met efficiently. This proactive approach ultimately saves both time and money.
Stagnant Security Policies
Another costly mistake many enterprises make is failing to update their security frameworks as technology evolves. The real problem arises when security policies are rigidly enforced without a proper feedback loop or mechanism to evaluate the suitability of new services. Implementation teams are thus forced to follow old rules that may no longer be the best fit.
For instance, I’ve encountered situations where organizations decided against adopting modern cloud-native ETL pipelines and services like Databricks, Azure Data Factory, AWS Glue, etc., opting instead for legacy solutions such as traditional ETL tools or on-premises data integration platforms. Despite the efficiency and flexibility of newer technologies, these organizations shy away from using them since they aren’t on the "approved list." This reliance on outdated methods often results in longer development cycles and higher costs.
Sometimes, security frameworks are set by decision-makers far removed from the daily challenges of implementation. Proper DevSecOps practices would help solve this problem by involving security teams early in the process and ensuring policies are continuously updated to reflect modern needs. Security policies should evolve in tandem with technology to strike a balance between safety and practicality.
Note: While we highlight the challenges with outdated security policies, we fully acknowledge that adapting to new technologies or updating frameworks cannot happen overnight. The issue here is not the need for immediate change, but rather the lack of urgency to explore and adopt the latest innovations. By playing it safe with already vetted policies, organizations risk falling behind. We recommend a proactive approach with continuous updates and a well-thought-out plan for gradually rolling out these changes to ensure both security and efficiency.
Ignoring FinOps Early On
FinOps is a term that gets mentioned often, but like DevSecOps, it's rarely implemented as effectively as it should be. Often, FinOps teams only get involved after an application has been deployed to production, at which point they work to optimize resources and reduce operational costs. While this post-production optimization is helpful, it’s far less impactful than if FinOps principles were ingrained in the design and development phases. In fact, a proactive FinOps approach during design could lead to cost savings that are multiple times greater than post-deployment tuning.
Without a FinOps mindset, engineers often design systems using the “latest and greatest” technologies or opt for services they are most familiar with or are easy to implement. While this might speed up implementation, it’s not always the most cost-effective approach. In many cases, there could be alternatives that offer similar performance at a fraction of the cost, but these options are often overlooked simply because cost wasn’t a primary concern during design.
Consider the choice between Azure Event Hub Capture and Azure Stream Analytics for a scenario where minor aggregations must be performed on incoming Event Hub data before saving it to storage, and where latency is not a business concern. Azure Stream Analytics offers a no-code, real-time solution that is easy to implement, while the Event Hub Capture approach requires more development, such as setting up Event Grid to trigger an Azure Function for aggregation and syncing to storage. Although both approaches achieve the same result, engineers might naturally lean towards Stream Analytics due to its simplicity. However, this ease comes with significantly higher operational costs compared to Event Hub Capture. A FinOps mindset from the start would prompt a thorough evaluation of both solutions, ensuring that the team selects the most cost-effective option.
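A design-phase FinOps review can be as simple as a back-of-the-envelope comparison like the sketch below. Every unit price here is an illustrative placeholder, not current Azure pricing; the point is the habit of pricing both designs before committing, using the pricing calculator for your region.

```python
# Rough monthly cost comparison for the two designs.
# All unit prices are illustrative placeholders -- verify against the
# Azure pricing calculator before drawing any conclusions.

HOURS_PER_MONTH = 730

# Option A: Azure Stream Analytics, billed per streaming unit (SU) per hour.
asa_streaming_units = 3
asa_price_per_su_hour = 0.11                     # placeholder $/SU-hour
asa_monthly = asa_streaming_units * asa_price_per_su_hour * HOURS_PER_MONTH

# Option B: Event Hub Capture (flat hourly fee) plus an Azure Function
# that aggregates the captured blobs on a consumption plan.
capture_price_per_hour = 0.10                    # placeholder $/hour
function_monthly = 5.0                           # placeholder consumption cost
capture_monthly = capture_price_per_hour * HOURS_PER_MONTH + function_monthly

print(f"Stream Analytics   ~ ${asa_monthly:,.0f}/month")
print(f"Capture + Function ~ ${capture_monthly:,.0f}/month")
```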
The key takeaway is that FinOps shouldn’t be a reactive practice. Instead of waiting until production to optimize resources, organizations should involve FinOps from the very beginning, ensuring that every design decision balances both performance and cost efficiency. This shift would lead to more thoughtful, scalable designs that don’t require expensive fixes down the road.
Clashing Development and Infrastructure Processes
One of the challenges I’ve encountered in some organizations is the disconnect that occurs when development and infrastructure teams follow different processes. In one case, the development team was operating in Agile, where they planned their infrastructure needs several sprints ahead and expected services to be set up incrementally in line with business requirements. The idea was that as business needs changed, infrastructure would adapt and evolve along with the development cycle.
However, the infrastructure team was following a strict waterfall model, which made this collaboration difficult. Instead of being open to the frequent, incremental changes typical of Agile, they adhered to a rigid, linear process: take the initial infrastructure request, implement it in DEV, then move through QA, pre-production, and finally production—whether or not those environments were immediately necessary. This would often lead to pre-production and production environments being set up months ahead of time, incurring significant costs in the process. Cloud infrastructure costs start accumulating as soon as environments are provisioned, so setting up high-cost environments like production when they aren’t actively used is a clear waste of resources.
This mismatch in processes leads not only to unnecessary costs but also to delays. The development team faces bottlenecks waiting for infrastructure to catch up with its sprint timelines. While Agile expects continuous delivery of services, the waterfall approach forces development to halt until the next batch of infrastructure is ready, further stretching the project timeline.
Now, to be clear, this isn’t an argument against waterfall or an advocacy for Agile. Both methodologies have their merits depending on the project and organization. The key takeaway here is the importance of following compatible processes. When development and infrastructure teams are out of sync, it’s not just about delays—it’s about costs ballooning unnecessarily. If both teams are aligned from the start, with infrastructure scaling as needed in line with development, both time and money can be saved.
Other Commonly Overlooked Causes of High Cloud Costs
It’s no secret that cloud cost optimization is a top priority for enterprises, and most teams are aware of the practices needed to keep costs in check. Yet, despite this understanding, mistakes are routinely overlooked, leading to inflated cloud bills. Below are some of the most commonly ignored factors that drive up cloud costs, despite being well-understood principles.
Lack of Cloud Cost Visibility and Monitoring
- No Real-Time Cost Tracking: Without proper visibility into cloud usage and spending, teams often don’t realize they’re overspending until they receive the bill at the end of the month. Continuous monitoring tools should be in place to track cloud usage in real time and send alerts when spending exceeds certain thresholds (a minimal sketch follows this list).
- Poor Resource Tagging and Allocation: Inconsistent or missing tagging across resources makes it difficult to track costs by team, project, or department, leading to waste and under-optimized resource allocation.
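As one way to approach the real-time tracking point, here is a minimal sketch using boto3’s Cost Explorer API to compare yesterday’s spend against a threshold. The budget figure and alerting action are placeholders; managed options such as AWS Budgets or Azure Cost Management alerts achieve the same result without custom code.

```python
import datetime
import boto3

DAILY_BUDGET_USD = 500.0  # hypothetical threshold for this example

ce = boto3.client("ce")   # AWS Cost Explorer
today = datetime.date.today()
resp = ce.get_cost_and_usage(
    TimePeriod={
        "Start": str(today - datetime.timedelta(days=1)),
        "End": str(today),
    },
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
)
spend = float(resp["ResultsByTime"][0]["Total"]["UnblendedCost"]["Amount"])
if spend > DAILY_BUDGET_USD:
    # Replace with your alerting channel (SNS, Slack webhook, ...).
    print(f"ALERT: yesterday's spend ${spend:,.2f} exceeded ${DAILY_BUDGET_USD:,.2f}")
```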
Failure to Right-Size Resources
- Over-Provisioning of Services: Resources are frequently over-provisioned for anticipated load, leading to unused capacity and higher costs. Teams should practice right-sizing by using only what is necessary for current demand and scaling up as needed (see the utilization check sketched after this list).
- Using Higher Service Tiers Unnecessarily: Whether for VMs, databases, or storage, using higher tiers of service (like premium or enterprise) without a justifiable business need leads to inflated costs. Regular reviews of service tiers should be conducted to ensure cost-effective usage.
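One simple way to surface right-sizing candidates is to check average utilization over a trailing window, as in this sketch using boto3 and CloudWatch. The 14-day window, 10% threshold, and instance ID are all placeholders for illustration.

```python
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")
end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(days=14)

def avg_cpu(instance_id: str) -> float:
    """Average CPU utilization for an EC2 instance over the last 14 days."""
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=86400,  # one datapoint per day
        Statistics=["Average"],
    )
    points = stats["Datapoints"]
    return sum(p["Average"] for p in points) / len(points) if points else 0.0

# Placeholder instance id; a real audit would iterate over all instances.
if avg_cpu("i-0123456789abcdef0") < 10.0:
    print("Candidate for right-sizing: consider a smaller instance type")
```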
Underutilization of Cloud-Native Features
- Not Leveraging Serverless Architectures: Using traditional VMs or infrastructure for tasks that could be offloaded to serverless solutions like AWS Lambda or Azure Functions results in higher costs. Serverless options scale automatically and are generally more cost-effective for short-running or event-driven workloads.
- Neglecting Auto-Scaling and Elasticity: Not implementing automatic scaling or elasticity features means you could be paying for capacity that isn’t needed during low-traffic periods. Auto-scaling ensures resources match actual demand, preventing over-provisioning (a minimal sketch follows this list).
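As one concrete form of auto-scaling, here is a sketch of a target-tracking policy on an EC2 Auto Scaling group via boto3. The group name and the 50% CPU target are placeholders; the same idea applies to Azure VM Scale Sets with autoscale rules.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Keep average CPU across the group near the target; the group scales in
# during quiet periods instead of paying for idle capacity around the clock.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="example-web-asg",  # placeholder group name
    PolicyName="cpu-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,                 # placeholder CPU target (%)
    },
)
```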
Mismanagement of Data Storage
- Improper Data Tiering: Many teams fail to optimize their data storage across different tiers. For example, data that is infrequently accessed might still be stored in premium or high-performance storage, when cheaper alternatives like cold storage would suffice (a lifecycle-policy sketch follows this list).
- Unused Snapshots and Backups: Snapshots, backups, and temporary storage volumes that are no longer needed are often left running, leading to unnecessary costs over time.
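Both problems can be addressed by codifying lifecycle rules rather than relying on manual cleanup. Here is a minimal boto3 sketch for S3; the bucket name, prefixes, and day counts are placeholders, and Azure Blob Storage offers equivalent lifecycle management policies.

```python
import boto3

s3 = boto3.client("s3")

# Transition objects to cheaper tiers as they age, and expire old backups,
# so storage costs decay automatically instead of accumulating.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-analytics-data",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-down-raw-data",
                "Status": "Enabled",
                "Filter": {"Prefix": "raw/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            },
            {
                "ID": "expire-old-backups",
                "Status": "Enabled",
                "Filter": {"Prefix": "backups/"},
                "Expiration": {"Days": 180},
            },
        ]
    },
)
```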
Suboptimal Network Design
- High Data Transfer and Egress Fees: Unoptimized network architecture, such as transferring large volumes of data between regions or unnecessarily routing traffic through multiple locations, can result in costly data egress charges. Optimizing data locality can significantly reduce these fees.
- Underutilizing Content Delivery Networks (CDNs): Failing to leverage CDNs for distributing content can lead to excessive bandwidth and data transfer costs. CDNs reduce the load on your primary servers and help deliver content more efficiently.
Poor Lifecycle Management of Cloud Resources
- Orphaned or Idle Resources: Often, resources like VMs, databases, or storage volumes are provisioned but forgotten once a project or sprint is complete, leading to ongoing costs for resources no longer in use. Regular audits should be conducted to identify and clean up orphaned resources.
- Uncontrolled Usage of Test Environments: Test and development environments are frequently left running outside of active hours. Implementing policies to shut down or deallocate non-production resources outside of working hours can save significant costs (see the sketch below).
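A minimal sketch of such a shutdown policy with boto3, typically run on a schedule (for example, an EventBridge rule triggering a Lambda each evening). The tag key and values are placeholders; adapt them to your own tagging convention.

```python
import boto3

ec2 = boto3.client("ec2")

# Find running instances tagged as non-production.
resp = ec2.describe_instances(
    Filters=[
        {"Name": "tag:environment", "Values": ["dev", "qa"]},  # placeholder tags
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)
instance_ids = [
    inst["InstanceId"]
    for reservation in resp["Reservations"]
    for inst in reservation["Instances"]
]
if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)
    print(f"Stopped {len(instance_ids)} non-production instances")
```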
Vendor Lock-In Without Cost Review
- Not Reviewing Cost Structures of Specific Vendors: Enterprises that lock themselves into a particular cloud provider or service without regularly reviewing the cost-benefit analysis can face high costs. Periodic reviews of cloud vendors and negotiating better pricing based on usage patterns can mitigate these risks.
- Failing to Use Multi-Cloud or Hybrid Solutions: Over-reliance on a single cloud provider can lead to higher costs if you don’t take advantage of cost-saving opportunities from other vendors. Adopting a multi-cloud or hybrid strategy or a cloud-portable approach allows you to optimize cost by leveraging the best services at the best prices across providers.
Ignoring Automation and Optimization Tools
- No Automation of Cost Controls: Failing to automate tasks like shutting down idle resources, scaling down underutilized instances, or moving data to lower-cost storage can result in unnecessary ongoing expenses.
- Non-Utilization of Cloud Optimization Tools: Most cloud providers offer built-in tools for identifying cost-saving opportunities (for example, AWS Cost Explorer and Azure Advisor), but teams often overlook these recommendations, leading to inefficient cloud use.
Is DevSecFinOps the Solution?
Looking at all the issues we’ve discussed—from uniform security policies to misaligned development and infrastructure processes, and the lack of FinOps in the design phase—the root cause seems clear: many teams operate in silos, without integrating security and financial considerations into their development lifecycle. This often leads to decisions being made without a complete understanding of how they will impact cost, security, and operational efficiency down the line.
Traditionally, we’ve seen the rise of DevOps to break down barriers between development and operations. Then came DevSecOps, emphasizing security as a critical component of the development process from the very beginning. But is that enough? Should we be thinking bigger?
Given the recurring issues around cost optimization, it seems that the "fin" aspect is just as crucial. Teams often make decisions in the design and development phases without considering their long-term financial impact, only to face surprises later. What if we took this a step further and adopted a DevSecFinOps approach—where development, security, and financial considerations are baked into every phase of the project lifecycle?
By integrating security and financial insights directly into the development process, teams can make more informed decisions, reducing unnecessary costs while maintaining high levels of security. This model could prevent many of the challenges we’ve covered, ensuring that cloud infrastructure is designed with the right balance of security, agility, and cost efficiency from the outset.
So, the big question is: Should DevSecFinOps be the new standard? Could this integrated mindset help organizations avoid costly mistakes and inefficiencies, ultimately delivering better outcomes for both security and the bottom line?
DevSecFinOps: Not Just for the Cloud
To be clear, this blog post isn’t against on-premises solutions. There are many situations where we recommend on-premises infrastructure based on specific needs and fit. DevSecFinOps isn’t exclusive to the cloud: it’s a mindset that applies to both cloud and on-premises environments, ensuring smarter, more efficient decisions regardless of the platform.