Scaling to a Million Domains: How We Transformed a Failing System

One of our clients manages a vast portfolio of domain names, each hosting a monetized parking page through Google AdSense. They planned to add a million more domain names, but their existing infrastructure was already buckling under the load. Frequent gateway timeouts, outages, and an inability to absorb sudden traffic surges made downtime a daily issue.

To move forward, they needed a solution built for reliability and growth.

The question was, where to begin?

Dissecting the Bottlenecks

At first glance, the architecture seemed outdated but functional. A deep dive revealed more.

The system was monolithic, held together by technologies that had long since reached end of life. Despite being hosted on Amazon Web Services, the platform barely used any cloud-native capabilities. The infrastructure was rigid: a fleet of manually configured EC2 instances with no auto-scaling, redundancy, or failover mechanisms. Load balancing relied on a single HAProxy instance, and DNS was served by multiple BIND servers that had no provision for scaling. If any one of these components went down, the entire system would collapse with it.

Figure: Current architecture with limited scalability and single points of failure.

To make matters worse, resource-heavy cron jobs ran directly on web servers, often leading to system failures. The database architecture relied on master-slave replication across EC2 instances. Poor connection pooling and misconfigured caching caused constant timeouts. Moreover, there were no safeguards against DDoS attacks, making the system an easy target.

The result was a fragile system that was one step away from failure.

Rebuilding from the Ground Up

It was clear that patchwork fixes wouldn’t cut it. The best way out was a complete re-architecture built around scalability, high availability, and security. We decided to break the system into modular components, implement intelligent load balancing, and optimize database performance so that the system could handle millions of domains.

There were two critical issues to address while rebuilding:

Per-domain provisioning was manual. Each new domain required domain record updates through the registrar's API, and its SSL certificate had to be issued and attached to the HAProxy server, which then had to be restarted by hand. The process was inefficient and prone to failure.
The architecture depended on a single HAProxy instance. Without automation, HAProxy couldn’t adjust dynamically to changing workloads, and with every domain pointing to that one server, scaling was impossible.

Even if we introduced auto-scaling for the web servers behind HAProxy, we would still need a script to automatically update HAProxy with the currently running web servers during each scaling process. Additionally, we would need to ensure that HAProxy reloads at the right time, adding further complexity.
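
To make that extra moving part concrete, here is a minimal sketch of the kind of helper such a setup would need: it renders an HAProxy backend section from the list of web servers that are currently running. The function name, backend name, and IP addresses are hypothetical, and the sketch deliberately leaves out the harder parts mentioned above (discovering instances during a scaling event and reloading HAProxy at a safe moment).

```python
from typing import Iterable


def render_backend(name: str, servers: Iterable[tuple[str, str, int]]) -> str:
    """Render an HAProxy backend section from (server_name, ip, port) tuples.

    The server list is assumed to come from some discovery step (for example,
    a query for the instances in an auto-scaling group) that is not shown here.
    """
    lines = [f"backend {name}", "    balance roundrobin"]
    # Sort so that the output is stable and config diffs stay readable.
    for server_name, ip, port in sorted(servers):
        lines.append(f"    server {server_name} {ip}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical instances; real values would change on every scaling event.
    running = [("web2", "10.0.1.12", 80), ("web1", "10.0.1.11", 80)]
    print(render_backend("parking_web", running))
```

Every scaling event would have to regenerate this section, write it to disk, and trigger a reload, which is exactly the coordination burden we wanted to avoid.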

We needed an entirely new approach that eliminated manual overhead and complexity, enabled real-time scaling, and ensured seamless handling of domains and their SSL certificates.

We explored various options, starting with AWS Route 53 and Application Load Balancer (ALB) as potential solutions. But as we evaluated them, critical limitations emerged. Route 53 requires a separate hosted zone for each domain. With a million domains, that approach was impractical and cost-prohibitive. ALB, on the other hand, supported only 25 SSL certificates per load balancer by default, meaning that even 10,000 domains would require around 400 ALBs.
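
The ALB arithmetic is easy to check. The sketch below assumes one certificate per domain (no bundling of several domains into one certificate) and uses the 25-certificate limit quoted above as a fixed number; the real quota may differ by account or be adjustable.

```python
import math

# Limit quoted in the text; treat it as an assumption, not a guaranteed quota.
CERTS_PER_ALB = 25


def albs_needed(domains: int, certs_per_alb: int = CERTS_PER_ALB) -> int:
    """Number of load balancers needed if every domain has its own certificate."""
    return math.ceil(domains / certs_per_alb)


for portfolio_size in (10_000, 1_000_000):
    print(f"{portfolio_size:>9,} domains -> {albs_needed(portfolio_size):,} ALBs")
```

Under these assumptions, 10,000 domains need 400 load balancers and a million domains need 40,000, which rules the approach out on both cost and operational grounds.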

After extensive testing and research, we settled on Cloudflare SaaS.

Building a Scalable Solution with Cloudflare SaaS

Cloudflare SaaS provides a way to handle domain management, SSL certificates, and traffic routing without the overhead of maintaining complicated infrastructure. It issues and renews SSL certificates automatically, eliminating the need for manual intervention. Unlike traditional load-balancing solutions, it can manage massive domain portfolios and scale on demand. Portfolios exceeding 5,000 domains require coordination with Cloudflare to enable further scaling, and the Enterprise plan is designed to support extremely large portfolios.
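
As an illustration of what onboarding a domain can look like, the sketch below builds (but does not send) a request to Cloudflare's custom hostnames endpoint. This is not taken from the project itself: the function name, zone ID, token, and hostname are placeholders, and the payload fields reflect our understanding of Cloudflare's public API, so they should be verified against the current API reference before use.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_custom_hostname_request(
    zone_id: str, api_token: str, hostname: str
) -> urllib.request.Request:
    """Build a POST request that registers one custom hostname.

    Assumption: the certificate is domain-validated ("dv") over HTTP. Other
    validation methods exist; check the API reference for the full list.
    """
    payload = {"hostname": hostname, "ssl": {"method": "http", "type": "dv"}}
    return urllib.request.Request(
        url=f"{API_BASE}/zones/{zone_id}/custom_hostnames",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    req = build_custom_hostname_request("ZONE_ID", "API_TOKEN", "parked.example")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the request (for example with `urllib.request.urlopen`) and handling rate limits and retries for a large batch of domains is left out of the sketch.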

Security was another major advantage. With built-in DDoS protection, a Web Application Firewall (WAF), and intelligent traffic filtering, Cloudflare can strengthen the system’s resilience. Additionally, wildcard subdomains under Cloudflare’s Enterprise plan allow flexible domain configurations, making SSL management even more efficient.

Though Cloudflare itself can act as a load balancer, the next question was how to update Cloudflare regions with web server IPs during the auto-scaling process. We could have added a layer built on AWS Lambda and SNS to push the auto-scaled web server IPs to Cloudflare, but that would have introduced unnecessary complexity. So we opted for a different approach.

Cloudflare SaaS would proxy traffic through a central domain pointing to an AWS Application Load Balancer (ALB), which would distribute requests across an auto-scalable fleet of EC2 instances. By configuring ALB rules to decouple traffic routing from application logic, scalability could be achieved without overwhelming individual components.

The next consideration was encryption in transit. Cloudflare automatically attaches SSL certificates to configured domain names, but these certificates terminate SSL only at the Cloudflare level, not end-to-end. This means that traffic from Cloudflare to the ALB is not secure by default. To keep that leg encrypted, we attached an SSL certificate to the proxy domain pointing to the ALB. We also closed the ALB to the public internet, using security groups to allow inbound traffic only from Cloudflare. Together, these measures ensured that traffic was encrypted all the way from the custom domain name to the ALB.
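
The security group restriction can be sketched as follows. The function builds the rule structure that boto3's `authorize_security_group_ingress` accepts as `IpPermissions`, without calling AWS. The CIDR blocks shown are documentation placeholders, not Cloudflare's real ranges; in practice the list would come from Cloudflare's published IP ranges and would need periodic refreshing. The function name and rule description are our own.

```python
import ipaddress


def build_ingress_rules(cidrs, port=443):
    """Build one HTTPS ingress permission covering the given CIDR blocks.

    Each CIDR is validated first, so a malformed entry raises ValueError
    instead of producing a broken or overly broad rule.
    """
    ipv4_ranges, ipv6_ranges = [], []
    for cidr in cidrs:
        network = ipaddress.ip_network(cidr, strict=True)
        description = "Cloudflare edge (illustrative)"
        if network.version == 4:
            ipv4_ranges.append({"CidrIp": str(network), "Description": description})
        else:
            ipv6_ranges.append({"CidrIpv6": str(network), "Description": description})
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": ipv4_ranges,
            "Ipv6Ranges": ipv6_ranges,
        }
    ]


if __name__ == "__main__":
    # Placeholder ranges from the documentation address space.
    placeholder_cidrs = ["198.51.100.0/24", "2001:db8::/32"]
    for rule in build_ingress_rules(placeholder_cidrs):
        print(rule)
```

With no other inbound rules on the security group, requests that bypass Cloudflare and hit the ALB directly are dropped at the network level.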

We conducted a Proof of Concept (POC) to validate that the architecture met our requirements. Its success led us to the final solution.

Breaking the Monolith

Another critical part of the transformation was modularizing the application. We identified four core modules that needed separation:

Visitor-Facing Pages – The layer that handled domain parking pages. 99% of traffic was to these pages.
Admin Panel – A management interface for campaign owners.
Background Processing – Operations such as logging, analytics, and updates.
Reporting and Analytics – Real-time insights into performance and monetization.

Each module was deployed as a separate service with its own auto-scaling policies. This improved performance and ensured that high traffic on one module didn’t impact the others. We also strongly recommended upgrading technologies to their latest versions and verified that all applications ran without issues.

By integrating ElastiCache as a caching layer for visitor-facing pages, we drastically reduced database hits and improved response times. For real-time logs and monetization tracking, we implemented AWS Kinesis with Lambda subscribers, allowing high-velocity data streams to be processed efficiently.
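
A Lambda subscriber on a Kinesis stream receives records in batches, with each payload base64-encoded. The handler below is a minimal sketch of that pattern. The record schema (a JSON object with `type` and `domain` fields) is hypothetical, and the per-domain click count simply stands in for whatever monetization tracking the real subscribers perform.

```python
import base64
import json
from collections import Counter


def handler(event, context):
    """Count click events per domain in one batch of Kinesis records."""
    clicks = Counter()
    for record in event.get("Records", []):
        # Kinesis delivers each payload as a base64-encoded string.
        raw = base64.b64decode(record["kinesis"]["data"])
        item = json.loads(raw)
        # Hypothetical schema: {"type": "click" | "view", "domain": "<name>"}
        if item.get("type") == "click":
            clicks[item["domain"]] += 1
    return dict(clicks)


if __name__ == "__main__":
    def encode(payload):
        data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
        return {"kinesis": {"data": data}}

    sample_event = {
        "Records": [
            encode({"type": "click", "domain": "a.example"}),
            encode({"type": "view", "domain": "a.example"}),
            encode({"type": "click", "domain": "b.example"}),
        ]
    }
    print(handler(sample_event, None))
```

A production handler would also need to deal with malformed records and write its results somewhere durable; both are omitted here.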

Additionally, we enabled proper logging and alert mechanisms for cloud services and application features, supporting effective error handling and recovery processes. Regular database and service backups were scheduled to ensure data integrity and reliability.

A Smarter Database Strategy

As mentioned earlier, the database was hosted on a set of EC2 instances following a master-slave strategy. We recommended migrating the entire database to Amazon RDS. To further optimize efficiency, we introduced Amazon RDS Proxy, a connection pooling layer that reduced the overhead of frequently opening database connections and helped the database tier scale smoothly. By balancing queries across multiple instances in an RDS cluster, we provided a robust foundation for future growth.
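
The idea behind connection pooling can be shown in a few lines. The sketch below is not how RDS Proxy is implemented; it is a toy, in-process pool that uses SQLite as a stand-in database to show why reusing a small set of open connections is cheaper than opening one per request. The class and attribute names are our own.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and handed out repeatedly."""

    def __init__(self, connect, size):
        self.created = 0  # how many real connections were ever opened
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())
            self.created += 1

    @contextmanager
    def connection(self, timeout=5.0):
        # Waits until a connection is free; raises queue.Empty after `timeout`.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller raised.
            self._idle.put(conn)


if __name__ == "__main__":
    pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
    for _ in range(100):
        with pool.connection() as conn:
            conn.execute("SELECT 1").fetchone()
    print(f"queries run: 100, connections opened: {pool.created}")
```

A managed proxy applies the same principle outside the application, so that many short-lived application connections share a much smaller number of database connections.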

Figure: New architecture with modular components, load balancing, and fault-tolerance built in.

With Cloudflare SaaS managing domains and SSL, AWS ALB handling traffic distribution, and a modular, auto-scalable architecture in place, the new system was a complete transformation from the unstable, unreliable setup we started with. Downtime was eliminated, scalability was no longer a concern, and security threats were neutralized. Our client can now scale their system from hundreds of thousands to millions of domains without worrying about infrastructure failures.
