Scaling Is Architecture, Not Instinct
Adding more servers is not scaling.
Scaling means:
- Handling traffic spikes
- Replacing failed instances
- Maintaining availability
- Controlling cost growth
Elastic systems respond automatically.
Manual scaling is operational debt.
1. Why Load Balancing Exists
Without a load balancer:
- Single instance failure = outage
- Traffic unevenly distributed
- Scaling becomes manual
- TLS (SSL) termination is handled inconsistently, instance by instance
A load balancer introduces:
- Traffic distribution
- Health-based routing
- Centralized entry point
It becomes the front door of your application.
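Traffic distribution can be sketched in a few lines. This is a minimal round-robin illustration, not how any particular load balancer is implemented; the instance names are hypothetical.

```python
from itertools import cycle


def distribute(requests, instances):
    """Assign each request to the next instance in round-robin order."""
    if not instances:
        raise ValueError("at least one instance is required")
    assignment = {name: [] for name in instances}
    rotation = cycle(instances)
    for request in requests:
        assignment[next(rotation)].append(request)
    return assignment


if __name__ == "__main__":
    # Six requests spread over three hypothetical instances.
    print(distribute(range(6), ["app-1", "app-2", "app-3"]))
```

With a single instance every request lands in one place, which is the "single instance failure = outage" case from the list above.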
2. Application Load Balancer (ALB)
The ALB operates at Layer 7 (HTTP/HTTPS).
Responsibilities:
- Distribute traffic across instances
- Perform health checks
- Terminate TLS (SSL)
- Route based on path or host
An internet-facing ALB should reside in public subnets.
Application instances remain private.
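Routing based on path or host can be pictured as an ordered rule list where the first match wins. The sketch below is an illustration only: the `Rule` class, the hostnames, and the target group names are assumptions, and real ALB rules support more condition types than shown here.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Rule:
    host: str          # host pattern, e.g. "api.example.com" or "*"
    path: str          # path pattern, e.g. "/static/*" or "*"
    target_group: str  # hypothetical target group name


def route(rules, host, path, default="tg-default"):
    """Return the target group of the first rule matching both host and path."""
    for rule in rules:
        if fnmatchcase(host, rule.host) and fnmatchcase(path, rule.path):
            return rule.target_group
    return default  # no rule matched: fall through to the default action


if __name__ == "__main__":
    rules = [
        Rule("api.example.com", "*", "tg-api"),
        Rule("*", "/static/*", "tg-static"),
    ]
    print(route(rules, "api.example.com", "/v1/users"))
    print(route(rules, "www.example.com", "/static/app.css"))
    print(route(rules, "www.example.com", "/checkout"))
```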
Traffic Flow Model
Internet
↓
Application Load Balancer
↓
Target Group (App Instances)
↓
Private Subnets
If app instances have public IPs or sit in public subnets, segmentation is broken.
3. Target Groups
Health-Aware Distribution
Target groups define:
- Which instances receive traffic
- Which port they listen on
- Health check path
Example health check:
GET /health
If an instance fails its health check:
- It is removed from rotation
- No new requests are routed to it
- User impact is reduced
Health checks stop traffic from reaching a failing instance, which limits how far one failure spreads.
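Removal from rotation is usually not triggered by a single failed check. Load balancers typically wait for several consecutive failures before marking a target unhealthy, and several consecutive successes before restoring it. The sketch below models that behavior; the threshold values of 2 are assumptions chosen for illustration, not recommended settings.

```python
def update_health(state, check_passed, healthy_threshold=2, unhealthy_threshold=2):
    """Advance one instance's health state by one check result.

    state is (in_rotation, streak), where streak counts consecutive
    results that disagree with the current status.
    """
    in_rotation, streak = state
    if check_passed == in_rotation:
        return (in_rotation, 0)          # result agrees with status: reset streak
    streak += 1
    threshold = unhealthy_threshold if in_rotation else healthy_threshold
    if streak >= threshold:
        return (not in_rotation, 0)      # enough consecutive disagreements: flip
    return (in_rotation, streak)


if __name__ == "__main__":
    state = (True, 0)                    # starts in rotation
    for result in [False, False, True, True]:
        state = update_health(state, result)
        print(result, state)
```

A single failed check followed by a success leaves the instance in rotation, which avoids removing healthy instances over one slow response.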
4. Horizontal vs Vertical Scaling
Vertical Scaling
- Increase instance size
- More CPU, more RAM
- Limited by the largest available instance type
Risk:
A single large instance becomes a single point of failure.
Horizontal Scaling
- Add more instances
- Distribute load
- Replace failed nodes automatically
Preferred approach for cloud-native systems.
Resilience increases with distribution.
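The effect of distribution can be put in numbers. This small sketch assumes equal-sized instances and evenly spread load, which is a simplification.

```python
def capacity_after_failure(instance_count, failed=1):
    """Fraction of total capacity left when `failed` equal-sized instances are lost."""
    if instance_count <= 0:
        raise ValueError("instance_count must be positive")
    remaining = max(instance_count - failed, 0)
    return remaining / instance_count


if __name__ == "__main__":
    for count in (1, 2, 4, 8):
        print(count, capacity_after_failure(count))
```

One large instance loses everything when it fails; four smaller ones lose a quarter of their capacity.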
5. Auto Scaling Groups (ASG)
ASG manages:
- Desired capacity
- Minimum capacity
- Maximum capacity
- Replacement of unhealthy instances
Scaling policies can trigger based on:
- CPU utilization
- Request count
- Custom metrics
Example policy:
Scale out if CPU > 70% for 5 minutes.
Scale in if CPU < 30% for 10 minutes.
Scaling must be measured, not emotional.
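The example policy can be expressed as a small decision function. This is a sketch of the logic only, assuming one average-CPU sample per minute; in practice the evaluation is done by the monitoring and scaling services, not by application code.

```python
def scaling_decision(cpu_samples, out_threshold=70.0, out_minutes=5,
                     in_threshold=30.0, in_minutes=10):
    """Decide on a scaling action from per-minute CPU samples (oldest first)."""
    recent_out = cpu_samples[-out_minutes:]
    recent_in = cpu_samples[-in_minutes:]
    # Scale out only if every sample in the last 5 minutes exceeded 70%.
    if len(recent_out) == out_minutes and all(s > out_threshold for s in recent_out):
        return "scale_out"
    # Scale in only if every sample in the last 10 minutes stayed below 30%.
    if len(recent_in) == in_minutes and all(s < in_threshold for s in recent_in):
        return "scale_in"
    return "hold"


if __name__ == "__main__":
    print(scaling_decision([50, 50, 50, 80, 82, 85, 90, 88]))
    print(scaling_decision([20] * 10))
    print(scaling_decision([20] * 9 + [50]))
```

The longer window for scale-in is deliberate: removing capacity too eagerly costs more than keeping it a few extra minutes.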
6. Multi-AZ Scaling
True resilience requires:
- Instances across Availability Zones
- Load balancer spanning AZs
- ASG distributing evenly
Single-AZ scaling is partial resilience.
An AZ failure should not cause an outage.
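Even distribution and its payoff can be checked with simple arithmetic. The sketch below is an illustration with hypothetical AZ names; it does not reproduce the exact balancing rules of any scaling service.

```python
def spread_across_azs(desired, azs):
    """Distribute `desired` instances as evenly as possible across AZs."""
    base, extra = divmod(desired, len(azs))
    # The first `extra` zones each take one additional instance.
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}


def surviving_after_az_loss(placement, lost_az):
    """Count instances that remain when one AZ becomes unavailable."""
    return sum(count for az, count in placement.items() if az != lost_az)


if __name__ == "__main__":
    placement = spread_across_azs(5, ["az-a", "az-b", "az-c"])
    print(placement)
    print(surviving_after_az_loss(placement, "az-a"))
```

With all instances in one AZ, the surviving count after losing that AZ is zero, no matter how many instances were running.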
7. Cost Awareness in Scaling
Scaling increases cost.
Each additional instance adds:
- Compute cost
- Network cost
- Storage cost
A poor scaling policy leads to:
- Overprovisioning
- Budget shock
- Underutilized infrastructure
Architecture must balance:
- Performance
- Availability
- Cost
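A rough way to reason about compute cost is to count instance-hours. The sketch below compares an elastic fleet with a fleet sized permanently for peak. The hourly rate and the traffic shape are made-up numbers, and network and storage costs are left out.

```python
def instance_hours(hourly_counts):
    """Total instance-hours, given the running instance count for each hour."""
    return sum(hourly_counts)


def compare_costs(hourly_counts, hourly_rate):
    """Return (elastic_cost, static_peak_cost) for the same period."""
    elastic = instance_hours(hourly_counts) * hourly_rate
    # A static fleet runs the peak instance count for every hour.
    static_peak = max(hourly_counts) * len(hourly_counts) * hourly_rate
    return elastic, static_peak


if __name__ == "__main__":
    # Hypothetical day: 2 instances for 20 hours, 6 instances for a 4-hour peak.
    day = [2] * 20 + [6] * 4
    print(compare_costs(day, hourly_rate=1))
```

The same function exposes a bad policy: if the fleet never scales back in, the elastic figure converges on the static one.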
8. Failure Simulation in Scaling
Test resilience intentionally.
Scenario 1 – Instance Failure
Terminate one instance.
Observe:
- ASG launches replacement
- ALB removes failed instance
- Traffic continues
If traffic stops, the design is flawed.
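The replacement behavior to look for resembles a reconcile loop: compare healthy instances with desired capacity and launch the difference. The following is a simplified model, not the actual ASG implementation; the instance dictionaries and the `make_launcher` helper are hypothetical.

```python
def make_launcher(prefix="replacement"):
    """Return a function that creates new healthy instances with sequential ids."""
    counter = 0

    def launch():
        nonlocal counter
        counter += 1
        return {"id": f"{prefix}-{counter}", "healthy": True}

    return launch


def reconcile(instances, desired, launch):
    """Drop unhealthy instances, then launch until `desired` healthy ones exist."""
    healthy = [inst for inst in instances if inst["healthy"]]
    while len(healthy) < desired:
        healthy.append(launch())
    return healthy


if __name__ == "__main__":
    fleet = [{"id": "app-1", "healthy": True}, {"id": "app-2", "healthy": False}]
    fleet = reconcile(fleet, desired=2, launch=make_launcher())
    print([inst["id"] for inst in fleet])
```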
Scenario 2 – Traffic Spike
Simulate load:
- Use a load-testing tool
- Generate HTTP requests
- Observe scaling event
Watch:
- Instance count increase
- CPU metrics change
- Response time stabilize
Elasticity should respond automatically.
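If no dedicated tool is at hand, a basic request generator can be written with the standard library. This is a minimal sketch: the URL is a placeholder for your own load balancer address, the request counts are arbitrary, and it should only be pointed at infrastructure you own.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=5):
    """Issue one GET request; return (status or None, elapsed seconds)."""
    start = time.perf_counter()
    try:
        with urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except Exception:
        status = None  # connection errors and HTTP errors both count as failures
    return status, time.perf_counter() - start


def generate_load(url, total_requests, concurrency, fetch_fn=fetch):
    """Send requests in parallel and summarize how many returned HTTP 200."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch_fn, [url] * total_requests))
    ok = sum(1 for status, _ in results if status == 200)
    return {"sent": total_requests, "ok": ok, "failed": total_requests - ok}


if __name__ == "__main__":
    # Placeholder address: replace with the DNS name of your own load balancer.
    print(generate_load("http://my-alb.example.invalid/health", 200, 20))
```

Run it long enough to cover the evaluation window of the scaling policy, otherwise no scaling event will be triggered.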
9. Common Scaling Mistakes
- Scaling only vertically
- No health checks configured
- Single instance in ASG
- Misconfigured scaling thresholds
- Ignoring cost implications
- No monitoring during scaling
Scaling must be validated, not assumed.
10. Lab Assignment
Design and deploy:
- Application Load Balancer
- Target group with health check
- Auto Scaling Group (min 2 instances)
- Multi-AZ deployment
Simulate:
- Kill one instance.
- Generate traffic spike.
- Observe scaling.
- Record time to recover.
Document:
- How traffic is routed.
- How instance replacement occurs.
- What metric triggered scaling.
- What cost impact scaling had.
If you cannot trace scaling behavior, you do not control elasticity.
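Time to recover can be computed from timestamped observations of the healthy instance count. The sketch below assumes you record those observations yourself (for example, by noting the target group's healthy count at intervals); the timestamps shown are invented.

```python
from datetime import datetime


def time_to_recover(events, desired):
    """Seconds from the first drop below `desired` until the count is restored.

    events: ordered list of (ISO 8601 timestamp, healthy instance count).
    Returns None if capacity never dropped or never recovered.
    """
    dropped_at = None
    for stamp, healthy in events:
        moment = datetime.fromisoformat(stamp)
        if dropped_at is None and healthy < desired:
            dropped_at = moment
        elif dropped_at is not None and healthy >= desired:
            return (moment - dropped_at).total_seconds()
    return None


if __name__ == "__main__":
    observations = [
        ("2024-01-01T10:00:00", 2),
        ("2024-01-01T10:00:30", 1),   # instance killed
        ("2024-01-01T10:03:00", 1),
        ("2024-01-01T10:04:15", 2),   # replacement healthy
    ]
    print(time_to_recover(observations, desired=2))
```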
11. Production Reflection
Consider:
- What happens if the scale-out threshold is set too low?
- What happens if the health check path is wrong?
- How do you prevent premature scale-in during brief traffic dips?
- How do you protect the database when the app tier scales out?
Scaling must coordinate across tiers.
Load balancing is only one layer.
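One concrete cross-tier check is the database connection budget: every new app instance opens its own connection pool. The sketch below is a back-of-the-envelope calculation with made-up numbers, assuming each instance holds a fixed-size pool.

```python
def max_safe_instances(db_max_connections, pool_size_per_instance, reserved=0):
    """Largest instance count whose pools fit within the database connection limit.

    reserved: connections kept free for admin, migration, or monitoring use.
    """
    if pool_size_per_instance <= 0:
        raise ValueError("pool_size_per_instance must be positive")
    usable = db_max_connections - reserved
    return max(usable // pool_size_per_instance, 0)


if __name__ == "__main__":
    # Hypothetical limits: 200 connections, 20 per instance, 20 held back.
    print(max_safe_instances(200, 20, reserved=20))
```

If the ASG maximum capacity exceeds this number, a scale-out event can exhaust the database before it relieves the app tier.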
Module Completion Criteria
You are ready for Module 4 when:
- Load balancer distributes traffic across AZs.
- Auto Scaling replaces failed instances.
- Scaling policy reacts predictably.
- You understand cost impact of elasticity.
- You can simulate controlled failure.
Next:
→ Module 4 – Infrastructure as Code (Terraform)