Scaling Is Architecture, Not Instinct

Adding more servers is not scaling.

Scaling means:

Handling traffic spikes
Replacing failed instances
Maintaining availability
Controlling cost growth

Elastic systems respond automatically.

Manual scaling is operational debt.

1. Why Load Balancing Exists

Without a load balancer:

Single instance failure = outage
Traffic unevenly distributed
Scaling becomes manual
TLS (SSL) termination is handled inconsistently per instance

A load balancer introduces:

Traffic distribution
Health-based routing
Centralized entry point

It becomes the front door of your application.

2. Application Load Balancer (ALB)

The ALB operates at Layer 7 (HTTP/HTTPS).

Responsibilities:

Distribute traffic across instances
Perform health checks
Terminate SSL
Route based on path or host
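
The path- and host-based routing above can be sketched as a first-match rule table. This is a simplified model, not the ALB's actual rule engine: the Rule class, the route function, the target group names, and the example.com hosts are all hypothetical, and real listener rules also support priorities and wildcard patterns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    target_group: str
    host: Optional[str] = None         # exact host match (hypothetical hosts below)
    path_prefix: Optional[str] = None  # simple prefix match on the request path


def route(rules, default_group, host, path):
    """Return the target group of the first rule that matches, else the default."""
    for rule in rules:
        if rule.host is not None and rule.host != host:
            continue
        if rule.path_prefix is not None and not path.startswith(rule.path_prefix):
            continue
        return rule.target_group
    return default_group


rules = [
    Rule("api-targets", host="api.example.com"),
    Rule("static-targets", path_prefix="/static/"),
]

static_choice = route(rules, "web-targets", "www.example.com", "/static/logo.png")
api_choice = route(rules, "web-targets", "api.example.com", "/v1/users")
default_choice = route(rules, "web-targets", "www.example.com", "/checkout")
print(static_choice, api_choice, default_choice)
```

Requests that match no rule fall through to the default group, which mirrors the idea of a default listener action.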

An internet-facing ALB should reside in public subnets.

Application instances remain private.

Traffic Flow Model

Internet
   ↓
Application Load Balancer
   ↓
Target Group (App Instances)
   ↓
Private Subnets

If app instances are publicly reachable, network segmentation is broken.

3. Target Groups

Health-Aware Distribution

Target groups define:

Which instances receive traffic
Which port they listen on
Health check path

Example health check:

GET /health
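
A minimal sketch of what sits behind that request, using only the Python standard library. The HealthHandler class and the check_health helper are illustrative names, the server binds to a local ephemeral port purely for demonstration, and a real health endpoint would usually also verify the app's own dependencies.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /health answers 200; every other path answers 404.
        if self.path == "/health":
            status, body = 200, b"ok"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def check_health(url, timeout=2.0):
    """Return True only for an HTTP 200; errors and timeouts count as failures."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0 = any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

healthy = check_health(f"http://127.0.0.1:{port}/health")
wrong_path = check_health(f"http://127.0.0.1:{port}/healthz")
server.shutdown()
server.server_close()
print(healthy, wrong_path)
```

The second probe shows why the configured path matters: a working instance looks unhealthy if the checker asks for a path the app does not serve.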

If an instance fails its health check:

It is removed from rotation
No traffic is sent
User impact is reduced

Health checks help contain a failure before it cascades.
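
Removal from rotation can be sketched as a round-robin picker that skips targets with too many consecutive failed checks. This is an assumption-laden model: the TargetGroup class, the instance IDs, and the threshold of 2 are hypothetical, and a single passing check restores a target here, whereas a real load balancer applies a separate healthy threshold.

```python
class TargetGroup:
    def __init__(self, targets, unhealthy_threshold=2):
        self.targets = list(targets)
        self.failures = {t: 0 for t in self.targets}  # consecutive failed checks
        self.unhealthy_threshold = unhealthy_threshold
        self._next = 0

    def record_check(self, target, passed):
        # A pass resets the counter; a failure increments it.
        self.failures[target] = 0 if passed else self.failures[target] + 1

    def healthy(self):
        return [t for t in self.targets
                if self.failures[t] < self.unhealthy_threshold]

    def pick(self):
        """Round-robin over the targets that are currently healthy."""
        pool = self.healthy()
        if not pool:
            raise RuntimeError("no healthy targets")
        target = pool[self._next % len(pool)]
        self._next += 1
        return target


group = TargetGroup(["i-a", "i-b", "i-c"])
group.record_check("i-b", passed=False)
group.record_check("i-b", passed=False)  # second failure reaches the threshold

picks = [group.pick() for _ in range(4)]
print(picks)
```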

4. Horizontal vs Vertical Scaling

Vertical Scaling

Increase instance size
More CPU, more RAM
Limited by the largest available instance type

Risk:
A single large instance becomes a single point of failure.

Horizontal Scaling

Add more instances
Distribute load
Replace failed nodes automatically

Preferred approach for cloud-native systems.

Resilience increases with distribution.

5. Auto Scaling Groups (ASG)

ASG manages:

Desired capacity
Minimum capacity
Maximum capacity
Replacement of unhealthy instances
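
The replacement behavior can be pictured as a reconciliation loop that compares the fleet with the desired capacity. The sketch below is a toy model under stated assumptions: the reconcile function and the instance IDs are hypothetical, and a replacement is treated as healthy immediately, while a real instance needs boot time and a passing health check first.

```python
from itertools import count

_new_ids = count(1)  # source of made-up IDs for replacement instances


def reconcile(fleet, desired):
    """fleet maps instance ID -> 'healthy' or 'unhealthy'; returns the new fleet."""
    # Drop unhealthy instances.
    kept = {iid: state for iid, state in fleet.items() if state == "healthy"}
    # Launch replacements until the desired capacity is met.
    while len(kept) < desired:
        kept[f"i-new{next(_new_ids)}"] = "healthy"
    # Trim extras if the fleet is larger than desired.
    return dict(list(kept.items())[:desired])


fleet = {"i-a": "healthy", "i-b": "unhealthy"}
after = reconcile(fleet, desired=2)
print(after)
```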

Scaling policies can trigger based on:

CPU utilization
Request count
Custom metrics

Example policy:

Scale out if CPU > 70% for 5 minutes.

Scale in if CPU < 30% for 10 minutes.
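
A minimal sketch of how such a policy could be evaluated, assuming one average CPU sample per minute. The decide function and its parameter names are illustrative, not an AWS API; the thresholds and durations are the ones from the example policy, and the result is always clamped to the group's minimum and maximum.

```python
def decide(cpu_samples, desired, minimum, maximum,
           out_threshold=70.0, out_minutes=5,
           in_threshold=30.0, in_minutes=10):
    """cpu_samples: one average CPU % per minute, oldest first.

    Returns the new desired capacity after applying the policy once.
    """
    recent_out = cpu_samples[-out_minutes:]
    recent_in = cpu_samples[-in_minutes:]
    # Scale out only if every sample in the window breaches the threshold.
    if len(recent_out) == out_minutes and all(c > out_threshold for c in recent_out):
        desired += 1
    elif len(recent_in) == in_minutes and all(c < in_threshold for c in recent_in):
        desired -= 1
    # Never leave the [minimum, maximum] range.
    return max(minimum, min(maximum, desired))


sustained_high = decide([50, 75, 80, 90, 85, 72], desired=2, minimum=2, maximum=6)
brief_dip = decide([80, 80, 80, 60, 80], desired=2, minimum=2, maximum=6)
sustained_low = decide([20] * 10, desired=3, minimum=2, maximum=6)
at_floor = decide([20] * 10, desired=2, minimum=2, maximum=6)
print(sustained_high, brief_dip, sustained_low, at_floor)
```

Requiring the whole window to breach the threshold is what keeps a single noisy sample from triggering a scaling event.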

Scaling must be measured, not emotional.

6. Multi-AZ Scaling

True resilience requires:

Instances across Availability Zones
Load balancer spanning AZs
ASG distributing evenly

Single-AZ scaling is partial resilience.

An AZ failure should not cause an outage.
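
Even distribution and its effect on AZ failure can be checked with simple arithmetic. The sketch below uses hypothetical zone names and helper functions (spread, survives_zone_loss), and assumes the application needs a fixed number of instances to carry its load.

```python
def spread(desired, zones):
    """Distribute instances as evenly as possible across zones."""
    base, extra = divmod(desired, len(zones))
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


def survives_zone_loss(placement, needed):
    """True if losing any single zone still leaves at least `needed` instances."""
    total = sum(placement.values())
    return all(total - lost >= needed for lost in placement.values())


multi_az = spread(5, ["az-a", "az-b", "az-c"])
single_az = spread(5, ["az-a"])

print(multi_az, survives_zone_loss(multi_az, needed=3))
print(single_az, survives_zone_loss(single_az, needed=3))
```

With the same five instances, the three-zone placement keeps enough capacity after losing any one zone, while the single-zone placement loses everything.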

7. Cost Awareness in Scaling

Scaling increases cost.

Each additional instance adds:

Compute cost
Network cost
Storage cost

Poor scaling policy leads to:

Overprovisioning
Budget shock
Underutilized infrastructure

Architecture must balance:

Performance
Availability
Cost
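
A rough worked example of the trade-off, compute cost only. Every number here is an assumption chosen for illustration: the hourly rate is hypothetical, not a real price, and the traffic pattern (a 3-hour daily peak) is invented. Network and storage costs are left out.

```python
HOURLY_RATE_CENTS = 10   # hypothetical price per instance-hour, in cents
HOURS_PER_MONTH = 720    # 30-day month
DAYS = 30


def monthly_cost_cents(instance_hours):
    return instance_hours * HOURLY_RATE_CENTS


# Option 1: run 6 instances all month, sized for peak.
fixed_peak = monthly_cost_cents(6 * HOURS_PER_MONTH)

# Option 2: run 2 baseline instances all month, plus 4 more for 3 peak hours a day.
elastic = monthly_cost_cents(2 * HOURS_PER_MONTH + 4 * 3 * DAYS)

print(f"fixed at peak: ${fixed_peak / 100:.2f}")
print(f"elastic:       ${elastic / 100:.2f}")
```

Under these assumptions the elastic option costs well under half as much, which is the argument for scaling in as deliberately as scaling out.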

8. Failure Simulation in Scaling

Test resilience intentionally.

Scenario 1 – Instance Failure

Terminate one instance.

Observe:

ASG launches replacement
ALB removes failed instance
Traffic continues

If traffic stops, the design is flawed.

Scenario 2 – Traffic Spike

Simulate load:

Use a load-testing tool
Generate HTTP requests
Observe scaling event

Watch:

Instance count increase
CPU metrics change
Response time stabilize

Elasticity should respond automatically.
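
Before running the real test, it can help to know what a healthy response looks like. The toy simulation below is a sketch under strong assumptions: load divides evenly across instances, each instance handles a fixed number of requests per minute, new instances serve traffic one minute after launch, and the simulate function with its defaults is hypothetical.

```python
def simulate(load_per_minute, capacity_per_instance=100, start=2, maximum=6,
             out_threshold=70.0, sustain=3):
    """Return a list of (instance_count, cpu_percent) per simulated minute."""
    instances = start
    streak = 0  # consecutive minutes above the scale-out threshold
    history = []
    for load in load_per_minute:
        cpu = min(100.0, 100.0 * load / (instances * capacity_per_instance))
        history.append((instances, round(cpu, 1)))
        streak = streak + 1 if cpu > out_threshold else 0
        if streak >= sustain and instances < maximum:
            instances += 1  # takes effect from the next minute
            streak = 0
    return history


# Two quiet minutes, then a sustained spike to four times the load.
history = simulate([100] * 2 + [400] * 16)
for minute, (count, cpu) in enumerate(history):
    print(f"minute {minute:2d}: {count} instances, CPU {cpu}%")
```

The instance count rises in steps while CPU stays saturated, and CPU drops below the threshold only once enough capacity is in service. A real run should show the same shape, with longer delays for instance boot and health checks.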

9. Common Scaling Mistakes

Scaling only vertically
No health checks configured
Single instance in ASG
Misconfigured scaling thresholds
Ignoring cost implications
No monitoring during scaling

Scaling must be validated, not assumed.

10. Lab Assignment

Design and deploy:

Application Load Balancer
Target group with health check
Auto Scaling Group (min 2 instances)
Multi-AZ deployment

Simulate:

Kill one instance.
Generate traffic spike.
Observe scaling.
Record time to recover.
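
Time to recover can be computed from a simple probe log rather than estimated by eye. The sketch below assumes you record (seconds since start, healthy instance count) at a fixed interval; the time_to_recover function and the sample data are hypothetical.

```python
def time_to_recover(probes, target):
    """probes: (timestamp_seconds, healthy_count) pairs, sorted by time.

    Returns seconds from the first degraded probe to the first probe back at
    target, or None if the fleet never degraded or never recovered.
    """
    degraded_at = None
    for timestamp, healthy in probes:
        if healthy < target and degraded_at is None:
            degraded_at = timestamp
        elif healthy >= target and degraded_at is not None:
            return timestamp - degraded_at
    return None


# Invented sample: one of two instances is killed shortly before t=30.
probes = [(0, 2), (30, 1), (60, 1), (90, 1), (120, 2), (150, 2)]
recovery_seconds = time_to_recover(probes, target=2)
print(recovery_seconds)
```

The result is only as precise as the probe interval, so a shorter interval gives a tighter measurement.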

Document:

How traffic is routed.
How instance replacement occurs.
What metric triggered scaling.
What cost impact scaling had.

If you cannot trace scaling behavior, you do not control elasticity.

11. Production Reflection

Consider:

What happens if the scaling threshold is too low?
What happens if the health check path is wrong?
How do you prevent premature scale-in during brief traffic dips?
How do you protect the database during scaling?

Scaling must coordinate across tiers.

Load balancing is only one layer.

Module Completion Criteria

You are ready for Module 4 when:

The load balancer distributes traffic across AZs.
Auto Scaling replaces failed instances.
The scaling policy reacts predictably.
You understand the cost impact of elasticity.
You can simulate a controlled failure.

→ Module 4 – Infrastructure as Code (Terraform)