
Scaling Is Architecture, Not Instinct

Adding more servers is not scaling.

Scaling means:

  • Handling traffic spikes
  • Replacing failed instances
  • Maintaining availability
  • Controlling cost growth

Elastic systems respond automatically.

Manual scaling is operational debt.


1. Why Load Balancing Exists

Without a load balancer:

  • Single instance failure = outage
  • Traffic unevenly distributed
  • Scaling becomes manual
  • SSL termination is inconsistent

A load balancer introduces:

  • Traffic distribution
  • Health-based routing
  • Centralized entry point

It becomes the front door of your application.
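The distribution role can be sketched as simple round-robin over a fixed target list (a minimal illustration of the concept, not how the ALB is actually implemented; instance names are hypothetical):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy load balancer: rotate requests across a fixed target list."""

    def __init__(self, targets):
        self._cycle = cycle(targets)

    def route(self):
        # Pick the next target in rotation.
        return next(self._cycle)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([lb.route() for _ in range(6)])
# → ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3']
```

Real load balancers layer health checks and connection tracking on top of this basic rotation.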


2. Application Load Balancer (ALB)

The ALB operates at Layer 7 (HTTP/HTTPS).

Responsibilities:

  • Distribute traffic across instances
  • Perform health checks
  • Terminate SSL
  • Route based on path or host

The ALB should reside in public subnets.

Application instances remain private.


Traffic Flow Model

Internet
    ↓
Application Load Balancer
    ↓
Target Group (App Instances)
    ↓
Private Subnets

If app instances are public, segmentation is broken.


3. Target Groups

Health-Aware Distribution

Target groups define:

  • Which instances receive traffic
  • Which port they listen on
  • Health check path

Example health check:

GET /health

If an instance fails its health check:

  • It is removed from rotation
  • No traffic is sent
  • User impact is reduced

Health checks prevent cascading failures.
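The removal logic can be sketched as a counter of consecutive failures, similar to a target group's unhealthy threshold (the threshold value here is an assumption for illustration):

```python
class HealthTracker:
    """Mark a target out of rotation after N consecutive failed checks."""

    def __init__(self, unhealthy_threshold: int = 3):
        self.unhealthy_threshold = unhealthy_threshold
        self.consecutive_failures = 0

    def record(self, check_passed: bool) -> bool:
        """Record one health check result; return True if still in rotation."""
        if check_passed:
            self.consecutive_failures = 0  # a success resets the streak
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures < self.unhealthy_threshold

t = HealthTracker(unhealthy_threshold=2)
print(t.record(False))  # → True: one failure, still in rotation
print(t.record(False))  # → False: threshold reached, removed
```

Requiring consecutive failures avoids ejecting a target on a single transient error.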


4. Horizontal vs Vertical Scaling

Vertical Scaling

  • Increase instance size
  • More CPU, more RAM
  • Limited by instance type

Risk:
A single large instance becomes a single point of failure.


Horizontal Scaling

  • Add more instances
  • Distribute load
  • Replace failed nodes automatically

Preferred approach for cloud-native systems.

Resilience increases with distribution.


5. Auto Scaling Groups (ASG)

ASG manages:

  • Desired capacity
  • Minimum capacity
  • Maximum capacity
  • Replacement of unhealthy instances

Scaling policies can trigger based on:

  • CPU utilization
  • Request count
  • Custom metrics

Example policy:

Scale out if CPU > 70% for 5 minutes.

Scale in if CPU < 30% for 10 minutes.

Scaling must be measured — not emotional.
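The example policy above can be expressed as a decision function over a window of per-minute CPU samples (thresholds and windows are the ones stated in the policy; the function itself is a sketch, not an AWS API):

```python
def scaling_decision(cpu_samples, high=70.0, low=30.0,
                     out_window=5, in_window=10):
    """Return 'scale_out', 'scale_in', or 'hold' from per-minute CPU samples.

    Scale out if CPU > `high` for the last `out_window` minutes;
    scale in if CPU < `low` for the last `in_window` minutes.
    """
    if len(cpu_samples) >= out_window and all(c > high for c in cpu_samples[-out_window:]):
        return "scale_out"
    if len(cpu_samples) >= in_window and all(c < low for c in cpu_samples[-in_window:]):
        return "scale_in"
    return "hold"

print(scaling_decision([65, 72, 75, 80, 82, 85]))  # → scale_out
print(scaling_decision([20] * 10))                 # → scale_in
print(scaling_decision([50, 50, 50]))              # → hold
```

The asymmetric windows (5 minutes out, 10 minutes in) make the system quick to add capacity and deliberately slow to remove it.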


6. Multi-AZ Scaling

True resilience requires:

  • Instances across Availability Zones
  • Load balancer spanning AZs
  • ASG distributing evenly

Single-AZ scaling is partial resilience.

AZ failure should not cause outage.
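Even distribution can be sketched as dividing desired capacity across zones as evenly as possible (zone names are illustrative; real ASGs also rebalance after AZ disruption):

```python
def spread_across_azs(desired: int, azs: list) -> dict:
    """Assign desired capacity as evenly as possible across AZs."""
    base, extra = divmod(desired, len(azs))
    # The first `extra` zones receive one additional instance.
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}

print(spread_across_azs(5, ["us-east-1a", "us-east-1b", "us-east-1c"]))
# → {'us-east-1a': 2, 'us-east-1b': 2, 'us-east-1c': 1}
```

With this spread, losing any single zone removes at most two of five instances.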


7. Cost Awareness in Scaling

Scaling increases cost.

Each additional instance adds:

  • Compute cost
  • Network cost
  • Storage cost

Poor scaling policy leads to:

  • Overprovisioning
  • Budget shock
  • Underutilized infrastructure

Architecture must balance:

Performance
Availability
Cost
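The trade-off can be made concrete with a back-of-envelope estimate (the hourly rate below is a placeholder for illustration, not a real price):

```python
HOURS_PER_MONTH = 730  # average hours in a month

def monthly_compute_cost(instance_count: int, hourly_rate: float) -> float:
    """Estimate monthly compute cost; network and storage are extra."""
    return instance_count * hourly_rate * HOURS_PER_MONTH

# Assumed rate of $0.05/hour per instance:
print(monthly_compute_cost(4, 0.05))  # → 146.0
```

A scaling policy that leaves two unnecessary instances running doubles this figure silently.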


8. Failure Simulation in Scaling

Test resilience intentionally.

Scenario 1 – Instance Failure

Terminate one instance.

Observe:

  • ASG launches replacement
  • ALB removes failed instance
  • Traffic continues

If traffic stops, design is flawed.
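The expected behavior can be modeled as a toy simulation: when an instance terminates, the group launches replacements back up to desired capacity (this is an illustration of the reconciliation idea, not the AWS API):

```python
import itertools

class AutoScalingGroup:
    """Toy ASG: keeps the instance set at desired capacity."""

    def __init__(self, desired: int):
        self.desired = desired
        self._ids = itertools.count(1)
        self.instances = {f"i-{next(self._ids)}" for _ in range(desired)}

    def terminate(self, instance_id: str):
        self.instances.discard(instance_id)
        self._reconcile()

    def _reconcile(self):
        # Launch replacements until capacity matches desired.
        while len(self.instances) < self.desired:
            self.instances.add(f"i-{next(self._ids)}")

asg = AutoScalingGroup(desired=2)
victim = next(iter(asg.instances))
asg.terminate(victim)
print(len(asg.instances))        # → 2: a replacement was launched
print(victim in asg.instances)   # → False
```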


Scenario 2 – Traffic Spike

Simulate load:

  • Use stress tool
  • Generate HTTP requests
  • Observe scaling event

Watch:

  • Instance count increase
  • CPU metrics change
  • Response times stabilize

Elasticity should respond automatically.
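The spike scenario can be sketched end to end: rising CPU samples cross the threshold and the instance count steps up (thresholds, window, and counts are illustrative assumptions):

```python
def simulate_spike(cpu_timeline, instances=2, max_instances=6,
                   high=70.0, window=5):
    """Step through per-minute CPU samples; add an instance whenever the
    last `window` samples all exceed `high`, up to `max_instances`."""
    history = []
    for cpu in cpu_timeline:
        history.append(cpu)
        if (len(history) >= window
                and all(c > high for c in history[-window:])
                and instances < max_instances):
            instances += 1
            history.clear()  # reset the window after a scaling event
    return instances

# CPU climbs during a spike, then stays high:
print(simulate_spike([40, 55, 75, 80, 85, 90, 95, 95, 95, 95]))  # → 3
```

Resetting the window after each event models a cooldown: one sustained breach triggers one scaling action, not one per minute.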


9. Common Scaling Mistakes

  • Scaling only vertically
  • No health checks configured
  • Single instance in ASG
  • Misconfigured scaling thresholds
  • Ignoring cost implications
  • No monitoring during scaling

Scaling must be validated, not assumed.


10. Lab Assignment

Design and deploy:

  • Application Load Balancer
  • Target group with health check
  • Auto Scaling Group (min 2 instances)
  • Multi-AZ deployment

Simulate:

  1. Kill one instance.
  2. Generate traffic spike.
  3. Observe scaling.
  4. Record time to recover.

Document:

  • How traffic is routed.
  • How instance replacement occurs.
  • What metric triggered scaling.
  • What cost impact scaling had.

If you cannot trace scaling behavior, you do not control elasticity.


11. Production Reflection

Consider:

  • What happens if scaling threshold is too low?
  • What happens if health check path is wrong?
  • How do you prevent scale-in during traffic dips?
  • How do you protect database during scaling?

Scaling must coordinate across tiers.

Load balancing is only one layer.


Module Completion Criteria

You are ready for Module 4 when:

  • Load balancer distributes traffic across AZs.
  • Auto Scaling replaces failed instances.
  • Scaling policy reacts predictably.
  • You understand cost impact of elasticity.
  • You can simulate controlled failure.

Next:

→ Module 4 – Infrastructure as Code (Terraform)
