Design: A Case Study (cont.)

    // Local delivery or routing
    if (LocalDelivery( … )) {
        Deliver( … );
    } else {
        Route( … );
    }

    // Send SMTP response through Socket
    if (WriteFile( … )) {
        // various housekeeping skips…
    }
Traditional Thread Architecture
[Diagram: SMTP Request → Receiver (Socket) → Worker Thread, Worker Thread, (other workers)]
• 1 thread to receive and dispatch SMTP requests
• 64 worker threads doing:
– Parse SMTP headers
– Parse SMTP bodies
– Local delivery
– Routing
– All in the same thread, sequentially…
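The receiver-plus-worker-pool layout above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses POSIX threads rather than whatever threading API the original server uses, requests are simulated integers instead of sockets, and every name (`Request`, `Queue`, `run_server`, the four stage functions) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define QUEUE_CAPACITY 128

/* Hypothetical request record; the real server's request type is not shown. */
typedef struct { int id; int is_local; } Request;

typedef struct {
    Request items[QUEUE_CAPACITY];
    int head, count, closed;
    int handled;                      /* requests fully processed by workers */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} Queue;

/* Placeholder stages standing in for the four steps listed on the slide. */
static void parse_headers(Request *r) { (void)r; }
static void parse_body(Request *r)    { (void)r; }
static void deliver(Request *r)       { (void)r; }
static void route(Request *r)         { (void)r; }

/* Blocks while the queue is full, then appends one request. */
static void queue_push(Queue *q, Request r)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAPACITY)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = r;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Returns 1 and fills *out, or 0 once the queue is closed and drained. */
static int queue_pop(Queue *q, Request *out)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count == 0) {
        pthread_mutex_unlock(&q->lock);
        return 0;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return 1;
}

/* Each worker runs every stage for one request, sequentially, in its own thread. */
static void *worker_main(void *arg)
{
    Queue *q = arg;
    Request r;
    while (queue_pop(q, &r)) {
        parse_headers(&r);
        parse_body(&r);
        if (r.is_local) deliver(&r); else route(&r);
        pthread_mutex_lock(&q->lock);
        q->handled++;
        pthread_mutex_unlock(&q->lock);
    }
    return NULL;
}

/* The calling thread plays the receiver: it dispatches num_requests simulated
   requests to the workers and returns how many were handled (-1 on setup failure). */
int run_server(int num_workers, int num_requests)
{
    assert(num_workers > 0 && num_requests >= 0);
    Queue q = { .head = 0 };
    pthread_t *workers = malloc(sizeof *workers * (size_t)num_workers);
    if (workers == NULL) return -1;
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);

    int started = 0;
    while (started < num_workers &&
           pthread_create(&workers[started], NULL, worker_main, &q) == 0)
        started++;

    if (started > 0)
        for (int i = 0; i < num_requests; i++)
            queue_push(&q, (Request){ i, i % 2 });

    /* Close the queue so idle workers wake up and exit. */
    pthread_mutex_lock(&q.lock);
    q.closed = 1;
    pthread_cond_broadcast(&q.not_empty);
    pthread_mutex_unlock(&q.lock);

    for (int i = 0; i < started; i++)
        pthread_join(workers[i], NULL);
    int handled = (started > 0) ? q.handled : -1;

    pthread_cond_destroy(&q.not_full);
    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    free(workers);
    return handled;
}
```

Calling `run_server(64, 1000)` mirrors the slide's configuration of one dispatcher and 64 workers. On older toolchains the program may need to be built with `-pthread`. Note that the only shared state is the queue; each request is carried through all four stages by a single worker, which is the property the slide calls out.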
The Evolution of Hardware
[Figure: relative performance (latency) of CPU, RAM, and disk over time, 1992 to 2000; vertical axis from 0 to 800]
Bridge the Gap - Caches
• CPU L1 cache
– 8K instruction cache, plus
– 8K data cache
– Closely coupled
– 0.333 clock/instruction (in practice, 1 CPI)
• CPU L2 cache
– 512K static RAM
– Coupled with a full clock-speed, 64-bit cache bus
– Latency: 4-1-1-1 (7 clocks/instruction)
• I/O caches (RAM-based file caches)
The Price of Failure
• Let's look at the costs:
– Assume 1 second to zero a register
– L1 cache hit - 1 second (1x)
– L2 cache hit - 4 seconds (plus 3 seconds extra work - 7x)
– RAM hit - 25-150 seconds (24x-150x)
– Disk or net hit - 3 weeks (2,000,000x)
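The "3 weeks" figure follows directly from the scale on this slide: if a register operation is stretched to 1 second, a cost of N register operations becomes N seconds. A small sketch of that arithmetic is below; the helper names `scaled_minutes` and `scaled_weeks` are made up for illustration, and the multipliers are the slide's own.

```c
#include <assert.h>

/* Slide's scale: zeroing a register takes 1 "second", so an operation that
   costs N register-times takes N seconds on the human scale. */
#define SECONDS_PER_MINUTE 60.0
#define SECONDS_PER_WEEK   (7.0 * 24.0 * 60.0 * 60.0)   /* 604,800 */

/* Relative cost (multiples of a register operation) expressed in minutes. */
double scaled_minutes(double relative_cost)
{
    assert(relative_cost >= 0.0);
    return relative_cost / SECONDS_PER_MINUTE;
}

/* Relative cost (multiples of a register operation) expressed in weeks. */
double scaled_weeks(double relative_cost)
{
    assert(relative_cost >= 0.0);
    return relative_cost / SECONDS_PER_WEEK;
}
```

With these helpers, `scaled_weeks(2000000.0)` comes to about 3.3 weeks for a disk or network hit, and `scaled_minutes(150.0)` gives 2.5 minutes for the slow end of a RAM hit.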