





#### Goals

- I want my memory lightning fast
- I want my memory to be gigantic in size
- Register access viewpoint
  - data access as fast as HW register
  - data size as large as memory
- Memory access viewpoint
  - data access as fast as memory
  - data size as large as disk



HW solution



HW help for SW solution





#### Principle of locality (paikallisuus)

- In any given time period, memory references occur only to a small subset of the whole address space
- = The reason why memory hierarchies work



- Average cost is close to the cost of small data set
- How to determine data for that small set?
- How to keep track of it?

Sta06 Fig 4.2



### **Principle of locality**

- In any given time period
  - memory references occur only to a small subset of the whole address space
- Temporal locality (ajallinen)
  - it is likely that a data item referenced a <u>short time ago</u> will be referenced <u>again</u> soon
- Spatial locality (alueellinen)
  - it is likely that a data items <u>close</u> to the one referenced a short time ago will be referenced soon

MEM:





# **Computer Organization II**

# Cache





### Cache Memory (välimuisti)

- How to access main memory as fast as registers?
- Locality → Use (CPU) cache!
  - Keep most probably referenced data in fast cache close to processor, and rest in memory
  - Most of data accesses only to cache
    - hit ratio 0.9-0.99
  - Cache is much smaller than main memory
  - Cache is (much) more expensive (per byte) than memory









#### **Cache Design**

Cache Size

**Mapping Function** Direct Associative Set Associative Line Size

Replacement Algorithm

Number of caches Least recently used (LRU) Single or two level First in first out (FIFO) Unified or split Least frequently used (LFU)

Write Policy

Write through

Write back

Write once

Random

■ Cache Size & Line Size

■ Many blocks help for temporal locality

■ Large blocks help for spatial locality

■ Larger cache is slower

■ Multi-level cache

Typical sizes:

L1: 8 KB - 64 KB

L2: 256KB - 8 MB

(Sta06 Table 4.2)



#### **Mapping**

- Which block contains the memory location?
- Is the block in cache?
- Where is it located?
- Solutions
  - direct mapping (suora kuvaus)
  - fully associative mapping (täysin assosiatiivinen)
  - set associative mapping (joukkoassosiatiivinen)

Cache simulation tools:

http://www.ecs.umass.edu/ece/koren/architecture/Cache/frame0.htm













#### **Fully Associative Mapping**

- Lots of circuits
  - tag fields are long wasted space?
  - each cache line tag must be compared <u>parallelly</u>
  - with the memory address tag
    - lots of wires, comparison circuits
    - large surface area on chip
- Final comparison "or" has large gate delay
  - did any of these 64 comparisons match?
    - log2(64) = 6 levels of binary OR-gates
  - how about 262144 comparisons?
    - 18 levels?
- ⇒ Can use it only for small caches









#### **Set Associative Mapping**

- Set associative cache with set size k=2
  - = 2-way cache (common)
- Degree of associativity = nbr of blocks in a set = v
  - Large degree of associativity?
    - More data items in one set
    - Less "collisions" within set
    - Final comparison (matching tags?) gate delay?
  - Maximum (nr of cache lines)
    - ⇒ fully associative mapping

Whole cache is one set!

■ Minimum (1)

Each cache line is a set!

⇒ direct mapping



#### **Cache Replacement Algorithm**

- Which cache block to replace to make room for new block from memory?
- Direct mapping: trivial
- First-In-First-Out (FIFO)?
- Least-Frequently-Used (LFU)?
- Random?
- Which one is best / possible?
  - Chip area?
  - Fast? Easy to implement?



### Cache Write Policy - memory writes?

- Write through (*läpikirjoittava*)
  - Each write goes always to cache and memory
  - Each write is a cache miss!
- Write back (lopuksi/takaisin kirjoittava)
  - Each write goes only to cache
  - Write cache block back to memory
    - only when it is replaced in cache

      A bit set
  - Memory may have stale (old) data
  - cache coherence problem (yhdenmukaisuus, yhtäpitävyys)
- Write once ("vain kerran kirjoittava?")
  - Write-invalidate Snoopy-cache coherence protocol for multiprocessors
  - Write invalidates data in other caches
  - Write to memory at replacement time, or when some other cache needs it (has read/write miss)

Coherence

More users of the same data:

memory valid? cache valid?

- multiple

processors

with own caches





## **Types and Number of Caches**

- Same cache for <u>data</u> and <u>code</u>, or not?
  - Data references and code references behave differently
- Unified vs. split cache (yhdistetty/erilliset)
  - split cache: can optimise structure separately for data and code

    Trend towards split caches: Pentium, Power PC,... (instruction pipelining)
- One cache too large for best results
- Multiple levels of caches
  - L1 on same chip as CPU
  - L2 on same package or chip as CPU
    - older systems: same board
  - L3 on same board as CPU







### **Main Memory Types**

| Метогу Туре                            | Category           | Erasure                   | Write Mechanism | Volatility  |
|----------------------------------------|--------------------|---------------------------|-----------------|-------------|
| Random-access<br>memory (RAM)          | Read-write memory  | Electrically, byte-level  | Electrically    | Volatile    |
| Read-only<br>memory (ROM)              | - Read-only memory | Not possible              | Masks           |             |
| Programmable<br>ROM (PROM)             |                    |                           | Electrically    | Nonvolatile |
| Erasable PROM<br>(EPROM)               | Read-mostly memory | UV light, chip-level      |                 |             |
| Electrically Erasable<br>PROM (EEPROM) |                    | Electrically, byte-level  |                 |             |
| Flash memory                           |                    | Electrically, block-level |                 |             |

(Sta06 Table 5.1)

- Random access semiconductor memory
  - Direct access to each memory cell
  - Access time same for all cells



#### **RAM**

■ Dynamic RAM, DRAM

Analog: Charge on capacitors

- Periodic refreshing required
- Refresh required after read
- Simpler, slower, denser, bigger (bytes per chip)
- Access time ~ 60 ns
- Main memory? (early systems)
- Static RAM, SRAM

Digital: flip-flop gates

- No periodic refreshing needed
- Data remains until power is lost
- More complex (more chip area/byte), faster, smaller
- Access time ~ 2-5 ns
- Level 2 cache?







#### **SDRAM (Synchronous DRAM)**

- CPU clock synchronizes also the bus
  - Runs on higher clock speeds than ordinary DRAM
  - CPU knows how long it takes to make a reference,
  - can do other work while waiting
- 16 bits in parallel
  - Access 4 DRAMs (4 bits each) in parallel
  - Access time ~ 18 ns, transfer rate ~ 1.3 GB/s
- DDR SDRAM, double data rate
  - Current main memory technology
  - Supports transfers both on rising and falling edge of the clock cycle
  - Consumes less power
  - Access time ~ 12 ns, transfer rate ~ 3.2 GB/s





#### Flash memory

- Based on transistors that are separated by a thin oxide layer
  - Flash cell is analog, not digital storage: uses different charge levels to store 2 (or more) bits in each cell
- Non-volatile, data remains with power off
  - Electrical erasing in blocks = "flash"
  - Slow to write
  - Access time ~ 50 ns
- Used as a solid state storage
  - No moving parts
  - FlashBIOS in PC's, USB-memory
  - In phones, digital cameras, hand-held devices,....





#### **MRAM**

- Magnetoresistive Random Access Memory (MRAM)
  - Data stored with magnetic fields on two plates
  - Magnetic field directions determine bit value
- Non-volatile, data remains with power off
  - Fast to read/write
  - No upper limit for write counts (Flash has upper limit)
  - Access time comparable to DRAM
  - Almost as fast as SRAM
- Future open
  - Small market share now
  - Expensive now (2006: \$25 4Mbit)
  - Still under development
  - May replace flash in a few years
  - May replace SRAM later on
  - May replace DRAM and become "universal memory"



http://www.research.ibm.com/journal/rd/501/maffitt.html



# Kertauskysymyksiä/Review questions

- Memory hierarcy and principle of locality?
- Different ways to use locality in cache solutions?
- Differences of associative and set associative mappings?
- Why to have separate caches for instructions and data?
- Muistihierarkia ja paikallisuus?
- ☐ Millä tavoin paikallisuutta huomiodaan välimuistiratkaisussa?
- Assosiatiivisen ja joukkoassosiatiivisen kuvauksen erot?
- Miksi käskyille oma välimuisti ja datalle oma?