Thomas Gassmann (ETH)

Reverse engineering set-associative caches

2023-06-14 - 6 min read

Digital Design and Computer Architecture has been one of the more interesting courses I took at ETH this semester. Even though there was a lot of reading, learning how a computer works from the ground up and building a simple single-cycle MIPS processor has been a very rewarding experience (I am not including Vivado here!).

While solving old exams and the not-so-optional optional homeworks I came across some of the "cache reverse-engineering" tasks. In my opinion, they can be a bit confusing, especially in the beginning, because the search space of cache parameters is quite large and it's not always immediately obvious which configuration to try first. The main purpose of this post is therefore to provide a basic cache simulator where one can input a cache access sequence and get immediate feedback on e.g. the hit rate. I will only give a very brief introduction to how set-associative caches work and to how the cache reverse-engineering tasks are structured.

Set-associative caches

Set-associative caches are a compromise between direct-mapped and fully associative cache designs. They divide the cache into multiple sets, each containing a fixed number of cache lines or ways (called the associativity). Each cache line within a set can hold a memory block from main memory. The number of ways (i.e. the associativity) in a set is typically a power of two.

When a memory request is made by the processor, the address gets split into three parts: the tag, the index, and the offset. The index bits select the set, the offset bits determine the position within a cache line/block and the tag differentiates blocks that are mapped to the same cache set.

During a lookup the cache controller can perform a parallel search across all ways within the selected set using the index bits. It compares the tag bits of each cache line with the tag bits extracted from the memory address. If a match is found, we have a cache hit.
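The address split described above can be sketched in a few lines of Python (the function name is mine, and power-of-two sizes are assumed throughout):

```python
def split_address(addr: int, num_sets: int, block_size: int) -> tuple[int, int, int]:
    """Split a byte address into (tag, index, offset).

    Assumes num_sets and block_size are powers of two, as is typical
    for the caches discussed here.
    """
    offset_bits = block_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_size - 1)                 # position within the block
    index = (addr >> offset_bits) & (num_sets - 1)   # selects the set
    tag = addr >> (offset_bits + index_bits)         # distinguishes blocks mapped to the same set
    return tag, index, offset
```

For example, with 4 sets and 16-byte blocks, address 25 (binary 11001) splits into tag 0, index 1 and offset 9.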

There are also different strategies determining how cache conflicts are handled. Here we will consider LRU and FIFO:

  • LRU: the cache line (in the corresponding set) that hasn't been accessed for the longest time is evicted
  • FIFO: the cache line that was first introduced to the set is evicted

The selected cache line is then overwritten with the new data fetched from the main memory.
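The two replacement policies differ only in whether a hit refreshes a line's position in the eviction order. A minimal sketch for a single set, modeled as an ordered dict of tags (my own helper, not code from the post's simulator):

```python
from collections import OrderedDict

def access_set(cache_set: OrderedDict, tag: int, ways: int, policy: str) -> bool:
    """One access to a single cache set; returns True on a hit.

    cache_set maps tag -> None, kept in oldest-first order. With "lru"
    a hit moves the line to the back; with "fifo" the insertion order
    is left untouched, so only arrival time matters for eviction.
    """
    if tag in cache_set:
        if policy == "lru":
            cache_set.move_to_end(tag)  # mark as most recently used
        return True
    if len(cache_set) >= ways:
        cache_set.popitem(last=False)   # evict the oldest entry
    cache_set[tag] = None
    return False
```

For the tag sequence 0, 1, 0, 2 in a two-way set, LRU ends up holding {0, 2} (line 1 is evicted because 0 was re-used), while FIFO ends up holding {1, 2} (line 0 arrived first).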

To illustrate this, consider the following figure from the Harris and Harris book "Digital Design and Computer Architecture".

[Figure: two-way set-associative cache, from Harris & Harris]

The above figure shows a two-way set-associative cache of size 8. One clearly sees the four sets, each having two entries (the associativity). Each cache line also has a V bit determining whether something is currently stored in the block. When a cache lookup is performed, the two set bits index into the corresponding set while the tag bits are used to compare which of the two cache blocks in the set matches (or whether any of them matches at all). Depending on the Hit1 or Hit0 signal we can then use a multiplexer to select the correct cache block from the set (or fetch it from memory if Hit is low).

A good overview of the modular arithmetic involved in set-associative caches can also be found here, but in short the relationship between associativity A, cache size C, block size B and number of sets S is given by:

S = \frac{C}{A \cdot B}

Since the number of sets S is uniquely determined by C, A and B, these are the only variables one can specify in the simulator. To learn more about set-associative caches I recommend reading the Harris & Harris book or watching the lecture (a very high-bandwidth lecture, that is) here on YouTube.
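As a quick worked example of the formula (the function name and the concrete sizes are mine, chosen only for illustration):

```python
def num_sets(cache_size: int, associativity: int, block_size: int) -> int:
    """Number of sets S = C / (A * B); all sizes in bytes."""
    assert cache_size % (associativity * block_size) == 0
    return cache_size // (associativity * block_size)

# e.g. a 32 KiB, 4-way cache with 64-byte blocks:
print(num_sets(32 * 1024, 4, 64))  # 128 sets
```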

Reverse engineering the cache configuration

Now for the reverse-engineering part. The basic idea is that given a sequence of accesses one can reverse-engineer the above cache parameters. Let's say you are given the following sequence of cache accesses:

Access number                      #0   #1   #2   #3   #4     #5    #6    #7
address                             0   16   24   25   1024   255   110   305
block number (for block size 16)    0    1    1    1     64    15     6    19
block number (for block size 32)    0    0    0    0     32     7     3     9

Also assume that the cache is empty before this access pattern is executed. Let's say we have a hit rate of 2/8 after the sequence is executed. What can we say about e.g. the block size?

Assume our options are B ∈ {16, 32}. If B = 32, #0 would miss, but #1, #2 and #3 would hit, giving us a hit rate greater than (or equal to) 3/8. Therefore B = 32 is not the configuration we are looking for. On the other hand, if B = 16, we get exactly two hits (accesses #2 and #3), which is what we're looking for. So we've just determined the cache block size based on the access sequence and the corresponding hit rate above.
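The block-number rows of the table can be reproduced directly, since a block number is just the address divided by the block size (Python sketch using the addresses above):

```python
accesses = [0, 16, 24, 25, 1024, 255, 110, 305]

def block_numbers(addresses: list[int], block_size: int) -> list[int]:
    # the block number is simply the address divided by the block size
    return [a // block_size for a in addresses]

print(block_numbers(accesses, 16))  # [0, 1, 1, 1, 64, 15, 6, 19]
print(block_numbers(accesses, 32))  # [0, 0, 0, 0, 32, 7, 3, 9]
```

Two accesses to the same block number can hit; with B = 32 the first four accesses all fall into block 0, which is why that choice yields at least three hits.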

The same can also be done for the associativity, the (total) cache size and the replacement policy. Since I just wanted to give the basic idea of how properties of a cache can be determined, I won't describe how to do this for the other cache configuration properties here.

To learn more about how to determine them, I personally found the problem solving sessions quite useful. Some useful resources were also shared on the ETH D-INFK Discord server.

The simulator

Unfortunately, to be entirely sure that a given cache configuration matches the access pattern and its associated hit rate, a lot of probing can be required. This is especially true because different parameters can influence each other. The cache simulator below can be used to play around with different configurations and see whether a given access pattern produces the desired hit rate:

[Interactive cache simulator: shows the running hit rate plus the tag, block and valid (V) bit of every cache line.]

The above cache simulator is quite basic and I didn't thoroughly test its functionality, but hopefully it helps visualize some of the tasks that were given in the optional homeworks. There's also a nice (comparatively more feature-rich) cache simulator from the University of Washington that I discovered after writing the above cache simulator.
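For readers who prefer code over a widget, the core of such a simulator fits in a short Python class. This is a sketch of the same idea, not the simulator embedded in this post; byte sizes and power-of-two parameters are assumed:

```python
class Cache:
    """Toy set-associative cache model with LRU or FIFO replacement."""

    def __init__(self, size: int, associativity: int, block_size: int,
                 policy: str = "lru"):
        self.num_sets = size // (associativity * block_size)  # S = C / (A * B)
        self.assoc = associativity
        self.block_size = block_size
        self.policy = policy
        # each set is a list of tags, kept oldest-first
        self.sets = [[] for _ in range(self.num_sets)]
        self.hits = self.accesses = 0

    def access(self, addr: int) -> bool:
        self.accesses += 1
        block = addr // self.block_size
        index = block % self.num_sets   # which set the block maps to
        tag = block // self.num_sets    # distinguishes blocks in that set
        lines = self.sets[index]
        if tag in lines:
            self.hits += 1
            if self.policy == "lru":
                lines.remove(tag)
                lines.append(tag)       # refresh recency on a hit
            return True
        if len(lines) >= self.assoc:
            lines.pop(0)                # evict oldest (LRU age or FIFO arrival)
        lines.append(tag)
        return False

# replay the example sequence on a 128-byte, 2-way cache with 16-byte blocks
cache = Cache(size=128, associativity=2, block_size=16)
for addr in [0, 16, 24, 25, 1024, 255, 110, 305]:
    cache.access(addr)
print(f"hit rate: {cache.hits} / {cache.accesses}")  # hit rate: 2 / 8
```

Note that this particular configuration reproduces the 2/8 hit rate from the example above (accesses #2 and #3 hit block 1), but it is of course not the only configuration that does.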