Computer Organization II - Spring 2010, Exercise 5

Computer Organization II, Spring 2010, Exercise 5

These homework exercises will be covered in practice session on Wednesday 17.2.2010.

This fifth week's homework covers computer Processor Structure and Functions, Pipelining and RISC (Sections 12 and 13).

SPECIAL TASK:

No submission on this week. Next submission on Wednesday 24.2.2010 during the exercise meeting.

Learning diary
Continue maintaining the learning diary through out the course.
Article
Next weeks submission is about the computer architecture. You may select one of these articles as the base of your text
1. C. Webb: IBM z10: the next-generation mainframe microprocessor. IEEE Micro, March-April 2008, pp. 19-29
  OR
2. J. Andrews and N. Baker: Xbox 360 system architecture. IEEE Micro, March-April 2006, pp. 25-37
The IEEE Micro's March/April issue each year seems to be a special issue on new processor architectures. These articles are from different years and the processors are targeted for different markets.
Please select the article based on your interest. The writing task is going to be more or less the same. You will be asked to comment on the article and compare the presented processor with the ones covered in the class.

HOMEWORK:

Branch prediction state diagrams
1. Problem 12.13 [Stal09 & Stal06] (12.5 [Stal03])
  Consider the state diagram of Figure 12.28.
  1. Describe the behavior of each.
  2. Compare these with the branch prediction state diagram in Section 12.4. Discuss the relative merits of each of the three approaches to branch prediction.
2. Give an example situation for each of the three diagram, where it is better than the two others.
Pipelines
1. Problem 12.8: Assume a pipeline with 4 stages: fetch instruction (FI), decode instruction and calculate address (DA), fetch operand (FO), and execute (EX). Draw a diagram similar to Fig 12.10 (see slide 18) for a sequence of 7 instructions, in which the third instruction is a branch that is taken and in which there are no data dependencies.
2. Problem 12.9: A pipelined processor has a clock rate of 3GHz and executes a program with 3 million instructions. The pipeline has nine stages, and instructions are issued at a rate of one per clock cycle. Ignore penalties due to branch instructions and out-of-sequence executions.
  1. What is the speedup of this processor of this program compared to a non-pipelined processor, making the same assumptions used in Section 12.4?
  2. What is throughtput (in MIPS) of the pipelined processor?
Conditional branches and prefetching
1. A computer with five-stage pipeline deals with conditional branches by stalling for the next three cycles after hitting one. How much does stalling hurt the performance if 20% of all instructions are conditional branches? Ignore all sources of stalling except conditional branches. (Tan06:4.27)
2. Suppose that a computer prefetches up to 20 instructions in advance. However, on the average, four of these are conditional branches, each with a probability of 90% of being predicted correctly. What is the probability that the prefetching is on the right track? (Tan06:4.28)
Problem 13.6
Consider the following loop:
```
   S:=0;
   for K:=1 to 100 do
      S:=S-K;
```
A straightforward translation of this into a generic assemblylanguage would look something like this:
```
     LD     R1,0           ;keep value of S in R1
     LD     R2,1           ;keep value of K in R2
LP   SUB    R1,R1,R2       ;S:=S-K
     BEQ    R2,100,EXIT    ;done if K=100
     ADD    R2,R2,1        ;else increment K
     JMP LP                ;back to the start of loop
```
A compiler for a RISC machine will introduce delay slots in this code so that the processor can employ delayed branch mechanism. The JMP instruction is easy to dceal with, because this instruction is always followed by the SUB instruction; therefore we can simply place a copy of the SUB instruction in the delay slot after the JMP. The BEQ presents a difficulty. We can't leave the code as is, because the ADD instruction would then be exectued one too many time. Therefore a NOP instructions is needed
1. Show the resulting code
2. Can you think of alternative way to modify the code so that the delayed branch mechanism can be used?
Problem 13.7
A RISC machine may do both a mapping of symbolic registers to actual registers and a rearrangement of instructions for pipeline efficiency. An interesting question arises a to the order in which these operations should be done. Consider the following program fragment:
```
            LD     SR1, A
            LD     SR2, B
            ADD    SR3, SR1, SR2
            LD     SR4, C
            LD     SR5, D
            ADD    SR6, SR4, SR5
```
1. First do the register mapping and then any possible instruction reordering. How many machine registers are used? Has there been any pipeline improvement?
2. Starting with the original program, now do the instruction reordering and then any possible mapping. How many machine registers are used? Has there been any pipeline improvement?

Tiina.Niklander@cs.helsinki.fi

Department of Computer Science

Department information

Computer Organization II, Spring 2010, Exercise 5

SPECIAL TASK:

HOMEWORK: