## National Cheng Kung University, Master's Program Entrance Examination, Academic Year 96 (2007)

Exam No. 255. Departments: Department of Electrical Engineering (Group D); Institute of Computer and Communication Engineering (Group A). Subject: Computer Organization. Calculator permitted: □ Yes □ No (to be checked by the examiner)

1. [20 points] Given the following MIPS code segment, answer each question below.

| Address | Label | Instruction | Operands |
|---------|-------|-------------|----------|
| 16 | L1: | addi | \$t0, \$t0, 4 |
| 20 | | lw | \$s1, 0(\$t0) |
| 24 | | sw | \$s1, 32(\$t0) |
| 28 | | lw | \$t1, 64(\$t0) |
| 32 | | slt | \$s0, \$t1, \$zero |
| 36 | | bne | \$s0, \$zero, L1 |

- (a) [10 points] Consider a pipelined processor with five stages: IF, ID, EX, ME, and WB. Assume no forwarding unit is available. The code contains hazards: identify them and indicate where no-ops (bubbles) must be inserted so that the pipelined datapath executes the code correctly. You don't need to rewrite the entire code segment; simply indicate where you would insert the no-ops. For example, to insert 6 no-ops between the addi at address 16 and the lw at address 20, you could write "6 no-ops between 16 and 20".
- (b) [10 points] Assume a forwarding unit is available that forwards data only from ME and/or WB to EX. Reorder or rewrite the code to maximize its performance, keeping in mind that the loop might iterate a few times. You may insert no-ops to resolve any unavoidable hazards.

2. [20 points] Assume you are asked to design the memory hierarchy for a computer with a 32-bit, 4 GHz MIPS processor. The processor has a 64 KB first-level cache and a 256 KB second-level cache on chip. The first-level cache is 2-way set associative and the second-level cache is 8-way set associative. The word size is 32 bits, and the block size for both caches is 8 bytes. Both caches are virtually addressed. The physical memory is 2 GB, and memory is byte-addressed.
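
The cache parameters above fix how each cache splits an address into tag, set index, and byte offset. The Python sketch below computes that split. It assumes power-of-two sizes and, because both caches are virtually addressed, takes tag and index directly from the 32-bit virtual address. `CacheGeometry`, `L1_CACHE`, and `L2_CACHE` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Geometry of a set-associative cache (sizes in bytes, assumed powers of two)."""
    capacity: int
    ways: int
    block_size: int
    address_bits: int = 32  # assumption: 32-bit virtual addresses index the cache

    @property
    def num_sets(self) -> int:
        # number of blocks divided by associativity
        return self.capacity // (self.block_size * self.ways)

    @property
    def offset_bits(self) -> int:
        return (self.block_size - 1).bit_length()  # log2(block_size)

    @property
    def index_bits(self) -> int:
        return (self.num_sets - 1).bit_length()  # log2(num_sets)

    @property
    def tag_bits(self) -> int:
        return self.address_bits - self.index_bits - self.offset_bits

    def split(self, addr: int) -> tuple[int, int, int]:
        """Return (tag, set index, byte offset) for a byte address."""
        offset = addr & (self.block_size - 1)
        index = (addr >> self.offset_bits) & (self.num_sets - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index, offset


# Parameters from the problem statement: 8-byte blocks in both caches.
L1_CACHE = CacheGeometry(capacity=64 * 1024, ways=2, block_size=8)
L2_CACHE = CacheGeometry(capacity=256 * 1024, ways=8, block_size=8)

for name, cache in (("L1", L1_CACHE), ("L2", L2_CACHE)):
    tag, index, offset = cache.split(0x0000ABCD)
    print(f"{name}: {cache.num_sets} sets, tag={tag:#x}, set={index}, offset={offset}")
# L1: 4096 sets, tag=0x1, set=1401, offset=5
# L2: 4096 sets, tag=0x1, set=1401, offset=5
```

Both caches come out with 4096 sets of 8-byte blocks, so they share the same tag/index/offset boundaries (17/12/3 bits) and differ only in associativity.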
Based on the given information, answer the following questions.

- (a) [4 points] Locate virtual address 0x0000 ABCD in both caches; that is, give the set to which the address maps in the first-level cache and in the second-level cache, respectively.
- (b) [6 points] Suppose the first-level cache uses write allocate, write back, and LRU replacement. Execute each of the following instructions and indicate whether accesses (1) to (5) hit or miss in the first-level cache. (Assume that initially \$s0 = 0x0000 0000, \$s1 = 0xFEDC 0000, \$s2 = 0x8000 0000, and both caches are empty.)

| Instruction | First-level cache hit or miss |
|-------------|-------------------------------|
| lb \$t0, 0x001F(\$s0) | miss |
| lb \$t1, 0x801D(\$s1) | (1) |
| lb \$t2, 0x0018(\$s1) | (2) |
| sb \$t1, 0x0018(\$s0) | (3) |
| lb \$t0, 0x001C(\$s2) | (4) |
| sb \$t0, 0x001A(\$s1) | (5) |

Finally, after this code has executed, has main memory been updated? Answer yes or no.

- (c) [10 points] Suppose that, with the second-level cache disabled, the main-memory access time is 100 ns, including all miss handling. The base CPI of the processor is 2, assuming all references hit in the first-level cache. The test program used to evaluate the memory hierarchy has a first-level cache miss rate of 4% per instruction. With the second-level cache enabled, the test program has a miss rate of 0.2%. The second-level cache access time is 20 ns for either a hit or a miss. How much performance improvement do you get by enabling the second-level cache?

3. [10 points] True or false:

- (a) In processor implementation, a single-cycle implementation is not as good as a multicycle implementation because the single-cycle implementation tends to have a longer clock cycle and a higher CPI than the multicycle implementation.
- (b) Thrashing occurs if a program constantly accesses more virtual memory than it has physical memory, causing continuous swapping between memory and disk.
- (c) RAID 3, 4, and 5 can all perform parallel reads and writes.
- (d) Suppose a program runs in 60 seconds on a machine, with multiplication responsible for 40 seconds of that time. According to Amdahl's law, we can simply improve the speed of multiplication to make the program run 3 times faster.
- (e) The idea of using two levels of cache is that the first-level cache minimizes the cache miss ratio and the second-level cache reduces the cache hit time.

4. [20 points] Design a direct memory access (DMA) controller for a multi-master, bus-based system.

- (a) [10 points] Show a generic design that can be used to transfer data between main memory and I/O. Specify the function of each register used and the interface signals of the DMA controller.
- (b) [10 points] Using the interface signals, describe in detail the DMA operations that transfer a block of data from memory to an I/O device.

5. [30 points] Fill in the appropriate term for each blank (6 points each):

- (a) move \$s1, \$zero = addi ________
- (b) CPU execution time = Instruction count × ________ × Clock cycle time
- (c) After a silicon ingot is sliced, each slice is called a ________.
- (d) For a 32-bit register, if the least significant byte (B0) is stored at memory address 4N, where N is an integer $\geq 0$, this storage order is called ________ endian.
- (e) For a 32-bit register, if the least significant byte (B0) is stored at memory address 4N+3, where N is an integer $\geq 0$, this storage order is called ________ endian.
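
Items (d) and (e) contrast the two byte orders. The short Python sketch below stores a 32-bit word at a word-aligned address 4N and reports the offset at which its least significant byte B0 lands under each order. It relies only on the built-in `int.to_bytes`; the sample word 0x0A0B0C0D and the helper names `byte_at` and `lsb_offset` are arbitrary choices for illustration.

```python
WORD = 0x0A0B0C0D  # arbitrary sample word: B3 = 0x0A, ..., B0 = 0x0D


def byte_at(word: int, offset: int, byteorder: str) -> int:
    """Byte found at address 4N + offset when the 32-bit `word` is stored at 4N."""
    return word.to_bytes(4, byteorder)[offset]


def lsb_offset(byteorder: str) -> int:
    """Offset (0..3) from 4N at which the least significant byte B0 is stored."""
    b0 = WORD & 0xFF
    return next(off for off in range(4) if byte_at(WORD, off, byteorder) == b0)


for order in ("little", "big"):
    print(f"{order}-endian: B0 stored at 4N+{lsb_offset(order)}")
# little-endian: B0 stored at 4N+0
# big-endian: B0 stored at 4N+3
```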