# An introduction to processor design

P. Bakowski, bako@ieee.org

Basic components of a simple processor:

- a program counter: PC
- an accumulator: ACC
- an instruction register: IR
- instruction decode and control logic
- an arithmetic-logic unit: ALU

## A 16-bit processor

- 16-bit processor
- 12-bit address space

### Instruction format

Instruction format: a 16-bit word. The IR holds it as a 4-bit opcode followed by a 12-bit address A (IR: 4/12-bit).

### Instruction fetch

FETCH: load the new instruction into the instruction register.

### Instruction decode

DECODE: decode the new instruction and generate the selection and control signals.

### Instruction execute

EXECUTE: execute the instruction according to its opcode.

| instruction | opcode | function | description |
|-------------|--------|----------|-------------|
| LDA A | 0000 | ACC <= MEM(A) | load |
| STO A | 0001 | MEM(A) <= ACC | store |
| ADD A | 0010 | ACC <= ACC+MEM(A) | add |
| SUB A | 0011 | ACC <= ACC-MEM(A) | subtract |
| JMP A | 0100 | PC <= A | unconditional jump to new address A |
| JGE A | 0101 | if ACC>=0: PC <= A | jump if greater or equal: conditional jump to new address A |
| JNE A | 0110 | if ACC!=0: PC <= A | jump if not equal: conditional jump (if ACC is not zero) to new address A |
| STP | 0111 | PC <= PC | stop |

### Control path and data path

(Figure: control path and data path. Blocks: IR (16-bit), decoder, sequencer, clock, ACC (16-bit), ALU, PC (12-bit), MEM, data bus, address bus. The same diagram is shown for three steps: instruction fetch with IR loaded, instruction decode, and instruction execute for an add.)
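The fetch, decode and execute steps and the eight opcodes of the 16-bit processor can be tried out in a small simulator. This is only a sketch under stated assumptions, not the deck's own model: the names `encode`, `run` and `countdown_sum_memory` are hypothetical, the PC is assumed to be incremented right after the fetch, and ACC is assumed to be a two's complement value so that `JGE` tests the sign bit.

```python
"""Minimal sketch of the 16-bit accumulator processor (assumptions noted inline)."""

MASK16 = 0xFFFF   # 16-bit data word
MASK12 = 0x0FFF   # 12-bit address space

# Opcodes as listed in the instruction tables (upper 4 bits of the word).
LDA, STO, ADD, SUB, JMP, JGE, JNE, STP = range(8)


def encode(opcode, address=0):
    """Pack a 4-bit opcode and a 12-bit address into one 16-bit word."""
    return ((opcode & 0xF) << 12) | (address & MASK12)


def run(mem, max_steps=10_000):
    """Run fetch/decode/execute from address 0 until STP; return (acc, pc)."""
    pc, acc = 0, 0
    for _ in range(max_steps):
        ir = mem[pc]                        # FETCH: load IR
        pc = (pc + 1) & MASK12              # assumption: PC incremented after fetch
        opcode, a = ir >> 12, ir & MASK12   # DECODE: split into 4/12 bits
        if opcode == LDA:                   # EXECUTE, depending on opcode
            acc = mem[a]
        elif opcode == STO:
            mem[a] = acc
        elif opcode == ADD:
            acc = (acc + mem[a]) & MASK16
        elif opcode == SUB:
            acc = (acc - mem[a]) & MASK16
        elif opcode == JMP:
            pc = a
        elif opcode == JGE:
            if (acc & 0x8000) == 0:         # assumption: sign bit clear means ACC >= 0
                pc = a
        elif opcode == JNE:
            if acc != 0:
                pc = a
        elif opcode == STP:
            pc = (pc - 1) & MASK12          # PC <= PC: stay on the STP instruction
            return acc, pc
        else:
            raise ValueError(f"undefined opcode {opcode:04b}")
    raise RuntimeError("program did not reach STP")


def countdown_sum_memory(n):
    """Build a memory image that computes n + (n-1) + ... + 1 into address 20."""
    mem = [0] * 4096
    total, count, one = 20, 21, 22          # hypothetical data addresses
    program = [
        encode(LDA, total), encode(ADD, count), encode(STO, total),
        encode(LDA, count), encode(SUB, one), encode(STO, count),
        encode(JNE, 0), encode(STP),
    ]
    mem[:len(program)] = program
    mem[count], mem[one] = n, 1
    return mem
```

Calling `run(countdown_sum_memory(5))` loops five times through the program and leaves the sum in `mem[20]`; the loop ends when `SUB` brings ACC to zero and `JNE` falls through to `STP`.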
ALU functions, all built on the adder:

- A+B: normal adder output
- A-B: A+!B+1 (Cin=1)
- B: A=0, Cin=0
- B+1: A=0, Cin=1

### ALU design at logic level

(Figure: one-bit slice of the ALU.)

## High performance processor

- extending address space: 12 to 24 (32) bits
- adding new address modes
- introducing stack for subprogram calls
- introducing register block
- introducing interruptions

Instruction types:

- data movement: load and store
- data processing: logic and arithmetic
- control flow: jump, conditional jump, call, return, ...
- state instructions: execution mode, interruption and memory control, ...
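The four ALU functions listed earlier in this part (A+B, A-B, B, B+1) all come from one adder by forcing the A input to zero, inverting B, or setting the carry-in. A minimal sketch of that idea follows; the function names and the control-signal tuple `(zero_a, invert_b, cin)` are assumptions chosen for illustration, not signals named in the slides.

```python
MASK16 = 0xFFFF  # 16-bit data path

# Hypothetical control signals per function: (zero_a, invert_b, cin)
ALU_CONTROL = {
    "A+B": (False, False, 0),   # normal adder output
    "A-B": (False, True, 1),    # A + !B + 1 (two's complement subtraction)
    "B":   (True, False, 0),    # A forced to 0, Cin = 0
    "B+1": (True, False, 1),    # A forced to 0, Cin = 1
}


def adder(a, b, cin):
    """16-bit adder with carry-in; the carry-out is discarded."""
    return (a + b + cin) & MASK16


def alu(function, a, b):
    """Select an ALU function by gating the adder inputs."""
    zero_a, invert_b, cin = ALU_CONTROL[function]
    a_in = 0 if zero_a else a & MASK16
    b_in = (~b & MASK16) if invert_b else b & MASK16
    return adder(a_in, b_in, cin)
```

For example, `alu("A-B", 7, 5)` gives 2, and `alu("A-B", 5, 7)` wraps around to `0xFFFE`, the 16-bit two's complement encoding of -2.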
### Orthogonal instruction types

- an instruction type is a set of similar instructions (e.g. add, subtract, ...) with similar addressing schemes
- different instruction types are executed by different architectural blocks
- using separate architectural blocks allows independent, and therefore concurrent, execution

Flags:

- Z: zero flag (result is 00..00)
- M: mode flag [0,1]
- I: interruption flag [0,1]

### Subprograms and system calls

(Figure: subprograms and system calls.)

Instruction usage by type:

- data movement 45%
- control flow 22%
- arithmetic operations 14%
- comparisons 13%
- logic operations 5%
- other 1%

Instruction elaboration stages (pipeline diagram: successive instructions overlap, each passing through fetch, dec, reg, exec, mem, res):

- instruction fetch
- decode
- read operands
- execute / calculate memory address
- read/write memory
- write the result

## Pipeline hazards

- read after write: bypass
- jump instructions: sequence
- memory waits: stalls

## RISC architecture (basics)

- a fixed 32-bit instruction/word size
- load-store architecture, where calculation instructions operate only on registers
- large register bank of 32 32-bit registers

## RISC organization (basics)

- hard-wired instruction decode logic
- pipelined execution
- single-cycle execution (throughput)

## RISC advantages

- small die size: regular structure
- short development time: simple structure
- high performance: fast clock, simple pipeline stages

## RISC drawback

The main drawback of the RISC architecture is lower instruction code density than in CISC architectures.
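The overlap of the six elaboration stages (fetch, dec, reg, exec, mem, res) and the cost of stalls can be sketched with a few lines of code. This is an illustrative model only: it assumes an ideal pipeline that issues one instruction per cycle, and the names `pipeline_chart` and `total_cycles` are hypothetical.

```python
STAGES = ["fetch", "dec", "reg", "exec", "mem", "res"]


def pipeline_chart(n_instructions):
    """Return one text row per instruction, each shifted by one stage slot."""
    width = max(len(stage) for stage in STAGES) + 1
    rows = []
    for i in range(n_instructions):
        cells = [" " * width] * i + [stage.ljust(width) for stage in STAGES]
        rows.append("".join(cells).rstrip())
    return rows


def total_cycles(n_instructions, stall_cycles=0):
    """Cycles to finish n instructions in an ideal pipeline, plus any stalls.

    The first instruction needs len(STAGES) cycles; every further one
    completes one cycle later (single-cycle throughput).
    """
    if n_instructions == 0:
        return 0
    return len(STAGES) + n_instructions - 1 + stall_cycles


if __name__ == "__main__":
    print("\n".join(pipeline_chart(3)))
    print("cycles:", total_cycles(3))
```

With three instructions the chart shows the staircase pattern of the pipeline figure, and `total_cycles(3)` is 8 instead of the 18 cycles that three unpipelined six-stage instructions would take; each stall cycle (for example a memory wait) adds one cycle to the total.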
The solution to the low code density of RISC is a code compression/decompression mechanism (ARM).

(Figure: code density comparison, 2 instructions versus 1 instruction.)

(Figure: switching power per transition.)

## Total dynamic power consumption

$$P_c = \frac{1}{2} \cdot f \cdot V_{dd}^2 \cdot \sum A_g \cdot C_l$$

- $f$ - clock frequency
- $V_{dd}$ - supply voltage
- $A_g$ - gate activity factor
- $C_l$ - gate load capacitance; the sum runs over all gates

Ways to reduce $P_c$, and what each depends on:

- minimize supply voltage: technology
- minimize circuit activity: utilization
- minimize number of gates: design
- minimize clock frequency: problem!
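The dynamic power formula can be evaluated numerically to see how strongly each factor weighs. The sketch below is an assumption-laden illustration: `dynamic_power` is a hypothetical helper, `C_l` is taken to be the load capacitance of each gate, and the gate count, activity, capacitance, frequency and voltage values are made up for the example.

```python
def dynamic_power(f_hz, vdd, gates):
    """P_c = 0.5 * f * Vdd^2 * sum(A_g * C_l) over all gates.

    gates: iterable of (activity factor A_g, load capacitance C_l in farads).
    Returns power in watts.
    """
    return 0.5 * f_hz * vdd ** 2 * sum(a_g * c_l for a_g, c_l in gates)


if __name__ == "__main__":
    # Hypothetical circuit: 1000 gates, activity 0.1, 10 fF load each.
    gates = [(0.1, 10e-15)] * 1000
    base = dynamic_power(100e6, 3.3, gates)
    print(f"100 MHz, 3.3 V : {base * 1e3:.4f} mW")
    print(f"half the voltage: {dynamic_power(100e6, 1.65, gates) / base:.2f} x")
    print(f"half the clock  : {dynamic_power(50e6, 3.3, gates) / base:.2f} x")
```

Because the supply voltage enters squared, halving it cuts the dynamic power to a quarter, while halving the clock frequency only halves the power and also halves the throughput, which is why lowering the frequency is marked as a problem.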
## Summary

- a simple 16-bit processor model
- instruction elaboration phases: fetch, decode, execute, write-back
- instruction types: arithmetic, load/store, control
- control path and data path
- high performance 32-bit processor
- RISC concept advantages and drawbacks
- low power consumption features