Before executing a program in Fish, the programmer has to carry out a two-step preparation:
-
Put the program into memory.
-
Put the data into the registers.
Only then is the program executed, one instruction at a time, as the clock edges arrive.
In Frog, we want to get rid of the second step of this preparation, i.e., the “put the data into registers” step. Instead, we want to put both the data and the program into memory. The program will then load the required data from memory into the registers as it executes. For this, we need a new type of instruction, called “load immediate”, or LDI. To distinguish this new type of instruction from the existing one, we need a new field in our machine code, called the “opcode field”.
1. Instruction Set
In Fish, only instructions were loaded from memory into the CPU; the data was placed directly into the registers at the start of the program. The data consisted of 16-bit numbers, and instructions required only 9 bits. Hence, we made the CPU registers 16 bits wide and the memory locations 9 bits wide.
In Frog, things have changed. Both data (16 bits wide) and instructions are loaded from memory into the CPU. Therefore, memory locations must be widened from 9 to 16 bits to accommodate the data.
Since Frog has two different instructions, a single bit would suffice for the opcode field. However, we allocate four bits instead of one, because we plan to increase the number of instructions up to 16 in the future.
1.1 ALU Instruction Format
ALU instructions of Frog have the same fields as those of Fish, except for the opcode field, which is new.
Note that we expect an instruction to fit into a single memory location, which is 16 bits wide. Assume that Frog has 8 registers. In this case, each of the Src1, Src2 and Dst fields takes three bits, 9 bits in total. Adding AluOp (4 bits) and Opcode (4 bits) brings the instruction length to 17 bits, which does not fit into a single memory location.
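The width arithmetic above can be written out as a small C function. It is a sketch under the same assumptions as the text (three register fields, a 4-bit AluOp and a 4-bit Opcode); `field_bits` and `alu_instr_width` are illustrative names, not part of Frog.

```c
#include <assert.h>

/* Smallest number of bits that can name n distinct registers (n >= 1). */
int field_bits(int n)
{
    int bits = 0;
    while ((1 << bits) < n)
        bits++;
    return bits;
}

/* Width of an ALU instruction: Src1, Src2 and Dst register fields,
   plus a 4-bit AluOp and a 4-bit Opcode (hypothetical helper). */
int alu_instr_width(int num_registers)
{
    const int aluop_bits = 4;
    const int opcode_bits = 4;
    return 3 * field_bits(num_registers) + aluop_bits + opcode_bits;
}
```

With 8 registers it gives 17 bits, one more than a memory word holds; with 4 registers it gives 14 bits, leaving two bits spare.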
Possible solutions are:
-
Reduce the Opcode width to 3 bits. Downside: this restricts the future expansion of our instruction set, as we can then have at most 8 instructions.
-
Reduce the AluOp field to 3 bits. Downside: this reduces the number of available ALU operations to 8.
-
Increase the register and memory widths to 17 bits and use 17-bit data. Downside: the processor becomes highly unconventional, as memory and register widths are standardized to 8, 16, 32 bits, etc.
-
Use two consecutive memory locations for each ALU instruction. Downside: two memory locations contain 32 bits, of which only 17 would be used.
-
Reduce the width of the Src1, Src2 and Dst fields from 3 to 2 bits. Downside: this reduces the number of registers from 8 to 4.
We choose solution number 5. Why? Because when we investigate Reptile, we will introduce a trick that allows us to have 8 registers while keeping the Opcode and AluOp fields 4 bits wide and the instruction 16 bits long.
Also note that the order of the fields has changed. This is to remind you that, when designing an instruction set, the order of the fields within an instruction is not important.
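To make the format concrete, the C sketch below packs and unpacks an ALU instruction. The bit positions are an assumption taken from the Verilog in Section 4.2, which reads AluOp from bits [11:8], the left and right source registers from bits [7:6] and [4:3], and the destination from bits [1:0]; the opcode is taken to occupy bits [15:12]. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded form of a Frog ALU instruction. */
struct alu_fields {
    unsigned opcode; /* bits [15:12] */
    unsigned aluop;  /* bits [11:8]  */
    unsigned srcl;   /* bits [7:6], left ALU operand register  */
    unsigned srcr;   /* bits [4:3], right ALU operand register */
    unsigned dst;    /* bits [1:0], destination register       */
};

/* Packs the fields into one 16-bit word; bits 5 and 2 stay zero. */
uint16_t encode_alu(struct alu_fields f)
{
    assert(f.opcode < 16 && f.aluop < 16);
    assert(f.srcl < 4 && f.srcr < 4 && f.dst < 4);
    return (uint16_t)((f.opcode << 12) | (f.aluop << 8) |
                      (f.srcl << 6) | (f.srcr << 3) | f.dst);
}

/* Extracts the fields back out of a 16-bit word. */
struct alu_fields decode_alu(uint16_t word)
{
    struct alu_fields f;
    f.opcode = (word >> 12) & 0xFu;
    f.aluop  = (word >> 8) & 0xFu;
    f.srcl   = (word >> 6) & 0x3u;
    f.srcr   = (word >> 3) & 0x3u;
    f.dst    = word & 0x3u;
    return f;
}
```

For example, opcode 0x7 with AluOp 0 (add), left source 3, right source 2 and destination 1 packs to 0x70D1, the same word the assembler in Section 5 produces for `add 1 2 3`.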
1.2 LDI Instruction Format
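Judging from the assembler in Section 5 and the LDI state of the Verilog in Section 4.2, an LDI instruction occupies two consecutive memory words: the first holds opcode 0x1 with the destination register in its lowest bits, and the second holds the 16-bit data itself. A minimal C sketch under that reading (`emit_ldi` is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/* Writes the two words of "ldi reg value" into mem[addr] and mem[addr + 1]
   and returns the address of the next free word (hypothetical helper). */
int emit_ldi(uint16_t *mem, int addr, unsigned reg, int value)
{
    assert(reg < 4);
    mem[addr] = (uint16_t)(0x1000u | reg);       /* opcode 0x1, destination in bits [1:0] */
    mem[addr + 1] = (uint16_t)(value & 0xFFFF);  /* 16-bit two's complement data */
    return addr + 2;
}
```

Storing `ldi 2 -1` at address 0 writes 0x1002 and 0xFFFF and returns 2, the address of the next instruction.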
2. Finite State Machine and Control Unit
In Fish, we had only one instruction: the ALU instruction. In Frog, the number of instructions increases to two: the ALU instruction and the LDI instruction. Hence, when an instruction is brought from memory into the CPU, it cannot be executed immediately as in Fish. First, it must be stored in a register, and its opcode field must be inspected to determine whether it is an LDI or an ALU instruction. In our design, this is achieved as follows:
-
The opcode field of the instruction is stored in the control unit (cu in the FSM diagram below).
-
The rest of the instruction bits are stored in a special-purpose register called the instruction register (IR).
Next, Frog must choose the appropriate course of action for each kind of instruction.
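The decision the control unit makes can be sketched in C. This is a behavioral sketch, not the Frog hardware: it assumes the opcode sits in bits [15:12] and uses 0x1 for LDI and 0x7 for ALU, the values the assembler in Section 5 emits. The state and function names are made up for illustration, as is the choice to skip unknown opcodes.

```c
#include <assert.h>
#include <stdint.h>

/* States of the control FSM. */
enum frog_state { ST_FETCH, ST_LDI, ST_ALU };

/* Opcode values assumed from the assembler in Section 5. */
#define OPCODE_LDI 0x1u
#define OPCODE_ALU 0x7u

/* FETCH: the opcode field (bits [15:12]) stays in the control unit and
   selects the next state; the remaining 12 bits are stored in the IR. */
enum frog_state fetch_decode(uint16_t word, uint16_t *ir)
{
    unsigned opcode = (word >> 12) & 0xFu;
    *ir = (uint16_t)(word & 0x0FFFu);
    switch (opcode) {
    case OPCODE_LDI: return ST_LDI;
    case OPCODE_ALU: return ST_ALU;
    default:         return ST_FETCH; /* assumption: skip unknown opcodes */
    }
}

/* Both execute states take one clock and then return to FETCH. */
enum frog_state next_state(enum frog_state s, uint16_t word, uint16_t *ir)
{
    return (s == ST_FETCH) ? fetch_decode(word, ir) : ST_FETCH;
}
```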
3. Architecture
Let us first draw the CPU without its control unit:
As can be seen from the circuit diagram, at the core of the Frog processor is a Fish processor. On top of Fish, there is a new register, a new multiplexer, and some new connections:
-
In Fish, we had a single type of instruction. (The register-load signal LD is also referred to as REGLD.)
Microcode
Address | FETCH | IRLD | MUX | LD(REGLD) | PCINC |
0x0     | 1     | 1    | 0   | 0         | 1     |
0x1     | 0     | 0    | 0   | 1         | 0     |
0x2     | 0     | 0    | 1   | 1         | 1     |
Microcode ROM contents
Address | Data |
0x0     | 0x19 |
0x1     | 0x02 |
0x2     | 0x07 |
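Each row of the microcode table becomes one word of the control unit's ROM. Assuming the bits are packed in the table's column order, with FETCH as the most significant bit and PCINC as the least, the C sketch below turns rows 0x1 and 0x2 into 0x02 and 0x07, as in the data table, and the fetch row (which asserts PCINC) into 0x19. `struct micro_row` and `pack_row` are illustrative names.

```c
#include <assert.h>

/* One row of the microcode table (hypothetical representation). */
struct micro_row {
    unsigned fetch, irld, mux, ld, pcinc;
};

/* Packs the five control bits into a ROM word, assuming the table's
   column order: FETCH is the most significant bit, PCINC the least. */
unsigned pack_row(struct micro_row r)
{
    assert(r.fetch <= 1 && r.irld <= 1 && r.mux <= 1 && r.ld <= 1 && r.pcinc <= 1);
    return (r.fetch << 4) | (r.irld << 3) | (r.mux << 2) | (r.ld << 1) | r.pcinc;
}
```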
4. Verilog Design
Realizing Frog will be our first project in Verilog. We will take this opportunity …
4.1 How NOT to write the Frog in Verilog
When realizing a circuit idea in Verilog, students generally follow these steps:
-
Draw a schematic of the circuit
-
Write a Verilog module corresponding to every major block in the schematic
-
Connect all these modules within a top module to build the circuit they drew.
When applied to Frog, this results in approximately the following:
-
Draw the schematic of Frog, which is Fig. …
-
Write modules for the register bank, ALU, memory, control unit, etc.
-
Join all these modules in a top-level CPU module.
The resulting Verilog code looks roughly like this:
module cpu(input logic clk);
  logic IRLOAD, REGLOAD, PCINC, MUX;
  logic [15:0] datain, aluout, regbankin;
  logic [1:0]  srcr, srcl, regdest;
  logic [3:0]  aluop;
  logic [8:0]  pc;               // 9 bits address all 512 memory words
  logic [15:0] ir;
  logic [15:0] regbank[0:3];

  memory mem1( .address(pc), .dataout(datain) );

  ControlUnit ctrl( .clk(clk), .PCINC(PCINC), .IRLOAD(IRLOAD), .REGLOAD(REGLOAD),
                    .MUX(MUX), .din(datain[15:14]) );

  always_ff @(posedge clk) if (PCINC) pc <= pc + 1;
  always_ff @(posedge clk) if (IRLOAD) ir <= datain;

  assign aluop   = ir[11:8];
  assign regdest = ir[1:0];
  assign srcr    = ir[4:3];
  assign srcl    = ir[7:6];

  alu alu1( .aluinl(regbank[srcl]), .aluinr(regbank[srcr]), .aluop(aluop), .aluout(aluout) );

  always_comb
    if (MUX) regbankin = datain;
    else     regbankin = aluout;

  registerbank rgb1( .clk(clk), .REGLOAD(REGLOAD), .whichregtowrite(regdest),
                     .regbank(regbank), .din(regbankin) );
endmodule

module memory(
  output logic [15:0] dataout,
  input  logic [8:0]  address
);
  logic [15:0] mem [0:511];
  assign dataout = mem[address];
  initial $readmemh("prog.txt", mem); // must be exactly 512 locations
endmodule

module alu(
  input  logic [15:0] aluinl,
  input  logic [15:0] aluinr,
  input  logic [3:0]  aluop,
  output logic [15:0] aluout
);
  always_comb
    case (aluop)
      4'h0: aluout = aluinl + aluinr;
      4'h1: aluout = aluinl - aluinr;
      4'h2: aluout = aluinl & aluinr;
      4'h3: aluout = aluinl | aluinr;
      4'h4: aluout = aluinl ^ aluinr;
      4'h5: aluout = ~aluinr;
      4'h6: aluout = aluinr;
      4'h7: aluout = aluinr + 16'h0001;
      4'h8: aluout = aluinr - 16'h0001;
      default: aluout = 0;
    endcase
endmodule

module registerbank(
  input  logic        clk,
  input  logic [15:0] din,
  input  logic [1:0]  whichregtowrite,
  input  logic        REGLOAD,
  output logic [15:0] regbank[0:3]
);
  always_ff @(posedge clk)
    if (REGLOAD) regbank[whichregtowrite] <= din;
endmodule

module ControlUnit(
  input  logic       clk,
  input  logic [1:0] din,
  output logic       IRLOAD, REGLOAD, PCINC, MUX
);
  logic       FETCH;
  logic [4:0] microcodeRom [0:3];
  logic [1:0] command;

  // ROM words follow the column order of the microcode table: FETCH, IRLD, MUX, LD(REGLD), PCINC
  initial begin
    microcodeRom[0] = 5'h19;  // fetch
    microcodeRom[1] = 5'h02;  // load the register bank from the ALU
    microcodeRom[2] = 5'h07;  // load the register bank from memory, increment PC
    microcodeRom[3] = 5'h00;  // unused
  end

  always_ff @(posedge clk)
    if (FETCH) command <= din;  // if I am doing fetch now, one clock later I need the microcode addr of the command
    else       command <= 2'b0;

  assign { FETCH, IRLOAD, MUX, REGLOAD, PCINC } = microcodeRom[command];
endmodule
This is a very suboptimal way of writing Verilog code, akin to programming in assembly language. The resulting code is not necessarily wrong, but it is long, inefficient and very hard to debug.
4.2 How to write the Frog in Verilog
A better way of writing Verilog is to start not from the schematic, but from the finite state machine that gave rise to that schematic.
module FirstCPU(input logic clk);
  logic [1:0]  state;
  logic [15:0] memory[0:511];
  logic [15:0] aluinr, aluinl;
  logic [3:0]  aluop;
  logic [15:0] pc, aluout, ir;
  logic [15:0] regbank[0:3];

  localparam FETCH = 2'b00;
  localparam LDI   = 2'b01;
  localparam ALU   = 2'b10;

  always_ff @(posedge clk)
    case (state)
      FETCH: begin
        state <= memory[pc][13:12];  // opcode bits select the next state
        ir    <= memory[pc][11:0];   // the rest of the instruction goes to IR
        pc    <= pc + 1;
      end
      LDI: begin
        state <= FETCH;
        regbank[ ir[1:0] ] <= memory[pc];  // the data word follows the LDI word
        pc    <= pc + 1;
      end
      ALU: begin
        state <= FETCH;
        regbank[ ir[1:0] ] <= aluout;
      end
    endcase

  assign aluinl = regbank[ir[7:6]];
  assign aluinr = regbank[ir[4:3]];
  assign aluop  = ir[11:8];

  always_comb
    case (aluop)
      4'h0: aluout = aluinl + aluinr;
      4'h1: aluout = aluinl - aluinr;
      4'h2: aluout = aluinl & aluinr;
      4'h3: aluout = aluinl | aluinr;
      4'h4: aluout = aluinl ^ aluinr;
      4'h5: aluout = ~aluinr;
      4'h6: aluout = aluinr;
      4'h7: aluout = aluinr + 16'h0001;
      4'h8: aluout = aluinr - 16'h0001;
      default: aluout = 0;
    endcase

  initial begin
    $readmemh("prog.txt", memory);  // must be exactly 512 locations
    state = FETCH;
    pc    = 0;                      // start executing from address 0
  end
endmodule
As can be seen, there is no control unit, register bank, etc. in this version of the program. All these low-level details are left for the Verilog compiler to infer. In other words, the programmer is only required to express the FSM in Verilog; it is the compiler’s duty to construct the schematic from this FSM. And, since compilers are generally much better than humans at this kind of low-level optimization, the resulting circuit is likely to be more efficient.
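Because the FSM version describes behavior rather than wiring, it also translates almost line for line into a software reference model, which can be handy for checking simulation results. The C sketch below mirrors the Verilog above, including its decoding of bits [13:12] (01 selects LDI, 10 selects ALU), so its example ALU instruction uses 0x2 in the opcode field. `struct frog`, `clock_edge` and `alu` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* State encoding of the FSM-style Verilog: FETCH = 00, LDI = 01, ALU = 10. */
enum { S_FETCH = 0, S_LDI = 1, S_ALU = 2 };

/* Hypothetical software model of FirstCPU's registers and memory. */
struct frog {
    uint16_t mem[512];
    uint16_t regbank[4];
    uint16_t pc, ir;
    unsigned state;
};

/* Same operations as the always_comb ALU case statement. */
static uint16_t alu(unsigned aluop, uint16_t l, uint16_t r)
{
    switch (aluop) {
    case 0x0: return (uint16_t)(l + r);
    case 0x1: return (uint16_t)(l - r);
    case 0x2: return (uint16_t)(l & r);
    case 0x3: return (uint16_t)(l | r);
    case 0x4: return (uint16_t)(l ^ r);
    case 0x5: return (uint16_t)~r;
    case 0x6: return r;
    case 0x7: return (uint16_t)(r + 1);
    case 0x8: return (uint16_t)(r - 1);
    default:  return 0;
    }
}

/* Models one rising clock edge of the always_ff block. */
void clock_edge(struct frog *cpu)
{
    /* The model wraps the address into range; the Verilog would read X instead. */
    uint16_t word = cpu->mem[cpu->pc % 512];

    switch (cpu->state) {
    case S_FETCH:
        cpu->state = (word >> 12) & 0x3u;      /* memory[pc][13:12] */
        cpu->ir = (uint16_t)(word & 0x0FFFu);  /* memory[pc][11:0]  */
        cpu->pc++;
        break;
    case S_LDI:
        cpu->regbank[cpu->ir & 0x3u] = word;   /* data word follows the LDI word */
        cpu->pc++;
        cpu->state = S_FETCH;
        break;
    case S_ALU: {
        unsigned aluop = (cpu->ir >> 8) & 0xFu;             /* ir[11:8] */
        uint16_t l = cpu->regbank[(cpu->ir >> 6) & 0x3u];   /* ir[7:6]  */
        uint16_t r = cpu->regbank[(cpu->ir >> 3) & 0x3u];   /* ir[4:3]  */
        cpu->regbank[cpu->ir & 0x3u] = alu(aluop, l, r);    /* ir[1:0]  */
        cpu->state = S_FETCH;
        break;
    }
    default:
        /* State 11 has no case in the Verilog, so nothing changes. */
        break;
    }
}
```

Loading the words 0x1000, 5, 0x1001, 3, 0x200A (LDI r0 5; LDI r1 3; add with r0 as left source, r1 as right source and r2 as destination) and applying six clock edges leaves 8 in register 2 and the PC at 5.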
5. Assembler for Frog
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Converts a hexadecimal string (without the "0x" prefix) to an integer.
int hex2int(const char *hex)
{
    int result = 0;
    while (*hex != '\0') {
        if ('0' <= *hex && *hex <= '9')
            result = result * 16 + (*hex - '0');
        else if ('a' <= *hex && *hex <= 'f')
            result = result * 16 + (*hex - 'a' + 10);
        else if ('A' <= *hex && *hex <= 'F')
            result = result * 16 + (*hex - 'A' + 10);
        hex++;
    }
    return result;
}

int main(void)
{
    FILE *fp;
    char line[100];
    char *token = NULL;
    char *op1, *op2, *op3;
    int chch;
    int i;
    int program[1000] = {0};
    int counter = 0;   // holds the address of the machine code instruction
    int dataarea = 0;  // number of data words after the program; to be completed

    fp = fopen("name_of_program", "r");
    if (fp == NULL) {
        printf("cannot open name_of_program\n");
        return 1;
    }
    while (fgets(line, sizeof line, fp) != NULL) {
        token = strtok(line, "\n\t\r ");   // get the instruction mnemonic or label
        if (token == NULL)                 // skip empty lines
            continue;
        if (strcmp(token, "ldi") == 0) {   //---------------LDI INSTRUCTION--------------------
            op1 = strtok(NULL, "\n\t\r ");  // 1st operand of ldi: the register that ldi loads
            op2 = strtok(NULL, "\n\t\r ");  // 2nd operand of ldi: the data to be loaded
            program[counter] = 0x1000 + hex2int(op1);  // first 16-bit word of the ldi instruction
            counter++;                                   // move to the second 16-bit word
            if (op2[0] == '0' && op2[1] == 'x')          // 2nd operand is two's complement hexadecimal
                program[counter] = hex2int(op2 + 2) & 0xffff;
            else if (op2[0] == '-' || (op2[0] >= '0' && op2[0] <= '9'))  // 2nd operand is decimal
                program[counter] = atoi(op2) & 0xffff;
            else {  // not decimal or hexadecimal: it is a label or a variable, so the
                    // second 16-bit word of the ldi instruction cannot be generated yet
                program[counter] = 0;
                printf("unrecognizable LDI offset\n");
            }
            counter++;  // skip to the next memory location
        }
        else if (strcmp(token, "add") == 0) {  //----------------- ADD -------------------------------
            op1 = strtok(NULL, "\n\t\r ");
            op2 = strtok(NULL, "\n\t\r ");
            op3 = strtok(NULL, "\n\t\r ");
            chch = (op1[0] - '0') | ((op2[0] - '0') << 3) | ((op3[0] - '0') << 6);
            program[counter] = 0x7000 + (chch & 0x00ff);
            counter++;
        }
        else if (strcmp(token, "sub") == 0) {
            // to be added
        }
        else if (strcmp(token, "and") == 0) {
            // to be added
        }
        else if (strcmp(token, "or") == 0) {
            // to be added
        }
        else if (strcmp(token, "xor") == 0) {
            // to be added
        }
        else if (strcmp(token, "not") == 0) {
            op1 = strtok(NULL, "\n\t\r ");
            op2 = strtok(NULL, "\n\t\r ");
            chch = (op1[0] - '0') | ((op2[0] - '0') << 3);
            program[counter] = 0x7500 + (chch & 0x00ff);
            counter++;
        }
        else if (strcmp(token, "mov") == 0) {
            // to be added
        }
        else if (strcmp(token, "inc") == 0) {
            op1 = strtok(NULL, "\n\t\r ");
            chch = (op1[0] - '0') | ((op1[0] - '0') << 3);
            program[counter] = 0x7700 + (chch & 0x00ff);
            counter++;
        }
        else if (strcmp(token, "dec") == 0) {
            // to be added
        }
        else {  //------WHAT IS ENCOUNTERED IS NOT A VALID INSTRUCTION OPCODE
            printf("no valid opcode\n");
        }
    }  // while
    fclose(fp);

    fp = fopen("RAM", "w");
    if (fp == NULL) {
        printf("cannot open RAM\n");
        return 1;
    }
    fprintf(fp, "v2.0 raw\n");  // needed for logisim, remove this line for verilog
    for (i = 0; i < counter + dataarea; i++)  // complete this for memory size in verilog
        fprintf(fp, "%04x\n", program[i]);
    fclose(fp);
    return 0;
}  // main
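The `if`/`else if` chain repeats the same packing for every ALU mnemonic, changing only the AluOp nibble and which operands fill which field. One possible refactoring is a lookup table, sketched below in C. It assumes the AluOp numbering of the ALU case statement in Section 4 and the opcode 0x7 and field packing of the existing `add` branch; `aluop_of` and `encode_alu_word` are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>

/* ALU mnemonics listed in AluOp order, following the ALU case statement
   in Section 4 (add = 0x0 ... dec = 0x8). */
static const char *const alu_mnemonics[] = {
    "add", "sub", "and", "or", "xor", "not", "mov", "inc", "dec"
};

/* Returns the AluOp of an ALU mnemonic, or -1 if it is not one. */
int aluop_of(const char *mnemonic)
{
    int n = (int)(sizeof alu_mnemonics / sizeof alu_mnemonics[0]);
    for (int i = 0; i < n; i++)
        if (strcmp(mnemonic, alu_mnemonics[i]) == 0)
            return i;
    return -1;
}

/* Builds an ALU word the way the existing "add" branch does: 0x7000,
   plus AluOp in bits [11:8], plus op1 | op2 << 3 | op3 << 6 masked to 8 bits. */
int encode_alu_word(int aluop, int op1, int op2, int op3)
{
    int fields = op1 | (op2 << 3) | (op3 << 6);
    return 0x7000 + (aluop << 8) + (fields & 0x00ff);
}
```

With these helpers, `add 1 2 3` encodes as `encode_alu_word(0, 1, 2, 3)`, giving 0x70D1, and the existing `not 1 2` and `inc 2` branches correspond to `encode_alu_word(5, 1, 2, 0)` (0x7511) and `encode_alu_word(7, 2, 2, 0)` (0x7712).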
6. Key Concepts
-
LDI instruction
-
Opcode field
-
Instruction register
-
Control unit
7. Problems
-
Add an instruction that performs immediate ALU operations. With this instruction, do we still need the LDI instruction?
-
In Verilog, connect the output of register 0 to a 7-segment display.
-
This is an extension of Problem 2. Add two DIP switches to your design. The 7-segment display will show the output of any register, selected by the switch configuration.