Frog: A Calculator that can load data into registers

Before executing a program in Fish, the programmer has to prepare in two steps:

Put the program into memory.
Put the data into registers.

The program is then executed instruction by instruction as the clock edges arrive.

In Frog, we want to get rid of the second preparation step, i.e., the “put the data into registers” step. Instead, we want to put both the data and the program into memory. The program will then load the required data from memory into the registers as it executes. For this, we need a new type of instruction, called the “load immediate” or LDI instruction. To distinguish this new instruction from the previous ones, we also need a new field in our machine code, called the “opcode field”.

1. Instruction Set

In Fish, only instructions were loaded from memory into the CPU; the data was placed directly into the registers before the program started. The data consisted of 16-bit numbers, and instructions required only 9 bits. Hence we made the CPU registers 16 bits wide and the memory locations 9 bits wide.

In Frog, things have changed. Both data (16 bits wide) and instructions (9 bits wide in Fish) are loaded from memory into the CPU. Therefore, memory locations must be widened to 16 bits to accommodate the data.

As Frog has two different instructions, a single bit would be enough for the opcode field. But we will allocate four bits instead of one, because in the future we plan to increase the number of instructions up to 16.

1.1 ALU Instruction format

The ALU instructions of Frog have the same fields as those of Fish, except for the opcode field, which is new.

Note that we expect an instruction to fit into a memory location, which is 16 bits wide. Assume that we are using 8 registers for Frog. In this case, each of the Src1, Src2 and Dst fields takes three bits, which makes 9 bits in total. If we add AluOp (4 bits) and Opcode (4 bits) on top of that, the total instruction length becomes 17 bits. This does not fit into a single memory location, which is a problem.

Possible solutions are

Reduce the Opcode width to 3 bits. Downside: this restricts the future expansion of our instruction set, as we can then have at most 8 instructions.
Reduce the AluOp field to 3 bits. Downside: this reduces the number of available ALU operations to 8.
Increase the register and memory widths to 17 bits, and use 17-bit wide data. Downside: the processor will be highly unconventional, as memory and register widths are standardized to 8, 16, 32 bits and so on.
Use two consecutive memory locations for each ALU instruction. Downside: two memory locations contain 32 bits, and we will use only 17 of them.
Reduce the width of the Src1, Src2 and Dst fields from 3 to 2 bits. Downside: this reduces the number of registers from 8 to 4.

We choose solution number 5, the last one in the list. Why? Because when we investigate Reptile we will introduce a trick which allows us to have 8 registers while still keeping the Opcode and AluOp fields 4 bits wide and the instruction 16 bits long.

Also note that the order of the fields has changed. This is to remind you that, when designing an instruction set, the order of the fields within an instruction is not important.

1.2 LDI Instruction Format

2. Finite State Machine and Control Unit

In Fish, we had only one instruction: the ALU instruction. In Frog, the number of instructions is increased to two: the ALU instruction and the LDI instruction. Hence, when we bring an instruction from memory into the CPU, it cannot be executed instantly, as it was in Fish. First, it must be stored in a register and its opcode field has to be inspected to determine whether it is an LDI or an ALU instruction. In our design, this is achieved in the following way:

The opcode field of the instruction is stored in the control unit (cu in the FSM diagram below).
The rest of the instruction bits are stored in a special-purpose register called the instruction register.

Next, Frog must choose an appropriate course of action for each kind of instruction.

3. Architecture

Let us first draw the CPU without its control unit:

As can be seen from the circuit diagram, at the core of the Frog processor is a Fish processor. On top of Fish, there is a new register, a new multiplexer, and some new connections:

In fish, we had a single type of instruction: (LD is shown as REGLD)

MicroCode

Address	FETCH	IRLD	MUX	LD(REGLD)	PCINC
0x0	1	1	0	0	1
0x1	0	0	0	1	0
0x2	0	0	1	1	1

Address	Data
0x0	0x19
0x1	0x02
0x2	0x07

4. Verilog Design

Realizing Frog will be our first project in Verilog. We will take this opportunity …

4.1 How NOT to write the Frog in Verilog

When realizing a circuit idea in Verilog, students generally take the following steps:

Draw a schematic of the circuit.
Write a Verilog module corresponding to every major block in the schematic.
Connect all these modules within a top module to generate the circuit they drew.

When applied to Frog, this results in approximately the following:

Draw the schematic of Frog, which is Fig. …
Write modules for the register bank, ALU, memory, control unit, etc.
Join all these modules in a top CPU module.

So, the resulting Verilog code will look like:

module cpu(input logic clk);
logic IRLOAD, REGLOAD, PCINC, MUX;
logic [15:0] datain, aluout, regbankin;
logic [1:0]  srcr, srcl, regdest;
logic [3:0]  aluop;
logic [8:0]  pc;                  // 9 bits address the 512-word memory
logic [15:0] ir;
logic [15:0] regbank[7:0];

   memory mem1( .address(pc), .dataout(datain) );

   ControlUnit ctrl( .clk(clk), .PCINC(PCINC), .IRLOAD(IRLOAD), .REGLOAD(REGLOAD), .MUX(MUX), .din(datain[15:14]) );


   // program counter
   always_ff @(posedge clk)
        if (PCINC)
           pc <= pc+1;

   // instruction register
   always_ff @(posedge clk)
        if (IRLOAD)
           ir <= datain;

    // split the instruction register into its fields
    assign  aluop   = ir[11:8];
    assign  regdest = ir[1:0];
    assign  srcr    = ir[4:3];
    assign  srcl    = ir[7:6];

    alu alu1( .aluinl(regbank[srcl]), .aluinr(regbank[srcr]), .aluop(aluop), .aluout(aluout) );

    // multiplexer: the register bank is written from memory (LDI) or from the ALU
    always_comb
        if (MUX)
            regbankin = datain;
        else
            regbankin = aluout;

    registerbank rgb1( .clk(clk), .REGLOAD(REGLOAD), .whichregtowrite(regdest), .regbank(regbank), .din(regbankin) );
endmodule



module memory(
output logic [15:0] dataout,
input  logic [8:0]  address      // 9 address bits select one of 512 locations
);
 logic [15:0] mem [0:511];

 assign dataout = mem[address];

 initial
     $readmemh("prog.txt", mem);  //must be exactly 512 locations

endmodule



module alu(
input  logic [15:0] aluinl,
input  logic [15:0] aluinr,
output logic [15:0] aluout,
input  logic [3:0]  aluop
);
always_comb
    case( aluop )
        4'h0:  aluout = aluinl + aluinr;
        4'h1:  aluout = aluinl - aluinr;
        4'h2:  aluout = aluinl & aluinr;
        4'h3:  aluout = aluinl | aluinr;
        4'h4:  aluout = aluinl ^ aluinr;
        4'h5:  aluout = ~aluinr;
        4'h6:  aluout =  aluinr;
        4'h7:  aluout = aluinr + 16'h0001;
        4'h8:  aluout = aluinr - 16'h0001;
        default: aluout = 0;
    endcase
endmodule


module registerbank(
input  logic        clk,
input  logic [15:0] din,
input  logic [1:0]  whichregtowrite,   // 2-bit register field of the instruction
input  logic        REGLOAD,
output logic [15:0] regbank[7:0]
);
always_ff @(posedge clk)
    if (REGLOAD)
        regbank[whichregtowrite] <= din;
endmodule



module ControlUnit(
input  logic       clk,
output logic       IRLOAD, REGLOAD, PCINC, MUX,
input  logic [1:0] din
);
logic FETCH;
logic [4:0] microcodeRom [0:3];
logic [1:0] command;

always_ff @(posedge clk)
   if (FETCH)
      command <= din; //if I am doing fetch now, one clock later I need the microcode addr of the command.
   else
      command <= 2'b0;

    assign { IRLOAD, REGLOAD, PCINC, MUX, FETCH } = microcodeRom[command];

endmodule

This is a very suboptimal way of writing Verilog code, akin to programming in assembly language. The resulting code is not necessarily wrong, but it will be long, inefficient and very hard to debug.

4.2 How to write the Frog in Verilog

A better way of writing Verilog is to start not from the schematic, but from the finite state machine which gave rise to that schematic.

module FirstCPU(input logic clk);

logic [1:0] state;
logic [15:0] memory[0:511];
logic[15:0] aluinr, aluinl;
logic[3:0] aluop;
logic [15:0] pc, aluout, ir;
logic [15:0] regbank[0:3];

localparam  FETCH = 2'b00;
localparam  LDI = 2'b01;
localparam  ALU = 2'b11;   // bits [13:12] of the ALU opcode 0x7 emitted by the assembler in Section 5

    always_ff @(posedge clk)
        case(state)

            FETCH: 
            begin
                state<=memory[pc][13:12];
                ir<=memory[pc][11:0];
                pc<=pc+1;
            end

            LDI:
            begin
                state<=FETCH;
                regbank[ ir[1:0] ] <= memory[pc];
                pc<=pc+1;
            end

            ALU:
            begin
                state<=FETCH;
                regbank[ ir[1:0] ]  <= aluout; 
            end

        endcase

assign aluinl=regbank[ir[7:6]];
assign aluinr=regbank[ir[4:3]];
assign aluop=ir[11:8];
always_comb
    case( aluop )
        4'h0:  aluout = aluinl + aluinr;
        4'h1:  aluout = aluinl - aluinr;
        4'h2:  aluout = aluinl & aluinr;
        4'h3:  aluout = aluinl | aluinr;
        4'h4:  aluout = aluinl ^ aluinr;
        4'h5:  aluout = ~aluinr;
        4'h6:  aluout =  aluinr;
        4'h7:  aluout = aluinr + 16'h0001;
        4'h8:  aluout = aluinr - 16'h0001;
        default: aluout = 0;
    endcase

initial begin
  //   $readmemh("prog.txt", memory);  //must be exactly 512 locations
     state = FETCH;
 end

endmodule

As can be seen, there is no control unit, no register bank, etc. in this version of the program. All these low-level details are expected to be inferred by the Verilog compiler. In other words, the programmer is only required to express the FSM in Verilog. It is the compiler’s duty to construct the schematic from this FSM. And, as compilers are much better at optimization than humans, the resulting circuit is likely to be more efficient.

5. Assembler for Frog

The assembler for Frog is almost the same as the assembler for Fish. Only the code to assemble the LDI instruction is added.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


//Converts a hexadecimal string to integer.
int hex2int( char* hex)  
{
    int result=0;

    while ((*hex)!='\0')
    {
        if (('0'<=(*hex))&&((*hex)<='9'))
            result = result*16 + (*hex) -'0';
        else if (('a'<=(*hex))&&((*hex)<='f'))
            result = result*16 + (*hex) -'a'+10;
        else if (('A'<=(*hex))&&((*hex)<='F'))
            result = result*16 + (*hex) -'A'+10; 
        hex++;
    }
    return(result);
}

int main(void)
{     
    FILE *fp;
    char line[100];
    char *token = NULL;
    char *op1, *op2, *op3;
    char ch;
    int  chch;

    int program[1000];
    int counter=0;  //holds the address of the machine code instruction


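    fp = fopen("name_of_program","r");
    if (fp == NULL)                        //the source file could not be opened
    {
            printf("cannot open source file\n");
            return 1;
    }

    while(fgets(line,sizeof line,fp)!= NULL)
    {
            token=strtok(line,"\n\t\r ");  //get the instruction mnemonic or label
            if (token == NULL)             //skip blank lines
                    continue;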

            if (strcmp(token,"ldi")==0)        //---------------LDI INSTRUCTION--------------------
            {
                    op1 = strtok(NULL,"\n\t\r ");                                //get the 1st operand of ldi, which is the register that ldi loads
                    op2 = strtok(NULL,"\n\t\r ");                                //get the 2nd operand of ldi, which is the data that is to be loaded
                    program[counter]=0x1000+hex2int(op1);                        //generate the first 16-bit of the ldi instruction
                    counter++;                                                   //move to the second 16-bit of the ldi instruction
                    if ((op2[0]=='0')&&(op2[1]=='x'))                            //if the 2nd operand is twos complement hexadecimal
                        program[counter]=hex2int(op2+2)&0xffff;              //convert it to integer and form the second 16-bit 
                    else if ((  (op2[0])=='-') || ((op2[0]>='0')&&(op2[0]<='9')))       //if the 2nd operand is decimal 
                        program[counter]=atoi(op2)&0xffff;                         //convert it to integer and form the second 16-bit 
                    else                                                           //if the second operand is not decimal or hexadecimal, it is a label or a variable.
                    {                                                               //in this case, the 2nd 16-bits of the ldi instruction cannot be generated.
                        printf("unrecognizable LDI offset\n");
                    }        
                    counter++;                                                     //skip to the next memory location 
            }                                       
            else if (strcmp(token,"add")==0) //----------------- ADD -------------------------------
            {
                    op1 = strtok(NULL,"\n\t\r ");    
                    op2 = strtok(NULL,"\n\t\r ");
                    op3 = strtok(NULL,"\n\t\r ");
                    chch = (op1[0]-48)| ((op2[0]-48)<<3)|((op3[0]-48)<<6);  
                    program[counter]=0x7000+((chch)&0x00ff); 
                    counter++; 
            }
            else if (strcmp(token,"sub")==0)
            {
                    //to be added
            }
            else if (strcmp(token,"and")==0)
            {
                    //to be added
            }
            else if (strcmp(token,"or")==0)
            {
                    //to be added
            }
            else if (strcmp(token,"xor")==0)
            {
                    //to be added
            }                        
            else if (strcmp(token,"not")==0)
            {
                    op1 = strtok(NULL,"\n\t\r ");
                    op2 = strtok(NULL,"\n\t\r ");
                    ch = (op1[0]-48)| ((op2[0]-48)<<3);
                    program[counter]=0x7500+((ch)&0x00ff);  
                    counter++;
            }
            else if (strcmp(token,"mov")==0)
            {
                    //to be added
            }
            else if (strcmp(token,"inc")==0)
            {
                    op1 = strtok(NULL,"\n\t\r ");
                    ch = (op1[0]-48)| ((op1[0]-48)<<3);
                    program[counter]=0x7700+((ch)&0x00ff);  
                    counter++;
            }
            else if (strcmp(token,"dec")==0)
            {
                                      //to be added
            }
            else //------WHAT IS ENCOUNTERED IS NOT A VALID INSTRUCTION OPCODE
            {
                     printf("no valid opcode\n");
            } 
     } //while

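     fclose(fp);
     fp = fopen("RAM","w");
     fprintf(fp,"v2.0 raw\n");  //needed for logisim, remove this line for verilog..
     int dataarea = 0;          //assumption: number of extra zero words to append; complete this for memory size in verilog
     for (int i=0;i<counter+dataarea;i++)
            fprintf(fp,"%04x\n",(i<counter) ? program[i] : 0);  //words past the program are written as zero
     fclose(fp);
     return 0;
} //main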

6. Key Concepts

LDI instruction
Opcode field
Instruction register
Control unit

7. Problems

Add an instruction which performs immediate ALU operations. With this instruction, do we still need the LDI instruction?
In Verilog, connect the output of register 0 to a 7-segment display.
This is an extension of Problem 2. Add two DIP-switches to your design. The 7-segment display will show the output of any register depending on the switch configuration.