The Virtual Computer

The virtual computer implemented in avida consists of a central processing unit (CPU) and an instruction set. These components define the low-level behavior of each program; the CPU and the instruction set together form the hardware of a Turing machine.

When a genome is loaded into the memory (as the software) of a CPU, the initial state of the Turing machine is set. The hardware, combined with the interaction with other CPUs, then governs the set of transitions between CPU states.

The CPU Structure

The CPU contains the following set of variables, as shown in Fig. 1.1:

  • A memory which contains the assembly source code to be run. Each location in memory contains a single instruction, and a set of flags to denote if the instruction has ever been executed, copied, mutated, etc. Additionally the memory has an instruction pointer (IP) which indicates the next position to be executed.
  • Three registers that are used by the program. These are often operated upon by the various instructions, and can contain arbitrary 32-bit values.
  • Two stacks that are used for storage. These are of variable (though finite) size. The default size of the stack is 10.
  • An input buffer and an output buffer which the creatures use to receive information, and return the processed results. Each buffer also has a pointer to indicate the active position within it.
  • A facing which determines which of the CPU's neighbors it is currently pointing towards.
  • Structure of the virtual CPU in avida.



    The Instruction Set Implementation

    The instruction set in avida is loaded on startup from a configuration file (inst_set.24.base by default). This allows selection of different instruction sets without recompilation, as well as allowing different numbers of instructions to be specified. However, it is not possible to alter the behavior of individual instructions or add new instructions without recompiling avida; this has to be done directly in the source code.

    All of the available instructions are listed in the inst_set.* files with a 1 or a 0 next to an instruction to indicate if it should or should not be included. Changing the instruction set to be used simply involves adjusting these flags.

    The instructions were created with three things in mind:

  • To be as complete as possible (both in a Turing complete sense, and, more practically, to ensure that simple operations only require a few instructions).
  • For each instruction to be as robust and versatile as possible; all instructions should take an appropriate action in any situation where they are executed.
  • To have as little redundancy as possible between instructions. (Those instructions which are redundant will typically not be turned on simultaneously for a run.)
  • One major concept which differentiates this assembly language from its real-world counterparts is in the additional uses of nops (no-operation commands). These have no effect on the CPU when executed, but may modify the behavior of any instruction which precedes them. This occurs in two ways; most of the time it will change the register affected by a command. For example, an inc command followed by the instruction nop-A would cause the contents of the AX register to be incremented, while an inc command followed by a nop-B would increment BX.

    Below, whenever a register name is surrounded by ?'s in an instruction description, it is the default register to be used. If a nop follows the command, the register it represents will replace this default.

    The second way nops can be used is as labels (reference points) for a search or a jump as in tierra. If nop-A follows a jump-forward command, it scans forward for the first complementary label (nop-B) and moves the instruction pointer there. Labels may be composed of more than a single nop instruction.

    The label system used in avida allows for an arbitrary number of different nops. By default, we have three nop instructions, nop-A's complement is nop-B, nop-B's is nop-C, and nop-C's is nop-A.

    A description of all of the instructions implemented in avida follows below. Those commands in red are part of the default instruction set.

    No-operations

    There are three nops in the default instruction set, and a fourth which is a ``pure'' no-operation, as follows:

  • nop-A
  • nop-B
  • nop-C

  • nop-X : A pure no-operation instruction. It will do nothing in all cases.
  • Flow control operations

  • if-n-equ : If the ?BX? register does not equal its complement, execute the next instruction, otherwise skip it. (Thus a nop-A following this command causes AX and BX to be compared; nop-B---the default---compares BX and CX, and finally, a nop-C compares CX and AX.)

  • if-not-0 : If the ?BX? register is not 0, execute the next instruction, otherwise skip it.

  • if-bit-1 : Execute the next instruction if the last bit of ?BX? is one.

  • jump-b and jump-f : If a label follows, search for its complement in the backwards/forwards direction; if a match is found, jump to it. If there is no label, jump by BX instructions in the proper direction. If there is a label, but its complement is not found, the jump will fail.

  • jump-p : Jump specifically into the memory of another CPU, decided by the current facing. Jump to the position in the faced creature at an instruction after the first occurrence of the complement label. If no complement label can be found, this instruction fails. If no label is initially provided to the instruction, the IP (instruction pointer) will move to line BX in the faced CPU's memory. A creature's IP may only move to an immediate neighbor and no further (local interactions only).

  • call : Put the location of the next instruction on the stack, and jump forward to the complement of the label which follows. If there is no label, jump BX instructions.

  • return : Pop the top value from the stack and jump to that index in the creature's memory.
  • Single argument math operations

    All of these affect the ?BX? register.

  • shift-r and shift-l : Rotate the bits of the ?BX? register in the appropriate direction.

  • bit-1 : Set the last bit of ?BX? to 1.

  • inc and dec : Increment or decrement ?BX?.

  • zero : Set ?BX? to zero.

  • push and pop : Push ?BX? onto the stack or pop the stack into ?BX?.

  • set-num : Set BX to the ternary equivalent of the label which follows. To do this, take nop-A as a 0, nop-B as a 1 and nop-C as a 2. Thus nop-C nop-A nop-B would translate to 2 0 1 in ternary, or 19 in decimal. If there is no label, set it to 0.
  • Double argument math operations

  • add : Set ?BX? equal to the sum of the BX and the CX registers : ?BX? = BX+ CX.

  • sub : ?BX? = BX - CX.

  • nand : ?BX?= BX NAND CX (in bitwise fashion).

  • nor : ?BX? = BX NOR CX (bitwise).

  • order : Place BX and CX in the proper order, i.e., such that CX > BX.
  • ``Biological'' operations

  • allocate : Allocate ?BX? instructions of memory at the end of the memory for this CPU and return the start location of this memory into AX. Only one allocate may occur between successful divides; any additional ones will automatically fail. Additionally, not more than twice or less than half of the current memory size can successfully be allocated.

  • divide : Split the memory in this CPU at ?AX?, placing the instructions beyond the dividing point into a new cell. There are a number of conditions under which a divide will fail. Those are:
    1. If either the mother or the daughter would have less than 10 instructions.
    2. If the creature has not completed a successful allocation of memory.
    3. If less than 70 percent of the mother was executed.
    4. If less than 70 percent of the daughter's memory was copied into.
    5. If the daughter would be less than half or more than double the mother's size.

  • copy : Copy a command from the memory location pointed to by the BX register to the memory location pointed to by AX + BX, i.e., copy the instruction at location BX into a location offset by AX. If a location is out of range of the memory, then it will be cycled back into range.

  • read : Copy a command from memory at BX into the CX register.

  • write : Copy a command from the CX register into the memory location at AX + BX.

  • if-n-cpy : Only execute the next line if the contents of memory locations BX and AX + BX are identical; otherwise skip it. This command has an error rate equal to the copy mutation rate. (It can be used to do some level of error checking.)
  • I/O and sensory operations

  • get : This instruction allows a creature to read in an integer from its environment (the get-buffer) into the ?CX? register.

  • put : Write the value of the ?BX? register into the output-buffer, then zero out that register. avida will analyze this value (compared to the inputs the creature has available) to determine if (and how much of) a merit bonus is warranted.

  • search-f and search-b : Search in the appropriate direction for the complement label and return its distance. The returned value is placed in the BX register, and the size of the label that followed is put in CX. If a complement label is not found, a distance of 0 is returned.
  • Additional instructions

  • switch_stack : Toggles the active stack.

  • rotate-l and rotate-r : Rotate the current facing of the CPU in the appropriate direction.

  • inject : This instruction acts somewhat similarly to divide, but rather than killing another creature and replacing it with its offspring, the code is instead injected into the middle of a running CPU's memory. The CPU which the code is to be injected into is chosen by the facing of the active CPU and the position is found by using complement labels. If a complement label can not be found in the memory of the CPU faced, the instruction fails.

  • set-cmut : This command allows a CPU to set its own copy mutation rate. When a command is executed, it simply takes the value in ?BX? and uses that for its new copy mutation rate ×10-4). The minimal value which can be set is, of course, 1.

  • mod-cmut : This instruction modifies the copy mutation rate of a CPU. When executed, the copy mutation rate of the CPU has ?BX? ×10-4 added to it. The minimal value it can be modified by is 0.0001.
  • An Example Program

    What follows is one of the simpler ancestor organisms distributed with avida; it is contained in the file creature.small in the work/genebank/ directory. It is a simple self-replicator, and very short compared to other ancestors included in this distribution. Due to its efficiency in self-replication, this creature is not well suited as an ancestor for adaptation experiments. Note that when an instruction pointer runs off of the end of the program, it will automatically loop back to its beginning, in order to ensure that the chemistry of the program never halts.

    This simple program contains two label pairs (P and Q), one for the purpose of calculating the length, and the other for the implementation of a copy loop. The search-f followed by label P searches forward in the genome for its complement, and returns its distance (from the end of the first label to the end of the second) into the BX register, and the size of the label into CX. The program then adds CX to the BX register, to account for the length of the label itself, and finally increments BX to account for the single instruction before the first label. The creature now has its own size in BX. When the instruction on line five (allocate) is called, the program is doubled in length and the absolute address of the new chunk of memory is put in AX. Now, AX contains the offset of the newly allocated section, and BX contains the length of the cell (which, if a creature is copying itself properly, will always be the same). Lines 6 through 11 move the size of the creature (via the stack) into the CX register, and clear out BX.

    A Sample Avida Program
    00       search-f       find distance to the end label
    01 nop-A label P
    02 nop-A  
    03 add account for the end label's size
    04 inc account for the initial search-f
    05 allocate allocate space for daughter.
    06 push move size from BX onto the stack.
    07 nop-B  
    08 pop move size off of the stack into CX
    09 nop-C  
    10 pop since the stack is empty, pop 0 into BX
    11 nop-B label Q' (Copy Loop start)
    12 nop-C  
    13 copy copy the current line...
    14 inc move onto the next line.
    15 if-n-equ if we aren't done copying...
    16 jump-b ...jump back to the loop's beginning.
    17 nop-A label Q
    18 nop-B  
    19 divide done copying; separate the daughter!
    20 nop-B label P'
    21 nop-B  



    The copy loop follows. It starts by copying from line BX and uses BX + AX as the destination. Initially, BX is 0 and AX is the size (22). This means the first time through the copy loop, line 0 is copied to line 22 (the first line in the newly allocated memory). Then line 14 is executed and BX is incremented. Finally, the if-n-equ tests to see if whether BX and CX are different, and jumps back to the beginning of the loop if this is the case. The loop will continue (copying a new line each time) until BX equals CX and hence all the lines have been copied. Finally, the divide instruction is reached.

    The divide instruction divides the cell at the offset specified in the AX register, creating a parent and daughter cell. The daughter is placed in a neighboring cell (the exact mechanism of which is described in the Reproduction section). At the end of the program, the instruction pointer is looped back to the beginning.

    Here is a trace of the execution of the program, with the values of the registers and stack at each moment in time.

    line instruction AX BX CX stack comments
    00 search-f 0 19 2   
    03 add 0 21 2   
    04 inc 0 22 2  We have the size!
    05 allocate 22 22 2  Allocate space
    06 push 22 22 2 22 push BX
    08 pop 22 22 22  pop CX
    10 pop 22 0 22  pop BX
    12 nop-C 22 0 22  No-Operation
    13 copy 22 0 22  Copy line 0...
    14 inc 22 1 22   
    15 if-n-equ 22 1 22   
    16 jump-b 22 1 22  ...to start of loop
    13 copy 22 1 22  Copy line 1...
    14 inc 22 2 22   
    15 if-n-equ 22 2 22   
    16 jump-b 22 2 22  ...to start of loop
    13 copy 22 2 22  Copy line 2...
    . . .
    13 copy 22 21 22  Copy line 21..
    14 inc 22 22 22   
    15 if-n-equ 22 22 22   
    17 nop-A 22 22 22   
    18 nop-B 22 22 22   
    19 divide 22 22 22  Divided!
    20 nop-B 22 22 22   
    21 nop-B 22 22 22   



    For this program, the gestation time (the number of instructions required to reproduce) varies between 98 and 100 instructions. The first time through requires 98 instructions: the portion before the copy loop consists of 8 executed instructions; the copy loop contains another 4 instructions that are each executed 22 times each, except for the last time the copy loop is executed, where the last jump-b is skipped over; and from there it is another 3 instructions until the divide is issued. So, 8 + (4×22 - 1) + 3 = 98. However, the second time through, an additional 2 instructions are executed because of the P label after the divide instruction, and a gestation time of 100, rather than 98, is averaged into the genotype record.


    NEXT: Mutations
    PREV: Reproduction

    INDEX


    Page maintained by Charles Ofria
    Send all comments to charles@krl.caltech.edu