
Wang2200.org

The same design team that produced the first-generation 2200 also produced the second-generation CPU, internally known as the 2600. The microarchitecture document for the 2600 was substantially in place by October 1974 or earlier. The architecture document, 2600 Calculator Structure, was authored by Norman Lourie, Bob Kolk, and Bruce Patterson, so they were probably the chief architects. The 2600 CPU was a complete redesign, incorporating the latest technology and a much more efficient microarchitecture. The 2200 MVP architecture document was very well done, leaving little to the imagination.

In the VP microarchitecture, microinstructions could operate on 8b and 16b operands in just 600 ns, whereas the first generation CPU only operated on 4b quantities in 1600 ns. The revised microarchitecture also had a larger AUX register file and a larger subroutine stack. Finally, even though there were only 3 bits more per microword (23b vs 20b), the instruction set was far richer in the new microarchitecture. As an extreme example, loading the PC register (this is the memory pointer, not the instruction pointer) took one instruction (600 ns) in the new microarchitecture versus four instructions (6400 ns) in the old.

Although the microarchitecture retained some of the flavor of the first generation, its differences were great enough that the BASIC interpreter had to be completely rewritten from scratch. Wang BASIC also got a major overhaul with many new features and was dubbed BASIC-2.

Bruce Patterson and Dave Angel wrote almost all the microcode for BASIC-2. Despite the complete rewrite and all the new features, BASIC-2 was 99% upwardly compatible with the original Wang BASIC. A BASIC program running on a 2600 CPU is about 8x faster than the exact same program running on a 2200T CPU; a factor of 2.5 of that was due to the faster cycle time of the machine, and the other factor of three came from the more powerful microarchitecture instruction set combined with more efficient algorithms.

Page 4 of Wang Systems Newsletter #4 has this comparison:

Q. How much faster is the "VP" than the "T" CPU?

A. That's a good question. In general, one can safely state that the VP is 6-8 times faster overall. To help compare the two CPU's, here are some timings against specific functions.

Function 2200VP 2200T
X+Y 0.11 ms 0.8 ms
X*Y 0.38 ms 3.9 ms
X/Y 0.76 ms 7.4 ms
X^Y 6.2 ms 45.4 ms
LOG 3.2 ms 23.2 ms
SQR 1.7 ms 46.4 ms
TAN 7.7 ms 78.5 ms
RND 0.27 ms 24.0 ms
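
The per-function speedup implied by these timings can be worked out directly. The following is a small Python sketch; the dictionary and variable names are mine, and the numbers are copied from the newsletter table above.

```python
# Timings in milliseconds, copied from the table above: (2200VP, 2200T).
timings_ms = {
    "X+Y": (0.11, 0.8),
    "X*Y": (0.38, 3.9),
    "X/Y": (0.76, 7.4),
    "X^Y": (6.2, 45.4),
    "LOG": (3.2, 23.2),
    "SQR": (1.7, 46.4),
    "TAN": (7.7, 78.5),
    "RND": (0.27, 24.0),
}

# Speedup = 2200T time divided by 2200VP time for each function.
speedups = {name: t / vp for name, (vp, t) in timings_ms.items()}

for name, ratio in speedups.items():
    print(f"{name:4s} {ratio:5.1f}x")
```

Most entries come out between roughly 7x and 10x. SQR and RND are far above that, which presumably reflects changed algorithms rather than raw machine speed.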

 

One great improvement in the 2600 CPU was that the microcode was no longer stored in ROMs -- it was downloaded from disk on start up, making it much easier to fix bugs in the field. This feature also made it possible to run diagnostics on the machine every so often to make sure the hardware was operating right.

Although the CPU microarchitecture was entirely incompatible, the I/O structure was kept from the first generation 2200, allowing people to upgrade to the VP without having to throw away all of their I/O cards and peripherals.

Microarchitecture Details

The following information is intended to give the flavor of the microarchitecture, but doesn't cover everything. The view of the CPU presented to the microprogrammer is as follows.

Table 1: Wang VP CPU Register Resources
Register name [array size]   Register width   Function
IC                           16b              microcode instruction counter
ICSTACK[96]                  16b              microcode return stack
PH, PL                       16b (8b, 8b)     memory address pointer; scratch register
AUX[32]                      16b              auxiliary PC file
F[8]                         8b               scratch data registers
CH, CL                       16b (8b, 8b)     memory read data
K                            8b               8b data to/from the I/O bus
SH                           8b               high status register
SL                           8b               low status register

IC points at the current microinstruction being executed. Each microinstruction is 24b wide, of which one is parity. Most microinstructions take six 10 MHz clock cycles, although a few take eight, eleven, or sixteen clocks. The IC can be loaded with a 16b immediate value (i.e., JUMP or CALL); its value can be saved on the next location in the ICSTACK or its value restored from the same.
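
As a quick check on the timing figures, the conversion from clock counts to instruction time can be written out. This is plain arithmetic on the numbers in the text; the function name is mine.

```python
CLOCK_HZ = 10_000_000  # 10 MHz clock, from the text


def instruction_time_ns(clocks: int) -> int:
    """Duration in nanoseconds of a microinstruction lasting `clocks` cycles."""
    return clocks * 1_000_000_000 // CLOCK_HZ


for clocks in (6, 8, 11, 16):
    print(f"{clocks:2d} clocks = {instruction_time_ns(clocks)} ns")
```

The common six-clock case gives the 600 ns figure quoted for the VP.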

ICSTACK holds return addresses from the microcode subroutine calls, and it can also be used to push the current PC (with a -3 to +3 offset) or to pop the newest value into the PC. The stack is 96 deep; if the call nesting gets deeper than 96 levels, the ICSTACK pointer just wraps around and overwrites the oldest entry.
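
The wrap-around behavior can be illustrated with a toy model. This is only a sketch: the class name, the direction in which the pointer moves, and the "next free slot" convention are my assumptions, not details taken from the architecture document.

```python
class ICStack:
    """Toy model of a 96-entry return stack whose pointer wraps around."""

    DEPTH = 96

    def __init__(self):
        self.entries = [0] * self.DEPTH
        self.sp = 0  # index of the next free slot (assumed convention)

    def push(self, value: int) -> None:
        self.entries[self.sp] = value & 0xFFFF
        # Past the last slot the pointer wraps and overwrites the oldest entry.
        self.sp = (self.sp + 1) % self.DEPTH

    def pop(self) -> int:
        self.sp = (self.sp - 1) % self.DEPTH
        return self.entries[self.sp]


stack = ICStack()
for n in range(97):           # one more push than the stack can hold
    stack.push(0x1000 + n)

newest = stack.pop()          # the 97th value pushed
```

In this model the 97th push silently replaces the first return address, so unwinding all the way back returns to the wrong place instead of raising an error.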

PH, PL are respectively the high and low bytes of the 16b PC register. PC supplies the memory address when an instruction contains a memory access operation. The address is a byte address, which is what limits the architecture to accessing at most 64 KB of RAM. Later versions of the CPU added bank address bits (provided from SL) allowing more RAM to be addressed, although a single process never saw more than 64 KB. The register is often used like an accumulator to generate addresses that get stored elsewhere.

AUX[32] is a file of thirty two 16b registers. These are used for holding and supplying 16b values to the PC. They are required because saving/restoring the PC value to memory takes many microinstructions. When a value is transferred from the PC to an AUX register, the value can be adjusted by -3 to +3. This makes advancing a pointer through memory efficient.
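
A minimal Python sketch of the PC-to-AUX transfer described above. The function name `tpa` borrows the mnemonic from the encoding table, but the 16b wraparound and the fact that PC itself is left unchanged are assumptions of this model rather than statements from the architecture document.

```python
def tpa(pc: int, aux: list, index: int, offset: int = 0) -> None:
    """Store PC + offset into AUX[index]; offset must lie in -3..+3."""
    if not -3 <= offset <= 3:
        raise ValueError("offset must be in the range -3..+3")
    aux[index] = (pc + offset) & 0xFFFF  # assumed to wrap at 16 bits


aux = [0] * 32            # thirty-two 16b registers
tpa(0x1234, aux, 4, +1)   # save a pointer to the next byte
tpa(0x0000, aux, 5, -1)   # wraps around in this model
```

Folding the adjustment into the transfer is what lets a scanning loop advance its saved pointer without spending a separate add instruction.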

F[8] is a file of eight 8b values. These are used as a scratch pad for holding the results of calculations from the ALU.

CH, CL are a pair of 8b registers that work together. Every memory read gets two bytes and the data is saved in CH, CL. Because PC is byte addressed, PC may be even or odd. The byte addressed by PC is saved in CH; the byte addressed by (PC^0x0001) is saved in CL.
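
The even/odd behavior is easy to see in a short Python model. The function name and the sample memory contents are made up for illustration.

```python
def read_pair(mem: bytes, pc: int) -> tuple:
    """Model a memory read: CH gets MEM[PC], CL gets MEM[PC ^ 1]."""
    ch = mem[pc]
    cl = mem[pc ^ 0x0001]  # the other byte of the same 16b word
    return ch, cl


mem = bytes([0x11, 0x22, 0x33, 0x44])
even = read_pair(mem, 2)   # PC even: CH = 0x33, CL = 0x44
odd = read_pair(mem, 3)    # PC odd:  CH = 0x44, CL = 0x33
```

Either way, the byte the microcode asked for ends up in CH, and its neighbor in the same word ends up in CL.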

K is another 8b register. It is used to send 8b values over the I/O bus or to capture 8b values read from the I/O bus.

Finally, there are two 8b status registers. SH contains a collection of ad hoc status/control bits that do things like hold the carry flag and detect when I/O operations have completed. SL is just an 8b read/write register that the microcode uses for various state control so it doesn't have to go to memory for this state.

You can see a very simple block diagram of the microarchitecture.

Microinstruction Encoding

There are a few different formats for microcode instructions. The 2200 MVP architecture document contains a wealth of information, including everything required to write the VP CPU emulation code. Because it is so well written, if you really want the details, see the source document. Below are some of the most important details, enough to provide an overview of what the microarchitecture was all about.

The software development manual contains a very helpful table of microword encodings. It has been recreated as an HTML table below.

Table 2: Wang VP Microinstruction Encoding
  22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
I. REGISTER INSTRUCTIONS OPCODE X   Carry DD C-BUS A-BUS B-BUS
OR Or 0 0 0 0 0 X 0 CaCa DD CCCC AAAA BBBB
XOR Exclusive 0 0 0 0 1 X 0 CaCa DD CCCC AAAA BBBB
AND And 0 0 0 1 0 X 0 CaCa DD CCCC AAAA BBBB
SC Binary Subtract with Carry 0 0 0 1 1 X 0 CaCa DD CCCC AAAA BBBB
DAC Decimal Add with Carry 0 0 1 0 0 X 0 CaCa DD CCCC AAAA BBBB
DSC Decimal Subtract with Carry 0 0 1 0 1 X 0 CaCa DD CCCC AAAA BBBB
AC Binary Add with Carry 0 0 1 1 0 X 0 CaCa DD CCCC AAAA BBBB
M Binary Multiply 0 0 1 1 1 X 0 HbHa DD CCCC AAAA BBBB
SHFT Shift 0 0 0 HbHa X 0 0 1 DD CCCC AAAA BBBB
II. IMMEDIATE REGISTER INSTRUCTIONS OPCODE IMMEDIATE
(HIGH)
DD C-BUS IMMEDIATE
(LOW)
B-BUS
ORI Or Immediate 0 1 0 0 0 IIII DD CCCC IIII BBBB
XORI Exclusive Or Immediate 0 1 0 0 1 IIII DD CCCC IIII BBBB
ANDI And Immediate 0 1 0 1 0 IIII DD CCCC IIII BBBB
AI Binary Add Immediate 0 1 0 1 1 IIII DD CCCC IIII BBBB
DACI Decimal Add with Carry Immediate 0 1 1 0 0 IIII DD CCCC IIII BBBB
DSCI Decimal Subtract with Carry Immediate 0 1 1 0 1 IIII DD CCCC IIII BBBB
ACI Binary Add with Carry Immediate 0 1 1 1 0 IIII DD CCCC IIII BBBB
MI Binary Multiply Immediate 0 1 1 1 1 0 - Hb - DD CCCC IIII BBBB
III. MINI INSTRUCTIONS OPCODE   DD   B-BUS
TAP Transfer Aux to PC's 0 0 0 1 0 1 1 1 - DD 0 - - AxAxAxAxAx BBBB
TPA Transfer PC's to Aux 0 0 0 0 0 0 1 1 +/- DD 0 InIn AxAxAxAxAx BBBB
XPA Exchange PC's to Aux 0 0 0 0 0 1 1 1 +/- DD 0 InIn AxAxAxAxAx BBBB
TPS Transfer PC's to Stack 0 0 0 0 1 0 1 1 +/- DD 0 InIn - - - - - BBBB
TSP Transfer Stack to PC's 0 0 0 1 1 0 1 1 - DD - - - - - - - - BBBB
SR,RCM Read Control Memory + SR 0 0 0 0 1 1 1 1 - - - 0 1 1 - - - - - - - - -
SR,WCM Write Control Memory + SR 0 0 0 0 1 1 1 1 - - - 0 1 0 - - - - - - - - -
SR Subroutine Return 0 0 0 0 1 1 1 1 - DD 0 0 - - - - - - BBBB
CIO Control Input/Output 0 0 1 0 1 1 1 1 - 0 0 S TTT TTTT - - - -
LPI Load PC's Immediate 0 0 1 1 II 1 II DD IIII IIII IIII
IV. MASK BRANCH INSTRUCTIONS OPCODE   BRANCH FIELD
(LOW 10-Bits)
MASK B-BUS
BT Branch if True 1 1 0 0 Hb RRRRRRRRRR MMMM BBBB
BF Branch if False 1 1 0 1 Hb RRRRRRRRRR MMMM BBBB
BEQ Branch if = Mask 1 1 1 0 Hb RRRRRRRRRR MMMM BBBB
BNE Branch if != Mask 1 1 1 1 Hb RRRRRRRRRR MMMM BBBB
V. REGISTER BRANCH INSTRUCTIONS OPCODE   BRANCH FIELD
(LOW 10-Bits)
A-BUS B-BUS
BLR Branch if < Register 1 0 0 0 X RRRRRRRRRR AAAA BBBB
BLER Branch if <= Register 1 0 0 1 X RRRRRRRRRR AAAA BBBB
BER Branch if = Register 1 0 1 0 0 RRRRRRRRRR AAAA BBBB
BNR Branch if != Register 1 0 1 1 0 RRRRRRRRRR AAAA BBBB
VI. BRANCH INSTRUCTIONS OPCODE BRANCH FIELD
(LOW 10-Bits)
BRANCH FIELD
(HIGH 6-Bits)
 
SB Subroutine Branch 1 0 1 0 1 RRRRRRRRRR RRRRRR - -
B Unconditional Branch 1 0 1 1 1 RRRRRRRRRR RRRRRR - -

Table 3: Microinstruction Encoding Key
AAAA A-BUS Register Address
BBBB B-BUS Register Address
CCCC C-BUS Register Address
DD Read/Write Specification
00 = no read/write
01 = read (CH<=MEM[PC]; CL<=MEM[PC^1])
10 = write 1 (MEM[PC] <= C-BUS result)
11 = write 2 (MEM[PC^1] <= C-BUS result)
Hb, Ha High/Low 4-bits of register
Ha = 0: select low 4-bits of A-Bus register
Ha = 1: select high 4-bits of A-Bus register
Hb = 0: select low 4-bits of B-Bus register
Hb = 1: select high 4-bits of B-Bus register
II...I Immediate Operand
MMMM Immediate Mask
AxAxAxAxAx Address of auxiliary register
+/- In In Increment/decrement specification
000 = PC's
001 = PC's + 1
010 = PC's + 2
011 = PC's + 3
100 = PC's
101 = PC's - 1
110 = PC's - 2
111 = PC's - 3
CaCa Set carry (SH0) specification
00 = do not set carry
10 = set carry to 0 before ALU operation
11 = set carry to 1 before ALU operation
X Extended operation if X = 1
RR...R Branch address
S Set IOB flip-flops if S = 1
TTTTTT Strobe specification
- Bit ignored (0 or 1 legal)
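
To make the field layout concrete, here is a hedged Python sketch that decodes the immediate register format (group II in Table 2). The bit positions are read off the header row of Table 2; the function names, the dictionary layout, and the hand-assembled example word are my own and are not output from the original assembler.

```python
# Opcode values (bits 22..18) for the immediate format, read off Table 2.
# MI is left out because it uses the high immediate bits differently.
IMMEDIATE_OPS = {
    0b01000: "ORI",
    0b01001: "XORI",
    0b01010: "ANDI",
    0b01011: "AI",
    0b01100: "DACI",
    0b01101: "DSCI",
    0b01110: "ACI",
}

DD_NAMES = {0b00: "none", 0b01: "read", 0b10: "write 1", 0b11: "write 2"}


def bits(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of a microword as an integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_immediate(word: int) -> dict:
    """Decode a 23b immediate-format microword into named fields."""
    opcode = bits(word, 22, 18)
    if opcode not in IMMEDIATE_OPS:
        raise ValueError(f"not an immediate-format opcode: {opcode:05b}")
    return {
        "op": IMMEDIATE_OPS[opcode],
        # The 8b immediate is split: high nibble in 17..14, low nibble in 7..4.
        "imm": (bits(word, 17, 14) << 4) | bits(word, 7, 4),
        "dd": DD_NAMES[bits(word, 13, 12)],
        "c_bus": bits(word, 11, 8),
        "b_bus": bits(word, 3, 0),
    }


# Hand-assembled example (an assumption, not assembler output): ORI with
# immediate 0x20, C-BUS = F0 (0000), B-BUS = Dummy (1111), no memory access.
word = (0b01000 << 18) | (0x2 << 14) | (0b00 << 12) | (0x0 << 8) | (0x0 << 4) | 0xF
decoded = decode_immediate(word)
print(decoded)
```

The split immediate is the one non-obvious part: the two nibbles sit on either side of the DD and C-BUS fields, so they have to be reassembled before use.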

Table 4: A-, B-, C-Bus Register Addressing
Binary Encoding   A-BUS                         B-BUS   C-BUS
0000-0111         File registers (F0-F7)        F0-F7   F0-F7
1000              CL with PC's = PC's - 1       PL      PL
1001              CH with PC's = PC's - 1       PH      PH
1010              CL                            CL      illegal
1011              CH                            CH      illegal
1100              CL with PC's = PC's + 1       SL      SL
1101              CH with PC's = PC's + 1       SH      SH
1110              Dummy with PC's = PC's + 1    K       K
1111              Dummy with PC's = PC's - 1    Dummy   Dummy

When the A-BUS or B-BUS is specified as Dummy, a constant zero is supplied. When the C-BUS is specified as Dummy, it means the ALU result won't be stored to a register (although the result can still be stored to memory with a ",W1" or ",W2" specifier, if the microinstruction format has the DD field).

Table 5: Extended Operation Register Pairs
Binary Encoding   A-BUS          B-BUS       C-BUS
0000              F1, F0         F1, F0      F1, F0
0001              F2, F1         F2, F1      F2, F1
0010              F3, F2         F3, F2      F3, F2
0011              F4, F3         F4, F3      F4, F3
0100              F5, F4         F5, F4      F5, F4
0101              F6, F5         F6, F5      F6, F5
0110              F7, F6         F7, F6      F7, F6
0111              CL, F7         PL, F7      PL, F7
1000              CH, CL         PH, PL      PH, PL
1001              CL, CH         CL, PH      illegal
1010              CH, CL         CH, CL      illegal
1011              CL, CH         SL, CH      illegal
1100              CH, CL         SH, SL      SH, SL
1101              Dummy, CH      K, SH       K, SH
1110              Dummy, Dummy   Dummy, K    Dummy, K
1111              F0, Dummy      F0, Dummy   F0, Dummy

When a microinstruction has an X bit, X=0 means that an 8b operation is performed. When X=1, the instruction becomes a 16b operation: the first half acts on the 8b registers specified in the encoding, and the second half acts on the 8b registers selected by the register encoding + 1. Table 5 lists the possible combinations. The operation is a true 16b operation, not two 8b operations in a row. That is, if the CaCa field indicates that carry is to be set or cleared, this happens before the first byte operation but not before the second; and for the 16b versions of BLR and BLER, the comparison is a full 16b comparison, not just a comparison of the top bytes. In an extended microinstruction, the increment or decrement of the PC's that would occur in the 8b version is suppressed, and the PC value is unaffected. When an extended mode instruction specifies a write to memory, only the high order byte of the result is written. Extended mode instructions take the same amount of time as normal mode instructions.
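
The carry handling in an extended add can be sketched in Python. This models only the arithmetic of a 16b add with carry, on the assumption (suggested by Table 5, where encoding 0000 names F0 first) that the low-order byte is processed first. The function names are mine, and register selection, memory writes, and timing are ignored.

```python
def ac(a: int, b: int, carry: int) -> tuple:
    """8b binary add with carry: returns (8b result, carry out)."""
    total = a + b + carry
    return total & 0xFF, total >> 8


def acx(a16: int, b16: int, carry: int) -> tuple:
    """16b add built from two chained byte adds.

    The incoming carry (as set up by the CaCa field) applies only to the
    low byte; the high byte uses the carry produced by the low byte.
    """
    lo, carry = ac(a16 & 0xFF, b16 & 0xFF, carry)
    hi, carry = ac(a16 >> 8, b16 >> 8, carry)
    return (hi << 8) | lo, carry


# Adding 0xFFFF (that is, -1) with carry cleared decrements a 16b counter;
# the carry out is 1 until the counter passes through zero.
step1 = acx(0x0100, 0xFFFF, 0)
step2 = acx(0x0000, 0xFFFF, 0)
```

The carry out of the second call is 0, which is the condition a counting loop can test to detect that the counter has run out.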

Finally, there are some pseudo-operations that the assembler supported. There is more than one way to achieve the same purpose, but the ones chosen by the assembler are as follows:

Table 6: Standard Pseudo Operations
Mnemonic       Actual Code      Meaning
NOP            ORI 0,,          Don't do anything (C-BUS gets zero)
MVI imm, dst   ORI imm,,dst     Move 8b immediate to register
MV src, dst    ORI 0,src,dst    8b register to register move
MVX src, dst   ORX 00,src,dst   16b register to register move

Microarchitecture Example Code

The above description gives many details, but they are best understood by looking at real code to see how they work together. In order to compare the VP microarchitecture to that of the 2200T CPU, I've attempted to re-write the code examples from the 2200 microarchitecture page (which was real microcode from a shipping CPU). Because I haven't tried to find the exact same code buried somewhere in BASIC-2, I've just written it myself; perhaps a more experienced VP microcoder could do a better job.

uCode Example #1A: 2200T
IC Mnemonic Behavior
02A1 TA     4 transfer the contents of AUX[4] to the PC register, wiping out the previous contents of PC
02A2 TP+2,R 4 transfer PC+2 back to AUX[4]; read the byte at RAM[PC], storing it in C. We increment by two because PC is a nibble address, and we are advancing to the next byte.
02A3 BNE    2,CL,02A5 jump to return if low nibble isn't 2
02A4 BEQ    0,CH,02A1 loop back if high nibble is 0; (note the nibble swap: this is seeking HEX(20), which is space)
02A5 SR return to caller

uCode Example #1B: 2200VP
IC Mnemonic Behavior
0100 MVI    20,F0 space character
0101 TAP    4 transfer the contents of AUX[4] to the PC register, wiping out the previous contents of PC
0102 OR,R   +,, read RAM[PC] and save it in CH; increment PC
0103 BER    CH,F0,0102 if the character is a space, get the next character
0104 TPA    4 CH still holds the first non-space character; AUX[4] points to the following byte
0105 SR return to caller

uCode fragment #1 scans a line of code, skipping ahead until a non-space is found. On entry, AUX[4] contains the 16b pointer to the current byte being scanned; the routine returns with C (CH on the VP) containing the first non-space and AUX[4] pointing to the byte after it. Undoubtedly in the original source code the constant "4" would have been represented by a symbolic name.

The 2200T code takes four instructions (6.4 usec) per byte processed; the 2200VP code takes two instructions (1.2 usec) per byte, roughly a fivefold speedup. To be fair, the 2200VP code is one instruction longer and uses F0 as a scratch register.
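
For readers who find microcode hard to follow, here is a Python model of the VP version (Example #1B) that also counts executed instructions. It is a sketch of the listing's stated behavior, not an emulator: the function name and the test string are made up, and the 0.6 usec figure is the standard instruction time quoted earlier.

```python
VP_INSTR_US = 0.6  # standard VP microinstruction time, from the text


def skip_spaces(mem: bytes, aux4: int) -> tuple:
    """Model of Example #1B: returns (CH, new AUX[4], instructions executed)."""
    executed = 2                  # MVI 20,F0 and TAP 4
    f0 = 0x20                     # the space character
    pc = aux4
    while True:
        ch = mem[pc]              # OR,R +,, : read MEM[PC] into CH...
        pc = (pc + 1) & 0xFFFF    # ...and increment PC
        executed += 2             # the OR,R plus the BER that tests CH
        if ch != f0:
            break
    executed += 2                 # TPA 4 and SR
    return ch, pc, executed


ch, new_aux4, count = skip_spaces(b"   PRINT", 0)
elapsed_us = count * VP_INSTR_US
```

With three leading spaces the loop body runs four times, so the routine executes 12 instructions in this model: two of setup, eight in the loop, and two to finish.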

uCode Example #2A: 2200T
IC Mnemonic Behavior
03B9 ANDI   0E,ST1,ST1 clear bit 0 of ST1; this is the carry bit
03BA ACI    0E,F0,F0 subtract two from the 16b quantity stored in {F3,F2,F1,F0}
03BB ACI    0F,F1,F1
03BC ACI    0F,F2,F2
03BD ACI    0F,F3,F3
03BE BF     1,ST1,03C4 test bit 0 of ST1 (carry); if there is no carry, we are done
03BF XP-2   1 this and the next instruction simply decrement PC by 2 using AUX[1] as a temporary register
03C0 XP     1
03C1 AI,W1  0,F5, store {F4,F5} in memory at the byte pointed at by PC
03C2 AI,W2  0,F4,
03C3 B      03B9 loop back to the start of the routine
03C4 SR return from subroutine

This routine uses {F3,F2,F1,F0} as a 16b count of the number of nibbles to fill with a constant byte. The byte is supplied by {F5,F4}. The fill proceeds backwards; that is, PC initially points to one byte past where the fill should begin. This code takes 11 instructions (17.6 usec) per byte filled.

uCode Example #2B: 2200VP
IC Mnemonic Behavior
0100 SCX,0  F3F2,F3F2,F3F2 subtract {F3,F2} from itself with borrow, so that {F3,F2} = -1
0101 ANDI   0FE,SH,SH clear the carry bit
0102 ACX    F1F0,F3F2,F1F0 {F1,F0} = {F1,F0} + {F3,F2}
0103 BFL    1,SH,0107 test carry bit; if there is no carry, we are done
0104 OR     -,, decrement PC by 1
0105 ORI,W1 0,F4, store F4 in memory at the byte pointed at by PC
0106 B      0101 loop back to the start of the routine
0107 SR return from subroutine

In the VP version, things are changed a bit. Because the registers are 8b wide, let's assume {F1,F0} contains a byte count, and that F4 contains the fill byte. This code takes six instructions (3.6 usec) per byte filled, about five times faster. Allowing a couple more instructions, the VP code could be brought down to five instructions per byte. Allowing more extensive rearrangement, the inner loop could be brought down to two instructions:

uCode Example #2C: 2200VP
IC Mnemonic Behavior
0100 ORI,W1 -,F4, write F4 to MEM[PC]; PC=PC-1
0101 BLERX  F1F0,PHPL,*-1 keep going while {F1,F0} <= PC
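
The two-instruction loop can be modeled in Python as follows. This is a sketch under assumptions: I take {F1,F0} to hold the lowest address to be filled and PC to start at the highest, and I follow the listing's comments in writing first and decrementing afterwards. The function and parameter names are mine.

```python
def fill_down(mem: bytearray, pc: int, low: int, fill: int) -> int:
    """Model of Example #2C; returns the final value of PC."""
    while True:
        mem[pc] = fill            # ORI,W1 -,F4, : write F4 to MEM[PC]...
        pc = (pc - 1) & 0xFFFF    # ...then PC = PC - 1
        if not (low <= pc):       # BLERX F1F0,PHPL,*-1 : loop while {F1,F0} <= PC
            break
    return pc


mem = bytearray(16)
final_pc = fill_down(mem, pc=9, low=4, fill=0x20)
```

Here addresses 4 through 9 are filled and PC ends at 3. Note that in this model a lower bound of 0 would never terminate, because PC wraps to 0xFFFF instead of going below zero.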