2Arithmetic Where we've been: Performance (seconds, cycles, instructions)Abstractions: Instruction Set Architecture Assembly Language and Machine LanguageWhat's up ahead:Implementing the Architecture32operationresultabALU
3NumbersBits are just bits (no inherent meaning) — conventions define relationship between bits and numbersBinary numbers (base 2) decimal: n-1Of course it gets more complicated: numbers are finite (overflow) fractions and real numbers negative numbers e.g., no MIPS subi instruction; addi can add a negative number)How do we represent negative numbers? i.e., which bit patterns will represent which numbers?
4Possible Representations Sign Magnitude: One's Complement Two's Complement 000 = = = = = = = = = = = = = = = = = = = = = = = = -1Issues: balance, number of zeros, ease of operationsWhich one is best? Why?
5MIPS32 bit signed numbers: two = 0ten two = + 1ten two = + 2ten two = + 2,147,483,646ten two = + 2,147,483,647ten two = – 2,147,483,648ten two = – 2,147,483,647ten two = – 2,147,483,646ten two = – 3ten two = – 2ten two = – 1tenmaxintminint
6Two's Complement Operations Negating a two's complement number: invert all bits and add 1remember: “negate” and “invert” are quite different!Converting n bit numbers into numbers with more than n bits:MIPS 16 bit immediate gets converted to 32 bits for arithmeticcopy the most significant bit (the sign bit) into the other bits > >"sign extension" (lbu vs. lb)
7Novas instruçõesinstruções “unsigned”: (exemplo de aplicação, cálculo de memória)sltu $t1, $t2, $t # diferença é “sem sinal”slti e sltiu # envolve imediato, com ou sem sinalExemplo pag 215: supor $s0 = FF FF FF FF e $s1 =slt $t0, $s0, $s1como $s0 < 0 e $s1 > 0 Þ $s0<$s1 Þ $t0 = 1sltu $t0, $s0, $s1como $s0 e $s1 não tem sinal Þ $s0>$s1 Þ $t0 = 0
8Cuidados com extensão 16 bits beq $s0, $s1, nnn # salta para PC + nnn se teste OKnnn tem 16 bits e PC tem 32 bitsestender de 16 para 32 bits antes daoperação aritméticase nnn > 0preencher com zeros à esquerdase nnn < CUIDADOpreencher com 1´s à esquerdaverificarpor este motivo operação é chamada deEXTENSÃO DE SINAL
9Addition & Subtraction Just like in grade school (carry/borrow 1s) 0101Two's complement operations easysubtraction using addition of negative numbers 1010Overflow (result too large for finite computer word):e.g., adding two n-bit numbers does not yield an n-bit number 0001 note that overflow term is somewhat misleading, it does not mean a carry “overflowed”
10Detecting OverflowNo overflow when adding a positive and a negative numberNo overflow when signs are the same for subtractionCONDIÇÕES DE OVERFLOWEm hardware, comparar o “vai-um” e o“vem-um” com relação ao bit de sinal
11Effects of Overflow An exception (interrupt) occurs Control jumps to predefined address for exception (EPC — EXCEPTION PROGRAM COUNTER)Interrupted address is saved for possible resumptionmfc0 (move from system control): copia endereço do EPC para qualquer registradorDon't always want to detect overflow — new MIPS instructions: addu, addiu, subu note: addiu still sign-extends! note: sltu, sltiu for unsigned comparisons
13Review: Boolean Algebra & Gates Problem: Consider a logic function with three inputs: A, B, and C. Output D is true if at least one input is true Output E is true if exactly two inputs are true Output F is true only if all three inputs are trueShow the truth table for these three functions.Show the Boolean equations for these three functions.Show an implementation consisting of inverters, AND, and OR gates.
14An ALU (arithmetic logic unit) Let's build an ALU to support the andi and ori instructionswe'll just build a 1 bit ALU, and use 32 of themPossible Implementation (sum-of-products):operationresultopabresab
15Review: The Multiplexor Selects one of the inputs to be the output, based on a control inputLets build our ALU using a MUX:SCABnote: we call this a 2-input muxeven though it has 3 inputs!1
16Different Implementations Not easy to decide the “best” way to build somethingDon't want too many inputs to a single gateDont want to have to go through too many gatesfor our purposes, ease of comprehension is importantLet's look at a 1-bit ALU for addition:How could we build a 1-bit ALU for add, and, and or?How could we build a 32-bit ALU?cout = a b + a cin + b cinsum = a xor b xor cin
20Tailoring the ALU to the MIPS Need to support the set-on-less-than instruction (slt)remember: slt is an arithmetic instructionproduces a 1 if rs < rt and 0 otherwiseuse subtraction: (a-b) < 0 implies a < bNeed to support test for equality (beq $t5, $t6, $t7)use subtraction: (a-b) = 0 implies a = b
21Supporting slt Can we figure out the idea? Rs bit de sinal Rt Rd RsRtRdbit de sinalsubtração
23Test for equalityNotice control lines: 000 = and 001 = or 010 = add 110 = subtract 111 = sltNote: zero is a 1 when the result is zero!
24ALU ALUop 32 bits: A, B, result 1 bit: Zero, Overflow A Zero 3 bits: ALUop
25Conclusion We can build an ALU to support the MIPS instruction set key idea: use multiplexor to select the output we wantwe can efficiently perform subtraction using two’s complementwe can replicate a 1-bit ALU to produce a 32-bit ALUImportant points about hardwareall of the gates are always workingthe speed of a gate is affected by the number of inputs to the gatethe speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)Our primary focus: comprehension, however,Clever changes to organization can improve performance (similar to using better algorithms in software)we’ll look at two examples for addition and multiplication
26Problem: ripple carry adder is slow Is a 32-bit ALU as fast as a 1-bit ALU? atraso (ent Þ soma ou carry = 2G) n estágios Þ 2nGIs there more than one way to do addition?two extremes: ripple carry (2nG) sum-of-products (2G)Can you see the ripple? How could you get rid of it?c1 = b0c0 + a0c0 + a0b0c2 = b1c1 + a1c1 + a1b c2 =c3 = b2c2 + a2c2 + a2b2 c3 =c4 = b3c3 + a3c3 + a3b3 c4 =Not feasible! Why?
27Carry-lookahead adder An approach in-between our two extremesMotivation:If we didn't know the value of carry-in, what could we do?When would we always generate a carry? gi = ai biWhen would we propagate the carry? pi = ai + biDid we get rid of the ripple?c1 = g0 + p0c0c2 = g1 + p1c1 c2 =c3 = g2 + p2c2 c3 =c4 = g3 + p3c3 c4 = Feasible! Why?atraso: ent Þ gi pi (1G) gi pi Þ carry (2G) carry Þ saídas (2G)total: 5G independente de n
28Use principle to build bigger adders Can’t build a 16 bit adder this way... (too big)Could use ripple carry of 4-bit CLA addersBetter: use the CLA principle again!super propagate (ver pag 243)super generate (ver pag 245)ver exercícios 4.44, 45 e 46 (não será cobrado)
29Multiplication More complicated than addition accomplished via shifting and additionMore time and more areaLet's look at 3 versions based on gradeschool algorithmNegative numbers: convert and multiplythere are better techniques, we won’t look at them
32Final VersionNo MIPS:dois novos registradores de uso dedicado para multiplicação: Hi e Lo (32 bits cada)mult $t1, $t # Hi Lo Ü $t1 * $t2mfhi $t # $t1 Ü Himflo $t # $t1 Ü Lo
33Algoritmo de Booth (visão geral) Idéia: “acelerar” multiplicação no caso de cadeia de “1´s” no multiplicador:* (multiplicando) =* (multiplicando)* (multiplicando)Olhando bits do multiplicador 2 a 200 nada01 soma (final)10 subtrai (começo)11 nada (meio da cadeia de uns)Funciona também para números negativosPara o curso: só os conceitos básicosAlgoritmo de Booth estendidovarre os bits do multiplicador de 2 em 2Vantagens:(pensava-se: shift é mais rápido do que soma)gera metade dos produtos parciais: metade dos ciclos
36Divisão29 3 Þ = 3 * Q R = 3 *divisorrestodividendoquociente2910 = = 11Q = R = 21 1Como implementar em hardware?1 0
37Alternativa 1: divisão com restauração hardware não sabe se “vai caber ou não”registrador para guardar resto parcialverificação do sinal do resto parcialcaso negativo Þ restauraçãoq4q3q2q1q0 = = 9R = 11 = 2
42InstruçõesNo MIPS:dois novos registradores de uso dedicado para multiplicação: Hi e Lo (32 bits cada)mult $t1, $t # Hi Lo Ü $t1 * $t2mfhi $t # $t1 Ü Himflo $t # $t1 Ü LoPara divisão:div $s2, $s # Lo Ü $s3 / $s Hi Ü $s3 mod $s3divu $s2, $s # idem para “unsigned”
43mantissa ou significando Ponto FlutuanteObjetivos:representação de números não inteirosaumentar a capacidade de representação (maiores ou menores)Formato padronizado1.XXXXXXXXX * 2yyy (no caso geral Byyy)No MIPS:Sexpmantissa ou significando823 exp mantissa faixa precisãosinal-magnitude (-1)S F * 2E
50Conjunto de instruções do MIPS para fp Fig Pag 291
51Floating Point Complexities Operations are somewhat more complicated (see text)In addition to overflow we can have “underflow”Accuracy can be a big problemIEEE 754 keeps two extra bits, guard and roundfour rounding modespositive divided by zero yields “infinity”zero divide by zero yields “not a number”other complexitiesImplementing the standard can be trickyNot using the standard can be even worsesee text for description of 80x86 and Pentium bug!
52Chapter Four SummaryComputer arithmetic is constrained by limited precisionBit patterns have no inherent meaning but standards do existtwo’s complementIEEE 754 floating pointComputer instructions determine “meaning” of the bit patternsPerformance and accuracy are important so there are many complexities in real machines (i.e., algorithms and implementation).We are ready to move on (and implement the processor) you may want to look back (Section 4.12 is great reading!)