GEKKO PAIRED-SINGLES - WHAT ARE THESE?
---------------------------------------------------------------------------

1. Overview

    Paired-Singles are Gekko's analog of the SIMD instructions of Intel
(and other x86-series) processors, known as SSE. This extension is specific
to the Gekko processor and is used to operate on two single-precision
numbers ("floats" in C) with a single instruction.

    The Floating-Point Registers of Gekko (FPRs) are split in two: one half
holds the first single-precision number and the other half holds the second.
This picture shows the FPR format in paired-single mode:

         ---------------------------
  bits:  |0..........31|32.........63|
         |-------------+-------------|
         |     PS0     |     PS1     |
         ---------------------------

    These parts are named "PS0" and "PS1". Since Gekko works in big-endian
mode, bits are numbered from left to right. There are 32 PS0 and 32 PS1
registers in total.

    The PS instruction set is divided into two parts: Load and Store
Quantization instructions and Paired-Single Arithmetic instructions.
Load and Store Quantization instructions are used for fast integer-float
type casting and some specific memory operations, using the PS0 and PS1
parts of an FPR. Details are given later in this document.

    Paired-single mode is useful for fast vector and matrix calculations.

2. How to enable Paired-Single Mode

    To enable PS mode, you should set some bits of Gekko's custom HID2
Special-Purpose Register (HID2 is assigned as SPR 920).

         --------------
  bits:  | 0  |1| 2 |
         |----+-+---+--- ... (don't care)
         |LSQE| |PSE|
         --------------

    LSQE - Paired-Single load and store instructions enabled
    PSE  - Paired-Single mode enabled

    The following low-level assembly code demonstrates how to enable PS:

        mfspr   r0, 920             # read HID2
        oris    r0, r0, 0xA000      # set LSQE and PSE bits
        mtspr   920, r0             # write back

    If you try to execute any PS instruction without the LSQE and PSE bits
set, an illegal instruction exception will be generated.

3. Paired-Single Load and Store

4. Paired-Single Arithmetic

    Sorted opcode list :

 ----------------------------------------
|000100|  D  |00000|  B  |    264    |R| ps_abs
|000100|  D  |  A  |  B  |     21    |R| ps_add
|000100| D 00|  A  |  B  |     32    |0| ps_cmpo0
|000100| D 00|  A  |  B  |     96    |0| ps_cmpo1
|000100| D 00|  A  |  B  |      0    |0| ps_cmpu0
|000100| D 00|  A  |  B  |     64    |0| ps_cmpu1
|000100|  D  |  A  |  B  |     18    |R| ps_div
|000100|  D  |  A  |  B  |    528    |R| ps_merge00
|000100|  D  |  A  |  B  |    560    |R| ps_merge01
|000100|  D  |  A  |  B  |    592    |R| ps_merge10
|000100|  D  |  A  |  B  |    624    |R| ps_merge11
|000100|  D  |00000|  B  |     72    |R| ps_mr
|000100|  D  |00000|  B  |    136    |R| ps_nabs
|000100|  D  |00000|  B  |     40    |R| ps_neg
|000100|  D  |00000|  B  |     24    |R| ps_res
|000100|  D  |00000|  B  |     26    |R| ps_rsqrte
|000100|  D  |  A  |  B  |     20    |R| ps_sub
|------+-----+-----+-----+-----+-----+-|
|000100|  D  |  A  |  B  |  C  |  29 |R| ps_madd
|000100|  D  |  A  |  B  |  C  |  14 |R| ps_madds0
|000100|  D  |  A  |  B  |  C  |  15 |R| ps_madds1
|000100|  D  |  A  |  B  |  C  |  28 |R| ps_msub
|000100|  D  |  A  |00000|  C  |  25 |R| ps_mul
|000100|  D  |  A  |00000|  C  |  12 |R| ps_muls0
|000100|  D  |  A  |00000|  C  |  13 |R| ps_muls1
|000100|  D  |  A  |  B  |  C  |  31 |R| ps_nmadd
|000100|  D  |  A  |  B  |  C  |  30 |R| ps_nmsub
|000100|  D  |  A  |  B  |  C  |  23 |R| ps_sel
|000100|  D  |  A  |  B  |  C  |  10 |R| ps_sum0
|000100|  D  |  A  |  B  |  C  |  11 |R| ps_sum1
 ----------------------------------------

    Note : R is the record (Rc) bit, which makes the instruction also
    update the condition register. It is implemented, but unused by
    regular GC programs. For the compare instructions, "D 00" is the 3-bit
    condition register field number followed by two zero bits.

    Descriptions :

    PS_ABS - absolute value

        Clear bit 0 of PS0[B] and copy result to PS0[D]
        Clear bit 0 of PS1[B] and copy result to PS1[D]

    PS_ADD - add

        PS0[D] = PS0[A] + PS0[B]
        PS1[D] = PS1[A] + PS1[B]

    PS_CMPO0 - compare ordered high

        "c" holds the result of the comparison
        If (PS0[A] is NaN or PS0[B] is NaN) then c = 0001b
        Else if (PS0[A] < PS0[B]) then c = 1000b
        Else if (PS0[A] > PS0[B]) then c = 0100b
        Else c = 0010b
        Save result in D field of condition register (CR[D] = c).
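
    The compare rule above can be sketched in C. This is only a rough
    model, not emulator code: the function names ps_compare and
    cr_set_field are made up for illustration, the CR is assumed to be
    kept as one 32-bit word with field 0 in the four most significant
    bits, and all FPSCR side effects are ignored.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: returns the 4-bit value "c" for comparing a with b,
   using the same encoding as the description above. */
unsigned ps_compare(float a, float b)
{
    if (isnan(a) || isnan(b)) return 0x1;   /* 0001b: unordered    */
    if (a < b)                return 0x8;   /* 1000b: less than    */
    if (a > b)                return 0x4;   /* 0100b: greater than */
    return 0x2;                             /* 0010b: equal        */
}

/* Hypothetical helper: stores c into field crfd (0..7) of a 32-bit CR image.
   Assumption: field 0 occupies bits 0..3 in big-endian numbering, i.e. the
   four most significant bits of the word. */
uint32_t cr_set_field(uint32_t cr, unsigned crfd, unsigned c)
{
    assert(crfd < 8 && c < 16);
    unsigned shift = 4u * (7u - crfd);
    return (cr & ~((uint32_t)0xF << shift)) | ((uint32_t)c << shift);
}
```

    For PS_CMPO0 the inputs would be PS0[A] and PS0[B]; the same helper
    applied to PS1[A] and PS1[B] models the "1" variants.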

    PS_CMPO1 - compare ordered low

        "c" holds the result of the comparison
        If (PS1[A] is NaN or PS1[B] is NaN) then c = 0001b
        Else if (PS1[A] < PS1[B]) then c = 1000b
        Else if (PS1[A] > PS1[B]) then c = 0100b
        Else c = 0010b
        Save result in D field of condition register (CR[D] = c).

    PS_CMPU0 - compare unordered high

        "c" holds the result of the comparison
        If (PS0[A] is NaN or PS0[B] is NaN) then c = 0001b
        Else if (PS0[A] < PS0[B]) then c = 1000b
        Else if (PS0[A] > PS0[B]) then c = 0100b
        Else c = 0010b
        Save result in D field of condition register (CR[D] = c).

    PS_CMPU1 - compare unordered low

        "c" holds the result of the comparison
        If (PS1[A] is NaN or PS1[B] is NaN) then c = 0001b
        Else if (PS1[A] < PS1[B]) then c = 1000b
        Else if (PS1[A] > PS1[B]) then c = 0100b
        Else c = 0010b
        Save result in D field of condition register (CR[D] = c).

    These four compare instructions look the same, because I omitted some
    unnecessary FPSCR details.

    PS_DIV - divide

        PS0[D] = PS0[A] / PS0[B]
        PS1[D] = PS1[A] / PS1[B]

    PS_MERGE00 - merge high

        PS0[D] = PS0[A]
        PS1[D] = PS0[B]

    PS_MERGE01 - merge direct

        PS0[D] = PS0[A]
        PS1[D] = PS1[B]

    PS_MERGE10 - merge swapped

        PS0[D] = PS1[A]
        PS1[D] = PS0[B]

    PS_MERGE11 - merge low

        PS0[D] = PS1[A]
        PS1[D] = PS1[B]

    PS_MR - move register

        PS0[D] = PS0[B]
        PS1[D] = PS1[B]

    PS_NABS - negate absolute value

        Set bit 0 of PS0[B] and copy result to PS0[D]
        Set bit 0 of PS1[B] and copy result to PS1[D]

    PS_NEG - negate

        Invert bit 0 of PS0[B] and copy result to PS0[D]
        Invert bit 0 of PS1[B] and copy result to PS1[D]

    PS_RES - reciprocal estimate

        PS0[D] = 1 / PS0[B]
        PS1[D] = 1 / PS1[B]

    PS_RSQRTE - reciprocal square root estimate

        PS0[D] = 1 / SQRT(PS0[B])
        PS1[D] = 1 / SQRT(PS1[B])

    PS_SUB - subtract

        PS0[D] = PS0[A] - PS0[B]
        PS1[D] = PS1[A] - PS1[B]

    PS_MADD - multiply-add

        PS0[D] = PS0[A] * PS0[C] + PS0[B]
        PS1[D] = PS1[A] * PS1[C] + PS1[B]

    PS_MADDS0 - multiply-add scalar high

        PS0[D] = PS0[A] * PS0[C] + PS0[B]
        PS1[D] = PS1[A] * PS0[C] + PS1[B]

    PS_MADDS1 - multiply-add scalar low

        PS0[D] = PS0[A] * PS1[C] + PS0[B]
        PS1[D] = PS1[A] * PS1[C] + PS1[B]

    PS_MSUB - multiply-subtract

        PS0[D] = PS0[A] * PS0[C] - PS0[B]
        PS1[D] = PS1[A] * PS1[C] - PS1[B]

    PS_MUL - multiply

        PS0[D] = PS0[A] * PS0[C]
        PS1[D] = PS1[A] * PS1[C]

    PS_MULS0 - multiply scalar high

        PS0[D] = PS0[A] * PS0[C]
        PS1[D] = PS1[A] * PS0[C]

    PS_MULS1 - multiply scalar low

        PS0[D] = PS0[A] * PS1[C]
        PS1[D] = PS1[A] * PS1[C]

    PS_NMADD - negative multiply-add

        PS0[D] = - (PS0[A] * PS0[C] + PS0[B])
        PS1[D] = - (PS1[A] * PS1[C] + PS1[B])

    PS_NMSUB - negative multiply-subtract

        PS0[D] = - (PS0[A] * PS0[C] - PS0[B])
        PS1[D] = - (PS1[A] * PS1[C] - PS1[B])

    PS_SEL - select

        If (PS0[A] >= 0) then PS0[D] = PS0[C] else PS0[D] = PS0[B]
        If (PS1[A] >= 0) then PS1[D] = PS1[C] else PS1[D] = PS1[B]

    PS_SUM0 - vector sum high

        PS0[D] = PS0[A] + PS1[B]
        PS1[D] = PS1[C]

    PS_SUM1 - vector sum low

        PS0[D] = PS0[C]
        PS1[D] = PS0[A] + PS1[B]

    ...

    TODO : Some floating-point instructions change behaviour when PS is
    enabled.

---------------------------------------------------------------------------
Written 2003, 2004 by ORG / Dolwin team. Last updated 28 Mar 2004.
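
    As a rough illustration of how the arithmetic descriptions above
    combine, here is a small software model in C. It is a sketch under
    stated assumptions, not how Gekko or any emulator implements these
    instructions: the type ps_t and the function dot4 are invented for
    this example, plain C float arithmetic stands in for the hardware,
    and rounding, FPSCR updates and the R bit are ignored.

```c
#include <assert.h>

/* Illustrative model of one FPR in paired-single mode. */
typedef struct { float ps0, ps1; } ps_t;

/* ps_mul: PS0[D] = PS0[A] * PS0[C], PS1[D] = PS1[A] * PS1[C] */
ps_t ps_mul(ps_t a, ps_t c)
{
    ps_t d = { a.ps0 * c.ps0, a.ps1 * c.ps1 };
    return d;
}

/* ps_madd: PS0[D] = PS0[A] * PS0[C] + PS0[B], same for PS1 */
ps_t ps_madd(ps_t a, ps_t c, ps_t b)
{
    ps_t d = { a.ps0 * c.ps0 + b.ps0, a.ps1 * c.ps1 + b.ps1 };
    return d;
}

/* ps_sum0: PS0[D] = PS0[A] + PS1[B], PS1[D] = PS1[C] */
ps_t ps_sum0(ps_t a, ps_t b, ps_t c)
{
    ps_t d = { a.ps0 + b.ps1, c.ps1 };
    return d;
}

/* ps_merge10: PS0[D] = PS1[A], PS1[D] = PS0[B] */
ps_t ps_merge10(ps_t a, ps_t b)
{
    ps_t d = { a.ps1, b.ps0 };
    return d;
}

/* Hypothetical example: dot product of two 4-component vectors, each
   stored as two pairs (components 0,1 in x[0] and 2,3 in x[1]). */
float dot4(const ps_t x[2], const ps_t y[2])
{
    assert(x && y);
    ps_t t = ps_mul(x[0], y[0]);    /* x0*y0        , x1*y1         */
    t = ps_madd(x[1], y[1], t);     /* x0*y0 + x2*y2, x1*y1 + x3*y3 */
    t = ps_sum0(t, t, t);           /* PS0 = PS0 + PS1              */
    return t.ps0;
}
```

    In this model the whole dot product takes three paired operations.
    That is the kind of saving the overview refers to for vector and
    matrix calculations.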