pureikyubu/Docs/HW/ps.txt
2020-04-20 00:20:12 +03:00


GEKKO PAIRED-SINGLES - WHAT ARE THESE?
---------------------------------------------------------------------------
1. Overview
Paired-Singles are the analog of the "streaming" SIMD instructions of Intel
(and other x86 series) processors, known as SSE. This extension is specific
to the Gekko processor and is used to operate on two single-precision
numbers ("floats" in C) with a single instruction.
In paired-single mode the Floating-Point Registers of Gekko (FPRs) are
split in two: one half holds the first single-precision number and the
other half holds the second.
This picture shows the FPR format in paired-single mode:
       ---------------------------
bits: |0..........31|32.........63|
      |-------------+-------------|
      |     PS0     |     PS1     |
       ---------------------------
These parts are named "PS0" and "PS1". Since Gekko works in big-endian
mode, bits are numbered from left to right (bit 0 is the most significant
bit). There are 32 PS0 and 32 PS1 registers in total.
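As a rough illustration of the layout above, an emulator could model the
register file like this. This is only a sketch that follows the picture in
this document; the names PairedSingle, PsRegFile and ps_pack are
hypothetical and are not taken from any existing source.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one FPR in paired-single mode. */
typedef struct {
    float ps0;   /* bits 0..31  (left half)  */
    float ps1;   /* bits 32..63 (right half) */
} PairedSingle;

/* 32 FPRs, each one holding a PS0/PS1 pair. */
typedef struct {
    PairedSingle fpr[32];
} PsRegFile;

/* Build the 64-bit image of a register: PS0 in the upper (left) word,
   PS1 in the lower (right) word, as in the picture above. */
static uint64_t ps_pack(PairedSingle p)
{
    uint32_t hi, lo;
    memcpy(&hi, &p.ps0, sizeof hi);   /* reinterpret float bits safely */
    memcpy(&lo, &p.ps1, sizeof lo);
    return ((uint64_t)hi << 32) | lo;
}
```

For example, packing the pair (1.0, 2.0) places the bit pattern of 1.0 in
the left word and the bit pattern of 2.0 in the right word.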
The PS instruction set is divided into two parts: Load and Store
Quantization instructions and Paired-Single Arithmetic instructions. Load
and Store Quantization instructions are used for fast integer-float type
casting and some specific memory operations, using the PS0 and PS1 parts
of an FPR. Details are given later in this document.
Paired-single mode is useful for fast vector and matrix calculations.
2. How to enable Paired-Single Mode
To enable PS mode, you must set some bits of Gekko's custom HID2
Special-Purpose Register (HID2 is assigned SPR number 920).
       ----------
bits: | 0  |1| 2 |
      |----+-+---+--- ... (don't care)
      |LSQE| |PSE|
       ----------
LSQE - Paired-Single load and store instructions enabled
PSE - Paired-Single mode enabled
The following low-level assembly code demonstrates how to enable PS mode:
    mfspr   r0, 920         # read HID2
    oris    r0, r0, 0xA000  # set LSQE and PSE bits
    mtspr   920, r0         # write back
If you try to execute any PS instruction without the LSQE and PSE bits set,
an illegal instruction exception will be generated.
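The constant 0xA000 follows from the left-to-right bit numbering: bit 0 is
the most significant bit of the 32-bit register. The sketch below shows the
arithmetic and a possible emulator-side check. The names PPC_BIT32,
HID2_LSQE, HID2_PSE, ps_enabled and hid2_enable_ps are hypothetical, and
the check follows the rule stated above (both bits are required).

```c
#include <stdint.h>

/* PowerPC numbers bits from the most significant end: bit 0 is the MSB. */
#define PPC_BIT32(n)  (UINT32_C(1) << (31 - (n)))

#define HID2_LSQE  PPC_BIT32(0)   /* 0x80000000 */
#define HID2_PSE   PPC_BIT32(2)   /* 0x20000000 */

/* Hypothetical check: may PS instructions execute with this HID2 value?
   Assumes, as the text says, that both LSQE and PSE must be set. */
static int ps_enabled(uint32_t hid2)
{
    uint32_t need = HID2_LSQE | HID2_PSE;
    return (hid2 & need) == need;
}

/* Models the oris step of the listing: OR 0xA000 into the upper halfword. */
static uint32_t hid2_enable_ps(uint32_t hid2)
{
    return hid2 | (UINT32_C(0xA000) << 16);
}
```

LSQE | PSE is 0xA0000000, so its upper halfword is the 0xA000 immediate
used with oris.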
3. Paired-Single Load and Store
4. Paired-Single Arithmetic
Sorted opcode list :
 -------------------------------------
|000100|  D  |00000|  B  |   264    |R| ps_abs
|000100|  D  |  A  |  B  |    21    |R| ps_add
|000100|D  00|  A  |  B  |    32    |0| ps_cmpo0
|000100|D  00|  A  |  B  |    96    |0| ps_cmpo1
|000100|D  00|  A  |  B  |     0    |0| ps_cmpu0
|000100|D  00|  A  |  B  |    64    |0| ps_cmpu1
|000100|  D  |  A  |  B  |    18    |R| ps_div
|000100|  D  |  A  |  B  |   528    |R| ps_merge00
|000100|  D  |  A  |  B  |   560    |R| ps_merge01
|000100|  D  |  A  |  B  |   592    |R| ps_merge10
|000100|  D  |  A  |  B  |   624    |R| ps_merge11
|000100|  D  |00000|  B  |    72    |R| ps_mr
|000100|  D  |00000|  B  |   136    |R| ps_nabs
|000100|  D  |00000|  B  |    40    |R| ps_neg
|000100|  D  |00000|  B  |    24    |R| ps_res
|000100|  D  |00000|  B  |    26    |R| ps_rsqrte
|000100|  D  |  A  |  B  |    20    |R| ps_sub
|------+-----+-----+-----+----------+-|
|000100|  D  |  A  |  B  |  C  | 29 |R| ps_madd
|000100|  D  |  A  |  B  |  C  | 14 |R| ps_madds0
|000100|  D  |  A  |  B  |  C  | 15 |R| ps_madds1
|000100|  D  |  A  |  B  |  C  | 28 |R| ps_msub
|000100|  D  |  A  |00000|  C  | 25 |R| ps_mul
|000100|  D  |  A  |00000|  C  | 12 |R| ps_muls0
|000100|  D  |  A  |00000|  C  | 13 |R| ps_muls1
|000100|  D  |  A  |  B  |  C  | 31 |R| ps_nmadd
|000100|  D  |  A  |  B  |  C  | 30 |R| ps_nmsub
|000100|  D  |  A  |  B  |  C  | 23 |R| ps_sel
|000100|  D  |  A  |  B  |  C  | 10 |R| ps_sum0
|000100|  D  |  A  |  B  |  C  | 11 |R| ps_sum1
 -------------------------------------
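Note: the R opcode field (the record bit) is implemented, but unused by
regular GC programs. On floating-point instructions it copies the FPSCR
exception summary bits into CR1; it is not a comparison of the result
with zero.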
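The fields in the table can be extracted from a 32-bit instruction word
with shifts and masks. The following is a minimal sketch, assuming the
standard PowerPC field positions (primary opcode in the top 6 bits, then
three 5-bit register fields); the names PsFields and ps_decode are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the fields shown in the opcode list. */
typedef struct {
    unsigned opcd;   /* primary opcode, 4 for paired-single arithmetic */
    unsigned d;      /* destination register (or CR field in compares) */
    unsigned a, b;   /* source registers                               */
    unsigned c;      /* third source, only in the second table form    */
    unsigned xo10;   /* 10-bit extended opcode (first table form)      */
    unsigned xo5;    /* 5-bit extended opcode (second table form)      */
    unsigned rc;     /* R bit                                          */
} PsFields;

static PsFields ps_decode(uint32_t insn)
{
    PsFields f;
    f.opcd = insn >> 26;
    f.d    = (insn >> 21) & 0x1F;
    f.a    = (insn >> 16) & 0x1F;
    f.b    = (insn >> 11) & 0x1F;
    f.c    = (insn >>  6) & 0x1F;
    f.xo10 = (insn >>  1) & 0x3FF;
    f.xo5  = (insn >>  1) & 0x1F;
    f.rc   = insn & 1;
    return f;
}
```

A decoder would typically test xo5 against the second group of opcodes
(10..15, 23, 25, 28..31) first and fall back to xo10 otherwise; that
dispatch order is a design choice and is not prescribed by this document.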
Descriptions :
PS_ABS - absolute value
Clear bit 0 of PS0[B] and copy result to PS0[D]
Clear bit 0 of PS1[B] and copy result to PS1[D]
PS_ADD - add
PS0[D] = PS0[A] + PS0[B]
PS1[D] = PS1[A] + PS1[B]
PS_CMPO0 - compare ordered high
"c" holds result of comparison
If (PS0[A] is NaN or PS0[B] is NaN) then c = 0001b
Else if (PS0[A] < PS0[B]) then c = 1000b
Else if (PS0[A] > PS0[B]) then c = 0100b
Else c = 0010b
Save result in D field of condition register (CR[D] = c).
PS_CMPO1 - compare ordered low
"c" holds result of comparison
If (PS1[A] is NaN or PS1[B] is NaN) then c = 0001b
Else if (PS1[A] < PS1[B]) then c = 1000b
Else if (PS1[A] > PS1[B]) then c = 0100b
Else c = 0010b
Save result in D field of condition register (CR[D] = c).
PS_CMPU0 - compare unordered high
"c" holds result of comparison
If (PS0[A] is NaN or PS0[B] is NaN) then c = 0001b
Else if (PS0[A] < PS0[B]) then c = 1000b
Else if (PS0[A] > PS0[B]) then c = 0100b
Else c = 0010b
Save result in D field of condition register (CR[D] = c).
PS_CMPU1 - compare unordered low
"c" holds result of comparison
If (PS1[A] is NaN or PS1[B] is NaN) then c = 0001b
Else if (PS1[A] < PS1[B]) then c = 1000b
Else if (PS1[A] > PS1[B]) then c = 0100b
Else c = 0010b
Save result in D field of condition register (CR[D] = c).
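These four compare instructions look the same here because I omitted the
FPSCR details, which are unnecessary for this overview (the ordered and
unordered forms differ only in how NaN operands are reported in FPSCR).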
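The compare rule above can be sketched in C as a single helper that is
applied to the PS0 halves (ps_cmpo0, ps_cmpu0) or to the PS1 halves
(ps_cmpo1, ps_cmpu1). The name ps_cmp_half is hypothetical, and FPSCR
updates are left out, as in the descriptions.

```c
#include <math.h>

/* Returns the 4-bit value "c" that is written to CR[D]. */
static unsigned ps_cmp_half(float a, float b)
{
    if (isnan(a) || isnan(b)) return 0x1;   /* 0001b: unordered    */
    if (a < b)                return 0x8;   /* 1000b: less than    */
    if (a > b)                return 0x4;   /* 0100b: greater than */
    return 0x2;                             /* 0010b: equal        */
}
```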
PS_DIV - divide
PS0[D] = PS0[A] / PS0[B]
PS1[D] = PS1[A] / PS1[B]
PS_MERGE00 - merge high
PS0[D] = PS0[A]
PS1[D] = PS0[B]
PS_MERGE01 - merge direct
PS0[D] = PS0[A]
PS1[D] = PS1[B]
PS_MERGE10 - merge swapped
PS0[D] = PS1[A]
PS1[D] = PS0[B]
PS_MERGE11 - merge low
PS0[D] = PS1[A]
PS1[D] = PS1[B]
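The four merge instructions differ only in which half of A and which half
of B they pick. A minimal C sketch, with a hypothetical PairedSingle type
and function names chosen to mirror the mnemonics:

```c
/* Hypothetical model of one paired-single register. */
typedef struct { float ps0, ps1; } PairedSingle;

static PairedSingle ps_merge00(PairedSingle a, PairedSingle b)
{
    PairedSingle d = { a.ps0, b.ps0 };   /* high of A, high of B */
    return d;
}

static PairedSingle ps_merge01(PairedSingle a, PairedSingle b)
{
    PairedSingle d = { a.ps0, b.ps1 };   /* high of A, low of B  */
    return d;
}

static PairedSingle ps_merge10(PairedSingle a, PairedSingle b)
{
    PairedSingle d = { a.ps1, b.ps0 };   /* low of A, high of B  */
    return d;
}

static PairedSingle ps_merge11(PairedSingle a, PairedSingle b)
{
    PairedSingle d = { a.ps1, b.ps1 };   /* low of A, low of B   */
    return d;
}
```

Passing the same register as A and B to ps_merge10 swaps its two halves,
and doing the same with ps_merge00 or ps_merge11 duplicates one half into
both positions.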
PS_MR - move register
PS0[D] = PS0[B]
PS1[D] = PS1[B]
PS_NABS - negate absolute value
Set bit 0 of PS0[B] and copy result to PS0[D]
Set bit 0 of PS1[B] and copy result to PS1[D]
PS_NEG - negate
Invert bit 0 of PS0[B] and copy result to PS0[D]
Invert bit 0 of PS1[B] and copy result to PS1[D]
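PS_ABS, PS_NABS and PS_NEG touch only bit 0 of each half, which is the
IEEE sign bit (the most significant bit in left-to-right numbering). The
sketch below shows the operation on one half; it would be applied to PS0
and PS1 alike. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* "Bit 0" in PowerPC numbering is the MSB, i.e. the sign bit. */
#define SIGN_BIT  UINT32_C(0x80000000)

/* Reinterpret a float as its raw bits and back. */
static uint32_t f2u(float f)    { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    u2f(uint32_t u) { float f;    memcpy(&f, &u, sizeof f); return f; }

static float half_abs(float x)  { return u2f(f2u(x) & ~SIGN_BIT); }  /* clear  */
static float half_nabs(float x) { return u2f(f2u(x) |  SIGN_BIT); }  /* set    */
static float half_neg(float x)  { return u2f(f2u(x) ^  SIGN_BIT); }  /* invert */
```

Because only the sign bit changes, these operations also apply to zeros
and NaNs without any arithmetic: negating +0.0 yields -0.0.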
PS_RES - reciprocal estimate
PS0[D] = 1 / PS0[B]
PS1[D] = 1 / PS1[B]
PS_RSQRTE - reciprocal square root estimate
PS0[D] = 1 / SQRT(PS0[B])
PS1[D] = 1 / SQRT(PS1[B])
PS_SUB - subtract
PS0[D] = PS0[A] - PS0[B]
PS1[D] = PS1[A] - PS1[B]
PS_MADD - multiply-add
PS0[D] = PS0[A] * PS0[C] + PS0[B]
PS1[D] = PS1[A] * PS1[C] + PS1[B]
PS_MADDS0 - multiply-add scalar high
PS0[D] = PS0[A] * PS0[C] + PS0[B]
PS1[D] = PS1[A] * PS0[C] + PS1[B]
PS_MADDS1 - multiply-add scalar low
PS0[D] = PS0[A] * PS1[C] + PS0[B]
PS1[D] = PS1[A] * PS1[C] + PS1[B]
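The scalar forms multiply both halves of A by one selected half of C,
which suits matrix-by-vector products. As a sketch, a 2x2 matrix stored as
two column pairs can be multiplied by a vector (x, y) with one PS_MULS0
followed by one PS_MADDS1. The C functions below only model the arithmetic
given in the descriptions; their names, the argument order (a, c, b) and
mat2_mul_vec2 are illustrative assumptions.

```c
/* Hypothetical model of one paired-single register. */
typedef struct { float ps0, ps1; } PairedSingle;

/* PS_MULS0: both halves of A are multiplied by PS0[C]. */
static PairedSingle ps_muls0(PairedSingle a, PairedSingle c)
{
    PairedSingle d = { a.ps0 * c.ps0, a.ps1 * c.ps0 };
    return d;
}

/* PS_MADDS1: both halves of A are multiplied by PS1[C], then B is added.
   Plain C arithmetic is used here; rounding may differ from hardware. */
static PairedSingle ps_madds1(PairedSingle a, PairedSingle c, PairedSingle b)
{
    PairedSingle d = { a.ps0 * c.ps1 + b.ps0, a.ps1 * c.ps1 + b.ps1 };
    return d;
}

/* col0 = (m00, m10), col1 = (m01, m11), v = (x, y).
   Result = col0 * x + col1 * y. */
static PairedSingle mat2_mul_vec2(PairedSingle col0, PairedSingle col1,
                                  PairedSingle v)
{
    PairedSingle r = ps_muls0(col0, v);   /* (m00*x, m10*x)     */
    return ps_madds1(col1, v, r);         /* + (m01*y, m11*y)   */
}
```

With the matrix rows (1, 2) and (3, 4) and the vector (5, 6), the result
pair is (17, 39).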
PS_MSUB - multiply-subtract
PS0[D] = PS0[A] * PS0[C] - PS0[B]
PS1[D] = PS1[A] * PS1[C] - PS1[B]
PS_MUL - multiply
PS0[D] = PS0[A] * PS0[C]
PS1[D] = PS1[A] * PS1[C]
PS_MULS0 - multiply scalar high
PS0[D] = PS0[A] * PS0[C]
PS1[D] = PS1[A] * PS0[C]
PS_MULS1 - multiply scalar low
PS0[D] = PS0[A] * PS1[C]
PS1[D] = PS1[A] * PS1[C]
PS_NMADD - negative multiply-add
PS0[D] = - (PS0[A] * PS0[C] + PS0[B])
PS1[D] = - (PS1[A] * PS1[C] + PS1[B])
PS_NMSUB - negative multiply-subtract
PS0[D] = - (PS0[A] * PS0[C] - PS0[B])
PS1[D] = - (PS1[A] * PS1[C] - PS1[B])
PS_SEL - select
If (PS0[A] >= 0) then PS0[D] = PS0[C] else PS0[D] = PS0[B]
If (PS1[A] >= 0) then PS1[D] = PS1[C] else PS1[D] = PS1[B]
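PS_SEL makes a per-half choice without a branch. A small C sketch of the
rule above, with hypothetical names, followed by one possible use: clamping
negative values to zero by selecting between the value itself and zero.

```c
/* Hypothetical model of one paired-single register. */
typedef struct { float ps0, ps1; } PairedSingle;

/* PS_SEL: per half, pick C when A >= 0, otherwise pick B.
   A NaN in A fails the comparison, so B is picked in that case. */
static PairedSingle ps_sel(PairedSingle a, PairedSingle c, PairedSingle b)
{
    PairedSingle d;
    d.ps0 = (a.ps0 >= 0.0f) ? c.ps0 : b.ps0;
    d.ps1 = (a.ps1 >= 0.0f) ? c.ps1 : b.ps1;
    return d;
}

/* Illustrative use: replace negative halves with zero. */
static PairedSingle clamp_to_zero(PairedSingle x)
{
    PairedSingle zero = { 0.0f, 0.0f };
    return ps_sel(x, x, zero);
}
```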
PS_SUM0 - vector sum high
PS0[D] = PS0[A] + PS1[B]
PS1[D] = PS1[C]
PS_SUM1 - vector sum low
PS0[D] = PS0[C]
PS1[D] = PS0[A] + PS1[B]
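PS_SUM0 and PS_SUM1 add across the two halves, which is the step needed to
finish a dot product after an element-wise multiply. The sketch below
models PS_MUL, PS_SUM0 and PS_SUM1 as described above and combines them
into a 2-element dot product; the function names, the argument order
(a, c, b) and dot2 are illustrative assumptions.

```c
/* Hypothetical model of one paired-single register. */
typedef struct { float ps0, ps1; } PairedSingle;

/* PS_MUL: element-wise multiply. */
static PairedSingle ps_mul(PairedSingle a, PairedSingle c)
{
    PairedSingle d = { a.ps0 * c.ps0, a.ps1 * c.ps1 };
    return d;
}

/* PS_SUM0: PS0[D] = PS0[A] + PS1[B], PS1[D] = PS1[C]. */
static PairedSingle ps_sum0(PairedSingle a, PairedSingle c, PairedSingle b)
{
    PairedSingle d = { a.ps0 + b.ps1, c.ps1 };
    return d;
}

/* PS_SUM1: PS0[D] = PS0[C], PS1[D] = PS0[A] + PS1[B]. */
static PairedSingle ps_sum1(PairedSingle a, PairedSingle c, PairedSingle b)
{
    PairedSingle d = { c.ps0, a.ps0 + b.ps1 };
    return d;
}

/* Dot product of two 2-element vectors; the result lands in PS0. */
static float dot2(PairedSingle u, PairedSingle v)
{
    PairedSingle t = ps_mul(u, v);    /* (u0*v0, u1*v1)            */
    return ps_sum0(t, t, t).ps0;      /* u0*v0 + u1*v1 in PS0      */
}
```

For u = (1, 2) and v = (3, 4) the product pair is (3, 8) and the dot
product is 11.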
... TODO
Some floating-point instructions change behaviour when PS is enabled.
affect
---------------------------------------------------------------------------
Written 2003, 2004 by ORG / Dolwin team. Last updated 28 Mar 2004.