mirror of
https://github.com/emu-russia/pureikyubu.git
synced 2025-04-02 10:42:15 -04:00
273 lines
8.7 KiB
Text
273 lines
8.7 KiB
Text
|
|
GEKKO PAIRED-SINGLES - WHAT A THESE ?
|
|
|
|
---------------------------------------------------------------------------
|
|
|
|
1. Overview
|
|
|
|
Paired-Singles are analog of Intel (and other x86 series) processor's
|
|
"streamed instructions", known as SSE. This extension is specific for
|
|
Gekko processor and using to calculate two single-precision numbers
|
|
("floats" in C) using only one operation.
|
|
|
|
Floating-Point Registers of Gekko (FPRs) are modified in some way :
|
|
one half is used for first single number, and other for second.
|
|
This picture showing FPR format in paired-single mode :
|
|
|
|
---------------------------
|
|
bits: |0..........31|32.........63|
|
|
|-------------+-------------|
|
|
| PS0 | PS1 |
|
|
---------------------------
|
|
|
|
This parts are named as "PS0" and "PS1". Since Gekko is working in big-
|
|
endian mode, bits are numbered from-left-to-right order. There total 32
|
|
PS0 and 32 PS1 registers.
|
|
|
|
PS instructions set is divided on two parts : Load and Store Quantization
|
|
and Paired-Single Arithmetic instructions. Load and Store Quantization
|
|
instructions are used for fast integer-float type casting and some
|
|
specific memory operations, using PS0 and PS1 parts of FPR. Details are
|
|
given later in this document.
|
|
|
|
Paired-single mode is useful for fast vector and matrix calculations.
|
|
|
|
2. How to enable Paired-Single Mode
|
|
|
|
To enable PS-mode, you should set some bits of Gekko's HID2 custom
|
|
System-Purpose Register (HID2 assigned as SPR 920).
|
|
|
|
--------------
|
|
bits: | 0 |1| 2 |
|
|
|----+-+---+--- ... (dont care)
|
|
|LSQE| |PSE|
|
|
--------------
|
|
|
|
LSQE - Paired-Single load and store instructions enabled
|
|
PSE - Paired-Single mode enabled
|
|
|
|
Next low-level assembly code demonstrate, how to enable PS :
|
|
|
|
mfspr r0, 920 # read HID2
|
|
oris r0, 0xA000 # set LSQE and PSE bits
|
|
mtspr r0, 920 # write back
|
|
|
|
If you try to execute any PS instruction without LSQE and PSE bit set,
|
|
illegal instruction exception will be generated.
|
|
|
|
3. Paired-Single Load and Store
|
|
|
|
4. Paired-Single Arithmetic
|
|
|
|
Sorted opcode list :
|
|
|
|
------------------------------------
|
|
|00100| D |00000| B | 264 |R| ps_abs
|
|
|00100| D | A | B | 21 |R| ps_add
|
|
|00100| D 00| A | B | 32 |0| ps_cmpo0
|
|
|00100| D 00| A | B | 96 |0| ps_cmpo1
|
|
|00100| D 00| A | B | 0 |0| ps_cmpu0
|
|
|00100| D 00| A | B | 64 |0| ps_cmpu1
|
|
|00100| D | A | B | 18 |R| ps_div
|
|
|00100| D | A | B | 528 |R| ps_merge00
|
|
|00100| D | A | B | 560 |R| ps_merge01
|
|
|00100| D | A | B | 592 |R| ps_merge10
|
|
|00100| D | A | B | 624 |R| ps_merge11
|
|
|00100| D |00000| B | 72 |R| ps_mr
|
|
|00100| D |00000| B | 136 |R| ps_nabs
|
|
|00100| D |00000| B | 40 |R| ps_neg
|
|
|00100| D |00000| B | 24 |R| ps_res
|
|
|00100| D |00000| B | 26 |R| ps_rsqrte
|
|
|00100| D | A | B | 20 |R| ps_sub
|
|
|-----+-----+-----+-----+----------+-|
|
|
|00100| D | A | B | C | 29 |R| ps_madd
|
|
|00100| D | A | B | C | 14 |R| ps_madds0
|
|
|00100| D | A | B | C | 15 |R| ps_madds1
|
|
|00100| D | A | B | C | 28 |R| ps_msub
|
|
|00100| D | A |00000| C | 25 |R| ps_mul
|
|
|00100| D | A |00000| C | 12 |R| ps_muls0
|
|
|00100| D | A |00000| C | 13 |R| ps_muls1
|
|
|00100| D | A | B | C | 31 |R| ps_nmadd
|
|
|00100| D | A | B | C | 30 |R| ps_nmsub
|
|
|00100| D | A | B | C | 23 |R| ps_sel
|
|
|00100| D | A | B | C | 10 |R| ps_sum0
|
|
|00100| D | A | B | C | 11 |R| ps_sum1
|
|
------------------------------------
|
|
|
|
Note : R opcode field (comparsion of result with zero) is implemented,
|
|
but unused by regular GC programs.
|
|
|
|
Descriptions :
|
|
|
|
PS_ABS - absolute value
|
|
|
|
Clear bit 0 of PS0[B] and copy result to PS0[D]
|
|
Clear bit 0 of PS1[B] and copy result to PS1[D]
|
|
|
|
PS_ADD - add
|
|
|
|
PS0[D] = PS0[A] + PS0[B]
|
|
PS1[D] = PS1[A] + PS1[B]
|
|
|
|
PS_CMPO0 - compare ordered high
|
|
|
|
"c" holds result of comparsion
|
|
If (PS0[A] is NaN or PS0[B] is NaN) then c = 0001b
|
|
Else if (PS0[A] < PS0[B]) then c = 1000b
|
|
Else if (PS0[A] > PS0[B]) then c = 0100b
|
|
Else c = 0010b
|
|
Save result in D field of condition register (CR[D] = c).
|
|
|
|
PS_CMPO1 - compare ordered low
|
|
|
|
"c" holds result of comparsion
|
|
If (PS1[A] is NaN or PS1[B] is NaN) then c = 0001b
|
|
Else if (PS1[A] < PS1[B]) then c = 1000b
|
|
Else if (PS1[A] > PS1[B]) then c = 0100b
|
|
Else c = 0010b
|
|
Save result in D field of condition register (CR[D] = c).
|
|
|
|
PS_CMPU0 - compare unordered high
|
|
|
|
"c" holds result of comparsion
|
|
If (PS0[A] is NaN or PS0[B] is NaN) then c = 0001b
|
|
Else if (PS0[A] < PS0[B]) then c = 1000b
|
|
Else if (PS0[A] > PS0[B]) then c = 0100b
|
|
Else c = 0010b
|
|
Save result in D field of condition register (CR[D] = c).
|
|
|
|
PS_CMPU1 - compare unordered low
|
|
|
|
"c" holds result of comparsion
|
|
If (PS1[A] is NaN or PS1[B] is NaN) then c = 0001b
|
|
Else if (PS1[A] < PS1[B]) then c = 1000b
|
|
Else if (PS1[A] > PS1[B]) then c = 0100b
|
|
Else c = 0010b
|
|
Save result in D field of condition register (CR[D] = c).
|
|
|
|
These four compare instructions looks same, because I omitted some
|
|
unecessary FPSCR stuff.
|
|
|
|
PS_DIV - divide
|
|
|
|
PS0[D] = PS0[A] / PS0[B]
|
|
PS1[D] = PS1[A] / PS1[B]
|
|
|
|
PS_MERGE00 - merge high
|
|
|
|
PS0[D] = PS0[A]
|
|
PS1[D] = PS0[B]
|
|
|
|
PS_MERGE01 - merge direct
|
|
|
|
PS0[D] = PS0[A]
|
|
PS1[D] = PS1[B]
|
|
|
|
PS_MERGE10 - merge swapped
|
|
|
|
PS0[D] = PS1[A]
|
|
PS1[D] = PS0[B]
|
|
|
|
PS_MERGE11 - merge low
|
|
|
|
PS0[D] = PS1[A]
|
|
PS1[D] = PS1[B]
|
|
|
|
PS_MR - move register
|
|
|
|
PS0[D] = PS0[B]
|
|
PS1[D] = PS1[B]
|
|
|
|
PS_NABS - negate absolute value
|
|
|
|
Set bit 0 of PS0[B] and copy result to PS0[D]
|
|
Set bit 0 of PS1[B] and copy result to PS1[D]
|
|
|
|
PS_NEG - negate
|
|
|
|
Invert bit 0 of PS0[B] and copy result to PS0[D]
|
|
Invert bit 0 of PS1[B] and copy result to PS1[D]
|
|
|
|
PS_RES - reciprocal estimate
|
|
|
|
PS0[D] = 1 / PS0[B]
|
|
PS1[D] = 1 / PS1[B]
|
|
|
|
PS_RSQRTE - reciprocal square root estimate
|
|
|
|
PS0[D] = 1 / SQRT(PS0[B])
|
|
PS1[D] = 1 / SQRT(PS1[B])
|
|
|
|
PS_SUB - subtract
|
|
|
|
PS0[D] = PS0[A] - PS0[B]
|
|
PS1[D] = PS1[A] - PS1[B]
|
|
|
|
PS_MADD - multiply-add
|
|
|
|
PS0[D] = PS0[A] * PS0[C] + PS0[B]
|
|
PS1[D] = PS1[A] * PS1[C] + PS1[B]
|
|
|
|
PS_MADDS0 - multiply-add scalar high
|
|
|
|
PS0[D] = PS0[A] * PS0[C] + PS0[B]
|
|
PS1[D] = PS1[A] * PS0[C] + PS1[B]
|
|
|
|
PS_MADDS1 - multiply-add scalar low
|
|
|
|
PS0[D] = PS0[A] * PS1[C] + PS0[B]
|
|
PS1[D] = PS1[A] * PS1[C] + PS1[B]
|
|
|
|
PS_MSUB - multiply-subtract
|
|
|
|
PS0[D] = PS0[A] * PS0[C] - PS0[B]
|
|
PS1[D] = PS1[A] * PS1[C] - PS1[B]
|
|
|
|
PS_MUL - multiply
|
|
|
|
PS0[D] = PS0[A] * PS0[C]
|
|
PS1[D] = PS1[A] * PS1[C]
|
|
|
|
PS_MULS0 - multiply scalar high
|
|
|
|
PS0[D] = PS0[A] * PS0[C]
|
|
PS1[D] = PS1[A] * PS0[C]
|
|
|
|
PS_MULS1 - multiply scalar low
|
|
|
|
PS0[D] = PS0[A] * PS1[C]
|
|
PS1[D] = PS1[A] * PS1[C]
|
|
|
|
PS_NMADD - negative multiply-add
|
|
|
|
PS0[D] = - (PS0[A] * PS0[C] + PS0[B])
|
|
PS1[D] = - (PS1[A] * PS1[C] + PS1[B])
|
|
|
|
PS_NMSUB - negative multiply-subtract
|
|
|
|
PS0[D] = - (PS0[A] * PS0[C] - PS0[B])
|
|
PS1[D] = - (PS1[A] * PS1[C] - PS1[B])
|
|
|
|
PS_SEL - select
|
|
|
|
If (PS0[A] >= 0) then PS0[D] = PS0[C] else PS0[D] = PS0[B]
|
|
If (PS1[A] >= 0) then PS1[D] = PS1[C] else PS1[D] = PS1[B]
|
|
|
|
PS_SUM0 - vector sum high
|
|
|
|
PS0[D] = PS0[A] + PS1[B]
|
|
PS1[D] = PS1[C]
|
|
|
|
PS_SUM1 - vector sum low
|
|
|
|
PS0[D] = PS0[C]
|
|
PS1[D] = PS0[A] + PS1[B]
|
|
|
|
|
|
... TODO
|
|
Some of floating-point instructions changes behaviour, when PS is enabled.
|
|
affect
|
|
|
|
---------------------------------------------------------------------------
|
|
|
|
Written 2003, 2004 by ORG / Dolwin team. Last updated 28 Mar 2004.
|