unknown | 156e4cac52 | fixed problem ordering zero-extension before add | 2013-11-20 12:09:53 -05:00
unknown | daa2c258b2 | better jump table using function look-up (smaller & faster) | 2013-10-11 03:41:21 -04:00
unknown | cd77984576 | manual override of GCC's slower decode of SA IW bitmask | 2013-10-11 02:27:18 -04:00
unknown | 56c5ac9daf | BIG speed-up by moving shuffles out of EX queue, into VU ops | 2013-10-11 00:40:41 -04:00
unknown | 4c2b671d53 | deprecated 2-D opcode-element vector jump table | 2013-10-10 23:04:07 -04:00
unknown | f0b8985bda | moved SSE2 declare macro to MAKE/GCC command script | 2013-10-07 23:37:06 -04:00
unknown | 34c6065842 | add SSSE3 shuffling alternative straight off CEN64 wisdom | 2013-10-07 22:44:32 -04:00
unknown | adced4b284 | fixed VMACU overflow mask, more direct VMULU speed clamp | 2013-10-07 06:17:06 -04:00
unknown | 690c25008e | much faster unsigned clamp for VMACU/MULU, small VMADN jump | 2013-10-07 03:45:31 -04:00
unknown | 38a03b2566 | found similar fraction delay speed-ups from MUL as with MAC | 2013-10-06 09:49:43 -04:00
unknown | 2042e783a4 | new semi-fraction rounding delay technique to optimize MUL | 2013-10-06 09:24:27 -04:00
unknown | b7656289a5 | micro-optimizations to basic MAC of fractions | 2013-10-06 08:10:58 -04:00
unknown | 3870e93fa8 | fix MusyX MP3 signed fractions compressor | 2013-10-05 23:59:55 -04:00
unknown | 9d1e64e935 | moved everything about RSP flags to new SSE2-hybrid header | 2013-10-03 21:06:41 -04:00
unknown | 390f0d2ad0 | minimize automated pack/unpack extensions in VADDC/VSUBC | 2013-10-02 01:35:06 -04:00
unknown | 271f6cae5f | fix interposed clamp problems in VADD/VSUB, wipe old crap | 2013-10-02 00:15:56 -04:00
unknown | 9af8ba0f57 | further compacted clamp into saturated add/sub | 2013-09-29 02:11:00 -04:00
unknown | dafee07a5a | switch to smaller dynamic shuffling resource | 2013-09-28 22:34:38 -04:00
unknown | a661e72c64 | amend code generation bug in GCC 4.8.1 vectorizer | 2013-09-28 19:55:57 -04:00
unknown | 4bdbbfdff0 | rewired low clamper scheme, moved VMRG to a schematic | 2013-09-28 16:39:12 -04:00
unknown | 33eb07512a | simplified regular signed clamping with straight SSE | 2013-09-28 04:38:32 -04:00
unknown | 148aa0f7a4 | microoptimizations to uncommon clamps via MarathonMan's intrinsics | 2013-09-27 22:17:06 -04:00
unknown | b6fc11a983 | fix build issue when compiling without SSE2 support | 2013-09-26 03:51:46 -04:00
unknown | b0d38d05b0 | updates to the source directory structure, a few ANSI tweaks | 2013-09-24 02:36:00 -04:00
unknown | d226d4b693 | tl;dr | 2013-09-23 14:33:34 -04:00
unknown | 342488f056 | integrated signed clamp with VADD/VSUB | 2013-09-23 14:29:18 -04:00
unknown | bdf5e3c068 | snuck in sign-extension bug just before last commit >.< | 2013-09-23 05:25:52 -04:00
unknown | 85fe9d7081 | destroyed global result clamp buffer, lots of extra SSE2 ops cut out | 2013-09-23 05:14:33 -04:00
unknown | d29cbbe3e2 | more MAC micro-optimizations, split clamping to new header | 2013-09-23 03:30:24 -04:00
unknown | 256ffb4b57 | phased out some more excess multiply packs/unpacks | 2013-09-22 20:21:35 -04:00
unknown | dde3f6d456 | once again, ensure 128-bit VR<--ACC writes, not memcpy bytes | 2013-09-22 06:11:30 -04:00
unknown | 3c1a9c1cb4 | various upgrades and microoptimizations to ADD group | 2013-09-22 05:36:43 -04:00
unknown | 19b571ff43 | wrong byte-iterative copy precision caused scalar acc R/W | 2013-09-22 04:52:45 -04:00
unknown | c427e052fe | purified vectors for complex RSP clip selects, fix VCL | 2013-09-22 04:39:23 -04:00
unknown | b960b8759f | purify parallel executions for simple select ops | 2013-09-21 23:43:45 -04:00
cxd4 | 3458546391 | Merge pull request #1 from tj90241/master: Do not pass -m3dnow to gcc. | 2013-09-21 15:16:34 -07:00
unknown | 5c1ab5806b | unify shuffles to new one-dimensional vector op-code jumping | 2013-09-21 18:16:13 -04:00
Tyler Stachecki | e5ad629b5a | Make this array static. | 2013-09-21 02:21:18 -04:00
unknown | d064b82976 | split shuffle-related stuff to its own header | 2013-09-20 23:49:05 -04:00
unknown | 70a45f8bd7 | fix Visual Studio interpretation of VR "re-def" | 2013-09-20 15:59:27 -04:00
unknown | 62e5d5cecd | factored out 16-bit VMACU segments into clamp mode | 2013-09-20 15:57:06 -04:00
unknown | d757e41676 | force 16-byte alignment, divide clamp base formula to each op | 2013-09-20 11:56:10 -04:00
unknown | bbebbcc81a | uninstalled dynamic vector/scalar coefficient global | 2013-09-20 00:36:26 -04:00
unknown | 3c96c25950 | fix compiler static over-optimized SSE2 build faults | 2013-09-19 20:14:03 -04:00
unknown | d69229f3f2 | unify scalar whole shuffling to SSE2 generator | 2013-09-19 05:13:45 -04:00
unknown | 139864c71f | forgot to apply N macro to the divides | 2013-09-19 03:23:09 -04:00
unknown | d690c5f1c9 | SSE2-shuffled clip/select ops, and that's the last of them. | 2013-09-19 02:11:10 -04:00
unknown | d6b475855a | all shuffling for vector add group vectorized | 2013-09-18 23:24:08 -04:00
unknown | 485a04bc60 | divide group shuffling now completely vectorized | 2013-09-18 22:00:00 -04:00
unknown | cd99d4306b | no more ugly scalar shuffling in logical vector group | 2013-09-18 03:45:53 -04:00