Commit graph

273 commits

Author SHA1 Message Date
unknown
156e4cac52 fixed problem ordering zero-extension before add 2013-11-20 12:09:53 -05:00
unknown
daa2c258b2 better jump table using function look-up (smaller & faster) 2013-10-11 03:41:21 -04:00
unknown
cd77984576 manual override of GCC's slower decode of SA IW bitmask 2013-10-11 02:27:18 -04:00
unknown
56c5ac9daf BIG speed-up by moving shuffles out of EX queue, into VU ops 2013-10-11 00:40:41 -04:00
unknown
4c2b671d53 deprecated 2-D opcode-element vector jump table 2013-10-10 23:04:07 -04:00
unknown
f0b8985bda moved SSE2 declare macro to MAKE/GCC command script 2013-10-07 23:37:06 -04:00
unknown
34c6065842 add SSSE3 shuffling alternative straight off CEN64 wisdom 2013-10-07 22:44:32 -04:00
unknown
adced4b284 fixed VMACU overflow mask, more direct VMULU speed clamp 2013-10-07 06:17:06 -04:00
unknown
690c25008e much faster unsigned clamp for VMACU/MULU, small VMADN jump 2013-10-07 03:45:31 -04:00
unknown
38a03b2566 found similar fraction delay speed-ups from MUL as with MAC 2013-10-06 09:49:43 -04:00
unknown
2042e783a4 new semi-fraction rounding delay technique to optimize MUL 2013-10-06 09:24:27 -04:00
unknown
b7656289a5 micro-optimizations to basic MAC of fractions 2013-10-06 08:10:58 -04:00
unknown
3870e93fa8 fix MusyX MP3 signed fractions compressor 2013-10-05 23:59:55 -04:00
unknown
9d1e64e935 moved everything about RSP flags to new SSE2-hybrid header 2013-10-03 21:06:41 -04:00
unknown
390f0d2ad0 minimize automated pack/unpack extensions in VADDC/VSUBC 2013-10-02 01:35:06 -04:00
unknown
271f6cae5f fix interposed clamp problems in VADD/VSUB, wipe old crap 2013-10-02 00:15:56 -04:00
unknown
9af8ba0f57 further compacted clamp into saturated add/sub 2013-09-29 02:11:00 -04:00
unknown
dafee07a5a switch to smaller dynamic shuffling resource 2013-09-28 22:34:38 -04:00
unknown
a661e72c64 amend code generation bug in GCC 4.8.1 vectorizer 2013-09-28 19:55:57 -04:00
unknown
4bdbbfdff0 rewired low clamper scheme, moved VMRG to a schematic 2013-09-28 16:39:12 -04:00
unknown
33eb07512a simplified regular signed clamping with straight SSE 2013-09-28 04:38:32 -04:00
unknown
148aa0f7a4 microoptimizations to uncommon clamps via MarathonMan's intrinsics 2013-09-27 22:17:06 -04:00
unknown
b6fc11a983 fix build issue when compiling without SSE2 support 2013-09-26 03:51:46 -04:00
unknown
b0d38d05b0 updates to the source directory structure, a few ANSI tweaks 2013-09-24 02:36:00 -04:00
unknown
d226d4b693 tl;dr 2013-09-23 14:33:34 -04:00
unknown
342488f056 integrated signed clamp with VADD/VSUB 2013-09-23 14:29:18 -04:00
unknown
bdf5e3c068 snuck in sign-extension bug just before last commit >.< 2013-09-23 05:25:52 -04:00
unknown
85fe9d7081 destroyed global result clamp buffer, lots of extra SSE2 ops cut out 2013-09-23 05:14:33 -04:00
unknown
d29cbbe3e2 more MAC micro-optimizations, split clamping to new header 2013-09-23 03:30:24 -04:00
unknown
256ffb4b57 phased out some more excess multiply packs/unpacks 2013-09-22 20:21:35 -04:00
unknown
dde3f6d456 once again, ensure 128-bit VR<--ACC writes, not memcpy bytes 2013-09-22 06:11:30 -04:00
unknown
3c1a9c1cb4 various upgrades and microoptimizations to ADD group 2013-09-22 05:36:43 -04:00
unknown
19b571ff43 wrong byte-iterative copy precision caused scalar acc R/W 2013-09-22 04:52:45 -04:00
unknown
c427e052fe purified vectors for complex RSP clip selects, fix VCL 2013-09-22 04:39:23 -04:00
unknown
b960b8759f purify parallel executions for simple select ops 2013-09-21 23:43:45 -04:00
cxd4
3458546391 Merge pull request #1 from tj90241/master: Do not pass -m3dnow to gcc. 2013-09-21 15:16:34 -07:00
unknown
5c1ab5806b unify shuffles to new one-dimensional vector op-code jumping 2013-09-21 18:16:13 -04:00
Tyler Stachecki
e5ad629b5a Make this array static. 2013-09-21 02:21:18 -04:00
unknown
d064b82976 split shuffle-related stuff to its own header 2013-09-20 23:49:05 -04:00
unknown
70a45f8bd7 fix Visual Studio interpretation of VR "re-def" 2013-09-20 15:59:27 -04:00
unknown
62e5d5cecd factored out 16-bit VMACU segments into clamp mode 2013-09-20 15:57:06 -04:00
unknown
d757e41676 force 16-byte alignment, divide clamp base formula to each op 2013-09-20 11:56:10 -04:00
unknown
bbebbcc81a uninstalled dynamic vector/scalar coefficient global 2013-09-20 00:36:26 -04:00
unknown
3c96c25950 fix compiler static over-optimized SSE2 build faults 2013-09-19 20:14:03 -04:00
unknown
d69229f3f2 unify scalar whole shuffling to SSE2 generator 2013-09-19 05:13:45 -04:00
unknown
139864c71f forgot to apply N macro to the divides 2013-09-19 03:23:09 -04:00
unknown
d690c5f1c9 SSE2-shuffled clip/select ops, and that's the last of them. 2013-09-19 02:11:10 -04:00
unknown
d6b475855a all shuffling for vector add group vectorized 2013-09-18 23:24:08 -04:00
unknown
485a04bc60 divide group shuffling now completely vectorized 2013-09-18 22:00:00 -04:00
unknown
cd99d4306b no more ugly scalar shuffling in logical vector group 2013-09-18 03:45:53 -04:00