Tyler Stachecki
3e094c8985
Convert AIO's VABS optimization to AVX.
2015-01-01 10:46:28 -05:00
Tyler Stachecki
52afe866d4
Fix a mask typo in the last commit.
2015-01-01 10:46:22 -05:00
Tyler Stachecki
bf30cf29fd
Fix a buggy accumulator clamp algorithm.
2015-01-01 10:46:16 -05:00
Tyler Stachecki
5e313634d3
Enable register-caching on MinGW.
Use a prelude to get around Microsoft's stupid calling convention.
2015-01-01 10:46:10 -05:00
Tyler Stachecki
8047bf94d9
Unbreak Windows builds (again).
2015-01-01 10:46:03 -05:00
Tyler Stachecki
8cb3c319f9
Commit AIO's VLT optimizations.
2015-01-01 10:45:57 -05:00
Tyler Stachecki
d9b9171f92
Work in AIO's optimizations for VABS.
2015-01-01 10:45:52 -05:00
Tyler Stachecki
b6f0d0ec58
Set initial values for VCC/VCO/VCE.
Thanks, krom!
2015-01-01 10:45:45 -05:00
Tyler Stachecki
d9b19d3f32
Move new functions around and patch bugs in them.
2015-01-01 10:45:38 -05:00
Tyler Stachecki
b54f9618df
Prevent register-caching on MinGW.
Since Microsoft decided to totally bork their x86_64 calling
convention, defer all Windows builds to non-optimized RSP
routines. When MinGW supports __vectorcall, this change can
be reverted.
2015-01-01 10:45:31 -05:00
Tyler Stachecki
5f10b427e1
Add support for PE/COFF executable formats.
2015-01-01 10:45:22 -05:00
Tyler Stachecki
d32f8386cd
Update toolchains with GNU AS references.
2015-01-01 10:45:15 -05:00
Tyler Stachecki
26d65b2ebe
Optimize register-caching version of VMRG.
2015-01-01 10:45:07 -05:00
Tyler Stachecki
cc785f9f5b
Only use VEX-encoded SSE where it helps us.
Otherwise, stick to the "legacy" SSE instructions as they're
smaller and we don't use the upper halves of AVX registers
anyway.
2015-01-01 10:45:01 -05:00
Tyler Stachecki
84cc9c93cb
Fix register-caching version of VABS.
2015-01-01 10:44:54 -05:00
Tyler Stachecki
94ad149a12
Actually enable the register caching...
And fix a lot of bugs introduced with a regex.
2015-01-01 10:44:47 -05:00
Tyler Stachecki
7bc95ee3ee
Implement register-caching version of VLT.
2015-01-01 10:44:40 -05:00
Tyler Stachecki
9b941eced8
Change RSP calling convention.
pblendvb needs the mask in %xmm0, so change the calling convention
around just enough so we can cut out a movdqa from most instructions.
2015-01-01 10:44:34 -05:00
Tyler Stachecki
ddb3c893e3
Implement register-caching version of VMRG.
2015-01-01 10:44:23 -05:00
Tyler Stachecki
4aabd7f49e
Minor tweaks to VEQ/VNE register-cached versions.
2015-01-01 10:44:16 -05:00
Tyler Stachecki
e810689fde
Implement register-caching versions of VGE.
2015-01-01 10:44:09 -05:00
Tyler Stachecki
340da34715
Implement register-caching versions of VEQ/VNE.
2015-01-01 10:44:02 -05:00
Tyler Stachecki
c83fe8d424
Prepare to register-cache RSP flags.
2015-01-01 10:43:54 -05:00
Tyler Stachecki
2cc1759259
Register-caching variations of bitwise functions.
2015-01-01 10:43:49 -05:00
Tyler Stachecki
586cf84113
Implement register-caching versions of VABS.
2015-01-01 10:43:44 -05:00
Tyler Stachecki
532dd87223
Actually optimize RelWithDebInfo builds.
2014-12-27 08:19:28 -05:00
Tyler Stachecki
4c9d129173
Fix SSSE3 builds/regex mistake in CMakeLists.
2014-12-26 14:55:43 -05:00
Tyler Stachecki
3a582f81ac
Clamp VMOV/VRCP/VRSQ in/outputs to full elements.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
c1f4ddd911
Fix MFC2/MTC2 odd-element byte indexing.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
71db976759
Fix a typo in the VMOV implementation.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
bc8300c7de
Fix a pair of RSP flag-related bugs.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
574c85ad37
Add some missing flag clears to VCL.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
8f17a516bc
Fix a stray memory copy.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
6d0af5d89a
Clean up SSSE3+ loads and stores.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
ee526c543c
Commit AIO's VCR optimizations.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
3a969b2379
Do some general cleanup/optimization.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
b740c9a5b3
Optimize RSP CP2 register transfers.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
fea458e70c
Add (partial) implementations for LPV/LUV/SPV/SUV.
Also, clean up other SSSE3+ accelerated loads and stores.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
03f04c1b82
Add implementation for MTC2.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
9f9e3ebf80
Sort out a pair of RSP bugs.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
b33f2800ae
Add implementation for MFC2.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
a2f87f843c
Optimize VRCP* and VRSQ* functions.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
824131db6b
Use a union for RSP vectors to force alignment.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
dc008abe77
Fix more show-stopping RSP bugs.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
173815ed63
Another bug: make sure memory requests get filled.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
1e059e3f71
Fix a potentially disastrous RSP bug.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
645f4b06ea
Minor cleanup to the RSP pipeline.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
6faca60054
Start reworking RSP vector loads and stores.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
f1929a056c
Commit AIO's VMACF implementation.
2014-12-24 15:18:59 -05:00
Tyler Stachecki
ae714715fb
Commit AIO's VABS optimization.
2014-12-23 01:13:50 -05:00