Commit graph

953 commits

Author SHA1 Message Date
Tyler Stachecki
3e094c8985 Convert AIO's VABS optimization to AVX. 2015-01-01 10:46:28 -05:00
Tyler Stachecki
52afe866d4 Fix a mask typo in the last commit. 2015-01-01 10:46:22 -05:00
Tyler Stachecki
bf30cf29fd Fix a buggy accumulator clamp algorithm. 2015-01-01 10:46:16 -05:00
Tyler Stachecki
5e313634d3 Enable register-caching on MinGW.
Use a prelude to get around Microsoft's stupid calling convention.
2015-01-01 10:46:10 -05:00
Tyler Stachecki
8047bf94d9 Unbreak Windows builds (again). 2015-01-01 10:46:03 -05:00
Tyler Stachecki
8cb3c319f9 Commit AIO's VLT optimizations. 2015-01-01 10:45:57 -05:00
Tyler Stachecki
d9b9171f92 Work in AIO's optimizations for VABS. 2015-01-01 10:45:52 -05:00
Tyler Stachecki
b6f0d0ec58 Set initial values for VCC/VCO/VCE.
Thanks, krom!
2015-01-01 10:45:45 -05:00
Tyler Stachecki
d9b19d3f32 Move around and patch bugs in new functions. 2015-01-01 10:45:38 -05:00
Tyler Stachecki
b54f9618df Prevent register-caching on MinGW.
Since Microsoft decided to totally bork their x86_64 calling
convention, defer all Windows builds to non-optimized RSP
routines. When MinGW supports __vectorcall, this change can
be reverted.
2015-01-01 10:45:31 -05:00
Tyler Stachecki
5f10b427e1 Add support for PE/COFF executable formats. 2015-01-01 10:45:22 -05:00
Tyler Stachecki
d32f8386cd Update toolchains with GNU AS references. 2015-01-01 10:45:15 -05:00
Tyler Stachecki
26d65b2ebe Optimize register-caching version of VMRG. 2015-01-01 10:45:07 -05:00
Tyler Stachecki
cc785f9f5b Only use VEX-encoded SSE where it helps us.
Otherwise, stick to the "legacy" SSE instructions as they're
smaller and we don't use the upper halves of AVX registers
anyways.
2015-01-01 10:45:01 -05:00
Tyler Stachecki
84cc9c93cb Fix register-caching version of VABS. 2015-01-01 10:44:54 -05:00
Tyler Stachecki
94ad149a12 Actually enable the register caching...
And fix a lot of bugs introduced with a regex.
2015-01-01 10:44:47 -05:00
Tyler Stachecki
7bc95ee3ee Implement register-caching version of VLT. 2015-01-01 10:44:40 -05:00
Tyler Stachecki
9b941eced8 Change RSP calling convention.
pblendvb needs the mask in %xmm0, so change the calling convention
around just enough so we can cut out a movdqa from most instructions.
2015-01-01 10:44:34 -05:00
Tyler Stachecki
ddb3c893e3 Implement register-caching version of VMRG. 2015-01-01 10:44:23 -05:00
Tyler Stachecki
4aabd7f49e Minor tweaks to VEQ/VNE register-cached versions. 2015-01-01 10:44:16 -05:00
Tyler Stachecki
e810689fde Implement register-caching versions of VGE. 2015-01-01 10:44:09 -05:00
Tyler Stachecki
340da34715 Implement register-caching versions of VEQ/VNE. 2015-01-01 10:44:02 -05:00
Tyler Stachecki
c83fe8d424 Prepare to register-cache RSP flags. 2015-01-01 10:43:54 -05:00
Tyler Stachecki
2cc1759259 Register-caching variations of bitwise functions. 2015-01-01 10:43:49 -05:00
Tyler Stachecki
586cf84113 Implement register-caching versions of VABS. 2015-01-01 10:43:44 -05:00
Tyler Stachecki
532dd87223 Actually optimize RelWithDebInfo builds. 2014-12-27 08:19:28 -05:00
Tyler Stachecki
4c9d129173 Fix SSSE3 builds/regex mistake in CMakeLists. 2014-12-26 14:55:43 -05:00
Tyler Stachecki
3a582f81ac Clamp VMOV/VRCP/VRSQ in/outputs to full elements. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
c1f4ddd911 Fix MFC2/MTC2 odd-element byte indexing. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
71db976759 Fix a typo in the VMOV implementation. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
bc8300c7de Fix a pair of RSP flag-related bugs. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
574c85ad37 Add some missing flag clears to VCL. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
8f17a516bc Fix a stray memory copy. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
6d0af5d89a Clean up SSSE3+ loads and stores. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
ee526c543c Commit AIO's VCR optimizations. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
3a969b2379 Do some general cleanup/optimization. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
b740c9a5b3 Optimize RSP CP2 register transfers. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
fea458e70c Add (partial) implementations for LPV/LUV/SPV/SUV.
Also, clean up other SSSE3+ accelerated loads and stores.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
03f04c1b82 Add implementation for MTC2. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
9f9e3ebf80 Sort out a pair of RSP bugs. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
b33f2800ae Add implementation for MFC2. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
a2f87f843c Optimize VRCP* and VRSQ* functions. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
824131db6b Use a union for RSP vectors to force alignment. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
dc008abe77 Fix more show-stopping RSP bugs. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
173815ed63 Another bug: make sure memory requests get filled. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
1e059e3f71 Fix a potentially disastrous RSP bug. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
645f4b06ea Minor cleanup to the RSP pipeline. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
6faca60054 Start reworking RSP vector loads and stores. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
f1929a056c Commit AIO's VMACF implementation. 2014-12-24 15:18:59 -05:00
Tyler Stachecki
ae714715fb Commit AIO's VABS optimization. 2014-12-23 01:13:50 -05:00