Tyler Stachecki
5f10b427e1
Add support for PE/COFF executable formats.
2015-01-01 10:45:22 -05:00
Tyler Stachecki
26d65b2ebe
Optimize register-caching version of VMRG.
2015-01-01 10:45:07 -05:00
Tyler Stachecki
cc785f9f5b
Only use VEX-encoded SSE where it helps us.
...
Otherwise, stick to the "legacy" SSE instructions as they're
smaller and we don't use the upper halves of AVX registers
anyways.
2015-01-01 10:45:01 -05:00
Tyler Stachecki
84cc9c93cb
Fix register-caching version of VABS.
2015-01-01 10:44:54 -05:00
Tyler Stachecki
94ad149a12
Actually enable the register caching...
...
And fix a lot of bugs introduced with a regex.
2015-01-01 10:44:47 -05:00
Tyler Stachecki
7bc95ee3ee
Implement register-caching version of VLT.
2015-01-01 10:44:40 -05:00
Tyler Stachecki
9b941eced8
Change RSP calling convention.
...
pblendvb needs the mask in %xmm0, so change the calling convention
around just enough so we can cut out a movdqa from most instructions.
2015-01-01 10:44:34 -05:00
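For readers unfamiliar with the instruction mentioned above: the legacy (non-VEX) encoding of pblendvb takes its mask implicitly in %xmm0, which is why keeping the mask there avoids an extra movdqa. The per-byte selection it performs can be sketched with a scalar model. This is only an illustration of the instruction's semantics, not the emulator's code; the function name `blendv_bytes_model` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scalar model of PBLENDVB on one 128-bit register (hypothetical helper,
 * not taken from the repository). For each of the 16 bytes, the source
 * byte replaces the destination byte when the most significant bit of
 * the corresponding mask byte is set; otherwise the destination byte is
 * left unchanged.
 */
static void blendv_bytes_model(uint8_t dst[16], const uint8_t src[16],
                               const uint8_t mask[16]) {
  for (size_t i = 0; i < 16; i++) {
    if (mask[i] & 0x80)
      dst[i] = src[i];
  }
}
```

Only the top bit of each mask byte matters, so a comparison result (all ones or all zeros per element) can be used as the mask directly.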
Tyler Stachecki
ddb3c893e3
Implement register-caching version of VMRG.
2015-01-01 10:44:23 -05:00
Tyler Stachecki
4aabd7f49e
Minor tweaks to VEQ/VNE register-cached versions.
2015-01-01 10:44:16 -05:00
Tyler Stachecki
e810689fde
Implement register-caching versions of VGE.
2015-01-01 10:44:09 -05:00
Tyler Stachecki
340da34715
Implement register-caching versions of VEQ/VNE.
2015-01-01 10:44:02 -05:00
Tyler Stachecki
c83fe8d424
Prepare to register-cache RSP flags.
2015-01-01 10:43:54 -05:00
Tyler Stachecki
2cc1759259
Register-caching variations of bitwise functions.
2015-01-01 10:43:49 -05:00
Tyler Stachecki
586cf84113
Implement register-caching versions of VABS.
2015-01-01 10:43:44 -05:00
Tyler Stachecki
3a582f81ac
Clamp VMOV/VRCP/VRSQ in/outputs to full elements.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
71db976759
Fix a typo in the VMOV implementation.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
bc8300c7de
Fix a pair of RSP flag-related bugs.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
6d0af5d89a
Clean up SSSE3+ loads and stores.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
ee526c543c
Commit AIO's VCR optimizations.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
3a969b2379
Do some general cleanup/optimization.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
fea458e70c
Add (partial) implementations for LPV/LUV/SPV/SUV.
...
Also, clean up other SSSE3+ accelerated loads and stores.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
b33f2800ae
Add implementation for MFC2.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
a2f87f843c
Optimize VRCP* and VRSQ* functions.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
824131db6b
Use a union for RSP vectors to force alignment.
2014-12-26 14:19:45 -05:00
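The union technique named in the commit above can be sketched as follows. A union takes the strictest alignment of its members, so pairing an element array with a 16-byte vector type makes every instance 16-byte aligned. This is a minimal sketch, not the repository's definition: the names `v8u16_sketch`, `rsp_vec_sketch`, and `is_aligned16` are hypothetical, and the vector type assumes the GCC/Clang `vector_size` extension rather than whatever type the emulator actually uses.

```c
#include <stdint.h>

/* Hypothetical 128-bit vector type; assumes the GCC/Clang vector extension. */
typedef uint16_t v8u16_sketch __attribute__((vector_size(16)));

/*
 * Hypothetical register type. The union inherits the 16-byte alignment
 * of its vector member, so the element array `e` is aligned as well.
 */
union rsp_vec_sketch {
  uint16_t e[8];   /* element-wise access */
  v8u16_sketch v;  /* whole-register access; forces 16-byte alignment */
};

/* Returns nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p) {
  return ((uintptr_t)p & 15u) == 0;
}
```

Because the union is also 16 bytes in size, every element of an array of such unions stays aligned, which is what aligned SSE loads and stores require.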
Tyler Stachecki
6faca60054
Start reworking RSP vector loads and stores.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
f1929a056c
Commit AIO's VMACF implementation.
2014-12-24 15:18:59 -05:00
Tyler Stachecki
ae714715fb
Commit AIO's VABS optimization.
2014-12-23 01:13:50 -05:00
Tyler Stachecki
ab8dde80e9
Add AIO's implementation for VMULU.
2014-12-23 01:10:15 -05:00
Tyler Stachecki
3f2329be5b
Fix a bug in VRCP/VRSQ precision selection.
2014-12-22 21:06:17 -05:00
Tyler Stachecki
e52e031ce3
Add implementations for VRSQ, VRSQL, and VRSQH.
2014-12-22 20:47:48 -05:00
Tyler Stachecki
4b6904240e
Add implementations for VRCP, VRCPL, and VRCPH.
2014-12-22 20:29:16 -05:00
Tyler Stachecki
73709f4c45
Add implementation for VCR.
2014-12-22 13:01:03 -05:00
Tyler Stachecki
88310a8104
Add AIO's implementation for VMULF.
2014-12-22 09:50:29 -05:00
Tyler Stachecki
f268795da5
Add implementation for VMRG.
2014-12-21 15:49:44 -05:00
Tyler Stachecki
9f4664a4b6
Add implementation for VADDC.
2014-12-21 15:29:16 -05:00
Tyler Stachecki
a955bf1e2c
Add implementation for VSUBC.
2014-12-21 15:07:00 -05:00
Tyler Stachecki
f199c7bac8
Add implementation for VABS.
2014-12-21 12:59:36 -05:00
Tyler Stachecki
de5b5b0f96
Commit AIO's VSUB optimizations, fix carry/borrow issue.
2014-12-21 12:55:38 -05:00
Tyler Stachecki
0be40f4358
Add implementations for VGE and VLT.
2014-12-21 11:08:00 -05:00
Tyler Stachecki
dc50279609
Add implementations for VEQ and VNE.
2014-12-21 10:39:10 -05:00
Tyler Stachecki
579fb317a8
Formatting/consistency fixes (remove tabs).
2014-12-21 10:20:45 -05:00
Tyler Stachecki
bd899f5034
Unbreak SSE2 builds.
2014-12-21 09:48:01 -05:00
Tyler Stachecki
e1de6cd92d
Add implementations for VCH.
2014-12-21 09:29:58 -05:00
Tyler Stachecki
0c556f5d25
Fix a last minute SSE4.1->SSE2 change.
2014-12-20 17:01:31 -05:00
Tyler Stachecki
145141225e
Add implementations for VCL and CFC2.
2014-12-20 12:27:38 -05:00
Tyler Stachecki
7c83dcb0d3
Prevent GCC from eliding global register var writes.
...
Not sure why GCC was optimizing out these global register variable
writes when FLTO was enabled, but ensure that it does not by using
an inline assembly block.
2014-12-20 10:21:41 -05:00
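The general shape of such an inline assembly block can be sketched as below. This is an assumption about the pattern, not the code from the commit above: an ordinary static variable stands in for the global register variable (real global register variables are tied to architecture-specific register names), and `state_sketch` and `force_write` are hypothetical names. The empty asm statement emits no instructions; it only tells GCC or Clang that the value is read and that memory may be observed, so the preceding write cannot be discarded.

```c
/* Hypothetical stand-in for a global register variable. */
static int state_sketch;

/*
 * Writes the value, then places an empty asm statement after the write.
 * The "r" input operand makes the compiler materialize the value, and
 * the "memory" clobber stops it from dropping or reordering the store.
 */
static inline void force_write(int value) {
  state_sketch = value;
  __asm__ __volatile__("" : : "r"(state_sketch) : "memory");
}
```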
Tyler Stachecki
affb4bb746
Add a patch job fix for SSE2 RSP builds.
2014-12-19 22:03:25 -05:00
Tyler Stachecki
c72f2c5028
Fix RSP alignment issues once and for all.
2014-12-19 20:03:03 -05:00
Tyler Stachecki
10a5983c0c
Add support for SSE4 FPU acceleration.
...
0d4a5de2f6 is wrong; we can take
advantage of SSE4 rounding intrinsics.
2014-11-16 14:06:34 -05:00
Tyler Stachecki
061a04e216
Change width of fpu_state_t for x86_64.
...
gcc (and probably other compilers) don't like working with 16-bit
types and will zero-extend where needed. Save some overhead and
just store the state as a 32-bit type.
2014-11-15 15:44:04 -05:00