128 commits

Author SHA1 Message Date
Tyler Stachecki
71db976759 Fix a typo in the VMOV implementation. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
bc8300c7de Fix a pair of RSP flag-related bugs. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
6d0af5d89a Cleanup SSSE3+ loads and stores. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
ee526c543c Commit AIO's VCR optimizations. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
3a969b2379 Do some general cleanup/optimization. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
fea458e70c Add (partial) implementations for LPV/LUV/SPV/SUV.
Also, cleanup other SSSE3+ accelerated loads and stores.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
b33f2800ae Add implementation for MFC2. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
a2f87f843c Optimize VRCP* and VRSQ* functions. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
824131db6b Use a union for RSP vectors to force alignment. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
6faca60054 Start reworking RSP vector loads and stores. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
f1929a056c Commit AIO's VMACF implementation. 2014-12-24 15:18:59 -05:00
Tyler Stachecki
ae714715fb Commit AIO's VABS optimization. 2014-12-23 01:13:50 -05:00
Tyler Stachecki
ab8dde80e9 Add AIO's implementation for VMULU. 2014-12-23 01:10:15 -05:00
Tyler Stachecki
3f2329be5b Fix a bug in VRCP/VRSQ precision selection. 2014-12-22 21:06:17 -05:00
Tyler Stachecki
e52e031ce3 Add implementations for VRSQ, VRSQL, and VRSQH. 2014-12-22 20:47:48 -05:00
Tyler Stachecki
4b6904240e Add implementations for VRCP, VRCPL, and VRCPH. 2014-12-22 20:29:16 -05:00
Tyler Stachecki
73709f4c45 Add implementation for VCR. 2014-12-22 13:01:03 -05:00
Tyler Stachecki
88310a8104 Add AIO's implementation for VMULF. 2014-12-22 09:50:29 -05:00
Tyler Stachecki
f268795da5 Add implementation for VMRG. 2014-12-21 15:49:44 -05:00
Tyler Stachecki
9f4664a4b6 Add implementation for VADDC. 2014-12-21 15:29:16 -05:00
Tyler Stachecki
a955bf1e2c Add implementation for VSUBC. 2014-12-21 15:07:00 -05:00
Tyler Stachecki
f199c7bac8 Add implementation for VABS. 2014-12-21 12:59:36 -05:00
Tyler Stachecki
de5b5b0f96 Commit AIO's VSUB optimizations, fix carry/borrow issue. 2014-12-21 12:55:38 -05:00
Tyler Stachecki
0be40f4358 Add implementations for VGE and VLT. 2014-12-21 11:08:00 -05:00
Tyler Stachecki
dc50279609 Add implementations for VEQ and VNE. 2014-12-21 10:39:10 -05:00
Tyler Stachecki
579fb317a8 Formatting/consistency fixes (remove tabs). 2014-12-21 10:20:45 -05:00
Tyler Stachecki
bd899f5034 Unbreak SSE2 builds. 2014-12-21 09:48:01 -05:00
Tyler Stachecki
e1de6cd92d Add implementation for VCH. 2014-12-21 09:29:58 -05:00
Tyler Stachecki
0c556f5d25 Fix a last minute SSE4.1->SSE2 change. 2014-12-20 17:01:31 -05:00
Tyler Stachecki
145141225e Add implementations for VCL and CFC2. 2014-12-20 12:27:38 -05:00
Tyler Stachecki
7c83dcb0d3 Prevent GCC from eliding global register var writes.
Not sure why GCC was optimizing out these global register variable
writes when FLTO was enabled, but ensure that it does not by using
an inline assembly block.
2014-12-20 10:21:41 -05:00
Tyler Stachecki
affb4bb746 Add a patch job fix for SSE2 RSP builds. 2014-12-19 22:03:25 -05:00
Tyler Stachecki
c72f2c5028 Fix RSP alignment issues once and for all. 2014-12-19 20:03:03 -05:00
Tyler Stachecki
10a5983c0c Add support for SSE4 FPU acceleration.
0d4a5de2f6 is wrong; we can take
advantage of SSE4 rounding intrinsics.
2014-11-16 14:06:34 -05:00
Tyler Stachecki
061a04e216 Change width of fpu_state_t for x86_64.
gcc (and probably other compilers) don't like working with 16-bit
types and will zero-extend where needed. Save some overhead and
just store the state as a 32-bit type.
2014-11-15 15:44:04 -05:00
Tyler Stachecki
0a9b8c2367 Make read_acc_* return a value.
Instead of writing through a pointer, just return the value.
Thank you, Jared, for pointing out my stupidity.
2014-11-13 19:54:33 -05:00
Tyler Stachecki
33d2e15278 Reduce size of rsp_vload_dmem dynarec code.
We're going to want to instantiate all possible branch targets
ahead of time to avoid SMC penalties, so we want each target to
fit into the smallest block of code possible.
2014-11-10 22:51:33 -05:00
Tyler Stachecki
fc22ab18ba Fix some corner-case bugs in the last commit. 2014-11-10 19:04:23 -05:00
Tyler Stachecki
b4b95d1f21 Fix SSE2 RSP vector loads/stores implementation. 2014-11-10 18:32:12 -05:00
Tyler Stachecki
316214d82d (Finally) permit SSE2-only builds.
Add SSE2 codepaths where necessary (even if not complete), while
still allowing the project to be compiled with SSSE3+ intrinsics.
2014-11-10 14:29:13 -05:00
Tyler Stachecki
f66894935b Mark more initialization functions as cold. 2014-11-09 19:11:09 -05:00
Tyler Stachecki
a0f1eb5d7c Move intrinsics to a common location. 2014-11-09 18:51:54 -05:00
Tyler Stachecki
1513f3cac2 arch/x86_64: Prefer _mm_set_s* over _mm_load_s*. 2014-11-09 18:27:14 -05:00
Tyler Stachecki
4cfb7275a9 Fix and optimize rsp_uclamp_acc (once again). 2014-11-08 19:07:08 -05:00
Tyler Stachecki
9f8a9f9d62 Add implementations of VMADH and VMUDH. 2014-11-08 14:01:41 -05:00
Tyler Stachecki
007d72eda1 Add implementations of VMADL and VMADM. 2014-11-08 12:21:06 -05:00
Tyler Stachecki
16a7c434da Fix/optimize the RSP accumulator clamp LO algorithm. 2014-11-05 16:58:59 -05:00
Tyler Stachecki
6a0604eaca Fix the RSP accumulator clamping algorithm. 2014-11-05 15:09:16 -05:00
Tyler Stachecki
b668296589 Add implementations of VADD and VSUB. 2014-11-03 18:06:32 -05:00
Tyler Stachecki
083ad75286 arch/x86_64: Cache RSP accumulator regs in host CPU. 2014-11-03 16:48:38 -05:00