Tyler Stachecki
71db976759
Fix a typo in the VMOV implementation.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
bc8300c7de
Fix a pair of RSP flag-related bugs.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
6d0af5d89a
Clean up SSSE3+ loads and stores.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
ee526c543c
Commit AIO's VCR optimizations.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
3a969b2379
Do some general cleanup/optimization.
2014-12-26 14:19:46 -05:00
Tyler Stachecki
fea458e70c
Add (partial) implementations for LPV/LUV/SPV/SUV.
...
Also, clean up other SSSE3+ accelerated loads and stores.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
b33f2800ae
Add implementation for MFC2.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
a2f87f843c
Optimize VRCP* and VRSQ* functions.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
824131db6b
Use a union for RSP vectors to force alignment.
2014-12-26 14:19:45 -05:00
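A minimal sketch of the idea behind this commit, assuming a hypothetical `rsp_vec_sketch` type: placing the 16-bit lanes in a union next to a 16-byte-aligned member makes every instance of the union inherit that alignment. The real code probably pairs the lanes with an SSE vector type; the portable C11 `_Alignas` member below is a stand-in for illustration, not the project's definition.

```c
#include <stdint.h>

/* Hypothetical stand-in for an RSP vector register: eight 16-bit lanes.
 * The aligned member forces the whole union to 16-byte alignment
 * (assumption: the real code uses an SSE type such as __m128i here). */
union rsp_vec_sketch {
  uint16_t e[8];
  _Alignas(16) uint8_t raw[16];
};

/* Returns 1 if the vector sits on a 16-byte boundary, 0 otherwise. */
static int rsp_vec_is_aligned(const union rsp_vec_sketch *v) {
  return ((uintptr_t) v % 16) == 0;
}
```

Because the alignment is a property of the type, arrays of these unions and struct members of this type are aligned as well, without per-variable attributes.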
Tyler Stachecki
6faca60054
Start reworking RSP vector loads and stores.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
f1929a056c
Commit AIO's VMACF implementation.
2014-12-24 15:18:59 -05:00
Tyler Stachecki
ae714715fb
Commit AIO's VABS optimization.
2014-12-23 01:13:50 -05:00
Tyler Stachecki
ab8dde80e9
Add AIO's implementation for VMULU.
2014-12-23 01:10:15 -05:00
Tyler Stachecki
3f2329be5b
Fix a bug in VRCP/VRSQ precision selection.
2014-12-22 21:06:17 -05:00
Tyler Stachecki
e52e031ce3
Add implementations for VRSQ, VRSQL, and VRSQH.
2014-12-22 20:47:48 -05:00
Tyler Stachecki
4b6904240e
Add implementations for VRCP, VRCPL, and VRCPH.
2014-12-22 20:29:16 -05:00
Tyler Stachecki
73709f4c45
Add implementation for VCR.
2014-12-22 13:01:03 -05:00
Tyler Stachecki
88310a8104
Add AIO's implementation for VMULF.
2014-12-22 09:50:29 -05:00
Tyler Stachecki
f268795da5
Add implementation for VMRG.
2014-12-21 15:49:44 -05:00
Tyler Stachecki
9f4664a4b6
Add implementation for VADDC.
2014-12-21 15:29:16 -05:00
Tyler Stachecki
a955bf1e2c
Add implementation for VSUBC.
2014-12-21 15:07:00 -05:00
Tyler Stachecki
f199c7bac8
Add implementation for VABS.
2014-12-21 12:59:36 -05:00
Tyler Stachecki
de5b5b0f96
Commit AIO's VSUB optimizations, fix carry/borrow issue.
2014-12-21 12:55:38 -05:00
Tyler Stachecki
0be40f4358
Add implementations for VGE and VLT.
2014-12-21 11:08:00 -05:00
Tyler Stachecki
dc50279609
Add implementations for VEQ and VNE.
2014-12-21 10:39:10 -05:00
Tyler Stachecki
579fb317a8
Formatting/consistency fixes (remove tabs).
2014-12-21 10:20:45 -05:00
Tyler Stachecki
bd899f5034
Unbreak SSE2 builds.
2014-12-21 09:48:01 -05:00
Tyler Stachecki
e1de6cd92d
Add implementation for VCH.
2014-12-21 09:29:58 -05:00
Tyler Stachecki
0c556f5d25
Fix a last-minute SSE4.1->SSE2 change.
2014-12-20 17:01:31 -05:00
Tyler Stachecki
145141225e
Add implementations for VCL and CFC2.
2014-12-20 12:27:38 -05:00
Tyler Stachecki
7c83dcb0d3
Prevent GCC from eliding global register var writes.
...
Not sure why GCC was optimizing out these global register variable
writes when FLTO was enabled, but ensure that it does not by using
an inline assembly block.
2014-12-20 10:21:41 -05:00
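The workaround described above can be sketched roughly as follows, assuming GCC or Clang extended inline assembly. The names `cached_flags_sketch` and `write_flags_sketch` are hypothetical, and an ordinary global stands in for the global register variable the commit refers to, so this shows the general technique rather than the committed code.

```c
#include <stdint.h>

/* Hypothetical cached state. The real code keeps it in a GCC global
 * register variable; this sketch uses an ordinary global instead. */
static uint32_t cached_flags_sketch;

/* Store a value, then pass the variable through an empty asm block as an
 * in/out memory operand. The compiler must assume the asm reads and
 * modifies it, so the preceding store cannot be removed as dead. */
static void write_flags_sketch(uint32_t value) {
  cached_flags_sketch = value;
  __asm__ __volatile__("" : "+m"(cached_flags_sketch));
}
```

The asm body is empty, so no instructions are emitted for it; only the operand constraint affects what the optimizer may do.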
Tyler Stachecki
affb4bb746
Add a patch job fix for SSE2 RSP builds.
2014-12-19 22:03:25 -05:00
Tyler Stachecki
c72f2c5028
Fix RSP alignment issues once and for all.
2014-12-19 20:03:03 -05:00
Tyler Stachecki
10a5983c0c
Add support for SSE4 FPU acceleration.
...
0d4a5de2f6 is wrong; we can take
advantage of SSE4 rounding intrinsics.
2014-11-16 14:06:34 -05:00
Tyler Stachecki
061a04e216
Change width of fpu_state_t for x86_64.
...
gcc (and probably other compilers) doesn't like working with 16-bit
types and will zero-extend where needed. Save some overhead and
just store the state as a 32-bit type.
2014-11-15 15:44:04 -05:00
Tyler Stachecki
0a9b8c2367
Make read_acc_* return a value.
...
Instead of writing through a pointer, just return the value.
Thank you, Jared, for pointing out my stupidity.
2014-11-13 19:54:33 -05:00
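The change described above might look something like the following. This is a sketch under assumptions: `acc_sketch`, `read_acc_lo_ptr`, and `read_acc_lo_val` are hypothetical names, and scalar 16-bit lanes stand in for whatever vector type the real `read_acc_*` functions use.

```c
#include <stdint.h>

/* Hypothetical accumulator slice: eight 16-bit lanes. */
struct acc_sketch {
  uint16_t lo[8];
};

/* Before: the result is written through a pointer, so the caller must
 * provide addressable storage for it. */
static void read_acc_lo_ptr(const struct acc_sketch *acc, unsigned lane,
                            uint16_t *out) {
  *out = acc->lo[lane & 7];
}

/* After: the result is returned by value, which lets the compiler keep
 * it in a register and lets callers use it directly in expressions. */
static uint16_t read_acc_lo_val(const struct acc_sketch *acc, unsigned lane) {
  return acc->lo[lane & 7];
}
```

Both forms read the same lane; the by-value form simply removes the temporary and the extra parameter at each call site.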
Tyler Stachecki
33d2e15278
Reduce size of rsp_vload_dmem dynarec code.
...
We're going to want to instantiate all possible branch targets
ahead of time to avoid SMC penalties, so we want each target to
fit into the smallest block of code possible.
2014-11-10 22:51:33 -05:00
Tyler Stachecki
fc22ab18ba
Fix some corner-case bugs in the last commit.
2014-11-10 19:04:23 -05:00
Tyler Stachecki
b4b95d1f21
Fix SSE2 RSP vector loads/stores implementation.
2014-11-10 18:32:12 -05:00
Tyler Stachecki
316214d82d
(Finally) permit SSE2-only builds.
...
Add SSE2 codepaths where necessary (even if not complete), while
still allowing the project to be compiled with SSSE3+ intrinsics.
2014-11-10 14:29:13 -05:00
Tyler Stachecki
f66894935b
Mark more initialization functions as cold.
2014-11-09 19:11:09 -05:00
Tyler Stachecki
a0f1eb5d7c
Move intrinsics to a common location.
2014-11-09 18:51:54 -05:00
Tyler Stachecki
1513f3cac2
arch/x86_64: Prefer _mm_set_s* over _mm_load_s*.
2014-11-09 18:27:14 -05:00
Tyler Stachecki
4cfb7275a9
Fix and optimize rsp_uclamp_acc (once again).
2014-11-08 19:07:08 -05:00
Tyler Stachecki
9f8a9f9d62
Add implementations of VMADH and VMUDH.
2014-11-08 14:01:41 -05:00
Tyler Stachecki
007d72eda1
Add implementations of VMADL and VMADM.
2014-11-08 12:21:06 -05:00
Tyler Stachecki
16a7c434da
Fix/optimize the RSP accumulator clamp LO algorithm.
2014-11-05 16:58:59 -05:00
Tyler Stachecki
6a0604eaca
Fix the RSP accumulator clamping algorithm.
2014-11-05 15:09:16 -05:00
Tyler Stachecki
b668296589
Add implementations of VADD and VSUB.
2014-11-03 18:06:32 -05:00
Tyler Stachecki
083ad75286
arch/x86_64: Cache RSP accumulator regs in host CPU.
2014-11-03 16:48:38 -05:00