Commit graph

953 commits

Author SHA1 Message Date
Tyler Stachecki
ab8dde80e9 Add AIO's implementation for VMULU. 2014-12-23 01:10:15 -05:00
Tyler Stachecki
2ee295a671 Fix RSP DMEM accesses.
Up until now, the simulator assumed that DMEM accesses had to be
aligned (similarly to the VR4300). This is not actually the case,
so allow scalar memory access to arbitrary DMEM addresses.
2014-12-22 23:53:13 -05:00
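A minimal sketch of what permitting scalar accesses at arbitrary DMEM addresses might look like. The names `DMEM_SIZE` and `dmem_read_word` are hypothetical and not taken from the source tree. The sketch assumes a 4 KiB big-endian DMEM and that addresses wrap at the end of the array.

```c
#include <stdint.h>

#define DMEM_SIZE 0x1000u /* assumed: 4 KiB of RSP data memory */

/* Hypothetical helper: read a big-endian 32-bit word from any byte address,
 * aligned or not. */
static uint32_t dmem_read_word(const uint8_t *dmem, uint32_t addr) {
  uint32_t word = 0;
  unsigned i;

  for (i = 0; i < 4; i++) {
    /* Mask each byte address so the access wraps (an assumption of this
     * sketch) instead of running past the end of the array. */
    word = (word << 8) | dmem[(addr + i) & (DMEM_SIZE - 1)];
  }

  return word;
}
```

Assembling the word byte by byte avoids any host alignment requirement, at the cost of speed; a real implementation would likely use a faster unaligned load.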
Tyler Stachecki
3f2329be5b Fix a bug in VRCP/VRSQ precision selection. 2014-12-22 21:06:17 -05:00
Tyler Stachecki
e52e031ce3 Add implementations for VRSQ, VRSQL, and VRSQH. 2014-12-22 20:47:48 -05:00
Tyler Stachecki
4b6904240e Add implementations for VRCP, VRCPL, and VRCPH. 2014-12-22 20:29:16 -05:00
Tyler Stachecki
73709f4c45 Add implementation for VCR. 2014-12-22 13:01:03 -05:00
Tyler Stachecki
d3938056a4 Add a CONTRIBUTORS file. 2014-12-22 11:32:20 -05:00
Tyler Stachecki
88310a8104 Add AIO's implementation for VMULF. 2014-12-22 09:50:29 -05:00
Tyler Stachecki
f268795da5 Add implementation for VMRG. 2014-12-21 15:49:44 -05:00
Tyler Stachecki
9f4664a4b6 Add implementation for VADDC. 2014-12-21 15:29:16 -05:00
Tyler Stachecki
a955bf1e2c Add implementation for VSUBC. 2014-12-21 15:07:00 -05:00
Tyler Stachecki
bea9f197c0 Upgrade POSIX_C_SOURCE so we get snprintf. 2014-12-21 14:04:08 -05:00
Tyler Stachecki
f199c7bac8 Add implementation for VABS. 2014-12-21 12:59:36 -05:00
Tyler Stachecki
de5b5b0f96 Commit AIO's VSUB optimizations, fix carry/borrow issue. 2014-12-21 12:55:38 -05:00
Tyler Stachecki
0be40f4358 Add implementations for VGE and VLT. 2014-12-21 11:08:00 -05:00
Tyler Stachecki
dc50279609 Add implementations for VEQ and VNE. 2014-12-21 10:39:10 -05:00
Tyler Stachecki
579fb317a8 Formatting/consistency fixes (remove tabs). 2014-12-21 10:20:45 -05:00
Tyler Stachecki
bd899f5034 Unbreak SSE2 builds. 2014-12-21 09:48:01 -05:00
Tyler Stachecki
e1de6cd92d Add implementation for VCH. 2014-12-21 09:29:58 -05:00
Tyler Stachecki
0c556f5d25 Fix a last minute SSE4.1->SSE2 change. 2014-12-20 17:01:31 -05:00
Tyler Stachecki
145141225e Add implementations for VCL and CFC2. 2014-12-20 12:27:38 -05:00
Tyler Stachecki
7c83dcb0d3 Prevent GCC from eliding global register var writes.
Not sure why GCC was optimizing out these global register variable
writes when FLTO was enabled, but ensure that it does not by using
an inline assembly block.
2014-12-20 10:21:41 -05:00
Tyler Stachecki
affb4bb746 Add a patch job fix for SSE2 RSP builds. 2014-12-19 22:03:25 -05:00
Tyler Stachecki
cd9e41e54f Add a list of TODO for the VR4300. 2014-12-19 21:16:18 -05:00
Tyler Stachecki
c72f2c5028 Fix RSP alignment issues once and for all. 2014-12-19 20:03:03 -05:00
Tyler Stachecki
78b4c78757 Add support for cross-compiling with mingw64. 2014-12-18 00:46:56 -05:00
Tyler Stachecki
369d33c2d1 Windows fixes as reported by magumagu. 2014-12-07 10:40:42 -05:00
Tyler Stachecki
8b363895d1 Add missing #include for snprintf.
Thanks, balrog.
2014-11-19 10:42:15 -05:00
Tyler Stachecki
8b45d7eab5 Fix padding around SSE register types.
Really need to stop doing patchjobs and just fix this.
2014-11-16 14:27:43 -05:00
Tyler Stachecki
b1ada90657 Fix incorrect return value on successful exit. 2014-11-16 14:21:28 -05:00
Tyler Stachecki
10a5983c0c Add support for SSE4 FPU acceleration.
0d4a5de2f6 is wrong; we can take
advantage of SSE4 rounding intrinsics.
2014-11-16 14:06:34 -05:00
Tyler Stachecki
9e9114d2fa Clean up the CMakeLists a little. 2014-11-16 13:35:40 -05:00
Tyler Stachecki
459aed5e8d Generate two binaries.
Generate a 'fast' release binary and a developer binary. The
developer binary contains extra calls that permit debugging and
such things.
2014-11-16 13:32:04 -05:00
Tyler Stachecki
11afa4123d Give os/unix's UI thread a good waxing.
Periodically (~1000x) poll for input instead of waiting for a frame
boundary. Also relinquish the render_lock more aggressively in an
attempt to step out of the way of the simulator.
2014-11-16 11:51:49 -05:00
Tyler Stachecki
c90e55a05d Lock around input reads.
Fix some obvious memory consistency issues.
2014-11-16 10:19:56 -05:00
Tyler Stachecki
c1dc7cba08 Refactor for another major performance boost.
Since the CEN64 core now runs in its own thread (and doesn't use
the FPU), we can steal the host's FPU state register and not have
to worry about preserving it.

Along with that major overhaul, don't force "extra" features like
simulation statistics and debugging if the user doesn't want them.
Including that code, even when it is not run, mucks with register
allocation or something ever so slightly.
2014-11-15 18:22:20 -05:00
Tyler Stachecki
d17db4cc18 Make sure to keep rsp_vect_t aligned to 16 bytes. 2014-11-15 15:58:35 -05:00
Tyler Stachecki
4b806c5601 Remove some "experimental" code that got replaced. 2014-11-15 15:55:26 -05:00
Tyler Stachecki
061a04e216 Change width of fpu_state_t for x86_64.
gcc (and probably other compilers) doesn't like working with 16-bit
types and will zero-extend where needed. Save some overhead and
just store the state as a 32-bit type.
2014-11-15 15:44:04 -05:00
Tyler Stachecki
172203eb70 Rework VR4300 CP1.
Use switch statements instead of if/else spaghetti to give the
compiler a better idea of what we're trying to do.
2014-11-15 15:40:15 -05:00
Tyler Stachecki
0d4a5de2f6 Remove some comments about SSE4 intrinsics.
Since we have to convert to an integer, as well as round in some
direction, these intrinsics (_mm_ceil_*, _mm_floor_*, _mm_round_*)
aren't of much use to us.
2014-11-15 14:33:43 -05:00
Tyler Stachecki
31443e65c5 Mark another function as cen64_cold. 2014-11-14 22:22:00 -05:00
Tyler Stachecki
01df3de520 Aggressively push more code into the cold section.
We will likely only hit a couple of the slow_cycle functions in
the VR4300 code when we interrupt. Because of this, push everything
just before what will be hit after a data cache fault into the cold
section.
2014-11-14 21:28:34 -05:00
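A rough sketch of the idea of pushing rarely executed code into a cold section. The log names `cen64_cold` but does not show its definition, so the `example_cold` macro below is a hypothetical stand-in built on the GCC/Clang `cold` function attribute. `handle_fault` and `cycle` are made-up names for illustration.

```c
#include <stdint.h>

/* Hypothetical macro; falls back to nothing on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define example_cold __attribute__((cold))
#else
#define example_cold
#endif

/* Rarely taken path: tag a (made-up) fault code. The attribute hints that
 * the compiler may place this function away from the hot code. */
static example_cold uint32_t handle_fault(uint32_t fault) {
  return fault | 0x80000000u;
}

/* Hot path: only calls the cold handler when a fault is pending. */
static uint32_t cycle(uint32_t fault) {
  if (fault != 0)
    return handle_fault(fault);

  return 0;
}
```

Keeping unlikely handlers out of the hot code can improve instruction cache locality for the common path, though the actual gain depends on the compiler and workload.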
Tyler Stachecki
4d46108cff Fix 8912b4cc50.
Commit 8912b4cc50 was mostly right,
but we still need to make sure we clear the fault type if an IADE
exception really does happen.
2014-11-14 21:11:52 -05:00
Tyler Stachecki
85654a891f Delay computing accurate value of count.
Instead, just bump the counter and don't track cycle count. When
it comes time to use count, shift it to the right by one instead.
2014-11-14 21:04:03 -05:00
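The deferred computation described above could be sketched roughly as follows. `count_state`, `count_tick`, and `count_read` are hypothetical names. The sketch assumes the visible count advances once every two cycles, which is what the shift by one implies.

```c
#include <stdint.h>

/* Hypothetical state: a raw counter bumped once per simulated cycle. */
struct count_state {
  uint64_t raw_count;
};

/* Per-cycle work is a single increment; no separate cycle count is kept. */
static inline void count_tick(struct count_state *s) {
  s->raw_count++;
}

/* The accurate value is derived only when something actually reads it. */
static inline uint32_t count_read(const struct count_state *s) {
  return (uint32_t)(s->raw_count >> 1);
}
```

This moves the cost from every cycle to the comparatively rare reads of the register.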
Tyler Stachecki
8912b4cc50 IC stage should never fault... I think. 2014-11-14 20:39:33 -05:00
Tyler Stachecki
0a9b8c2367 Make read_acc_* return a value.
Instead of writing through a pointer, just return the value.
Thank you, Jared, for pointing out my stupidity.
2014-11-13 19:54:33 -05:00
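A small sketch contrasting the two styles this commit describes. The `vec_t` and `struct acc` types and both function signatures are hypothetical stand-ins, not the real `read_acc_*` definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for an 8-lane, 16-bit vector register. */
typedef struct {
  uint16_t e[8];
} vec_t;

/* Assumed accumulator layout, for illustration only. */
struct acc {
  vec_t hi, md, lo;
};

/* Before: the result is written through a pointer supplied by the caller. */
static void read_acc_lo_ptr(const struct acc *a, vec_t *out) {
  *out = a->lo;
}

/* After: the value is simply returned. */
static vec_t read_acc_lo(const struct acc *a) {
  return a->lo;
}
```

Returning by value lets the compiler keep the result in a register rather than forcing it through memory, which matters when the real type is a SIMD vector.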
Tyler Stachecki
6e474a3251 Implement a neat optimization in the VR4300 core.
Perf reported a window where the backend was busy, and the frontend
was idle. Take advantage of the situation by inserting a branch that
has the potential to filter out (a lot of) instructions from the
backend when it's clogged. This works to our advantage, because more
often than not we aren't executing FPU instructions, or we execute
the FPU instructions in small batches.
2014-11-12 14:06:24 -05:00
Tyler Stachecki
e4fbc9831d Increase VR4300_BUSY_WAIT_DETECTION performance.
Don't split branch functions across "normal" and "busy wait detect"
variants; just have everything use the "busy wait detect" variant.
2014-11-12 12:56:42 -05:00
Tyler Stachecki
a00af95ce1 os/unix: Remove stray character from window title. 2014-11-12 07:38:27 -05:00