736 commits

Author SHA1 Message Date
Tyler Stachecki
bd899f5034 Unbreak SSE2 builds. 2014-12-21 09:48:01 -05:00
Tyler Stachecki
e1de6cd92d Add implementations for VCH. 2014-12-21 09:29:58 -05:00
Tyler Stachecki
0c556f5d25 Fix a last minute SSE4.1->SSE2 change. 2014-12-20 17:01:31 -05:00
Tyler Stachecki
145141225e Add implementations for VCL and CFC2. 2014-12-20 12:27:38 -05:00
Tyler Stachecki
7c83dcb0d3 Prevent GCC from eliding global register var writes.
Not sure why GCC was optimizing out these global register variable
writes when FLTO was enabled, but ensure that it does not by using
an inline assembly block.
2014-12-20 10:21:41 -05:00
Tyler Stachecki
affb4bb746 Add a patch job fix for SSE2 RSP builds. 2014-12-19 22:03:25 -05:00
Tyler Stachecki
cd9e41e54f Add a list of TODO for the VR4300. 2014-12-19 21:16:18 -05:00
Tyler Stachecki
c72f2c5028 Fix RSP alignment issues once and for all. 2014-12-19 20:03:03 -05:00
Tyler Stachecki
78b4c78757 Add support for cross-compiling with mingw64. 2014-12-18 00:46:56 -05:00
Tyler Stachecki
369d33c2d1 Windows fixes as reported by magumagu. 2014-12-07 10:40:42 -05:00
Tyler Stachecki
8b363895d1 Add missing #include for snprintf.
Thanks, balrog.
2014-11-19 10:42:15 -05:00
Tyler Stachecki
8b45d7eab5 Fix padding around SSE register types.
Really need to stop doing patchjobs and just fix this.
2014-11-16 14:27:43 -05:00
Tyler Stachecki
b1ada90657 Fix incorrect return value on successful exit. 2014-11-16 14:21:28 -05:00
Tyler Stachecki
10a5983c0c Add support for SSE4 FPU acceleration.
0d4a5de2f6 is wrong; we can take
advantage of SSE4 rounding intrinsics.
2014-11-16 14:06:34 -05:00
Tyler Stachecki
9e9114d2fa Clean up the CMakeLists a little. 2014-11-16 13:35:40 -05:00
Tyler Stachecki
459aed5e8d Generate two binaries.
Generate a 'fast' release binary and a developer binary. The
developer binary contains extra calls that permit debugging and
such things.
2014-11-16 13:32:04 -05:00
Tyler Stachecki
11afa4123d Give os/unix's UI thread a good waxing.
Periodically (~1000x) poll for input instead of waiting for a frame
boundary. Also relinquish the render_lock more aggressively in an
attempt to step out of the way of the simulator.
2014-11-16 11:51:49 -05:00
Tyler Stachecki
c90e55a05d Lock around input reads.
Fix some obvious memory consistency issues.
2014-11-16 10:19:56 -05:00
Tyler Stachecki
c1dc7cba08 Refactor for another major performance boost.
Since the CEN64 core now runs in its own thread (and doesn't use
the FPU), we can steal the host's FPU state register and not have
to worry about preserving it.

Along with that major overhaul, don't force "extra" features like
simulation statistics and debugging if the user doesn't want them.
Including that code, even when it is not run, mucks with register
allocation or something ever so slightly.
2014-11-15 18:22:20 -05:00
Tyler Stachecki
d17db4cc18 Make sure to keep rsp_vect_t aligned to 16 bytes. 2014-11-15 15:58:35 -05:00
Tyler Stachecki
4b806c5601 Remove some "experimental" code that got replaced. 2014-11-15 15:55:26 -05:00
Tyler Stachecki
061a04e216 Change width of fpu_state_t for x86_64.
gcc (and probably other compilers) doesn't like working with 16-bit
types and will zero-extend where needed. Save some overhead and
just store the state as a 32-bit type.
2014-11-15 15:44:04 -05:00
Tyler Stachecki
172203eb70 Rework VR4300 CP1.
Use switch statements instead of if/else spaghetti to give the
compiler a better idea of what we're trying to do.
2014-11-15 15:40:15 -05:00
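
As a rough illustration of the switch-based style this commit describes, the sketch below decodes the fmt field of a MIPS COP1 instruction word (bits 25..21) with a single switch rather than a chain of if/else tests. The names `example_fmt` and `example_fmt_width` are hypothetical and are not taken from the CEN64 source; only the fmt encodings (16, 17, 20, 21) follow the standard MIPS convention.

```c
#include <stdint.h>

/* Hypothetical names for illustration; values are the MIPS COP1 fmt encodings. */
enum example_fmt {
  EXAMPLE_FMT_S = 16, /* single-precision float */
  EXAMPLE_FMT_D = 17, /* double-precision float */
  EXAMPLE_FMT_W = 20, /* 32-bit fixed point */
  EXAMPLE_FMT_L = 21  /* 64-bit fixed point */
};

/* Returns the operand width in bytes for the fmt field of instruction
 * word iw, or 0 when the encoding is not one of the four handled here. */
unsigned example_fmt_width(uint32_t iw) {
  switch ((iw >> 21) & 0x1F) {
    case EXAMPLE_FMT_S:
    case EXAMPLE_FMT_W:
      return 4;

    case EXAMPLE_FMT_D:
    case EXAMPLE_FMT_L:
      return 8;

    default:
      return 0;
  }
}
```

A dense switch like this leaves the compiler free to pick a jump table or a compare tree, which is harder for it to recover from nested if/else chains.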
Tyler Stachecki
0d4a5de2f6 Remove some comments about SSE4 intrinsics.
Since we have to convert to an integer, as well as round in some
direction, these intrinsics (_mm_ceil_*, _mm_floor_*, _mm_round_*)
aren't of much use to us.
2014-11-15 14:33:43 -05:00
Tyler Stachecki
31443e65c5 Mark another function as cen64_cold. 2014-11-14 22:22:00 -05:00
Tyler Stachecki
01df3de520 Aggressively push more code into the cold section.
We will likely only hit a couple of the slow_cycle functions in
the VR4300 code when we interrupt. Because of this, push everything
just before what will be hit after a data cache fault into the cold
section.
2014-11-14 21:28:34 -05:00
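
A minimal sketch of the cold-section idea, assuming a GCC-compatible compiler. `EXAMPLE_COLD`, `example_handle_fault`, and `example_cycle` are hypothetical names; the actual definition of `cen64_cold` is not shown in this log, so this only illustrates the general technique of tagging rarely executed functions with the `cold` attribute.

```c
/* Hypothetical macro; falls back to nothing on compilers without the attribute. */
#if defined(__GNUC__)
#define EXAMPLE_COLD __attribute__((cold, noinline))
#else
#define EXAMPLE_COLD
#endif

/* Rarely executed slow path: with GCC, cold functions are optimized for
 * size and grouped away from the hot code. */
EXAMPLE_COLD int example_handle_fault(int fault_code) {
  return -fault_code;
}

/* Hot path: only branches to the cold handler when a fault is pending. */
int example_cycle(int value, int fault_code) {
  if (fault_code != 0)
    return example_handle_fault(fault_code);

  return value + 1;
}
```

Keeping the slow path out of line means the hot loop occupies fewer instruction cache lines, which is presumably the motivation behind pushing the slow_cycle functions into the cold section.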
Tyler Stachecki
4d46108cff Fix 8912b4cc50.
Commit 8912b4cc50 was mostly right,
but we still need to make sure we clear the fault type if an IADE
exception really does happen.
2014-11-14 21:11:52 -05:00
Tyler Stachecki
85654a891f Delay computing accurate value of count.
Instead, just bump the counter and don't track cycle count. When
it comes time to use count, shift it to the right by one instead.
2014-11-14 21:04:03 -05:00
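
The deferred computation described above might look something like the following sketch. It assumes the architectural counter is meant to advance once every two cycles, which is what the shift by one implies; `example_cp0`, `example_tick`, `example_read_count`, and `example_write_count` are hypothetical names, not CEN64's.

```c
#include <stdint.h>

/* Hypothetical state: raw_count is bumped every cycle, so it holds twice
 * the architectural count value (plus a possible half step in bit 0). */
struct example_cp0 {
  uint64_t raw_count;
};

/* Per-cycle work is a single increment; no divide or parity tracking. */
void example_tick(struct example_cp0 *cp0) {
  cp0->raw_count++;
}

/* The accurate value is only computed when something actually reads it. */
uint32_t example_read_count(const struct example_cp0 *cp0) {
  return (uint32_t)(cp0->raw_count >> 1);
}

/* Writes store the value pre-shifted so later reads stay consistent. */
void example_write_count(struct example_cp0 *cp0, uint32_t value) {
  cp0->raw_count = (uint64_t)value << 1;
}
```

The trade is one extra shift on the rare read or write in exchange for a cheaper operation on every simulated cycle.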
Tyler Stachecki
8912b4cc50 IC stage should never fault... I think. 2014-11-14 20:39:33 -05:00
Tyler Stachecki
0a9b8c2367 Make read_acc_* return a value.
Instead of writing through a pointer, just return the value.
Thank you, Jared, for pointing out my stupidity.
2014-11-13 19:54:33 -05:00
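
To illustrate the change in calling convention, here is a small sketch contrasting the two styles. The types `example_vec` and `example_acc` and both function names are hypothetical stand-ins; the real read_acc_* signatures and vector type are not shown in this log.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 8-lane slice of an accumulator. */
typedef struct {
  uint16_t lane[8];
} example_vec;

struct example_acc {
  example_vec lo, md, hi;
};

/* Before: the result is written through an out pointer. */
void example_read_acc_lo_ptr(const struct example_acc *acc, example_vec *out) {
  *out = acc->lo;
}

/* After: the result is returned by value. */
example_vec example_read_acc_lo(const struct example_acc *acc) {
  return acc->lo;
}
```

Returning by value removes the out pointer from the interface, so the compiler no longer has to reason about what that pointer might alias and can more easily keep the result in registers once the call is inlined.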
Tyler Stachecki
6e474a3251 Implement a neat optimization in the VR4300 core.
Perf reported a window where the backend was busy, and the frontend
was idle. Take advantage of the situation by inserting a branch that
has the potential to filter out (a lot of) instructions from the
backend when it's clogged. This works to our advantage, because more
often than not we aren't executing FPU instructions, or we execute
the FPU instructions in small batches.
2014-11-12 14:06:24 -05:00
Tyler Stachecki
e4fbc9831d Increase VR4300_BUSY_WAIT_DETECTION performance.
Don't split branch functions across "normal" and "busy wait detect"
variants; just have everything use the "busy wait detect" variant.
2014-11-12 12:56:42 -05:00
Tyler Stachecki
a00af95ce1 os/unix: Remove stray character from window title. 2014-11-12 07:38:27 -05:00
Tyler Stachecki
0fb96ebedd Revamp the CMake generator file.
Add option to specify architecture support (SSE2, SSSE3, etc.)
for each compiler supported. Update UI window title to indicate
architecture folder and support.
2014-11-11 22:38:09 -05:00
Tyler Stachecki
7ce889135c Catch SIGINT when passed -nointerface. 2014-11-11 17:44:43 -05:00
Tyler Stachecki
538e344442 Add a -nointerface switch, remove spare '\t's. 2014-11-11 17:21:25 -05:00
Tyler Stachecki
71a126e425 Don't hang up the UI when the VI doesn't ack. 2014-11-11 17:02:21 -05:00
Tyler Stachecki
33d2e15278 Reduce size of rsp_vload_dmem dynarec code.
We're going to want to instantiate all possible branch targets
ahead of time to avoid SMC penalties, so we want each target to
fit into the smallest block of code possible.
2014-11-10 22:51:33 -05:00
Tyler Stachecki
fc22ab18ba Fix some corner-case bugs in the last commit. 2014-11-10 19:04:23 -05:00
Tyler Stachecki
b4b95d1f21 Fix SSE2 RSP vector loads/stores implementation. 2014-11-10 18:32:12 -05:00
Tyler Stachecki
2794b1c2a0 Don't flag os/main as cold. 2014-11-10 14:40:05 -05:00
Tyler Stachecki
316214d82d (Finally) permit SSE2-only builds.
Add SSE2 codepaths where necessary (even if not complete), while
still allowing the project to be compiled with SSSE3+ intrinsics.
2014-11-10 14:29:13 -05:00
Tyler Stachecki
3a24a67f1f Fix poor SSE2-based RSP performance. 2014-11-10 11:02:57 -05:00
Tyler Stachecki
f66894935b Mark more initialization functions as cold. 2014-11-09 19:11:09 -05:00
Tyler Stachecki
f70c1a5fc5 More _mm_set_s* over _mm_load_s* stuff. 2014-11-09 18:54:07 -05:00
Tyler Stachecki
a0f1eb5d7c Move intrinsics to a common location. 2014-11-09 18:51:54 -05:00
Tyler Stachecki
ffe40c4c20 Mark VR4300 exception handlers as cold. 2014-11-09 18:41:09 -05:00
Tyler Stachecki
1513f3cac2 arch/x86_64: Prefer _mm_set_s* over _mm_load_s*. 2014-11-09 18:27:14 -05:00
Tyler Stachecki
e3d1934855 Enable non-standard-conforming optimizations. 2014-11-09 17:49:58 -05:00
Tyler Stachecki
9b3ce2134b Aggressively optimize loops when using GCC.
Mainly useful for tight RSP DMA copy loops.
2014-11-09 17:26:11 -05:00