cen64

mirror of https://github.com/n64dev/cen64.git synced 2025-04-02 10:31:54 -04:00

Author	SHA1	Message	Date
Tyler Stachecki	bd899f5034	Unbreak SSE2 builds.	2014-12-21 09:48:01 -05:00
Tyler Stachecki	e1de6cd92d	Add implementations for VCH.	2014-12-21 09:29:58 -05:00
Tyler Stachecki	0c556f5d25	Fix a last minute SSE4.1->SSE2 change.	2014-12-20 17:01:31 -05:00
Tyler Stachecki	145141225e	Add implementations for VCL and CFC2.	2014-12-20 12:27:38 -05:00
Tyler Stachecki	7c83dcb0d3	Prevent GCC from eliding global register var writes. Not sure why GCC was optimizing out these global register variable writes when FLTO was enabled, but ensure that it does not by using an inline assembly block.	2014-12-20 10:21:41 -05:00
Tyler Stachecki	affb4bb746	Add a patch job fix for SSE2 RSP builds.	2014-12-19 22:03:25 -05:00
Tyler Stachecki	cd9e41e54f	Add a list of TODO for the VR4300.	2014-12-19 21:16:18 -05:00
Tyler Stachecki	c72f2c5028	Fix RSP alignment issues once and for all.	2014-12-19 20:03:03 -05:00
Tyler Stachecki	78b4c78757	Add support for cross-compiling with mingw64.	2014-12-18 00:46:56 -05:00
Tyler Stachecki	369d33c2d1	Windows fixes as reported by magumagu.	2014-12-07 10:40:42 -05:00
Tyler Stachecki	8b363895d1	Add missing #include for snprintf. Thanks, balrog.	2014-11-19 10:42:15 -05:00
Tyler Stachecki	8b45d7eab5	Fix padding around SSE register types. Really need to stop doing patchjobs and just fix this.	2014-11-16 14:27:43 -05:00
Tyler Stachecki	b1ada90657	Fix incorrect return value on successful exit.	2014-11-16 14:21:28 -05:00
Tyler Stachecki	10a5983c0c	Add support for SSE4 FPU acceleration. `0d4a5de2f6` is wrong; we can take advantage of SSE4 rounding intrinsics.	2014-11-16 14:06:34 -05:00
Tyler Stachecki	9e9114d2fa	Cleanup the CMakeLists a little.	2014-11-16 13:35:40 -05:00
Tyler Stachecki	459aed5e8d	Generate two binaries. Generate a 'fast' release binary and a developer binary. The developer binary contains extra calls that permit debugging and such things.	2014-11-16 13:32:04 -05:00
Tyler Stachecki	11afa4123d	Give os/unix's UI thread a good waxing. Periodically (~1000x) poll for input instead of waiting for a frame boundary. Also relinquish the render_lock more aggressively in an attempt to step out of the way of the simulator.	2014-11-16 11:51:49 -05:00
Tyler Stachecki	c90e55a05d	Lock around input reads. Fix some obvious memory consistency issues.	2014-11-16 10:19:56 -05:00
Tyler Stachecki	c1dc7cba08	Refactor for another major performance boost. Since the CEN64 core now runs in its own thread (and doesn't use the FPU), we can steal the host's FPU state register and not have to worry about preserving it. Along with that major overhaul, don't force "extra" features like simulation statistics and debugging if the user doesn't want them. Including that code, even when it is not run, mucks with register allocation or something ever so slightly.	2014-11-15 18:22:20 -05:00
Tyler Stachecki	d17db4cc18	Make sure to keep rsp_vect_t aligned to 16 bytes.	2014-11-15 15:58:35 -05:00
Tyler Stachecki	4b806c5601	Remove some "experimental" code that got replaced.	2014-11-15 15:55:26 -05:00
Tyler Stachecki	061a04e216	Change width of fpu_state_t for x86_64. gcc (and probably other compilers) don't like working with 16-bit types and will zero-extend where needed. Save some overhead and just store the state as a 32-bit type.	2014-11-15 15:44:04 -05:00
Tyler Stachecki	172203eb70	Rework VR4300 CP1. Use switch statements instead of if/else spaghetti to give the compiler a better idea of what we're trying to do.	2014-11-15 15:40:15 -05:00
Tyler Stachecki	0d4a5de2f6	Remove some comments about SSE4 intrinsics. Since we have to convert to an integer, as well as round in some direction, these intrinsics (_mm_ceil_, _mm_floor_, _mm_round_*) aren't of much use to us.	2014-11-15 14:33:43 -05:00
Tyler Stachecki	31443e65c5	Mark another function as cen64_cold.	2014-11-14 22:22:00 -05:00
Tyler Stachecki	01df3de520	Aggressively push more code into the cold section. We will likely only hit a couple of the slow_cycle functions in the VR4300 code when we interrupt. Because of this, push everything just before what will be hit after a data cache fault into the cold section.	2014-11-14 21:28:34 -05:00
Tyler Stachecki	4d46108cff	Fix `8912b4cc50`. Commit `8912b4cc50` was mostly right, but we still need to make sure we clear the fault type if an IADE exception really does happen.	2014-11-14 21:11:52 -05:00
Tyler Stachecki	85654a891f	Delay computing accurate value of count. Instead, just bump the counter and don't track cycle count. When it comes time to use count, shift it to the right by one instead.	2014-11-14 21:04:03 -05:00
Tyler Stachecki	8912b4cc50	IC stage should never fault... I think.	2014-11-14 20:39:33 -05:00
Tyler Stachecki	0a9b8c2367	Make read_acc_* return a value. Instead of writing through a pointer, just return the value. Thank you, Jared, for pointing out my stupidity.	2014-11-13 19:54:33 -05:00
Tyler Stachecki	6e474a3251	Implement a neat optimization in the VR4300 core. Perf reported a window where the backend was busy, and the frontend was idle. Take advantage of the situation by inserting a branch that has the potential to filter out (a lot of) instructions from the backend when it's clogged. This works to our advantage, because more often than not we aren't executing FPU instructions, or we execute the FPU instructions in small batches.	2014-11-12 14:06:24 -05:00
Tyler Stachecki	e4fbc9831d	Increase VR4300_BUSY_WAIT_DETECTION performance. Don't split branch functions across "normal" and "busy wait detect" variants; just have everything use the "busy wait detect" variant.	2014-11-12 12:56:42 -05:00
Tyler Stachecki	a00af95ce1	os/unix: Remove stray character from window title.	2014-11-12 07:38:27 -05:00
Tyler Stachecki	0fb96ebedd	Revamp the CMake generator file. Add option to specify architecture support (SSE2, SSSE3, etc.) for each compiler supported. Update UI window title to indicate architecture folder and support.	2014-11-11 22:38:09 -05:00
Tyler Stachecki	7ce889135c	Catch SIGINT when passed -nointerface.	2014-11-11 17:44:43 -05:00
Tyler Stachecki	538e344442	Add a -nointerface switch, remove spare '\t's.	2014-11-11 17:21:25 -05:00
Tyler Stachecki	71a126e425	Don't hangup the UI when the VI doesn't ack.	2014-11-11 17:02:21 -05:00
Tyler Stachecki	33d2e15278	Reduce size of rsp_vload_dmem dynarec code. We're going to want to instantiate all possible branch targets ahead of time to avoid SMC penalties, so we want each target to fit into the smallest block of code possible.	2014-11-10 22:51:33 -05:00
Tyler Stachecki	fc22ab18ba	Fix some corner-case bugs in the last commit.	2014-11-10 19:04:23 -05:00
Tyler Stachecki	b4b95d1f21	Fix SSE2 RSP vector loads/stores implementation.	2014-11-10 18:32:12 -05:00
Tyler Stachecki	2794b1c2a0	Don't flag os/main as cold.	2014-11-10 14:40:05 -05:00
Tyler Stachecki	316214d82d	(Finally) permit SSE2-only builds. Add SSE2 codepaths where necessary (even if not complete), while still allowing the project to be compiled with SSSE3+ intrinsics.	2014-11-10 14:29:13 -05:00
Tyler Stachecki	3a24a67f1f	Fix poor SSE2-based RSP performance.	2014-11-10 11:02:57 -05:00
Tyler Stachecki	f66894935b	Mark more initialization functions as cold.	2014-11-09 19:11:09 -05:00
Tyler Stachecki	f70c1a5fc5	More _mm_set_s* over _mm_load_s* stuff.	2014-11-09 18:54:07 -05:00
Tyler Stachecki	a0f1eb5d7c	Move intrinsics to a common location.	2014-11-09 18:51:54 -05:00
Tyler Stachecki	ffe40c4c20	Mark VR4300 exception handlers as cold.	2014-11-09 18:41:09 -05:00
Tyler Stachecki	1513f3cac2	arch/x86_64: Prefer _mm_set_s* over _mm_load_s*.	2014-11-09 18:27:14 -05:00
Tyler Stachecki	e3d1934855	Enable non-standard-conforming optimizations.	2014-11-09 17:49:58 -05:00
Tyler Stachecki	9b3ce2134b	Aggressively optimize loops when using GCC. Mainly useful for tight RSP DMA copy loops.	2014-11-09 17:26:11 -05:00

736 commits