Emulation/mupen64plus-rsp-cxd4

mirror of https://github.com/mupen64plus/mupen64plus-rsp-cxd4.git synced 2025-04-02 10:51:55 -04:00

Author	SHA1	Message	Date
Richard Goedeken	06601cf6f5	fix GCC10 error	2020-05-26 20:59:30 -07:00
Richard Goedeken	e49f1fe448	Merge pull request #46 from belegdol/master Sync with latest upstream changes	2020-04-21 17:19:09 -07:00
Gillou68310	7b99972824	Migrate to VS2017	2019-11-13 17:14:06 +01:00
Julian Sikorski	e4ae22295e	Merge remote-tracking branch 'upstream/master'	2019-07-15 20:51:11 +02:00
Iconoclast	2ea5951d80	Regulate undefined and defined states of RSP registers on boot. Now with the correct file modification date set. :)	2018-12-19 01:12:14 -05:00
Iconoclast	b3f3736b54	fixed unused symbol warnings	2018-12-18 19:51:51 -05:00
Iconoclast	143911c8e8	VMOV from VT[de], not VT[e]. Fixes #21. In the face of all adversity to other sources indicating that the four-bit shuffling element specifier is recycled as a selector for the source element from VT, the only way to pass krom's hardware tests on the VMOV operation with operands illegal to standard RSP assembler was to replace this notion with the seemingly oversimplified read from `de` instead of `e`, even though that specifier is already in use as the selector for which destination slice to write to and not just read from. Despite being removed from any references in the corresponding translation unit's functional implementation, the four-bit element shuffling mask is still in use as with all other vector operations for pre-shuffling VT[] before jumping into the vector operation interpreter function pointer table. In addition, the MovIn register is also half-emulated. It is not maintained as a global state machine attribute and only stores the final, hardware-accurate result that was already going to be copied into VD[] anyway rather than the preconceived result of a direct copy from VT[e].	2018-12-18 19:16:34 -05:00
Iconoclast	e3c7f46090	refined optimization from `bf7c98f` to account for very high dividends Fixes #19. Disabling the optimized code is perhaps a temporary measure, but the more readable code under the #else clause should absolutely be kept. The optimized version for 2's complement machines has however also been patched with a fix in case it becomes desirable to go back to enabling it for substantial speed gains.	2018-11-27 11:34:38 -05:00
Iconoclast	24195d94bf	fixed a typo from `c42ac84` in VCH's complements conversion Resolves #18.	2018-11-26 23:34:56 -05:00
Iconoclast	1f7c9fdc0f	fixed regression from fixing VRCPL and VRSQL Sign-extension is correct but only for single-precision reciprocal calculations. Double-precision divides should still continue to mask in the zero-extended low 16 bits of the determined vector register slice if the previously executed divide instruction prepared a double-precision result rather than defining a single-precision one.	2018-11-25 17:35:40 -05:00
Zapeth	11acc78f6e	Fix VRCPL and VRSQL ops Removed the unsigned cast for DivIn, now passes all tests of this test rom -> https://github.com/PeterLemon/N64/tree/master/RSPTest/CP2/VRCPL	2018-11-24 22:16:53 +01:00
Zapeth	5b17225175	Merge branch 'master' of https://github.com/cxd4/rsp	2018-08-19 14:20:38 +02:00
Iconoclast	cecd9976e8	optimized _mm_cmplt_epu16() composite method A: #define _mm_cmplt_epu16(m, n) _mm_cmpgt_epu16(n, m) #define _mm_cmpgt_epu16(m, n) _mm_andnot_si128(\ _mm_cmpeq_epi16(m, n), _mm_cmple_epu16(n, m)\ ) #define _mm_cmple_epu16(m, n) _mm_cmpeq_epi16(\ _mm_subs_epu16(m, n), _mm_setzero_si128()\ ) multiply.o: 3,524 bytes; multiply.s: 9,883 bytes method B: #define _mm_cmplt_epu16(m, n) _mm_cmplt_epi16(\ _mm_xor_si128(m, _mm_setmin_epi16()), _mm_xor_si128(n, _mm_setmin_epi16())\ ) #define _mm_setmin_epi16() _mm_slli_epi16(_mm_allones_si128(), 15) multiply.o: 3,504 bytes; multiply.s: 9,732 bytes	2018-03-18 20:57:51 -04:00
Iconoclast	ec3b55b48b	syntactical nits	2018-03-18 18:47:07 -04:00
Iconoclast	8857d37876	Count loop iterations with unsigned int, not int. Although functionally there is no difference (when just looping vector elements from 0 to 7) between using a signed int or an unsigned int, repeatedly seeing an inconsistent mix in usage between the two across different vector functions has been an ongoing distraction for years. It should be the same everywhere, and between signed int and unsigned int, unsigned int is the type which always fits within size_t from stddef.h, the safe type for memory pointers and dereference indices.	2018-03-18 18:19:02 -04:00
Iconoclast	af06eddbdd	fixed paste fail	2018-03-18 17:39:35 -04:00
Iconoclast	81c6bd1652	optimized MAC overflow carry when subtracting by -1	2018-03-18 17:15:29 -04:00
Iconoclast	b9e6b43ce5	vectorized VMACU vmacu_old.asm function has 99 instructions. vmacu_new.asm function has 50 instructions.	2018-03-18 10:28:13 -04:00
Iconoclast	cc4a5bb619	vectorized VMACF New VMACF with manually written SSE2 is 45 instructions. Old VMACF with auto-vectorized C code was 91 instructions.	2018-03-17 21:56:20 -04:00
Francisco Zurita	cc6b8833e3	Add libretro NEON optimizations credits: https://github.com/libretro/parallel-n64/tree/master/mupen64plus-rsp-cxd4	2017-03-04 23:36:21 -05:00
Francisco Zurita	e86432df61	Update to latest CXD4	2016-07-28 08:27:07 -04:00
	34f17d1615	fixed rest of the set-but-never-used warnings	2016-03-23 23:52:01 -04:00
	2d04d3660f	fixed remaining strict -Wshadow warning messages	2016-03-23 22:39:57 -04:00
	e9edb921cf	warning: declaration of inst shadows a global declaration [-Wshadow]	2016-03-23 22:31:38 -04:00
	88b125f6ab	warning: overflow in implicit constant conversion [-Woverflow]	2016-03-05 17:14:06 -05:00
unknown	7d9a42c5ff	Prevent in-line expansion of function `do_div()`. This is either for good or just temporary. It depends how much performance is lost from having to call the NOINLINE function, but as this is the actual source of speed hits for the divide operations I find it all that much easier to benchmark it when it is not getting in-lined. Furthermore, it's usually way low at the bottom of the function hot-spot lists anyway, so I'd rather save my 1 KB of DLL file size than worry about premature optimization for a function that needs more thorough benchmark testing anyway.	2015-12-12 18:21:19 -05:00
unknown	a1c53981a4	Make sure the RCP division ROM constants parse as unsigned.	2015-12-12 18:18:38 -05:00
unknown	71356b752a	deleted VMACQ from the function table	2015-11-30 23:15:06 -05:00
unknown	e79c29a8cf	Disable vector operation name-mangling macros for WIN32.	2015-11-30 23:09:29 -05:00
unknown	4f251012b4	Try, yet again, to make GitHub not parse divide.h as C++. At any rate, the new `static` storage class is advantageous for these divide unit state machine globals.	2015-11-29 21:10:00 -05:00
unknown	4e31a8c6a8	Try to stop GitHub from parsing this header as C++.	2015-11-28 17:05:40 -05:00
unknown	f5032094dd	accurate VSAR IW decoding of valid, scalar-based elements	2015-11-27 20:31:09 -05:00
unknown	d807c78226	some trivial clean-ups	2015-11-27 19:40:39 -05:00
Gillou68310	737c5e5bed	Fixed uninitialized local variable	2015-11-10 13:33:31 +01:00
Gillou68310	8796295a2c	Merge commit '73232513e7889c82f86fd77f81ac6a060fe7d828'	2015-11-10 11:57:18 +01:00
unknown	d68bcc4dc7	Remove a (long since) outdated prototype. This code was back when I wanted a central function for shuffling the vectors, only when all the COP2 vector op-codes had the shuffling of VR[vt] done locally within them. Since shuffling is now done within the COP2 dispatch before the function call table--global to all the vector instructions--there is no more need to have this prototype for a central function. That code was probably removed over 2 years ago. This also fixes some surviving 64-bit warnings with PIC linkage on Windows, reported by tony971, that I missed.	2015-08-19 20:38:18 -04:00
unknown	cd7c41482a	For better PIC linkage, remove SHUFFLE_VECTOR. Also got rid of the SSE2 code for shuffling. It takes too much extra byte code in the main interpreter instruction cache and requires an extra branch anyway, and an SSSE3 solution would still require at least 3 such large SIMD instructions anyway. So let's see if we can't safely overhaul this without a speed drop.	2015-08-17 11:09:01 -04:00
unknown	7fb9850b68	fixed possible PIC linkage faults by moving `merge` to static	2015-08-16 09:51:04 -04:00
unknown	bf7c98f586	Conker's BFD micro-optimization with 2's cmpl. integer division	2015-06-08 16:28:21 -04:00
Conchúr Navid	c10a7f570e	Sort the includes based on type and names	2015-03-21 10:58:40 +01:00
no	71fe84e2dc	discovered and fixed implicit SE warning in 64-bit compiles	2015-01-30 15:13:53 -05:00
no	9e0328f45b	Fix GNU assembler syntax errors by prefixing vcr's with cf_.	2015-01-30 14:16:55 -05:00
unknown	2d1887de16	k.	2015-01-29 11:31:35 -05:00
unknown	c90be1f99c	enforcing unsigned types for bit masks and bit-sensitive work	2015-01-29 08:58:37 -05:00
unknown	34819b524d	Force potentially missing vectorization with zeroing arrays.	2015-01-28 13:56:40 -05:00
unknown	14ff3d4878	fixed SIMD::SSE macros for bi-compatibility with __m128i and arrays	2015-01-28 13:02:56 -05:00
unknown	010f192a4d	warning fix at enumerated storing of unsigned max to signed	2015-01-28 12:12:10 -05:00
unknown	cebd37c835	slight improvements to CPU complement/unsigned portability	2015-01-27 23:00:46 -05:00
unknown	fcc46e7845	Avoid Unix name collisions with RSP vector function exports.	2015-01-27 22:53:15 -05:00
unknown	7e72ec2566	more portable VABS--should compare -32768 before mult, not after	2015-01-27 21:29:21 -05:00


273 commits
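
Commit cecd9976e8 above compares two ways of building an unsigned 16-bit "less than" out of SSE2 operations, which only offer signed comparisons. The following is a scalar, one-lane model of both methods, not the plugin's code: the function names are hypothetical, and each helper only imitates what the corresponding intrinsic named in the commit message does to a single 16-bit lane. Method A combines a saturating subtract with an equality test; method B flips the sign bit of both operands and then uses the signed comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-lane model of _mm_subs_epu16: unsigned subtract, clamped at 0. */
static uint16_t subs_u16(uint16_t m, uint16_t n)
{
    int d = (int)m - (int)n;
    return (uint16_t)(d < 0 ? 0 : d);
}

/* Reinterpret 16 bits as a two's complement signed value. */
static int as_signed16(uint16_t bits)
{
    return (bits < 0x8000u) ? (int)bits : (int)bits - 65536;
}

/* Method A: (m < n) == NOT(m == n) AND (m <= n),
 * where (m <= n) is detected by subs(m, n) == 0. */
static uint16_t cmplt_u16_method_a(uint16_t m, uint16_t n)
{
    uint16_t le = (subs_u16(m, n) == 0) ? 0xFFFFu : 0x0000u;
    uint16_t eq = (m == n) ? 0xFFFFu : 0x0000u;
    return (uint16_t)(~eq & le); /* models _mm_andnot_si128(eq, le) */
}

/* Method B: XOR both operands with 0x8000 (the per-lane value of the
 * commit's _mm_setmin_epi16), then do a signed compare. */
static uint16_t cmplt_u16_method_b(uint16_t m, uint16_t n)
{
    int sm = as_signed16((uint16_t)(m ^ 0x8000u));
    int sn = as_signed16((uint16_t)(n ^ 0x8000u));
    return (sm < sn) ? 0xFFFFu : 0x0000u;
}

/* Check both methods against the plain unsigned comparison on edge cases. */
static void check_cmplt_methods(void)
{
    static const uint16_t edge[] = {
        0x0000u, 0x0001u, 0x7FFFu, 0x8000u, 0x8001u, 0xFFFFu
    };
    const size_t count = sizeof(edge) / sizeof(edge[0]);
    size_t i, j;

    for (i = 0; i < count; i++) {
        for (j = 0; j < count; j++) {
            uint16_t expect = (edge[i] < edge[j]) ? 0xFFFFu : 0x0000u;
            assert(cmplt_u16_method_a(edge[i], edge[j]) == expect);
            assert(cmplt_u16_method_b(edge[i], edge[j]) == expect);
        }
    }
}
```

Both functions return an all-ones mask (0xFFFF) when m < n and zero otherwise, matching the mask convention of the SSE2 compare instructions. The commit message reports that method B produced the slightly smaller multiply.o (3,504 bytes against 3,524 for method A).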