Emulation/mupen64plus-rsp-cxd4

Mirror of https://github.com/mupen64plus/mupen64plus-rsp-cxd4.git; last synced 2025-04-02 10:51:55 -04:00.

Author	SHA1	Message	Date
Iconoclast	cecd9976e8	optimized _mm_cmplt_epu16() composite	2018-03-18 20:57:51 -04:00
    method A:
        #define _mm_cmplt_epu16(m, n) _mm_cmpgt_epu16(n, m)
        #define _mm_cmpgt_epu16(m, n) _mm_andnot_si128(\
            _mm_cmpeq_epi16(m, n), _mm_cmple_epu16(n, m)\
        )
        #define _mm_cmple_epu16(m, n) _mm_cmpeq_epi16(\
            _mm_subs_epu16(m, n), _mm_setzero_si128() )
    multiply.o: 3,524 bytes; multiply.s: 9,883 bytes
    method B:
        #define _mm_cmplt_epu16(m, n) _mm_cmplt_epi16(\
            _mm_xor_si128(m, _mm_setmin_epi16()), _mm_xor_si128(n, _mm_setmin_epi16())\
        )
        #define _mm_setmin_epi16() _mm_slli_epi16(_mm_allones_si128(), 15)
    multiply.o: 3,504 bytes; multiply.s: 9,732 bytes
Iconoclast	ec3b55b48b	syntactical nits	2018-03-18 18:47:07 -04:00
Iconoclast	8857d37876	Count loop iterations with unsigned int, not int.	2018-03-18 18:19:02 -04:00
    Although functionally there is no difference (when just looping vector elements from 0 to 7) between using a signed int or an unsigned int, repeatedly seeing an inconsistent mix in usage between the two across different vector functions has been an ongoing distraction for years. It should be the same everywhere, and between signed int and unsigned int, unsigned int is the type which always fits within size_t from stddef.h, the safe type for memory pointers and dereference indices.
Iconoclast	af06eddbdd	fixed paste fail	2018-03-18 17:39:35 -04:00
Iconoclast	81c6bd1652	optimized MAC overflow carry when subtracting by -1	2018-03-18 17:15:29 -04:00
Iconoclast	b9e6b43ce5	vectorized VMACU	2018-03-18 10:28:13 -04:00
    vmacu_old.asm function has 99 instructions.
    vmacu_new.asm function has 50 instructions.
Iconoclast	cc4a5bb619	vectorized VMACF	2018-03-17 21:56:20 -04:00
    New VMACF with manually written SSE2 is 45 instructions.
    Old VMACF with auto-vectorized C code was 91 instructions.
unknown	71356b752a	deleted VMACQ from the function table	2015-11-30 23:15:06 -05:00
unknown	7fb9850b68	fixed possible PIC linkage faults by moving `merge` to static	2015-08-16 09:51:04 -04:00
unknown	a4a7f4bd8e	forgot to modernize a few types	2015-01-18 16:39:59 -05:00
unknown	bfd74741f9	force vectorization of unsigned multiply, overflow and VMADL clamp	2014-10-28 20:50:10 -04:00
unknown	55ad9ad9d8	optimized VMADN with static overflow, carry and multiply-add	2014-10-28 15:35:05 -04:00
unknown	6d17d19dc6	correspond VMUDM intrinsics to multiply-accumulate variation	2014-10-26 23:58:18 -04:00
unknown	5cce9f457e	new algorithm for mixed signed * unsigned factorization	2014-10-23 16:45:01 -04:00
unknown	ef09b4eb5d	redesign VMUDN with carry and overflow/underflow SSE logic	2014-10-22 22:45:34 -04:00
unknown	f810a85e31	refer unsigned overflow to `negative' mask	2014-10-21 19:58:10 -04:00
unknown	c7a468e3d7	corresponding optimizations to VMUDL (same multiply, diff. clamp)	2014-10-21 00:05:26 -04:00
unknown	9dbdcc490c	restyled some optimization and fix 48-bit MADD sign-extension	2014-10-20 22:25:01 -04:00
unknown	b832e39a92	merged bi-arch VMULF template into optimized SIMD mulf	2014-10-20 00:57:40 -04:00
unknown	d768f51077	more direct multiply-add high operation without bi-arch template	2014-10-18 22:27:08 -04:00
unknown	79c5aa0cf4	removed bi-arch template for VMUDH	2014-10-17 22:43:43 -04:00
unknown	291e7fb10b	remove bi-arch template for VMUDL as mudl was greatly simplified in SSE	2014-10-17 19:02:02 -04:00
unknown	158a4d0b60	pass only 2 XMM operands, w/ no return slot ifndef ARCH_MIN_SSE2	2014-10-16 00:43:37 -04:00
unknown	f1481dd39b	restructured modular layout of the source, dropped some optional features	2014-10-09 16:45:55 -04:00

24 commits
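The two `_mm_cmplt_epu16()` composites described in commit cecd9976e8 can be sketched as inline C functions, which makes it easy to check that they agree. This is a minimal sketch, not the repository's code: the names `allones_si128`, `cmple_epu16_a`, `cmpgt_epu16_a`, `cmplt_epu16_a`, `cmplt_epu16_b` and `check_cmplt_epu16` are hypothetical, and because the log does not show how the project defines `_mm_allones_si128()`, the all-ones vector is assumed here to come from comparing a zero register with itself.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Assumption: SSE2 has no all-ones intrinsic, so a zero register compared
   with itself stands in for the repository's _mm_allones_si128(). */
static inline __m128i allones_si128(void)
{
    const __m128i zero = _mm_setzero_si128();
    return _mm_cmpeq_epi16(zero, zero);
}

/* Method A, step 1: unsigned m <= n exactly when the saturating
   difference m - n clamps to zero. */
static inline __m128i cmple_epu16_a(__m128i m, __m128i n)
{
    return _mm_cmpeq_epi16(_mm_subs_epu16(m, n), _mm_setzero_si128());
}

/* Method A, step 2: m > n is (n <= m) with the equal lanes masked off. */
static inline __m128i cmpgt_epu16_a(__m128i m, __m128i n)
{
    return _mm_andnot_si128(_mm_cmpeq_epi16(m, n), cmple_epu16_a(n, m));
}

/* Method A, step 3: m < n is simply n > m. */
static inline __m128i cmplt_epu16_a(__m128i m, __m128i n)
{
    return cmpgt_epu16_a(n, m);
}

/* Method B: XOR with 0x8000 flips each lane's top bit, mapping unsigned
   order onto signed order, so one signed compare gives the unsigned result. */
static inline __m128i cmplt_epu16_b(__m128i m, __m128i n)
{
    const __m128i bias = _mm_slli_epi16(allones_si128(), 15); /* 0x8000 per lane */
    return _mm_cmplt_epi16(_mm_xor_si128(m, bias), _mm_xor_si128(n, bias));
}

/* Hypothetical check helper: returns 1 if both methods produce 0xFFFF
   exactly in the lanes where a[i] < b[i] (unsigned) and 0 elsewhere. */
static int check_cmplt_epu16(const uint16_t a[8], const uint16_t b[8])
{
    uint16_t ra[8], rb[8];
    const __m128i m = _mm_loadu_si128((const __m128i *)a);
    const __m128i n = _mm_loadu_si128((const __m128i *)b);

    _mm_storeu_si128((__m128i *)ra, cmplt_epu16_a(m, n));
    _mm_storeu_si128((__m128i *)rb, cmplt_epu16_b(m, n));
    for (unsigned int i = 0; i < 8; i++) {
        const uint16_t expect = (a[i] < b[i]) ? 0xFFFFu : 0x0000u;
        if (ra[i] != expect || rb[i] != expect)
            return 0;
    }
    return 1;
}
```

Lanes such as 0x7FFF versus 0x8000 are the interesting inputs, since a plain signed compare would order them the wrong way. Method B replaces method A's three-step chain with one XOR bias and a single signed compare, which is consistent with the slightly smaller multiply.o the commit reports (3,504 versus 3,524 bytes).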