cen64

mirror of https://github.com/n64dev/cen64.git synced 2025-04-02 10:31:54 -04:00

Author	SHA1	Message	Date
pseudophpt	643e4a028a	Add shuffling to VMOV instruction	2018-09-03 22:52:18 -04:00
Simon Eriksson	35f15f8db4	rsp: Ignore highest bit of RSP CP0 register number.	2016-10-08 20:56:26 +02:00
Tyler Stachecki	1e86268eee	rsp: Fix SQV and SRV (more endianness issues).	2016-07-09 19:38:26 -04:00
Tyler Stachecki	ab2c932aaf	rsp: Fix SP->RDRAM stride bug. krom spotted this one using his upcoming GB emulator.	2016-07-09 19:01:45 -04:00
Tyler Stachecki	1e47020ccc	rsp: Fix LQV bug (related to endianness).	2016-07-09 16:24:40 -04:00
Tyler Stachecki	cae6b6de78	rsp: Fix LBV bug (related to endianness).	2016-07-09 16:14:27 -04:00
Tyler Stachecki	6d3cd1e0d0	rsp: Fix link PC result (12th bit should not get set).	2016-07-09 13:30:05 -04:00
Tyler Stachecki	91b18f2644	rsp: Implement CTC2.	2016-06-29 21:38:25 -04:00
Tyler J. Stachecki	9492bba954	Another MSVC build fix.	2016-06-26 17:23:48 -04:00
Tyler J. Stachecki	3288229a50	Start fixing MSVC builds. Conflicts: rdp/n64video.c	2016-06-26 17:19:17 -04:00
Tyler J. Stachecki	d905183b11	izy removed the LUT from bitwise operations. In addition to removing all memory accesses from the functions, the change also results in fewer executed instructions in some cases.	2016-03-16 22:59:22 -04:00
Tyler J. Stachecki	3565a05f30	rsp: Use host byte ordering for ICACHE. Up until then, the RSP was storing instruction words in big-endian format. Thus, each fetch on an x86 host required a byteswap. This is wasteful, so use host byte ordering for the ICACHE (as the VR4300 does now).	2016-02-27 19:13:50 -05:00
Tyler J. Stachecki	88c65ae630	Another great optimization from izy. izy managed to remove another LUT used in add/sub related instructions. The devil is in the details (see commit). <new>: 00000000004006b0 <rsp_addsub_mask>: 4006b0: c1 ef 02 shr $0x2,%edi 4006b3: 19 c0 sbb %eax,%eax 4006b5: c3 retq <old>: 00000000004006d0 <rsp_addsub_mask>: 4006d0: 83 e7 02 and $0x2,%edi 4006d3: 8b 04 bd 80 07 40 00 mov 0x400780(,%rdi,4),%eax 4006da: c3 retq "You see that this patch doesn't increase the amount of instructions. They are always two/three/four instructions and with automatic register selection. This is always the case with a MOV from memory... you can load to any register, but the same will happen with a SBB over itself. That is also the reason why when the function is inlined it won't require any special register (such as the EAX:EDX pair; the "cltd" instruction you see in the 32-bit code is only a coincidence caused by the optimizations done by gcc and isn't mandatory). The System V AMD64 calling convention puts the input parameter in rdi, but wherever the selector is placed nothing changes. The output parameter is in rax, but MOV/SBB can work with any register when inlined."	2016-02-07 14:01:00 -05:00
Tyler J. Stachecki	e12a459b18	More optimization patches from izy. izy noticed that the branch LUT was generating memory moves and could be replaced with an inlined function that coerces gcc into generating a lea in its place: 4005ac: 8d 1c 00 lea (%rax,%rax,1),%ebx 4005af: c1 fb 1f sar $0x1f,%ebx 4005b2: f7 d3 not %ebx (no memory access) 4005b9: c1 e8 1e shr $0x1e,%eax 4005bc: 83 e0 01 and $0x1,%eax 4005bf: 44 8b 24 85 90 07 40 mov 0x400790(,%rax,4),%r12d (original has memory access) This ends up optimizing branch instructions quite nicely: "You see that when you use "mask" you execute "~mask". The compiler understands that ~(~(partial_mask)) = partial_mask and removes both "NOTs". So in this case my version uses 2 instructions and no memory access/cache pollution."	2016-02-06 13:43:07 -05:00
Tyler J. Stachecki	e2e72821e2	Try to reduce component cycle overheads. Often, many of our controllers are just doing a simple countdown and don't perform any real work for the cycle. Pull those parts out into headers so that the compiler can 'see' that and optimize accordingly.	2016-01-30 14:58:31 -05:00
Tyler J. Stachecki	401811c33f	Drop in atomics (required for multithreading).	2016-01-24 22:13:36 -05:00
Derek "Turtle" Roe	8b89df2fdc	See long description: replaced all references to simulation with emulation; updated copyright year; updated .gitignore to reduce the chances of random files being uploaded to the repo; added .gitattributes to normalize all text files and to ignore binary files (which includes the logo and the NEC PDF).	2015-07-01 18:44:21 -05:00
Tyler J. Stachecki	f4b182835c	Various small optimizations.	2015-05-08 09:58:18 -04:00
Tyler Stachecki	1ba67eec9d	Alignment/size optimizations.	2015-01-28 22:41:07 -05:00
Tyler Stachecki	ca0b0c944d	Vectorize/inline/optimize CFC2.	2015-01-27 10:28:36 -05:00
Tyler Stachecki	3cc07a7ae4	Unroll the top-level hot functions.	2015-01-22 14:31:54 -05:00
Tyler Stachecki	4b77d3ed61	RSP: Fix opcode cache bug.	2015-01-13 18:02:01 -05:00
Tyler Stachecki	acd03ec4c6	RSP: Add an opcode cache for performance.	2015-01-09 23:22:39 -05:00
Tyler Stachecki	2c94219a9b	RSP: Fix scalar load-use stall.	2015-01-09 23:22:32 -05:00
Tyler Stachecki	79b02e4702	RSP: Optimize memory requests slightly.	2015-01-09 23:22:26 -05:00
Tyler Stachecki	28196d2076	RSP: Optimize decoder/stall checks slightly.	2015-01-09 23:22:20 -05:00
Tyler Stachecki	321cf584f0	Remove some hacks from the RSP pipeline.	2015-01-08 12:17:06 -05:00
Tyler Stachecki	cc3aff976c	Add 64DD mappings and a controller.	2015-01-06 14:07:45 -05:00
Tyler Stachecki	028d8e673d	Decoder optimization: drastically reduce size.	2015-01-06 11:39:36 -05:00
Tyler Stachecki	efc4e38793	Remove an old, unused function.	2015-01-06 02:18:49 -05:00
Tyler Stachecki	e63f8b08e3	Perform some really clever branch folding. Fold all the integer loads and stores into one code path.	2015-01-06 02:18:31 -05:00
Tyler Stachecki	ec3748f0c2	Trim off a few hundred bytes of code.	2015-01-05 22:59:52 -05:00
Tyler Stachecki	c7a4a43242	Same as the last commit, but with the RSP.	2015-01-05 22:12:44 -05:00
Tyler Stachecki	a648cedc87	More cleanup of the fault/TLB code.	2015-01-04 15:38:56 -05:00
Tyler Stachecki	aa175bf6d6	Fix the JALR RSP bug, similar to last commit.	2015-01-04 12:18:03 -05:00
Tyler Stachecki	c795c4ad2d	Remove old function definitions.	2015-01-03 00:49:52 -05:00
Tyler Stachecki	2697ba9445	Merge more functions together.	2015-01-02 23:51:53 -05:00
Tyler Stachecki	1c8f871df8	Start merging RSP vector functions. No need to separate all these functions when they contain so much common code, so start combining things for the sake of locality and predictor effectiveness (and size). In addition to these benefits, the CPU backend is usually busy during the execution of these functions, so suffering a misprediction isn't as painful (especially seeing as we can potentially improve the prediction from the indirect branch).	2015-01-02 22:21:32 -05:00
Tyler Stachecki	c1f1998c78	Add an implementation for VMACU.	2015-01-02 21:04:44 -05:00
Tyler Stachecki	742ffc1493	Fix a series of RSP bugs that krom pointed out.	2015-01-01 21:13:41 -05:00
Tyler Stachecki	267d56491e	Get the Windows build in running condition. Conflicts: rdp/n64video.c	2015-01-01 15:00:53 -05:00
Tyler Stachecki	b52962aa19	Fix RSP bug that arises on BREAK.	2015-01-01 10:46:48 -05:00
Tyler Stachecki	e100147379	Add register-caching version of VCH. Thanks go out to AIO for rounding out this commit with his optimized SSE2 variant.	2015-01-01 10:46:41 -05:00
Tyler Stachecki	5e313634d3	Enable register-caching on MinGW. Use a prelude to get around Microsoft's stupid calling convention.	2015-01-01 10:46:10 -05:00
Tyler Stachecki	b6f0d0ec58	Set initial values for VCC/VCO/VCE. Thanks, krom!	2015-01-01 10:45:45 -05:00
Tyler Stachecki	94ad149a12	Actually enable the register caching... And fix a lot of bugs introduced with a regex.	2015-01-01 10:44:47 -05:00
Tyler Stachecki	7bc95ee3ee	Implement register-caching version of VLT.	2015-01-01 10:44:40 -05:00
Tyler Stachecki	9b941eced8	Change RSP calling convention. pblendvb needs the mask in %xmm0, so change the calling convention around just enough so we can cut out a movdqa from most instructions.	2015-01-01 10:44:34 -05:00
Tyler Stachecki	4aabd7f49e	Minor tweaks to VEQ/VNE register-cached versions.	2015-01-01 10:44:16 -05:00
Tyler Stachecki	e810689fde	Implement register-caching versions of VGE.	2015-01-01 10:44:09 -05:00


146 commits