1) Setting SP_PC was not resetting the pipeline. As a result, changing
the PC within a HALT/UNHALT sequence still caused previous instructions
in the pipeline (at the old address) to be executed. This is not how
the hardware works: SP_PC takes effect immediately and discards the
whole pipeline.
2) BREAK did not correctly halt the processor at the right instruction,
which in turn caused resumption after HALT to execute the wrong set of
instructions. The problem was that the SP_STATUS change was written
into the EXDF latch, which takes 3 cycles to reach completion. Instead,
we now use the DFWB latch, and we make it abort the RSP cycle if the
processor is halted. This happens at the beginning of the next cycle,
which is the correct moment.
2bis) While we are at it, use rsp_status_write to modify the RSP in
this case, rather than writing the register directly. This change
fixes a race condition: SP_STATUS must be accessed atomically when
cen64 runs in multithreaded mode. To use rsp_status_write, we need to
introduce a nonexistent SP_SET_BROKE bit: we use the MSB, but then
mask it out in MTC0 so that no code can inadvertently set that bit
(see the sketch after this list).
3) When unhalting after BREAK, it's important to keep the correct PC,
which comes from the EX stage (the instruction that was about to be
executed had BREAK not occurred). Before, the code used the IF (fetch)
PC, which is farther in the future.
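
A minimal sketch of the 2bis idea, assuming cen64's existing
rsp_status_write entry point and the SP_SET_HALT write command bit;
the helper functions shown here are illustrative:

#include <stdint.h>

struct rsp;                                             /* RSP context */
void rsp_status_write(struct rsp *rsp, uint32_t value); /* atomic path */

#define SP_SET_HALT  (1U << 1)   /* real SP_STATUS write command bit */
#define SP_SET_BROKE (1U << 31)  /* invented bit: MSB, internal only */

/* Guest MTC0 writes mask the MSB out, so no guest code can ever set
 * the fake SP_SET_BROKE bit inadvertently. */
static void rsp_mtc0_status(struct rsp *rsp, uint32_t value) {
  rsp_status_write(rsp, value & ~SP_SET_BROKE);
}

/* Internal BREAK path: HALT and BROKE are raised through the same
 * atomic rsp_status_write used for guest writes, avoiding the race
 * on SP_STATUS in multithreaded mode. */
static void rsp_do_break(struct rsp *rsp) {
  rsp_status_write(rsp, SP_SET_HALT | SP_SET_BROKE);
}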
Fixes #155
Up until now, the RSP was storing instruction words in big-endian
format. Thus, each fetch on an x86 host required a byteswap. This is
wasteful, so use host byte ordering for the ICACHE (as the VR4300
does now).
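
The idea, sketched (the helper name is illustrative, and
__builtin_bswap32 stands in for cen64's byteswap helper):

#include <stdint.h>

/* Convert once at ICACHE fill time: words are stored in host byte
 * order, so the hot fetch path needs no swap at all. */
static inline uint32_t icache_word_from_be(uint32_t be_word) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  return be_word;                     /* host is already big-endian */
#else
  return __builtin_bswap32(be_word);  /* swap once, at line-fill time */
#endif
}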
izy managed to remove another LUT used in add/sub related
instructions. The devil is in the details (see commit).
<new>:
00000000004006b0 <rsp_addsub_mask>:
4006b0: c1 ef 02 shr $0x2,%edi
4006b3: 19 c0 sbb %eax,%eax
4006b5: c3 retq
<old>:
00000000004006d0 <rsp_addsub_mask>:
4006d0: 83 e7 02 and $0x2,%edi
4006d3: 8b 04 bd 80 07 40 00 mov 0x400780(,%rdi,4),%eax
4006da: c3 retq
"You see that this patch doesn't increase the amount of
instructions. They are always two/three/four instructions
and with automatic register selection. This is always the
case with a MOV from memory... you can load to any register,
but the same will happen with a SBB over itself. That is
also the reason why when the function is inlined it won't
require any special register (such as a the EAX:EDX pair,
the "cltd" instruction you see in the 32 bit code is only
a coincidence caused by the optimizations done by the gcc
and isn't mandatory).
The System V AMD64 calling convention puts the input parameter in rdi,
but wherever the selector is placed, nothing changes. The output lands
in rax, but MOV/SBB can work with any register when inlined.
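
For reference, a portable C shape that gcc can lower to the shr/sbb
pair above might look like this (a sketch; that bit 1 of the opcode
word selects the subtract variant is inferred from the disassembly):

#include <stdint.h>

/* All-ones for the SUB(U) variants, all-zeroes for ADD(U): negating
 * the selector bit spreads it across the register, so no two-entry
 * table load is needed. */
static inline uint32_t rsp_addsub_mask(uint32_t iw) {
  return (uint32_t) -(int32_t) ((iw >> 1) & 0x1);
}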
izy noticed that the branch LUT was generating memory moves
and could be replaced with an inlined function that coerces
gcc into generating a lea in its place:
<new>:
4005ac: 8d 1c 00                lea (%rax,%rax,1),%ebx
4005af: c1 fb 1f                sar $0x1f,%ebx
4005b2: f7 d3                   not %ebx
(no memory access)
<old>:
4005b9: c1 e8 1e                shr $0x1e,%eax
4005bc: 83 e0 01                and $0x1,%eax
4005bf: 44 8b 24 85 90 07 40    mov 0x400790(,%rax,4),%r12d
(has memory access)
This ends up optimizing branch instructions quite nicely:
"You see that when you use "mask" you execute "~mask". The
compiler understands that ~(~(partial_mask)) = partial_mask
and removes both "NOTs". So in this case my version uses 2
instructions and no memory access/cache pollution."
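
The inlined helper presumably has this shape (reconstructed from the
disassembly above; the name and the index parameter are assumptions):

#include <stdint.h>

/* All-ones when bit `index' of the instruction word is clear,
 * all-zeroes when it is set: the shift pair replicates the selected
 * bit across the register, and the trailing NOT is the one that
 * cancels against the ~mask at the use site. */
static inline uint32_t rsp_branch_mask(uint32_t iw, unsigned index) {
  return ~(uint32_t) (((int32_t) (iw << (31 - index))) >> 31);
}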
Oftentimes, many of our controllers are just doing a simple countdown
and don't perform any real work for the cycle. Pull those parts out
into headers so that the compiler can 'see' that and optimize
accordingly.
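
Roughly the pattern (all names here are hypothetical):

/* Header-visible fast path: when the controller is only counting
 * down, the compiler can inline this into the main loop and never
 * emit a call. */
struct controller { long cycles_until_work; /* ...device state... */ };

void controller_do_work(struct controller *c);  /* out-of-line path */

static inline void controller_cycle(struct controller *c) {
  if (__builtin_expect(c->cycles_until_work-- > 0, 1))
    return;                  /* nothing to do this cycle */
  controller_do_work(c);     /* real work, defined in the .c file */
}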
Replaced all references to 'simulation' with 'emulation'
Updated copyright year
Updated .gitignore to reduce chances of random files being uploaded to
the repo
Added .gitattributes to normalize all text files, and to ignore binary
files (which includes the logo and the NEC PDF)
No need to keep all these functions separate when they contain so much
common code, so start combining things for the sake of locality,
predictor effectiveness, and code size. In addition to these benefits,
the CPU backend is usually busy during the execution of these
functions, so suffering a misprediction isn't as painful (especially
since we can potentially improve the prediction of the indirect
branch).
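
As an illustration only (hypothetical, reusing the rsp_addsub_mask
sketch from earlier): separate ADD/ADDU and SUB/SUBU handlers can
collapse into one function, so the indirect dispatch always lands on
the same target and the remaining variant selection is branch-free:

#include <stdint.h>

static inline uint32_t rsp_addsub_mask(uint32_t iw) {
  return (uint32_t) -(int32_t) ((iw >> 1) & 0x1);
}

/* One body for ADD/ADDU/SUB/SUBU: (rt ^ mask) - mask conditionally
 * negates rt, so both variants share every remaining instruction. */
static uint32_t rsp_add_sub(uint32_t rs, uint32_t rt, uint32_t iw) {
  uint32_t mask = rsp_addsub_mask(iw);  /* ~0 for SUB(U), 0 for ADD(U) */
  return rs + ((rt ^ mask) - mask);
}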