First, since the internal register is kept in CPU cycles (not RCP cycles),
we need to double the value written via MTC0/DMTC0.
Second, writing a count equal to compare would cause an infinite loop
because the fault would be triggered while PC was on the instruction
doing MTC0 itself, which would then be re-executed at the end of the
exception. On real hardware, in general, when COUNT==COMPARE, the
interrupt happens a few cycles later, enough for PC to move to other
opcodes. Instead of trying to implement this delay, I've simply made
sure that the interrupt is raised after the opcode executes rather
than before. Also, since the internal counter ticks in CPU cycles and
thus matches COMPARE on two consecutive cycles, we make sure to raise
the CAUSE bit only once.
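A minimal sketch of the write/read paths described above, with hypothetical names (the emulator's actual API differs):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: the internal counter is kept in CPU cycles,
 * which tick at twice the rate of the architected Count register,
 * so a value written via MTC0/DMTC0 is doubled on the way in and
 * halved on the way out. */
static uint64_t internal_count;  /* CPU cycles */

static void write_cp0_count(uint32_t value) {
  internal_count = (uint64_t)value << 1;  /* double on write */
}

static uint32_t read_cp0_count(void) {
  return (uint32_t)(internal_count >> 1);  /* halve on read */
}
```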
Currently, all load/store opcodes (with the exception of LWL/LWR)
mask the low-order bits of an address that causes a TLB exception
before storing it in the BADVADDR COP0 register. This is wrong: the
VR4300 reports the exact faulting address in that register, since
the exception handler may require it.
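As a sketch (hypothetical names), the fix amounts to latching the untouched address:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: record the exact faulting virtual address in
 * BADVADDR instead of a copy with its low bits masked off. */
static uint64_t cp0_badvaddr;

static void raise_tlb_exception(uint64_t vaddr) {
  cp0_badvaddr = vaddr;  /* exact address; no masking of low bits */
  /* ... set EPC and Cause, vector to the handler, etc. ... */
}
```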
simer/sp1187 pointed out that undefined CP0 registers all
share a common value (that is, a write to any undefined CP0
register effectively acts as a write to *all* undefined CP0
registers).
This commit implements the specified behaviour.
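A sketch of that behaviour, backing all undefined registers with one shared cell. The set of undefined register numbers below (7, 21-25, 31 on the VR4300) and all names are illustrative assumptions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t cp0_regs[32];
static uint32_t cp0_undefined_latch;  /* shared by all undefined regs */

static bool cp0_is_defined(unsigned reg) {
  /* Illustrative: treat 7, 21-25 and 31 as undefined. */
  return !(reg == 7 || (reg >= 21 && reg <= 25) || reg == 31);
}

static void write_cp0(unsigned reg, uint32_t value) {
  if (cp0_is_defined(reg))
    cp0_regs[reg] = value;
  else
    cp0_undefined_latch = value;  /* a write to any hits all of them */
}

static uint32_t read_cp0(unsigned reg) {
  return cp0_is_defined(reg) ? cp0_regs[reg] : cp0_undefined_latch;
}
```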
I seriously screwed up the TLB lookup logic so badly that
only the first 8 TLB entries were being probed. Fix that.
This fixes (at least) Paper Mario and Mario Tennis.
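For illustration, a probe loop over all 32 entries (the bug effectively clamped this to the first 8). The structure and matching rules here are a simplified sketch, not the emulator's actual TLB code:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_TLB_ENTRIES 32  /* the VR4300 has 32 entries, not 8 */

struct tlb_entry {
  uint32_t vpn2, page_mask;
  uint8_t asid, global;
};

static struct tlb_entry tlb[NUM_TLB_ENTRIES];

/* Return the index of the matching entry, or -1 on a miss. */
static int tlb_probe(uint32_t vaddr, uint8_t asid) {
  for (unsigned i = 0; i < NUM_TLB_ENTRIES; i++) {
    const struct tlb_entry *e = &tlb[i];
    uint32_t mask = ~(e->page_mask | 0x1FFF);  /* 4KB base page */
    if ((vaddr & mask) == (e->vpn2 & mask) &&
        (e->global || e->asid == asid))
      return (int)i;
  }
  return -1;
}
```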
The action taken for (D) Index_Write_Back_Invalidate was
wrong. As it turns out, the VR4300 manual has an extremely
serious typo in the operation section.
According to the manual, this cache operation should use
the virtual address to index a block (line) in the cache.
If that line is not in the INVALID state, it should be
unconditionally flushed out to memory and the line should
then be invalidated.
The hardware, however, seems to only write back the block
(line) in the event that the line is VALID and DIRTY. It
does, however, invalidate the line regardless of whether
or not the line was DIRTY. That is to say, CLEAN lines get
invalidated as well.
This commit fixes the erroneous behavior.
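The corrected behavior can be sketched like this (line layout and names are hypothetical; only the valid/dirty logic reflects the commit):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct dcache_line {
  bool valid, dirty;
  uint32_t tag;
  uint8_t data[16];
};

static unsigned writebacks;  /* counts lines flushed to memory */

static void write_back(struct dcache_line *line) {
  (void)line;  /* would copy line->data out to memory here */
  writebacks++;
}

static void index_write_back_invalidate(struct dcache_line *line) {
  /* Manual (erroneously): write back whenever the line is not
   * INVALID. Hardware: write back only if VALID *and* DIRTY... */
  if (line->valid && line->dirty)
    write_back(line);

  /* ...but invalidate unconditionally, clean lines included. */
  line->valid = false;
  line->dirty = false;
}
```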
This optimization removes the LUT in LWL/LWR:
At the moment, when the LUT is used inline, this code is generated:
OR LUTAddr(offset), dqm
That is something like:
OR 0x400760(,%rdi,8),dqm
The code equivalent to "mov %edi,%edi" from the function above can be
removed. I'll assume that accessing the LUT and updating the "dqm"
variable generates a single instruction with a memory access.
With the patch the generated code is:
add $0xfffffffd,%edi
sbb %rax,%rax
OR %rax, dqm
Thus my patch increases the instruction count by two.
The LUT has 3 advantages on its side:
- The function VR4300_LWL_LWR() will use the value read from the LUT only once
and only for a logic-OR.
- On x86 a logic-OR can take its source operand directly from memory.
- The "offset" variable is pre-calculated and can be used "as is" by the LUT.
The code with my patch (without the LUT) has only one advantage on its side:
- The LUT (memory access) is removed
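The add/sbb pair above computes -(offset >= 3). A C sketch of the equivalent expression (the function name and the exact threshold's meaning in LWL/LWR are assumptions read off the disassembly):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: derive the all-ones/all-zeroes mask
 * arithmetically instead of loading it from a table. gcc compiles
 * a comparison like this to the add/sbb pair shown above:
 *   add $0xfffffffd,%edi   ; CF set iff offset >= 3
 *   sbb %rax,%rax          ; rax = -CF
 * and the result is then OR-ed into "dqm". */
static uint64_t lwl_lwr_mask(uint32_t offset) {
  return -(uint64_t)(offset >= 3);  /* ~0 if offset >= 3, else 0 */
}
```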
izy managed to remove another LUT used in add/sub related
instructions. The devil is in the details (see commit).
<new>:
00000000004006b0 <rsp_addsub_mask>:
4006b0: c1 ef 02 shr $0x2,%edi
4006b3: 19 c0 sbb %eax,%eax
4006b5: c3 retq
<old>:
00000000004006d0 <rsp_addsub_mask>:
4006d0: 83 e7 02 and $0x2,%edi
4006d3: 8b 04 bd 80 07 40 00 mov 0x400780(,%rdi,4),%eax
4006da: c3 retq
"You see that this patch doesn't increase the amount of
instructions. They are always two/three/four instructions
and with automatic register selection. This is always the
case with a MOV from memory... you can load to any register,
but the same will happen with a SBB over itself. That is
also the reason why when the function is inlined it won't
require any special register (such as the EAX:EDX pair;
the "cltd" instruction you see in the 32-bit code is only
a coincidence caused by the optimizations done by gcc
and isn't mandatory).
The System V AMD64 calling convention puts the input
parameter in rdi, but wherever the selector is placed
nothing changes. The output parameter is in rax, but
MOV/SBB can work with any register when inlined.
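For reference, a C sketch that gcc compiles to the shr/sbb pair in the `<new>` listing. The selector being bit 1 of the input word is read off the disassembly (`and $0x2` in the old code, `shr $0x2` feeding CF in the new one); the exact semantics inside the RSP core are an assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: broadcast bit 1 of the opcode word into an
 * all-ones/all-zeroes mask, replacing the two-entry LUT:
 *   shr $0x2,%edi   ; CF = bit 1 of edi
 *   sbb %eax,%eax   ; eax = -CF */
static uint32_t rsp_addsub_mask(uint32_t iw) {
  return -((iw >> 1) & 1);  /* all ones if bit 1 set, else zero */
}
```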
izy noticed that the branch LUT was generating memory moves
and could be replaced with an inlined function that coerces
gcc into generating a lea in its place:
4005ac: 8d 1c 00 lea (%rax,%rax,1),%ebx
4005af: c1 fb 1f sar $0x1f,%ebx
4005b2: f7 d3 not %ebx
(no memory access)
4005b9: c1 e8 1e shr $0x1e,%eax
4005bc: 83 e0 01 and $0x1,%eax
4005bf: 44 8b 24 85 90 07 40 mov 0x400790(,%rax,4),%r12d
(original has memory access)
This ends up optimizing branch instructions quite nicely:
"You see that when you use "mask" you execute "~mask". The
compiler understands that ~(~(partial_mask)) = partial_mask
and removes both "NOTs". So in this case my version uses 2
instructions and no memory access/cache pollution."
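A C sketch of the broadcast in the lea/sar pair above: shift bit 30 up into the sign position, then arithmetic-shift it back across the word. The name is hypothetical, and the version in the listing returns the complement (the `not`), which the quote explains cancels against the caller's `~mask`. This assumes gcc's arithmetic right shift for signed integers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: broadcast bit 30 of the opcode word into an
 * all-ones/all-zeroes mask with no memory access:
 *   lea (%rax,%rax,1),%ebx   ; ebx = iw << 1
 *   sar $0x1f,%ebx           ; broadcast the (old) bit 30 */
static uint32_t branch_mask(uint32_t iw) {
  return (uint32_t)((int32_t)(iw << 1) >> 31);
}
```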
Also slightly tighten the emulated memory delays. With
this commit, WDC boots (but crashes shortly after). Seems
like memory timings are coming into play, among other
things.
Replaced all references to simulation with emulation
Updated copyright year
Updated .gitignore to reduce chances of random files being uploaded to
the repo
Added .gitattributes to normalize all text files, and to ignore binary
files (which includes the logo and the NEC PDF)
When a CACHE instruction uses a mapped virtual address,
and a TLB miss results... just ignore it! Clearly, this
isn't the right thing to do, but all documentation is
ambiguous and this seems to float the boat for now.
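In sketch form (hypothetical names, with a stub translation function standing in for the real TLB walk):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static unsigned cache_ops_performed;

/* Stub standing in for the real address translation; for this
 * example only kseg0-range addresses "translate" successfully. */
static bool translate_vaddr(uint32_t vaddr, uint32_t *paddr) {
  if (vaddr >= 0x80000000u && vaddr < 0xA0000000u) {
    *paddr = vaddr - 0x80000000u;
    return true;
  }
  return false;  /* miss */
}

static void cache_op(uint32_t vaddr) {
  uint32_t paddr;

  /* TLB miss on a CACHE op: silently ignore it for now, rather
   * than raising an exception (documentation is ambiguous). */
  if (!translate_vaddr(vaddr, &paddr))
    return;

  cache_ops_performed++;  /* would index/operate on the line here */
}
```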