283 commits

Author SHA1 Message Date
Giovanni Bajo 0902da8113 Fix SRA/SRAV opcodes
Surprisingly, these opcodes let the 33rd bit shift into the lower 32 bits
before sign-extension.
2022-01-18 23:47:28 +01:00
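The SRA/SRAV behaviour described in 0902da8113 can be sketched as follows. This is a minimal illustration, not the emulator's code: the name `sra_sketch` and its signature are hypothetical, and the shift amount is assumed to be the usual 5-bit field.

```c
#include <stdint.h>

/* Hypothetical helper; sa is assumed to be a 5-bit shift amount (0..31). */
int64_t sra_sketch(int64_t rt, unsigned sa)
{
    /* Shift the whole 64-bit register, so bit 32 and above can move into
       the low word. For sa <= 31 the low 32 bits come out the same whether
       the shift is logical or arithmetic, so an unsigned shift is used. */
    uint32_t low = (uint32_t)((uint64_t)rt >> (sa & 31u));

    /* Sign-extend the low word back to 64 bits. The uint32_t to int32_t
       conversion assumes two's complement, as on GCC, Clang and MSVC. */
    return (int64_t)(int32_t)low;
}
```

With bit 32 set and a shift of one, that bit lands in bit 31 and the result comes out negative, which a shift performed on a truncated 32-bit value would not produce.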
Giovanni Bajo a56fa4ba41 Fix two bugs in COP0 count
First, since the internal register is kept in CPU cycles (not RCP cycles),
we need to double the value written via MTC0/DMTC0.

Second, writing a count equal to compare would cause an infinite loop
because the fault would be triggered while PC was on the instruction
doing MTC0 itself, which would then be re-executed at the end of the
exception. On real hardware, in general, when COUNT==COMPARE, the
interrupt happens a few cycles later, enough for PC to move to other
opcodes. Instead of trying to implement this, I've simply made sure
that the interrupt happened after the opcode was executed rather than
before. Also, since the internal counter is in CPU cycles, we make
sure to only raise the CAUSE bit once.
2021-06-13 23:19:07 +02:00
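A rough sketch of the first fix in a56fa4ba41, assuming the internal counter advances once per CPU cycle and the visible COUNT is half of it. The struct and function names are hypothetical, and the ordering of the interrupt relative to the MTC0 opcode is not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timer state; not the emulator's actual layout. */
struct cp0_timer_sketch {
    uint64_t count_cycles; /* internal counter, in CPU cycles */
    uint32_t compare;      /* COMPARE register, in COUNT units */
};

/* MTC0/DMTC0 to COUNT: the written value is doubled to get CPU cycles. */
void timer_write_count(struct cp0_timer_sketch *t, uint32_t value)
{
    t->count_cycles = (uint64_t)value << 1;
}

/* MFC0 from COUNT: halve the internal counter again. */
uint32_t timer_read_count(const struct cp0_timer_sketch *t)
{
    return (uint32_t)(t->count_cycles >> 1);
}

/* Advance by one CPU cycle. Returns true when the timer interrupt should
   be raised. COUNT equals COMPARE for two consecutive CPU cycles, so only
   the even cycle reports a match and the CAUSE bit is raised once. */
bool timer_tick(struct cp0_timer_sketch *t)
{
    t->count_cycles++;
    return (t->count_cycles & 1u) == 0 &&
           (uint32_t)(t->count_cycles >> 1) == t->compare;
}
```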
Giovanni Bajo 622dd402f0 vr4300: fix badvaddr register in TLB exceptions.
Currently, all load/store opcodes (with the exception of LWL/LWR) mask
the lowest bit of the address that causes a TLB exception before storing
it in the BADVADDR COP0 register. This is wrong because the VR4300 reports
the exact faulting address in that register, since the exception handler
needs it.
2021-05-04 00:23:24 +02:00
James Lambert deda9f9709 Have debugger handle memory exceptions 2021-03-08 20:17:17 +01:00
James Lambert 2865d107e4 Implement debugging hooks into vr4300 2021-01-10 17:07:21 -07:00
Lauri Kasanen 55a46f45da Implement Reserved Instruction exception 2020-12-28 09:42:55 +02:00
Tyler Stachecki b9c36a4e7f Merge pull request #184 from clbr/fpu
Implement fpu prid
2020-12-27 12:42:33 -05:00
Tyler Stachecki 814c272ca4 Merge pull request #159 from lambertjamesd/implement-trap-instructions
Implement trap instructions
2020-12-27 12:41:58 -05:00
James Lambert ee9cd6f0da Add correct INFO to trap macros
Correctly annotate unused parameters in trap functions
2020-12-27 10:30:26 -07:00
Lauri Kasanen 1369c191a2 Implement fpu prid 2020-12-27 09:30:20 +02:00
Tyler Stachecki ed6462e365 Merge pull request #178 from clbr/profiler
Teach the profiler about L1D misses
2020-12-26 10:44:52 -05:00
Lauri Kasanen 4316ecd0dd Implement cp0 prid 2020-12-23 16:09:12 +01:00
Lauri Kasanen 81bf10960f Teach the profiler about L1D misses 2020-12-21 19:05:07 +02:00
James Lambert 054bcb90f7 Implement trap instructions 2020-09-05 17:46:10 -06:00
Simon Eriksson fa73cbe0fe vr4300: Implement break instruction 2020-05-27 23:00:53 +02:00
Pavel I. Kryukov 29d6d12339 Use typed pointer for MI interfaces of VR4300 2019-12-09 22:38:17 +03:00
Nabile Rahmani 05eedd91b5 DMTC0 status writes should update the segmented memory. (#135)
This matches the MTC0 code.
2019-11-03 17:46:58 +01:00
Pavel I. Kryukov 9ddfa25c77 Extract all VR4300 interfaces to interface.h 2019-05-27 22:31:19 +03:00
Tyler Stachecki 1854ee7236 Merge pull request #115 from clbr/master
Add profiling support
2019-05-26 18:26:08 -04:00
Nabile Rahmani 43cfdfee22 Only software interrupt bits are writable into the Cause register.
See: VR4300 user manual, chapter 6.3.6.
2018-12-24 23:42:58 +01:00
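The rule in 43cfdfee22 amounts to a masked write. The sketch below assumes the two software interrupt bits are IP0 and IP1 at bits 8 and 9 of Cause; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed position of the software interrupt bits (IP0, IP1) in Cause. */
#define CAUSE_SW_IP_MASK_SKETCH 0x00000300u

/* Merge a software write into Cause, letting only IP0/IP1 change. */
uint32_t cause_write_sketch(uint32_t old_cause, uint32_t value)
{
    return (old_cause & ~CAUSE_SW_IP_MASK_SKETCH) |
           (value & CAUSE_SW_IP_MASK_SKETCH);
}
```

All other fields, such as the exception code and the hardware interrupt bits, keep whatever the emulated hardware last put there.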
Lauri Kasanen 9812f78917 Add profiling support 2018-12-16 20:04:09 +02:00
Pavel Kryukov c6c03012fc Use bus_controller pointers instead of type punning 2018-10-09 01:39:10 +03:00
queueRAM dc2489fb47 Correct T5 register identifier 2018-01-19 04:49:13 +00:00
Tyler J. Stachecki db6ea2029c vr4300: Fix whitespace
Signed-off-by: Tyler J. Stachecki <stachecki.tyler@gmail.com>
2017-12-19 14:28:28 -05:00
Simon Eriksson 65b5a08cd6 vr4300: Fix (d)div(u) results for division by zero 2017-12-18 23:25:27 +01:00
Tyler J. Stachecki 888d4dd054 Bugfixes found during n64chain development.
Signed-off-by: Tyler J. Stachecki <stachecki.tyler@gmail.com>
2017-07-01 10:18:09 -04:00
Tyler J. Stachecki d58edd90d8 vr4300: Support system call exceptions. 2017-04-17 22:13:53 -04:00
Tyler J. Stachecki 0d0e042817 vr4300/cp0: @sp1187: Fix undefined CP0 register access.
simer/sp1187 pointed out that undefined CP0 registers all
share a common value (that is, a write to any undefined CP0
register effectively acts as a write to *all* undefined CP0
registers).

This commit implements the specified behaviour.
2016-10-19 12:10:44 -04:00
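One way to model the shared-value behaviour from 0d0e042817 is a single latch behind every undefined register index. This is only a sketch: the struct, the function names and the `defined_mask` field are hypothetical, and which indices count as undefined is left to the caller.

```c
#include <stdint.h>

#define CP0_NUM_REGS_SKETCH 32u

/* Hypothetical register file; not the emulator's actual layout. */
struct cp0_sketch {
    uint32_t defined_mask;              /* bit n set: register n is defined */
    uint64_t regs[CP0_NUM_REGS_SKETCH]; /* storage for defined registers */
    uint64_t shared_undefined;          /* one latch for all undefined ones */
};

void cp0_sketch_write(struct cp0_sketch *cp0, unsigned index, uint64_t value)
{
    index &= CP0_NUM_REGS_SKETCH - 1u;
    if ((cp0->defined_mask >> index) & 1u)
        cp0->regs[index] = value;
    else
        cp0->shared_undefined = value; /* visible through every undefined index */
}

uint64_t cp0_sketch_read(const struct cp0_sketch *cp0, unsigned index)
{
    index &= CP0_NUM_REGS_SKETCH - 1u;
    if ((cp0->defined_mask >> index) & 1u)
        return cp0->regs[index];
    return cp0->shared_undefined;
}
```

If, say, indices 7 and 21 are marked undefined, a write to 7 is then read back unchanged from 21.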
Tyler J. Stachecki 5a21c4c7d5 vr4300: Fix a major TLB bug.
I seriously screwed up the TLB lookup logic so badly that
only the first 8 TLB entries were being probed. Fix that.

This fixes (at least) Paper Mario and Mario Tennis.
2016-07-09 14:49:59 -04:00
Tyler J. Stachecki a12c5a3e04 vr4300: Fix a bug in (D) Index Load Tag.
The VALID and DIRTY bits were not being shifted into the
proper positions after reading them from the line states.
2016-07-09 12:39:45 -04:00
Tyler J. Stachecki 9886ec2587 vr4300: Fix a (fairly serious) cache bug.
The action taken for (D) Index_Write_Back_Invalidate was
wrong. As it turns out, the VR4300 manual has an extremely
serious typo in the operation section.

According to the manual, this cache operation should use
the virtual address to index a block (line) in the cache.
If that line is not in the INVALID state, it should be
unconditionally flushed out to memory and the line should
then be invalidated.

The hardware, however, seems to only write back the block
(line) in the event that the line is VALID and DIRTY. It
does, however, invalidate the line regardless of whether
or not the line was DIRTY. That is to say, CLEAN lines get
invalidated as well.

This commit fixes the erroneous behavior.
2016-07-09 12:12:15 -04:00
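The hardware behaviour described in 9886ec2587 can be sketched as below. The line structure and function name are hypothetical, and the actual write to memory is reduced to a boolean return value.

```c
#include <stdbool.h>

/* Hypothetical data cache line state; not the emulator's actual layout. */
struct dcache_line_sketch {
    bool valid;
    bool dirty;
};

/* (D) Index_Write_Back_Invalidate as observed on hardware: write back only
   when the line is VALID and DIRTY, but invalidate it in every case.
   Returns true when the caller should flush the line to memory. */
bool index_write_back_invalidate_sketch(struct dcache_line_sketch *line)
{
    bool write_back = line->valid && line->dirty;

    line->valid = false; /* CLEAN lines are invalidated as well */
    line->dirty = false;
    return write_back;
}
```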
Tyler J. Stachecki 91926630e8 Fix non-Windows builds. 2016-06-29 20:21:31 -04:00
Tyler J. Stachecki c1d381e729 Last MSVC build fix.
With this, MSVC builds should now work.
2016-06-26 17:38:52 -04:00
Tyler J. Stachecki d905183b11 izy removed the LUT from bitwise operations.
In addition to removal of all memory accesses from the
functions, these functions also result in fewer executed
instructions in some cases.
2016-03-16 22:59:22 -04:00
Tyler J. Stachecki e70455761b More optimizations from izy.
This optimization removes the LUT in LWL/LWR:

At the moment, when the LUT access is inlined, this code is generated:
OR LUTAddr(offset), dqm

That is something like:
OR 0x400760(,%rdi,8),dqm

The code equivalent to "mov %edi,%edi" from the function above can be removed.
In any case, I assume that accessing the LUT and updating the "dqm" variable
generates a single instruction with a memory access.

With the patch the generated code is:
add $0xfffffffd,%edi
sbb %rax,%rax
OR %rax, dqm
Thus my patch increases the number of opcodes by two instructions.

The LUT has 3 advantages on its side:
- The function VR4300_LWL_LWR() will use the value read from the LUT only once
  and only for a logic-OR.
- On x86 a logic-OR is an operation that can work with the source operand read
  from memory
- The "offset" variable is pre-calculated and can be used "as is" by the LUT.

The code with my patch (without the LUT) has only one advantage on its side:
- The LUT (memory access) is removed
2016-02-22 00:02:26 -05:00
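Read literally, the `add $0xfffffffd,%edi` / `sbb %rax,%rax` pair quoted in e70455761b sets the carry flag when the 32-bit offset is at least 3 and then spreads that carry across a 64-bit register. A C expression with the same result might look like this; the function name is hypothetical and the actual source may be written differently.

```c
#include <stdint.h>

/* Hypothetical C equivalent of the quoted add/sbb sequence:
   all ones when offset >= 3, zero otherwise. */
uint64_t lwl_lwr_mask_sketch(uint32_t offset)
{
    /* (offset >= 3u) is 0 or 1; subtracting it from zero in unsigned
       arithmetic yields 0 or 0xFFFFFFFFFFFFFFFF without a table lookup. */
    return (uint64_t)0 - (uint64_t)(offset >= 3u);
}
```

The result is what gets OR-ed into `dqm` in place of the value previously loaded from the LUT.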
Tyler J. Stachecki 9d9655cf62 vr4300: Sign extend results from MFC0.
This bug prevented Conker's Bad Fur Day from booting.
2016-02-17 02:00:13 -05:00
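The fix in 9d9655cf62 boils down to sign-extending the 32-bit value before it reaches the 64-bit general-purpose register. A minimal sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* Hypothetical helper: the value MFC0 places in a 64-bit GPR. */
int64_t mfc0_result_sketch(uint32_t cp0_value)
{
    /* Reinterpret as signed 32-bit, then widen. The uint32_t to int32_t
       conversion assumes two's complement, as on GCC, Clang and MSVC. */
    return (int64_t)(int32_t)cp0_value;
}
```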
Tyler J. Stachecki 88c65ae630 Another great optimization from izy.
izy managed to remove another LUT used in add/sub related
instructions. The devil is in the details (see commit).

<new>:
00000000004006b0 <rsp_addsub_mask>:
  4006b0:       c1 ef 02                shr    $0x2,%edi
  4006b3:       19 c0                   sbb    %eax,%eax
  4006b5:       c3                      retq

<old>:
00000000004006d0 <rsp_addsub_mask>:
  4006d0:       83 e7 02                and    $0x2,%edi
  4006d3:       8b 04 bd 80 07 40 00    mov    0x400780(,%rdi,4),%eax
  4006da:       c3                      retq

"You see that this patch doesn't increase the number of
instructions. They are always two/three/four instructions
and with automatic register selection. This is always the
case with a MOV from memory... you can load to any register,
but the same will happen with a SBB over itself. That is
also the reason why when the function is inlined it won't
require any special register (such as the EAX:EDX pair;
the "cltd" instruction you see in the 32 bit code is only
a coincidence caused by the optimizations done by gcc
and isn't mandatory).

The System V AMD64 calling convention puts the input
parameter in rdi, but wherever the selector is placed
nothing changes. The output parameter is in rax, but
MOV/SBB can work with any register when inlined."
2016-02-07 14:01:00 -05:00
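Going by the two disassemblies in 88c65ae630, the old version indexes a table with `selector & 2` and the new one turns bit 1 of the selector into an all-ones or all-zero mask. The sketch below shows both forms side by side; the function names are hypothetical and the table contents are an assumption inferred from the new code, since the commit does not list them.

```c
#include <stdint.h>

/* Old style: table lookup. Index 1 is never read because the index is
   selector & 2; the table values here are assumed, not from the source. */
uint32_t addsub_mask_lut_sketch(uint32_t selector)
{
    static const uint32_t lut[3] = { 0u, 0u, ~0u };
    return lut[selector & 2u];
}

/* New style: no memory access. Bit 1 of the selector becomes 0 or 1, and
   subtracting it from zero gives 0 or 0xFFFFFFFF, which a compiler can
   emit as a shift followed by sbb. */
uint32_t addsub_mask_branchless_sketch(uint32_t selector)
{
    return 0u - ((selector >> 1) & 1u);
}
```

Both functions return the same mask for any selector, so the swap is invisible to callers.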
Tyler J. Stachecki e12a459b18 More optimization patches from izy.
izy noticed that the branch LUT was generating memory moves
and could be replaced with an inlined function that coerces
gcc into generating a lea in its place:

  4005ac:       8d 1c 00                lea    (%rax,%rax,1),%ebx
  4005af:       c1 fb 1f                sar    $0x1f,%ebx
  4005b2:       f7 d3                   not    %ebx
(no memory access)

  4005b9:       c1 e8 1e                shr    $0x1e,%eax
  4005bc:       83 e0 01                and    $0x1,%eax
  4005bf:       44 8b 24 85 90 07 40    mov    0x400790(,%rax,4),%r12d
(original has memory access)

This ends up optimizing branch instructions quite nicely:

"You see that when you use "mask" you execute "~mask". The
compiler understands that ~(~(partial_mask)) = partial_mask
and removes both "NOTs". So in this case my version uses 2
instructions and no memory access/cache pollution."
2016-02-06 13:43:07 -05:00
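The two sequences in e12a459b18 both derive a mask from bit 30 of the input: the original loads it from a two-entry table, while the replacement doubles the value, arithmetic-shifts the sign bit across the register and inverts it. A sketch of the equivalence, with hypothetical function names and table contents inferred from the new code:

```c
#include <stdint.h>

/* Original style: bit 30 selects a table entry (values are assumed). */
uint32_t branch_mask_lut_sketch(uint32_t word)
{
    static const uint32_t lut[2] = { ~0u, 0u };
    return lut[(word >> 30) & 1u];
}

/* Replacement style: no memory access. Bit 30 clear gives 0 - 1, which
   wraps to 0xFFFFFFFF; bit 30 set gives 1 - 1 = 0. This matches the
   lea/sar/not sequence, which yields ~0 when bit 30 is clear. */
uint32_t branch_mask_branchless_sketch(uint32_t word)
{
    return ((word >> 30) & 1u) - 1u;
}
```

When the caller immediately inverts the mask, the compiler can cancel the two NOTs, which is the saving the quoted comment describes.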
Tyler J. Stachecki b7bf8be66d Forgot a keyword in an older commit. 2016-01-30 15:42:38 -05:00
Tyler J. Stachecki 2b5eaa579d Try to reduce VR4300 cycle overhead as well. 2016-01-30 14:58:31 -05:00
Tyler J. Stachecki 401811c33f Drop in atomics (required for multithreading). 2016-01-24 22:13:36 -05:00
Tyler J. Stachecki f27c7c7d97 Delay when the cache operation requires it.
Also slightly tighten the emulated memory delays. With
this commit, WDC boots (but crashes shortly after). Seems
like memory timings are coming into play, among other
things.
2015-08-19 00:07:15 -04:00
Tyler Stachecki 98d3ae952c Implement DCB-type stalls. 2015-07-05 08:15:47 -04:00
Derek "Turtle" Roe 8b89df2fdc See long description
Replaced all references to simulation with emulation
Updated copyright year
Updated .gitignore to reduce chances of random files being uploaded to
the repo
Added .gitattributes to normalize all text files, and to ignore binary
files (which includes the logo and the NEC PDF)
2015-07-01 18:44:21 -05:00
Tyler J. Stachecki af9b9a489a Add a temporary hack for the CACHE instruction.
When a CACHE instruction uses a mapped virtual address,
and a TLB miss results... just ignore it! Clearly, this
isn't the right thing to do, but all documentation is
ambiguous and this seems to float the boat for now.
2015-05-20 22:36:41 -04:00
Tyler J. Stachecki daee3698e4 VR4300: CACHE instructions can't cause TLB Mod. 2015-05-20 20:57:46 -04:00
Tyler J. Stachecki 793d8212fd VR4300: Minor pipeline optimizations. 2015-05-20 20:57:33 -04:00
Tyler J. Stachecki f4b182835c Various small optimizations. 2015-05-08 09:58:18 -04:00
Tyler Stachecki 1e6fd9af4b Fix a slew of cache bugs. 2015-01-29 10:09:14 -05:00
Tyler Stachecki 2f64037d94 Various FPU optimizations. 2015-01-29 10:09:06 -05:00