287 commits

Author SHA1 Message Date
Giovanni Bajo
474bf4782c vr4300: implement DADE exception.
Very likely to be wrong in details, but still better than an abort.
2022-05-29 23:26:04 +02:00
Giovanni Bajo
da84018e6e vr4300: implement RANDOM register reading, and fix TLB index masking 2022-05-29 23:26:04 +02:00
Giovanni Bajo
cf38848d5b Convert some asserts into regular code.
No need to assert for unlikely cases that can happen.
2022-05-29 23:26:04 +02:00
Giovanni Bajo
9f488ea96a vr4300: implement LL/SC as LW/SW
The real implementation is harder because it conflicts with how the
VR4300 pipeline is emulated, but at the end of the day, in the happy
path these instructions behave like LW/SW. So instead of just refusing
to emulate them, it is better to fall back to LW/SW.
2022-05-29 23:26:04 +02:00
Giovanni Bajo
0902da8113 Fix SRA/SRAV opcodes
These opcodes surprisingly let the 33rd bit shift into the lower 32 bits,
before sign-extension.
2022-01-18 23:47:28 +01:00
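A minimal sketch of the behavior described in the commit above, not the code from the commit itself. The helper name `sra_sketch` is hypothetical; it assumes the GPR is held as a 64-bit value and that the shift amount is the 5-bit `sa` field.

```c
#include <stdint.h>

/* Hypothetical helper: SRA/SRAV on a 64-bit GPR value.
 * The shift is applied to the full 64-bit register, so bits 32 and up
 * can move down into the low word before the result is sign-extended. */
static inline uint64_t sra_sketch(uint64_t rt, unsigned sa) {
  /* sa is at most 31, so a logical shift yields the same low 32 bits
   * as an arithmetic one. */
  uint32_t low = (uint32_t)(rt >> (sa & 0x1F));

  /* Sign-extend the 32-bit result back to 64 bits (assumes the usual
   * two's complement conversion). */
  return (uint64_t)(int64_t)(int32_t)low;
}
```

With `rt = 0x0000000100000000` and `sa = 1`, bit 32 lands in bit 31 and the sign-extended result is negative, which a shift restricted to the low 32 bits would not produce.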
Giovanni Bajo
a56fa4ba41 Fix two bugs in COP0 count
First, since the internal register is kept in CPU cycles (not RCP cycles),
we need to double the value written via MTC0/DMTC0.

Second, writing a count equal to compare would cause an infinite loop
because the fault would be triggered while PC was on the instruction
doing MTC0 itself, which would be then re-executed at the end of the
exception. On real hardware, in general, when COUNT==COMPARE, the
interrupt happens a few cycles later, enough for PC to move to other
opcodes. Instead of trying to implement this, I've simply made sure
that the interrupt happened after the opcode was executed rather than
before. Also, since the internal counter is in CPU cycles, we make
sure to only raise the CAUSE bit once.
2021-06-13 23:19:07 +02:00
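The two fixes described in the commit above could be sketched as follows. This is an illustration under stated assumptions, not the emulator's code: the struct and function names are hypothetical, the internal counter is assumed to run at twice the COUNT rate, and the compare check is assumed to run once per opcode, after it executes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timer state: the counter is kept in CPU cycles. */
struct cp0_timer_sketch {
  uint64_t count_cycles; /* 33 significant bits: COUNT << 1 plus a half step */
  uint32_t compare;      /* COMPARE register */
};

/* MTC0/DMTC0 to COUNT: double the written value to get CPU cycles. */
static inline void timer_write_count(struct cp0_timer_sketch *t, uint32_t value) {
  t->count_cycles = (uint64_t)value << 1;
}

/* MFC0 from COUNT: halve the internal counter. */
static inline uint32_t timer_read_count(const struct cp0_timer_sketch *t) {
  return (uint32_t)(t->count_cycles >> 1);
}

/* Called after an opcode has executed. Returns true when the timer
 * interrupt should be raised. Two consecutive CPU cycles map to the same
 * COUNT value, so only the even one reports a match. */
static inline bool timer_after_opcode(struct cp0_timer_sketch *t) {
  bool hit = (t->count_cycles & 1) == 0 &&
             (uint32_t)(t->count_cycles >> 1) == t->compare;
  t->count_cycles = (t->count_cycles + 1) & 0x1FFFFFFFFULL;
  return hit;
}
```

Because the check runs after the opcode, an MTC0 that writes a COUNT equal to COMPARE completes first, and the match is reported a single time.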
Giovanni Bajo
622dd402f0 vr4300: fix badvaddr register in TLB exceptions.
Currently, all load/store opcodes (with the exception of LWL/LWR) mask
the lowest bit of address that causes a TLB exception in the BADVADDR
COP0 register. This is wrong because the VR4300 reports the exact
faulting address in that register, since the exception handler may
need it.
2021-05-04 00:23:24 +02:00
James Lambert
deda9f9709 Have debugger handle memory exceptions 2021-03-08 20:17:17 +01:00
James Lambert
2865d107e4 Implement debugging hooks into vr4300 2021-01-10 17:07:21 -07:00
Lauri Kasanen
55a46f45da Implement Reserved Instruction exception 2020-12-28 09:42:55 +02:00
Tyler Stachecki
b9c36a4e7f
Merge pull request #184 from clbr/fpu
Implement fpu prid
2020-12-27 12:42:33 -05:00
Tyler Stachecki
814c272ca4
Merge pull request #159 from lambertjamesd/implement-trap-instructions
Implement trap instructions
2020-12-27 12:41:58 -05:00
James Lambert
ee9cd6f0da Add correct INFO to trap macros
Correctly annotate unused parameters in trap functions
2020-12-27 10:30:26 -07:00
Lauri Kasanen
1369c191a2 Implement fpu prid 2020-12-27 09:30:20 +02:00
Tyler Stachecki
ed6462e365
Merge pull request #178 from clbr/profiler
Teach the profiler about L1D misses
2020-12-26 10:44:52 -05:00
Lauri Kasanen
4316ecd0dd Implement cp0 prid 2020-12-23 16:09:12 +01:00
Lauri Kasanen
81bf10960f Teach the profiler about L1D misses 2020-12-21 19:05:07 +02:00
James Lambert
054bcb90f7 Implement trap instructions 2020-09-05 17:46:10 -06:00
Simon Eriksson
fa73cbe0fe vr4300: Implement break instruction 2020-05-27 23:00:53 +02:00
Pavel I. Kryukov
29d6d12339 Use typed pointer for MI interfaces of VR4300 2019-12-09 22:38:17 +03:00
Nabile Rahmani
05eedd91b5 DMTC0 status writes should update the segmented memory. (#135)
This matches the MTC0 code.
2019-11-03 17:46:58 +01:00
Pavel I. Kryukov
9ddfa25c77 Extract all VR4300 interfaces to interface.h 2019-05-27 22:31:19 +03:00
Tyler Stachecki
1854ee7236
Merge pull request #115 from clbr/master
Add profiling support
2019-05-26 18:26:08 -04:00
Nabile Rahmani
43cfdfee22 Only software interrupt bits are writable into the Cause register.
See: VR4300 user manual, chapter 6.3.6.
2018-12-24 23:42:58 +01:00
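A small sketch of the masking this commit describes. The function and macro names are hypothetical; the mask assumes the standard MIPS layout in which the two software interrupt bits, IP0 and IP1, occupy bits 8 and 9 of Cause.

```c
#include <stdint.h>

/* Assumed position of the software interrupt bits IP1:IP0 (bits 9:8). */
#define CAUSE_SW_IP_MASK 0x00000300u

/* Hypothetical helper: apply a write to Cause, letting only the
 * software interrupt bits change and preserving everything else. */
static inline uint32_t cause_write_sketch(uint32_t cause, uint32_t value) {
  return (cause & ~CAUSE_SW_IP_MASK) | (value & CAUSE_SW_IP_MASK);
}
```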
Lauri Kasanen
9812f78917 Add profiling support 2018-12-16 20:04:09 +02:00
Pavel Kryukov
c6c03012fc Use bus_controller pointers instead of type punning 2018-10-09 01:39:10 +03:00
queueRAM
dc2489fb47 Correct T5 register identifier 2018-01-19 04:49:13 +00:00
Tyler J. Stachecki
db6ea2029c vr4300: Fix whitespace
Signed-off-by: Tyler J. Stachecki <stachecki.tyler@gmail.com>
2017-12-19 14:28:28 -05:00
Simon Eriksson
65b5a08cd6 vr4300: Fix (d)div(u) results for division by zero 2017-12-18 23:25:27 +01:00
Tyler J. Stachecki
888d4dd054 Bugfixes found during n64chain development.
Signed-off-by: Tyler J. Stachecki <stachecki.tyler@gmail.com>
2017-07-01 10:18:09 -04:00
Tyler J. Stachecki
d58edd90d8 vr4300: Support system call exceptions. 2017-04-17 22:13:53 -04:00
Tyler J. Stachecki
0d0e042817 vr4300/cp0: @sp1187: Fix undefined CP0 register access.
simer/sp1187 pointed out that undefined CP0 registers all
share a common value (that is, a write to any undefined CP0
register effectively acts as a write to *all* undefined CP0
registers).

This commit implements the specified behaviour.
2016-10-19 12:10:44 -04:00
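One way to model the shared-value behavior described above is a single latch behind every undefined register index. This is a sketch, not the commit's code: all names are hypothetical, and the set of undefined indices (7, 21 to 25 and 31) is an assumption taken from the usual VR4300 register map.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: registers 7, 21-25 and 31 are the undefined ones. */
#define CP0_UNDEFINED_MASK 0x83E00080u

struct cp0_sketch {
  uint64_t regs[32];        /* backing store for the defined registers */
  uint64_t undefined_latch; /* one value shared by all undefined registers */
};

static bool cp0_is_undefined(unsigned idx) {
  return ((CP0_UNDEFINED_MASK >> (idx & 31)) & 1u) != 0;
}

/* A write to any undefined register updates the shared latch. */
static void cp0_write(struct cp0_sketch *cp0, unsigned idx, uint64_t value) {
  if (cp0_is_undefined(idx))
    cp0->undefined_latch = value;
  else
    cp0->regs[idx & 31] = value;
}

/* A read from any undefined register returns the shared latch. */
static uint64_t cp0_read(const struct cp0_sketch *cp0, unsigned idx) {
  return cp0_is_undefined(idx) ? cp0->undefined_latch : cp0->regs[idx & 31];
}
```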
Tyler J. Stachecki
5a21c4c7d5 vr4300: Fix a major TLB bug.
I seriously screwed up the TLB lookup logic so bad that
only the first 8 TLB entries were being probed. Fix that.

This fixes (at least) Paper Mario and Mario Tennis.
2016-07-09 14:49:59 -04:00
Tyler J. Stachecki
a12c5a3e04 vr4300: Fix a bug in (D) Index Load Tag.
The VALID and DIRTY bits were not being shifted into the
proper positions after reading them from the line states.
2016-07-09 12:39:45 -04:00
Tyler J. Stachecki
9886ec2587 vr4300: Fix a (fairly serious) cache bug.
The action taken for (D) Index_Write_Back_Invalidate was
wrong. As it turns out, the VR4300 manual has an extremely
serious typo in the operation section.

According to the manual, this cache operation should use
the virtual address to index a block (line) in the cache.
If that line is not in the INVALID state, it should be
unconditionally flushed out to memory and the line should
then be invalidated.

The hardware, however, seems to only write back the block
(line) in the event that the line is VALID and DIRTY. It
does, however, invalidate the line regardless of whether
or not the line was DIRTY. That is to say, CLEAN lines get
invalidated as well.

This commit fixes the erroneous behavior.
2016-07-09 12:12:15 -04:00
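The observed hardware behavior described in the commit above can be sketched as a small state transition. The enum, struct and function names are hypothetical, and the actual write to memory is left to the caller.

```c
#include <stdbool.h>

/* Hypothetical line states: DIRTY implies the line is also valid. */
enum line_state_sketch { LINE_INVALID, LINE_CLEAN, LINE_DIRTY };

struct dcache_line_sketch {
  enum line_state_sketch state;
};

/* (D) Index_Write_Back_Invalidate as described above: write back only
 * when the line is valid and dirty, but invalidate in every case.
 * Returns true when the caller must flush the line to memory. */
static bool index_wb_invalidate_sketch(struct dcache_line_sketch *line) {
  bool needs_write_back = (line->state == LINE_DIRTY);
  line->state = LINE_INVALID;
  return needs_write_back;
}
```

The manual's version would instead return true for every line that is not INVALID, which is the difference the commit corrects.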
Tyler J. Stachecki
91926630e8 Fix non-Windows builds. 2016-06-29 20:21:31 -04:00
Tyler J. Stachecki
c1d381e729 Last MSVC build fix.
With this, MSVC builds should now work.
2016-06-26 17:38:52 -04:00
Tyler J. Stachecki
d905183b11 izy removed the LUT from bitwise operations.
In addition to removal of all memory accesses from the
functions, these functions also result in fewer executed
instructions in some cases.
2016-03-16 22:59:22 -04:00
Tyler J. Stachecki
e70455761b More optimizations from izy.
This optimization removes the LUT in LWL/LWR:

Currently, when the LUT access is inlined, this code is generated:
OR LUTAddr(offset), dqm

That is something like:
OR 0x400760(,%rdi,8),dqm

The code equivalent to "mov %edi,%edi" from the function above can get removed.
I want to assume anyway that accessing the LUT and updating the "dqm" variable
generates a single instruction with memory access.

With the patch the generated code is:
add $0xfffffffd,%edi
sbb %rax,%rax
OR %rax, dqm
Thus my patch increases the amount of opcodes by two instructions.

The LUT has 3 advantages on its side:
- The function VR4300_LWL_LWR() will use the value read from the LUT only once
  and only for a logic-OR.
- On x86 a logic-OR is an operation that can work with the source operand read
  from memory
- The "offset" variable is pre-calculated and can be used "as is" by the LUT.

The code with my patch (without the LUT) has only one advantage on its side:
- The LUT (memory access) is removed
2016-02-22 00:02:26 -05:00
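Read literally, the add/sbb pair quoted above sets the carry flag exactly when the 32-bit offset is at least 3, and sbb then turns that carry into an all-ones or all-zeros 64-bit value. A C expression with the same result might look like the sketch below; the function name is hypothetical, and the source that actually produced that assembly is not shown in the commit message.

```c
#include <stdint.h>

/* Hypothetical branch-free replacement for the LUT lookup:
 * all ones when offset >= 3, zero otherwise. */
static inline uint64_t lwl_lwr_mask_sketch(uint32_t offset) {
  /* (offset >= 3) is 0 or 1; subtracting it from zero gives 0 or ~0. */
  return (uint64_t)0 - (uint64_t)(offset >= 3);
}
```

The result would then be OR-ed into the `dqm` variable, as in the quoted assembly.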
Tyler J. Stachecki
9d9655cf62 vr4300: Sign extend results from MFC0.
This bug prevented Conker's Bad Fur Day from booting.
2016-02-17 02:00:13 -05:00
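For illustration, sign-extending a 32-bit CP0 value into a 64-bit GPR can be written as below. The helper name is hypothetical and the cast chain assumes the usual two's complement conversion.

```c
#include <stdint.h>

/* Hypothetical helper: the value MFC0 places in the 64-bit destination
 * register, with bit 31 copied into bits 63..32. */
static inline uint64_t mfc0_result_sketch(uint32_t cp0_value) {
  return (uint64_t)(int64_t)(int32_t)cp0_value;
}
```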
Tyler J. Stachecki
88c65ae630 Another great optimization from izy.
izy managed to remove another LUT used in add/sub related
instructions. The devil is in the details (see commit).

<new>:
00000000004006b0 <rsp_addsub_mask>:
  4006b0:       c1 ef 02                shr    $0x2,%edi
  4006b3:       19 c0                   sbb    %eax,%eax
  4006b5:       c3                      retq

<old>:
00000000004006d0 <rsp_addsub_mask>:
  4006d0:       83 e7 02                and    $0x2,%edi
  4006d3:       8b 04 bd 80 07 40 00    mov    0x400780(,%rdi,4),%eax
  4006da:       c3                      retq

"You see that this patch doesn't increase the amount of
instructions. They are always two/three/four instructions
and with automatic register selection. This is always the
case with a MOV from memory... you can load to any register,
but the same will happen with a SBB over itself. That is
also the reason why when the function is inlined it won't
require any special register (such as the EAX:EDX pair;
the "cltd" instruction you see in the 32-bit code is only
a coincidence caused by the optimizations done by gcc
and isn't mandatory).

The System V AMD64 calling convention puts the input
parameter in rdi, but wherever the selector is placed
nothing changes. The output parameter is in rax, but
MOV/SBB can work with any register when inlined."
2016-02-07 14:01:00 -05:00
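The two disassemblies of `rsp_addsub_mask` above can be mirrored in C roughly as follows. This is a sketch: the table contents are an assumption (the commit message shows only the table's address), chosen so that both versions agree, and the `_lut` and `_branchless` suffixes are hypothetical names.

```c
#include <stdint.h>

/* Assumed table contents: only indices 0 and 2 are reachable after
 * the "and $0x2" in the old code. */
static const uint32_t addsub_lut_sketch[4] = { 0u, 0u, ~0u, ~0u };

/* Old version: mask the selector, then load from the table. */
static inline uint32_t rsp_addsub_mask_lut(uint32_t selector) {
  return addsub_lut_sketch[selector & 2];
}

/* New version: "shr $2" leaves bit 1 of the selector in the carry flag
 * and "sbb %eax,%eax" expands it to 0 or ~0. Negating the isolated bit
 * gives the same value without touching memory. */
static inline uint32_t rsp_addsub_mask_branchless(uint32_t selector) {
  return 0u - ((selector >> 1) & 1u);
}
```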
Tyler J. Stachecki
e12a459b18 More optimization patches from izy.
izy noticed that the branch LUT was generating memory moves
and could be replaced with an inlined function that coerces
gcc into generating a lea in its place:

  4005ac:       8d 1c 00                lea    (%rax,%rax,1),%ebx
  4005af:       c1 fb 1f                sar    $0x1f,%ebx
  4005b2:       f7 d3                   not    %ebx
(no memory access)

  4005b9:       c1 e8 1e                shr    $0x1e,%eax
  4005bc:       83 e0 01                and    $0x1,%eax
  4005bf:       44 8b 24 85 90 07 40    mov    0x400790(,%rax,4),%r12d
(original has memory access)

This ends up optimizing branch instructions quite nicely:

"You see that when you use "mask" you execute "~mask". The
compiler understands that ~(~(partial_mask)) = partial_mask
and removes both "NOTs". So in this case my version uses 2
instructions and no memory access/cache pollution."
2016-02-06 13:43:07 -05:00
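The lea/sar/not sequence above produces all ones when bit 30 of the input is clear and zero when it is set. A portable C sketch with the same result is shown below; the function names are hypothetical and the table contents in the LUT variant are assumed, since the commit message shows only the table's address.

```c
#include <stdint.h>

/* Assumed contents of the original two-entry table. */
static const uint32_t branch_lut_sketch[2] = { ~0u, 0u };

/* Original approach: extract bit 30 and index the table. */
static inline uint32_t branch_mask_lut(uint32_t iw) {
  return branch_lut_sketch[(iw >> 30) & 1u];
}

/* LUT-free approach: bit 30 is 0 or 1, so subtracting 1 yields
 * ~0 (bit clear) or 0 (bit set), with no memory access. */
static inline uint32_t branch_mask_branchless(uint32_t iw) {
  return ((iw >> 30) & 1u) - 1u;
}
```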
Tyler J. Stachecki
b7bf8be66d Forgot a keyword in an older commit. 2016-01-30 15:42:38 -05:00
Tyler J. Stachecki
2b5eaa579d Try to reduce VR4300 cycle overhead as well. 2016-01-30 14:58:31 -05:00
Tyler J. Stachecki
401811c33f Drop in atomics (required for multithreading). 2016-01-24 22:13:36 -05:00
Tyler J. Stachecki
f27c7c7d97 Delay when the cache operation requires it.
Also slightly tighten the emulated memory delays. With
this commit, WDC boots (but crashes shortly after). Seems
like memory timings are coming into play, among other
things.
2015-08-19 00:07:15 -04:00
Tyler Stachecki
98d3ae952c Implement DCB-type stalls. 2015-07-05 08:15:47 -04:00
Derek "Turtle" Roe
8b89df2fdc See long description
Replaced all references to simulation with emulation
Updated copyright year
Updated .gitignore to reduce chances of random files being uploaded to
the repo
Added .gitattributes to normalize all text files, and to ignore binary
files (which includes the logo and the NEC PDF)
2015-07-01 18:44:21 -05:00
Tyler J. Stachecki
af9b9a489a Add a temporary hack for the CACHE instruction.
When a CACHE instruction uses a mapped virtual address,
and a TLB miss results... just ignore it! Clearly, this
isn't the right thing to do, but all documentation is
ambiguous and this seems to float the boat for now.
2015-05-20 22:36:41 -04:00
Tyler J. Stachecki
daee3698e4 VR4300: CACHE instructions can't cause TLB Mod. 2015-05-20 20:57:46 -04:00