Commit graph

154 commits

Author SHA1 Message Date
Simon Eriksson 27917c7df8 rsp: Fix VNOP and VNULL 2021-03-08 20:07:19 +01:00
Lauri Kasanen 9464379f8a rsp: Remove small IO writes RMW, hw does not do that 2020-12-21 16:28:53 +01:00
Simon Eriksson e340a74a26 rsp: Remove copy-paste leftover from LTV/STV code 2020-05-31 20:25:26 +02:00
Adam Gashlin 0c40ffdde2 Preserve SP PC when clearing halt
Also don't re-init pipeline if SP wasn't already halted.

Fixes #151
2020-05-29 23:49:32 -07:00
Simon Eriksson b08188f388 Basic RSP LTV/STV support 2020-04-15 07:38:09 +02:00
Pavel Kryukov c6c03012fc Use bus_controller pointers instead of type punning 2018-10-09 01:39:10 +03:00
peterlemon 58c6af5f98 add ifdef win32 to cp0.c 2018-09-10 23:32:35 +01:00
peterlemon 5ce3772646 fix master 2018-09-10 23:15:30 +01:00
pseudophpt 643e4a028a Add shuffling to VMOV instruction 2018-09-03 22:52:18 -04:00
Simon Eriksson 35f15f8db4 rsp: Ignore highest bit of RSP CP0 register number. 2016-10-08 20:56:26 +02:00
Tyler Stachecki 1e86268eee rsp: Fix SQV and SRV (more endianness issues). 2016-07-09 19:38:26 -04:00
Tyler Stachecki ab2c932aaf rsp: Fix SP->RDRAM stride bug.
krom spotted this one using his upcoming GB emulator.
2016-07-09 19:01:45 -04:00
Tyler Stachecki 1e47020ccc rsp: Fix LQV bug (related to endianness). 2016-07-09 16:24:40 -04:00
Tyler Stachecki cae6b6de78 rsp: Fix LBV bug (related to endianness). 2016-07-09 16:14:27 -04:00
Tyler Stachecki 6d3cd1e0d0 rsp: Fix link PC result (12th bit should not get set). 2016-07-09 13:30:05 -04:00
Tyler Stachecki 91b18f2644 rsp: Implement CTC2. 2016-06-29 21:38:25 -04:00
Tyler J. Stachecki 9492bba954 Another MSVC build fix. 2016-06-26 17:23:48 -04:00
Tyler J. Stachecki 3288229a50 Start fixing MSVC builds.
Conflicts:
	rdp/n64video.c
2016-06-26 17:19:17 -04:00
Tyler J. Stachecki d905183b11 izy removed the LUT from bitwise operations.
In addition to removal of all memory accesses from the
functions, these functions also result in fewer executed
instructions in some cases.
2016-03-16 22:59:22 -04:00
Tyler J. Stachecki 3565a05f30 rsp: Use host byte ordering for ICACHE.
Up until now, the RSP was storing instruction words in big-
endian format. Thus, each fetch on an x86 host requires a
byteswap. This is wasteful, so use host byte ordering for
the ICACHE (as the VR4300 does now).
2016-02-27 19:13:50 -05:00
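The idea in 3565a05f30 can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: the names load_be32, icache_fill and icache_fetch are hypothetical, and the sketch assumes instruction words are converted once when they enter the cache so that each fetch is a plain host-order load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IMEM_SIZE 4096u /* RSP instruction memory is 4 KiB. */

/* Assemble a 32-bit word from big-endian bytes, independent of host order. */
static uint32_t load_be32(const uint8_t *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical fill path: pay for the conversion once, on the way in. */
static void icache_fill(uint32_t *icache, const uint8_t *imem_be,
                        size_t num_words) {
  for (size_t i = 0; i < num_words; i++)
    icache[i] = load_be32(imem_be + 4 * i);
}

/* Hypothetical fetch path: a plain host-order load, no byteswap. */
static uint32_t icache_fetch(const uint32_t *icache, uint32_t pc) {
  return icache[(pc & (IMEM_SIZE - 1u)) >> 2];
}

/* Small usage example: two big-endian words, fetched by PC. */
static void icache_self_check(void) {
  static const uint8_t imem_be[8] = { 0x12, 0x34, 0x56, 0x78,
                                      0xAA, 0xBB, 0xCC, 0xDD };
  uint32_t icache[2];
  icache_fill(icache, imem_be, 2);
  assert(icache_fetch(icache, 0) == 0x12345678u);
  assert(icache_fetch(icache, 4) == 0xAABBCCDDu);
}
```

The trade-off is that any path writing to instruction memory has to perform the conversion, while the much more frequent fetch path no longer does.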
Tyler J. Stachecki 88c65ae630 Another great optimization from izy.
izy managed to remove another LUT used in add/sub related
instructions. The devil is in the details (see commit).

<new>:
00000000004006b0 <rsp_addsub_mask>:
  4006b0:       c1 ef 02                shr    $0x2,%edi
  4006b3:       19 c0                   sbb    %eax,%eax
  4006b5:       c3                      retq

<old>:
00000000004006d0 <rsp_addsub_mask>:
  4006d0:       83 e7 02                and    $0x2,%edi
  4006d3:       8b 04 bd 80 07 40 00    mov    0x400780(,%rdi,4),%eax
  4006da:       c3                      retq

"You see that this patch doesn't increase the amount of
instructions. They are always two/three/four instructions
and with automatic register selection. This is always the
case with a MOV from memory... you can load to any register,
but the same will happen with a SBB over itself. That is
also the reason why when the function is inlined it won't
require any special register (such as the EAX:EDX pair,
the "cltd" instruction you see in the 32 bit code is only
a coincidence caused by the optimizations done by gcc
and isn't mandatory).

The System V AMD64 calling convention puts the input
parameter in rdi, but wherever the selector is placed
nothing changes. The output parameter is in rax, but
MOV/SBB can work with any register when inlined."
2016-02-07 14:01:00 -05:00
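Read from the disassembly in 88c65ae630, the old version indexes a table with (selector & 2) and the new one turns bit 1 of the selector into an all-ones or all-zeros mask (shr moves the bit into the carry flag, sbb materializes 0 or -1). A minimal C sketch of that equivalence follows. The table contents and the function names other than the idea of rsp_addsub_mask are assumptions made for illustration; the actual source in the commit may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed table contents: (selector & 2) can only reach indices 0 and 2. */
static const uint32_t addsub_lut[4] = { 0u, 0u, 0xFFFFFFFFu, 0u };

/* Table-based form: one AND plus a load from memory. */
static uint32_t addsub_mask_lut(uint32_t selector) {
  return addsub_lut[selector & 2u];
}

/* Arithmetic form: isolate bit 1, then negate it to get 0 or 0xFFFFFFFF.
 * Unsigned subtraction wraps, so 0 - 1 yields all ones. */
static uint32_t addsub_mask_branchless(uint32_t selector) {
  return (uint32_t)0 - ((selector >> 1) & 1u);
}

/* Both forms agree for every selector value tried here. */
static void addsub_mask_self_check(void) {
  for (uint32_t s = 0; s < 16; s++)
    assert(addsub_mask_lut(s) == addsub_mask_branchless(s));
}
```

Whether a given compiler emits the shr/sbb pair for the arithmetic form depends on the compiler and flags; the point of the sketch is only that the mask can be computed without touching memory.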
Tyler J. Stachecki e12a459b18 More optimization patches from izy.
izy noticed that the branch LUT was generating memory moves
and could be replaced with an inlined function that coerces
gcc into generating a lea in its place:

  4005ac:       8d 1c 00                lea    (%rax,%rax,1),%ebx
  4005af:       c1 fb 1f                sar    $0x1f,%ebx
  4005b2:       f7 d3                   not    %ebx
(no memory access)

  4005b9:       c1 e8 1e                shr    $0x1e,%eax
  4005bc:       83 e0 01                and    $0x1,%eax
  4005bf:       44 8b 24 85 90 07 40    mov    0x400790(,%rax,4),%r12d
(original has memory access)

This ends up optimizing branch instructions quite nicely:

"You see that when you use "mask" you execute "~mask". The
compiler understands that ~(~(partial_mask)) = partial_mask
and removes both "NOTs". So in this case my version uses 2
instructions and no memory access/cache pollution."
2016-02-06 13:43:07 -05:00
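The two listings in e12a459b18 both derive a mask from bit 30 of a word: the original looks it up in a two-entry table, the replacement shifts the bit into the sign position, smears it with an arithmetic shift and inverts the result. A hedged C sketch of that idea follows. The table contents and all names are assumptions for illustration, not the code from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed table contents: bit 30 clear -> all ones, bit 30 set -> zero. */
static const uint32_t branch_lut[2] = { 0xFFFFFFFFu, 0u };

/* Table-based form: shift, mask, then a load from memory. */
static uint32_t branch_mask_lut(uint32_t word) {
  return branch_lut[(word >> 30) & 1u];
}

/* Arithmetic form: move bit 30 into the sign bit (the lea in the listing),
 * smear it across the word (sar), then invert (not).
 * Note: the unsigned-to-signed conversion and the right shift of a negative
 * value are implementation-defined in C; gcc, clang and MSVC wrap and use
 * an arithmetic shift, which is what this sketch relies on. */
static uint32_t branch_mask_shift(uint32_t word) {
  int32_t smeared = (int32_t)(word << 1) >> 31;
  return ~(uint32_t)smeared;
}

/* Both forms agree for each combination of the top two bits. */
static void branch_mask_self_check(void) {
  for (uint32_t top = 0; top < 4; top++) {
    uint32_t word = (top << 30) | 0x123u;
    assert(branch_mask_lut(word) == branch_mask_shift(word));
  }
}
```

When a caller immediately applies ~ to the returned mask, the compiler can cancel the two inversions, which is the effect described in the quote above.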
Tyler J. Stachecki e2e72821e2 Try to reduce component cycle overheads.
Oftentimes, many of our controllers are just doing a
simple countdown and don't perform any real work for the
cycle. Pull those parts out into headers so that the
compiler can 'see' that and optimize accordingly.
2016-01-30 14:58:31 -05:00
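The pattern described in e2e72821e2 can be sketched like this: the cheap countdown check is a static inline function (the part that would live in a header), and only the rare real work sits behind a separate function. The struct layout and every name here are hypothetical, chosen for illustration.

```c
#include <assert.h>

/* Hypothetical controller state. cycles_left must be at least 1. */
struct controller {
  unsigned cycles_left; /* cycles until the next real work */
  unsigned period;      /* reload value for the countdown */
  unsigned events;      /* how many times real work ran */
};

/* The expensive part. In a real build this would stay in a .c file. */
static void controller_do_work(struct controller *c) {
  c->events++;
  c->cycles_left = c->period;
}

/* The cheap part. Defined inline in a header, the compiler can see that
 * most calls only decrement a counter and fold that into the caller. */
static inline void controller_cycle(struct controller *c) {
  if (--c->cycles_left != 0)
    return;
  controller_do_work(c);
}

/* Usage example: with a period of 3, seven cycles trigger work twice. */
static void controller_self_check(void) {
  struct controller c = { 3, 3, 0 };
  for (int i = 0; i < 7; i++)
    controller_cycle(&c);
  assert(c.events == 2);
  assert(c.cycles_left == 2);
}
```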
Tyler J. Stachecki 401811c33f Drop in atomics (required for multithreading). 2016-01-24 22:13:36 -05:00
Derek "Turtle" Roe 8b89df2fdc See long description
Replaced all references to simulation with emulation
Updated copyright year
Updated .gitignore to reduce chances of random files being uploaded to
the repo
Added .gitattributes to normalize all text files, and to ignore binary
files (which includes the logo and the NEC PDF)
2015-07-01 18:44:21 -05:00
Tyler J. Stachecki f4b182835c Various small optimizations. 2015-05-08 09:58:18 -04:00
Tyler Stachecki 1ba67eec9d Alignment/size optimizations. 2015-01-28 22:41:07 -05:00
Tyler Stachecki ca0b0c944d Vectorize/inline/optimize CFC2. 2015-01-27 10:28:36 -05:00
Tyler Stachecki 3cc07a7ae4 Unroll the top-level hot functions. 2015-01-22 14:31:54 -05:00
Tyler Stachecki 4b77d3ed61 RSP: Fix opcode cache bug. 2015-01-13 18:02:01 -05:00
Tyler Stachecki acd03ec4c6 RSP: Add an opcode cache for performance. 2015-01-09 23:22:39 -05:00
Tyler Stachecki 2c94219a9b RSP: Fix scalar load-use stall. 2015-01-09 23:22:32 -05:00
Tyler Stachecki 79b02e4702 RSP: Optimize memory requests slightly. 2015-01-09 23:22:26 -05:00
Tyler Stachecki 28196d2076 RSP: Optimize decoder/stall checks slightly. 2015-01-09 23:22:20 -05:00
Tyler Stachecki 321cf584f0 Remove some hacks from the RSP pipeline. 2015-01-08 12:17:06 -05:00
Tyler Stachecki cc3aff976c Add 64DD mappings and a controller. 2015-01-06 14:07:45 -05:00
Tyler Stachecki 028d8e673d Decoder optimization: drastically reduce size. 2015-01-06 11:39:36 -05:00
Tyler Stachecki efc4e38793 Remove an old, unused function. 2015-01-06 02:18:49 -05:00
Tyler Stachecki e63f8b08e3 Perform some really clever branch folding.
Fold all the integer loads and stores into one code path.
2015-01-06 02:18:31 -05:00
Tyler Stachecki ec3748f0c2 Trim off a few hundred bytes of code. 2015-01-05 22:59:52 -05:00
Tyler Stachecki c7a4a43242 Same as the last commit, but with the RSP. 2015-01-05 22:12:44 -05:00
Tyler Stachecki a648cedc87 More cleanup of the fault/TLB code. 2015-01-04 15:38:56 -05:00
Tyler Stachecki aa175bf6d6 Fix the JALR RSP bug, similar to last commit. 2015-01-04 12:18:03 -05:00
Tyler Stachecki c795c4ad2d Remove old function definitions. 2015-01-03 00:49:52 -05:00
Tyler Stachecki 2697ba9445 Merge more functions together. 2015-01-02 23:51:53 -05:00
Tyler Stachecki 1c8f871df8 Start merging RSP vector functions.
No need to separate all these functions when they contain so
much common code, so start combining things for the sake of
locality and predictor effectiveness (and size). In addition
to these benefits, the CPU backend is usually busy during the
execution of these functions, so suffering a misprediction
isn't as painful (especially seeing as we can potentially
improve the prediction from the indirect branch).
2015-01-02 22:21:32 -05:00
Tyler Stachecki c1f1998c78 Add an implementation for VMACU. 2015-01-02 21:04:44 -05:00
Tyler Stachecki 742ffc1493 Fix a series of RSP bugs that krom pointed out. 2015-01-01 21:13:41 -05:00
Tyler Stachecki 267d56491e Get the Windows build in running condition.
Conflicts:
	rdp/n64video.c
2015-01-01 15:00:53 -05:00
Tyler Stachecki b52962aa19 Fix RSP bug that arises on BREAK. 2015-01-01 10:46:48 -05:00