146 commits

Author SHA1 Message Date
pseudophpt
643e4a028a Add shuffling to VMOV instruction 2018-09-03 22:52:18 -04:00
Simon Eriksson
35f15f8db4 rsp: Ignore highest bit of RSP CP0 register number. 2016-10-08 20:56:26 +02:00
Tyler Stachecki
1e86268eee rsp: Fix SQV and SRV (more endianness issues). 2016-07-09 19:38:26 -04:00
Tyler Stachecki
ab2c932aaf rsp: Fix SP->RDRAM stride bug.
krom spotted this one using his upcoming GB emulator.
2016-07-09 19:01:45 -04:00
Tyler Stachecki
1e47020ccc rsp: Fix LQV bug (related to endianness). 2016-07-09 16:24:40 -04:00
Tyler Stachecki
cae6b6de78 rsp: Fix LBV bug (related to endianness). 2016-07-09 16:14:27 -04:00
Tyler Stachecki
6d3cd1e0d0 rsp: Fix link PC result (12th bit should not get set). 2016-07-09 13:30:05 -04:00
Tyler Stachecki
91b18f2644 rsp: Implement CTC2. 2016-06-29 21:38:25 -04:00
Tyler J. Stachecki
9492bba954 Another MSVC build fix. 2016-06-26 17:23:48 -04:00
Tyler J. Stachecki
3288229a50 Start fixing MSVC builds.
Conflicts:
	rdp/n64video.c
2016-06-26 17:19:17 -04:00
Tyler J. Stachecki
d905183b11 izy removed the LUT from bitwise operations.
In addition to removal of all memory accesses from the
functions, these functions also result in fewer executed
instructions in some cases.
2016-03-16 22:59:22 -04:00
Tyler J. Stachecki
3565a05f30 rsp: Use host byte ordering for ICACHE.
Up until now, the RSP was storing instruction words in
big-endian format. Thus, each fetch on an x86 host required
a byteswap. This is wasteful, so use host byte ordering for
the ICACHE (as the VR4300 does now).
2016-02-27 19:13:50 -05:00
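
The idea in commit 3565a05f30 can be sketched roughly as follows: pay for the byte swap once, when instruction memory is written, so that every fetch becomes a plain load. This is only an illustration, not the emulator's code. The names `load_be32`, `icache_fill`, `icache_fetch` and `ICACHE_WORDS` are hypothetical, and the sketch assumes 4 KiB of instruction memory addressed by a word-aligned PC.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the emulator's actual API. */
#define ICACHE_WORDS 1024u /* assumes 4 KiB of RSP instruction memory */

/* Assemble one big-endian word from bytes; works on any host endianness. */
static inline uint32_t load_be32(const uint8_t *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Convert to host byte order once, when instruction memory is written. */
static void icache_fill(uint32_t *icache, const uint8_t *be_bytes,
                        size_t words) {
  for (size_t i = 0; i < words; i++)
    icache[i] = load_be32(be_bytes + 4 * i);
}

/* The fetch path is then a plain load with no per-fetch byte swap. */
static inline uint32_t icache_fetch(const uint32_t *icache, uint32_t pc) {
  return icache[(pc >> 2) & (ICACHE_WORDS - 1u)];
}
```

The trade-off is that any path writing instruction memory has to do the conversion, which is normally far rarer than fetching.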
Tyler J. Stachecki
88c65ae630 Another great optimization from izy.
izy managed to remove another LUT used in add/sub related
instructions. The devil is in the details (see commit).

<new>:
00000000004006b0 <rsp_addsub_mask>:
  4006b0:       c1 ef 02                shr    $0x2,%edi
  4006b3:       19 c0                   sbb    %eax,%eax
  4006b5:       c3                      retq

<old>:
00000000004006d0 <rsp_addsub_mask>:
  4006d0:       83 e7 02                and    $0x2,%edi
  4006d3:       8b 04 bd 80 07 40 00    mov    0x400780(,%rdi,4),%eax
  4006da:       c3                      retq

"You see that this patch doesn't increase the amount of
instructions. They are always two/three/four instructions
and with automatic register selection. This is always the
case with a MOV from memory... you can load to any register,
but the same will happen with a SBB over itself. That is
also the reason why when the function is inlined it won't
require any special register (such as the EAX:EDX pair,
the "cltd" instruction you see in the 32 bit code is only
a coincidence caused by the optimizations done by gcc
and isn't mandatory).

The System V AMD64 calling convention puts the input
parameter in rdi, but wherever the selector is placed
nothing changes. The output parameter is in rax, but
MOV/SBB can work with any register when inlined."
2016-02-07 14:01:00 -05:00
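
Read back into C, the two disassemblies in commit 88c65ae630 correspond roughly to the pair of functions below. The old form indexes a table with `selector & 2`; the new form shifts bit 1 into the carry flag and lets `sbb` spread it across the register, which in C is a negation of that bit. The function names and the table contents (0 and all-ones) are assumptions inferred from the disassembly, not taken from the source.

```c
#include <stdint.h>

/* Old form: a table lookup, which costs a memory access.
 * Table contents are assumed; only indices 0 and 2 are ever used. */
static inline uint32_t addsub_mask_lut(uint32_t selector) {
  static const uint32_t lut[3] = {0u, 0u, UINT32_C(0xFFFFFFFF)};
  return lut[selector & 2u];
}

/* New form: no memory access. Bit 1 of the selector becomes either
 * 0 or all-ones, which is what "shr $2; sbb %eax,%eax" computes. */
static inline uint32_t addsub_mask_branchless(uint32_t selector) {
  return UINT32_C(0) - ((selector >> 1) & 1u);
}
```

Whether a given compiler emits exactly `shr`/`sbb` for the second function is not guaranteed; the point is only that the result needs no table.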
Tyler J. Stachecki
e12a459b18 More optimization patches from izy.
izy noticed that the branch LUT was generating memory moves
and could be replaced with an inlined function that coerces
gcc into generating a lea in its place:

  4005ac:       8d 1c 00                lea    (%rax,%rax,1),%ebx
  4005af:       c1 fb 1f                sar    $0x1f,%ebx
  4005b2:       f7 d3                   not    %ebx
(no memory access)

  4005b9:       c1 e8 1e                shr    $0x1e,%eax
  4005bc:       83 e0 01                and    $0x1,%eax
  4005bf:       44 8b 24 85 90 07 40    mov    0x400790(,%rax,4),%r12d
(original has memory access)

This ends up optimizing branch instructions quite nicely:

"You see that when you use "mask" you execute "~mask". The
compiler understands that ~(~(partial_mask)) = partial_mask
and removes both "NOTs". So in this case my version uses 2
instructions and no memory access/cache pollution."
2016-02-06 13:43:07 -05:00
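
A rough C rendering of the branch-mask change in commit e12a459b18, under the assumption that the table held all-ones when bit 30 is clear and zero when it is set (this is inferred from the `lea`/`sar`/`not` sequence, not from the source). The names are hypothetical. Splitting the mask into a "partial mask" and its complement mirrors the quote: code that uses `~mask` ends up using the partial mask directly, so the two NOTs cancel.

```c
#include <stdint.h>

/* Old form: bit 30 indexes a two-entry table (a memory access).
 * Table contents are an assumption based on the disassembly. */
static inline uint32_t branch_mask_lut(uint32_t x) {
  static const uint32_t lut[2] = {UINT32_C(0xFFFFFFFF), 0u};
  return lut[(x >> 30) & 1u];
}

/* All-ones when bit 30 is set, zero otherwise. */
static inline uint32_t branch_partial_mask(uint32_t x) {
  return UINT32_C(0) - ((x >> 30) & 1u);
}

/* New form: the complement of the partial mask, computed in registers. */
static inline uint32_t branch_mask(uint32_t x) {
  return ~branch_partial_mask(x);
}
```

A caller that computes `~branch_mask(x)` is asking for `branch_partial_mask(x)`, which is why the inlined version can be shorter than the standalone disassembly suggests.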
Tyler J. Stachecki
e2e72821e2 Try to reduce component cycle overheads.
Oftentimes, many of our controllers are just doing a
simple countdown and don't perform any real work for the
cycle. Pull those parts out into headers so that the
compiler can 'see' that and optimize accordingly.
2016-01-30 14:58:31 -05:00
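
The pattern described in commit e2e72821e2 might look something like the sketch below: the cheap countdown check is a `static inline` function that would live in a header, and only the rare "real work" path is an ordinary function call. The struct layout and the names `controller_cycle` and `controller_do_work` are hypothetical, and both halves are shown in one file only to keep the example self-contained.

```c
/* Hypothetical controller state; field names are illustrative only. */
struct controller {
  unsigned countdown; /* cycles left before the next real event */
  unsigned period;    /* reload value after an event fires */
  unsigned work_done; /* number of events handled so far */
};

/* Slow path: in a real tree this would sit in a .c file. */
void controller_do_work(struct controller *c) {
  c->work_done++;
  c->countdown = c->period;
}

/* Fast path: would sit in a header so the compiler can inline the
 * countdown into the main loop and skip the call on idle cycles. */
static inline void controller_cycle(struct controller *c) {
  if (c->countdown > 0) {
    c->countdown--;
    return;
  }
  controller_do_work(c);
}
```

With `period` set to 3, the slow path runs once every four calls to `controller_cycle`; the other three calls reduce to a decrement and a compare.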
Tyler J. Stachecki
401811c33f Drop in atomics (required for multithreading). 2016-01-24 22:13:36 -05:00
Derek "Turtle" Roe
8b89df2fdc See long description
Replaced all references to simulation with emulation
Updated copyright year
Updated .gitignore to reduce chances of random files being uploaded to
the repo
Added .gitattributes to normalize all text files, and to ignore binary
files (which includes the logo and the NEC PDF)
2015-07-01 18:44:21 -05:00
Tyler J. Stachecki
f4b182835c Various small optimizations. 2015-05-08 09:58:18 -04:00
Tyler Stachecki
1ba67eec9d Alignment/size optimizations. 2015-01-28 22:41:07 -05:00
Tyler Stachecki
ca0b0c944d Vectorize/inline/optimize CFC2. 2015-01-27 10:28:36 -05:00
Tyler Stachecki
3cc07a7ae4 Unroll the top-level hot functions. 2015-01-22 14:31:54 -05:00
Tyler Stachecki
4b77d3ed61 RSP: Fix opcode cache bug. 2015-01-13 18:02:01 -05:00
Tyler Stachecki
acd03ec4c6 RSP: Add an opcode cache for performance. 2015-01-09 23:22:39 -05:00
Tyler Stachecki
2c94219a9b RSP: Fix scalar load-use stall. 2015-01-09 23:22:32 -05:00
Tyler Stachecki
79b02e4702 RSP: Optimize memory requests slightly. 2015-01-09 23:22:26 -05:00
Tyler Stachecki
28196d2076 RSP: Optimize decoder/stall checks slightly. 2015-01-09 23:22:20 -05:00
Tyler Stachecki
321cf584f0 Remove some hacks from the RSP pipeline. 2015-01-08 12:17:06 -05:00
Tyler Stachecki
cc3aff976c Add 64DD mappings and a controller. 2015-01-06 14:07:45 -05:00
Tyler Stachecki
028d8e673d Decoder optimization: drastically reduce size. 2015-01-06 11:39:36 -05:00
Tyler Stachecki
efc4e38793 Remove an old, unused function. 2015-01-06 02:18:49 -05:00
Tyler Stachecki
e63f8b08e3 Perform some really clever branch folding.
Fold all the integer loads and stores into one code path.
2015-01-06 02:18:31 -05:00
Tyler Stachecki
ec3748f0c2 Trim off a few hundred bytes of code. 2015-01-05 22:59:52 -05:00
Tyler Stachecki
c7a4a43242 Same as the last commit, but with the RSP. 2015-01-05 22:12:44 -05:00
Tyler Stachecki
a648cedc87 More cleanup of the fault/TLB code. 2015-01-04 15:38:56 -05:00
Tyler Stachecki
aa175bf6d6 Fix the JALR RSP bug, similar to last commit. 2015-01-04 12:18:03 -05:00
Tyler Stachecki
c795c4ad2d Remove old function definitions. 2015-01-03 00:49:52 -05:00
Tyler Stachecki
2697ba9445 Merge more functions together. 2015-01-02 23:51:53 -05:00
Tyler Stachecki
1c8f871df8 Start merging RSP vector functions.
No need to separate all these functions when they contain so
much common code, so start combining things for the sake of
locality and predictor effectiveness (and size). In addition
to these benefits, the CPU backend is usually busy during the
execution of these functions, so suffering a misprediction
isn't as painful (especially seeing as we can potentially
improve the prediction from the indirect branch).
2015-01-02 22:21:32 -05:00
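
As a loose illustration of the merging described in commit 1c8f871df8, several near-identical vector operations can share one body that differs only in a small switch. Which functions were actually merged is not stated here, so the three logical operations, the `vec_op` selector and the `vec_logic` name are all hypothetical; the only hard assumption is eight 16-bit lanes per vector.

```c
#include <stdint.h>

/* Hypothetical selector for the merged function. */
enum vec_op { VEC_AND, VEC_OR, VEC_XOR };

/* One function instead of three: operand loads and write-back are
 * shared, and only the middle step depends on the operation. */
static void vec_logic(enum vec_op op, const uint16_t a[8],
                      const uint16_t b[8], uint16_t out[8]) {
  for (int i = 0; i < 8; i++) {
    uint16_t x = a[i], y = b[i]; /* shared operand setup */
    uint16_t r;
    switch (op) {
      case VEC_AND: r = (uint16_t)(x & y); break;
      case VEC_OR:  r = (uint16_t)(x | y); break;
      default:      r = (uint16_t)(x ^ y); break;
    }
    out[i] = r; /* shared write-back */
  }
}
```

The shared code is emitted once, which helps instruction-cache locality, at the price of an extra data-dependent branch inside the merged function.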
Tyler Stachecki
c1f1998c78 Add an implementation for VMACU. 2015-01-02 21:04:44 -05:00
Tyler Stachecki
742ffc1493 Fix a series of RSP bugs that krom pointed out. 2015-01-01 21:13:41 -05:00
Tyler Stachecki
267d56491e Get the Windows build in running condition.
Conflicts:
	rdp/n64video.c
2015-01-01 15:00:53 -05:00
Tyler Stachecki
b52962aa19 Fix RSP bug that arises on BREAK. 2015-01-01 10:46:48 -05:00
Tyler Stachecki
e100147379 Add register-caching version of VCH.
Thanks go out to AIO for rounding out this commit with
his optimized SSE2 variant.
2015-01-01 10:46:41 -05:00
Tyler Stachecki
5e313634d3 Enable register-caching on MinGW.
Use a prelude to get around Microsoft's stupid calling convention.
2015-01-01 10:46:10 -05:00
Tyler Stachecki
b6f0d0ec58 Set initial values for VCC/VCO/VCE.
Thanks, krom!
2015-01-01 10:45:45 -05:00
Tyler Stachecki
94ad149a12 Actually enable the register caching...
And fix a lot of bugs introduced with a regex.
2015-01-01 10:44:47 -05:00
Tyler Stachecki
7bc95ee3ee Implement register-caching version of VLT. 2015-01-01 10:44:40 -05:00
Tyler Stachecki
9b941eced8 Change RSP calling convention.
pblendvb needs the mask in %xmm0, so change the calling convention
around just enough so we can cut out a movdqa from most instructions.
2015-01-01 10:44:34 -05:00
Tyler Stachecki
4aabd7f49e Minor tweaks to VEQ/VNE register-cached versions. 2015-01-01 10:44:16 -05:00
Tyler Stachecki
e810689fde Implement register-caching versions of VGE. 2015-01-01 10:44:09 -05:00