Commit graph

49 commits

Author SHA1 Message Date
Tyler Stachecki
1e86268eee rsp: Fix SQV and SRV (more endianness issues). 2016-07-09 19:38:26 -04:00
Tyler Stachecki
1e47020ccc rsp: Fix LQV bug (related to endianness). 2016-07-09 16:24:40 -04:00
Tyler Stachecki
cae6b6de78 rsp: Fix LBV bug (related to endianness). 2016-07-09 16:14:27 -04:00
Tyler Stachecki
6d3cd1e0d0 rsp: Fix link PC result (12th bit should not get set). 2016-07-09 13:30:05 -04:00
Tyler J. Stachecki
d905183b11 izy removed the LUT from bitwise operations.
In addition to removal of all memory accesses from the
functions, these functions also result in fewer executed
instructions in some cases.
2016-03-16 22:59:22 -04:00
Tyler J. Stachecki
88c65ae630 Another great optimization from izy.
izy managed to remove another LUT used in add/sub related
instructions. The devil is in the details (see commit).

<new>:
00000000004006b0 <rsp_addsub_mask>:
  4006b0:       c1 ef 02                shr    $0x2,%edi
  4006b3:       19 c0                   sbb    %eax,%eax
  4006b5:       c3                      retq

<old>:
00000000004006d0 <rsp_addsub_mask>:
  4006d0:       83 e7 02                and    $0x2,%edi
  4006d3:       8b 04 bd 80 07 40 00    mov    0x400780(,%rdi,4),%eax
  4006da:       c3                      retq

"You see that this patch doesn't increase the amount of
instructions. They are always two/three/four instructions
and with automatic register selection. This is always the
case with a MOV from memory... you can load to any register,
but the same will happen with a SBB over itself. That is
also the reason why when the function is inlined it won't
require any special register (such as the EAX:EDX pair,
the "cltd" instruction you see in the 32 bit code is only
a coincidence caused by the optimizations done by the gcc
and isn't mandatory).

The System V AMD64 calling convention puts the input
parameter in rdi, but wherever the selector is placed
nothing changes. The output parameter is in rax, but
MOV/SBB can work with any register when inlined.
2016-02-07 14:01:00 -05:00
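
Note on 88c65ae630: the two listings above can be read as the C sketch below. This is an illustration only, not the code from the commit. The function names and the table contents are assumptions; the disassembly only shows that the new form returns 0 or 0xFFFFFFFF depending on bit 1 of the input, and whether a given compiler emits SHR/SBB for it is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* Table form, as in <old>: index by the masked selector (0 or 2).
 * The table contents are assumed; only entries 0 and 2 are reachable. */
static const uint32_t addsub_lut[4] = { 0u, 0u, ~0u, ~0u };

static inline uint32_t addsub_mask_lut(uint32_t selector) {
    return addsub_lut[selector & 0x2];
}

/* Table-free form, as in <new>: isolate bit 1 and negate it, which
 * yields 0 when the bit is clear and 0xFFFFFFFF when it is set. */
static inline uint32_t addsub_mask_alu(uint32_t selector) {
    return 0u - ((selector >> 1) & 1u);
}

/* Check that both forms agree over a small range of selectors. */
static void addsub_self_check(void) {
    for (uint32_t s = 0; s < 64; s++)
        assert(addsub_mask_lut(s) == addsub_mask_alu(s));
}
```

The table-free form touches no memory, which is the point made in the quoted explanation.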
Tyler J. Stachecki
e12a459b18 More optimization patches from izy.
izy noticed that the branch LUT was generating memory moves
and could be replaced with an inlined function that coerces
gcc into generating a lea in its place:

  4005ac:       8d 1c 00                lea    (%rax,%rax,1),%ebx
  4005af:       c1 fb 1f                sar    $0x1f,%ebx
  4005b2:       f7 d3                   not    %ebx
(no memory access)

  4005b9:       c1 e8 1e                shr    $0x1e,%eax
  4005bc:       83 e0 01                and    $0x1,%eax
  4005bf:       44 8b 24 85 90 07 40    mov    0x400790(,%rax,4),%r12d
(original has memory access)

This ends up optimizing branch instructions quite nicely:

"You see that when you use "mask" you execute "~mask". The
compiler understands that ~(~(partial_mask)) = partial_mask
and removes both "NOTs". So in this case my version uses 2
instructions and no memory access/cache pollution."
2016-02-06 13:43:07 -05:00
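
Note on e12a459b18: a rough C rendering of the two listings above, for illustration only. The names and the table contents are assumptions; the disassembly shows only that the mask is derived from bit 30 of the input and is inverted at the end. The signed conversion and right shift used here are implementation-defined in C, but GCC performs a two's-complement conversion and an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Table form (original): assumed contents are all ones when bit 30 is
 * clear and zero when it is set. */
static const uint32_t branch_lut[2] = { ~0u, 0u };

static inline uint32_t branch_mask_lut(uint32_t value) {
    return branch_lut[(value >> 30) & 0x1];
}

/* Table-free form: move bit 30 into the sign position (the LEA doubles
 * the value), smear it across the word (SAR by 31), then invert (NOT). */
static inline uint32_t branch_mask_alu(uint32_t value) {
    int32_t smeared = (int32_t)(value << 1) >> 31;
    return ~(uint32_t)smeared;
}

/* Check that both forms agree for every combination of the top bits. */
static void branch_self_check(void) {
    for (uint32_t top = 0; top < 4; top++) {
        uint32_t value = (top << 30) | 0x1234u;
        assert(branch_mask_lut(value) == branch_mask_alu(value));
    }
}
```

When a caller immediately applies `~` to the result, the compiler can cancel the two inversions, which matches the two-instruction count described in the quote.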
Derek "Turtle" Roe
8b89df2fdc See long description
Replaced all references to simulation with emulation
Updated copyright year
Updated .gitignore to reduce chances of random files being uploaded to
the repo
Added .gitattributes to normalize all text files, and to ignore binary
files (which includes the logo and the NEC PDF)
2015-07-01 18:44:21 -05:00
Tyler J. Stachecki
f4b182835c Various small optimizations. 2015-05-08 09:58:18 -04:00
Tyler Stachecki
2c94219a9b RSP: Fix scalar load-use stall. 2015-01-09 23:22:32 -05:00
Tyler Stachecki
79b02e4702 RSP: Optimize memory requests slightly. 2015-01-09 23:22:26 -05:00
Tyler Stachecki
321cf584f0 Remove some hacks from the RSP pipeline. 2015-01-08 12:17:06 -05:00
Tyler Stachecki
efc4e38793 Remove an old, unused function. 2015-01-06 02:18:49 -05:00
Tyler Stachecki
e63f8b08e3 Perform some really clever branch folding.
Fold all the integer loads and stores into one code path.
2015-01-06 02:18:31 -05:00
Tyler Stachecki
a648cedc87 More cleanup of the fault/TLB code. 2015-01-04 15:38:56 -05:00
Tyler Stachecki
aa175bf6d6 Fix the JALR RSP bug, similar to last commit. 2015-01-04 12:18:03 -05:00
Tyler Stachecki
b52962aa19 Fix RSP bug that arises on BREAK. 2015-01-01 10:46:48 -05:00
Tyler Stachecki
8f17a516bc Fix a stray memory copy. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
3a969b2379 Do some general cleanup/optimization. 2014-12-26 14:19:46 -05:00
Tyler Stachecki
fea458e70c Add (partial) implementations for LPV/LUV/SPV/SUV.
Also, cleanup other SSSE3+ accelerated loads and stores.
2014-12-26 14:19:45 -05:00
Tyler Stachecki
9f9e3ebf80 Sort out a pair of RSP bugs. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
dc008abe77 Fix more show-stopping RSP bugs. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
173815ed63 Another bug: make sure memory requests get filled. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
645f4b06ea Minor cleanup to the RSP pipeline. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
6faca60054 Start reworking RSP vector loads and stores. 2014-12-26 14:19:45 -05:00
Tyler Stachecki
2ee295a671 Fix RSP DMEM accesses.
Up until now, the simulator assumed that DMEM accesses had to be
aligned (similarly to the VR4300). This is not actually the case,
so allow scalar memory access to arbitrary DMEM addresses.
2014-12-22 23:53:13 -05:00
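
Note on 2ee295a671: a minimal sketch of what a scalar access to an arbitrary DMEM address could look like. It is not the code from the commit. The helper names are hypothetical, and the sketch assumes a 4 KiB DMEM, big-endian byte order, and addresses that wrap around at the end of DMEM.

```c
#include <assert.h>
#include <stdint.h>

#define DMEM_SIZE 0x1000u /* assumed: 4 KiB of data memory */

/* Read a 32-bit word byte by byte so that any address is acceptable.
 * Each byte address is masked, so an access near the end wraps around. */
static uint32_t dmem_read_word(const uint8_t *dmem, uint32_t addr) {
    uint32_t word = 0;
    for (unsigned i = 0; i < 4; i++)
        word = (word << 8) | dmem[(addr + i) & (DMEM_SIZE - 1)];
    return word;
}

/* Write a 32-bit word, most significant byte first (big-endian). */
static void dmem_write_word(uint8_t *dmem, uint32_t addr, uint32_t word) {
    for (unsigned i = 0; i < 4; i++)
        dmem[(addr + i) & (DMEM_SIZE - 1)] = (uint8_t)(word >> (24 - 8 * i));
}

/* Usage: an unaligned store followed by a load from the same address. */
static void dmem_demo(void) {
    static uint8_t dmem[DMEM_SIZE];
    dmem_write_word(dmem, 0x001, 0xAABBCCDDu);
    assert(dmem_read_word(dmem, 0x001) == 0xAABBCCDDu);
}
```

A byte-wise loop is the simplest correct form; a faster variant would special-case aligned addresses.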
Tyler Stachecki
c72f2c5028 Fix RSP alignment issues once and for all. 2014-12-19 20:03:03 -05:00
Tyler Stachecki
e89f054674 Optimize extremely aggressively.
Tell GCC to optimize cold functions for size and stash them away in
a separate part of the binary. Put the simulation core, meanwhile, on
the hot path. Also, bump optimization to -O3 as we can now "afford"
to do so.
2014-11-05 08:39:47 -05:00
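
Note on e89f054674: one way to express the hot/cold split in GCC is with function attributes, sketched below. Whether the commit uses these attributes or some other mechanism is not stated in the message, and both function names are hypothetical. Per the GCC documentation, cold functions are optimized for size and, on many targets, grouped in their own text subsection; hot functions are optimized more aggressively and grouped likewise.

```c
#include <assert.h>
#include <stdint.h>

/* Rarely executed path (hypothetical): optimized for size and kept out
 * of line so it does not bloat the hot code. */
__attribute__((cold, noinline))
static int handle_fault(uint32_t cause) {
    return -(int)cause;
}

/* Frequently executed path (hypothetical): optimized for speed. The
 * __builtin_expect hint marks the fault branch as unlikely. */
__attribute__((hot))
static int step(uint32_t opcode) {
    if (__builtin_expect(opcode == 0xFFFFFFFFu, 0))
        return handle_fault(1);
    return (int)(opcode & 0xFFu);
}

/* Usage: the common case stays on the hot path. */
static void hot_cold_demo(void) {
    assert(step(0x12u) == 0x12);
    assert(step(0xFFFFFFFFu) == -1);
}
```

Keeping cold code small and out of the way leaves more instruction cache for the core loop, which is what makes a higher optimization level affordable.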
Tyler Stachecki
89ecd417d8 Pack RSP results into a result structure. 2014-11-02 13:40:49 -05:00
Tyler Stachecki
c4612418ed Implement VINV, fixup INV. 2014-11-02 11:57:26 -05:00
Tyler Stachecki
6f54353825 Fix another incorrect RSP branch target. 2014-11-02 10:29:19 -05:00
Tyler Stachecki
aaf56a0928 Make sure RSP branch targets don't escape IMEM. 2014-11-02 09:35:50 -05:00
Tyler Stachecki
c522b7cab0 Some minor tweaks/fixes to the SU pipeline. 2014-10-25 17:11:45 -04:00
Tyler Stachecki
304f667674 Implement several LWC2/SWC2 opcodes. 2014-10-25 14:03:26 -04:00
Tyler Stachecki
b9b989131f More peephole optimizations. 2014-10-25 13:25:07 -04:00
Tyler Stachecki
0c64ae620b Combine SLL, SLLV function logic. 2014-10-25 13:01:20 -04:00
Tyler Stachecki
87986a5037 Cut some instructions from execution functions.
Extend a LUT by a couple of entries to avoid a shift at runtime.
2014-10-25 12:52:41 -04:00
Tyler Stachecki
85a21616cc Micro-optimization: faster li emulation.
If we think about how the assembler forms 32-bit immediates, it
usually generates a lui and addiu pair. Well, if we can craft the
simulation such that lui and addiu are the same indirect target
when branching to execution functions, we can reduce the chance
that we'll mispredict and have a resulting pipeline flush on the
host.

Every cycle counts!
2014-10-25 12:40:27 -04:00
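
Note on 85a21616cc: a sketch of how lui and addiu could share one execution function, so that both dispatch to the same indirect target. This illustrates the idea only and is not the code from the commit; the function name is hypothetical. It relies on the MIPS encodings (ADDIU is opcode 0x09, LUI is opcode 0x0F, so bit 28 of the instruction word is clear for ADDIU and set for LUI) and assumes the caller passes 0 as the source value for LUI, whose rs field is 0.

```c
#include <assert.h>
#include <stdint.h>

/* Shared handler: bit 28 of the instruction word selects a shift of
 * 0 (ADDIU) or 16 (LUI), so no branch on the opcode is needed. */
static inline uint32_t exec_addiu_lui(uint32_t iw, uint32_t rs_value) {
    unsigned shift = (iw >> 24) & 0x10;
    uint32_t imm = iw & 0xFFFFu;
    imm = (imm ^ 0x8000u) - 0x8000u; /* portable 16-bit sign extension */
    return rs_value + (imm << shift);
}

/* Usage: "li $1, 0x12345678" as assembled into a lui/addiu pair. */
static void li_demo(void) {
    uint32_t r1 = exec_addiu_lui(0x3C011234u, 0); /* lui   $1, 0x1234     */
    r1 = exec_addiu_lui(0x24215678u, r1);         /* addiu $1, $1, 0x5678 */
    assert(r1 == 0x12345678u);
}
```

Because both instructions reach the same handler, the host's indirect branch predictor sees one target for the pair instead of two.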
Tyler Stachecki
e698bfe1d1 Improving accuracy of RSP LWC2/SWC2 operations. 2014-10-25 02:06:30 -04:00
Tyler Stachecki
74327ef79e Compress LQV/SQV into one function. 2014-10-24 23:56:42 -04:00
Tyler Stachecki
ba2ca6f427 Fix more byte-ordering issues. This was hard. 2014-10-24 23:43:24 -04:00
Tyler Stachecki
e63b13605e Various LWC2/SWC2 fixes, add VSAR. 2014-10-24 21:07:25 -04:00
Tyler Stachecki
f395be631e Start adding in support for LWC2/SWC2 ops: LQV/SQV. 2014-10-24 18:31:13 -04:00
Tyler Stachecki
421b0e0519 Implement some RSP DMEM reads and writes. 2014-10-18 11:34:09 -04:00
Tyler Stachecki
4ff41a0e34 Fix DMA/interrupt issues with the RSP. 2014-10-18 11:34:02 -04:00
Tyler Stachecki
df68d13733 Fix some PC-related bugs in the RSP. 2014-10-18 11:33:56 -04:00
Tyler Stachecki
b421093700 Start fleshing out the RSP frontend. 2014-10-18 11:33:14 -04:00
Tyler Stachecki
7ac625cec1 Implement RSP DMAs, COP0 registers, etc. 2014-10-18 11:32:51 -04:00
Tyler Stachecki
440c51fef2 Add modified functions for RSP. 2014-10-18 11:32:43 -04:00