Commit graph

2921 commits

Author SHA1 Message Date
Unknown W. Brackets
b85b0476b9 arm64jit: Correct vdot vec4 mapping. 2023-10-14 20:54:40 -07:00
Henrik Rydgård
64ee5675b8 Minor unrelated cleanup 2023-10-06 15:39:59 +02:00
Henrik Rydgård
0d06af87b6 Interpreter: Optimize ReadVector/WriteVector by removing voffset lookups
Drops these functions down the ranking of top functions by quite a bit in GTA,
speedup at most 0.5% though. But enough of these small ones and they
start adding up.

Not sure why GTA falls back to the interpreter for these so much though.
I guess some "uneaten" prefix..
2023-10-05 19:11:34 +02:00
Henrik Rydgård
60a304f29b Turn the ifs inside out 2023-10-05 18:59:56 +02:00
Henrik Rydgård
f21523ff74 WriteVector: Pluck transpose out of the loop 2023-10-05 18:56:15 +02:00
Henrik Rydgård
e852771480 Integrate the voffset shuffle in ReadVector 2023-10-05 18:52:50 +02:00
Henrik Rydgård
76f0c6cab4
Merge pull request #18305 from unknownbrackets/x86-ir-vcmp
x86jit: Fix IR vcmp all bit
2023-10-04 07:48:42 +02:00
Unknown W. Brackets
f1a9e39ce9 x86jit: Fix IR vcmp all bit. 2023-10-03 17:46:29 -07:00
Henrik Rydgård
7c184a7e1c
Merge pull request #18289 from fp64/sse2-vfpu-dot
Add SSE2 version of vfpu_dot
2023-10-03 10:39:10 +02:00
Unknown W. Brackets
521335cb2a x86: Fix 32-bit IR jit block entry. 2023-10-02 20:26:07 -07:00
fp64
49ac4c6774 Clarify 2023-10-02 14:05:49 -04:00
fp64
23e2d0f797 Add SSE2 version of vfpu_dot
See #18249. Speedup for this function ranges 10%..100%,
depending on system. Updated verification and speed measurements:
https://godbolt.org/z/W1z3sj6hz
2023-10-02 12:53:30 -04:00
Henrik Rydgård
db805cc4cc
Merge pull request #18282 from unknownbrackets/ir-compiling
Improve IR compilation performance
2023-10-01 11:34:27 +02:00
Henrik Rydgård
7bb7c2f28a
Merge pull request #18279 from unknownbrackets/arm64-ir-transfer
arm64jit: Implement reg lane transfers in IR
2023-10-01 11:31:19 +02:00
Unknown W. Brackets
cd46f0b4cb irjit: Cache IR metadata lookups.
This improves compilation performance, because all those lookups were
adding up.
2023-09-30 15:56:53 -07:00
Unknown W. Brackets
00c80cea6e irjit: Optimize offset logging during compile.
As I guessed, this was expensive. Using a vector and reserve isn't very expensive.
It's nice to keep this before logBlocks_ is > 0, in case it's set mid
block.
2023-09-30 15:56:18 -07:00
Unknown W. Brackets
4e0761b104 irjit: Fix regcache disable for FPRs. 2023-09-30 15:54:54 -07:00
Unknown W. Brackets
4380bf9787 arm64jit: Optimize transfers to vec4 better. 2023-09-30 15:44:53 -07:00
Unknown W. Brackets
cb835295c8 arm64jit: Implement reg lane transfers. 2023-09-30 15:44:41 -07:00
Henrik Rydgård
5d8a0b3ac7
Merge pull request #18266 from unknownbrackets/ir-vtfm
irjit: Fix vhtfm instruction
2023-09-29 09:43:06 +02:00
Unknown W. Brackets
c92148ee2c irjit: Fix vhtfm instruction. 2023-09-28 21:16:54 -07:00
Henrik Rydgård
84d0236bf4 Comment fixes 2023-09-27 12:31:17 +02:00
Henrik Rydgård
4c0077fd84 Protect against weirdness in UnlinkBlocks (hopefully not needed after prev fix) 2023-09-27 12:31:17 +02:00
Henrik Rydgård
d6a8bfdf3e
Merge pull request #18249 from unknownbrackets/arm64jit-vcrsp
arm64jit: Avoid fused multiplies in vcrsp.t
2023-09-27 08:49:01 +02:00
Unknown W. Brackets
ded18ff237 arm64jit: Avoid fused multiplies in vcrsp.t.
With this change, issues in Harvest Moon with teleporting animals seem to
disappear.  It was causing some differences in signs of zeros in results,
and slightly different result values.
2023-09-26 20:09:02 -07:00
Henrik Rydgård
dd2b1ace88 BlockCache on ARM/ARM64: Allow two more exits 2023-09-26 19:44:05 +02:00
Henrik Rydgård
51d5026792 WriteExit: Assert on bad exit numbers 2023-09-26 19:39:48 +02:00
Henrik Rydgård
c0ee711cb9 In the FinalizeBlock assert, extract some more info 2023-09-26 13:37:40 +02:00
Henrik Rydgård
9fffa33eee
Merge pull request #18234 from unknownbrackets/x86-ir-transfer
x86jit: Perform vector transfers instead of flushing to memory
2023-09-26 09:28:05 +02:00
Unknown W. Brackets
38e5b33a53 x86jit: Prefer BLENDPS to INSERTPS.
It's faster, so this performs better.
2023-09-25 22:12:48 -07:00
Henrik Rydgård
9f62a3f750
Merge pull request #18235 from unknownbrackets/ir-vdet
irjit: Handle VDet
2023-09-25 09:06:20 +02:00
Henrik Rydgård
51456980db
Merge pull request #18121 from unknownbrackets/jit-ir-profiler
IR: Add mini native jit MIPS block profiler
2023-09-25 09:04:55 +02:00
Unknown W. Brackets
9b2fa46861 IR: Add mini native jit MIPS block profiler. 2023-09-24 23:04:29 -07:00
Unknown W. Brackets
e104a28b71 irjit: Handle VDet. 2023-09-24 23:03:25 -07:00
Unknown W. Brackets
05786f5719 x86jit: Correct spill on IR lane extract. 2023-09-24 19:06:06 -07:00
Unknown W. Brackets
685d2acffe x86jit: Retain old lanes when there's space. 2023-09-24 17:31:25 -07:00
Unknown W. Brackets
46e704f879 x86jit: Cleanup and refactor transfer. 2023-09-24 16:58:41 -07:00
Unknown W. Brackets
d9f6bae1ff x64jit: Initial reg transfer. 2023-09-24 16:28:29 -07:00
Unknown W. Brackets
88b6442527 irjit: Add facility for native reg transfer. 2023-09-24 16:28:29 -07:00
Henrik Rydgård
06a1f0b72c
Merge pull request #18226 from unknownbrackets/x86-ir-breakpoints
x86jit: Improve memory breakpoint speed
2023-09-25 00:47:22 +02:00
Unknown W. Brackets
da013ee105 x86jit: Fix asm jitbase displacement check. 2023-09-24 12:11:00 -07:00
Unknown W. Brackets
7d0f2e43b6 irjit: Fix safety of kernel bit memory addresses. 2023-09-24 10:18:55 -07:00
Henrik Rydgård
2ba63c65f2
Merge pull request #18227 from unknownbrackets/x86-ir-flush
x86jit: Flush floats together if possible
2023-09-24 17:27:38 +02:00
Unknown W. Brackets
d36728e532 x86jit: Load common float vals from constants. 2023-09-24 08:01:08 -07:00
Unknown W. Brackets
decccf199a x86jit: Flush floats together if possible. 2023-09-24 08:01:05 -07:00
Unknown W. Brackets
9742aaaffe x86jit: Use MOVAPS directly when we can.
May help older processors or reduce total bytes.
2023-09-24 08:01:02 -07:00
Unknown W. Brackets
772b3ff7b8 arm64jit: Tweak memchecks. 2023-09-24 07:42:11 -07:00
Unknown W. Brackets
e433a8be4a arm64jit: Speed up memchecks, add validation. 2023-09-24 07:42:11 -07:00
Unknown W. Brackets
5929aaae85 x86jit: Speed up safe memory checks. 2023-09-24 07:06:57 -07:00
Unknown W. Brackets
017d0d4b17 x86jit: Improve memory breakpoint speed.
This helps a lot compared to before.
2023-09-24 07:06:57 -07:00