Commit graph

31450 commits

Author SHA1 Message Date
Unknown W. Brackets
ce6ea8da11 samplerjit: Apply gather lookup to all CLUT4. 2022-01-02 17:19:18 -08:00
Unknown W. Brackets
22f770c828 samplerjit: Use VPGATHERDD for simple CLUT4 loads.
Planning to expand this to more paths.
2022-01-02 17:19:17 -08:00
Unknown W. Brackets
65c84d5dd5 samplerjit: Avoid a couple more copies in AVX.
From looking at assembly, just trying to keep it small.
2022-01-02 17:01:14 -08:00
Henrik Rydgård
daf9e7020a
Merge pull request #15274 from unknownbrackets/softgpu-mask
softgpu: Skip sample lookup if masked
2022-01-02 23:30:51 +01:00
Unknown W. Brackets
7594187538 softgpu: Skip sample lookup if masked.
Was hoping making other things faster would make this unnecessary or
worse, but it hasn't seemed to.  This gives a pretty decent improvement in
most places (~4%.)
2022-01-02 13:47:14 -08:00
Unknown W. Brackets
a0fe4d06bf softgpu: Stop specializing on miplevels.
Now that samplerjit is processing mips, it no longer helps.  Just
complexity now.
2022-01-02 13:47:14 -08:00
Unknown W. Brackets
e4673a5fa4 softgpu: Separately profile verts and lighting. 2022-01-02 13:46:11 -08:00
Henrik Rydgård
d3f0af7458
Merge pull request #15273 from unknownbrackets/softjit-bloom
Optimize software renderer handling of common bloom operations
2022-01-02 18:11:07 +01:00
Henrik Rydgård
c07ca2d89d
Merge pull request #15272 from unknownbrackets/softgpu-meminfo
softgpu: Add code for tracking GPU writes
2022-01-02 18:09:16 +01:00
Henrik Rydgård
c7062d7063
Merge pull request #15271 from unknownbrackets/samplerjit-color16
samplerjit: Decode colors in parallel
2022-01-02 17:55:46 +01:00
Unknown W. Brackets
a259761262 samplerjit: Use nearest func in fast path too.
This uses the more optimal tex funcs.
2022-01-02 08:48:16 -08:00
Unknown W. Brackets
ba17f538d6 softjit: Avoid const temp registers.
Was trying to make sure register allocation was okay in the worst case.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
e93c709f5c sofjit: Correctly poison memory.
Noticed this wasn't breakpoints when reviewing some assembly output.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
745c35f320 softjit: Small bloom optimization.
Another common case, src*dst + dst*0.  Can skip the add.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
355bad666c softjit: Optimize common case bloom blending.
Bloom often uses fixed ONE + ONE, which is a lot less work for us.  And
bloom often runs over and over again on pixels, so saving work is good.
2022-01-02 08:47:04 -08:00
Henrik Rydgård
6fb5d82fe0
Merge pull request #15264 from unknownbrackets/samplerjit-vec
A couple more smaller samplerjit optimizations
2022-01-02 17:32:54 +01:00
Unknown W. Brackets
496545e55c softgpu: Add code for tracking GPU writes.
Unfortunately, it has a pretty noticeable speed impact, even at the basic
"assume everything's written" level.  Compiled off by default, but at
least it's there.

Doesn't account for tests (i.e. alpha test skipping write) so still not
perfectly accurate.
2022-01-02 08:28:30 -08:00
Unknown W. Brackets
0eec4e7e4d samplerjit: Decode colors in parallel.
Not used in a ton of games, but a decent improvement where it is used.
2022-01-02 08:27:55 -08:00
Henrik Rydgård
5b06dbce82
Merge pull request #15265 from unknownbrackets/debugger
Debugger: Correct delayed symbol listbox updates
2022-01-02 09:53:19 +01:00
Henrik Rydgård
cb1f26122d
Merge pull request #15269 from unknownbrackets/softgpu-opt
softgpu: Reduce interpolation if not needed
2022-01-02 09:47:19 +01:00
Henrik Rydgård
da38c027b5
Merge pull request #15268 from unknownbrackets/samplerjit-nearest
Implement nearest in samplerjit, like linear
2022-01-02 09:46:29 +01:00
Henrik Rydgård
827e1713d6
Merge pull request #15267 from unknownbrackets/softgpu-cleanup
Fix assortment of softgpu bugs
2022-01-02 09:45:37 +01:00
Unknown W. Brackets
025ac99f2f softgpu: Reduce interpolation if not needed.
About 3% gain in some areas.
2022-01-01 18:34:04 -08:00
Unknown W. Brackets
7060035303 samplerjit: Implement nearest in jit.
This uses the tex func and similar within jit.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
91c9343e87 samplerjit: Refactor and reuse constant pool.
It's just here to be rip accessible, the fixed values can be output just
once.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
40240be91c samplerjit: Update nearest args, temp disable jit.
This temporarily disables jit for nearest, but refactors to use the new
arg structure.  It now matches linear.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
5f84de7de7 softjit: Small optimizations. 2022-01-01 16:58:04 -08:00
Unknown W. Brackets
06e954fe2a samplerjit: Create a separate fetch func.
This allows nearest to become more similar to linear, where it applies the
texture function.
2022-01-01 16:58:04 -08:00
Unknown W. Brackets
3bc6009158 samplerjit: Refactor sampler ID calculation.
Make it the same as pixel func IDs.
2022-01-01 16:58:04 -08:00
Unknown W. Brackets
d41e42d247 softgpu: Correct off-by-one scissor mask.
Fixes Brave Story in the software renderer.  Was overwriting display list
data in the stride gap.
2022-01-01 16:42:36 -08:00
Unknown W. Brackets
b35ca3d472 softgpu: Cleanup min/max tri range handling.
The previous looked like it had off by one errors.  This is simpler.
2022-01-01 16:42:36 -08:00
Unknown W. Brackets
e82fd3bd33 GPU: Avoid spline crashes on bad data.
If we get 0 prims, we can generate confusing index bounds and go out of
bounds.  Similarly, if we get a crazy number of control points and fail to
allocate, we can crash.
2022-01-01 16:40:59 -08:00
Unknown W. Brackets
12405709f0 softgpu: Skip processing scissored triangles.
If only one side was scissored (common), we might even put it on a thread,
which ended up as a lot of overhead.  Gives 3-4% improvement in some
places.
2022-01-01 16:40:34 -08:00
Unknown W. Brackets
6aec68aa5c samplerjit: Correct wrong bufw at mip levels.
Oops, was always using the base bufw.
2022-01-01 16:40:02 -08:00
Unknown W. Brackets
dbb015f427 samplerjit: Oops, fix Linux mipmap handling. 2022-01-01 16:40:02 -08:00
Unknown W. Brackets
c31f746896 PPGe: Disable dither in UI drawing.
We perform it in software, but it looks bad.
2022-01-01 16:40:01 -08:00
Unknown W. Brackets
8c31f1bb38 softjit: Fix regcache error when clearing.
Happens for non-through clears.
2022-01-01 16:40:01 -08:00
Unknown W. Brackets
0ad4502dd2 GPU: Fix bone matrix CALL opt corruption.
If the matrix number is high and we have extra CALLs, we can't load it
into the memory after the bone matrixes.
2022-01-01 16:40:01 -08:00
Unknown W. Brackets
85b7b221be Debugger: Correct delayed symbol listbox updates.
With the dialogs no longer created on start, this message wasn't coming
through.
2021-12-31 09:10:40 -08:00
Unknown W. Brackets
8ea67b571b samplerjit: Tiny dependency optimizations.
This had a small but measureable impact (~0.3%.)
2021-12-31 08:11:57 -08:00
Unknown W. Brackets
fc3688d273 samplerjit: Small AVX optimization to modulate.
Only gives about 0.5% but it's still something.
2021-12-31 08:10:04 -08:00
Henrik Rydgård
244b0a86f6
Merge pull request #15262 from unknownbrackets/samplerjit-vec
samplerjit: Use SSSE3/SSE4 in linear filtering
2021-12-31 09:29:59 +01:00
Henrik Rydgård
879e2842db
Merge pull request #15263 from unknownbrackets/softgpu
softgpu: Skip zero size triangles
2021-12-31 09:28:15 +01:00
Unknown W. Brackets
33e9841a4a softgpu: Skip zero size triangles.
These were drawing before, incorrectly, which caused artifacts.
Noticeable in Blade Dancer.
2021-12-31 00:20:12 -08:00
Unknown W. Brackets
1addf84e90 samplerjit: Use SSSE3/SSE4 in linear filtering. 2021-12-30 23:22:56 -08:00
Henrik Rydgård
fd540f1780
Merge pull request #15261 from unknownbrackets/x64-avx
Add AVX/AVX2 instructions to the x64 emitter
2021-12-30 10:31:32 +01:00
Unknown W. Brackets
7aa9664d20 x64jit: Add AVX2-only instructions. 2021-12-29 19:46:26 -08:00
Unknown W. Brackets
7508fcc22d x64jit: Add AVX-only instructions. 2021-12-29 19:46:26 -08:00
Unknown W. Brackets
147b81d6f7 x64jit: Add AVX/AVX2 encodings.
Also fix the FMA double ones, which were passing W wrongly.
2021-12-29 19:46:26 -08:00
Henrik Rydgård
08e1677d7c
Merge pull request #15260 from unknownbrackets/samplerjit-vec
Correct level bug in samplerjit, minor SSE4 opt
2021-12-29 21:42:01 +01:00