Commit graph

1160 commits

Author SHA1 Message Date
Unknown W. Brackets
e93c709f5c sofjit: Correctly poison memory.
Noticed this wasn't breakpoints when reviewing some assembly output.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
745c35f320 softjit: Small bloom optimization.
Another common case, src*dst + dst*0.  Can skip the add.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
355bad666c softjit: Optimize common case bloom blending.
Bloom often uses fixed ONE + ONE, which is a lot less work for us.  And
bloom often runs over and over again on pixels, so saving work is good.
2022-01-02 08:47:04 -08:00
Henrik Rydgård
6fb5d82fe0
Merge pull request #15264 from unknownbrackets/samplerjit-vec
A couple more smaller samplerjit optimizations
2022-01-02 17:32:54 +01:00
Unknown W. Brackets
496545e55c softgpu: Add code for tracking GPU writes.
Unfortunately, it has a pretty noticeable speed impact, even at the basic
"assume everything's written" level.  Compiled off by default, but at
least it's there.

Doesn't account for tests (i.e. alpha test skipping write) so still not
perfectly accurate.
2022-01-02 08:28:30 -08:00
Unknown W. Brackets
0eec4e7e4d samplerjit: Decode colors in parallel.
Not used in a ton of games, but a decent improvement where it is used.
2022-01-02 08:27:55 -08:00
Henrik Rydgård
cb1f26122d
Merge pull request #15269 from unknownbrackets/softgpu-opt
softgpu: Reduce interpolation if not needed
2022-01-02 09:47:19 +01:00
Henrik Rydgård
da38c027b5
Merge pull request #15268 from unknownbrackets/samplerjit-nearest
Implement nearest in samplerjit, like linear
2022-01-02 09:46:29 +01:00
Unknown W. Brackets
025ac99f2f softgpu: Reduce interpolation if not needed.
About 3% gain in some areas.
2022-01-01 18:34:04 -08:00
Unknown W. Brackets
7060035303 samplerjit: Implement nearest in jit.
This uses the tex func and similar within jit.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
91c9343e87 samplerjit: Refactor and reuse constant pool.
It's just here to be rip accessible, the fixed values can be output just
once.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
40240be91c samplerjit: Update nearest args, temp disable jit.
This temporarily disables jit for nearest, but refactors to use the new
arg structure.  It now matches linear.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
5f84de7de7 softjit: Small optimizations. 2022-01-01 16:58:04 -08:00
Unknown W. Brackets
06e954fe2a samplerjit: Create a separate fetch func.
This allows nearest to become more similar to linear, where it applies the
texture function.
2022-01-01 16:58:04 -08:00
Unknown W. Brackets
3bc6009158 samplerjit: Refactor sampler ID calculation.
Make it the same as pixel func IDs.
2022-01-01 16:58:04 -08:00
Unknown W. Brackets
d41e42d247 softgpu: Correct off-by-one scissor mask.
Fixes Brave Story in the software renderer.  Was overwriting display list
data in the stride gap.
2022-01-01 16:42:36 -08:00
Unknown W. Brackets
b35ca3d472 softgpu: Cleanup min/max tri range handling.
The previous looked like it had off by one errors.  This is simpler.
2022-01-01 16:42:36 -08:00
Unknown W. Brackets
e82fd3bd33 GPU: Avoid spline crashes on bad data.
If we get 0 prims, we can generate confusing index bounds and go out of
bounds.  Similarly, if we get a crazy number of control points and fail to
allocate, we can crash.
2022-01-01 16:40:59 -08:00
Unknown W. Brackets
12405709f0 softgpu: Skip processing scissored triangles.
If only one side was scissored (common), we might even put it on a thread,
which ended up as a lot of overhead.  Gives 3-4% improvement in some
places.
2022-01-01 16:40:34 -08:00
Unknown W. Brackets
6aec68aa5c samplerjit: Correct wrong bufw at mip levels.
Oops, was always using the base bufw.
2022-01-01 16:40:02 -08:00
Unknown W. Brackets
dbb015f427 samplerjit: Oops, fix Linux mipmap handling. 2022-01-01 16:40:02 -08:00
Unknown W. Brackets
8c31f1bb38 softjit: Fix regcache error when clearing.
Happens for non-through clears.
2022-01-01 16:40:01 -08:00
Unknown W. Brackets
8ea67b571b samplerjit: Tiny dependency optimizations.
This had a small but measureable impact (~0.3%.)
2021-12-31 08:11:57 -08:00
Unknown W. Brackets
fc3688d273 samplerjit: Small AVX optimization to modulate.
Only gives about 0.5% but it's still something.
2021-12-31 08:10:04 -08:00
Henrik Rydgård
244b0a86f6
Merge pull request #15262 from unknownbrackets/samplerjit-vec
samplerjit: Use SSSE3/SSE4 in linear filtering
2021-12-31 09:29:59 +01:00
Unknown W. Brackets
33e9841a4a softgpu: Skip zero size triangles.
These were drawing before, incorrectly, which caused artifacts.
Noticeable in Blade Dancer.
2021-12-31 00:20:12 -08:00
Unknown W. Brackets
1addf84e90 samplerjit: Use SSSE3/SSE4 in linear filtering. 2021-12-30 23:22:56 -08:00
Unknown W. Brackets
147b81d6f7 x64jit: Add AVX/AVX2 encodings.
Also fix the FMA double ones, which were passing W wrongly.
2021-12-29 19:46:26 -08:00
Unknown W. Brackets
4bd94a4e5e samplerjit: Pass funcs as an argument.
Seeing computing the ID in some profiles, so want to avoid computing per
thread/invocation.
2021-12-29 07:11:53 -08:00
Unknown W. Brackets
28cfbe0e5a samplerjit: Add an alternate profiling method.
This is more useful to group common operations together for profiling.
2021-12-29 07:11:39 -08:00
Unknown W. Brackets
3aedea89eb samplerjit: Correct level lookup offset. 2021-12-29 07:09:36 -08:00
Unknown W. Brackets
bf06342f9d samplerjit: Minor SSE4 optimizations.
These seem to be a bit faster.
2021-12-29 07:07:35 -08:00
Unknown W. Brackets
631706a8ba samplerjit: Set stackArgPos_ early.
Unfortunately, this has to match the value set lower...
2021-12-28 20:21:21 -08:00
Unknown W. Brackets
74eb450e76 samplerjit: Move texture function into jit.
Could do this also for nearest, might end up with a third set of functions
there for a direct sample lookup (for debug funcs.)
2021-12-28 17:52:17 -08:00
Unknown W. Brackets
940e6bb1d7 samplerjit: Lookup both mip tex values. 2021-12-28 16:22:54 -08:00
Unknown W. Brackets
6b55d328e5 samplerjit: Use regcache for linear filtering.
This makes it easier to reuse for mipmap filtering.
2021-12-28 15:37:25 -08:00
Unknown W. Brackets
cdf14c8579 samplerjit: Calculate mip level U/V/offsets.
Not actually doing the sampling for the second mip level in the single jit
pass yet, but close.
2021-12-28 14:12:58 -08:00
Unknown W. Brackets
a4558a5736 samplerjit: Take texptr/bufw as arrays.
Prep for moving mip map sampling into linear.
2021-12-28 12:04:16 -08:00
Unknown W. Brackets
4864850b3b samplerjit: Handle mipmap width/height in S/T calc. 2021-12-28 11:29:29 -08:00
Unknown W. Brackets
a84accf713 samplerjit: Move S/T calculation into jit.
Gives a pretty decent 5-10% improvement in many places.
2021-12-28 09:58:23 -08:00
Unknown W. Brackets
476dfdf731 samplerjit: Add more bits for S/T, skip multiply.
For now, we're not using those other bits yet.
2021-12-27 18:24:37 -08:00
Unknown W. Brackets
9cc0883d53 softgpu: Correct non-SSE T clamp. 2021-12-27 15:31:37 -08:00
Unknown W. Brackets
39d5b1c221 softgpu: Reduce mipmap fraction to 4 bits.
For CONST (and SLOPE with flat w), this produces accurate values.
SLOPE is still wrong in its handling of w, and AUTO seems to calculate
using a different and less accurate ramp.  But they both produce values
with 16 steps, in any case.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
d6b6ef4cb1 softgpu: Correct nearest filtering too.
Turns out to have the same behavior as linear, when it comes to the
subpixel offset.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
1dfaea9062 softgpu: Remove no longer possible report.
Also, it's known how this behaves, now.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
75f105f84b softgpu: Make linear filtering more accurate.
This matches tests for various u/v offsets and x/y subpixel offsets.
Mipmaps are probably still wrong.
2021-12-27 11:37:32 -08:00
Unknown W. Brackets
3cd19b02ac samplerjit: Handle unswizzled offsets too. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
820361f34b samplerjit: Calculate texel byte offset as vector. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
4d6a2f3919 samplerjit: Blend linear using integers. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
6f4e735757 samplerjit: Accumulate results in an XMM. 2021-12-27 11:37:32 -08:00