326 commits

Author SHA1 Message Date
Unknown W. Brackets
a228b2ab6c softgpu: Use cached sampler state outside jit. 2022-01-15 15:26:26 -08:00
Unknown W. Brackets
a2abf9402b softgpu: Cache line drawing state. 2022-01-15 13:17:40 -08:00
Unknown W. Brackets
58455c8cf1 softgpu: Use cached state for clear write mask. 2022-01-15 13:03:11 -08:00
Unknown W. Brackets
092b03bd67 softgpu: Move fixed blend factor to draw pix state.
This is the last of the gstate.
2022-01-15 13:03:11 -08:00
Unknown W. Brackets
0b3f096c01 softgpu: Cache strides in draw pixel state. 2022-01-15 13:03:10 -08:00
Unknown W. Brackets
970e9c2f51 softgpu: Move threading into BinManager.
This threads much more effectively, across the entire prim call.
2022-01-13 22:45:23 -08:00
Unknown W. Brackets
48ef4a18b1 softgpu: Handle scissor/range in BinManager. 2022-01-13 19:07:41 -08:00
Unknown W. Brackets
a0a9b1e89b softgpu: Add class to manage and enqueue for bins.
For now, just forwarding.
2022-01-13 09:26:59 -08:00
Unknown W. Brackets
6839aac109 Debugger: Cache list PC for softgpu tagging.
Still slow, but improved.
2022-01-12 21:23:49 -08:00
Unknown W. Brackets
d962fb35d3 softgpu: Centralize more prim drawing state. 2022-01-12 21:23:49 -08:00
Unknown W. Brackets
d06f17d27b softgpu: Move tex filter setting check to state. 2022-01-11 00:07:24 -08:00
Unknown W. Brackets
75ff3e44e6 softgpu: Move texture addresses to prim state. 2022-01-11 00:00:03 -08:00
Unknown W. Brackets
d5c5e9478e softgpu: Prepare more state per prim call. 2022-01-10 22:12:35 -08:00
Unknown W. Brackets
9ec7d65c49 softgpu: Use func IDs instead of gstate more. 2022-01-10 22:12:35 -08:00
Unknown W. Brackets
d7a82ab7b8 softgpu: Compute func IDs once per batch of verts.
This saves a decent chunk of time, especially when many verts are being
drawn.
2022-01-10 22:12:35 -08:00
Unknown W. Brackets
b915a82c41 softgpu: Correct decal doubling without alpha. 2022-01-09 12:23:55 -08:00
Henrik Rydgård
2d7a7fd34e
Merge pull request #15288 from unknownbrackets/softgpu-self
softgpu: Draw top left of rectangles first
2022-01-09 08:33:28 +01:00
Unknown W. Brackets
88ef2d1ac1 softgpu: Skip threading when rendering to self.
This will probably always be a problem to thread.
2022-01-08 21:05:08 -08:00
Unknown W. Brackets
8a00c2d233 GPU: Allow gcc/clang/icc runtime SSE4 usage.
All our builds before were only using SSE4 in jit...
2022-01-08 17:09:09 -08:00
Unknown W. Brackets
c7fc448869 softgpu: Use some SSE4 in triangle interpolation. 2022-01-08 11:38:07 -08:00
Unknown W. Brackets
3b1cc0d3b8 softgpu: Limit minX/maxX per line.
Only helps when single-threaded, though.
2022-01-08 10:04:52 -08:00
Unknown W. Brackets
7594187538 softgpu: Skip sample lookup if masked.
Was hoping that making other things faster would make this unnecessary or
worse, but it hasn't seemed to. This gives a pretty decent improvement in
most places (~4%).
2022-01-02 13:47:14 -08:00
Unknown W. Brackets
a0fe4d06bf softgpu: Stop specializing on miplevels.
Now that samplerjit is processing mips, it no longer helps.  Just
complexity now.
2022-01-02 13:47:14 -08:00
Henrik Rydgård
d3f0af7458
Merge pull request #15273 from unknownbrackets/softjit-bloom
Optimize software renderer handling of common bloom operations
2022-01-02 18:11:07 +01:00
Unknown W. Brackets
355bad666c softjit: Optimize common case bloom blending.
Bloom often uses fixed ONE + ONE, which is a lot less work for us.  And
bloom often runs over and over again on pixels, so saving work is good.
2022-01-02 08:47:04 -08:00
Unknown W. Brackets
496545e55c softgpu: Add code for tracking GPU writes.
Unfortunately, it has a pretty noticeable speed impact, even at the basic
"assume everything's written" level.  Compiled off by default, but at
least it's there.

Doesn't account for tests (i.e. alpha test skipping write) so still not
perfectly accurate.
2022-01-02 08:28:30 -08:00
Henrik Rydgård
cb1f26122d
Merge pull request #15269 from unknownbrackets/softgpu-opt
softgpu: Reduce interpolation if not needed
2022-01-02 09:47:19 +01:00
Henrik Rydgård
da38c027b5
Merge pull request #15268 from unknownbrackets/samplerjit-nearest
Implement nearest in samplerjit, like linear
2022-01-02 09:46:29 +01:00
Unknown W. Brackets
025ac99f2f softgpu: Reduce interpolation if not needed.
About 3% gain in some areas.
2022-01-01 18:34:04 -08:00
Unknown W. Brackets
40240be91c samplerjit: Update nearest args, temp disable jit.
This temporarily disables jit for nearest, but refactors to use the new
arg structure.  It now matches linear.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
06e954fe2a samplerjit: Create a separate fetch func.
This allows nearest to become more similar to linear, where it applies the
texture function.
2022-01-01 16:58:04 -08:00
Unknown W. Brackets
d41e42d247 softgpu: Correct off-by-one scissor mask.
Fixes Brave Story in the software renderer.  Was overwriting display list
data in the stride gap.
2022-01-01 16:42:36 -08:00
Unknown W. Brackets
b35ca3d472 softgpu: Cleanup min/max tri range handling.
The previous code looked like it had off-by-one errors.  This is simpler.
2022-01-01 16:42:36 -08:00
Unknown W. Brackets
12405709f0 softgpu: Skip processing scissored triangles.
If only one side was scissored (common), we might even put it on a thread,
which ended up as a lot of overhead.  Gives 3-4% improvement in some
places.
2022-01-01 16:40:34 -08:00
Unknown W. Brackets
33e9841a4a softgpu: Skip zero size triangles.
These were drawing before, incorrectly, which caused artifacts.
Noticeable in Blade Dancer.
2021-12-31 00:20:12 -08:00
Unknown W. Brackets
4bd94a4e5e samplerjit: Pass funcs as an argument.
Computing the ID shows up in some profiles, so want to avoid computing it
per thread/invocation.
2021-12-29 07:11:53 -08:00
Unknown W. Brackets
74eb450e76 samplerjit: Move texture function into jit.
Could also do this for nearest; might end up with a third set of functions
there for a direct sample lookup (for debug funcs).
2021-12-28 17:52:17 -08:00
Unknown W. Brackets
940e6bb1d7 samplerjit: Lookup both mip tex values. 2021-12-28 16:22:54 -08:00
Unknown W. Brackets
6b55d328e5 samplerjit: Use regcache for linear filtering.
This makes it easier to reuse for mipmap filtering.
2021-12-28 15:37:25 -08:00
Unknown W. Brackets
a4558a5736 samplerjit: Take texptr/bufw as arrays.
Prep for moving mip map sampling into linear.
2021-12-28 12:04:16 -08:00
Unknown W. Brackets
a84accf713 samplerjit: Move S/T calculation into jit.
Gives a pretty decent 5-10% improvement in many places.
2021-12-28 09:58:23 -08:00
Unknown W. Brackets
9cc0883d53 softgpu: Correct non-SSE T clamp. 2021-12-27 15:31:37 -08:00
Unknown W. Brackets
39d5b1c221 softgpu: Reduce mipmap fraction to 4 bits.
For CONST (and SLOPE with flat w), this produces accurate values.
SLOPE is still wrong in its handling of w, and AUTO seems to calculate
using a different and less accurate ramp.  But they both produce values
with 16 steps, in any case.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
d6b6ef4cb1 softgpu: Correct nearest filtering too.
Turns out to have the same behavior as linear, when it comes to the
subpixel offset.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
1dfaea9062 softgpu: Remove no longer possible report.
Also, it's now known how this behaves.
2021-12-27 11:37:33 -08:00
Unknown W. Brackets
75f105f84b softgpu: Make linear filtering more accurate.
This matches tests for various u/v offsets and x/y subpixel offsets.
Mipmaps are probably still wrong.
2021-12-27 11:37:32 -08:00
Unknown W. Brackets
b00a66e34c samplerjit: Pass u/v coords as vector. 2021-12-27 11:37:32 -08:00
Unknown W. Brackets
3180e6c043 softgpu: Correct alpha on add + invalid texfuncs. 2021-12-05 16:28:37 -08:00
Unknown W. Brackets
325a1f75aa softgpu: Match texenv blend texfunc accurately. 2021-12-05 16:09:26 -08:00
Unknown W. Brackets
0b6e7c421f softgpu: Make decal tex func more accurate.
Tested for all values of A * B + 0 * (255 - B), as well as A * 127 + B *
(255 - 127), and matches accurately.  Spot checked other values, but not
exhaustively.
2021-12-05 13:34:19 -08:00