Unknown W. Brackets
167213c746
softgpu: Cache texture bufws at 16 bit.
...
Reducing the size of state a bit.
2022-09-12 21:57:00 -07:00
Unknown W. Brackets
90e009edb9
softgpu: Clamp/wrap textures at 512 pixels.
...
A texture larger than 512 is "valid", but simply wraps/clamps at 512.
Importantly, the texture coords are still calculated at the specified
size, which can be up to 32768.
2022-09-10 20:23:09 -07:00
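Note on the entry above: a minimal sketch of the described behavior, not the actual softgpu code. The helper name `ApplyTexelLimit` and its parameters are hypothetical, and power-of-two texture sizes are assumed. The coordinate is scaled by the full specified size, while the wrap or clamp is applied at 512 at most.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper for illustration only (not from the PPSSPP source).
// coord:          normalized texture coordinate (s or t)
// specified_size: dimension set by the game, assumed a power of two
// clamp:          true for clamp mode, false for wrap mode
int ApplyTexelLimit(float coord, int specified_size, bool clamp) {
    // Assumption: sizes are powers of two, so wrapping can use a mask.
    assert(specified_size > 0 && (specified_size & (specified_size - 1)) == 0);

    // The texel position is computed against the full specified size.
    int texel = static_cast<int>(coord * static_cast<float>(specified_size));

    // Wrapping and clamping never happen beyond 512 texels.
    int limit = std::min(specified_size, 512);

    if (clamp) {
        // Negative positions clamp to zero, large ones to limit - 1.
        return std::max(0, std::min(texel, limit - 1));
    }
    // Wrap by masking; the unsigned cast keeps negative values well defined.
    return static_cast<int>(static_cast<unsigned>(texel) &
                            static_cast<unsigned>(limit - 1));
}
```

With a specified size of 1024, a coordinate of 0.75 lands on texel 768, which then clamps to 511 or wraps to 256.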
Unknown W. Brackets
df1a91ee25
samplerjit: Correct nearest negative texture clamp.
...
Was not clamping to zero when negative.
2022-02-20 10:25:00 -08:00
Unknown W. Brackets
a88c9a0680
softgpu: Remove incorrect offsetting for X/Y.
2022-02-20 09:13:20 -08:00
Unknown W. Brackets
ad18833a4f
samplerjit: Fix non-SSE4 bugs in jit.
2022-02-15 20:13:38 -08:00
Unknown W. Brackets
99d7703d33
samplerjit: Precalculate DXT1/3/5 offsets.
...
This improves WALL-E by 8% overall.
2022-02-05 13:04:17 -08:00
Unknown W. Brackets
c91b51c8e1
samplerjit: Reduce DXT5 decode code size a bit.
2022-02-03 20:42:34 -08:00
Unknown W. Brackets
c2dd59084d
samplerjit: Optimize DXT calc using BMI2.
2022-02-01 00:18:56 -08:00
Unknown W. Brackets
3e4afe2a0c
samplerjit: Avoid RCX gymnastics with BMI2.
2022-01-31 22:33:09 -08:00
Unknown W. Brackets
4cadcea6da
samplerjit: Decode colors with BMI2.
...
This only happens with nearest, though, so very small benefit.
2022-01-31 22:05:34 -08:00
Unknown W. Brackets
1b2cf52bfe
samplerjit: Fix non-shared CLUT on Linux.
...
Oops, good that CI will catch this now - I've broken this more than once.
2022-01-29 22:20:46 -08:00
Unknown W. Brackets
26a8d498d7
samplerjit: Correct level lookup in nearest.
2022-01-29 20:29:43 -08:00
Unknown W. Brackets
3387ab1711
samplerjit: Fix reg corruption in DXT funcs.
...
We'd cache something in a reg, but it'd no longer be there.
2022-01-29 20:29:08 -08:00
Unknown W. Brackets
5976cad797
samplerjit: Reduce register waste.
...
A few registers were allocated longer than needed, which made needing
stack space more likely.
2022-01-29 09:47:06 -08:00
Unknown W. Brackets
eb70a90347
samplerjit: Avoid frac uv transfer to gen regs.
...
It should just stay in vec, this is more convenient anyway.
2022-01-28 23:50:54 -08:00
Unknown W. Brackets
99d6d569f0
samplerjit: Reduce transfers in nearest texel calc.
...
This benefits a few games, mostly where there's lots of UI or similar.
2022-01-24 21:28:04 -08:00
Unknown W. Brackets
c1e657ed47
samplerjit: Better vectorize UV linear calc.
...
Gives about 1-2% when mips are used.
2022-01-24 20:42:07 -08:00
Unknown W. Brackets
733046962f
samplerjit: Reuse XMM reg for sizes.
...
Gives just under 1% overall improvement in games using mips.
2022-01-24 19:01:23 -08:00
Unknown W. Brackets
d8c5c35b1a
samplerjit: Optimize texenv blending a bit.
...
This reduces to a single multiply, which is much faster.
2022-01-23 11:43:34 -08:00
Unknown W. Brackets
4262e657b4
samplerjit: Oops, forgot about 64 unpack.
...
Just a minor codegen tweak. Always forget there are more of these than
pack instructions.
2022-01-22 10:49:36 -08:00
Unknown W. Brackets
0425b8d630
samplerjit: Fix Linux stack corruption.
...
Oops, nearest was not using the red zone correctly.
2022-01-22 10:47:32 -08:00
Unknown W. Brackets
212e730e98
samplerjit: Fix some Linux register issues.
2022-01-22 00:14:15 -08:00
Unknown W. Brackets
6ec819878a
samplerjit: Reduce prolog/epilog spill.
...
Track reg usage so we only push/pop what we need.
2022-01-19 00:03:59 -08:00
Unknown W. Brackets
357e2e9d68
softjit: Simplify constant writes.
2022-01-19 00:03:59 -08:00
Unknown W. Brackets
c2985bca31
softjit: Centralize some common funcs from sampler.
...
No need to duplicate this code.
2022-01-19 00:03:59 -08:00
Unknown W. Brackets
0ba2d05da5
samplerjit: Simplify AVX shift-copies.
...
These have been the most common and the fallback is safe. Let's just add
a helper.
2022-01-17 15:15:36 -08:00
Unknown W. Brackets
d6fa301ab1
softgpu: Track CLUTs as states for binning.
...
This way we can have multiple CLUTs in process at once, which helps.
2022-01-16 08:14:09 -08:00
Unknown W. Brackets
edb79d968f
softgpu: Cache CLUT params in sampler state.
...
And now there's no more gstate for pixel drawing or sampling. Just a
little left in rasterization.
2022-01-15 18:09:09 -08:00
Unknown W. Brackets
c0e85e6170
softgpu: Move texenv color into sampler state.
2022-01-15 17:52:40 -08:00
Unknown W. Brackets
ad3635c82a
softgpu: Move tex size to cached state.
2022-01-15 17:22:43 -08:00
Unknown W. Brackets
b915a82c41
softgpu: Correct decal doubling without alpha.
2022-01-09 12:23:55 -08:00
Unknown W. Brackets
72aa4be879
samplerjit: Skip processing alpha if unused.
2022-01-09 12:23:55 -08:00
Unknown W. Brackets
fe0b3dbd01
samplerjit: Fix alpha for 565 in linear lookup.
2022-01-09 11:08:46 -08:00
Henrik Rydgård
f82f24a9bb
Merge pull request #15280 from unknownbrackets/samplerjit-dxt
...
Correct some recent regressions in samplerjit
2022-01-05 09:42:30 +01:00
Unknown W. Brackets
741a9b0a4d
samplerjit: Fix DXT compilation.
2022-01-05 00:00:03 -08:00
Unknown W. Brackets
19998976c7
samplerjit: Correct linear compile failure.
...
It was resetting to nullptr, because `nearest` was nullptr.
2022-01-04 23:58:07 -08:00
Unknown W. Brackets
2aa57679fa
softjit: Keep mip S/T calc in SIMD.
...
This is only a tiny bit faster, though.
2022-01-03 06:45:10 -08:00
Unknown W. Brackets
26e7768a67
samplerjit: Remove old linear nearest paths.
...
We only use it for DXT now, so let's not keep the dead code around.
2022-01-02 17:28:52 -08:00
Unknown W. Brackets
5e3bef7e14
samplerjit: Avoid gather if overread could crash.
...
This should be rare, but a game could easily shove a CLUT4 texture at the
end of VRAM, and then accessing the last index would segfault.
2022-01-02 17:28:52 -08:00
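Note on the entry above: a rough sketch of the kind of bounds check that would avoid the overread, not the actual samplerjit logic. VPGATHERDD loads a full 32 bits per lane, so a gather addressed at the last byte of a texture reads 3 bytes past it. The function name `GatherIsSafe` and the `region_end` parameter (one past the last readable byte) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical check for illustration only (not from the PPSSPP source).
// tex_start:  address of the first texture byte
// tex_bytes:  size of the texture data in bytes
// region_end: one past the last readable byte of the mapped region
bool GatherIsSafe(std::uintptr_t tex_start, std::size_t tex_bytes,
                  std::uintptr_t region_end) {
    if (tex_bytes == 0)
        return false;
    // A 4-byte load starting at the last texture byte ends (exclusive) at
    // tex_start + tex_bytes + 3, so that address must still be readable.
    return tex_start + tex_bytes + 3 <= region_end;
}
```

When the check fails, a caller would fall back to a non-gather path that reads only the bytes it needs.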
Unknown W. Brackets
7806dfddea
samplerjit: Use VPGATHERDD for all types.
2022-01-02 17:19:18 -08:00
Unknown W. Brackets
ce6ea8da11
samplerjit: Apply gather lookup to all CLUT4.
2022-01-02 17:19:18 -08:00
Unknown W. Brackets
22f770c828
samplerjit: Use VPGATHERDD for simple CLUT4 loads.
...
Planning to expand this to more paths.
2022-01-02 17:19:17 -08:00
Unknown W. Brackets
65c84d5dd5
samplerjit: Avoid a couple more copies in AVX.
...
From looking at assembly, just trying to keep it small.
2022-01-02 17:01:14 -08:00
Henrik Rydgård
c7062d7063
Merge pull request #15271 from unknownbrackets/samplerjit-color16
...
samplerjit: Decode colors in parallel
2022-01-02 17:55:46 +01:00
Henrik Rydgård
6fb5d82fe0
Merge pull request #15264 from unknownbrackets/samplerjit-vec
...
A couple more smaller samplerjit optimizations
2022-01-02 17:32:54 +01:00
Unknown W. Brackets
0eec4e7e4d
samplerjit: Decode colors in parallel.
...
Not used in a ton of games, but a decent improvement where it is used.
2022-01-02 08:27:55 -08:00
Unknown W. Brackets
7060035303
samplerjit: Implement nearest in jit.
...
This uses the tex func and similar within jit.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
91c9343e87
samplerjit: Refactor and reuse constant pool.
...
It's just here to be rip accessible, the fixed values can be output just
once.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
40240be91c
samplerjit: Update nearest args, temp disable jit.
...
This temporarily disables jit for nearest, but refactors to use the new
arg structure. It now matches linear.
2022-01-01 16:58:05 -08:00
Unknown W. Brackets
06e954fe2a
samplerjit: Create a separate fetch func.
...
This allows nearest to become more similar to linear, where it applies the
texture function.
2022-01-01 16:58:04 -08:00