Commit graph

6950 commits

Author SHA1 Message Date
Joel Linn
e4ae1d8b2f [Base] Fix copy_and_swap_16_in_32_aligned 2022-01-22 16:18:54 +03:00
Joel Linn
0316d1a054 [Base] Tests for copy_and_swap_16_in_32_aligned 2022-01-22 16:18:54 +03:00
Joel Linn
4a288dc6bd [Base, aarch64] Add copy_and_swap NEON impls 2022-01-22 16:18:54 +03:00
Joel Linn
bfaad055a2 [Base] Add easier to debug copy_and_swap tests 2022-01-22 16:18:54 +03:00
Rick Gibbed
617b17e25b [WinKey] Fix RThumbDown being mapped to RThumbLeft 2022-01-14 16:06:40 -06:00
Wunkolo
a9a365aa32 [x64] Add GFNI-based optimization for VECTOR_SHA_V128(Int8)
In the `Int8` case of `VECTOR_SHA_V128`, when all the shift values are the same, a single `gf2p8affineqb` instruction can be emitted that performs an int8 arithmetic shift using GF(2^8) arithmetic.

More info here:
https://wunkolo.github.io/post/2020/11/gf2p8affineqb-int8-shifting/

As of now (Dec 2021), Tremont (Lakefield), Jasper Lake, Ice Lake, Tiger Lake, and Rocket Lake support GFNI.
2022-01-13 15:32:55 -06:00
Wunkolo
fba23e3e75 [x64] Add kX64EmitGFNI emitter feature-flag
This determines support for the `gf2p8affineqb` instruction. Even though `GFNI` is typically found on AVX512-enabled chips, it _is_ possible for a chip to have `GFNI` without supporting `AVX` or `AVX2` of any sort. Examples of this are Tremont (Lakefield) chips as well as Jasper Lake.

13df339fe7/GenuineIntel/GenuineIntel00806A1_Lakefield_LC_InstLatX64.txt (L1297-L1299)

13df339fe7/GenuineIntel/GenuineIntel00906C0_JasperLake_InstLatX64.txt (L1252-L1254)
2022-01-13 15:32:55 -06:00
Wunkolo
5d1b53cd6f [x64] Add VECTOR_SHA_I8_SAME_CONSTANT unit test
This is to target the new GFNI-based optimization for the Int8 case.
2022-01-13 15:32:55 -06:00
Stefan Schmidt
31c9f026c5 [UI] Force use of Xwayland when running on Wayland 2022-01-12 17:37:54 +03:00
Enrico Pozzobon
5e31429128 [WinKey] Rebindable keyboard controls. 2022-01-11 12:38:13 -06:00
gibbed
5384e0e174 [Base] Fix MICROPROFILE_PRINTF. 2022-01-11 06:09:26 -06:00
gibbed
f4d60f3fc4 [XAM] Fix xeXMsgStartIORequestEx result check. 2022-01-11 06:09:06 -06:00
Wunkolo
233ed107fe [CPU] Remove use_haswell_instructions in favor of x64_extension_mask
Rather than having a single bool to conditionally detect Haswell-level
instruction features, the granularity is increased with a new
`x64_extension_mask` where individual features within the x64 backend
can be turned on or off in a bit-mask manner. Since we have an ARM
backend on the horizon, I've added this to the new `x64`
configuration-group rather than `CPU`. This new pattern will hopefully
allow for testing to be more targeted to certain processor features and
allows the user to determine if they want certain features to be enabled
or disabled (such as avoiding BMI2 on certain AMD processors due to
pdep/pext being incredibly slow). The default configuration is to detect
and utilize all available features.
2022-01-11 03:57:32 -06:00
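The masking idea described in this commit can be sketched as follows. This is a minimal illustration, not the code from the commit: the flag names, their bit positions, and the `EffectiveFeatures` helper are all hypothetical.

```cpp
#include <cstdint>

// Hypothetical feature bits; the real flag names and values may differ.
enum X64FeatureBits : uint64_t {
  kFeatureAVX2 = 1ull << 0,
  kFeatureFMA = 1ull << 1,
  kFeatureBMI1 = 1ull << 2,
  kFeatureBMI2 = 1ull << 3,
};

// Features the emitter may use: those the host reports, filtered by the
// user-supplied mask. A mask of all ones keeps every detected feature.
constexpr uint64_t EffectiveFeatures(uint64_t detected,
                                     uint64_t extension_mask) {
  return detected & extension_mask;
}

// A host with AVX2, BMI1, and BMI2 where the user masks out BMI2.
static_assert(EffectiveFeatures(kFeatureAVX2 | kFeatureBMI1 | kFeatureBMI2,
                                ~uint64_t{kFeatureBMI2}) ==
                  (kFeatureAVX2 | kFeatureBMI1),
              "BMI2 should be filtered out by the mask");
```

Keeping detection and user preference as two separate masks means the default (all bits set) needs no special case.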
Wunkolo
37aa3d129c [x64] Explicitly handle AND_NOT dest == src1
This addresses a JIT-issue in the case that the `src1` and `dest`
register are both the same. This issue only happens in the "generic"
x86 path but not in the BMI1-accelerated path.

Thanks Rick for the extensive debugging help.

When `src1` and `dest` were the same, then the `andc` instruction at
`82099A08` in title `584108FF` might emit the following assembly:
```
.text:82099A08                 andc      r11, r10, r11
  |
  | Jitted
  |
  V
00000000A0011B15  mov         rbx,r10
00000000A0011B18  not         rbx
00000000A0011B1B  and         rbx,rbx
```

This was due to the src1 operand and the destination register being the
same, which used to call the "else" case in the x64 emitter when it
needs to be handled explicitly due to register aliasing/allocation.

Addresses issue #1945
2022-01-10 15:48:49 -06:00
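The aliasing hazard described in this commit can be reproduced on a toy register file. The sketch below is an assumption-laden model, not the emitter code: `RegFile`, `AndNotNaive`, and `AndNotSafe` are hypothetical names, and the "safe" variant is only one possible way to handle the aliased case.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy register file standing in for host registers (hypothetical model).
using RegFile = std::array<uint64_t, 4>;

// Naive lowering of dest = src1 & ~src2:
//   mov dest, src2 ; not dest ; and dest, src1
inline void AndNotNaive(RegFile& r, std::size_t dest, std::size_t src1,
                        std::size_t src2) {
  assert(dest < r.size() && src1 < r.size() && src2 < r.size());
  r[dest] = r[src2];
  r[dest] = ~r[dest];
  r[dest] &= r[src1];  // wrong if dest == src1: src1 was overwritten above
}

// Aliasing-aware lowering: when dest aliases src1, invert src2 in a scratch
// value first so that src1 is still intact for the final `and`.
inline void AndNotSafe(RegFile& r, std::size_t dest, std::size_t src1,
                       std::size_t src2) {
  assert(dest < r.size() && src1 < r.size() && src2 < r.size());
  if (dest == src1) {
    const uint64_t scratch = ~r[src2];
    r[dest] &= scratch;
  } else {
    AndNotNaive(r, dest, src1, src2);
  }
}
```

With `dest == src1`, the naive sequence degenerates to `~src2 & ~src2`, which matches the `and rbx,rbx` in the listing above.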
gibbed
975eadf17e [Kernel] Assert export function return/arg types. 2022-01-09 14:16:37 -06:00
gibbed
12ec728989 [Kernel] Use tables for export groups. 2022-01-09 14:16:37 -06:00
gibbed
3ad0a7dab2 [Kernel] Suffix export functions with _entry. 2022-01-09 12:17:03 -06:00
Rick Gibbed
ce1a84375b Remove FUNDING.yml.
File has been moved to organization-wide repository.

https://github.com/xenia-project/.github/FUNDING.yml
2022-01-09 12:06:34 -06:00
Triang3l
14b69fdb00 [GPU] vfetch_full fetching nothing still must calculate the address 2022-01-09 16:26:05 +03:00
Triang3l
d6188c5d7e [GPU] Reuse base+index*stride in vfetch_mini instead of reloading the index GPR
The wheel shader in 4D530910 does vfetch_full to r0 with the index from r0.x, and then vfetch_mini.
Thanks @Gliniak for the finding :3
Also small formatting cleanup in commented-out code.
2022-01-09 14:58:38 +03:00
gibbed
600c14b3f0 [xboxkrnl] Implement ExTryToAcquireRWLShared.
[xboxkrnl] Implement ExTryToAcquireReadWriteLockShared.
2022-01-07 10:22:48 -06:00
gibbed
1f9c434b5e [xboxkrnl] Implement ExAcquireRWLShared.
[xboxkrnl] Implement ExAcquireReadWriteLockShared.
2022-01-07 10:22:48 -06:00
gibbed
3162a6435c [xboxkrnl] Implement ExTryToAcquireRWLExclusive.
[xboxkrnl] Implement ExTryToAcquireReadWriteLockExclusive.
2022-01-07 10:22:48 -06:00
gibbed
e795337071 [xboxkrnl] ExReleaseReadWriteLock fixes.
[xboxkrnl] ExReleaseReadWriteLock fixes:
- Don't unnecessarily double-load lock members.
- Reset readers entry count when lock count becomes negative.
- Properly decrease writers waiting count when writer event fired.
2022-01-07 10:22:48 -06:00
gibbed
b4f35635c5 [xboxkrnl] ExAcquireReadWriteLockExclusive fixes.
[xboxkrnl] ExAcquireReadWriteLockExclusive fixes:
- Don't unnecessarily double-load lock count.
- Don't release spin lock before we're done with the lock.
2022-01-07 10:22:48 -06:00
gibbed
fa774f1d86 [xboxkrnl] Fix up XexGetProcedureAddress logging.
[xboxkrnl] Fix up XexGetProcedureAddress failure logging.
2022-01-07 09:35:43 -06:00
Wunkolo
4303f6b200 [x64] Fix OPCODE_AND_NOT src1-constant case
Fix the case where src1 is constant and src2 is non-constant causing
an assert due to trying to call `.constant()` on the src2 operand.
Relates to an issue Gliniak was encountering where title `4D53082D`
hits an assert. Also includes a BMI1-acceleration in the 64-bit
case where a temporary register is needed (the `and` x86 instruction only
supports immediate constants up to 32-bits).
2022-01-06 13:00:58 -06:00
Gliniak
20fe7bc4b7 [Kernel/XMP] Send correct notification when playback controller is changed
- Changed locked into playback_client enumerator
- Changed vague notification name to something more descriptive
2022-01-04 16:22:57 -06:00
Gliniak
1ba4fbec17 [Kernel/XMP] Remove responsibility of stopping audio when controller is changed 2022-01-04 16:22:57 -06:00
Margen67
6ea8e043f3 [AppVeyor] Cleanup
Remove unneeded init:
 As far as I can tell this is a leftover from the appveyor.yml Reference: https://www.appveyor.com/docs/appveyor-yml/
Cleanup command blocks:
 cmd is the default shell, so it doesn't have to be specified.
 Use proper multi-line syntax for install.
Make configuration into one line:
 This reduces line count, but is mainly personal preference.
2022-01-04 16:18:28 -06:00
Wunkolo
24d4e1e0e5 [x64] Add BMI1-based acceleration for AndNot
In the case of having two register operands for `AndNot`, the `andn` instruction can be used when the host supports `BMI1`. `andn` only supports 32-bit and 64-bit operands, so some register up-casting is needed.
2022-01-04 16:16:49 -06:00
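The up-casting mentioned in this commit can be modeled in portable C++. This is a sketch under assumptions: `Andn32` only models the result of the 32-bit `andn` form (it does not use the intrinsic), `AndNot8` is a hypothetical helper, and the `src1 & ~src2` operand order for `AndNot` is taken from the `andc` example elsewhere in this log.

```cpp
#include <cstdint>

// Models the result of the x86 `andn` instruction: (~a) & b.
// Only 32-bit and 64-bit forms of the instruction exist.
constexpr uint32_t Andn32(uint32_t a, uint32_t b) { return ~a & b; }

// dest = src1 & ~src2 for 8-bit values using the 32-bit operation:
// widen both operands, pass them in andn's (inverted-first) order,
// then truncate the result back to 8 bits.
constexpr uint8_t AndNot8(uint8_t src1, uint8_t src2) {
  return static_cast<uint8_t>(
      Andn32(static_cast<uint32_t>(src2), static_cast<uint32_t>(src1)));
}

static_assert(AndNot8(0xFF, 0x0F) == 0xF0, "low nibble should be cleared");
```

Because the widened `src1` has zero upper bits, the inverted upper bits of `src2` never leak into the truncated result.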
Wunkolo
3ab43d480d [x64] Add kX64EmitBMI1 feature-flag and detection
The `BMI1` feature fits into the current pattern of `use_haswell_instructions`, as BMI1 was only introduced in Haswell.

Also moved the aliases to the end of the enum rather than interleave it with the bit definitions.
2022-01-04 16:16:49 -06:00
Wunkolo
0fdb855a11 [JIT, x64] Add and implement OPCODE_AND_NOT
Verified the x64 implementation using `xenia-cpu-ppc-tests`.
2022-01-04 16:16:49 -06:00
Joel Linn
4f258b2ee9 [GPU, Vulkan] Fix typo in non AMD64 code
* `copy_and_swap_16_unaligned` -> `copy_cmp_swap_16_unaligned`.
2022-01-02 16:47:05 -06:00
Joel Linn
59cce10ae1 [CI, Drone] Add Android NDK builds 2022-01-02 16:47:05 -06:00
Joel Linn
657645fb2c [CI, Drone] Add GCC builds
* Switch to starlark language to simplify configuration
* Use image `xeniaproject/buildenv:2022-01-01`
2022-01-02 16:47:05 -06:00
Joel Linn
b2e51fd24f [xenia-build] Update clang-format version to 13 2022-01-02 16:47:05 -06:00
Rick Gibbed
7fc93185f2 Replace premake5 binary with CI artifact. 2022-01-02 15:49:18 -06:00
Rick Gibbed
094c20bd4e Update premake-core. 2022-01-02 15:43:59 -06:00
Wunkolo
13a48e13bd [Base] Add operator<< string conversion for vec128_t
This allows `catch` to print out the contents of a particular vector when diagnosing how a `REQUIRE` expression has failed.
2022-01-02 15:14:58 -06:00
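A stream operator of the kind this commit describes might look like the sketch below. `Vec128` is a hypothetical stand-in for `vec128_t`, and the bracketed output format is an assumption, not the format used in the codebase.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for vec128_t: four 32-bit lanes.
struct Vec128 {
  uint32_t u32[4];
};

// Formats exactly four lanes as zero-padded uppercase hex.
inline std::string ToHexString(const Vec128& v) {
  char buf[48];
  const int n = std::snprintf(
      buf, sizeof(buf), "[%08X, %08X, %08X, %08X]",
      static_cast<unsigned>(v.u32[0]), static_cast<unsigned>(v.u32[1]),
      static_cast<unsigned>(v.u32[2]), static_cast<unsigned>(v.u32[3]));
  assert(n > 0 && n < static_cast<int>(sizeof(buf)));
  (void)n;  // silence unused-variable warnings when asserts are disabled
  return std::string(buf);
}

// Lets stream-based tools (such as a test framework's failure output)
// print the vector contents.
inline std::ostream& operator<<(std::ostream& os, const Vec128& v) {
  return os << ToHexString(v);
}

// Mimics what a test framework does when it stringifies a value.
inline std::string Describe(const Vec128& v) {
  std::ostringstream os;
  os << v;
  return os.str();
}
```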
Wunkolo
f645c3ba31 [Base] Fix to_hex_string out-of-indexing for vec128_t type
Trying to print five `{:08X}` when vec128_t only has four values. 🥴
2022-01-02 15:14:58 -06:00
Wunkolo
5317907523 [x64] Add kX64EmitAVX512* feature-flags
Implements the detection of some baseline `AVX512` subsets and some common aliases into `X64EmitterFeatureFlags`.

So far, `AVX512{F,VL,BW,DQ}` are the only subsets of `AVX512` that are detected with this PR since I anticipate these are the ones that will actually be used a lot in the x64 backend. Some aliases are also implemented such as `kX64EmitAVX512Ortho` which is `AVX512F` and `AVX512VL` combined which are the two subsets of AVX512 required to allow for `AVX512` operations upon `ymm` and `xmm` registers.

These aliases can possibly be collapsed, since we could just always require `AVX512VL` to be supported to allow for _any_ kind of `AVX512` to be used: we will practically always want to use `AVX512` on `xmm` registers at the very least, and there is no use-case where we want to use the 512-bit `zmm` registers exclusively.
2022-01-02 11:52:31 -06:00
Wunkolo
1a8068b151 [Base] Add user-literals for several memory sizes
Rather than using `n * 1024 * 1024`, this adds a convenient `_MiB`/`_KiB` user-literal to the new `literals.h` header to concisely describe units of memory in a much more readable way. Any other useful literals can be added to this header. These literals exist in the `xe::literals` namespace so they are opt-in, similar to `std::chrono` literals, and require a `using namespace xe::literals` statement to utilize it within the current scope.

I've done a pass through the codebase to replace trivial instances of `1024 * 1024 * ...` expressions being used but avoided anything that added additional casting complexity from `size_t` to `uint32_t` and such to keep this commit concise.
2022-01-02 11:51:31 -06:00
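A minimal sketch of such literals is shown below. The `xe::literals` namespace and the `_KiB`/`_MiB` suffixes come from the commit message; the return type and the exact definitions are assumptions and may differ from the real `literals.h`.

```cpp
#include <cstddef>

namespace xe {
namespace literals {

// Assumed definitions: each literal scales its operand by a power of 1024.
constexpr std::size_t operator""_KiB(unsigned long long n) {
  return static_cast<std::size_t>(n * 1024);
}
constexpr std::size_t operator""_MiB(unsigned long long n) {
  return static_cast<std::size_t>(n * 1024 * 1024);
}

}  // namespace literals
}  // namespace xe

// Opt-in usage, similar to std::chrono literals.
inline std::size_t ExampleBufferSize() {
  using namespace xe::literals;
  return 16_MiB;  // instead of 16 * 1024 * 1024
}
```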
Wunkolo
b64b4c6761 [x64] IsFeatureEnabled: Allow parallel feature checks
Just checking if the resulting mask is non-zero means we cannot allow this function to check for multiple features in parallel. A hypothetical computer that supports FMA but not AVX2 will return `true` if you try to call `IsFeatureEnabled(kX64EmitFMA | kX64EmitAVX2)`. We should make sure all the masked flags return `true` rather than check for non-zero.

This is ramping up to allow for particular subsets of AVX512 to be checked for in parallel with a single function call.
2021-12-28 20:57:32 -06:00
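The difference between the two checks can be sketched as follows. The flag names come from the commit message, but their bit values and the two helper functions are hypothetical and only illustrate the logic.

```cpp
#include <cstdint>

// Bit values are assumed for illustration only.
enum : uint32_t {
  kX64EmitAVX2 = 1u << 0,
  kX64EmitFMA = 1u << 1,
};

// Old behavior: true if ANY requested flag is enabled.
constexpr bool AnyFeatureEnabled(uint32_t enabled, uint32_t wanted) {
  return (enabled & wanted) != 0;
}

// New behavior: true only if ALL requested flags are enabled.
constexpr bool AllFeaturesEnabled(uint32_t enabled, uint32_t wanted) {
  return (enabled & wanted) == wanted;
}

// A hypothetical host with FMA but without AVX2.
static_assert(AnyFeatureEnabled(kX64EmitFMA, kX64EmitFMA | kX64EmitAVX2),
              "the non-zero check wrongly reports success");
static_assert(!AllFeaturesEnabled(kX64EmitFMA, kX64EmitFMA | kX64EmitAVX2),
              "the equality check correctly reports failure");
```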
Gliniak
f2c0ae46c1 [Kernel] Added missing month to RtlTimeFieldsToTime
Additionally added check for highest possible month day
2021-12-22 15:02:25 +03:00
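A check for the highest possible day of a month could look like the sketch below. It is a generic Gregorian-calendar validation under assumed names (`IsLeapYear`, `DaysInMonth`, `IsValidDate`), not the code added to `RtlTimeFieldsToTime`.

```cpp
// Gregorian leap-year rule.
constexpr bool IsLeapYear(int year) {
  return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Number of days in the given month (1-12) of the given year.
constexpr int DaysInMonth(int year, int month) {
  return month == 2 ? (IsLeapYear(year) ? 29 : 28)
         : (month == 4 || month == 6 || month == 9 || month == 11) ? 30
                                                                   : 31;
}

// Rejects out-of-range months and days past the end of the month.
constexpr bool IsValidDate(int year, int month, int day) {
  return month >= 1 && month <= 12 && day >= 1 &&
         day <= DaysInMonth(year, month);
}

static_assert(IsValidDate(2020, 2, 29), "2020 is a leap year");
static_assert(!IsValidDate(2021, 2, 29), "2021 is not a leap year");
```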
Triang3l
701300e8e9 [Linux] Use sched_yield instead of the deprecated pthread_yield 2021-12-18 19:43:17 +03:00
Triang3l
a950dff87d [Vulkan] Update SPIRV-Tools fork to fix Linux building issue 2021-12-17 13:49:36 +03:00
Triang3l
39890bab6f Merge branch 'master' into vulkan 2021-12-13 22:06:09 +03:00
Dr. Chat
509a1fa386 [GPU] Fix a crash when GetWindowTitleText is called before the texture cache is initialized 2021-12-12 22:51:24 -06:00
Triang3l
95c2101ca9 Merge branch 'master' into vulkan 2021-12-12 21:32:43 +03:00