Commit graph

277 commits

Author SHA1 Message Date
ReinUsesLisp
389cb51a33 shader/decode: Fix constant buffer offsets
Some instances were using cbuf34.offset instead of cbuf34.GetOffset().
This returned the an invalid offset. Address those instances and rename
offset to "shifted_offset" to avoid future bugs.
2020-02-05 12:19:09 -03:00
bunnei
a683c3c57c Merge pull request #3357 from ReinUsesLisp/bfi-rc
shader/bfi: Implement register-constant buffer variant
2020-02-04 15:14:13 -05:00
bunnei
223e535f65 Merge pull request #3356 from ReinUsesLisp/fcmp
shader/arithmetic: Implement FCMP
2020-02-04 11:36:59 -05:00
ReinUsesLisp
ca61e82f85 shader/bfi: Implement register-constant buffer variant
It's the same as the variant that was implemented, but it takes the
operands from another source.
2020-01-27 01:20:38 -03:00
ReinUsesLisp
098218ff4c shader/arithmetic: Implement FCMP
Compares the third operand with zero, then selects between the first and
second.
2020-01-27 01:15:44 -03:00
ReinUsesLisp
0d8f0ad3b3 shader/memory: Implement ATOM.ADD
ATOM operates atomically on global memory. For now only add ATOM.ADD
since that's what was found in commercial games.

This asserts for ATOM.ADD.S32 (handling the others as unimplemented),
although ATOM.ADD.U32 shouldn't be any different.

This change forces us to change the default type on SPIR-V storage
buffers from float to uint. We could also alias the buffers, but it's
simpler for now to just use uint. While we are at it, abstract the code
to avoid repetition.
2020-01-26 01:54:24 -03:00
ReinUsesLisp
c4fd02b47f shader/memory: Implement ATOMS.ADD.U32 2020-01-16 17:30:55 -03:00
bunnei
0f250f4a1f Merge pull request #3239 from ReinUsesLisp/p2r
shader/p2r: Implement P2R Pr
2019-12-31 20:37:16 -05:00
bunnei
9d254bf997 Merge pull request #3228 from ReinUsesLisp/ptp
shader/texture: Implement AOFFI and PTP for TLD4 and TLD4S
2019-12-26 21:43:44 -05:00
ReinUsesLisp
0544c7ebce shader/r2p: Refactor P2R to support P2R 2019-12-20 17:55:42 -03:00
ReinUsesLisp
6acbec5666 shader_bytecode: Fix TLD4S encoding 2019-12-17 23:32:10 -03:00
ReinUsesLisp
ac847a8cca shader/texture: Implement TLD4.PTP 2019-12-16 04:09:24 -03:00
Fernando Sahmkow
e47f66ac4b Shader_Ir: Correct TLD4S encoding and implement f16 flag. 2019-12-11 19:53:17 -04:00
ReinUsesLisp
6e95568616 shader: Implement MEMBAR.GL
Implement using memoryBarrier in GLSL and OpMemoryBarrier on SPIR-V.
2019-12-10 16:45:03 -03:00
ReinUsesLisp
243a33aba9 shader_ir/memory: Implement patch stores 2019-12-09 23:25:21 -03:00
ReinUsesLisp
959ac10dc8 shader_bytecode: Remove corrupted character 2019-12-06 20:31:56 -03:00
Fernando Sahmkow
206d13c987 Shader_IR: Implement TXD instruction. 2019-11-14 11:15:27 -04:00
Fernando Sahmkow
6267529837 Shader_IR: Implement FLO instruction. 2019-11-14 11:15:27 -04:00
Fernando Sahmkow
cb07d60362 Shader_Bytecode: Add encodings for FLO, SHF and TXD 2019-11-14 11:15:26 -04:00
Fernando Sahmkow
dfaeb0a97d Merge pull request #3081 from ReinUsesLisp/fswzadd-shuffles
shader: Implement FSWZADD and reimplement SHFL
2019-11-14 10:27:27 -04:00
ReinUsesLisp
905cc250a4 video_core: Silence implicit conversion warnings 2019-11-08 22:48:50 +00:00
ReinUsesLisp
bb94bcc991 shader_ir/warp: Implement FSWZADD 2019-11-07 20:08:41 -03:00
Fernando Sahmkow
d65eed3b61 Shader_IR: Fix TLD4 and add Bindless Variant.
This commit fixes an issue where not all 4 results of tld4 were being
written, the color component was defaulted to red, among other things.
It also implements the bindless variant.
2019-10-30 12:02:03 -04:00
Lioncash
f1443d2b41 shader_bytecode: Make Matcher constexpr capable
Greatly shrinks the amount of generated code for GetDecodeTable().

Collapses an assembly output of 9000+ lines down to ~3621 with Clang,
and 6513 down to ~2616 with GCC, given it's now allowed to construct all
the entries as a sequence of constant data.
2019-10-24 01:10:10 -04:00
bunnei
6deb6d2b10 Merge pull request #2869 from ReinUsesLisp/suld
shader/image: Implement SULD and fix SUATOM
2019-09-23 21:47:03 -04:00
Rodrigo Locatti
e33b9e3e6f Merge pull request #2878 from FernandoS27/icmp
shader_ir: Implement ICMP
2019-09-21 18:06:07 -03:00
ReinUsesLisp
79a7463f4c gl_shader_decompiler: Use uint for images and fix SUATOM
In the process remove implementation of SUATOM.MIN and SUATOM.MAX as
these require a distinction between U32 and S32. These have to be
implemented with imageCompSwap loop.
2019-09-21 17:33:52 -03:00
ReinUsesLisp
331d140bb4 shader/image: Implement SULD and remove irrelevant code
* Implement SULD as float.
* Remove conditional declaration of GL_ARB_shader_viewport_layer_array.
2019-09-21 17:32:48 -03:00
ReinUsesLisp
dfe69a7f19 shader_bytecode: Add SULD encoding 2019-09-21 17:31:46 -03:00
Fernando Sahmkow
f02b9d37f0 Shader_IR: ICMP corrections and fixes 2019-09-21 14:28:03 -04:00
Fernando Sahmkow
01b8a78a8a Shader_IR: Implement ICMP. 2019-09-19 20:56:29 -04:00
ReinUsesLisp
42815d1d24 shader_ir/warp: Implement SHFL 2019-09-17 17:44:07 -03:00
ReinUsesLisp
2e6bebb3d2 shader/image: Implement SUATOM and fix SUST 2019-09-10 20:22:31 -03:00
ReinUsesLisp
9b001821d9 shader/shift: Implement SHR wrapped and clamped variants
Nvidia defaults to wrapped shifts, but this is undefined behaviour on
OpenGL's spec. Explicitly mask/clamp according to what the guest shader
requires.
2019-09-04 01:55:24 -03:00
bunnei
3df0f440fd Merge pull request #2812 from ReinUsesLisp/f2i-selector
shader_ir/conversion: Implement F2I and F2F F16 selector
2019-09-03 22:35:33 -04:00
bunnei
4ae7f81090 Merge pull request #2811 from ReinUsesLisp/fsetp-fix
float_set_predicate: Add missing negation bit for the second operand
2019-09-03 22:34:34 -04:00
ReinUsesLisp
6f134adf2a shader_ir/conversion: Split int and float selector and implement F2F H1 2019-08-28 16:09:33 -03:00
ReinUsesLisp
d9ad389777 shader_ir/conversion: Implement F2I F16 Ra.H1 2019-08-27 23:40:40 -03:00
ReinUsesLisp
d490cc5285 float_set_predicate: Add missing negation bit for the second operand 2019-08-27 21:57:43 -03:00
ReinUsesLisp
67f47b2f6a shader_ir: Implement VOTE
Implement VOTE using Nvidia's intrinsics. Documentation about these can
be found here
https://developer.nvidia.com/reading-between-threads-shader-intrinsics

Instead of using portable ARB instructions I opted to use Nvidia
intrinsics because these are the closest we have to how Tegra X1
hardware renders.

To stub VOTE on non-Nvidia drivers (including nouveau) this commit
simulates a GPU with a warp size of one, returning what is meaningful
for the instruction being emulated:

* anyThreadNV(value) -> value
* allThreadsNV(value) -> value
* allThreadsEqualNV(value) -> true

ballotARB, also known as "uint64_t(activeThreadsNV())", emits

VOTE.ANY Rd, PT, PT;

on nouveau's compiler. This doesn't match exactly to Nvidia's code

VOTE.ALL Rd, PT, PT;

Which is emulated with activeThreadsNV() by this commit. In theory this
shouldn't really matter since .ANY, .ALL and .EQ affect the predicates
(set to PT on those cases) and not the registers.
2019-08-21 14:50:38 -03:00
bunnei
0d754d7a75 Merge pull request #2753 from FernandoS27/float-convert
Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.
2019-08-21 10:27:57 -04:00
ReinUsesLisp
b6272eb8e2 shader_ir: Implement NOP 2019-08-04 03:02:55 -03:00
Fernando Sahmkow
9a0fa90be2 Shader_Ir: Implement F16 Variants of F2F, F2I, I2F.
This commit takes care of implementing the F16 Variants of the 
conversion instructions and makes sure conversions are done.
2019-07-20 17:38:25 -04:00
ReinUsesLisp
edc43b2509 shader/half_set_predicate: Implement missing HSETP2 variants 2019-07-19 22:20:47 -03:00
Fernando Sahmkow
e221290cb7 Merge pull request #2695 from ReinUsesLisp/layer-viewport
gl_shader_decompiler: Implement gl_ViewportIndex and gl_Layer in vertex shaders
2019-07-15 16:28:07 -04:00
Fernando Sahmkow
43662e376e Merge pull request #2692 from ReinUsesLisp/tlds-f16
shader/texture: Add F16 support for TLDS
2019-07-14 08:44:38 -04:00
Fernando Sahmkow
d5d4cc30ec shader_ir: Implement BRX & BRA.CC 2019-07-09 08:14:37 -04:00
ReinUsesLisp
a650406899 gl_shader_decompiler: Implement gl_ViewportIndex and gl_Layer in vertex shaders
This commit implements gl_ViewportIndex and gl_Layer in vertex and
geometry shaders. In the case it's used in a vertex shader, it requires
ARB_shader_viewport_layer_array. This extension is available on AMD and
Nvidia devices (mesa and proprietary drivers), but not available on
Intel on any platform. At the moment of writing this description I don't
know if this is a hardware limitation or a driver limitation.

In the case that ARB_shader_viewport_layer_array is not available,
writes to these registers on a vertex shader are ignored, with the
appropriate logging.
2019-07-07 20:42:55 -03:00
ReinUsesLisp
48d485d6df shader/texture: Add F16 support for TLDS 2019-07-07 16:05:56 -03:00
ReinUsesLisp
7eed876cfb shader_bytecode: Include missing <array> 2019-06-24 01:51:02 -03:00