Commit graph

12877 commits

Author SHA1 Message Date
Lioncash
6a42862a16 kernel/vm_manager: Remove usages of global system accessors
Makes the dependency on the system instance explicit within VMManager's
interface.
2019-04-16 20:02:50 -04:00
Fernando Sahmkow
ad686a3c0d Implement IsBlockContinous
This detects when a GPU memory block is not continuous within host CPU
memory.
2019-04-16 18:49:35 -04:00
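The check described in ad686a3c0d can be pictured as walking the pages a GPU range covers and confirming that each one maps to the host address directly after the previous page. The following is a minimal sketch under that assumption; `kPageSize`, `kNumPages`, `PageTable`, and `IsRangeContiguous` are hypothetical names for illustration and are not taken from the commit.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical page size and table length, chosen only for illustration.
constexpr std::size_t kPageSize = 0x1000;
constexpr std::size_t kNumPages = 8;

// Entry i holds the host offset that backs GPU page i.
using PageTable = std::array<std::uint64_t, kNumPages>;

// Returns true when [gpu_addr, gpu_addr + size) is backed by one unbroken host range.
constexpr bool IsRangeContiguous(const PageTable& table, std::uint64_t gpu_addr,
                                 std::size_t size) {
    if (size == 0) {
        return true;
    }
    const std::size_t first = static_cast<std::size_t>(gpu_addr / kPageSize);
    const std::size_t last = static_cast<std::size_t>((gpu_addr + size - 1) / kPageSize);
    if (last >= kNumPages) {
        return false; // The range runs past the end of the table.
    }
    for (std::size_t page = first; page < last; ++page) {
        // Neighbouring GPU pages must map to neighbouring host pages.
        if (table[page + 1] != table[page] + kPageSize) {
            return false;
        }
    }
    return true;
}
```

A caller could use a check like this to choose between a single bulk copy and a page-by-page copy.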
Fernando Sahmkow
56c2b0ea86 Apply const correctness to SwizzleKepler and replace u32 with size_t on iterators. 2019-04-16 12:00:46 -04:00
Fernando Sahmkow
994393bd02 Use ReadBlockUnsafe for fetching DMA CommandLists 2019-04-16 11:22:34 -04:00
Fernando Sahmkow
86d3cb5fa7 Document unsafe versions and add BlockCopyUnsafe 2019-04-16 10:11:35 -04:00
Fernando Sahmkow
cde8e7f605 Use ReadBlockUnsafe for Shader Cache 2019-04-15 23:34:03 -04:00
Fernando Sahmkow
b33c627670 Use ReadBlockUnsafe on TIC and TSC reading
Use ReadBlockUnsafe on TIC and TSC reading as memory is never flushed
from host GPU there.
2019-04-15 23:10:24 -04:00
Fernando Sahmkow
57051db434 GPU MemoryManager: Implement ReadBlockUnsafe and WriteBlockUnsafe 2019-04-15 23:01:35 -04:00
Fernando Sahmkow
525211db3b Use WriteBlock and ReadBlock. 2019-04-15 22:42:34 -04:00
bunnei
d41d65dd10 Merge pull request #2382 from lioncash/table
service: Update service function tables
2019-04-15 21:46:15 -04:00
bunnei
3c817b0304 Merge pull request #2393 from lioncash/svc
kernel/svc: Implement svcMapProcessCodeMemory/svcUnmapProcessCodeMemory
2019-04-15 21:43:56 -04:00
bunnei
7675fa7c42 Merge pull request #2398 from lioncash/boost
kernel/thread: Remove BoostPriority()
2019-04-15 21:42:16 -04:00
bunnei
789d0a28fc Merge pull request #2399 from FernandoS27/fermi-fix
Correct Pitch in Fermi2D
2019-04-15 21:41:52 -04:00
Fernando Sahmkow
15368c6070 Implement Block Linear copies in Kepler Memory. 2019-04-15 21:22:16 -04:00
ReinUsesLisp
45044529b4 vk_shader_decompiler: Add missing operations 2019-04-15 21:32:57 -03:00
ReinUsesLisp
6ea1afc2bc shader_ir/decode: Fix half float pre-operations and remove MetaHalfArithmetic
Operations done before the main half float operation (like HAdd) were
managing a packed value instead of the unpacked one. Adding an unpacked
operation allows us to drop the per-operand MetaHalfArithmetic entry,
simplifying the code overall.
2019-04-15 21:16:10 -03:00
ReinUsesLisp
7e58372bb9 gl_shader_decompiler: Fix MrgH0 decompilation
GLSL decompilation for HMergeH0 was wrong. This addresses that issue.
2019-04-15 21:16:10 -03:00
ReinUsesLisp
6d47914b88 shader_ir/decode: Implement half float saturation 2019-04-15 21:16:10 -03:00
ReinUsesLisp
9c4449696a shader_ir/decode: Reduce severity of unimplemented half-float FTZ 2019-04-15 21:16:09 -03:00
ReinUsesLisp
a87fe3ea63 renderer_opengl: Implement half float NaN comparisons 2019-04-15 21:13:26 -03:00
ReinUsesLisp
b6a805df3b shader_ir: Avoid using static on heap-allocated objects
Using static here might be faster at runtime, but it adds a heap
allocation called before main.
2019-04-15 21:12:43 -03:00
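The trade-off named in b6a805df3b can be illustrated with a small lookup table. The commit's own change is not reproduced here; this sketch only shows one common way to keep a table without any allocation during static initialization. `OpEntry`, `kOpNames`, and `OpName` are hypothetical names, and the table contents are placeholders.

```cpp
#include <array>
#include <string_view>

// A namespace-scope container such as
//     static const std::unordered_map<int, std::string> names = {...};
// is typically constructed during static initialization, so its heap
// allocations run before main() is entered.

// Alternative: a constexpr table is a compile-time constant and allocates nothing.
struct OpEntry {
    int id;
    std::string_view name;
};

constexpr std::array<OpEntry, 2> kOpNames{{
    {0, "HAdd"},
    {1, "HMergeH0"},
}};

// Linear search is fine for a handful of entries.
constexpr std::string_view OpName(int op) {
    for (const OpEntry& entry : kOpNames) {
        if (entry.id == op) {
            return entry.name;
        }
    }
    return "unknown";
}
```

For larger tables a sorted array with binary search keeps the same property while avoiding the linear scan.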
Fernando Sahmkow
73f925a949 Do some corrections in conversion shader instructions.
Corrects encodings for I2F, F2F, I2I and F2I
Implements Immediate variants of all four conversion types.
Adds assertions for unimplemented cases.
2019-04-15 19:16:27 -04:00
Cameron Cawley
ed8ae582f7 travis: Use Ninja for Travis builds 2019-04-16 01:06:34 +02:00
fearlessTobi
2e197250dc GenerateSCMRev: fix Travis compilation on repo forks 2019-04-16 00:34:22 +02:00
Lioncash
0af0b0f908 CMakeLists: Define QT_USE_QSTRINGBUILDER for the Qt target
This is a compile definition introduced in Qt 4.8 for reducing the total
potential number of strings created when performing string
concatenation. This allows for less memory churn.

This can be read about here:
https://blog.qt.io/blog/2011/06/13/string-concatenation-with-qstringbuilder/

For a change that isn't source-compatible, we only had one occurrence
that actually needed to have its type clarified, which is pretty good,
as far as transitioning goes.
2019-04-15 17:59:41 -04:00
liushuyu
831f7ffed9 travis: use prebuilt image (#3839)
* travis: use prebuilt image

* travis: use prebuilt image (MinGW)
2019-04-15 22:22:09 +02:00
Lioncash
4902eb4d01 svc: Specify handle value in thread's name
Allows the handle to be seen alongside the entry point.
2019-04-15 15:56:18 -04:00
Fernando Sahmkow
02c84726ed Correct Kepler Memory on Linear Pushes. 2019-04-15 14:51:36 -04:00
Fernando Sahmkow
0e8065d640 Support compressed formats on linear textures. 2019-04-15 13:56:09 -04:00
Lioncash
a08e56c7a7 common/{lz4_compression, zstd_compression}: Add missing header guards
These two files were missing the #pragma once directive.
2019-04-15 13:00:08 -04:00
Fernando Sahmkow
7e2bd462f9 Correct Pitch in Fermi2D 2019-04-15 12:24:29 -04:00
Lioncash
78571c84b3 kernel/thread: Remove BoostPriority()
This is a holdover from Citra that currently remains unused, so it can
be removed from the Thread interface.
2019-04-15 06:59:19 -04:00
Lioncash
6baebc3d41 kernel/thread: Remove unused guest_handle member variable
This member variable is entirely unused. It was only set but never
actually utilized. Given that, we can remove it to get rid of noise in
the thread interface.
2019-04-14 06:06:06 -04:00
ReinUsesLisp
4338b9d829 gl_shader_decompiler: Use variable AOFFI on supported hardware 2019-04-14 05:13:19 -03:00
ReinUsesLisp
79e7fb6d6f shader_ir: Implement STG, keep track of global memory usage and flush 2019-04-14 00:25:32 -03:00
bunnei
c6fff9d12c Merge pull request #2378 from lioncash/ro
ldr: Minor amendments to IPC-related parameters
2019-04-13 22:16:10 -04:00
bunnei
2ca1f24c4b Merge pull request #2373 from FernandoS27/z32
Set Pixel Format to Z32 if it's R32F and depth compare is enabled, and implement format ZF32_X24S8
2019-04-13 22:14:51 -04:00
bunnei
7b12d8d511 Merge pull request #2357 from zarroboogs/force-30fps-mode
Add a toggle to force 30FPS mode
2019-04-13 22:14:04 -04:00
bunnei
d75fb5713f Merge pull request #2381 from lioncash/fs
fsp_srv: Minor cleanup related changes
2019-04-13 22:09:58 -04:00
bunnei
39c54252f4 Merge pull request #2386 from ReinUsesLisp/shader-manager
gl_shader_manager: Move code to source file and minor clean up
2019-04-13 22:09:27 -04:00
bunnei
116f65a527 Merge pull request #2017 from jroweboy/glwidget
Frontend: Migrate to QOpenGLWindow and support shared contexts
2019-04-13 22:08:40 -04:00
bunnei
8cbba96a16 Merge pull request #2389 from FreddyFunk/rename-gamedir
ui_settings: Rename game directory variables
2019-04-13 22:06:51 -04:00
Lioncash
97ccd45bb4 kernel/svc: Implement svcUnmapProcessCodeMemory
Essentially performs the inverse of svcMapProcessCodeMemory. This unmaps
the aliasing region first, then restores the general traits of the
aliased memory.

This entails:

- Restoring Read/Write permissions to the VMA.
- Restoring its memory state to reflect it as a general heap memory region.
- Clearing the memory attributes on the region.
2019-04-12 21:56:03 -04:00
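The sequence listed in 97ccd45bb4 (unmap the alias, then restore the aliased region) can be sketched as follows. Every type and name here (`Permission`, `MemoryState`, `Vma`, `CodeMapping`, `UnmapCodeMemory`) is a hypothetical stand-in for illustration; the kernel's real VMA bookkeeping is more involved.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the kernel's VMA bookkeeping.
enum class Permission : std::uint8_t { None, Read, ReadWrite, ReadExecute };
enum class MemoryState : std::uint8_t { Unmapped, Heap, ModuleCode };

struct Vma {
    Permission permissions;
    MemoryState state;
    std::uint32_t attributes;
};

// The aliasing region (where the code was mapped) and the region it aliases.
struct CodeMapping {
    Vma alias;
    Vma source;
};

// Performs the inverse of a code-memory mapping, in the order the message gives.
constexpr CodeMapping UnmapCodeMemory(CodeMapping mapping) {
    // 1. Unmap the aliasing region first.
    mapping.alias = Vma{Permission::None, MemoryState::Unmapped, 0};

    // 2. Restore the general traits of the aliased memory.
    mapping.source.permissions = Permission::ReadWrite; // Read/Write permissions
    mapping.source.state = MemoryState::Heap;           // general heap region
    mapping.source.attributes = 0;                      // clear memory attributes
    return mapping;
}
```

Returning the updated pair by value keeps the sketch free of side effects; a real implementation would mutate the address space in place and report errors for invalid ranges.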
Lioncash
0b1ffc40a7 kernel/svc: Implement svcMapProcessCodeMemory
This is utilized for mapping code modules into memory. Notably, the
ldr service would call this in order to map objects into memory.
2019-04-12 21:55:50 -04:00
bunnei
d060da0515 Merge pull request #2391 from lioncash/scope
common/scope_exit: Replace std::move with std::forward in ScopeExit()
2019-04-12 21:52:35 -04:00
bunnei
c02a19f880 Merge pull request #2392 from lioncash/swap
common/swap: Minor cleanup and improvements to byte swapping functions
2019-04-12 21:52:16 -04:00
FreddyFunk
27f51145b5 Fix Clang Format 2019-04-12 16:40:35 +02:00
Lioncash
46a7c8826b common/swap: Improve codegen of the default swap fallbacks
Uses arithmetic that compilers can more readily recognize and optimize.
For example, rather than shifting the halves of the value and then
swapping and combining them, we can swap them in place.

e.g. for the original swap32 code on x86-64, clang 8.0 would generate:

    mov     ecx, edi
    rol     cx, 8
    shl     ecx, 16
    shr     edi, 16
    rol     di, 8
    movzx   eax, di
    or      eax, ecx
    ret

while GCC 8.3 would generate the ideal:

    mov     eax, edi
    bswap   eax
    ret

Now both generate the same optimal output.

MSVC used to generate the following with the old code:

    mov     eax, ecx
    rol     cx, 8
    shr     eax, 16
    rol     ax, 8
    movzx   ecx, cx
    movzx   eax, ax
    shl     ecx, 16
    or      eax, ecx
    ret     0

Now MSVC also generates a result similar to, and as optimal as, that of clang/GCC:

    bswap   ecx
    mov     eax, ecx
    ret     0

====

In the swap64 case, for the original code, clang 8.0 would generate:

    mov     eax, edi
    bswap   eax
    shl     rax, 32
    shr     rdi, 32
    bswap   edi
    or      rax, rdi
    ret

(almost there, but still missing the mark)

while, again, GCC 8.3 would generate the more ideal:

    mov     rax, rdi
    bswap   rax
    ret

Now clang also generates the optimal sequence for this fallback.

This is a case where MSVC unfortunately falls short: despite the new
code, it still generates a doozy of an output:

    mov     r8, rcx
    mov     r9, rcx
    mov     rax, 71776119061217280
    mov     rdx, r8
    and     r9, rax
    and     edx, 65280
    mov     rax, rcx
    shr     rax, 16
    or      r9, rax
    mov     rax, rcx
    shr     r9, 16
    mov     rcx, 280375465082880
    and     rax, rcx
    mov     rcx, 1095216660480
    or      r9, rax
    mov     rax, r8
    and     rax, rcx
    shr     r9, 16
    or      r9, rax
    mov     rcx, r8
    mov     rax, r8
    shr     r9, 8
    shl     rax, 16
    and     ecx, 16711680
    or      rdx, rax
    mov     eax, -16777216
    and     rax, r8
    shl     rdx, 16
    or      rdx, rcx
    shl     rdx, 16
    or      rax, rdx
    shl     rax, 8
    or      rax, r9
    ret     0

which is pretty unfortunate.
2019-04-12 00:07:39 -04:00
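To make the difference in 46a7c8826b concrete, here is a sketch of the two fallback styles the message contrasts: swapping each half and recombining, versus masking each byte and shifting it straight to its final position. The function names are hypothetical and the commit's exact code is not reproduced; the masks in the 64-bit version do match the constants visible in the MSVC listing above (for example, 71776119061217280 is 0x00FF000000000000).

```cpp
#include <cstdint>

constexpr std::uint16_t Swap16(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Older style: byte-swap each 16-bit half, then exchange the halves.
constexpr std::uint32_t Swap32Halves(std::uint32_t v) {
    return (static_cast<std::uint32_t>(Swap16(static_cast<std::uint16_t>(v))) << 16) |
           Swap16(static_cast<std::uint16_t>(v >> 16));
}

// Mask-and-shift style: every byte moves directly to its destination.
constexpr std::uint32_t Swap32(std::uint32_t v) {
    return ((v & 0xFF000000U) >> 24) | ((v & 0x00FF0000U) >> 8) |
           ((v & 0x0000FF00U) << 8) | ((v & 0x000000FFU) << 24);
}

constexpr std::uint64_t Swap64(std::uint64_t v) {
    return ((v & 0xFF00000000000000ULL) >> 56) | ((v & 0x00FF000000000000ULL) >> 40) |
           ((v & 0x0000FF0000000000ULL) >> 24) | ((v & 0x000000FF00000000ULL) >> 8) |
           ((v & 0x00000000FF000000ULL) << 8) | ((v & 0x0000000000FF0000ULL) << 24) |
           ((v & 0x000000000000FF00ULL) << 40) | ((v & 0x00000000000000FFULL) << 56);
}
```

Both 32-bit variants compute the same value; the point of the change is only how easily a compiler's pattern matcher can turn the expression into a single `bswap`.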
Lioncash
e49ee38660 core/core: Move process execution start to System's Load()
This gives us significantly more control over where in the
initialization process we start execution of the main process.

Previously we were running the main process before the CPU or GPU
threads were initialized (not good). This amends execution to start
after all of our threads are properly set up.
2019-04-11 22:11:41 -04:00
Lioncash
67744e08c9 core/process: Remove unideal page table setting from LoadFromMetadata()
Initially required due to the split codepath with how the initial main
process instance was initialized. We used to initialize the process
like:

Init() {
    main_process = Process::Create(...);
    kernel.MakeCurrentProcess(main_process.get());
}

Load() {
    const auto load_result = loader.Load(*kernel.GetCurrentProcess());
    if (load_result != Loader::ResultStatus::Success) {
        // Handle error here.
    }
    ...
}

which presented a problem.

Setting a created process as the main process would set the page table
for that process as the main page table. This is fine... until we get to
the part where the page table can have its size changed in the Load()
function via NPDM metadata, which can dictate either a 32-bit, 36-bit,
or 39-bit usable address space.

Now that we have full control over the process' creation in load, we can
simply set the initial process as the main process after all the loading
is done, reflecting the potential page table changes without any
special-casing behavior.

We can also remove the cache flushing within LoadModule(), as execution
wouldn't have even begun yet during all usages of this function, now
that we have the initialization order cleaned up.
2019-04-11 22:11:41 -04:00
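The ordering problem described in 67744e08c9 can be modelled in a few lines. This is a deliberately simplified sketch: `Process`, `Kernel`, `LoadFromMetadata`, `OldOrder`, and `NewOrder` are hypothetical stand-ins, the 39-bit default is an assumption made for the example, and the special-casing the commit removes is left out so that the stale value is visible.

```cpp
// Hypothetical, heavily simplified stand-ins for the emulator's types.
struct Process {
    unsigned address_space_bits = 39; // Assumed default before metadata is read.
};

struct Kernel {
    unsigned page_table_bits = 0; // Width the active page table was sized for.

    // Making a process current also installs a page table sized for it.
    constexpr void MakeCurrentProcess(const Process& process) {
        page_table_bits = process.address_space_bits;
    }
};

// Stand-in for loading NPDM metadata, which may select 32, 36, or 39 bits.
constexpr void LoadFromMetadata(Process& process, unsigned bits) {
    process.address_space_bits = bits;
}

// Old order: the page table is installed before the metadata can resize it.
constexpr unsigned OldOrder(unsigned metadata_bits) {
    Process process{};
    Kernel kernel{};
    kernel.MakeCurrentProcess(process);
    LoadFromMetadata(process, metadata_bits);
    return kernel.page_table_bits;
}

// New order: load first, then make the fully configured process current.
constexpr unsigned NewOrder(unsigned metadata_bits) {
    Process process{};
    Kernel kernel{};
    LoadFromMetadata(process, metadata_bits);
    kernel.MakeCurrentProcess(process);
    return kernel.page_table_bits;
}
```

With a 36-bit title, the old order leaves the kernel with a page table sized for the default width, while the new order picks up the width the metadata asked for without any extra fix-up step.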