December 17th, 2024
A while ago I wrote this DirectX Raytracing (DXR) test application. It looks like this

The application is hosted on GitHub, here: https://github.com/clandrew/vapor
Time passed, and the app rotted to the point where it doesn't run anymore.
This blog post explains some background about the app, why it rotted, and how I fixed it.
About this app
This is a toy application that's an homage to a popular album cover.
I threw the Helios statue mesh together in 3DS Max; besides that, there are some textured cubes. There's Direct2D-rendered text, interop'd via 11on12 so it gets textured onto D3D12 cube geometry. Nothing too crazy. A secondary ray draws a shadow on the floor. There's a raster-based screenspace effect you can toggle on and off.
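For the curious, the 11on12 part follows the standard pattern: layer a D3D11 device over the D3D12 device and queue, wrap the D3D12 texture, and acquire/release it around the D2D drawing. A minimal sketch of that pattern (not the app's exact code; the variable names are mine, and the D2D device/bitmap setup is elided):

    // Create a D3D11 device layered over the existing D3D12 device + queue.
    ComPtr<ID3D11Device> d3d11Device;
    ComPtr<ID3D11DeviceContext> d3d11Context;
    ComPtr<ID3D11On12Device> d3d11On12Device;
    D3D11On12CreateDevice(
        d3d12Device.Get(), D3D11_CREATE_DEVICE_BGRA_SUPPORT, nullptr, 0,
        reinterpret_cast<IUnknown**>(commandQueue.GetAddressOf()), 1, 0,
        &d3d11Device, &d3d11Context, nullptr);
    d3d11Device.As(&d3d11On12Device);

    // Wrap the D3D12 texture so D3D11/D2D can render into it.
    D3D11_RESOURCE_FLAGS d3d11Flags = { D3D11_BIND_RENDER_TARGET };
    ComPtr<ID3D11Resource> wrappedTexture;
    d3d11On12Device->CreateWrappedResource(
        d3d12Texture.Get(), &d3d11Flags,
        D3D12_RESOURCE_STATE_RENDER_TARGET,          // state while acquired
        D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE,  // state after release
        IID_PPV_ARGS(&wrappedTexture));

    // Per frame: acquire, draw the text with D2D/DirectWrite, release, flush.
    d3d11On12Device->AcquireWrappedResources(wrappedTexture.GetAddressOf(), 1);
    d2dContext->BeginDraw();
    d2dContext->DrawText(text, static_cast<UINT32>(wcslen(text)),
        textFormat.Get(), &textRect, textBrush.Get());
    d2dContext->EndDraw();
    d3d11On12Device->ReleaseWrappedResources(wrappedTexture.GetAddressOf(), 1);
    d3d11Context->Flush();  // submits the D2D work to the D3D12 queue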
I wrote it back when DXR support was brand new in Windows. Back then, it seemed good.
Fast forward to recently: I cloned and built it again, tried to run it, and it just crashed on startup.
For context, I originally tested this application on an NVIDIA GeForce GTX 1070 (pre-native-RTX support). This time around, I was testing it on an AMD Radeon RX 6900 XT.
What happened between then and now
Back when I wrote it, this application originally used the D3D12 Raytracing Fallback Layer. You can see some remnants of this in the application.
Quick side note about the fallback layer: "isn't that just WARP?" The fallback layer is different from WARP, and it's also different from DXR(!) It shipped as a completely separate-but-very-similar API to DXR, separate headers and everything, calling into the D3D12 API itself. Typically you have to recompile to use the fallback layer; you can't just swap in a different DLL or change some toggle at runtime. If you squint you'll see that a few parameters are different compared to DXR. The fallback layer implemented a DXR-like interface on top of compute workloads.
While WARP acts more like a driver, the fallback layer is more like middleware. And while WARP is all CPU, the fallback layer is agnostic to that question. In practice I usually used the fallback layer on top of the GPU, though.
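To give a sense of how separate-but-similar it was, device setup with the fallback layer looked roughly like this (reconstructed from memory of the fallback layer samples, so take the exact names with a grain of salt):

    #include "D3D12RaytracingFallback.h"  // shipped with the fallback layer, not the Windows SDK

    // Instead of QueryInterface'ing the D3D12 device for ID3D12Device5, you
    // create a whole separate fallback device that wraps the real D3D12 device...
    ComPtr<ID3D12RaytracingFallbackDevice> fallbackDevice;
    D3D12CreateRaytracingFallbackDevice(
        d3d12Device.Get(), CreateRaytracingFallbackDeviceFlags::None, 0,
        IID_PPV_ARGS(&fallbackDevice));

    // ...and a wrapped command list with its own BuildRaytracingAccelerationStructure,
    // DispatchRays, etc. One visible parameter difference: acceleration structures
    // get bound through WRAPPED_GPU_POINTERs rather than raw GPU VAs.
    ComPtr<ID3D12RaytracingFallbackCommandList> fallbackCommandList;
    fallbackDevice->QueryRaytracingCommandList(
        commandList.Get(), IID_PPV_ARGS(&fallbackCommandList));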
Since the time this application was written, WARP was actually updated to support DXR.
And since the time this application was written, I updated the application itself to use DXR.
However, because of the timeline of when this was written versus the availability of actual DXR hardware, the application didn't get battle-tested on actual DXR nearly as much as it did on the fallback layer. Since the fallback layer is a totally parallel implementation, you can get some differences in behavior and levels of strictness between it and actual DXR. Also, we have more varied and more mature implementations of DXR now compared to then.
So I suspected the rot came from a combination of the app changing (being ported from the fallback layer to DXR) and the underlying environment changing (more mature DXR implementations, with more varied strictness and fault tolerance). This ended up being true.
Problem 1: Scratch resource is the wrong size
When I ran the application it just crashed silently.
To investigate this I did the first thing I always do, which is enable the SDK layers. This is validation on the CPU timeline.
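In case it's useful, turning them on is just a couple of lines before device creation:

    #include <d3d12.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    // Enable the debug layer (SDK layers) before creating the D3D12 device,
    // so API misuse gets validated and reported on the CPU timeline.
    ComPtr<ID3D12Debug> debugController;
    if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugController))))
    {
        debugController->EnableDebugLayer();
    }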
It showed me
ID3D12CommandList::BuildRaytracingAccelerationStructure: pDesc->ScratchAccelerationStructureData + SizeInBytes - 1 (0x0000000301087cc7) exceeds end of the virtual address range of Resource (0x000002532BE82EC0:'UpdateScra', GPU VA Range: 0x0000000300f8f000 - 0x0000000300f9996f). [ RESOURCE_MANIPULATION ERROR #1158: BUILD_RAYTRACING_ACCELERATION_STRUCTURE_INVALID]
Basically, this showed there was a bug in the app where the scratch resource used for the acceleration structure update was the wrong size. Scratch resource sizes are platform- and situation-dependent, so it must have been that I ‘got lucky’ when this app was run before.
This was super simple to fix: I changed it to use the correct size reported by GetRaytracingAccelerationStructurePrebuildInfo().
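That is, the scratch buffer has to be sized from what the runtime reports for these exact build inputs, along these lines (a sketch; the variable names are mine):

    #include <algorithm>

    // Ask the runtime how big the AS and scratch allocations need to be
    // for these build inputs, on this driver.
    D3D12_RAYTRACING_ACCELERATION_STRUCTURE_PREBUILD_INFO prebuildInfo = {};
    device5->GetRaytracingAccelerationStructurePrebuildInfo(&buildInputs, &prebuildInfo);

    // The initial build needs ScratchDataSizeInBytes; an update (refit) needs
    // UpdateScratchDataSizeInBytes. Cover both if the same buffer is reused.
    UINT64 scratchSize = std::max(prebuildInfo.ScratchDataSizeInBytes,
                                  prebuildInfo.UpdateScratchDataSizeInBytes);

    // Then allocate a UAV-capable buffer of scratchSize and pass its GPU VA
    // as ScratchAccelerationStructureData in the build desc.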
Problem 2: Resource binding disagreement
The application still crashed, so at this point I enabled GPU-based validation (GBV). As of the time of writing, the SDK layers' GPU-based validation offers a lot of coverage of general, pipeline-agnostic scenarios (e.g., incorrect resource barriers, attempts to use unbound resources, accesses beyond the end of a descriptor heap), while it doesn't include much in the way of DXR-specific validation, so I wasn't betting on it to show a problem of that category.
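Enabling GBV is a small extension of the debug layer setup (again, do it before creating the device, and expect things to run a lot slower):

    // GPU-based validation rides on top of the debug layer; ID3D12Debug1
    // exposes the toggle.
    ComPtr<ID3D12Debug1> debugController1;
    if (SUCCEEDED(D3D12GetDebugInterface(IID_PPV_ARGS(&debugController1))))
    {
        debugController1->EnableDebugLayer();
        debugController1->SetEnableGPUBasedValidation(TRUE);
    }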
When I ran with GBV, it showed
DescriptorTableStart: [0],
Descriptor Heap Index FromTableStart: [0],
Descriptor Type in Heap: D3D12_DESCRIPTOR_RANGE_TYPE_UAV,
Register Type: D3D12_DESCRIPTOR_RANGE_TYPE_SRV,
Index of Descriptor Range: 0, Shader Stage: PIXEL,
Root Parameter Index: [0],
Draw Index: [0],
Shader Code: PostprocessPS.hlsl(140,15-15), Asm Instruction Range: [0x22-0xffffffff], Asm Operand Index: [0], Command List: 0x000001CCFE6A2FC0:'Unnamed ID3D12GraphicsCommandList Object', Command List Type: D3D12_COMMAND_LIST_TYPE_DIRECT, SRV/UAV/CBV Descriptor Heap: 0x000001CCFE8479F0:'DescriptorHeapWrapper::m_descriptorHeap', Sampler Descriptor Heap: 0x000001CCFE8525D0:'m_samplerDescriptorHeap', Pipeline State: 0x000001CCFEA72720:'Unnamed ID3D12PipelineState Object', [ EXECUTION ERROR #939: GPU_BASED_VALIDATION_DESCRIPTOR_TYPE_MISMATCH]
So this was happening not during the ray tracing, but in the raster pass that runs right after.
This was showing a disagreement between my shader and my app code. The shader declares something at register(t0), which should correspond to an SRV, but the resource bound there was a UAV.
Generally when there are disagreements like these, the behavior is undefined.
For example, a while ago I remember seeing a bug in a D3D11 application where the C++ said a resource was a Texture2DMS, while the shader code called it a Texture2D. This resource got bound, and the shader sampled from it. On some implementations, the driver would 'figure it out' and somehow find a way to treat it as a single-sampled resource. On others, you'd get device removal. The level of fault tolerance is really up to the GPU implementation. If it's your bug, ideally you catch it proactively.
Again, I think I was ‘getting lucky’ with this before, where the underlying implementation could figure out what to do with the disagreement. Fast forward to today, the implementation I tried it on was strict.
Anyway, I fixed this by changing the resource binding to be an SRV. Easy enough.
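Concretely, the shader side declares an SRV, so the descriptor written into the heap for that slot has to be an SRV too. A sketch of what the fix amounts to (not the app's exact code; the names and format here are mine):

    // HLSL side (PostprocessPS.hlsl): the resource is read through t0, an SRV:
    //     Texture2D sceneTexture : register(t0);

    // C++ side: the bug was the equivalent of writing a UAV descriptor into
    // the heap slot that the table maps to t0. The fix: create an SRV there.
    D3D12_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
    srvDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;  // must match the resource
    srvDesc.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
    srvDesc.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
    srvDesc.Texture2D.MipLevels = 1;

    device->CreateShaderResourceView(sceneTexture.Get(), &srvDesc, descriptorHandle);
    // ...where the buggy version had created an unordered access view instead.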
Problem 3: Case of the missing geometry
After fixing the above things, the application runs and doesn't crash. That said, it doesn't yet have correct behavior.
It's supposed to look like

Instead, it looks like

The floor, the billboarded image, and the text are missing. It’s a little strange, since this demo’s acceleration structure contains 4 geometries (3 very simple ones and 1 more complicated one), and it’s the simple ones that were missing.
As a quick check, I tried the app on WARP, and the missing geometry did not repro there. It also did not repro on NVIDIA. So the problem looks specific to running the application on an AMD platform. It's likely the application is doing something incorrect that gets lucky on the other platforms, while AMD is strict about it. Whatever it is, it's not being caught by the SDK layers, so the next step is to narrow down the problem, probably with graphics debuggers.
As an educated guess, I first added some extra flushes (a UAV barrier on null) to rule out the possibility of a missing barrier. It made no difference, so that ruled that out.
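(For reference, the "UAV barrier on null" is the catch-all form: it waits on all pending UAV accesses instead of one specific resource.)

    // A UAV barrier with pResource = nullptr synchronizes against all
    // preceding UAV accesses, making it a blunt but handy diagnostic tool.
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
    barrier.UAV.pResource = nullptr;
    commandList->ResourceBarrier(1, &barrier);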
Next I forced the closest hit shader to be dead simple (return hardcoded red) and disabled culling in raygen. For this application, the closest hit shader (CHS) does a bunch of stuff to evaluate color based on material, then casts a secondary ray for the shadow. If simplifying the shaders like this showed the simple geometries in red, that would mean the problem was in the CHS or raygen.
The result looked like

Meaning, the problem was not in raygen or CHS, but something about the acceleration structure (AS).
As an additional step to narrow things down, I disabled updating of the AS, so the AS is only built once when the application launches. This makes it so the scene doesn’t animate anymore (normally the statue ‘floats’ up and down). If this were to fix it, it would tell me there’s a mistake in my updating of the AS. This too didn’t make a difference.
So the problem is not in the updating of the AS, but in the creation of the AS.
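For context, "updating" here means refitting: the same BuildRaytracingAccelerationStructure call, but with the PERFORM_UPDATE flag and a source AS. Roughly (a sketch; the variable names are mine):

    D3D12_BUILD_RAYTRACING_ACCELERATION_STRUCTURE_DESC buildDesc = {};
    buildDesc.Inputs = buildInputs;  // initial build must include ..._BUILD_FLAG_ALLOW_UPDATE
    buildDesc.DestAccelerationStructureData = asBuffer->GetGPUVirtualAddress();
    buildDesc.ScratchAccelerationStructureData = scratchBuffer->GetGPUVirtualAddress();

    if (isUpdate)
    {
        // Refit path: updates the existing AS in place instead of rebuilding.
        buildDesc.Inputs.Flags |=
            D3D12_RAYTRACING_ACCELERATION_STRUCTURE_BUILD_FLAG_PERFORM_UPDATE;
        buildDesc.SourceAccelerationStructureData = asBuffer->GetGPUVirtualAddress();
    }

    commandList4->BuildRaytracingAccelerationStructure(&buildDesc, 0, nullptr);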
With that I took a closer look at the AS.
The latest public release of PIX (version 2409.23 at the time) actually showed an empty AS with NaN bounds:

further confirming something was wrong on that front.
To get more information about the BLAS I used AMD's public tool, Radeon Raytracing Analyzer (RRA).

The BLAS tab showed the visible mesh, and not the invisible ones, as expected:

In the “Geometries” tab, I saw something reassuring. All 4 of my geometries were there with the right primitive counts.

But the “Triangles” view is where things looked wrong. All triangles showed as coming from primitive ID 1, and none from the other primitive IDs:

This means something was going wrong with the BLAS creation. Geometries with valid primitives are going in, and for three of the four, no triangles are coming out.
With that, I took a closer look at the actual descs being sent to the BLAS build.
On that front, I noticed that the order of the mesh loading seemed to matter. The mesh that works is the one that gets loaded first, at vertex buffer (VB) index 0. The subsequent meshes’ data gets appended after it, and their indices get offset accordingly in the index buffer (IB). All 4 meshes share the same VB and IB. This clued me in to something being wrong in how the descs were set up.
The problem ended up being this:
typedef struct D3D12_RAYTRACING_GEOMETRY_TRIANGLES_DESC {
    D3D12_GPU_VIRTUAL_ADDRESS Transform3x4;
    DXGI_FORMAT IndexFormat;
    DXGI_FORMAT VertexFormat;
    UINT IndexCount;
    UINT VertexCount;
    D3D12_GPU_VIRTUAL_ADDRESS IndexBuffer;
    D3D12_GPU_VIRTUAL_ADDRESS_AND_STRIDE VertexBuffer;
} D3D12_RAYTRACING_GEOMETRY_TRIANGLES_DESC;
The important field is VertexCount. The app was setting VertexCount to be the number of vertices needed for each mesh.
If you look at the DirectX Raytracing spec in the section for D3D12_RAYTRACING_GEOMETRY_TRIANGLES_DESC:
UINT VertexCount: Number of vertices (positions) in VertexBuffer. If an index buffer is present, this must be at least the maximum index value in the index buffer + 1.
VertexCount actually has a slightly different meaning from how the application was treating it: it’s more of a ‘limit’ measured from the start of the vertex buffer, not just the number of vertices belonging to that desc. For example, if a mesh consisted of only 1 vertex at position 5000 in the vertex buffer, it needs a VertexCount of 5001 (the maximum index value + 1), not 1. It’s IndexCount that would be 1.
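So for this app, where all the meshes live in one shared VB/IB and every geometry desc points at the start of the shared buffers, the fix looks something like this (a sketch; mesh.firstIndex, mesh.indexCount, and totalVertexCount are my names for the app's bookkeeping):

    D3D12_RAYTRACING_GEOMETRY_DESC geometryDesc = {};
    geometryDesc.Type = D3D12_RAYTRACING_GEOMETRY_TYPE_TRIANGLES;
    geometryDesc.Triangles.VertexBuffer.StartAddress = vertexBuffer->GetGPUVirtualAddress();
    geometryDesc.Triangles.VertexBuffer.StrideInBytes = sizeof(Vertex);
    geometryDesc.Triangles.VertexFormat = DXGI_FORMAT_R32G32B32_FLOAT;
    geometryDesc.Triangles.IndexBuffer =
        indexBuffer->GetGPUVirtualAddress() + mesh.firstIndex * sizeof(UINT32);
    geometryDesc.Triangles.IndexFormat = DXGI_FORMAT_R32_UINT;
    geometryDesc.Triangles.IndexCount = mesh.indexCount;

    // The bug: this was the per-mesh vertex count, which only works for the
    // mesh at the start of the VB. Since the indices reach into the whole
    // shared buffer, VertexCount must cover (max referenced index + 1); the
    // total vertex count of the shared VB satisfies that for every mesh.
    geometryDesc.Triangles.VertexCount = totalVertexCount;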
Once I changed VertexCount to agree with the spec, the missing geometry was fixed:

After fixing that 3rd and final problem, everything is working well again.
To download this application, see the Release posted here: