Subsurface scattering in Vulkan path tracing


Maybe this shows some useful math on how to weight and accumulate marched steps:

https://www.shadertoy.com/view/wdffWj

The paper and lots of follow-up work are on Keenan Crane's webpage.
Not really a path tracing topic maybe, but I guess it's the same thing.
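
For reference, the usual front-to-back weighting such marchers build on looks roughly like this. A minimal sketch: sigma_t, sampleDensity() and sampleLighting() are made-up placeholders, not taken from the linked shader:

const float sigma_t = 1.0;                          // extinction coefficient (made up)

float sampleDensity(vec3 p)  { return 0.5; }        // placeholder density field
vec3  sampleLighting(vec3 p) { return vec3(1.0); }  // placeholder in-scattered light

vec3 marchVolume(vec3 origin, vec3 dir, float stepLen, int numSteps)
{
    vec3 radiance = vec3(0.0);
    float transmittance = 1.0;

    for (int i = 0; i < numSteps; i++)
    {
        vec3 p = origin + dir * (float(i) + 0.5) * stepLen;
        float density = sampleDensity(p);
        float alpha = 1.0 - exp(-density * sigma_t * stepLen);

        // Weight each step by the transmittance accumulated so far.
        radiance += transmittance * alpha * sampleLighting(p);
        transmittance *= 1.0 - alpha;

        if (transmittance < 0.01)                   // early out once nearly opaque
            break;
    }
    return radiance;
}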

I would be interested in using this for fog.
Skin works pretty well with screen-space fakery, but for fog most games just change the color of distant stuff without blurring it as well, so the DOF-like effect is missing.
Besides fog, underwater scenes would be another application.

This could be used to generate interesting atmosphere and mood for a game.
Especially if we model the fog with some low res volume data, ideally dynamic.

I stare a lot at the morning fog covering the mountain outside. Sometimes it's so dense you can see a pretty sharp cut. One house on the mountain is clearly visible, another just 10m higher is already completely hidden.
The blur from the scattering seems subtle, looking like blurred low frequencies mixing with high-frequency detail which is still fully visible as well. It gives a kind of lonely / strange out-of-body experience, sucking your soul out into the void where nothing matters anymore. Very nice actually, not only for horror games.

But the killer application for games would be glowing projectiles flying through the fog, and you can see their light diffusing and spreading. :D


That would be awesome!

Here is my fog. It is affected by the lighting.

The code is:

void get_fog(inout float dist_color, inout float dist_opacity, vec3 origin, vec3 direction, vec3 hitPos, float hitColor, float hue, float eta)
{
	// March through the fog AABB, accumulating radiance into dist_color
	// and opacity into dist_opacity, front to back.
	vec3 start = origin;
	vec3 end = origin;

	float t0 = 0.0;
	float t1 = 0.0;
	const float target_step_length = 0.1;

	if(in_aabb(origin, aabb_min, aabb_max))
	{
		// Camera is inside the box: back the origin out far enough
		// that the intersection test finds both slab crossings.
		vec3 backout_pos = origin - direction*10000.0;

		if(BBoxIntersect(aabb_min, aabb_max, backout_pos, direction, t0, t1))
		{
			start = origin; // march from the camera itself
			end = backout_pos + direction*t1;
		}
	}
	else
	{
		// Camera is outside the box: march between the two hit points.
		if(BBoxIntersect(aabb_min, aabb_max, origin, direction, t0, t1))
		{
			start = origin + direction*t0;
			end = origin + direction*t1;
		}
	}

	// Clamp the march segment to the primary hit, offset slightly
	// along the surface normal to avoid self-intersection.
	if(distance(origin, start) > distance(origin, hitPos))
		start = hitPos + rayPayload.normal * 0.01f;

	if(distance(origin, end) > distance(origin, hitPos))
		end = hitPos + rayPayload.normal * 0.01f;

	const int num_steps = int(floor(distance(start, end) / target_step_length));

	if(num_steps >= 2)
	{
		// Renamed from 'step' to avoid shadowing the built-in step().
		const vec3 step_vec = (end - start) / float(num_steps - 1);

		vec3 curr_step = start;

		for(int j = 0; j < num_steps; j++, curr_step += step_vec)
		{
			float colour = get_omni_radiance_backward(10, 25, curr_step, hue, eta);

			// Noise-based density modulation, currently disabled:
	//		float noise = noise3(curr_step, 1.0);
	//		noise *= noise3(curr_step, 1.0/2.0);
	//		noise *= noise3(curr_step, 1.0/4.0);
	//		noise *= noise3(curr_step, 1.0/8.0);
	//		colour *= noise;

			// Front-to-back accumulation: weight each step by the
			// remaining transmittance.
			const float trans = 1.0 - clamp(dist_opacity, 0.0, 1.0);
			dist_color += colour*trans;
			dist_opacity += 0.01*colour*trans;
		}
	}

	// Tint the accumulated fog by the fog colour projected onto the hue mask.
	const vec3 mask = hsv2rgb(vec3(hue, 1.0, 1.0));
	dist_color *= (fog_colour.r*mask.r + fog_colour.g*mask.g + fog_colour.b*mask.b);
}

Yeah, that's great. :D

But for the blur from scattering you would also need to randomly nudge the ray direction with each step, I guess.
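
A per-step nudge could importance-sample a phase function such as Henyey-Greenstein. A minimal sketch; the random inputs xi and the anisotropy g are assumptions, not from any posted code:

// Returns a new direction scattered away from 'dir'.
// xi is a pair of uniform random numbers in [0,1).
// g controls anisotropy: 0 = isotropic, near 1 = strongly forward.
vec3 nudgeDirection(vec3 dir, float g, vec2 xi)
{
    // Henyey-Greenstein importance sampling of the scattering angle.
    float cosTheta;
    if (abs(g) < 1e-3)
    {
        cosTheta = 1.0 - 2.0 * xi.x;                // isotropic fallback
    }
    else
    {
        float sq = (1.0 - g * g) / (1.0 - g + 2.0 * g * xi.x);
        cosTheta = (1.0 + g * g - sq * sq) / (2.0 * g);
    }

    float sinTheta = sqrt(max(0.0, 1.0 - cosTheta * cosTheta));
    float phi = 6.2831853 * xi.y;

    // Orthonormal basis around the current direction.
    vec3 t = normalize(cross(abs(dir.x) < 0.9 ? vec3(1.0, 0.0, 0.0) : vec3(0.0, 1.0, 0.0), dir));
    vec3 b = cross(dir, t);

    return normalize(sinTheta * cos(phi) * t + sinTheta * sin(phi) * b + cosTheta * dir);
}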

I remember this paper, trying to fake it in screenspace:

https://www.researchgate.net/publication/260584488_Real-Time_Screen-Space_Scattering_in_Homogeneous_Environments

OK, this path tracer code is getting super long, and it runs at less than 1 FPS on my puny 3060.

You had once before brought up the possibility of switching from using an array to using an SSBO. Why would that make it faster? Or is this just a hunch?

This looks promising:

https://github.com/vblanco20-1/vulkan-guide/blob/4fafdfee151fea55d036c727b3b0b372d1c9239e/docs/chapter-4/storage_buffers.md

taby said:
You had once before brought up the possibility of switching from using an array to using an SSBO.

First, some nitpicking on terminology…

When you use a local array allocated in a shader, unique to a single thread, it is not really an array at all, because a thread has no local memory. It only has registers, and you cannot index registers with a runtime index. So the compiler might need to emit lots of branches to find the register associated with a given index, which is why it's potentially slow.

In contrast, an SSBO is just VRAM, so it is addressable by index in O(1) like a real array.

The problem is of course that you need huge amounts of memory, e.g. one array slice for each pixel.
And VRAM is much slower than registers, even if cached.
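
To make the difference concrete, the two options look roughly like this. A sketch: MAX_VERTS, pixel_index and the buffer layout are made up for illustration:

#version 450

const int MAX_VERTS = 8;                  // made-up size

// Option 2: SSBO -- plain VRAM, O(1) addressing by index,
// but needs one slice per pixel and is far slower than registers.
// vec4 instead of vec3 avoids std430 array-stride surprises.
layout(std430, binding = 0) buffer PathVertices
{
    vec4 vertices[];                      // sized as pixel_count * MAX_VERTS
};

void demo(int i, int pixel_index)
{
    // Option 1: per-thread "local array" -- no real memory behind it,
    // only registers. Dynamic indexing may force the compiler to emit
    // branches or spill to scratch, which is the potential slowness.
    vec3 path_vertices[MAX_VERTS];
    vec3 a = path_vertices[i];            // i unknown at compile time

    // SSBO access: a simple O(1) address computation.
    vec4 b = vertices[pixel_index * MAX_VERTS + i];
}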

If I have to guess, I assume using an SSBO is not worth it, and it will end up even slower.
But I'm not sure at all. It's maybe not so much work to try it out.

But no matter what, you won't be able to make it fast enough this way.
What made realtime PT possible was mainly the progress on reducing sample counts, enabled by spatio-temporal denoising with Quake II RTX, and later additionally by ReSTIR. NVidia's next step seems to be a neural radiance cache, which is ML where the training isn't offline but aims to improve based on the current view and lighting conditions.

If you seriously want path traced games, there is no way around diving into these complex topics. It's why current games doing this usually need only 1 sample per pixel. It's also why their results are blurry, as is easy to see in current RTX Remix mods for old games.

If you just want to get rid of the array, you likely have to sacrifice the bidirectional method for those technical reasons (which might be needed anyway).
The truth is that PT conflicts with the way GPUs work, and accelerating the traceRay function is not enough to change this.
The other truth is that PT is inefficient by definition, even when ignoring current HW memory limitations. It is easy and flexible, so ideal for offline rendering, but for realtime apps we want some form of caching to share work, instead of redoing it independently for each pixel and frame again and again. But because any cache is a discretization, we must trade some accuracy for the win (which any form of spatio-temporal denoising or sample reuse does anyway, just not with ideal efficiency).

I can tell you what to do from looking into my crystal ball, telling me the future:
Don't work too hard on rendering.
Work on the game and render just boxes.
Then you enter this prompt each frame:
‘Generate an image which looks real. Red boxes should become castles, and blue boxes become knights with swords. Make up the rest yourself.’
Done. \:D/

Well, I never know myself if I'm joking or if I'm serious, currently.

It really depends on what your crystal ball shows. ; )

I have SSBOs almost working. Instead of crashing, it locks. 🙂

Any ideas on what I might be doing wrong?

https://github.com/sjhalayka/bidirectional_path_tracer/tree/c3e0bb02d6e8e7a85c4dc2c4c1c4989034ae98cb

taby said:
it locks.

In compute shaders you would need a memory barrier in the shader if you want to read data which was written by the same shader just before. If the write and the read happen in different shaders, you need the barrier on the API side, to ensure all writing shaders are done before the reading shaders start.

Maybe it's something like this. Post a link to the shader(s) in question eventually…
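
For the in-shader case it's roughly this. A compute sketch with made-up names, not your actual code:

#version 450
layout(local_size_x = 64) in;

layout(std430, binding = 0) buffer Output { float results[]; };

shared float partial_sums[64];

void main()
{
    uint i = gl_LocalInvocationID.x;
    partial_sums[i] = float(i);           // placeholder per-thread write

    // Make the shared writes visible and wait for the whole workgroup
    // before any thread reads data another thread wrote.
    memoryBarrierShared();
    barrier();

    results[gl_GlobalInvocationID.x] = partial_sums[(i + 1u) & 63u];
}

If the write and the read happen in different dispatches, the API-side equivalent is a pipeline barrier (vkCmdPipelineBarrier) with matching access masks.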

I'm just thinking, could you add the array to the ray payload?

We should keep that small, but technically it might still be better than VRAM.
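
Something like this, maybe. A sketch: the struct fields and the array size are made up:

// A small fixed-size array carried in the ray payload.
// The payload lives in fast per-thread storage, so keep it tiny --
// large payloads spill and hurt occupancy.
struct RayPayload
{
    vec3  color;
    vec3  normal;
    float dist;
    vec3  path_vertices[4];               // made-up size; keep as small as possible
};

layout(location = 0) rayPayloadEXT RayPayload rayPayload;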

