Need help converting a custom shader to URP - c#

For the last few days I've been following Sebastian Lague's videos about procedural generation. My project is based on URP, but the custom shader that renders the textures onto the mesh is a surface shader written for the built-in render pipeline, so I have to convert it. I've been tinkering with Shader Graph and found a possible approach for the conversion. Still, neither the tint nor the textures get rendered. I will add both versions: the original shader and the attempted conversion.
Edit: The only thing I got from tinkering a little more is that when I manually set the "layers" value in the Shader Graph, I can change the base color of that layer. Still, it only affects that single layer, doesn't apply any textures, and only works below a certain y value in global space.
This is the code for setting the variables. It is the same for both versions:
public void ApplyToMaterial(Material material)
{
    material.SetInt("layerCount", layers.Length);
    material.SetColorArray("baseColours", layers.Select(x => x.tint).ToArray());
    material.SetFloatArray("baseStartHeights", layers.Select(x => x.startHeight).ToArray());
    material.SetFloatArray("baseBlends", layers.Select(x => x.blendStrength).ToArray());
    material.SetFloatArray("baseColourStrength", layers.Select(x => x.tintStrength).ToArray());
    material.SetFloatArray("baseTextureScales", layers.Select(x => x.textureScale).ToArray());
    Texture2DArray texturesArray = GenerateTextureArray(layers.Select(x => x.texture).ToArray());
    material.SetTexture("baseTextures", texturesArray);
    UpdateMeshHeights(material, savedMinHeight, savedMaxHeight);
}

public void UpdateMeshHeights(Material material, float minHeight, float maxHeight)
{
    savedMaxHeight = maxHeight;
    savedMinHeight = minHeight;
    material.SetFloat("minHeight", minHeight);
    material.SetFloat("maxHeight", maxHeight);
}
The original shader:
Shader "Custom/Terrain" {
    Properties {
        testTexture("Texture", 2D) = "white"{}
        testScale("Scale", Float) = 1
    }
    SubShader {
        Tags { "RenderType" = "Opaque" }
        LOD 200

        CGPROGRAM
        // Physically based Standard lighting model, and enable shadows on all light types
        #pragma surface surf Standard fullforwardshadows
        // Use shader model 3.0 target, to get nicer looking lighting
        #pragma target 3.0

        const static int maxLayerCount = 8;
        const static float epsilon = 1E-4;

        int layerCount;
        float3 baseColours[maxLayerCount];
        float baseStartHeights[maxLayerCount];
        float baseBlends[maxLayerCount];
        float baseColourStrength[maxLayerCount];
        float baseTextureScales[maxLayerCount];
        float minHeight;
        float maxHeight;

        sampler2D testTexture;
        float testScale;

        // Declaration needed by the UNITY_SAMPLE_TEX2DARRAY calls in triplanar()
        UNITY_DECLARE_TEX2DARRAY(baseTextures);

        struct Input {
            float3 worldPos;
            float3 worldNormal;
        };

        float inverseLerp(float a, float b, float value) {
            return saturate((value - a) / (b - a));
        }

        float3 triplanar(float3 worldPos, float scale, float3 blendAxes, int textureIndex) {
            float3 scaledWorldPos = worldPos / scale;
            float3 xProjection = UNITY_SAMPLE_TEX2DARRAY(baseTextures, float3(scaledWorldPos.y, scaledWorldPos.z, textureIndex)) * blendAxes.x;
            float3 yProjection = UNITY_SAMPLE_TEX2DARRAY(baseTextures, float3(scaledWorldPos.x, scaledWorldPos.z, textureIndex)) * blendAxes.y;
            float3 zProjection = UNITY_SAMPLE_TEX2DARRAY(baseTextures, float3(scaledWorldPos.x, scaledWorldPos.y, textureIndex)) * blendAxes.z;
            return xProjection + yProjection + zProjection;
        }

        void surf(Input IN, inout SurfaceOutputStandard o) {
            float heightPercent = inverseLerp(minHeight, maxHeight, IN.worldPos.y);
            float3 blendAxes = abs(IN.worldNormal);
            blendAxes /= blendAxes.x + blendAxes.y + blendAxes.z;
            for (int i = 0; i < layerCount; i++) {
                float drawStrength = inverseLerp(-baseBlends[i] / 2 - epsilon, baseBlends[i] / 2, heightPercent - baseStartHeights[i]);
                float3 baseColour = baseColours[i] * baseColourStrength[i];
                float3 textureColour = triplanar(IN.worldPos, baseTextureScales[i], blendAxes, i) * (1 - baseColourStrength[i]);
                o.Albedo = o.Albedo * (1 - drawStrength) + (baseColour + textureColour) * drawStrength;
            }
        }
        ENDCG
    }
    FallBack "Diffuse"
}
The attempt plus the shader graph:
const static int maxLayerCount = 8;
const static float epsilon = 1E-4;

float layerCount;
float3 baseColours[maxLayerCount];
float baseStartHeights[maxLayerCount];
float baseBlends[maxLayerCount];
float baseColourStrength[maxLayerCount];
float baseTextureScales[maxLayerCount];

float3 triplanar(float3 worldPos, float scale, float3 blendAxes, Texture2DArray textures, SamplerState ss, int textureIndex) {
    float3 scaledWorldPos = worldPos / scale;
    float3 xProjection = SAMPLE_TEXTURE2D_ARRAY(textures, ss, float2(scaledWorldPos.y, scaledWorldPos.z), textureIndex) * blendAxes.x;
    float3 yProjection = SAMPLE_TEXTURE2D_ARRAY(textures, ss, float2(scaledWorldPos.x, scaledWorldPos.z), textureIndex) * blendAxes.y;
    float3 zProjection = SAMPLE_TEXTURE2D_ARRAY(textures, ss, float2(scaledWorldPos.x, scaledWorldPos.y), textureIndex) * blendAxes.z;
    return xProjection + yProjection + zProjection;
}

float inverseLerp(float a, float b, float c)
{
    return saturate((c - a) / (b - a));
}

void layer_terrain_float(float3 worldPos, float heightPercent, float3 worldNormal, Texture2DArray textures, SamplerState ss, int layerCount, out float3 albedo) {
    float3 blendAxes = abs(worldNormal);
    blendAxes /= blendAxes.x + blendAxes.y + blendAxes.z;
    albedo = 0.0f;
    for (int i = 0; i < layerCount; i++) {
        float drawStrength = inverseLerp(-baseBlends[i] / 2 - epsilon, baseBlends[i] / 2, heightPercent - baseStartHeights[i]);
        float3 baseColour = baseColours[i] * baseColourStrength[i];
        float3 textureColour = triplanar(worldPos, baseTextureScales[i], blendAxes, textures, ss, i) * (1 - baseColourStrength[i]);
        albedo = albedo * (1 - drawStrength) + (baseColour + textureColour) * drawStrength;
    }
}
I've been banging my head against a wall here for a few days straight, so any help would be really appreciated.
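One way to narrow a problem like this down is to work out on the CPU what the blend loop should output for a given height, and compare that with what the Shader Graph preview shows. The following is a minimal Python sketch of the inverseLerp and layer-blend logic shared by both shader versions above, with the texture term left out (as if every tint strength were 1). The names inverse_lerp and blend_layers and the layer tuple format are assumptions made for illustration; they are not part of the project.

```python
EPSILON = 1e-4  # same value as the shader's epsilon


def inverse_lerp(a, b, value):
    """Clamp (value - a) / (b - a) to [0, 1], mirroring the shader's inverseLerp."""
    t = (value - a) / (b - a)
    return max(0.0, min(1.0, t))


def blend_layers(height_percent, layers):
    """Blend layer colours by height.

    layers is a list of (start_height, blend, colour) tuples ordered from the
    lowest layer to the highest; colour is an (r, g, b) tuple.
    """
    albedo = (0.0, 0.0, 0.0)
    for start_height, blend, colour in layers:
        strength = inverse_lerp(-blend / 2 - EPSILON, blend / 2, height_percent - start_height)
        # Same update as the shader: albedo * (1 - strength) + colour * strength
        albedo = tuple(a * (1 - strength) + c * strength for a, c in zip(albedo, colour))
    return albedo
```

If the values computed this way differ from what is rendered, the arrays set through the material are probably not reaching the custom function; if they match, the problem is more likely in the height input or the texture sampling.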


The difference in the speed of moving objects through the CPU and GPU shader in Unity

I have been testing moving a lot of objects in Unity, once through normal C# code and once through an HLSL compute shader. However, there is no difference in speed: the FPS remains the same. The two versions use different noise functions to change the position: the C# code uses the standard Mathf.PerlinNoise, while the HLSL uses a custom noise function.
Scenario 1 - Update via C# code only
Object spawn:
private GameObject prefab;

private void Start()
{
    for (int i = 0; i < 50; i++)
    {
        for (int j = 0; j < 50; j++)
        {
            GameObject createdParticle;
            createdParticle = Instantiate(prefab);
            createdParticle.transform.position = new Vector3(i * 1f, Random.Range(-1f, 1f), j * 1f);
        }
    }
}
Code to move an object via C#. This script is added to each created object:
private Vector3 position = new Vector3();

private void Start()
{
    position = new Vector3(transform.position.x, Mathf.PerlinNoise(Time.time, Time.time), transform.position.z);
}

private void Update()
{
    position.y = Mathf.PerlinNoise(transform.position.x / 20f + Time.time, transform.position.z / 20f + Time.time) * 5f;
    transform.position = position;
}
Scenario 2 - via Compute Kernel (GPGPU)
Part 1: C# client code
Object spawn, running the calculation on the shader and assigning the resulting value to the objects:
public struct Particle
{
    public Vector3 position;
    public Color color; // assumed: matches the float4 color in the kernel and the 7-float stride
}

private GameObject prefab;
private ComputeShader computeShader;
private List<GameObject> particlesList = new List<GameObject>();
private Particle[] particlesDataArray;
private ComputeBuffer computeBuffer;

private void Start()
{
    CreateParticles();
}
private void Update()
{
    UpdateParticlePosition();
}

private void CreateParticles()
{
    List<Particle> particlesDataList = new List<Particle>();
    for (int i = 0; i < 50; i++)
    {
        for (int j = 0; j < 50; j++)
        {
            GameObject createdParticle;
            createdParticle = Instantiate(prefab);
            createdParticle.transform.position = new Vector3(i * 1f, Random.Range(-1f, 1f), j * 1f);
            particlesList.Add(createdParticle);
            Particle particle = new Particle();
            particle.position = createdParticle.transform.position;
            particlesDataList.Add(particle);
        }
    }
    particlesDataArray = particlesDataList.ToArray();
    computeBuffer = new ComputeBuffer(particlesDataArray.Length, sizeof(float) * 7);
    computeBuffer.SetData(particlesDataArray);
    computeShader.SetBuffer(0, "particles", computeBuffer);
}

private void UpdateParticlePosition()
{
    computeShader.SetFloat("time", Time.time);
    computeShader.Dispatch(computeShader.FindKernel("CSMain"), particlesDataArray.Length / 10, 1, 1);
    computeBuffer.GetData(particlesDataArray); // read the results back to the CPU
    for (int i = 0; i < particlesDataArray.Length; i++)
    {
        Vector3 pos = particlesList[i].transform.position;
        pos.y = particlesDataArray[i].position.y;
        particlesList[i].transform.position = pos;
    }
}
Part 2: Compute kernel (GPGPU)
#pragma kernel CSMain

struct Particle {
    float3 position;
    float4 color;
};

RWStructuredBuffer<Particle> particles;
float time;

float mod(float x, float y)
{
    return x - y * floor(x / y);
}

float permute(float x) { return floor(mod(((x * 34.0) + 1.0) * x, 289.0)); }
float3 permute(float3 x) { return mod(((x * 34.0) + 1.0) * x, 289.0); }
float4 permute(float4 x) { return mod(((x * 34.0) + 1.0) * x, 289.0); }
float taylorInvSqrt(float r) { return 1.79284291400159 - 0.85373472095314 * r; }
float4 taylorInvSqrt(float4 r) { return float4(taylorInvSqrt(r.x), taylorInvSqrt(r.y), taylorInvSqrt(r.z), taylorInvSqrt(r.w)); }

float3 rand3(float3 c) {
    float j = 4096.0 * sin(dot(c, float3(17.0, 59.4, 15.0)));
    float3 r;
    r.z = frac(512.0 * j);
    j *= .125;
    r.x = frac(512.0 * j);
    j *= .125;
    r.y = frac(512.0 * j);
    return r - 0.5;
}

float _snoise(float3 p) {
    const float F3 = 0.3333333;
    const float G3 = 0.1666667;
    float3 s = floor(p + dot(p, float3(F3, F3, F3)));
    float3 x = p - s + dot(s, float3(G3, G3, G3));
    float3 e = step(float3(0.0, 0.0, 0.0), x - x.yzx);
    float3 i1 = e * (1.0 - e.zxy);
    float3 i2 = 1.0 - e.zxy * (1.0 - e);
    float3 x1 = x - i1 + G3;
    float3 x2 = x - i2 + 2.0 * G3;
    float3 x3 = x - 1.0 + 3.0 * G3;
    float4 w, d;
    w.x = dot(x, x);
    w.y = dot(x1, x1);
    w.z = dot(x2, x2);
    w.w = dot(x3, x3);
    w = max(0.6 - w, 0.0);
    d.x = dot(rand3(s), x);
    d.y = dot(rand3(s + i1), x1);
    d.z = dot(rand3(s + i2), x2);
    d.w = dot(rand3(s + 1.0), x3);
    w *= w;
    w *= w;
    d *= w;
    return dot(d, float4(52.0, 52.0, 52.0, 52.0));
}

[numthreads(10, 1, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    Particle particle = particles[id.x];
    float modifyTime = time / 5.0;
    float positionY = _snoise(float3(particle.position.x / 20.0 + modifyTime, 0.0, particle.position.z / 20.0 + modifyTime)) * 5.0;
    particle.position = float3(particle.position.x, positionY, particle.position.z);
    particles[id.x] = particle;
}
What am I doing wrong, why is there no increase in calculation speed? :)
Thanks in advance!
TL;DR: your GPGPU (compute shader) scenario is unoptimized, which skews your results. Consider binding a material to the computeBuffer and rendering via Graphics.DrawProcedural. That way everything stays on the GPU.
What am I doing wrong, why is there no increase in calculation speed?
Essentially, there are two parts to your problem.
(1) Reading from the GPU is slow
With most things GPU-related, you generally want to avoid reading from the GPU since it will block the CPU. This is true also for GPGPU scenarios.
If I were to hazard a guess, the culprit is the computeBuffer.GetData() call in the GPGPU (compute shader) scenario, shown below:
private void Update() { UpdateParticlePosition(); }
private void UpdateParticlePosition()
{
    computeBuffer.GetData(particlesDataArray); // <----- OUCH!
}
Unity (my emphasis):
Read data values from the buffer into an array...
Note that this function reads the data back from the GPU, which can be slow... If any GPU work has been submitted that writes to this buffer, Unity waits for the tasks to complete before it retrieves the requested data.
(2) Explicit GPU reading is not required in your scenario
I can see you are creating 2,500 "particles" where each particle is attached to a GameObject. If the intent is just to draw a simple quad, then it's more efficient to create an array of structs containing a Vector3 position and then perform a batch render call to draw all the particles in one go.
Proof: see video below of nBody simulation. 60+ FPS on 2014 era NVidia card
e.g. for my GPGPU n-Body Galaxy Simulation I do just that. Pay attention to the StarMaterial.SetBuffer("stars", _starsBuffer) during actual rendering. That tells the GPU to use the buffer that already exists on the GPU, the very same buffer that the compute shader used to move the star positions. There is no CPU reading the GPU here.
public class Galaxy1Controller : MonoBehaviour
{
    public Texture2D HueTexture;
    public int NumStars = 10000; // That's right! 10,000 stars!
    public ComputeShader StarCompute;
    public Material StarMaterial;
    private ComputeBuffer _quadPoints;
    private Star[] _stars;
    private ComputeBuffer _starsBuffer;

    private void Start()
    {
        _updateParticlesKernel = StarCompute.FindKernel("UpdateStars");
        _starsBuffer = new ComputeBuffer(NumStars, Constants.StarsStride);
        _stars = new Star[NumStars];
        // Create initial positions for stars here (not shown)
        _quadPoints = new ComputeBuffer(6, QuadStride);
        _quadPoints.SetData(...); // star quad
    }

    private void Update()
    {
        // bind resources to compute shader
        StarCompute.SetBuffer(_updateParticlesKernel, "stars", _starsBuffer);
        StarCompute.SetFloat("deltaTime", Time.deltaTime*_manager.MasterSpeed);
        StarCompute.SetTexture(_updateParticlesKernel, "hueTexture", HueTexture);
        // dispatch, launch threads on GPU
        var numberOfGroups = Mathf.CeilToInt((float) NumStars/GroupSize);
        StarCompute.Dispatch(_updateParticlesKernel, numberOfGroups, 1, 1);
        // "Look Ma, no reading from the GPU!"
    }

    private void OnRenderObject()
    {
        // bind resources to material
        StarMaterial.SetBuffer("stars", _starsBuffer);
        StarMaterial.SetBuffer("quadPoints", _quadPoints);
        // set the pass
        StarMaterial.SetPass(0);
        // draw
        Graphics.DrawProcedural(MeshTopology.Triangles, 6, NumStars);
    }
}
n-Body galaxy simulation of 10,000 stars:
I think everyone can agree that Microsoft's GPGPU documentation is pretty sparse so your best bet is to check out examples scattered around the interwebs. One that comes to mind is the excellent "GPU Ray Tracing in Unity" series over at Three Eyed Games. See the link below.
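The group-count arithmetic in the Update method above (CeilToInt of NumStars divided by GroupSize) is worth spelling out, because the question divides the item count by the group size with plain integer division instead. Here is a minimal Python sketch of the difference. The function names are hypothetical, and the group size of 256 used in the note below is an assumption, since GroupSize is not shown in the answer.

```python
def thread_group_count(item_count, group_size):
    """Smallest number of thread groups whose threads cover item_count items."""
    return (item_count + group_size - 1) // group_size  # ceiling division


def surplus_threads(item_count, group_size):
    """Threads in the last group that have no item to work on."""
    return thread_group_count(item_count, group_size) * group_size - item_count
```

For 2,500 particles and 10 threads per group both approaches give 250 groups, so the question's dispatch happens to be fine. For 10,000 stars and an assumed group size of 256, ceiling division gives 40 groups with 240 surplus threads, so the kernel would need a bounds check on id.x before touching the buffer.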
See also:
MickyD, "n-Body Galaxy Simulation using Compute Shaders on GPGPU via Unity 3D", 2014
Kuri, D, "GPU Ray Tracing in Unity – Part 1", 2018
ComputeBuffer.GetData is very slow. The CPU copies the data back from the GPU, which stalls the main thread.
Then you loop over all the transforms to change their positions. This is certainly faster than thousands of MonoBehaviours, but it is also slow.
There are two ways to optimize your code.
C# Job System + Burst
Detailed tutorial:
Use the structured buffer calculated in the compute shader without copying it back to the CPU. Here is a detailed tutorial on how to do it:

Calculating 2D Gaussian filter in Fragment Shader

I would like to calculate the 2D Gaussian function: the input is the X,Y texture UV coordinate, and the output is the corresponding Gaussian value.
I'm having difficulty working out how to get the Gaussian value that corresponds to a texel's UV.
float Gaussian2D(float x, float y)
{
    float x_y_squared = x * x + y * y;
    float stDevSquared = 2 *_2D_StandardDeviation * _2D_StandardDeviation;
    float div = x_y_squared / stDevSquared;
    float gauss = pow(E, -div);
    return gauss;
}

float Gaussian(int offset)
{
    float stDevSquared = _StandardDeviation * _StandardDeviation;
    float gauss = (1 / sqrt(2 * PI * stDevSquared)) * pow(E, -((offset * offset) / (2 * stDevSquared)));
    return gauss;
}

fixed4 frag(v2f i) : SV_Target
{
    fixed source = tex2D(_MainTex, i.uv).r;
    float g0 = Gaussian(0);
    float g1 = Gaussian(1);
    float g2 = Gaussian(2);
    float g3 = Gaussian(3);
    float g4 = Gaussian(4);
    float g5 = Gaussian(5);
    float omega = g0 + g1 + g2 + g3 + g4 + g5;
    float gauss = Gaussian2D(i.uv.x, i.uv.y);
    fixed prev_a = tex2D(_HistoryA, i.uv).r;
    fixed prev_b = tex2D(_HistoryB, i.uv).r;
    fixed prev_c = tex2D(_HistoryC, i.uv).r;
    fixed prev_d = tex2D(_HistoryD, i.uv).r;
    fixed prev_e = tex2D(_HistoryE, i.uv).r;
    fixed current = (gauss*source * g0 + gauss*prev_a * g1 + gauss*prev_b * g2 + gauss*prev_c * g3 + gauss*prev_d * g4 + gauss*prev_e * g5)/(omega);
    float diff = source - prev_a;
    if (diff <= _dataDelta)
    {
        return current;
    }
    return source;
}
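In the fragment shader above, Gaussian2D is evaluated on the absolute UV of the fragment. In a spatial Gaussian filter the weight is normally a function of the offset between the sampled texel and the centre texel, and the weights are divided by their sum so the filter does not change overall brightness. The following Python sketch shows that idea on the CPU under those assumptions. The names gaussian_2d and normalised_kernel are hypothetical, and the offsets are measured in texels, not in UV units.

```python
import math


def gaussian_2d(dx, dy, sigma):
    """Unnormalised 2D Gaussian of a texel offset (dx, dy) from the centre texel."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def normalised_kernel(radius, sigma):
    """Square kernel of side 2 * radius + 1 whose weights sum to 1."""
    offsets = range(-radius, radius + 1)
    weights = [[gaussian_2d(dx, dy, sigma) for dx in offsets] for dy in offsets]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]
```

In a shader the same weights would multiply samples taken at uv plus (dx, dy) times the texel size, which is what _MainTex_TexelSize provides in Unity.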
Update, thanks to the amazing work by Spektre:
sampler2D _MainTex;
sampler2D _HistoryA;
sampler2D _HistoryB;
sampler2D _HistoryC;
sampler2D _HistoryD;
float4 _MainTex_TexelSize;
float _dataDelta;
float _blurRadius;
float _stepsDelta;
float _resolution;
float4 _MainTex_ST;
float _StandardDeviation;
#define E 2.71828182846
#define PI 3.14159265359
v2f vert(appdata v) {
v2f o;
o.vertex = UnityObjectToClipPos(v.vertex);
o.uv = v.uv;
return o;
float Gaussian(int offset)
float stDevSquared = _StandardDeviation * _StandardDeviation;
float gauss = (1 / sqrt(2 * PI * stDevSquared)) * pow(E, -((offset * offset) / (2 * stDevSquared)));
return gauss;
float blur2d_horizontal(sampler2D tex, v2f i, float hstep, float vstep) {
float2 uv = i.uv;
float sum = 0;
float2 tc = uv;
//blur radius in pixels
float blur = _blurRadius / _resolution / 4;
sum += tex2D(tex, float2(tc.x - 4.0 * blur * hstep, tc.y - 4.0 * blur * vstep)).r * 0.0162162162;
sum += tex2D(tex, float2(tc.x - 3.0 * blur * hstep, tc.y - 3.0 * blur * vstep)).r * 0.0540540541;
sum += tex2D(tex, float2(tc.x - 2.0 * blur * hstep, tc.y - 2.0 * blur * vstep)).r * 0.1216216216;
sum += tex2D(tex, float2(tc.x - 1.0 * blur * hstep, tc.y - 1.0 * blur * vstep)).r * 0.1945945946;
sum += tex2D(tex, float2(tc.x, tc.y)).r * 0.2270270270;
sum += tex2D(tex, float2(tc.x + 1.0 * blur * hstep, tc.y + 1.0 * blur * vstep)).r * 0.1945945946;
sum += tex2D(tex, float2(tc.x + 2.0 * blur * hstep, tc.y + 2.0 * blur * vstep)).r * 0.1216216216;
sum += tex2D(tex, float2(tc.x + 3.0 * blur * hstep, tc.y + 3.0 * blur * vstep)).r * 0.0540540541;
sum += tex2D(tex, float2(tc.x + 4.0 * blur * hstep, tc.y + 4.0 * blur * vstep)).r * 0.0162162162;
return sum;
fixed4 frag(v2f i) : SV_Target {
const int m = 5;
float d = 5.0;
float z[m];
float gauss_curve[m];
float zed;
_resolution = 900;
z[0] = tex2D(_MainTex, i.uv).r;// oldest 2 frames
z[1] = tex2D(_HistoryA, i.uv).r;
if (abs(z[0] - z[1]) < _dataDelta) // threshold depth change
// z[0] = 0.0;
// 2D spatial gauss blur of z0
z[0] = blur2d_horizontal(_MainTex, i, _stepsDelta, _stepsDelta);
// fetch depths from up to m frames
z[2] = tex2D(_HistoryB, i.uv).r;
z[3] = tex2D(_HistoryC, i.uv).r;
z[4] = tex2D(_HistoryD, i.uv).r;
zed = 0.0;
gauss_curve[0] = Gaussian(0);
gauss_curve[1] = Gaussian(1);
gauss_curve[2] = Gaussian(2);
gauss_curve[3] = Gaussian(3);
gauss_curve[4] = Gaussian(4);
float sum = 0.0;
// 1D temporal gauss blur
for (int idx = 1; idx <= m; idx++)
zed += gauss_curve[idx - 1] * z[idx - 1];
zed = z[0];
return fixed4(zed, zed, zed, 0.0);
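The nine fixed weights in blur2d_horizontal above match row 12 of Pascal's triangle with the two outermost coefficients on each side dropped and the rest divided by their sum, which is a common way to approximate a Gaussian with a short kernel. The Python sketch below reproduces them; the function name binomial_taps is hypothetical, and whether the weights were originally derived this way is an assumption based only on the matching numbers.

```python
import math


def binomial_taps(row, dropped):
    """Normalised binomial coefficients of the given row, skipping `dropped` entries at each end."""
    coefficients = [math.comb(row, k) for k in range(dropped, row - dropped + 1)]
    total = sum(coefficients)
    return [c / total for c in coefficients]


taps = binomial_taps(12, 2)  # 9 weights: 66, 220, 495, 792, 924, ... over 4070
```

Because the weights sum to 1, changing the kernel length only requires picking another row and renormalising; no separate Gaussian evaluation is needed.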
OK, I think I managed to do this... well, more or less, as the equation:
is just a symbolic simplification (common in CV/DIP), not a complete equation, so it is not uniquely determined and its interpretation (and implementation) is not clear from it. However, I managed to combine the missing stuff into something like this (GLSL):
// Vertex
#version 420 core
layout(location=0) in vec4 vertex;
out vec2 pos; // screen position <-1,+1>
void main()
// Fragment
#version 420 core
in vec2 pos; // screen position <-1,+1>
out vec4 gl_FragColor; // fragment output color
uniform sampler2D txr_rgb;
uniform sampler2D txr_zed0;
uniform sampler2D txr_zed1;
uniform sampler2D txr_zed2;
uniform sampler2D txr_zed3;
uniform sampler2D txr_zed4;
uniform float xs,ys; // texture resolution
uniform float r; // blur radius
float G(float t)
return 0.0;
void main()
vec2 p;
vec4 rgb;
const int m=5;
const float Th=0.0015;
float z[m],zed;
p=0.5*(pos+1.0); // p = pos position in texture
rgb=texture2D(txr_rgb ,p); // rgb color (just for view)
z[0]=texture2D(txr_zed0,p).r; // oldest 2 frames
if (abs(z[0]-z[1])>Th) // threshold depth change
int i;
float x,y,xx,yy,rr,dx,dy,w,w0;
// 2D spatial gauss blur of z0
for (dx=1.0/xs,x=-r,p.x=0.5+(pos.x*0.5)+(x*dx);x<=r;x++,p.x+=dx){ xx=x*x;
for (dy=1.0/ys,y=-r,p.y=0.5+(pos.y*0.5)+(y*dy);y<=r;y++,p.y+=dy){ yy=y*y;
if (xx+yy<=rr)
// fetch depths from up to m frames
// 1D temporal gauss blur
for (zed=0.0,i=1;i<=m;i++) zed+=exp(0.5*float(i*i)/float(m*m))*z[i-1];
else zed=z[0];
zed*=20.0; // debug view: emphasize depth so its color is visible
// gl_FragColor=rgb; // debug view: render RGB texture
gl_FragColor=vec4(zed,zed,zed,0.0); // render resulting depth texture
I used this dataset for testing. However, the depth resolution is not very good...
Using the garlic_7_1 dataset I got this result (emphasized depth):
The temporal depth is m (hard coded) and the spatial one is r (uniform). The last m frames are passed in txr_zed0...txr_zed(m-1), where txr_zed0 is the oldest one. The threshold Th must be chosen so that the algorithm selects the correct regions!
For this to work properly, you should replace txr_zed0 with the shader's result after applying it (on the CPU side, or render to texture and then swap ids ...). Otherwise the spatial Gauss blurring will not be applied to older frames.
Here is the preview (outputting red inside the if instead of blurring) for Th=0.01:
As you can see, it selects the edges ... So the change (just for choosing Th) is:
// Fragment
#version 420 core
in vec2 pos; // screen position <-1,+1>
out vec4 gl_FragColor; // fragment output color
uniform sampler2D txr_rgb;
uniform sampler2D txr_zed0;
uniform sampler2D txr_zed1;
uniform sampler2D txr_zed2;
uniform sampler2D txr_zed3;
uniform sampler2D txr_zed4;
uniform float xs,ys; // texture resolution
uniform float r; // blur radius
float G(float t)
return 0.0;
void main()
vec2 p;
vec4 rgb;
const int m=5;
// const float Th=0.0015;
const float Th=0.01;
float z[m],zed;
p=0.5*(pos+1.0); // p = pos position in texture
rgb=texture2D(txr_rgb ,p); // rgb color (just for view)
z[0]=texture2D(txr_zed0,p).r; // oldest 2 frames
if (abs(z[0]-z[1])>Th) // threshold depth change
gl_FragColor=vec4(1.0,0.0,0.0,0.0); // debug output
int i;
float x,y,xx,yy,rr,dx,dy,w,w0;
// 2D spatial gauss blur of z0
for (dx=1.0/xs,x=-r,p.x=0.5+(pos.x*0.5)+(x*dx);x<=r;x++,p.x+=dx){ xx=x*x;
for (dy=1.0/ys,y=-r,p.y=0.5+(pos.y*0.5)+(y*dy);y<=r;y++,p.y+=dy){ yy=y*y;
if (xx+yy<=rr)
// fetch depths from up to m frames
// 1D temporal gauss blur
for (zed=0.0,i=1;i<=m;i++) zed+=exp(w0*float(i*i))*z[i-1];
else zed=z[0];
zed*=40.0; // debug view: emphasize depth so its color is visible
// gl_FragColor=rgb; // debug view: render RGB texture
gl_FragColor=vec4(zed,zed,zed,0.0); // render resulting depth texture

Unity - performant options for loading large numbers of vertices to ComputeShader for raytracing?

I've been doing some self-guided training in ComputeShaders using Daerst's awesome Unity Raytracing tutorials (from
I'm currently in the process of extending the raytracer to accept arbitrary mesh objects so I can visualize other objects, rather than just spheres and planes. I wrote my own implementation of Möller-Trumbore using the Wikipedia implementation as a guide, and that worked as expected for low numbers of triangles (on the order of 700).
However, I'm finding that writing a mesh with 80,000 vertices to the buffer takes an obscene amount of time, on the order of minutes. I know this vertex count is pretty low by rendering standards, so I figure there must be something in the way I'm handling this that's causing the performance issues.
To clarify, my performance issue isn't in terms of FPS once the mesh is loaded- for my purposes, anything greater than .1FPS is great! It's loading the mesh that feels like it's going way too slowly.
Here's my ComputeShader code:
#pragma kernel CSMain
RWTexture2D<float4> Result;
float4x4 _CameraToWorld;
float4x4 _CameraInverseProjection;
float4 _DirectionalLight;
float2 _PixelOffset;
Texture2D<float4> _SkyboxTexture;
SamplerState sampler_SkyboxTexture;
static const float PI = 3.14159265f;
float sdot(float3 x, float3 y, float f = 1.0f)
return saturate(dot(x, y) * f);
float energy(float3 color)
return dot(color, 1.0f / 3.0f);
float2 _Pixel;
float _Seed;
float rand()
float result = frac(sin(_Seed / 100.0f * dot(_Pixel, float2(12.9898f, 78.233f))) * 43758.5453f);
_Seed += 1.0f;
return result;
struct Geometry
{
    uint type; //4;
    float smoothness; //8;
    float3 albedo; //20;
    float3 specular; //32;
    float3 emission; //44;
    int3 verts; //56
};
StructuredBuffer<Geometry> _Geometries;
StructuredBuffer<float3> _Vertices;
//- RAY
struct Ray
{
    float3 origin;
    float3 direction;
    float3 energy;
};

Ray CreateRay(float3 origin, float3 direction)
{
    Ray ray;
    ray.origin = origin;
    ray.direction = direction;
    ray.energy = float3(1.0f, 1.0f, 1.0f);
    return ray;
}
Ray CreateCameraRay(float2 uv)
// Transform the camera origin to world space
float3 origin = mul(_CameraToWorld, float4(0.0f, 0.0f, 0.0f, 1.0f)).xyz;
// Invert the perspective projection of the view-space position
float3 direction = mul(_CameraInverseProjection, float4(uv, 0.0f, 1.0f)).xyz;
// Transform the direction from camera to world space and normalize
direction = mul(_CameraToWorld, float4(direction, 0.0f)).xyz;
direction = normalize(direction);
return CreateRay(origin, direction);
struct RayHit
float3 position;
float distance;
float3 normal;
float3 albedo;
float3 specular;
float smoothness;
float3 emission;
RayHit CreateRayHit()
RayHit hit;
hit.position = float3(0.0f, 0.0f, 0.0f);
hit.distance = 1.#INF;
hit.normal = float3(0.0f, 0.0f, 0.0f);
hit.albedo = float3(0.0f, 0.0f, 0.0f);
hit.specular = float3(0.0f, 0.0f, 0.0f);
hit.smoothness = 0.0f;
hit.emission = float3(0.0f, 0.0f, 0.0f);
return hit;
struct Triangle
float3 vertexA; //12
float3 vertexB; //24
float3 vertexC; //36
float3 albedo; //48
float3 specular; //60
float smoothness; //64
float3 emission; //76
float3 GetTriangleNormal(float3 vA, float3 vB, float3 vC)
return cross(vB-vA, vC-vA);
Triangle TriangleFromGeometry(Geometry geometry)
Triangle tri;
tri.albedo = geometry.albedo;
tri.specular = geometry.specular;
tri.smoothness = geometry.smoothness;
tri.emission = geometry.emission;
tri.vertexA = _Vertices[geometry.verts[0]];
tri.vertexB = _Vertices[geometry.verts[1]];
tri.vertexC = _Vertices[geometry.verts[2]];
return tri;
void IntersectTriangle(Ray ray, inout RayHit bestHit, Triangle tri)
float epsilon = 0.0000001;
float3 pA = tri.vertexA;
float3 pB = tri.vertexB;
float3 pC = tri.vertexC;
float3 edge1 = pB - pA;
float3 edge2 = pC - pA;
float3 rayVector = ray.direction;// - ray.origin;
float3 h = cross(rayVector, edge2);
float a = dot(edge1, h);
if (a > -epsilon && a < epsilon)
float f = 1/a;
float3 s = ray.origin - pA;
float u = f * dot(s, h);
if (u < 0.0f || u> 1.0f)
float3 q = cross(s, edge1);
float v = f * dot(rayVector, q);
if (v < 0.0 || u + v > 1.0)
float t = f * dot(edge2, q);
if (t > epsilon && t < bestHit.distance)
bestHit.distance = t;
bestHit.position = ray.origin + rayVector * t;
bestHit.normal = GetTriangleNormal(pA, pB, pC);
bestHit.albedo = tri.albedo;
bestHit.specular = tri.specular;
bestHit.smoothness = tri.smoothness;
bestHit.emission = tri.emission;
void IntersectGeometry(Ray ray, inout RayHit bestHit, Geometry geometry) {
if (geometry.type == 1) {
Triangle tri = TriangleFromGeometry(geometry);
IntersectTriangle(ray, bestHit,tri);
RayHit Trace(Ray ray)
RayHit bestHit = CreateRayHit();
uint numGeometries, geometryStride;
_Geometries.GetDimensions(numGeometries, geometryStride);
for (uint i = 0; i < numGeometries; i++)
IntersectGeometry(ray, bestHit, _Geometries[i]);
return bestHit;
float3x3 GetTangentSpace(float3 normal)
// Choose a helper vector for the cross product
float3 helper = float3(1, 0, 0);
if (abs(normal.x) > 0.99f)
helper = float3(0, 0, 1);
// Generate vectors
float3 tangent = normalize(cross(normal, helper));
float3 binormal = normalize(cross(normal, tangent));
return float3x3(tangent, binormal, normal);
float3 SampleHemisphere(float3 normal, float alpha)
// Sample the hemisphere, where alpha determines the kind of the sampling
float cosTheta = pow(rand(), 1.0f / (alpha + 1.0f));
float sinTheta = sqrt(1.0f - cosTheta * cosTheta);
float phi = 2 * PI * rand();
float3 tangentSpaceDir = float3(cos(phi) * sinTheta, sin(phi) * sinTheta, cosTheta);
// Transform direction to world space
return mul(tangentSpaceDir, GetTangentSpace(normal));
float SmoothnessToPhongAlpha(float s)
return pow(1000.0f, s * s);
float3 Shade(inout Ray ray, RayHit hit)
{
    if (hit.distance < 1.#INF)
    {
        // Calculate chances of diffuse and specular reflection
        hit.albedo = min(1.0f - hit.specular, hit.albedo);
        float specChance = energy(hit.specular);
        float diffChance = energy(hit.albedo);
        // Roulette-select the ray's path
        float roulette = rand();
        if (roulette < specChance)
        {
            // Specular reflection
            ray.origin = hit.position + hit.normal * 0.001f;
            float alpha = SmoothnessToPhongAlpha(hit.smoothness);
            ray.direction = SampleHemisphere(reflect(ray.direction, hit.normal), alpha);
            float f = (alpha + 2) / (alpha + 1);
            ray.energy *= (1.0f / specChance) * hit.specular * sdot(hit.normal, ray.direction, f);
        }
        else if (diffChance > 0 && roulette < specChance + diffChance)
        {
            // Diffuse reflection
            ray.origin = hit.position + hit.normal * 0.001f;
            ray.direction = SampleHemisphere(hit.normal, 1.0f);
            ray.energy *= (1.0f / diffChance) * hit.albedo;
        }
        else
        {
            // Terminate ray
            ray.energy = 0.0f;
        }
        return hit.emission;
    }
    else
    {
        // Erase the ray's energy - the sky doesn't reflect anything
        ray.energy = 0.0f;
        // Sample the skybox and write it
        float theta = acos(ray.direction.y) / -PI;
        float phi = atan2(ray.direction.x, -ray.direction.z) / -PI * 0.5f;
        return _SkyboxTexture.SampleLevel(sampler_SkyboxTexture, float2(phi, theta), 0).xyz;
    }
}
// Group size of 8x8 assumed from the Dispatch call on the C# side, which divides the screen size by 8
[numthreads(8, 8, 1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
    _Pixel = id.xy;
    // Get the dimensions of the RenderTexture
    uint width, height;
    Result.GetDimensions(width, height);
    // Transform pixel to [-1,1] range
    float2 uv = float2((id.xy + _PixelOffset) / float2(width, height) * 2.0f - 1.0f);
    // Get a ray for the UVs
    Ray ray = CreateCameraRay(uv);
    // Trace and shade the ray
    float3 result = float3(0, 0, 0);
    for (int i = 0; i < 4; i++)
    {
        RayHit hit = Trace(ray);
        result += ray.energy * Shade(ray, hit);
        // Stop bouncing once the ray carries no energy
        if (!any(ray.energy))
            break;
    }
    Result[id.xy] = float4(result, 1);
}
and here's the C# class which writes information to the shader:
using System.Collections.Generic;
using UnityEngine;
using UnityEngine.Profiling;
public class RayTracingMaster : MonoBehaviour
public ComputeShader RayTracingShader;
private RenderTexture _target;
public Texture SkyboxTexture;
private uint _currentSample = 0;
private Material _addMaterial;
private ComputeBuffer _geometryBuffer;
private ComputeBuffer _vertexBuffer;
private Camera _camera;
public Light DirectionalLight;
private RenderTexture _converged;
public int SphereSeed = 2018;
public List<Geometry> geometries = new List<Geometry>();
private List<Vector3> _vertices = new List<Vector3>();
private void Awake()
_camera = GetComponent<Camera>();
private void Update()
if (transform.hasChanged)
_currentSample = 0;
transform.hasChanged = false;
if (DirectionalLight.transform.hasChanged)
_currentSample = 0;
DirectionalLight.transform.hasChanged = false;
private void OnEnable()
_currentSample = 0;
private void OnDisable()
if (_geometryBuffer != null)
private void SetUpScene()
Profiler.BeginSample("Geometry Buffer Creation");
_geometryBuffer = new ComputeBuffer(geometries.Count, 56);
_vertexBuffer = new ComputeBuffer(_vertices.Count, 12);
private void SetShaderParameters()
RayTracingShader.SetMatrix("_CameraToWorld", _camera.cameraToWorldMatrix);
RayTracingShader.SetMatrix("_CameraInverseProjection", _camera.projectionMatrix.inverse);
RayTracingShader.SetVector("_PixelOffset", new Vector2(Random.value, Random.value));
Vector3 l = DirectionalLight.transform.forward;
RayTracingShader.SetVector("_DirectionalLight", new Vector4(l.x, l.y, l.z, DirectionalLight.intensity));
Debug.Log("Geometries: " + _geometryBuffer.count);
RayTracingShader.SetBuffer(0, "_Geometries", _geometryBuffer);
RayTracingShader.SetBuffer(0, "_Vertices", _vertexBuffer);
RayTracingShader.SetFloat("_Seed", Random.value);
RayTracingShader.SetTexture(0, "_SkyboxTexture", SkyboxTexture);
private void OnRenderImage(RenderTexture source, RenderTexture destination)
private void Render(RenderTexture destination)
// Make sure we have a current render target
// Set the target and dispatch the compute shader
RayTracingShader.SetTexture(0, "Result", _target);
int threadGroupsX = Mathf.CeilToInt(Screen.width / 8.0f);
int threadGroupsY = Mathf.CeilToInt(Screen.height / 8.0f);
RayTracingShader.Dispatch(0, threadGroupsX, threadGroupsY, 1);
// Blit the result texture to the screen
if (_addMaterial == null)
_addMaterial = new Material(Shader.Find("Hidden/AddShader"));
_addMaterial.SetFloat("_Sample", _currentSample);
Graphics.Blit(_target, destination, _addMaterial);
private void InitRenderTexture()
if ((_target == null || _target.width != Screen.width || _target.height != Screen.height) || (_converged == null || _converged.width != Screen.width || _converged.height != Screen.height))
// Release render texture if we already have one
if (_target != null)
// Get a render target for Ray Tracing
_target = new RenderTexture(Screen.width, Screen.height, 0,
RenderTextureFormat.ARGBFloat, RenderTextureReadWrite.Linear);
_target.enableRandomWrite = true;
if (_converged != null)
_converged = new RenderTexture(Screen.width, Screen.height, 0, RenderTextureFormat.ARGBFloat,RenderTextureReadWrite.Linear);
_converged.enableRandomWrite = true;
_currentSample = 0;
public Geometry[] ShaderGeometryFromMesh(Mesh mesh, Vector3 albedo, Vector3 specular, float smoothness, Vector3 emission, Vector3 worldScale, Quaternion worldRotation) {
Profiler.BeginSample("Shader Geometry Creation");
Geometry[] geometry = new Geometry[mesh.triangles.Length/3];
for (int i = 0; i < geometry.Length; i++)
int[] vertIndices = new int[3];
for (int j = 0; j < 3; j++)
Vector3 vert = mesh.vertices[mesh.triangles[i * 3 + j]];
if (!_vertices.Contains(vert))
vertIndices[j] = _vertices.IndexOf(vert);
geometry[i] = new Geometry(albedo, specular, smoothness, emission,vertIndices);
return geometry;
public struct Geometry
public uint type;
public Vector3 albedo;
public Vector3 specular;
public float smoothness;
public Vector3 emission;
public Vector3Int verts;
public Geometry( Vector3 albedo, Vector3 specular, float smoothness, Vector3 emission, int[] verts, uint type = 1) {
this.albedo = albedo;
this.specular = specular;
this.smoothness = smoothness;
this.emission = emission;
this.verts = new Vector3Int(verts[0],verts[1],verts[2]);
this.type = type;
Turns out I didn't have the problem I thought I had at all! My bottleneck was the ShaderGeometryFromMesh() method, which attempted to avoid duplicate vertices by checking whether every vertex being added was already in the list. Because _vertices is a List, each Contains and IndexOf call is a linear scan, so this took a ton of time as the list grew to thousands or tens of thousands of items. Uploading to the buffer is relatively trivial by comparison.
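One way to keep the de-duplication without the linear scans is to pair the list with a dictionary that maps each vertex to its index. The following is only a minimal sketch, not the code I ended up with: VertexIndexer and GetOrAdd are hypothetical names, and a plain value tuple stands in for Unity's Vector3 so the snippet compiles outside the editor.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a vertex list plus a dictionary that remembers the
// index at which each distinct vertex was stored.
public sealed class VertexIndexer
{
    private readonly List<(float x, float y, float z)> _vertices =
        new List<(float x, float y, float z)>();
    private readonly Dictionary<(float x, float y, float z), int> _indexOf =
        new Dictionary<(float x, float y, float z), int>();

    // The unique vertices, in the order they were first seen.
    public IReadOnlyList<(float x, float y, float z)> Vertices => _vertices;

    // Returns the index of the vertex, appending it only if it has not been
    // seen before. The dictionary lookup is O(1) on average, whereas
    // List.Contains and List.IndexOf are O(n) each.
    public int GetOrAdd((float x, float y, float z) vertex)
    {
        if (_indexOf.TryGetValue(vertex, out int index))
            return index;

        index = _vertices.Count;
        _vertices.Add(vertex);
        _indexOf.Add(vertex, index);
        return index;
    }
}
```

In the loop above, the Contains/IndexOf pair would collapse to a single call along the lines of vertIndices[j] = indexer.GetOrAdd(...). It is probably also worth copying mesh.vertices and mesh.triangles into local arrays before the loop, since Unity hands back a fresh copy of the array on every property access.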

How to do real time Raytracing in unity with C#

I am making a video game in Unity and decided to use ray tracing. I have the code, but as you will see in a second, it isn't exactly rendering frame by frame.
Here is my ray-tracing code. This is the main script attached to the main camera:
using UnityEngine;
using System.Collections;
public class RayTracer : MonoBehaviour
public Color backgroundColor =;
public float RenderResolution = 1f;
public float maxDist = 100f;
public int maxRecursion = 4;
private Light[] lights;
private Texture2D renderTexture;
void Awake()
renderTexture = new Texture2D((int)(Screen.width * RenderResolution), (int)(Screen.height * RenderResolution));
lights = FindObjectsOfType(typeof(Light)) as Light[];
void Start()
void OnGUI()
GUI.DrawTexture(new Rect(0, 0, Screen.width, Screen.height), renderTexture);
void RayTrace()
for (int x = 0; x < renderTexture.width; x++)
for (int y = 0; y < renderTexture.height; y++)
Color color =;
Ray ray = GetComponent<Camera>().ScreenPointToRay(new Vector3(x / RenderResolution, y / RenderResolution, 0));
renderTexture.SetPixel(x, y, TraceRay(ray, color, 0));
Color TraceRay(Ray ray, Color color, int recursiveLevel)
if (recursiveLevel < maxRecursion)
RaycastHit hit;
if (Physics.Raycast(ray, out hit, maxDist))
Vector3 viewVector = ray.direction;
Vector3 pos = hit.point + hit.normal * 0.0001f;
Vector3 normal = hit.normal;
RayTracerObject rto = hit.collider.gameObject.GetComponent<RayTracerObject>();
//Does the object we hit have that script?
if (rto == null)
var GO = hit.collider.gameObject;
Debug.Log("Raycast hit failure! On " + + " position " + GO.transform.position.ToString());
return color; //exit out
Material mat = hit.collider.GetComponent<Renderer>().material;
if (mat.mainTexture)
color += (mat.mainTexture as Texture2D).GetPixelBilinear(hit.textureCoord.x, hit.textureCoord.y);
color += mat.color;
color *= TraceLight(rto, viewVector, pos, normal);
if (rto.reflectiveCoeff > 0)
float reflet = 2.0f * Vector3.Dot(viewVector, normal);
Ray newRay = new Ray(pos, viewVector - reflet * normal);
color += rto.reflectiveCoeff * TraceRay(newRay, color, recursiveLevel + 1);
if (rto.transparentCoeff > 0)
Ray newRay = new Ray(hit.point - hit.normal * 0.0001f, viewVector);
color += rto.transparentCoeff * TraceRay(newRay, color, recursiveLevel + 1);
return color;
Color TraceLight(RayTracerObject rto, Vector3 viewVector, Vector3 pos, Vector3 normal)
Color c = RenderSettings.ambientLight;
foreach (Light light in lights)
if (light.enabled)
c += LightTrace(rto, light, viewVector, pos, normal);
return c;
Color LightTrace(RayTracerObject rto, Light light, Vector3 viewVector, Vector3 pos, Vector3 normal)
float dot, distance, contribution;
Vector3 direction;
switch (light.type)
case LightType.Directional:
contribution = 0;
direction = -light.transform.forward;
dot = Vector3.Dot(direction, normal);
if (dot > 0)
if (Physics.Raycast(pos, direction, maxDist))
if (rto.lambertCoeff > 0)
contribution += dot * rto.lambertCoeff;
if (rto.reflectiveCoeff > 0)
if (rto.phongCoeff > 0)
float reflet = 2.0f * Vector3.Dot(viewVector, normal);
Vector3 phongDir = viewVector - reflet * normal;
float phongTerm = max(Vector3.Dot(phongDir, viewVector), 0.0f);
phongTerm = rto.reflectiveCoeff * Mathf.Pow(phongTerm, rto.phongPower) * rto.phongCoeff;
contribution += phongTerm;
if (rto.blinnPhongCoeff > 0)
Vector3 blinnDir = -light.transform.forward - viewVector;
float temp = Mathf.Sqrt(Vector3.Dot(blinnDir, blinnDir));
if (temp != 0.0f)
blinnDir = (1.0f / temp) * blinnDir;
float blinnTerm = max(Vector3.Dot(blinnDir, normal), 0.0f);
blinnTerm = rto.reflectiveCoeff * Mathf.Pow(blinnTerm, rto.blinnPhongPower) * rto.blinnPhongCoeff;
contribution += blinnTerm;
return light.color * light.intensity * contribution;
case LightType.Point:
contribution = 0;
direction = (light.transform.position - pos).normalized;
dot = Vector3.Dot(normal, direction);
distance = Vector3.Distance(pos, light.transform.position);
if ((distance < light.range) && (dot > 0))
if (Physics.Raycast(pos, direction, distance))
if (rto.lambertCoeff > 0)
contribution += dot * rto.lambertCoeff;
if (rto.reflectiveCoeff > 0)
if (rto.phongCoeff > 0)
float reflet = 2.0f * Vector3.Dot(viewVector, normal);
Vector3 phongDir = viewVector - reflet * normal;
float phongTerm = max(Vector3.Dot(phongDir, viewVector), 0.0f);
phongTerm = rto.reflectiveCoeff * Mathf.Pow(phongTerm, rto.phongPower) * rto.phongCoeff;
contribution += phongTerm;
if (rto.blinnPhongCoeff > 0)
Vector3 blinnDir = -light.transform.forward - viewVector;
float temp = Mathf.Sqrt(Vector3.Dot(blinnDir, blinnDir));
if (temp != 0.0f)
blinnDir = (1.0f / temp) * blinnDir;
float blinnTerm = max(Vector3.Dot(blinnDir, normal), 0.0f);
blinnTerm = rto.reflectiveCoeff * Mathf.Pow(blinnTerm, rto.blinnPhongPower) * rto.blinnPhongCoeff;
contribution += blinnTerm;
if (contribution == 0)
return light.color * light.intensity * contribution;
case LightType.Spot:
contribution = 0;
direction = (light.transform.position - pos).normalized;
dot = Vector3.Dot(normal, direction);
distance = Vector3.Distance(pos, light.transform.position);
if (distance < light.range && dot > 0)
float dot2 = Vector3.Dot(-light.transform.forward, direction);
if (dot2 > (1 - light.spotAngle / 180))
if (Physics.Raycast(pos, direction, distance))
if (rto.lambertCoeff > 0)
contribution += dot * rto.lambertCoeff;
if (rto.reflectiveCoeff > 0)
if (rto.phongCoeff > 0)
float reflet = 2.0f * Vector3.Dot(viewVector, normal);
Vector3 phongDir = viewVector - reflet * normal;
float phongTerm = max(Vector3.Dot(phongDir, viewVector), 0.0f);
phongTerm = rto.reflectiveCoeff * Mathf.Pow(phongTerm, rto.phongPower) * rto.phongCoeff;
contribution += phongTerm;
if (rto.blinnPhongCoeff > 0)
Vector3 blinnDir = -light.transform.forward - viewVector;
float temp = Mathf.Sqrt(Vector3.Dot(blinnDir, blinnDir));
if (temp != 0.0f)
blinnDir = (1.0f / temp) * blinnDir;
float blinnTerm = max(Vector3.Dot(blinnDir, normal), 0.0f);
blinnTerm = rto.reflectiveCoeff * Mathf.Pow(blinnTerm, rto.blinnPhongPower) * rto.blinnPhongCoeff;
contribution += blinnTerm;
if (contribution == 0)
return light.color * light.intensity * contribution;
float max(float x0, float x1)
return x0 > x1 ? x0 : x1;
And this is the code attached to the objects in the scene:
using UnityEngine;
using System.Collections;
public class RayTracerObject : MonoBehaviour
public float lambertCoeff = 1f;
public float reflectiveCoeff = 0f;
public float phongCoeff = 1f;
public float phongPower = 2f;
public float blinnPhongCoeff = 1f;
public float blinnPhongPower = 2f;
public float transparentCoeff = 0f;
public Color baseColor = Color.gray;
void Awake()
if (!GetComponent<Renderer>().material.mainTexture)
GetComponent<Renderer>().material.color = baseColor;
How would I go about doing this? And what would the code be?
Though raytracing in the primary thread is a perfectly acceptable design, it's probably not what you want in Unity as it blocks everything else.
Now you could arguably spawn a child thread to perform the raytracing and have the primary thread render the results. The problem, though, is that neither approach makes use of the GPU, which sort of defeats the point of using Unity in the first place.
How to do real time Raytracing in unity with C#
It all depends on what your scene consists of and how you intend to render it. You could arguably render something simple in real time at low resolution. Rendering at a reasonable screen resolution with reasonable levels of ray bouncing (i.e. the number of recursive rays cast for reflective or transmissive materials) would be much more difficult.
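To put rough numbers on that, here is a small back-of-the-envelope sketch. It assumes the worst case for the TraceRay method in the question, where every hit is both reflective and transparent and so spawns two secondary rays; shadow rays towards the lights are not counted. RayBudget and its methods are names made up for this illustration.

```csharp
// Hypothetical helper for estimating the worst-case ray count of a
// recursive ray tracer.
public static class RayBudget
{
    // Worst-case rays traced for one pixel when every hit spawns `branching`
    // secondary rays and rays are traced at `maxDepth` recursion levels.
    public static long RaysPerPixel(int maxDepth, int branching)
    {
        long total = 0;
        long raysAtLevel = 1; // one primary ray at level 0
        for (int depth = 0; depth < maxDepth; depth++)
        {
            total += raysAtLevel;
            raysAtLevel *= branching;
        }
        return total;
    }

    // Worst-case rays traced for a whole frame of width x height pixels.
    public static long RaysPerFrame(int width, int height, int maxDepth, int branching)
    {
        return (long)width * height * RaysPerPixel(maxDepth, branching);
    }
}
```

With maxRecursion = 4 and two secondary rays per hit, that is 1 + 2 + 4 + 8 = 15 rays per pixel, so RaysPerFrame(1920, 1080, 4, 2) comes to 31,104,000 Physics.Raycast calls for a single frame, before any shadow rays.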
Instead I would urge you to follow the changing trend in raytracing, where realtime raytracing is now being performed on the GPU using techniques known as General Purpose GPU or GPGPU. nVidia has some talks on this subject that are available on YouTube. Here is my sample Unity GPGPU galaxy simulation that might prove useful as a background to GPGPU.
Sample GPGPU kernel merely to show you what GPGPU is about:
// File: Galaxy1Compute.compute
// Each #kernel tells which function to compile; you can have many kernels
#pragma kernel UpdateStars
#include "Galaxy.cginc"
// blackmagic
#define BLOCKSIZE 128
RWStructuredBuffer<Star> stars;
Texture2D HueTexture;
// refer to
SamplerState samplerHueTexture;
// time elapsed since last frame
float deltaTime;
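// static const so the initializer is compiled into the kernel; a non-static
// global is treated as a uniform that has to be set from the host side
static const float Softening = 3e4f;
#define Softening2 (Softening * Softening)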
static float G = 6.67300e-11f;
static float DefaultMass = 1000000.0f;
// Do a pre-calculation assuming all the stars have the same mass
static float GMM = G*DefaultMass*DefaultMass;
void UpdateStars (uint3 id : SV_DispatchThreadID)
uint i = id.x;
uint numStars, stride;
stars.GetDimensions(numStars, stride);
float3 position = stars[i].position;
float3 velocity = stars[i].velocity;
float3 A=float3(0,0,0);
for (uint j = 0; j < numStars; j++)
if (i != j)
float3 D = stars[j].position - stars[i].position;
float r = length(D);
float f = GMM / (r * r + Softening2);
A += f * normalize(D);
velocity += A * deltaTime;
position += velocity * deltaTime;
if (i < numStars)
stars[i].velocity = velocity;
stars[i].position = position;
stars[i].accelMagnitude = length(A);
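On the C# side, a kernel like this is launched in thread groups. The if (i < numStars) guard suggests that the dispatch may start more threads than there are stars, which happens whenever the star count is not a multiple of the group size. Assuming the kernel is declared with BLOCKSIZE threads per group along x (the attribute is not visible in the listing above), the group count would be rounded up roughly like this. DispatchMath is a hypothetical helper name, not part of Unity.

```csharp
using System;

// Hypothetical helper for sizing a one-dimensional compute dispatch.
public static class DispatchMath
{
    // Smallest number of thread groups such that groups * blockSize >= itemCount.
    public static int ThreadGroupCount(int itemCount, int blockSize)
    {
        if (itemCount < 0)
            throw new ArgumentOutOfRangeException(nameof(itemCount));
        if (blockSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(blockSize));

        // Integer ceiling division.
        return (itemCount + blockSize - 1) / blockSize;
    }
}
```

For 1000 stars and a block size of 128 this gives 8 groups, i.e. 1024 threads, of which the last 24 do no useful work and are stopped from writing by the bounds check. The result would then be passed as the x group count to ComputeShader.Dispatch.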
Additionally, there are some fine books on the subject. Real-time Volume Graphics covers volumes, but it also covers casting rays, which is the essence of ray tracing. The hardest paradigm shift is writing for GPGPU; once you understand that, writing a GPGPU raytracer is an easy step from GPGPU volume shaders.
A marvellous tome to accompany any raytracer author is Matt Pharr's Physically Based Rendering book (there is a 2nd edition, but I have not read that).
GPU Ray Tracing in Unity – Part 1
Nvidia announced NVIDIA RTX™, a ray-tracing technology that brings real-time, cinematic-quality rendering to content creators and game developers.
It consists of a ray-tracing engine running on NVIDIA Volta architecture GPUs. It’s designed to support ray tracing through a variety of interfaces.
This lets game developers use ray casting in their work to get movie-quality output.
A future Unity update is expected to support the new DirectX Raytracing API, so game developers will be able to enjoy photorealistic output in their Unity rendering pipeline.
So after all the hype around RTX cards, we need to answer one question: what is an RTX card actually doing? Basically, it is a hardware-accelerated raycaster that is very well optimized for that one job.
But nobody said you can't do hardware-accelerated raycasting on any other graphics card. In Unity, you have access to hardware acceleration in the form of shaders, so you can write your own raycaster with compute shaders. It will be much slower than the heavily optimized RTX cards, but it still gives you an advantage in some areas.
Why bother if it is slower than RTX? In general, you can enhance your rendering with this method, for example by softening shadows or attempting global illumination. But to answer your question: you won't be able to do full-blown raytracing without RTX cards.

Black Screen due to shaders ps_3

I'm currently using shaders in my game. It works fine with an nVidia GeForce GT 330M, but with an ATI 4670 (which supports ps_4.1) I get a black screen.
Here is the source of the HLSL effect:
struct Explo
float3 position;
float4 color;
float power;
int time;
float2 DisplacementScroll;
texture colortexture;
int nb;
Explo explos[5];
float ambient;
float4 ambientColor;
float screenWidth;
float screenHeight;
sampler ColorMap = sampler_state
Texture = <colortexture>;
float4 CalculateLight(Explo ex, float4 base, float3 pixelPosition)
float3 direction = ex.position - pixelPosition;
float distance = 1 / length(ex.position - pixelPosition) * ex.power;
float amount = max(dot(base, normalize(distance)), 0);
return base * distance * amount * ex.color * ambient;
float4 Explosion(float2 texCoords : TEXCOORD0) : COLOR
//texCoords = tex2D(NormalMap, DisplacementScroll + texCoords / 3)*0.2 - 0.15;
float4 base = tex2D(ColorMap, texCoords);
float3 pixelPosition = float3(screenWidth * (texCoords.x),
screenHeight * (texCoords.y),0);
float4 finalColor = (base * ambientColor * ambient);
for (int i=0; i<nb; i++)
finalColor += CalculateLight(explos[i], base, pixelPosition);
return finalColor;
technique KaBoom
pass Pass1
PixelShader = compile ps_3_0 Explosion();
I remember once having a similar problem: a shader just didn't work on ATI. The cause was that the vertex and pixel shaders were compiled to different shader models (vs_3_0 and ps_2_0). It worked on NVIDIA, but not on ATI. In your case you're only binding a pixel shader for the pass, so whichever vertex shader was set last is still in use, and it may have been compiled for a different shader model.
Granted, this is relevant only if you're dead sure the problem is with the shader and not something else, e.g. your DIPs (DrawIndexedPrimitive calls).
Good luck
