Posted: 2016-02-25

Parallax Occlusion Mapping ~ First Attempt

Parallax occlusion mapping (POM) is a form of image-space (fragment-shader-based) displacement. It aims to approximate the visual displacement of bumpy surfaces by using a displacement map to offset the rendering position of a given pixel on a surface.

Advantages:

- Does not require complex tessellation processes
- Relatively straightforward to achieve
- Can work on a surface at any angle
- The effect only runs for visible pixels

Disadvantages:

- A high-end effect; still too slow to run in real time (alongside other effects) on slower computers
- Requires a higher number of samples for more significant displacement
- Texture distortion when viewed up close (caused by stretching textures)
- Difficult to align along hard edges


This is my first attempt at creating the effect. Whilst I am proud to say that I came up with the method myself, based on my understanding of vector projection and vector spaces, my current implementation is flawed, and I will try to explain why. Nonetheless, this is an exciting start to an effect which has huge potential to look fantastic in a real game scenario!

My Method

My intuition on how to achieve the effect broke down into a number of simple steps. First, create a new vector basis consisting of the surface normal (which points outwards from the surface) and two orthogonal vectors (which I will call the tangent and bitangent) that lie within the plane of any given polygon.

Once I had this, I then needed to calculate the vector from the eye to the initial position on the surface of the rendered pixel.

Now comes the fun part: given a displacement map, I assumed that this map would represent a virtual offset. To do this, I set it up in such a way that black meant 0% displacement and white meant 100% displacement, and I could visualise any flat surface as follows:
Now given this representation, it is time to transform our coordinates into this newly defined local space, which I will call tangent space.

Now that I've talked about the conversion process, an interesting thing to note is that we ALREADY have a coordinate in this exact system: the texture coordinate of the pixel at that given point. However, alone this is not very useful.
What we want is a means of adding a third dimension to this coordinate system, i.e. a depth. We can achieve this by defining the transformation mapping as follows:

world space x -> texture_u (tangent space x)
world space y -> texture_v (tangent space y)
world space z -> flipped_screen_space_normal (So that it points down into the surface.) (tangent space z)
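
This basis change boils down to a few dot products. Below is a minimal CPU-side sketch in Python, purely for illustration (the function and argument names are hypothetical, not from any engine); in a real shader this would simply be a multiplication by the TBN matrix:

```python
# Hypothetical sketch: transform a world-space vector into the tangent
# space defined above. tangent, bitangent and normal are assumed to be
# orthonormal world-space vectors.

def to_tangent_space(v, tangent, bitangent, normal):
    """Project v onto the (tangent, bitangent, flipped normal) basis.

    Flipping the normal makes positive z point down into the surface,
    matching the depth convention (0 = on the surface, 1 = deepest).
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return (dot(v, tangent),                # -> texture u (tangent space x)
            dot(v, bitangent),              # -> texture v (tangent space y)
            dot(v, [-n for n in normal]))   # -> depth     (tangent space z)
```

For an axis-aligned surface this reduces to a component swizzle: with a normal of (0, 0, 1), a view vector heading into the surface ends up with a positive tangent-space z.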

Now given a point on the surface, we need to be able to step along the inside of the surface until our ray intersects the heightmap offset at a given point.
The following process sums up what we want to now achieve:

1) Transform View Vector (Camera to world space pixel coordinate) into tangent space (as defined above).

2) Begin ray marching, where each step is our normalized tangent space view vector multiplied by a pre-defined step-size constant.

3) Sample the heightmap at each step and compare the sampled depth to our current tangent space z (the ray depth). If the ray depth is greater than the sampled heightmap depth, we have intersected.

4) Return tangent space (x, y). This will represent our new texture coordinate.

WHERE tangent depth = tangent space z coordinate (0 = on surface of triangle, 1 = maximum offset defined by some scaling constant)

This will give us our desired point of intersection, and thus a new texture coordinate for that pixel. When repeated for every pixel, we achieve parallax mapping. It is worth noting, however, that at this stage we have only considered a simple implementation, which will have a number of artifacts and will ultimately require a large maximum sample count to achieve good results.
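
The four steps above can be sketched as a short loop. This is a minimal CPU-side Python mock-up rather than shader code: `height_at`, `step_size` and `max_steps` are assumed names, and `height_at(u, v)` stands in for a heightmap texture sample returning a depth in [0, 1]:

```python
# Hypothetical sketch of the fixed-step ray march described above.
# height_at(u, v) is an assumed heightmap sampler: 0 = on the surface,
# 1 = maximum depth (before any scaling constant is applied).

def parallax_march(uv, view_ts, height_at, step_size=0.05, max_steps=20):
    """March along the tangent-space view ray until the ray depth
    reaches the sampled heightmap depth, then return the offset (u, v)."""
    # Normalize the tangent-space view vector and scale by the step size.
    length = sum(c * c for c in view_ts) ** 0.5
    step = tuple(c / length * step_size for c in view_ts)
    u, v, depth = uv[0], uv[1], 0.0
    for _ in range(max_steps):
        if height_at(u, v) <= depth:    # ray depth has reached the surface
            break
        u, v, depth = u + step[0], v + step[1], depth + step[2]
    return (u, v)    # the displaced texture coordinate for this pixel
```

A flat heightmap leaves the coordinate untouched, as expected; the parallax effect comes from spatial variation in `height_at`.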

Improving the result

We can improve the result and optimise the process by introducing a simple binary search into our ray march. As we are stepping in fixed intervals defined by a constant, the precision of our offset is limited by this step size, and the maximum depth we can reach is limited by the total number of samples we are willing to perform.

For production code, we would ideally want <= 30 samples in total per pixel. We can drastically improve our results by allocating 20 samples towards our initial reconstruction, and then a further 5 samples for a binary search.

The way this works is that, after the linear search, we know the true corrected coordinate lies somewhere within the last band, whose depth is equal to the maximum possible offset divided by the number of samples.
We also know that the band's extent is this length multiplied by our tangent-space view vector.

In order to perform a binary search to improve our result, we do the following:

1) Find the tangent space coordinate at the start of that step.

2) Find the tangent space coordinate at the end of the step we are in.

3) Find the midpoint of that band.

4) Perform the same test (i.e. compare the ray depth at the midpoint to the heightmap depth). If the ray depth is greater than the heightmap depth, we are still "inside" the texture, so move our result to the midpoint and repeat the binary search on the FIRST HALF of the step. Otherwise, we are outside of the texture and have gone too far back, so keep our result where it is and repeat the process on the SECOND HALF of the step.

Note how each time we are dividing the size of the step by 2, giving a drastic increase in precision from only a few iterations. After a handful of samples the result will converge:

As you can see, merely 3 samples will put our coordinate almost dead-on where we want it to be. In practice, this gives the effect of smoothing any banding between results.
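
The refinement can be sketched as follows, again as a hedged Python mock-up rather than shader code: `lo` and `hi` are assumed to be the tangent-space points bounding the step in which the crossing occurred, and `height_at(u, v)` is an assumed heightmap sampler returning a depth in [0, 1]:

```python
# Hypothetical sketch of the binary-search refinement described above.

def refine(lo, hi, height_at, iterations=5):
    """Halve the crossing interval a few times.

    lo/hi are (x, y, depth) tangent-space points at the start and end of
    the step that crossed the heightmap; the result tracks hi throughout.
    """
    for _ in range(iterations):
        mid = tuple((a + b) * 0.5 for a, b in zip(lo, hi))
        if height_at(mid[0], mid[1]) <= mid[2]:
            hi = mid    # midpoint is inside the surface: keep the first half
        else:
            lo = mid    # midpoint is still above the surface: keep the second half
    return hi[:2]       # refined (u, v) texture coordinate
```

With a constant heightmap depth of 0.5 between lo = (0, 0, 0.4) and hi = (0.2, 0, 0.6), a few iterations converge on u ≈ 0.1, the true crossing point.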

Conclusion of my implementation

Whilst I believe that in theory my implementation is sound, the actual result is flawed. The effect works, but it does not behave correctly along the edges of geometry. I believe my conversion is incorrect in places and leads to the heightmap being pulled out of the texture rather than being pushed in.
When you walk near the edge of geometry, the top-height pixels warp, whereas the pixels with the greatest depth should be the ones that warp, and the ones which closely align with their original position should stay where they are.

My implementation also suffers from another subtle warping artifact in which pixels closer to the camera appear to curve or ripple. This could be a fault of my initial world-space reconstruction, or could perhaps be an issue with using a linear depth buffer rather than an exponential one for that conversion, as the near pixels are proportionally less precise.

Ultimately, more work will need to be done to improve the visual result. Performance-wise, however, the effect already runs at a reasonable speed.