
Anybody here into non-realtime graphics rendering?



SharkyUK

ClioSport Club Member
Nice.

I've managed to get an apprentice, which has taken a load of work off me. It took him a while to get up to speed with the Qt desktop software I wrote (he was amazed one person wrote it). I currently have a 24GHz spectrum analyser set up on my desk in the office as I've been bringing up the necessary code to drive our own design of radar antenna, the culmination of a year's work!
An apprentice?! I hope he's up to standard mate! ;)
Congratulations on the work and effort over the past year; I can imagine how much work has gone into that. I'm sure that my current client could do with someone with your skills at the moment! Are you effectively at a stage now where you have a 'product' based on your work?

Can any of you check your tyre pressures?
FLOL! :p Piece of cake, mate. I've even changed headlight bulbs, oil and filter AND successfully bled my brakes. Just. It was a struggle.
 

sn00p

ClioSport Club Member
  A blue one.


Yeah, the product has been shipping for a couple of years, but using somebody else's antenna. Problem is, due to the collapse of the pound and impending Brexit, we needed to bring the antenna part in-house. Lots of specialist equipment (£££) and a load of very complex engineering and time, but we now have a solution for all our products. Planar antennas are black magic and voodoo.

There aren't many manufacturers of antennas in the world, maybe 3 or 4, so having our own design which we can manufacture easily is a massive bonus for us. It's cost a fortune, but it means we're no longer tied to a manufacturer; the complete product is now all our own design.

I also wrote a Java virtual machine to run on the microcontroller (approx. 20KB of RAM for it) which will be put to use in the future as well.
 

SharkyUK

ClioSport Club Member
Fair play - it makes sense bringing it all in-house given the uncertain future and specialist domain in which you're working. I'm sure the investment will be repaid in no time, if not already. And several times over. I won't lie though... I'm not sure I would have the patience and will/desire to write a Java VM... :) Hahahaha!
 

SharkyUK

ClioSport Club Member
Finally can go to bed now I've fixed the bug and my virtual camera is working...

Here's a video of the first attempt at my path tracer running on the GPU (CUDA) in real time. It's a bit basic and 'brute force' right now, but it's a start. It's fully interactive with a real-time camera (WASD), and OpenGL is used to display the rendered image. I appreciate this will be extremely boring and look crap to most people!

C++ (Visual Studio) / Win 10 Pro (64-bit)
nVidia Geforce GTX 1080 Ti
Intel i7-6900K

Running at 60fps... :p
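
For anyone curious what the 'brute force' structure looks like, here's a stripped-down sketch (illustrative only, not the actual renderer): one CUDA thread per pixel builds a primary camera ray, and a real path tracer would then trace that ray through the scene. Here the direction is just visualised as a colour to keep the example short.

Code:
#include <cuda_runtime.h>
#include <cstdio>

__device__ float3 norm3(float3 v) {
    float s = rsqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
    return make_float3(v.x * s, v.y * s, v.z * s);
}

__global__ void render(float3* fb, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    // Map the pixel to a ray on the image plane (camera at origin, looking down -z).
    float aspect = (float)w / (float)h;
    float u = (2.0f * (x + 0.5f) / w - 1.0f) * aspect;
    float v = 1.0f - 2.0f * (y + 0.5f) / h;
    float3 dir = norm3(make_float3(u, v, -1.0f));

    // Placeholder 'shading' - a full tracer would follow 'dir' into the scene,
    // bounce it around and accumulate radiance per sample.
    fb[y * w + x] = make_float3(0.5f * (dir.x + 1.0f),
                                0.5f * (dir.y + 1.0f),
                                0.5f * (dir.z + 1.0f));
}

int main() {
    const int w = 1280, h = 720;
    float3* fb = nullptr;
    cudaMallocManaged(&fb, w * h * sizeof(float3));   // visible to CPU and GPU
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    render<<<grid, block>>>(fb, w, h);
    cudaDeviceSynchronize();
    printf("centre pixel: %.3f %.3f %.3f\n",
           fb[(h / 2) * w + w / 2].x, fb[(h / 2) * w + w / 2].y,
           fb[(h / 2) * w + w / 2].z);
    cudaFree(fb);
    return 0;
}

In the real thing the framebuffer would typically be a mapped OpenGL pixel buffer, so the accumulated image can be drawn to screen without a round trip through host memory.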

 

sn00p

ClioSport Club Member
  A blue one.

The amount of equipment we've bought over the past few years is staggering. We have a state-of-the-art pick-and-place line with a solder jet printer. It means we can do JIT production and build prototype hardware in-house as well.

Not bad going considering we accidentally stumbled into the radar market.
 

SharkyUK

ClioSport Club Member
No, not bad going at all mate! It's funny how things work out sometimes. I don't know much about radar at all - it goes way, WAY over my head and level of understanding. It sounds interesting when I hear folks talking about it in work but it's definitely - as you say - voodoo and black magic!
 

SharkyUK

ClioSport Club Member
A couple more images to add showing off the development of the GPU-based path tracer. I implemented depth of field last night through the addition of focal distance and aperture properties on the camera. These are both fully controllable on-the-fly and work in pretty much the same way as with a real camera (simulating the light coming into the camera through the aperture and hitting the image plane).

frame_eg03.jpg
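
The thin-lens trick behind this is simple enough to sketch (names here are illustrative, not the actual camera code): jitter the ray origin across the aperture disc, then re-aim the ray at the point the unperturbed ray reaches at the focal distance, so geometry at that distance stays sharp and everything else blurs.

Code:
#include <cuda_runtime.h>
#include <curand_kernel.h>

struct Ray { float3 o, d; };

__device__ float3 norm3(float3 v) {
    float s = rsqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
    return make_float3(v.x * s, v.y * s, v.z * s);
}

__device__ Ray thinLensRay(Ray pinhole, float3 camRight, float3 camUp,
                           float aperture, float focalDist, curandState* rng) {
    // Uniform sample on the aperture disc (sqrt keeps the area density uniform).
    float ang = 2.0f * 3.1415926f * curand_uniform(rng);
    float rad = aperture * sqrtf(curand_uniform(rng));
    float ox = rad * cosf(ang), oy = rad * sinf(ang);

    // The focal point of the unperturbed ray stays fixed...
    float3 focal = make_float3(pinhole.o.x + focalDist * pinhole.d.x,
                               pinhole.o.y + focalDist * pinhole.d.y,
                               pinhole.o.z + focalDist * pinhole.d.z);

    // ...while the ray origin shifts across the lens.
    float3 o = make_float3(pinhole.o.x + ox * camRight.x + oy * camUp.x,
                           pinhole.o.y + ox * camRight.y + oy * camUp.y,
                           pinhole.o.z + ox * camRight.z + oy * camUp.z);
    float3 d = norm3(make_float3(focal.x - o.x, focal.y - o.y, focal.z - o.z));
    return Ray{ o, d };
}

A wider aperture means a bigger origin jitter and therefore stronger blur away from the focal plane, which is exactly how a real camera behaves.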


I also added a new metal-like material type. It exhibits Phong and glossy attributes to produce the sort of result seen in the central in-focus sphere in the following image.

frame_eg04.jpg
 

SharkyUK

ClioSport Club Member
Another video of the CUDA-based path tracer. This stuff is so addictive. Added depth of field and a couple more basic material types (that need work). Video compression doesn't work so well with the noisier imagery so... at least you get the idea. I think there's an nVidia Volta out there with my name on...! :p

 

SharkyUK

ClioSport Club Member
More work this evening on something called image-based lighting (IBL). It's a common technique used by the movie industry (and faked by the game industry) to provide global illumination within a 3D scene - i.e. indirect lighting that isn't specifically emitted from light sources placed into the world.

In my case I'm using a high dynamic range (HDR) radiance map, stored as a 32-bit floating point angular map light probe. Quite a mouthful. It's basically a 'photograph' of the environment in which the light intensity (radiance) is also captured [in the image] along with the colour information. This can then be used to generate indirect lighting for the scene and has the effect of grounding the objects more solidly into the world - i.e. making them look more realistic thanks to objects in the scene being affected by said indirect lighting from the radiance map.

The following images don't have any direct lights in them at all; they are all wholly produced from indirect light bouncing around the scene and the intensity of the light stored in the radiance map. The exact same scene is shown in each image, the only difference being the radiance map used to render the scene.

frame_ibl01.jpg
frame_ibl02.jpg
frame_ibl03.jpg
frame_ibl04.jpg
frame_ibl05.jpg
frame_ibl06.jpg
frame_ibl07.jpg
frame_ibl08.jpg
frame_ibl09.jpg


Just changing the lighting environment can have a significant impact on the final result. I'll make a video to show the effect when I get chance.
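
For reference, sampling an angular map light probe boils down to turning a world-space direction into a texture coordinate. A sketch using Debevec's angular map convention (probe facing down -z, direction normalised; a real loader may differ in orientation):

Code:
#include <cuda_runtime.h>
#include <math.h>

__device__ float2 angularMapUV(float3 d) {
    // The distance from the image centre is proportional to the angle from -z.
    float r = acosf(-d.z) / 3.1415926f;        // 0 at the centre, 1 at the rim
    float len = sqrtf(d.x * d.x + d.y * d.y);
    float scale = (len > 0.0f) ? r / len : 0.0f;
    // Map from [-1,1] disc coordinates into [0,1] texture coordinates.
    return make_float2(0.5f + 0.5f * d.x * scale,
                       0.5f - 0.5f * d.y * scale);
}

Every escaping ray gets its radiance by feeding its direction through this lookup into the HDR map, which is what lets a single photograph light the whole scene.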
 

SharkyUK

ClioSport Club Member
Slightly tweaked the way the materials are handled. Still not quite right, but some pretty realistic results can be achieved nonetheless. Running on my GPU these scenes can be interacted with in real time, although it still takes a few minutes to progressively refine the quality to the level seen in these images. Running across 8 cores / 16 threads on my CPU, the same scenes take HOURS to reach an equal level of quality!

frame_ibl11.jpg
frame_ibl12.jpg
frame_ibl13.jpg
 

SharkyUK

ClioSport Club Member
Another evening and another feature :p This time I've added motion blur, which is determined by the object's velocity, time sample (period over which the motion is sampled) and the camera shutter speed. It only works on sphere primitives at the moment.

frame_mblur01.jpg
frame_mblur02.jpg
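
The gist of the sphere motion blur, sketched out (struct and names are mine for illustration): each sample picks a random time inside the shutter interval and intersects against the sphere's centre displaced by velocity * time. Averaged over many samples per pixel, fast spheres smear into streaks.

Code:
#include <cuda_runtime.h>
#include <curand_kernel.h>

struct MovingSphere { float3 centre, velocity; float radius; };

__device__ float dot3(float3 a, float3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

__device__ bool hitMovingSphere(const MovingSphere& s, float3 ro, float3 rd,
                                float shutter, curandState* rng, float* tHit) {
    float time = shutter * curand_uniform(rng);        // jittered sample time
    float3 c = make_float3(s.centre.x + time * s.velocity.x,
                           s.centre.y + time * s.velocity.y,
                           s.centre.z + time * s.velocity.z);
    float3 oc = make_float3(ro.x - c.x, ro.y - c.y, ro.z - c.z);
    float b = dot3(oc, rd);                            // rd assumed normalised
    float disc = b * b - (dot3(oc, oc) - s.radius * s.radius);
    if (disc < 0.0f) return false;
    *tHit = -b - sqrtf(disc);                          // nearest hit along the ray
    return *tHit > 0.0f;
}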

(One day I might get a life and get laid... you never know...) :p :)
 

Darren S

ClioSport Club Member
@SharkyUK - realistically Andy, what would a PC fitted with a Quadro GP100 give you with the likes of the above?

Would it dramatically improve the rendering time, or are these types of images not really the type to gain benefits of a Quadro? Are the gains to be seen in the likes of architectural or engineering designs instead?
 

SharkyUK

ClioSport Club Member
@Darren S - Good question :) In the likes of the above the only thing the Quadro GP100 would leave you with is a much emptier wallet...! To be fair, the Big Pascal Quadro GP100 is a bit of a strange card from nVidia and, I think, has a quite specific market. It's got great performance for graphics and FP32 compute, but not as good as the lesser Quadro P6000 (I say lesser yet the P6000 still costs over 6 grand). It's the FP32 performance that is key here and considering the fact that Titan Xp and 1080 Ti perform better than the Quadro P6000 we can quickly see that:

Titan Xp / 1080 Ti > Quadro P6000 > Quadro GP100

Of course, things are never quite as clear cut as that and the comparison I give here is very much based on the fact that we are only talking CUDA / OpenCL path tracing and ray tracing type applications that use single-precision (FP32).

The majority of rendering tasks on GPU don't require FP64 double-precision. In fact, you can quite easily produce movie quality CGI using FP32 as it affords plenty enough precision to generate realistic visuals (time and resources permitting). As a result, you can happily use a mainstream Geforce GTX with CUDA support for handling the grunt work and, in pretty much most cases, you will find that the Quadro is out-performed by the gaming Geforce GTX (or Titan) GPU. The number of CUDA cores and the clock speeds are key here. The more CUDA cores you have, and the faster they can run, the better. If we take a look at the GP100 and the 1080 Ti they appear similar in terms of CUDA core counts and clock speeds (more or less) but the 1080 Ti has the edge with higher core clock and boost clock rates. Simply put, the 1080 Ti has the faster and better FP32 compute performance as a result. I have seen the likes of the 1080 Ti (and Titan Xp) outperform the Quadro GP100 by some 30% in progressive refinement rendering tasks (such as Monte Carlo path tracing). Of course, the GP100 does have the higher bandwidth HBM2 memory interface and greater capacity for throwing data around... but this doesn't really have much of an impact with these sorts of rendering tasks (which are low latency). You don't need that bandwidth with this GPU compute tasking. Hence, to answer your question, the GP100 would offer no benefits for the above and would result in a lower-performance path tracer. :)

However... the Quadro GP100 does have additional benefits if you need the GPU for other tasks, such as CAD. It will run circles around the gaming cards for pushing polygons in a CAD environment where wireframe views, 2-sided lighting, etc. are king. It also has ECC memory, is more robust and is designed to run for prolonged periods whilst drawing less power. There are many other benefits too but, alas, nothing that really would make you choose it over a much cheaper 1080 Ti for rendering using GPU compute.

That said, the higher onboard RAM capacity can be useful with the Quadro cards. When rendering with GPU compute you really need to have your ENTIRE scene in memory at once to get maximum performance benefit. As soon as you exhaust GPU memory you're often in a whole world of pain. Some systems will drop back to a CPU-based render path (ouch, MUCH slower) or employ some clever page-swapping, which also massively hurts GPU compute performance - although likely not to the extent that a pure CPU-based render path would incur. Hence, a heap of onboard RAM is always a bonus.
 

sn00p

ClioSport Club Member
  A blue one.
Slightly tweaked the way the materials are handled. Still not quite right, but some pretty realistic results can be achieved nonetheless. Running on my GPU these scenes can be interacted with in real time, although it still takes a few minutes to progressively refine the quality to the level seen in these images. Running across 8 cores / 16 threads on my CPU, the same scenes take HOURS to reach an equal level of quality!

What CPU is that?

I just replaced my machine in the office. I'm rocking the built-in graphics as I only ever access the machine over RDP....

I posted a pic on Facebook of the task manager and my mate said "that's not a PC, it's a server!"

Coffee Lake 6-core with 32 gig of RAM, builds fly on it!
 

SharkyUK

ClioSport Club Member
Yeah, I'm not surprised they fly on that machine. :)

I have a new CPU sat here in front of me ready to go into action, but it will be a week or two away yet as I am still deciding on which cooling option to take. That new CPU is an Intel i9-7900X Skylake-X with 10 cores. At the moment I am running an Intel i7 6900K Broadwell-Extreme with 8 cores. I also have 64GB of G.Skill TridentZ quad channel memory to help things along. :)
 

Darren S

ClioSport Club Member

Interesting stuff Andy!

Looking at it - the Quadros don't appear to use SLI either, more the NVLink - which, I assume, with the potential of 32GB of memory available, would be a big benefit in the right circumstances?

What I didn’t know was just how energy efficient these things are compared to the Titans and Ti cards. A significant improvement! But like you say, they are probably designed to work flat out for hours on jobs - which really wouldn’t be a design consideration for most of the mainstream graphics cards, I would have thought?

So for your line of work, the non-Quadro cards would be the overall winner then? :)
 

sn00p

ClioSport Club Member
  A blue one.
Well I'm moist. :)
 

SharkyUK

ClioSport Club Member
Yeah, the GP100 range has NVLink, which considerably ups the bandwidth between two GPUs. The 32GB memory is definitely a bonus, too. However, having not explored that particular avenue in depth as yet, I'm not sure whether each GPU still has to maintain its own copy of the data for rendering the scene or whether a single copy of the data in the 'shared' memory is now sufficient. In my setup, I am limited to the amount of RAM on a single GPU. Whilst my path tracer would scale quite nicely on multi-GPU setups, I still have to provide each GPU with its own complete copy of the 3D scene data. This may no longer be necessary on the Quadro GP100... not sure buddy!

The newer Unified Memory has certainly made coding a lot easier (although I'm not currently taking full advantage of it). Originally you had to effectively create two copies of the data for processing - a copy on the CPU and a 'mirrored' copy for the GPU. The system then keeps them synchronised, and data can be copied to/from one to the other at suitable points. This made programming messy as you had to maintain host (CPU) and device (GPU) copies of pointers, memory allocations, etc. Now you can just create the data and allocations once and the system is clever enough to use that single instance on both CPU and GPU. Some black magic goes on under the hood. It just makes for cleaner and easier coding.
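
A before/after flavour of what Unified Memory buys you - this little example is complete and only uses the standard CUDA runtime API. One cudaMallocManaged pointer is valid on both sides, so the old cudaMalloc + cudaMemcpy mirroring disappears:

Code:
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* v, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= k;
}

int main() {
    const int n = 1024;
    float* data = nullptr;                  // ONE pointer for CPU and GPU
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;      // fill on the CPU
    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);  // crunch on the GPU
    cudaDeviceSynchronize();                // make the results host-visible
    printf("%.1f\n", data[0]);              // prints 2.0 - no cudaMemcpy anywhere
    cudaFree(data);
    return 0;
}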

Yes - the Quadro cards (and Teslas) are designed for heavy-lifting and heavy workloads. You use them where you need robustness and system stability. Heavy gamers might use their gaming rigs for a few hours a night and at weekends but Quadros in workstations are often working constantly throughout the week on design projects and during regular office hours, day after day. Same with Teslas... they need to be seriously stable due to running in supercomputer arrays (possibly) 24/7.

So, for this line of work the 1080 Ti is great and yields better performance. BUT... if I were rendering commercially for a film and needed arrays of these things then I'd be looking at the likes of the GP100-based Teslas. They offer incredible performance when scaled-up and are designed for these types of tasks (e.g. render farms) but obviously come at a heck of a cost. Mind you, I'm sure Disney and Pixar have a few quid in the bank.

I'm a few years away from going commercial with my homebrew path tracing project so mainstream gaming cards will be just fine for me. For now. :p

Well I'm moist. :)
Me too, mate. Constantly. It's the sweat due to the heat these things kick out!!!!
 

SharkyUK

ClioSport Club Member
So... I managed to implement a simple tri-mesh loader for loading 3D objects from basic .ply files (they only contain the bare minimum geometric data - i.e. vertex positions and face descriptions - face normals are calculated on the fly). That was the easy bit.

I then managed to get my first pass of a bounding volume hierarchy (BVH) working, using surface area heuristics (SAH) to determine the splitting of the node and leaf structures within the hierarchy. This is all done on the CPU and results in the scene geometry (i.e. triangle polys) being loaded and placed in an ordered hierarchy in the BVH for optimal processing. I then create another BVH structure that mirrors the CPU version, albeit in a GPU cache-friendly way. This involves lots of tables, indices and offsets so that the CPU hierarchy can easily be packed into textures (data arrays) for processing on the GPU. The BVH is then saved to file (to save having to recreate it every time) and the GPU-friendly BVH is uploaded to the GPU along with scene geometry, camera information, etc. to allow the scene to be rendered.
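
To give a flavour of the 'GPU cache-friendly' part (my layout here is illustrative, not the actual structures): the pointer-based CPU hierarchy gets flattened depth-first into fixed-size nodes located by array index, so traversal on the GPU is pure index arithmetic and the whole thing uploads as one flat buffer.

Code:
#include <vector>
#include <cstdint>

// Pointer-based node as built on the CPU (SAH splitting done elsewhere).
struct BuildNode {
    float bmin[3], bmax[3];
    BuildNode *left = nullptr, *right = nullptr;
    int firstTri = 0, triCount = 0;        // triCount > 0 means leaf
};

// Fixed-size, index-based node suitable for a flat GPU buffer / texture.
struct FlatNode {
    float bmin[3], bmax[3];
    int32_t rightChild;                    // interior: right child index
    int32_t firstTri;                      // leaf: first triangle index
    int32_t triCount;                      // 0 means interior node
};

// Depth-first flatten: the left child always directly follows its parent,
// so only the right child's index needs storing.
static int flatten(const BuildNode* n, std::vector<FlatNode>& out) {
    int myIndex = (int)out.size();
    out.push_back(FlatNode{});
    for (int i = 0; i < 3; ++i) {
        out[myIndex].bmin[i] = n->bmin[i];
        out[myIndex].bmax[i] = n->bmax[i];
    }
    if (n->triCount > 0) {                 // leaf
        out[myIndex].firstTri = n->firstTri;
        out[myIndex].triCount = n->triCount;
        out[myIndex].rightChild = -1;
    } else {                               // interior
        out[myIndex].triCount = 0;
        flatten(n->left, out);             // lands at myIndex + 1 by construction
        out[myIndex].rightChild = flatten(n->right, out);
    }
    return myIndex;
}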

Here's a selection of images from the first test of this system.

montage_v1_bvh01.jpg


It's just a shame that this work has pretty much screwed up every other part of my software! Back to the drawing board...
 

SharkyUK

ClioSport Club Member
Slowly working on improving the tri-mesh handling. It's quite enjoyable work (well, if you're wired like @sn00p and myself!) as you really have to think about your data structures and how you package that data up. It's surprising how quickly you burn through memory when dealing with 3D worlds, textures, material definitions, etc. I might have a whopping 64GB to play with in my system but I am 'limited' to the 11GB of RAM on my GPU. :p A few more random images...

20180129_eg01.jpg


The 'noise' (grain) is a by-product of the Monte Carlo method used to sample the image. Basically, the algorithm integrates over all the illuminance arriving at a specific point on an object's surface. The illuminance is then modulated by a BRDF (bidirectional reflectance distribution function), which effectively determines how much of that illuminance is directed back towards the viewer (and how much bounces elsewhere throughout the 3D world). This process is done over and over for every single pixel in the image, and the illuminance is averaged by the number of samples that each pixel has been subjected to. The unbiased and pseudo-random nature of the paths traced through the scene means that, over time, the image converges and gradually tends towards the 'perfect' image - i.e. less noise and more realistic aesthetics. By mixing this technique with BRDFs that are physically based we can create realistic-looking imagery that is indistinguishable from the real thing.
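
For the mathematically inclined, that paragraph is the rendering equation plus its Monte Carlo estimator. In LaTeX form:

L_o(x, \omega_o) = L_e(x, \omega_o) + \int_{\Omega} f_r(x, \omega_i, \omega_o) \, L_i(x, \omega_i) \, (\omega_i \cdot n) \, d\omega_i

L_o(x, \omega_o) \approx L_e(x, \omega_o) + \frac{1}{N} \sum_{k=1}^{N} \frac{f_r(x, \omega_k, \omega_o) \, L_i(x, \omega_k) \, (\omega_k \cdot n)}{p(\omega_k)}

Here f_r is the BRDF, p(\omega_k) is the probability density of picking sample direction \omega_k, and N is the sample count per pixel - which is why the grain fades as more samples accumulate.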

20180130_eg01.jpg


The Millennium Falcon was rendered at 4K resolution and it took about 20 minutes to reach the quality depicted (ignoring JPEG compression artifacts). The same image / quality settings on the CPU would have taken hours and hours (and would not have been fully interactive either).

20180130_eg02.jpg


The skull has a certain level of realism despite only being rendered using a basic diffuse material. My system can't replicate the look of bone yet. :p Again, this is all rendered with zero direct lighting and is all wholly lit by the indirect light from the surrounding environment. This is sometimes referred to as global illumination (or indirect illumination).

20180130_eg03.jpg


Erm... a zombie head.
 

sn00p

ClioSport Club Member
  A blue one.
Awesome stuff dude. We are just working on a new design and it looks like I will have a whopping 1 megabyte of RAM to play with! Lol
 

sn00p

ClioSport Club Member
  A blue one.
You know what I said about Git...

I moved our Gitlab server to our new NAS, my developer was trying to push some changes to a project but none of the remote branches were showing up....

Turns out the backup didn’t restore that project, so I redid the backup from the old server and restored it again on the new one, this time everything was fine.

But in the meantime, I'd removed origin from his working copy, and when I added it back I accidentally added a different project repository which has the same working copy names.

I realised, so changed it to the new one, he pushed his changes and all was good.

Until I pulled them; now I seem to have a hybrid of both projects in one, with completely separate branches under the same branch name.

I think tomorrow I'm going to be restoring the backup again, getting him to archive his working copy, pull down a new working copy and then manually copy his changed source files back over.

What a f**k up. What’s really weird is that on the server it all looks fine, but it’s buggered in my working copy.

First time I’ve had a git calamity.
 

SharkyUK

ClioSport Club Member
You know what I said about Git...
Whoa! That sounds like an interesting 'mix-up' somewhere in the depths of GIT... have you got it sorted now mate?

On the plus side, at least you're not using Subversion / SVN...
Oh my word! I can't say too much in public but what a mess. Just imagine what would happen if a project became so large and complex that the underlying versioning / control software couldn't handle it properly and you lost confidence in it to the point that you couldn't guarantee that check-in/-outs were correct and that grabbing the latest version of something was, in fact, the latest version...
 

sn00p

ClioSport Club Member
  A blue one.

Yeah, it was very weird. All I did was pull down the changes from another branch, but then all hell let loose. What I discovered, though, was that everything looked correct on the GitLab server, so I checked another machine and lo and behold the code looked correct there.

So I deleted my local copy and pulled down from the server again and all was right with the world.

Absolutely no idea what happened or how it happened, but I certainly didn't merge the projects my end, so something went a bit wonky somewhere (I use GitKraken).

Anyway, after 2-3 years we've now labelled that platform as stable and "feature complete", I've started work on the next generation of product now.
 

SharkyUK

ClioSport Club Member
I've not had much chance to work on the software recently due to a mixture of ill-health, a crazy work schedule and completing one contract whilst preparing my next one! However, I did fix a few little issues and also added simple material import support. It's not great, but it's a start - I can now create scenes in Blender / 3DSMax / Maya and export .obj / .mtl files. It's an old 3D format by today's standards, but it's relatively simple to parse and as good a place to start as any. There's still no texture (or UV) support, and I have noticed a few glitches in the rendered images (which I think are due to incorrectly calculated normals and/or errors in the barycentric coordinate calculations).


Test Render
by Andy Eder, on Flickr


Test Render
by Andy Eder, on Flickr


Test Render
by Andy Eder, on Flickr


Test Render
by Andy Eder, on Flickr
 

SharkyUK

ClioSport Club Member
With the recent bug fixes going in I can now start to look at adding more / new features again. Hence, I decided to add simple texture / UV mapping support this weekend... which went in surprisingly easily thanks to the fact that I actually have a pretty good idea what I'm doing these days, after 25 years of doing this stuff! :p I've just got to figure out the best way of getting the texture data to the GPU and associating textures with materials. Here are a few test renders with simple 3D models and basic diffuse / albedo texture maps applied, rendered using an HDRi environment map for the global illumination in an attempt to 'sit' the models into the scene.
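
The per-hit UV lookup itself is just standard barycentric interpolation across the triangle's three UV pairs (a sketch; names are illustrative):

Code:
#include <cuda_runtime.h>

// u, v are the barycentric coordinates returned by the triangle intersection.
__device__ float2 interpUV(float2 uv0, float2 uv1, float2 uv2, float u, float v) {
    float w = 1.0f - u - v;                // the third barycentric coordinate
    return make_float2(w * uv0.x + u * uv1.x + v * uv2.x,
                       w * uv0.y + u * uv1.y + v * uv2.y);
}

The interpolated UV then indexes the diffuse / albedo map to fetch the surface colour at the hit point.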


Test Render
by Andy Eder, on Flickr


Test Render
by Andy Eder, on Flickr


Test Render
by Andy Eder, on Flickr
 

SharkyUK

ClioSport Club Member
A few more bug fixes and a massive 'hack' to allow lots of textures to be accessed by the CUDA render kernel. There are still a few issues with rendering artifacts due to screwed-up normals but I can't for the life of me track down the underlying problem as yet. Another batch of test images...


Test Render
by Andy Eder, on Flickr


Test Render
by Andy Eder, on Flickr


Test Render
by Andy Eder, on Flickr


Test Render
by Andy Eder, on Flickr

All of the above were running at 1080p at interactive framerates, although the final rendered image did take a minute or two to converge.
 

SharkyUK

ClioSport Club Member
Fixed another issue... I figured out why the colours look a bit washed out (for example, the juice bottle and wooden table top in the previous post). I wasn't converting the texture maps to the correct colour space (and wasn't taking gamma correction into account). Here's the fixed version of the image:


Test Render
by Andy Eder, on Flickr
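
The fix itself is the bog-standard sRGB transfer function (these are the textbook formulas, not lifted from my code): decode textures to linear before integrating light, then re-encode for display.

Code:
#include <cmath>

// Decode: sRGB-encoded texel -> linear radiance value.
static float srgbToLinear(float c) {
    return (c <= 0.04045f) ? c / 12.92f
                           : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Encode: linear render result -> sRGB for display.
static float linearToSrgb(float c) {
    return (c <= 0.0031308f) ? 12.92f * c
                             : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
}

Skip the decode and all the maths happens on gamma-encoded values, which is exactly where the washed-out look came from.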

A couple more screenshots:


Test Render
by Andy Eder, on Flickr


Test Render
by Andy Eder, on Flickr

And the latest video showing the (limited) texture mapping support in action:

 

SharkyUK

ClioSport Club Member
Up to this point the path tracer I've been working on uses a static BVH for all the elements in a scene. So... I made a few changes to show that it can also handle dynamic elements as well. I wrote a simple sphere-based physics simulation to provide dynamic motion for the spheres in a scene and then let my path tracer handle the rendering of the simulation in real time. The physics simulation is a little buggy but good enough for my needs. I also added functionality to allow manipulation of the image-based lighting. By simply rotating the environment image (that contains colour and light intensity information) you can easily change the entire lighting setup for the scene. Here's a quick video I put together:




Next up I plan to make a few improvements to the tone mapping and to add simple post processing to provide a bloom effect for high intensity lighting.
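
On the lighting rotation seen in the video - it's cheaper than it sounds. Nothing in the scene changes; you just rotate each escaping ray's direction about the vertical axis before it samples the radiance map. Roughly (a sketch; angle in radians):

Code:
#include <cuda_runtime.h>
#include <math.h>

__device__ float3 rotateEnvDir(float3 d, float yaw) {
    float c = cosf(yaw), s = sinf(yaw);
    // Standard rotation about the y (up) axis.
    return make_float3(c * d.x + s * d.z, d.y, -s * d.x + c * d.z);
}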
 

Darren S

ClioSport Club Member
Andy - in the clip above, does the inclusion of moving (and reflective) objects in what I'd class as the 'middle ground' have a significant impact on performance?

By that, you have the reflective balls on the table, showing the alleyway on them as they rotate. If there were say a dozen people walking up and down that alleyway - not only do the balls have to reflect the static background of the alleyway itself - but also now deal with the reflections of the moving people within the sequence. Would that significantly increase the burden on a real-time sequence - not necessarily a pre-rendered one?
 

SharkyUK

ClioSport Club Member
@Darren S - I think I understand what you are saying... :)

First of all, let's get the physics out of the way! As you are no doubt aware, it is possible to run physics simulations on the GPU these days. However, this is not the case here (the GPU is purely being used to trace the rays/paths of light bouncing through the scene and ultimately generating the resulting image). My little physics simulation is running on the CPU. I take the results of the physics simulation and 'mirror' them to their corresponding GPU counterparts (effectively updating their positions, velocities, etc). The results of the CPU simulation have to be uploaded to the GPU each frame (the physics simulation runs at 60 fps, independent of the rendering frame rate). Because the physics simulation is quite simple - and running on a rather tasty CPU - the performance hit is pretty much negligible. Consequently, the dynamic nature of the balls doesn't impact much at all on performance. It's the rendering of the balls that really impacts performance in a significant way.
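
The mirroring step is nothing more exotic than a per-frame copy (struct and names here are hypothetical, for illustration):

Code:
#include <cuda_runtime.h>

struct SphereState { float3 position; float3 velocity; };

// Called once per rendered frame, after the 60 fps CPU simulation has stepped.
void uploadPhysicsResults(const SphereState* hostStates,
                          SphereState* deviceStates, int count) {
    cudaMemcpy(deviceStates, hostStates, count * sizeof(SphereState),
               cudaMemcpyHostToDevice);   // tiny next to the tracing workload
}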

If we consider the balls in the first part of the demo there are several different materials applied to them. It's in the realisation of these materials where MASSIVE performance gains or losses can be found, due to the complexity of the mathematics behind them. Likewise, the complexity of the materials and how much of the view is covered by those complex materials also makes a big difference. If you are rendering a single ball that is some distance from the camera then it might only cover an area of, say, 10x10 pixels (100 pixels) in total. However, if that same ball is right in front of the camera and occupying the entire view then you suddenly have to consider, say, an area of 1920x1080 pixels (2,073,600 pixels) for a HD display. As path tracing is performed 'per pixel' then you suddenly have MANY more computations to crunch through.

The following materials types were used in the demo - diffuse, metal, emissive, 'coat', ideal specular, ideal refraction - and they all have varying levels of computational complexity and cost. The ideal specular material (i.e. the mirror chrome-like material) is quite 'cheap' to render compared to the ideal refraction material (i.e. the glass-like material) for example. I'm not sure you'd be too interested in the reasons why so I won't go into detail here as it might get a bit boring! :p My point is that some materials are considerably more complex to simulate and the more pixel area those materials cover, the more the performance is impacted. In terms of ray/path calculations we are talking in the billions and trillions of calculations per frame. The numbers genuinely are astronomical.

Another factor impacting the performance is the number of 'bounces' a path can take through the scene. The more bounces a path takes, the more realistic the result (at the expense of computational cost). Consider a ray traced from the camera into the scene. If it hits nothing in the scene then the camera simply takes the colour of the environment and moves on to calculate another ray. However, it might hit an object. That's a bounce. At that hit point we need to query the material properties to determine how the ray's path is affected. Is it reflected, is it transmitted (e.g. through glass) or something else? Let's assume it's a mirror material, so the ray's path is reflected based on the angle of incidence. That newly created ray then continues the journey through the scene. It might not hit anything else, at which point it escapes into the environment and we start again with another ray. But that new path might hit another object. That's the second bounce. We go through the same process of querying the surface, determining how it affects the ray, and so on. The more bounces that happen, the more accurate the generated colour for the pixel being rendered. The bounce count can be set by a hard limit or determined analytically (I use a hard limit). I generally impose a limit of 4-8 bounces for 'pretty' renders. Pixar used 10 bounces for Big Hero 6 as it required that many bounces to provide a suitably realistic appearance for the white inflatable suit. :p
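
Here's the bounce limit in action as a tiny self-contained toy - one grey diffuse sphere under a constant white sky, nothing like the real renderer, but the loop structure is the point: each iteration is one bounce, and the hard limit caps the work per path.

Code:
#include <cuda_runtime.h>
#include <curand_kernel.h>
#include <cstdio>

#define MAX_BOUNCES 8                       // the hard limit discussed above

struct V3 { float x, y, z; };
__device__ V3 vadd(V3 a, V3 b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
__device__ V3 vscale(V3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }
__device__ float vdot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
__device__ V3 vcross(V3 a, V3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}
__device__ V3 vnorm(V3 a) { return vscale(a, rsqrtf(vdot(a, a))); }

// Cosine-weighted direction in the hemisphere around n.
__device__ V3 cosineDir(V3 n, curandState* rng) {
    float a = 2.0f * 3.1415926f * curand_uniform(rng);
    float r2 = curand_uniform(rng), r = sqrtf(r2);
    V3 u = vnorm(vcross(fabsf(n.x) > 0.1f ? V3{ 0, 1, 0 } : V3{ 1, 0, 0 }, n));
    V3 v = vcross(n, u);
    return vnorm(vadd(vadd(vscale(u, r * cosf(a)), vscale(v, r * sinf(a))),
                      vscale(n, sqrtf(1.0f - r2))));
}

// One path: every loop iteration is a bounce; paths cut off by the limit
// simply return whatever radiance they gathered so far.
__device__ V3 tracePath(V3 ro, V3 rd, curandState* rng) {
    V3 radiance = { 0, 0, 0 }, throughput = { 1, 1, 1 };
    for (int bounce = 0; bounce < MAX_BOUNCES; ++bounce) {
        float b = vdot(ro, rd);             // ray vs unit sphere at the origin
        float disc = b * b - (vdot(ro, ro) - 1.0f);
        float t = (disc > 0.0f) ? (-b - sqrtf(disc)) : -1.0f;
        if (t <= 1e-4f) {                   // missed: escape to the 'sky'
            radiance = vadd(radiance, throughput);  // white sky * throughput
            break;
        }
        V3 n = vnorm(vadd(ro, vscale(rd, t)));
        throughput = vscale(throughput, 0.7f);      // grey diffuse albedo
        ro = vadd(vadd(ro, vscale(rd, t)), vscale(n, 1e-3f)); // offset off surface
        rd = cosineDir(n, rng);             // cosine-weighted diffuse bounce
    }
    return radiance;
}

__global__ void demo(V3* out) {
    curandState rng;
    curand_init(1234, 0, 0, &rng);
    *out = tracePath(V3{ 0, 0, 3 }, V3{ 0, 0, -1 }, &rng);
}

int main() {
    V3* out = nullptr;
    cudaMallocManaged(&out, sizeof(V3));
    demo<<<1, 1>>>(out);
    cudaDeviceSynchronize();
    printf("radiance: %.3f %.3f %.3f\n", out->x, out->y, out->z);
    cudaFree(out);
    return 0;
}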

With respect to your question on people walking up and down the alleyway and the impact on performance... yes, it would adversely affect performance. It's not really possible to say by how much, though. It (again) very much depends on the complexity of the materials used to realise those people. Geometric model complexity would also add cost (i.e. more polygons mean more intersection / hit tests have to be performed by the GPU to determine if rays hit the geometry or not). Due to the recursive(ish) way that path tracing works the reflections would happen automatically so the balls wouldn't need any additional information to tell them to reflect the people walking in the alleyway.

Real-time path tracing (as I'm doing here) really is sensitive to material and geometric complexity. It's so very easy to completely use every bit of performance the GPU has to offer (and then some), resulting in horrendous performance dips. I've caused the GPU to slow to such a degree that the Windows watchdog process thought the driver had crashed and thus performed a GPU/driver reset (completely crashing the renderer!)

TL;DR - it depends. :)
 

Darren S

ClioSport Club Member
As per usual - thanks Andy for twisting my tiny brain!

I had never really thought of the 'closeness' issue that you mentioned with one ball taking up the entire screen real-estate.

Using much simplification and imagining that the entire background was solid black - would it be fair to assume that rendering, say, 50 balls at distance that occupied around 1 million pixels in total would require half the workload of just one ball in extreme close-up at standard HD?

In a similar train of thought - it's easy to see how you can cripple the GPU performance without much effort at all - like you say above. I take it that textured surfaces such as a golf ball or the pitted leather of rugby ball could be massively demanding too?
 

SharkyUK

ClioSport Club Member
As per usual - thanks Andy for twisting my tiny brain!
Experience is everything mate! It probably sounds / reads a lot more complex than it is. Maybe. Hmm.

I had never really thought of the 'closeness' issue that you mentioned with one ball taking up the entire screen real-estate.

Using much simplification and imagining that the entire background was solid black - would it be fair to assume that rendering, say, 50 balls at distance that occupied around 1 million pixels in total would require half the workload of just one ball in extreme close-up at standard HD?
If the entire background was black then the resulting image would be black... unless at least one of those balls was a light emitter, or there was some form of direct lighting being used to illuminate the balls. :) The background light is used to light the scene in a physically accurate way, hence why the alleyway background in my demo is encoded as an HDR image - providing both colour and light intensity information. It also does not matter what colour the background environment actually is. The exact same process and calculations have to be performed to trace a ray's path through the scene. The only difference is that, with a black background, when that ray ends its processing it finds a black environment, hence there is no light information available. The pixel renders black (assuming, as already said, there's no other form of direct lighting from elsewhere).

As for the 50 balls at distance vs. one ball extremely close scenario... it depends! :p Seriously, it really does. If we assume that the balls all use exactly the same material then we still cannot say for sure. With path tracing there is an element of randomness in terms of calculating the direction of rays travelling and bouncing around a scene. It's quite possible that the balls may be in such positions that rays hit and bounce off many other balls before finally resolving to a pixel colour contribution. There really isn't much in the way of a clear answer. That said, if we dumbed down the rendering to a minimum such that complexity was pretty much the same for the two scenarios then, yes (as a rule of thumb), a 2x increase in pixel coverage would typically equate to a 2x increase in computational cost. :)
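
Rough arithmetic under those dumbed-down assumptions, using the numbers from earlier in the thread:

\text{cost} \propto \text{pixels covered} \times \text{samples per pixel} \times \text{average path length}

\frac{2{,}073{,}600 \text{ px (one ball filling a } 1920 \times 1080 \text{ frame)}}{1{,}000{,}000 \text{ px (50 distant balls)}} \approx 2.1

So, all else genuinely equal, the single close-up ball costs roughly twice as much as the 50 distant ones - matching the rule of thumb above.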

In a similar train of thought - it's easy to see how you can cripple the GPU performance without much effort at all - like you say above. I take it that textured surfaces such as a golf ball or the pitted leather of rugby ball could be massively demanding too?
Texturing itself isn't too expensive and still offers a way to increase perceived image fidelity whilst keeping geometric complexity down. For example, through the use of bump and normal maps to simulate surface perturbations rather than actually having those perturbations modelled using millions of additional polygons in the 3D meshes (as seen in all modern day games). If the textured surface is a simple material that returns the texture colour then that's a fairly minimal cost. If it uses a normal or bump map to alter the direction of the ray that hit the surface then the cost goes up a bit as the new direction the ray must take is affected by the data in the normal or bump map. Again, on its own, it's not a particularly expensive step. However... if it's a texture representative of human skin on the face and the material is simulating skin then the cost can increase significantly. Skin is expensive to simulate. Some of the light hitting the skin is reflected, some is absorbed. Some of the light enters through the skin, bounces around just under the surface of the skin, and then exits at some point later (subsurface scattering).
 

