Rob
Moderator
Or have I misunderstood - are you asking a more general question?!
In that case, I'd say it's very much related to why both CPU and GPU core speeds haven't increased much in recent years. Sure, they've increased in fits and starts, but not by much.
Everything's going parallel. Throughput is theoretically (and often practically) much higher, but speed (per-core performance) isn't. Sure, faster cores help, but faster cores also mean more heat, higher voltage requirements, reduced stability, etc. Obviously this also depends on the fabrication process.
This is even more relevant for graphics cards than it is for CPUs, due to the limitations of monitors. Sure, you can get monitors with high refresh rates, but generally we're interested in getting at least 60 FPS - whether we get any more or not is irrelevant (in conventional applications). Note that a 60 Hz refresh is much, much slower than modern GPU core clocks, which have tended to sit around 600-1000 MHz for quite some time - not mere happenstance (although those numbers can't really be compared quite that directly, the point stands). So the GPU manufacturers haven't been too worried about optimising core speed, because they can get a much better overall performance boost by throwing more transistors in. For a decade or more, the bottleneck has not been fillrate. It's simply easier for them to add more shader cores, because there's no problem with those cores maxing out at a few hundred FPS, which is supposedly faster than the human eye can perceive.
So, if I'm reading your question correctly, am I right in thinking that you're trying to render hundreds to thousands of FPS, and struggling? Perhaps you could try rendering multiple frames simultaneously? This could be done by doubling the effective screen resolution and duplicating the scene. Then, using different irregular rasters on the two halves would allow your stochastic SSAA to still function - of course, you'd have to do some bookkeeping/processing afterwards to merge the information (i.e. average pixel colours). This could scale by continuing to multiply the effective screen resolution. Not saying it's necessarily possible... just an idea. Or alternatively you could simply increase the number of samples per pixel. Or am I still barking up the wrong tree?
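Just to sketch what I mean by the merge step - this is purely illustrative (the function name and buffer layout are my own invention), assuming the two copies of the frame land side by side in one flat RGB buffer:

```python
# Hypothetical sketch: two copies of the same scene are rendered side by
# side into one double-width buffer (left half = frame A, right half =
# frame B, each sampled with a different stochastic raster). Merging just
# averages the corresponding pixels.

def merge_halves(buffer, width, height):
    """buffer: flat row-major list of (r, g, b) tuples, 2*width x height.
    Returns a width x height list of averaged (r, g, b) tuples."""
    merged = []
    for y in range(height):
        row = y * 2 * width          # start of this row in the wide buffer
        for x in range(width):
            a = buffer[row + x]              # pixel from the left half
            b = buffer[row + width + x]      # same pixel, right half
            merged.append(tuple((ca + cb) / 2 for ca, cb in zip(a, b)))
    return merged

# e.g. a 1x1 frame rendered twice:
print(merge_halves([(0, 0, 0), (2, 2, 2)], 1, 1))  # [(1.0, 1.0, 1.0)]
```

The same idea extends to 4x or 8x the resolution - you'd just average more halves (or quadrants) per output pixel.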