How Many Triangles and Clipping & Culling (EN)

How the number of shapes and triangles affects FPS.

The data given here is for ancient graphics cards and CPUs!
Even so, it can be used for rough estimates of what is achievable on "modern" hardware.

The text also points out the benefit of several small shapes over one large one.

Put simply: the smaller the shape, the sooner it gets thrown out of the frame.

Copy-paste of an article from MaxImmerse.chm (C)

Clipping & Culling

The clipping/culling behavior of Gamebryo is the other issue that must be kept in mind while creating scene geometry.

· Clipping is the process of dividing polygons into only the fragments that will appear on screen.

· Culling is an attempt to avoid clipping by rejecting whole NiTriShapes if no part of them appears on the screen. In general, culling is preferable to clipping because clipping is vastly more expensive than culling.

You can structure your scenes to improve culling by limiting the volume of space an NiTriShape occupies.

A small NiTriShape is more likely to be completely off screen while a very large one (e.g. a huge floor) is likely to be, at least, partially on screen all the time.
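
The difference between culling and clipping can be sketched with a bounding-sphere test against the planes of a view volume. This is only an illustration of the idea, not Gamebryo's actual culling code: the `Plane` class, the `classify_sphere` function, the box-shaped view volume and the sample shapes are all assumptions made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Plane:
    # A point p is on the inner side when nx*px + ny*py + nz*pz + d >= 0.
    # The normal (nx, ny, nz) is assumed to be unit length.
    nx: float
    ny: float
    nz: float
    d: float

def classify_sphere(center, radius, planes):
    """Return 'culled', 'clipped' or 'inside' for a bounding sphere."""
    result = "inside"
    for p in planes:
        dist = p.nx * center[0] + p.ny * center[1] + p.nz * center[2] + p.d
        if dist < -radius:
            return "culled"      # entirely outside one plane: reject the whole shape
        if dist < radius:
            result = "clipped"   # straddles a plane: its triangles must be clipped
    return result

# Hypothetical view volume: a box spanning -10..10 on x and z.
planes = [
    Plane(1, 0, 0, 10), Plane(-1, 0, 0, 10),
    Plane(0, 0, 1, 10), Plane(0, 0, -1, 10),
]

# Same position, different sizes: the small shape is rejected outright,
# the huge one still reaches into the view volume.
small_prop = classify_sphere((25.0, 0.0, 0.0), 2.0, planes)
huge_floor = classify_sphere((25.0, 0.0, 0.0), 40.0, planes)
print(small_prop, huge_floor)
```

The small shape costs one cheap test and is skipped, while the large one has to go through the expensive clipping path, which is the reason for keeping NiTriShapes compact.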

The results are summarized in the following table:

Triangle/NiTriShape Ratio

The triangle/NiTriShape ratio is the most important geometric metric for game performance. NiTriShapes are NetImmerse’s logical clusters of triangles. They roughly correspond to Max’s Mesh. The issue with triangles and NiTriShapes is that when rendering an NiTriShape, NetImmerse must do a fixed amount of work on the CPU (property-state setup, texture swapping, etc.) that does not vary according to the number of triangles in the NiTriShape. You should, thus, try to pack as many triangles as possible into each NiTriShape. In general, a game should never have fewer than 20 triangles per NiTriShape. You don’t have to increase the number of triangles just to improve the triangle/NiTriShape ratio, but if doing so will improve vertex lighting or some geometric detail, it won’t hurt the performance. Another way to tackle improving the ratio is to collapse similar meshes with the same materials that are close together in a scene. This collapsed mesh will be converted to a single NiTriShape instead of several separate NiTriShapes, thus improving the overall ratio.

The importance of a large triangle/NiTriShape ratio (i.e. a lot of triangles per NiTriShape) is increased on hardware transform and lighting cards (high-end graphics cards). Hardware transform and lighting cards perform vertex transformation, lighting, and rasterization on the graphics card. Earlier cards could only perform the rasterization while the CPU was forced to do the vertex transformation and lighting. Hardware transform and lighting cards both free the CPU of this task and perform it faster than the CPU ever could. This decreases the time required to render an individual polygon but leaves the fixed amount of work that NetImmerse must do for each NiTriShape (discussed earlier) unchanged. Because of this, the effect of a low triangle/NiTriShape ratio will be particularly noticeable. In a low triangle/NiTriShape ratio situation the CPU will become the bottleneck (doing the NiTriShape setup) and the full rendering capabilities of the graphics card will not be used. In contrast, a high triangle/NiTriShape ratio will allow the graphics card to draw as many polygons as possible and leave the CPU free to perform other operations.
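
The bottleneck shift described above can be illustrated with a toy cost model. This is a sketch under stated assumptions, not a measurement: the function `frame_time_ms` and both timing constants are hypothetical, and the model simply assumes that CPU setup and GPU drawing overlap so that the slower side determines the frame time.

```python
def frame_time_ms(num_shapes, total_triangles,
                  cpu_ms_per_shape=0.02, gpu_ms_per_triangle=0.0002):
    """Toy model: fixed CPU cost per shape, GPU cost per triangle.

    Both constants are invented for illustration only.
    Returns (frame time in ms, name of the limiting side).
    """
    cpu_ms = num_shapes * cpu_ms_per_shape
    gpu_ms = total_triangles * gpu_ms_per_triangle
    bottleneck = "CPU" if cpu_ms > gpu_ms else "GPU"
    return max(cpu_ms, gpu_ms), bottleneck

# The same 20,000 triangles, packed two different ways.
few_big = frame_time_ms(num_shapes=100, total_triangles=20_000)       # 200 tris/shape
many_small = frame_time_ms(num_shapes=5_000, total_triangles=20_000)  # 4 tris/shape
print(few_big, many_small)
```

With these made-up constants the well-packed scene is limited by the graphics card, while the scene of tiny shapes is limited by per-shape CPU work even though the triangle count is identical.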

Transform-Rate, Fill-Rate, and How Many Triangles

The basic geometric question facing all real-time 3D artists is “How should I construct my scene so that I can maintain great real-time rendering speed?” An answer to this question takes into account many different factors, including how many triangles (or faces) to use, the target platform, and the number of textures applied to a single surface, etc. Before he can even address the question, however, an artist must understand two basic principles, transform-rate and fill-rate.

Transform-Rate

The transform-rate is the number of vertices a graphics card can process in a given time period. When the number of vertices to be transformed per time period exceeds the graphics card’s capabilities, an application is said to be “transform limited.”

Fill-Rate

Not only is a graphics card limited in the number of vertices it can transform, it is also limited in the number of pixels it can write to the backbuffer per second. The backbuffer is the portion of a graphics card’s memory that is used as a scratch pad while the final image is being assembled. When this process of writing to the backbuffer exceeds the graphics card’s capability, an application is described as being “fill-rate limited.”

The “fill*.Max” files show the testing of fill-rate behavior of a graphics card. In all these test cases there are very few triangles but they force each pixel to be drawn multiple times (since each layer of triangles must be drawn). The results are summarized in the following table. Note that the number of layers within a scene is known as its depth complexity.
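
Depth complexity can be estimated as the total number of pixel writes in a frame divided by the number of screen pixels. The sketch below assumes that every layer is drawn in full (nothing is rejected by the depth buffer); the function name and the 640 by 480 resolution are choices made for the example, not values from the test files.

```python
def depth_complexity(layer_areas_px, screen_px):
    """Average number of times each screen pixel is written in one frame."""
    return sum(layer_areas_px) / screen_px

screen = 640 * 480

# Four layers that each cover the whole screen.
four_layers = depth_complexity([screen] * 4, screen)

# One full-screen layer plus one covering half the screen.
one_and_half = depth_complexity([screen, screen // 2], screen)
print(four_layers, one_and_half)
```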

How Many Triangles?

How, then, does the knowledge of transform-rate and fill-rate limitation determine the number of triangles? The answer is that it does not, but it does allow you to determine the maximum number of triangles you could ever use. For example, if a game had a 733 MHz G400 as its minimum platform and a consistent 60 frames per second was required, then no single scene could ever have more than 50,000 triangles or a depth complexity greater than 10. These two numbers were obtained using the prior two graphs and represent “not to be exceeded” values. Every additional feature that is added to the game (e.g. multitexturing, more lights, alpha blending, etc.) will further reduce the possible number of triangles and limit their distribution in the scene.

Excerpts from the original help (NDL Gamebryo 1.1)

Multi/Sub-Object and Triangle/NiTriShape Ratios

A convenient way of texturing an object in Max is to use the Multi/Sub-Object material. However, NetImmerse only supports one material per NiTriShape because most hardware can only handle one material per set of triangles. When MaxImmerse encounters a Multi/Sub-Object material it must, thus, split the Max Mesh into multiple NiTriShapes (one for each material). If used indiscriminately, Multi/Sub-Object materials can lay waste to the triangle/NiTriShape ratio. Since each Multi/Sub-Object material creates more NiTriShapes but keeps the same number of triangles they will only ever decrease the ratio. Multi/Sub-Object materials do not need to be completely avoided but they should be used cautiously and with a consideration of the triangle/NiTriShape ratio constraints.
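
The effect of the split on the ratio can be sketched by grouping faces by material ID. This only mimics the idea described above and is not the exporter's code: `split_by_material` and the 120-triangle sample mesh are hypothetical.

```python
from collections import Counter

def split_by_material(face_material_ids):
    """Count triangles per material ID; each ID ends up as one NiTriShape."""
    return Counter(face_material_ids)

# Hypothetical 120-triangle mesh using 6 sub-materials, 20 triangles each.
faces = [i % 6 for i in range(120)]
shapes = split_by_material(faces)

ratio_single_material = len(faces) / 1       # one material, one NiTriShape
ratio_multi_sub = len(faces) / len(shapes)   # one NiTriShape per sub-material
print(len(shapes), ratio_single_material, ratio_multi_sub)
```

The triangle count stays at 120, but the ratio drops from 120 to 20 triangles per NiTriShape once six sub-materials are used.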

How to Reconcile Triangle/NiTriShape Ratios and Clipping/Culling Behavior

These two issues, triangle/NiTriShape ratio and the culling/clipping behavior, place somewhat contradictory demands on you. To improve the triangle/NiTriShape ratio you must have the most triangles in a NiTriShape possible (mesh collapsing, increasing the number of triangles in the mesh, etc.). Simultaneously, the NiTriShapes must be kept compact to allow efficient culling. There is no simple solution to this problem and you must constantly balance the two constraints. In general, triangles should be grouped into NiTriShapes so that culling will still be effective but the NiTriShapes contain the most triangles possible. For example, when modeling the four walls of a room, if the walls are relatively complex (i.e. more than 20 triangles) each wall should be in its own NiTriShape. Having all the walls in a single NiTriShape would improve the triangle/NiTriShape ratio but would force clipping on all the geometry. By dividing the walls into four NiTriShapes, two of them will usually be culled leaving the other two to be clipped. However, if the walls were very simple (i.e. 2 triangles each) then it might make sense to clump all the wall triangles together to avoid having several 2 triangle NiTriShapes.
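
The wall example can be written down as a simple grouping rule. This is a rough sketch of the heuristic only: `group_walls` is a hypothetical helper, it uses the 20-triangle figure from the text as its threshold, and it ignores how far apart the merged walls are, which a real decision would have to weigh.

```python
MIN_TRIANGLES = 20  # threshold quoted in the text

def group_walls(wall_triangle_counts, min_triangles=MIN_TRIANGLES):
    """Return groups of wall indices; each group becomes one NiTriShape.

    Complex walls stay separate so they can be culled individually,
    simple walls are merged to avoid tiny NiTriShapes.
    """
    separate = [[i] for i, n in enumerate(wall_triangle_counts) if n > min_triangles]
    merged = [i for i, n in enumerate(wall_triangle_counts) if n <= min_triangles]
    return separate + ([merged] if merged else [])

complex_room = group_walls([150, 150, 150, 150])
simple_room = group_walls([2, 2, 2, 2])
mixed_room = group_walls([150, 2, 2, 150])
print(complex_room, simple_room, mixed_room)
```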

One useful trick to see if a poor triangle/NiTriShape ratio is slowing performance is to (in Max) collapse all the geometry in question into one Mesh. For this to be meaningful all the geometry must be on screen at one time (i.e. the effect of clipping vs. culling must be eliminated, since the collapse could result in more clipping). Additionally, all the geometry (both before and after the collapse) should be assigned the same material. This will eliminate the effect of shading changes wrought by the collapse. Once these two criteria are met, the frames per second before and after the collapse can be compared to roughly determine how much performance is being lost to a low triangle/NiTriShape ratio.
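
Once both frame rates have been measured, the comparison comes down to simple arithmetic on frame times. The function name and the 30 and 45 fps figures below are made up for the example, and the result is only as rough as the trick itself.

```python
def ratio_overhead(fps_before, fps_after):
    """Fraction of the original frame time attributable to the low ratio."""
    t_before = 1000.0 / fps_before   # ms per frame with the separate meshes
    t_after = 1000.0 / fps_after     # ms per frame after the collapse
    return (t_before - t_after) / t_before

# Hypothetical measurement: 30 fps before the collapse, 45 fps after.
loss = ratio_overhead(30.0, 45.0)
print(f"{loss:.0%} of the frame time was per-NiTriShape overhead")
```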

Transform-Rate, Fill-Rate

Transform-Rate

The transform-rate is the number of vertices a graphics card can process in a given time period. When the number of vertices to be transformed (moved, rotated, and lit) per time period exceeds the graphics card's capabilities, an application is said to be "transform limited."

The following table shows the maximum T&L rate for various PC graphics hardware:

Manufacturer & Card	T&L - million vertices/sec
nVidia GeForce 2 MX	20
nVidia GeForce Ti 4600	136
nVidia GeForce FX 5800 Ultra	200
ATI Radeon 7500	39
ATI Radeon 8500	75
ATI Radeon 9700	300

These values are, of course, theoretical peaks and do not represent real-world game situations. However, these numbers can be useful in examining performance. If you wish to achieve 60 fps on a GeForce 2 MX, the absolute top amount of vertices you can transform in a frame is 333,333. Let's say that every object you render requires two passes. The theoretical top you could transform is now halved to 166,666 vertices. Mind you, these vertices all belong to one object and are untextured and flat shaded, drawn as optimally as possible with absolutely nothing else happening in the application. No interesting game could ever hope to achieve this situation.
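
The arithmetic above can be reproduced with a few lines of Python. The function name is an assumption made for the example; the peak rate is the GeForce 2 MX figure from the table, and the result is a theoretical ceiling, not a realistic budget.

```python
def vertices_per_frame(peak_vertices_per_sec, fps, passes=1):
    """Theoretical upper bound on vertices per frame at a given frame rate."""
    return peak_vertices_per_sec // (fps * passes)

GEFORCE_2_MX = 20_000_000  # 20 million vertices/sec, from the table above

single_pass = vertices_per_frame(GEFORCE_2_MX, fps=60)
two_pass = vertices_per_frame(GEFORCE_2_MX, fps=60, passes=2)
print(single_pass, two_pass)
```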

Fill-Rate

Not only is a graphics card limited in the number of vertices it can transform, it is also limited in the number of pixels it can write to the backbuffer per second. The backbuffer is the portion of a graphics card's memory that is used as a scratch pad while the final image is being assembled. When this process of writing to the backbuffer exceeds the graphics card's capability, an application is described as being "fill-rate limited."

The following chart shows the maximum fill rate for various PC graphics hardware:

Manufacturer & Card	Pixel fillrate - million pixel/sec
nVidia GeForce 2 MX	350
nVidia GeForce 3	800
nVidia GeForce Ti 4600	1200
nVidia GeForce FX 5800 Ultra	4000
ATI Radeon 7500	580
ATI Radeon 8500	1100
ATI Radeon 9700	2200
Matrox Parhelia PH-A128B	800
3dfx Voodoo 5	667

These values are, of course, theoretical peaks and do not represent real-world game situations. However, these numbers can be useful in examining performance. Let's see what we can do with a GeForce 2 MX at a display resolution of 1024 by 768. We'll assume for the moment that transformation and lighting comes for free (which it never does). 1024 by 768 resolution is 786,432 pixels. We'd like to run at 60 fps, so that involves rendering that 1024 by 768 image 60 times per second, for a grand total of 47.19 million pixels per second. Since the card can write 350 million pixels per second, the maximum number of writes we can do on each pixel is roughly 7. This, of course, assumes that all operations that write a pixel cost the same, which they do not: coloring a triangle with a diffuse color is significantly cheaper than multi-texturing that pixel.
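
The same estimate as a small calculation. The function name is hypothetical; the fill rate is the GeForce 2 MX figure from the chart, and the result inherits the simplification that every pixel write costs the same.

```python
def max_writes_per_pixel(fill_rate_px_per_sec, width, height, fps):
    """Theoretical number of times each pixel can be written per frame."""
    pixels_per_sec = width * height * fps
    return fill_rate_px_per_sec / pixels_per_sec

GEFORCE_2_MX_FILL = 350_000_000  # 350 million pixels/sec, from the chart above

writes = max_writes_per_pixel(GEFORCE_2_MX_FILL, 1024, 768, 60)
print(f"{1024 * 768 * 60 / 1e6:.2f} million pixels/sec, about {int(writes)} writes per pixel")
```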

Triangles & NiTriShapes

The triangle to NiTriShape ratio is the most important geometric metric for game performance. NiTriShapes are Gamebryo's logical clusters of triangles. The issue with triangles and NiTriShapes is that when rendering an NiTriShape, Gamebryo must do a fixed amount of work on the CPU (property-state setup, texture swapping, etc.) each time it passes down the set of triangles, no matter how big. You should, thus, try to pack as many triangles as possible into each NiTriShape.

In general, a game should never have fewer than 20 triangles per NiTriShape. You don't have to increase the number of triangles just to improve the triangle/NiTriShape ratio, but if doing so will improve vertex lighting or some geometric detail, it won't hurt the performance. Another way to tackle improving the ratio is to collapse similar meshes with the same materials that are close together in a scene. This collapsed mesh will be converted to a single NiTriShape instead of several separate NiTriShapes, thus improving the overall ratio.
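
A quick way to check a scene against this guideline is to compute the overall ratio and list the NiTriShapes that fall under the 20-triangle mark. The helper names and the sample triangle counts are invented for the example; in practice the counts would come from whatever scene statistics are available.

```python
def triangle_shape_ratio(shape_triangle_counts):
    """Average number of triangles per NiTriShape in the scene."""
    return sum(shape_triangle_counts) / len(shape_triangle_counts)

def undersized(shape_triangle_counts, minimum=20):
    """Indices of shapes with fewer triangles than the recommended minimum."""
    return [i for i, n in enumerate(shape_triangle_counts) if n < minimum]

# Hypothetical scene: triangle count of each NiTriShape.
scene = [250, 12, 400, 2, 36]

ratio = triangle_shape_ratio(scene)
too_small = undersized(scene)
print(ratio, too_small)
```

Shapes flagged this way are candidates for collapsing with nearby meshes that share the same material.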

Hardware transform and lighting cards both free the CPU of vertex transformation and lighting and perform this work faster than the CPU ever could. This division of labor decreases the time required to render an individual polygon but leaves the fixed amount of work that Gamebryo must do for each NiTriShape (discussed earlier) unchanged. In a low triangle/NiTriShape ratio situation the CPU will become the bottleneck (doing the NiTriShape setup) and the full rendering capabilities of the graphics card will not be used. In contrast, a high triangle/NiTriShape ratio will allow the graphics card to draw as many polygons as possible and leave the CPU free to perform other operations.

Note that this graph is only intended to show the relative time penalty of having only a few triangles per NiTriShape. The test samples used to generate the above graph did not have enough triangles to demonstrate the effect of transform-rate and fill-rate limitation. The G400 and GeForce2 in particular hit their maximum frame rate in the 192 and 300 triangles/NiTriShape test cases. In a larger test case, transform- and fill-rate limitation would prevent these cards from reaching their maximum frame rate.