From 20e8b71bf627c57d33e4b0814bc73e50f659fddf Mon Sep 17 00:00:00 2001
From: Daniel Wohlgemuth

Some of these features are in some games, but only in heavily scripted and constrained ways – frequently they play little part in actual gameplay, and so can look rather artificial – which of course is exactly what they are. Using Cellular Automata (CA) to simulate these ideas can lead to far more dynamic and realistic behaviour, and allow new types of gameplay and new tactics within games. At the very least, they allow more realism, better
CA Basics
itself with its neighbouring cells. Differences between them result in changes to the state of the cell and/or its neighbours according to various laws. In this article, these will be based very loosely on real physical laws. One of the best-known CAs is “Conway’s Game of Life” [Conway ref]. This is an extremely simple CA – it has a single bit of state – whether the cell is full or not – and some extremely simple rules for changing state according to the state of neighbouring cells. Nevertheless, even this incredibly simple model is Turing-complete [Conway Turing ref].
In a 3D array of cubic cells, there are three possible definitions of “neighbour” cells: those sharing a face (6 neighbours), those sharing a face or an edge (18), or those sharing a face, edge or corner (26).

Two possible solutions present themselves. One method, used by the first in the X-Com series of games in their impressive and innovative use of CAs [X-Com ref], is to model the faces between the cells as entities, as well as the cells themselves. So walls, floors and ceilings always lie between two cells, along cell faces. This works quite well, but it does mean that there are now two distinct classes of object – things that fill a whole cube (rock, dirt, furniture, tall grass); and things that sit between two cells (walls, floorboards, short grass, doors). This creates annoying special cases in the code used to model substances and their interactions, and causes code replication between the two types, and debugging spaghetti. However, if this model fits, then it is a viable one, and it is fairly intuitive – the internal representation of objects matches their rendered shape fairly closely.
The other solution is one that retains its generality without resorting to many tiny cells. This is far more flexible about where the visual “edges” of the cells are in the world. Rather than the concept of a fixed solid cubic cell, the edges between cells can move about a bit according to the contents. This allows a thin wall to be chopped into half-metre squares, and each square lives in a cell. Because the walls are only a few centimetres thick, neighbouring cells are thought of as expanding to make up the extra space. This “expansion” is simply a way of thinking about it – the CA code itself does not know or care what shape the objects it represents are. As far as the CA physics are concerned, everything is still half a metre thick. Most of the work in making things look otherwise is in the rendering, rather than in the CA routines. It is the job of the rendering to ensure that water goes all the way to the wall’s mesh, and not just to the edge of the CA cube, which would leave a large gap.
In this scheme, a one-metre-wide corridor with thin wooden walls is represented by a plane of “wood” cells, a plane of “air” cells, and then a plane of “wood” cells. Since the centres of the cells are each half a metre away from each other, the total width from wall to wall is still one metre. Of course, the graphical representation of the world still shows that the “cubes” of wood are not cubes at all but flat planes a few centimetres thick, and this is the representation that will be used for any collision detection, but the distinction makes very little difference to the things that are modelled with the CA – water, air, fire. Because these entities are fairly amorphous, the difference between what is rendered and what is actually being modelled is very hard for the player to see. Again, accuracy is sacrificed for speed wherever the game can get away with it.
The next factor to consider is a gameplay decision – the difference between using passive scenery and active scenery.
In this system, as far as the CA is concerned, the scenery is inert. Water will flow around scenery; fire, air and heat will be stopped by scenery. But the scenery is not affected by the actions of the CA in any way – it does not burn, it does not get damaged, it does not get wet, or move in currents. This is the simpler of the two representations, but still allows discrete objects such as the ubiquitous oil drum and crate to float away on rivers of water, or to explode or burn when heated by fire.
Because the CA only knows about cells, not polygons, the scenery must be converted into a cell representation – usually as a pre-processing step. These cells are simply marked as inert rock or similar, and their only function is to ensure that water cannot flow into them, and that heat is not exchanged with them. Of course, the scenery is usually a collection
The far more versatile, though also more adventurous option, is to have the scenery modelled by the CA as well, and for fire, water and so on to affect bits of the scenery. This also opens up the “totally destructible world” concept that many are looking to as the next big thing in games, though as with everything, there is nothing truly new in computer games [XCom ref again].
In this system, rather than simply being cells of inert material, scenery is modelled by its actual properties – current temperature, how easily and fiercely it burns, how strong it is, and so on. As the cells of the CA change their state according to the physical rules of the CA, so the graphics engine changes how it renders the associated polygonal objects – they become sooty, damaged, glow red hot, or (if the graphics engine can handle it) they vanish altogether.
In the latter case, the graphics engine can either be of the “Geo-Mod” type [Red Faction ref], or the object itself can simply have been specially marked as destructible, such as a thin wooden wall, and have an alternative “broken” representation, as done by current engines when dealing with damage by weapon fire.
An octree is ideally suited to storing this arrangement, specifically a dynamically-allocated octree. In any implementation of the octree, remember that by far the most common operation in a CA is “find the cell next to this one”, so optimising for this type of operation when implementing the octree will pay off in terms of speed. If this request is made, and there is no neighbouring cell in the octree, it is assumed that the neighbouring cell is air at standard temperature and pressure. The physical simulations are carried out accordingly, and if they result in the “missing” cell becoming significantly different from standard temperature and pressure, a cell with the new properties is created and inserted in the octree. When an air cell returns to within a certain tolerance of standard temperature and
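The “missing cells are implicitly air” behaviour is independent of the octree details. As a minimal sketch, a hash map stands in for the octree here; the constants, names and key-packing scheme are all illustrative assumptions, not from any particular engine:

```cpp
#include <cmath>
#include <unordered_map>

// Illustrative "standard temperature and pressure" constants and tolerance.
const float STP_Temp     = 293.0f;   // Kelvin
const float STP_Pressure = 101.0f;   // kPa
const float Tolerance    = 0.5f;

struct Cell { float Temp = STP_Temp; float Pressure = STP_Pressure; };

// Pack a 3D coordinate into one key (a toy scheme, fine for small worlds).
long long Key(int x, int y, int z)
{
    return ((long long)x << 42) ^ ((long long)y << 21) ^ (long long)z;
}

struct SparseWorld
{
    std::unordered_map<long long, Cell> Cells;

    // Reading a neighbour never fails: absent cells are implicitly air at STP.
    Cell Get(int x, int y, int z) const
    {
        auto it = Cells.find(Key(x, y, z));
        return (it == Cells.end()) ? Cell() : it->second;
    }

    // Writing a cell stores it only while it deviates significantly from STP.
    void Set(int x, int y, int z, const Cell &c)
    {
        bool nearSTP = std::fabs(c.Temp - STP_Temp) < Tolerance &&
                       std::fabs(c.Pressure - STP_Pressure) < Tolerance;
        if (nearSTP)
            Cells.erase(Key(x, y, z));   // the cell decays back to implicit air
        else
            Cells[Key(x, y, z)] = c;     // create or update an explicit cell
    }
};
```

Reads never fail, and writes that fall back to within tolerance of STP free their storage again – the same lifecycle the octree version follows.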
for ( neigh = each neighbour cell )
{
    if ( neigh->Material->IsInert() )
    {
        continue;
    }
    float DPress = cell->Pressure - neigh->Pressure;
    float Flow = cell->Material->Flow * DPress;
    // Never move more than a sixth of either cell's pressure in one step.
    Flow = clamp ( Flow, -neigh->Pressure / 6.0f, cell->Pressure / 6.0f );
    cell->NewPressure -= Flow;
    neigh->NewPressure += Flow;
}
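Fleshed out into compilable form – a 1-D row of cells standing in for the full 3-D neighbourhood, with an explicit double buffer and a hypothetical `Clamp` helper – the same update looks like this. The material constant and the cell layout are assumptions for the sketch:

```cpp
#include <algorithm>
#include <vector>

struct PCell
{
    float Pressure     = 0.0f;
    float NewPressure  = 0.0f;
    bool  Inert        = false;
    float FlowConstant = 0.5f;   // stands in for cell->Material->Flow
};

float Clamp(float x, float lo, float hi) { return std::max(lo, std::min(hi, x)); }

// One update of the pressure CA over a 1-D row of cells.
void PressureStep(std::vector<PCell> &cells)
{
    for (auto &c : cells) c.NewPressure = c.Pressure;   // refresh double buffer
    for (size_t i = 0; i + 1 < cells.size(); ++i)
    {
        PCell &cell  = cells[i];
        PCell &neigh = cells[i + 1];
        if (cell.Inert || neigh.Inert)
            continue;
        float DPress = cell.Pressure - neigh.Pressure;
        float Flow   = cell.FlowConstant * DPress;
        // Never move more than a sixth of either cell's pressure in one step.
        Flow = Clamp(Flow, -neigh.Pressure / 6.0f, cell.Pressure / 6.0f);
        cell.NewPressure  -= Flow;
        neigh.NewPressure += Flow;
    }
    for (auto &c : cells) c.Pressure = c.NewPressure;   // commit the new state
}
```

Run repeatedly, total pressure is conserved and the row relaxes towards uniform pressure; the one-sixth clamp is what stops a cell handing out more pressure than it holds when several neighbours drain it in the same step.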
if ( cell->Turn != CurrentTurn )
{
    cell->Turn = CurrentTurn;
    cell->Pressure = cell->NewPressure;
}
for ( neigh = each neighbour cell )
{
    if ( neigh->Material->IsInert() )
    {
        continue;
    }
    if ( neigh->Turn != CurrentTurn )
    {
        neigh->Turn = CurrentTurn;
        neigh->Pressure = neigh->NewPressure;
    }
    // ... same physics code as before ...
}
The above simple model works well for uniform redistribution of air pressure. At first glance, this is not something that is frequently modelled in games, but in fact it is one of the commonest effects – explosions, and their effects on things. An explosive is simply a lump of material that produces a huge amount of air in a very short time. They can be modelled by finding the nearest CA cell to the centre of an exploding grenade, and adding a large number to the cell’s pressure, then letting the CA propagate the pressure through the world. Damage is done to the surroundings by either high absolute pressures or high pressure differences – in reality both do different sorts of damage to different objects, but that is usually unnecessary complication for the purposes of a game.
Because the simulation of the flow of air is qualitatively
In fact, the easiest way to simulate the transmission of pressure through water is to make it slightly compressible. This means pressure can be stored as a slight excess mass of water in the cell, above what the cell’s volume should be able to hold. In practice, the amount of compression needed is tiny – allowing just 1% more water per cell per cube height is easily enough. In a static body of water whose cells can normally contain one litre of water each, the cells at the top will contain one litre, the ones under them will contain 1.01 litres, the cells under those will contain 1.02 litres, and so on to the bottom. This tiny amount of compression will be completely unnoticeable to the player, but has enough dynamic range to allow all the usual properties of liquids. For example, the levels of water in two containers joined by a submerged pipe will be the same, even if water is poured into one of them – it will flow through the pipe to the other container.
if ( neighbour cell is above this one )
{
    if ( ( cell->Mass < material->MaxMass ) ||
         ( neigh->Mass < material->MaxMass ) )
    {
        Flow = cell->Mass - material->MaxMass;
    }
    else
    {
        Flow = cell->Mass - neigh->Mass - material->MaxCompress;
        Flow *= 0.5f;
    }
}
else if ( neighbour cell is below this one )
{
    if ( ( cell->Mass < material->MaxMass ) ||
         ( neigh->Mass < material->MaxMass ) )
    {
        Flow = material->MaxMass - neigh->Mass;
    }
    else
    {
        Flow = cell->Mass - neigh->Mass + material->MaxCompress;
        Flow *= 0.5f;
    }
}
else // neighbour is on same level
{
    Flow = ( cell->Mass - neigh->Mass ) * 0.5f;
}
The two cases of code for both the up and the down case deal with different situations. The first case is where one of the two cells is not full of water – on the surface of a body of water, or if the water is splashing or falling (for example, in a waterfall). Here, the behaviour is simple – water flows downwards to fill the lower cell of the two to the value MaxMass – this is the mass of water that can be contained by a single cell’s volume. In the example above, the mass of 1 litre of water.
The second case is where both cells are full of water, or perhaps a bit over-full. This is an area of water that is under pressure – these are cells in the middle of a body of water. Here, the flow acts to try to make sure that the lower cell has exactly MaxCompress more water than the upper cell. MaxCompress is the amount of “extra” water that can be fitted in because of compression – in the example above, it would be the mass of 0.01 litres of water.
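The two cases can be exercised on a single vertical column. The sketch below is a toy version of the listing above – the MaxMass/MaxCompress values follow the one-litre example, while the extra damping factor and the flow clamps are assumptions added to keep this toy stable:

```cpp
#include <algorithm>
#include <vector>

const float MaxMass     = 1.0f;    // mass one cell holds at rest (1 litre)
const float MaxCompress = 0.01f;   // extra mass allowed per cell of depth (1%)

// One settling pass down a vertical column of water cells, index 0 = top.
// For each pair, Flow is the mass moved down into the cell below.
void SettleColumn(std::vector<float> &mass)
{
    std::vector<float> next = mass;
    for (size_t i = 0; i + 1 < mass.size(); ++i)
    {
        float cell  = mass[i];       // upper cell of the pair
        float neigh = mass[i + 1];   // the cell directly below it
        float Flow;
        if (cell < MaxMass || neigh < MaxMass)
            Flow = MaxMass - neigh;                      // fill the lower cell
        else
            Flow = (cell - neigh + MaxCompress) * 0.5f;  // compressed region
        Flow *= 0.5f;  // damping factor - an assumption to keep the toy stable
        // A cell cannot give more water than it holds, nor suck out more
        // than the cell below holds.
        Flow = std::max(-neigh, std::min(cell, Flow));
        next[i]     -= Flow;
        next[i + 1] += Flow;
    }
    mass = next;
}
```

Starting from an unbalanced column, repeated passes settle it into the expected profile: the bottom cell full plus a little compression, the surface cell holding whatever is left over.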
So far the air and water models have ignored a fairly important property of any liquid or gas – its speed of flow. It has simply taken the difference in pressures between two cells, and used that to move mass around. This is fine for relatively static environments that we wish to bring to a stable state (uniform air pressure, or water finding its level). Many
There are two possible ways to think about flow. The first is to think of a flow vector as being the flow through the centre of the cell. This is possibly the most intuitive model – the flow and the mass of the cell are both measured at its centre. However, in this case, the flow is affected by the pressure differential between the two neighbouring cells, and in turn determines how mass flows from one neighbouring cell to the other. Note the
The more useful model is to think of each component of the flow vector as being the flow between two adjacent nodes – from the “current” node to the node in the positive relevant direction. Thus the flow vector F stored at cell (x,y,z) has the meaning that F.x is the flow from cell (x,y,z) to cell (x+1,y,z); F.y is the flow from cell (x,y,z) to cell (x,y+1,z); and similarly for F.z. The “meaning” of the vector F is now not as intuitive, but the physical model does seem more sensible. In practice, this is the most common model, but either model can be used for simulation with appropriate adjustment of the various constants.
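As a data-layout sketch (names assumed, not from the article), each cell therefore stores three flows, one per positive-axis face, so every face's flow is stored exactly once, and applying it touches exactly two cells:

```cpp
// Staggered-grid storage: Flow[0] is the flow from this cell to (x+1,y,z),
// Flow[1] to (x,y+1,z), Flow[2] to (x,y,z+1). The flows towards the three
// negative-axis neighbours live in those neighbours' structures.
struct FlowCell
{
    float Mass;
    float Flow[3];
};

// Applying the stored flow between a cell and its +x neighbour moves mass
// without creating or destroying any - each face is visited exactly once.
void ApplyFlowX(FlowCell &cell, FlowCell &xplus, float dt)
{
    float moved = cell.Flow[0] * dt;
    cell.Mass  -= moved;
    xplus.Mass += moved;
}
```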
It is worth mentioning that although one of the most common applications of flow is in rivers, in most “human-sized” games, large bodies of water such as lakes and rivers are frequently far too large to participate in gameplay. Their behaviour will stay fairly constant whatever the player does, and if they do change, they will do so in highly constrained ways. They do not usually require the flexibility of a CA, and are often far better modelled and rendered in more conventional ways – pre-animated meshes and collision models, and scripted events. However, there are many other genres that operate on larger scales and will want to properly simulate rivers with a CA.
Transmitting heat through the environment, whether from burning objects or from other sources, happens through three separate mechanisms – conduction, convection and radiation.

Conduction is the simplest to simulate. Neighbouring cells pass heat energy between each other so that eventually they reach the same temperature as each other. This is complicated because different materials are heated by different amounts by the same amount of energy – a property called the Specific Heat Capacity (SHC – usually measured in J/kg°C). If a hot cell made of water (high SHC, hard to heat up) is next to a colder cell made of the same mass of iron (low SHC), equilibrium will be reached somewhere very close to the original temperature of the water, not at the average of the two temperatures. This is because when a given amount of energy is transferred from the water to the iron, the water’s temperature drops far less than the iron’s temperature rises.
+is transferred from the water to the iron, the water’s temperature drops far +less than the iron’s temperature rises.Note that the above example is true for the same mass of each substance. However, iron has a far greater density than water, and @@ -546,11 +546,11 @@
The code at the end is necessary if two materials with very different SHCs are side by side – the temperatures of the two can oscillate violently, and can grow out of control. The physically correct solution is to integrate the transfer of heat over time. However, this approach simply finds the weighted average temperature (i.e. the temperature that the system would reach eventually). It is less accurate, but looks perfectly good to the eye, and is quite a bit quicker to execute. Importantly, it obeys conservation of energy, so any artefacts are purely temporary – the longer-term state is the same as a more realistic simulation.
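The weighted average itself is one line of arithmetic: each cell's heat energy is mass × SHC × temperature, so the shared equilibrium temperature is total energy divided by total heat capacity. A sketch, using the familiar real-world SHC figures for water (≈4186 J/kg°C) and iron (≈449 J/kg°C):

```cpp
// Equilibrium temperature of two touching cells, found by conserving
// energy: E = mass * SHC * T, so Teq = (E1 + E2) / (C1 + C2).
float EquilibriumTemp(float mass1, float shc1, float temp1,
                      float mass2, float shc2, float temp2)
{
    float c1 = mass1 * shc1;   // heat capacity of cell 1 (J per degC)
    float c2 = mass2 * shc2;   // heat capacity of cell 2
    return (c1 * temp1 + c2 * temp2) / (c1 + c2);
}
```

For 1 kg of water at 80°C against 1 kg of iron at 20°C this gives roughly 74°C – close to the water's original temperature, exactly as described above, rather than the naive 50°C average.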
Hot things glow – they emit light at various wavelengths which travels in straight lines, hits other surfaces, and in turn heats them up. This effect is very important physically, but unfortunately is also extremely expensive to model. Each source of heat must effectively shoot many
float Temp = cell->Temp - material->Flashpoint;
if ( Temp < 0.0f )
{
    // Not burning.
}
methods discussed above. In real-life fires, convection and radiation are incredibly important for their behaviour. Convection makes fires spread vertically far more easily than across floors, and leads to distinctive “walls of fire” in burning buildings. Radiation concentrates fire in corners of rooms, causing fire to spread up the corners of rooms first.
Sadly, radiative heat, as mentioned above, is extremely hard
wall heats the air beside it, which rises, heats the section of wall higher up, which makes it far easier for the flames to spread upwards. In this hack, conduction of heat is made artificially asymmetrical. In the model presented above, a single factor – ConstantEnergyFlowFactor – was used for heat conduction for all six neighbours of a cell. Instead of this, a higher figure is used when conducting heat upwards, and a lower figure when conducting heat downwards.
which parts of a room would be more susceptible to fire because of the feedback effects of radiative heat. One possibility is computing the ambient occlusion term [Ambient ref] and using that to boost the heat generated by fire – generally around the edges and corners.
A factor that may not be immediately obvious is that these
hacks are far more controllable than any realistic solution. Convection in real
Dynamic Update Rates
The nature of some of the physical properties being simulated here requires high update rates to maintain realism. The flow of any property from one cell to another can only proceed at a maximum speed of one cell per update cycle. Fire may spread quickly – metres per second or faster. Water spilling from a container may move faster – tens of metres per second. Explosions require extremely high update rates – real-life explosion shock waves spread at the speed of sound – roughly 340 m/s.
Simulating all the above implies that update rates of 680 cycles per second may be required. This is an awesome speed, and it seems unlikely any current platform can sustain these sorts of update rates for a decently-sized game world – the number of cells to update per second is enormous.
As with the optimisation of not storing or processing cells
For example, if a child node needs to be updated every third turn, but the parent node is marked as being updated every second turn, every sixth turn the child node needs updating, but the parent does not. Because of the traversal algorithm’s early-out path, the child does not get updated this time, and in the end only gets updated every sixth turn – half the required frequency. Quantising update rates to powers of two solves this problem, and also allows some slight extra efficiency by storing the update rates as the power, rather than the actual number.
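The guarantee is easy to state in code: if every period is a power of two, a node being due always implies that any node with a smaller power-of-two period is also due, so the early-out traversal can never starve a child. A sketch with illustrative names; periods are rounded down, i.e. towards more frequent updates, which is the safe direction:

```cpp
// Round an update period down to a power of two, stored as the exponent.
int QuantisePeriodLog2(int period)
{
    int log2 = 0;
    while ((2 << log2) <= period)
        ++log2;
    return log2;               // the actual period used is 1 << log2
}

// A node is due for update when the turn counter is a multiple of its period.
bool IsDue(int turn, int log2Period)
{
    return (turn & ((1 << log2Period) - 1)) == 0;
}
```

With the every-third-turn child from the example quantised to every second turn, any turn on which the child is due is also a turn on which every faster-updating ancestor is due.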
the variable update rate is to choose an update rate to ensure that cells have
fairly constant behaviour over the update interval.
This assumption makes integration rather simple – simply
multiply the given behaviour (water flow, heat flow, burn rate, etc) by the
time period since the last update. This slight extra complication is more than
offset by the savings in processing time and memory bandwidth from the huge
References
[Conway Turing ref] An "implementation" of a Turing Machine within Conway's Game of Life, written by Paul Rendell. The original site has vanished off the web, but Archive.org still has a copy here: http://web.archive.org/web/20030210114324/http://www.rendell.uk.co/gol/tm.htm
[XCom ref] “X-Com: UFO Defense” (US)/“UFO: Enemy Unknown” (UK), Microprose, 1994 (Codo games, Mobygames and Wikipedia)
[Red Faction ref] “Red Faction”, developed by Volition Inc, published by THQ Inc, 2000 (Mobygames). It should be noted that, as far as I know, Red Faction did not use the sort of voxelised representation discussed here. My belief (from experimentation) is that it created a BSP representation of the scenery, and as the various "geomod" weapons and abilities were used, they inserted planes and nodes into the BSP. However, it is a good example of a game that takes the idea of "everything is destructible" and makes it a core gameplay feature.
[Thatcher Ulrich ref] “Loose Octrees”, Thatcher Ulrich, Game Programming Gems, Charles River Media, 2000 (http://tulrich.com/geekstuff/)
[Ambient ref] Ambient occlusion turns out to be the DC component of the elegantly general Spherical Harmonic representation. A good introduction to SH for the games programmer is a presentation I did in 2003 called Spherical Harmonics in Actual Games.
diff --git a/papers/gem_imp_filt.html b/papers/gem_imp_filt.html
index 8937812..fbfc0a6 100644
--- a/papers/gem_imp_filt.html
+++ b/papers/gem_imp_filt.html
Impostors – Adding Clutter
Impos…what?
Impostoring is a term that is probably not familiar to many reading this. However, the concept may be – it has surfaced in various forms many times in the history of 3D graphics. Simply put, it is about using sprites in a 3D scene, but instead of an artist drawing or rendering the sprites beforehand, they are updated on the fly by rendering triangles to them.
Instead of rendering a high-triangle object every frame, the high-triangle object is rendered to a texture only occasionally – usually on the order of once every five to fifty frames. But every frame this texture is mapped onto a much lower-triangle object and drawn.
The main target for impostors is scenes with lots of small static objects in them – clutter. Each of these objects will use an impostor, and most will be redrawn at a much lower rate than the main scene’s frame rate. By doing this, the perceived triangle density is far higher than the actual one, and the visual quality allowed by all these incidental objects considerably increases the realism of the scene. The main difference between an office or house in a
usually a bottleneck, and reducing this for some objects allows greater triangle detail to be used on others. An impostor is a single texture, whereas rendering the object normally may require multiple textures and multiple texture layers – changing texture flushes the texture cache and may require extra texture management from either the driver, the API, or the application. Drawing the object each frame requires that it be lit each frame, even if the lighting has not changed, and as lighting techniques become more sophisticated, lighting becomes more expensive. And finally there is usually plenty of application overhead when just thinking about drawing an object, even before a single API call is made – using impostors can avoid much of that work.
The Whole Process
Bounding boxes actually work quite well. Most things in everyday life fill their
There are plenty of objects for which a bounding box is not a good enough approximation, and leads to unnecessary Z-buffer clashes. There is also another factor to consider in choosing the shape of the impostor – parallax error. The whole point of a 3D scene is that a camera can move through it, even if the objects in the scene are stationary. A box painted with a picture of something on it is not going to look like that something for long when the camera starts
Another way to deal with this is to move the texture co-ordinates at each vertex each frame. The tri counts involved are fairly low, so a bit more work at each vertex is unlikely to hurt performance much. The principle is fairly simple – figure out where on the real object (and thus the impostor texture image) each impostor object’s vertex lies when viewed from a certain angle. As this viewing angle changes, the texel that the vertex lies over will change. So for a new viewing angle, trace the line from the viewer through the impostor object vertex to the original object. Then work out which texel this part of the
In practice what I have found to be far simpler, and works just fine, is to give each vertex a "parallax factor" – the number of image texels to move per degree of viewer movement. This is a factor usually tweaked by hand, and easily determines the vertex’s texel co-ordinates at runtime. This factor is only done once for each impostor object vertex, and hand-tweaking around eight to ten vertices per object does not take long.
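At runtime the cost is one multiply-add per vertex: the stored texel position plus the parallax factor times the change in viewing angle. The struct layout and degree units below are assumptions for the sketch, not the article's actual data format:

```cpp
// Per-vertex impostor data: the texel co-ordinate captured when the impostor
// texture was last rendered, plus the hand-tweaked parallax factor in texels
// per degree of viewer movement around the object.
struct ImpostorVertex
{
    float BaseU, BaseV;
    float ParallaxU, ParallaxV;
};

// Shift the texture co-ordinate as the view direction drifts away from the
// angles (in degrees) at which the impostor texture was rendered.
void ApplyParallax(const ImpostorVertex &v,
                   float deltaYawDeg, float deltaPitchDeg,
                   float &outU, float &outV)
{
    outU = v.BaseU + v.ParallaxU * deltaYawDeg;
    outV = v.BaseV + v.ParallaxV * deltaPitchDeg;
}
```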
Alternatively, to generate these parallax values automatically for moderately
Changing the viewing angle is probably the most obvious factor that decides an impostor update. Note that what is important is the vector from the camera to the object in object space. This will change when the object rotates, and also when the camera moves, both of which are important. The camera’s direction of view is unimportant – unless an enormous field of view is used, the object does not change appearance much when the camera rotates, only when it moves.
As well as the direction from the object to the camera, the distance between the
The main efficiency hit on most cards is changing rendertarget. This causes a pipeline stall while outstanding rendering completes.
When allocating a block of a certain size, it should be allocated as far down the quadtree as possible, i.e. in a node that is the correct size. If no node is the correct size, the smallest possible node should be split up to create it. This again prevents larger blocks being split up unnecessarily – they may be used later by a large impostor. Although this sounds expensive – having to scan the whole quadtree for the smallest node – in practice a lot of caching can be done, and it does not take a significant amount of time.
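A first-fit sketch of that allocation scheme is shown below. It splits on the way down so a block always lands in an exactly-sized node; it omits the caching (and the freeing) that the text mentions, and all names are invented for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative quadtree texture-page allocator. Each node covers a square,
// power-of-two region of the shared impostor texture.
struct QuadNode
{
    int       iSize;      // side length in texels (power of two)
    bool      bUsed;      // an impostor currently lives here
    QuadNode *pChild[4];  // non-null once this node has been split

    QuadNode ( int iSz ) : iSize ( iSz ), bUsed ( false )
    { pChild[0] = pChild[1] = pChild[2] = pChild[3] = NULL; }

    bool IsSplit () const { return pChild[0] != NULL; }

    void Split ()
    {
        for ( int i = 0; i < 4; i++ )
            pChild[i] = new QuadNode ( iSize / 2 );   // freeing omitted for brevity
    }

    // Find (splitting on the way down if needed) a free node of exactly
    // iWanted texels; returns NULL if this subtree cannot supply one.
    QuadNode *Alloc ( int iWanted )
    {
        if ( bUsed || iSize < iWanted )
            return NULL;
        if ( iSize == iWanted )
        {
            if ( IsSplit () )
                return NULL;        // children may be in use - don't reclaim here
            bUsed = true;
            return this;
        }
        if ( !IsSplit () )
            Split ();
        for ( int i = 0; i < 4; i++ )
        {
            QuadNode *p = pChild[i]->Alloc ( iWanted );
            if ( p != NULL )
                return p;
        }
        return NULL;
    }
};
```

The depth-first descent means a small request burrows down to a node of exactly the right size, leaving the sibling quadrants whole for later, larger impostors.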
When updating an impostor, the current state of the object is rendered to the impostor’s allocated area of the texture.
However, with games such as first-person-shooters, objects in the world are basically split into two distinct categories – those that almost never move (walls, furniture, clutter), and those that move erratically (players). The movement of the players is notoriously hard to predict, and it is probably a waste of time trying to impostor players.
On the other hand, impostoring the scenery and furniture is a far more viable proposition. Prediction for them is trivial – they almost never move. And when they do move, they usually move under control of a player, i.e. erratically. The easiest thing is to simply disable impostoring for the duration of the movement.
For god games and Real Time Strategy (RTS) games, the problems are similar, but the movement of the camera is very different. It is usually a bird’s-eye view, and most of the time it is either static (while issuing orders to units), or moving at constant speed over the map to get to a different area. Small, erratic movements are rare, which is fortunate since these are extremely hard to predict.
Impostoring is useful when trying to draw scenes with lots of fairly static objects in them. The raw triangle count will overwhelm any bus and graphics device that tries to render them at top detail, and progressive mesh methods can only do a certain amount to reduce the workload – texture changes and animation are extremely difficult to reduce in this way.
Impostoring is most effective on static objects some distance from the camera. Introducing this sort of clutter into games increases the visual quality substantially, especially since each object is still a real independent 3D object and can still be interacted with by the player. It also allows key objects to be "hidden in plain sight" amongst a lot of other objects – something that has been extremely difficult to do with existing techniques and the limited number of objects available in a scene.
Even an implementation using a bounding box and some simple maths produces good results for the incidental objects that are currently missing from games.
This Gem does assume a basic familiarity with VIPM, and there is no space for a thorough introduction here. However, there are several good guides both in print and online. The two best known are Jan Svarovsky’s Gem in Game Programming Gems 1 [Svarovsky00] and Charles Bloom’s website [Bloom01], both of which have excellent step-by-step guides to implementations of the “vanilla” VIPM method. All the methods discussed here use the same basic collapse/split algorithm, but implement it in different ways.
Vertex cache coherency will be quoted in terms of the number of vertices loaded or processed per triangle drawn, or “vertices per triangle”. Current triangle reordering algorithms for static (i.e. non-VIPM) meshes, using modern vertex caches of around 16 entries, can get numbers down to around 0.65. For an example, see [Hoppe99]. This gives suitable benchmark figures to compare efficiencies when the mesh is converted to a VIPM one. Also note that when calculating the vertices per triangle using triangle strips, only drawn triangles should be counted, not degenerate ones. The degenerate triangles are a necessary evil – they don’t add anything to the scene at all.
Algorithms that are good at streaming allow the mesh data to be loaded progressively as it is needed, rather than all at once.
This also helps systems with virtual memory – if the data is accessed linearly, the virtual memory manager can swap out data that has yet to be accessed, or has not been accessed for a long time. Static data can be optimized even further and made into a read-only memory-mapped file. This also ensures that irritating “loading level” messages are no more tedious than absolutely necessary. The object data does not all need to be loaded at the start – the player can start playing the level with low-resolution data and as the detailed models are needed, they will be loaded.
All the methods discussed here are based around implementations of the same fundamental algorithm. Single operations are done that collapse a single vertex onto another vertex along one of its triangle edges. No new “average” vertex is generated, and no collapses between vertices that do not share an edge are allowed. These are worth looking into; however, the current consensus is that they involve a higher runtime cost for equivalent error levels on most current hardware. Of course, things change, and new algorithms are always being invented.
A note on the terminology used. The “resolution” of a mesh is proportional to the number of triangles in it. Thus a “high-resolution” mesh undergoes edge collapses and becomes a “lower-resolution” mesh. The opposite of an edge collapse is an edge “split”, where a single vertex splits into two separate vertices. For a given edge collapse, there is a “kept” vertex and a “binned” vertex. The binned vertex is not used in any lower-resolution meshes; the kept vertex is. For a given edge collapse, there are two types of triangle. Those that use the edge being collapsed will not be in any lower-resolution mesh, and are “binned.” For a typical collapse, there are two binned triangles, though there may be more or fewer for complex mesh topologies. Those that are not binned, but use the binned vertex, are “changed” triangles, and are changed so that they use the kept vertex instead of the binned vertex. When performing an edge split, the previously binned vertex and triangles are “new,” though they are often still called “binned” because typically there are no split data structures, just collapse data structures that are done in reverse.
Most of the perspective is in the collapsing direction, so words like “first”, “next”, “before” and “after” are used assuming collapses from a high-triangle mesh to a low-triangle mesh. Again, splits are done by undoing collapses.
This gem will also be talking in a very PC- and DirectX-centric way about CPUs, AGP buses, graphics cards (“the card”), system/video/AGP memory, index and vertex buffers. This is generally just a convenience – most consoles have equivalent units and concepts. Where there is a significant difference, they will be highlighted. The one term that may be unfamiliar is the AGP bus – this is the bus between the main system memory (and the CPU) and the graphics card with its memory. There are various speeds, but this bus is typically capable of around 500Mbytes/sec, which makes it considerably smaller than the buses between system memory and the CPU, and between the graphics chip and its own video memory.
struct VanillaCollapseRecord
{
    // The offset of the vertex that doesn't vanish/appear.
    unsigned short wKeptVert;

    // Number of tris removed/added.
    unsigned char  bNumTris;

    // How many entries in wIndexOffset[].
    unsigned char  bNumChanges;

    // How many entries in wIndexOffset[] in the previous action.
    unsigned char  bPrevNumChanges;

    // Packing to get correct short alignment.
    unsigned char  bPadding[1];

    // The offsets of the indices to change.
    // This will be of actual length bNumChanges,
    // then immediately after in memory will be the next record.
    unsigned short wIndexOffset[];
};
This structure is not a fixed length – wIndexOffset[] grows to the number of vertices that need changing. This complicates the access functions slightly, but ensures that when performing collapses or splits, all the collapse data is in sequential memory addresses, which allows cache lines and cache pre-fetching algorithms to work efficiently.
Although at first glance bPrevNumChanges doesn’t seem to be needed for collapses, it is needed when doing splits and going back up the list – the number of wIndexOffset[] entries in the previous structure is needed so they can be skipped over. Although this makes for convoluted-looking C, the assembly code produced is actually very simple.
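A self-contained sketch of the traversal this implies is shown below. The struct fields repeat the record above; the bodies of Next() and Prev() are an assumption about how the variable-length records are walked, not the actual sample code:

```cpp
#include <cassert>
#include <cstddef>

// Variable-length collapse records laid out back-to-back in memory.
// wIndexOffset is declared with one element here (a sketch-friendly stand-in
// for the flexible array member); the real length is bNumChanges.
struct VanillaCollapseRecord
{
    unsigned short wKeptVert;
    unsigned char  bNumTris;
    unsigned char  bNumChanges;
    unsigned char  bPrevNumChanges;
    unsigned char  bPadding[1];
    unsigned short wIndexOffset[1];

    // The next record starts immediately after this one's wIndexOffset[].
    VanillaCollapseRecord *Next ()
    {
        return reinterpret_cast<VanillaCollapseRecord *>(
            reinterpret_cast<char *>(this) +
            offsetof ( VanillaCollapseRecord, wIndexOffset ) +
            bNumChanges * sizeof ( unsigned short ) );
    }

    // Going backwards needs the previous record's change count - which is
    // exactly why bPrevNumChanges is stored.
    VanillaCollapseRecord *Prev ()
    {
        return reinterpret_cast<VanillaCollapseRecord *>(
            reinterpret_cast<char *>(this) -
            offsetof ( VanillaCollapseRecord, wIndexOffset ) -
            bPrevNumChanges * sizeof ( unsigned short ) );
    }
};
```

Because each record is immediately followed by the next, walking the list forwards touches strictly increasing addresses, which is what makes the cache pre-fetching work well.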
To perform a collapse, the number of vertices used is decremented, since the binned vertex is always the one on the end. The number of triangles is reduced by bNumTris – again, the binned triangles are always the ones on the end of the list.
The changed triangles all need to be redirected to use the kept vertex instead of the binned one. The offsets of the indices that refer to the binned point are held in wIndexOffset[]. Each one references an index that needs to be changed from the binned vertex’s index (which will always be the last one) to the kept vertex’s index – wKeptVert:
+
   Â
VanillaCollapseRecord *pVCRCur = the current collapse;
@@ -338,10 +338,10 @@{
-ASSERT ( +
    ASSERT ( pwIndices[pVCRCur->wIndexOffset[i]] == (unsigned short)iCurNumVerts );
-pwIndices[pVCRCur->wIndexOffset[i]] = pVCRCur->wKeptVert;
+Â Â Â Â pwIndices[pVCRCur->wIndexOffset[i]] = pVCRCur->wKeptVert;
}
@@ -349,7 +349,7 @@pIndexBuffer->Unlock();
-// Remember, it’s not a simple ++ (though the +
// Remember, it’s not a simple ++ (though the operator could be overloaded).
pVCRCur = pVCRCur->Next();
Note that reading from hardware index buffers can be a bad idea on some architectures, so be careful of exactly what that ASSERT() is doing – it is mainly for illustration purposes.
[Figure 1 - insert VIPM_fig1.SDR]
Figure 1 – An edge collapse with before and after index lists and the VanillaCollapseRecord.
for ( int i = 0; i < pVCRCur->bNumChanges; i++ )
{
    ASSERT ( pwIndices[pVCRCur->wIndexOffset[i]] == pVCRCur->wKeptVert );
    pwIndices[pVCRCur->wIndexOffset[i]] = (unsigned short)iCurNumVerts;
}
Note – in practice, and for arbitrary historical reasons, in the sample code the VanillaCollapseRecords are stored last first, so the Prev() and Next() calls are swapped.
Vanilla VIPM is simple and easy to code.
This means that the order of triangles is no longer determined by collapse order – they can be ordered by some other criteria. The cunning thing that the original SkipStrips paper pointed out is that triangles can now be ordered into strip order, and indeed converted into strips. This is great for hardware that prefers its data in strip order.
The ability to reorder triangles increases vertex cache coherency. Strips are naturally good at this – they have an implicit 1.0 vertices per triangle efficiency (for long strips with no degenerates), and with the right ordering and a decent-sized vertex cache they can get much lower values.
Fortunately, there is a solution to most of skipstrip’s woes. After a certain number of collapses, simply stop, take the current geometry with all of its collapses done, throw away the degenerate triangles, and start making a completely new skipstrip from scratch. Continue collapses with this new skipstrip until it too becomes inefficient, and so on.
The different index lists can be stored globally, since when switching to a new list a new copy is taken and then refined with collapses to exactly the number of triangles wanted. So the fact that there are now multiple index lists is not too bad – it’s global data. This also restores some of the nice streaming-friendliness that the vanilla method has. The granularity is a bit coarser – the whole of an index list must be grabbed before anything can be rendered using that level, but at least it’s no longer an all-or-nothing thing, and the lower-resolution index lists are actually very small.
For a bit more efficiency, two versions of the index lists can be stored in global space – fully collapsed (before switching to a lower-resolution list, that is) and fully uncollapsed. This means that a single-collapse oscillation across the boundary between two index lists is still fairly efficient. If only the uncollapsed versions are held, each time the boundary is crossed every collapse in the newly selected list must be re-done from scratch.
So this has fixed all the bad things about single-level skipstrips.
On a multi-level skipstrip, a lot of the triangles are not affected even when that level is fully collapsed. So there is no need to copy those triangles per-instance; they can be global and shared between instances. In fact, for this algorithm indexed lists are used – the indexed strip case will be discussed afterwards as a variant. At each level, the triangles are split into four lists:
To draw the mesh, the required collapses and splits are done to the dynamic per-instance list, and the list is drawn. Then the associated level’s static list is drawn, with the only modification being that the number of triangles drawn will change as static triangles are collapsed.
The code and structures needed are based on the multi-level skipstrip, except that for each level there are two lists – the copied dynamic one and the shared static one. The other change is that there are two triangle counts, one for each list, and a collapse may alter either or both of these numbers. So the bNumTris member is replaced by two counts, one for each list.
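A sketch of what that change does to the collapse record is shown below. Only wKeptVert, bNumChanges and bPrevNumChanges come from the text; the two replacement count fields and the struct name are invented for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical mixed-mode record: the vanilla record's single bNumTris
// becomes one count per index list, since a collapse may bin triangles
// from either or both lists.
struct MixedModeCollapseRecord
{
    unsigned short wKeptVert;         // as in the vanilla record
    unsigned char  bNumStaticTris;    // tris binned from the shared static list
    unsigned char  bNumDynamicTris;   // tris binned from the per-instance list
    unsigned char  bNumChanges;       // entries in wIndexOffset[]
    unsigned char  bPrevNumChanges;   // entries in the previous record
    unsigned short wIndexOffset[1];   // variable-length, as before
};
```

Note the layout still keeps the shorts two-byte aligned without any padding, since the header in front of wIndexOffset[] is six bytes.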
This means that a large proportion of each mesh is being drawn from a static index buffer that is tuned for vertex cache coherency (list 1). It is not quite as good as it could be, since the triangles in this list only make up part of the object. There will be “holes” in the mesh where triangles have been moved to the other three lists, and this decreases both the maximum and the actual vertex per triangle numbers that are obtained. Some of the dynamic buffer is also ordered for optimal vertex cache behavior.
Like all multi-level methods, it is streaming-friendly, though in this case, since the lists are ordered by collapse order, the granularity is even finer – at the triangle level, not just the list level. Whether this is a terribly exciting thing is a different question – the finer control is probably not going to make much of a difference to performance.
This does require two DrawIndexedPrimitive calls per object, one for each list.
Sliding window notes that when a collapse happens, there are two classes of triangles – binned triangles and modified triangles. However, there is no real need for the modified triangles to actually be at the same physical position in the index buffer before and after the collapse. The old version of the triangles could simply drop off the end of the index list.
So instead of an example collapse binning two triangles and editing three others, it actually bins five triangles and adds three new ones. Both operations are performed by just changing the first and last indices used for rendering – sliding a “rendering window” along the index buffer.
[Figure 2 – insert VIPM_fig2.sdr]
Figure 2 – A collapse showing the index list and the two windows.

Sliding Window
Note that a triangle modified as the result of a collapse cannot then be involved (either binned or changed) in another collapse. To be modified by a second collapse would mean that triangle would have to fall off the end of the index buffer. But it has already been added to the start – it cannot then also fall off the end – the chances of the ordering being just right to allow this are incredibly slim.
So once a triangle has been modified by a collapse, it cannot take part in any further collapses within the same index list.
However, there is actually no need to strictly follow the order of collapses that QEM decides. Progressive meshing is not an exact science, since it ignores everything but the distance of the camera from the object, and the whole point is to simply be “good enough” to fool the eye. So there is no real need to precisely follow the collapse order that QEM decides – it can be manipulated a bit.

The way to do this is to follow the QEM collapse order until it decides to do a collapse that involves triangles that have already been modified.
Since no runtime modification is made to the index or vertex lists, all the data can be made global, and there is almost zero instance memory use. There is also almost zero CPU use to change level of detail – each time, a simple table lookup is made to decide the index list to use, the start and end index to draw from that index list, and how many vertices are used. In practice, the index lists are concatenated together, so that the start index also implies the index list to use. The table is composed of SlidingWindowRecords:
struct SlidingWindowRecord
{
    unsigned int   dwFirstIndexOffset;
    unsigned short wNumTris;
    unsigned short wNumVerts;
};
d3ddevice->DrawIndexedPrimitive (
    D3DPT_TRIANGLELIST,          // Primitive type
    0,                           // First used vertex
    pswr->wNumVerts,             // Number of used vertices
    pswr->dwFirstIndexOffset,    // First index
    pswr->wNumTris );            // Number of triangles
There is no code to do splits or collapses as with all the other methods – the current LoD is just looked up in the SlidingWindowRecord table each time the object is rendered. This also means that with hardware transform and lighting cards, the CPU time required to render objects is fixed and constant per object, whatever their level of detail. The phrase “constant time” is always a good one to find lurking in any algorithm.
The major problem with sliding window VIPM is that it forces the ordering of the triangles at the start and end of each level’s index lists. This has two effects – one is that it makes strips hard to use – only triangle lists really handle fixed ordering well. The other is that vertex cache efficiency is affected.
Fortunately, it is not as bad as it first seems. When an edge collapse is performed, all the triangles that use the binned vertex are removed, so they all go on the end of the triangle list. This is typically from five to seven triangles, and they form a triangle fan around the binned vertex. Then the new versions of the triangles are added – these need to go together at the start of the index list; there are typically three to five of them, and they form a triangle fan around the kept vertex. These fans can be ordered within themselves to get the best cache coherency.
Vertex cache coherency can be raised by having a larger middle index list section in each level – by having fewer collapses per level. This takes more memory, but the extra performance may be worth it, especially as it is global memory.
Table 1 shows the results of each method with their relative strengths and weaknesses. Note that “skipstrips” refers to multi-level skipstrips – the single-level version is not actually a sensible method in practice, for the reasons given.
[Svarovsky00] Svarovsky, Jan. “View-independent Progressive Meshing”, Game Programming Gems 1, ISBN 1584500492. A similar talk was given at GDC99, available from http://www.svarovsky.org/ExtremeD/.
[Hoppe99] Hoppe, Hugues. “Optimization of Mesh Locality for Transparent Vertex Caching”, Computer Graphics (SIGGRAPH 1999 proceedings), pages 269-276. Also from http://www.research.microsoft.com/~hoppe/
[El-Sana99] J. El-Sana, F. Evans, A. Varshney, S. Skiena, E. Azanli. “Efficiently Computing and Updating Triangle Strips for View-Dependent Rendering”, The Journal of Computer Aided Design, Vol 32, IS 13, pages 753-772. Also from http://www.cs.bgu.ac.il/~el-sana/publication.html
It is worth noting a few misconceptions about VIPM that have cropped up in discussions with people.
First, VIPM is not aimed at getting low-end machines running faster. It can have this side effect in many cases, but that is not its primary aim. This may surprise many people – increasing speed on slow machines is often assumed to be the whole point of VIPM, and many approach it with this in mind. However, with low-end machines you need to render objects with as few tris as possible. The problem with most forms of VIPM is that they produce very poor low-tri representations of objects, at least when compared to the output of a good 3D artist given the same triangle budget. If the aim of an engine is to run well on low-end machines, getting an artist to author good low-tri models is really the only way to proceed – they produce far better-looking 200-tri people than any automated VIPM routine can.
The real aim of VIPM is to allow artists to author high-tri models for use on high-end platforms, while allowing the extra triangle detail to be used where it matters in a scene, without burning the triangle budget on unnoticed background detail. This is the real power of scalability techniques – the ability to have lots of extremely detailed models in a scene, but only showing the detail that is visible at any one time. The fact that by tweaking a few global values, lower-end machines can use as much detail as their available power allows is a welcome side effect.
Many people including myself have had excellent results with Garland and Heckbert’s Quadric Error Metrics [GarlandHeckbert97], and the further refinements introduced by Hoppe [Hoppe99] for vertex attributes such as colour, texture co-ordinates and so on. The maths of the academic papers can be initially intimidating, but the implementation is actually very simple, versatile and quick. A very naïve version of the basic Garland-Heckbert QEM is included in the code, mainly so that the data generated is representative of real VIPM collapse orders.
At the low tri levels, sliding window could be used. Although it has a comparatively low vertex cache efficiency, the objects are at such low triangle levels that they don’t actually make up much of the scene’s triangle budget. However, there are a lot of them, and sliding window not only has near-zero memory per instance, but also changes LoD in constant (and nearly zero) time – always a good characteristic when the number of separate objects is high. Another side effect is that since these are low tri levels, the number of sliding window levels can be increased without too much memory bloat. As noted above, this increases the vertex cache coherency, if that is a concern.
So 20Mtri/sec at 30Hz means 667ktris per frame. Assuming those are all from vanilla VIPM objects, and they were all unique (neither of which is true – most scenes have some objects rendered with multiple passes as well as non-VIPM objects such as particle effects), and assuming the 30% over-estimation, that means 867ktris held in VIPM object index buffers each frame. We’re using WORD indices and triangle lists, so 6 bytes per triangle, which gives 5.20Mbytes. That is a decent chunk, but not actually all that much considering this is the worst-case scenario on the (current) best hardware, being driven at peak efficiency at a fairly modest frame rate.
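The arithmetic in that estimate is easy to re-check; a trivial helper (the function name and parameterisation are just for illustration):

```cpp
#include <cassert>

// Worked version of the estimate above: triangles drawn per frame, scaled by
// the over-allocation factor, at 6 bytes per triangle (three WORD indices
// per triangle in a triangle list).
double VipmIndexBytesPerFrame ( double fTrisPerSec, double fFrameRate,
                                double fOverEstimate )
{
    double fDrawnPerFrame = fTrisPerSec / fFrameRate;
    double fHeldPerFrame  = fDrawnPerFrame * fOverEstimate;
    return fHeldPerFrame * 6.0;
}
```

VipmIndexBytesPerFrame(20.0e6, 30.0, 1.3) comes out at 5.2 million bytes, matching the 5.20Mbytes quoted.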
The other big weakness of sliding window was going to be the vertex cache coherency. But in fact it’s not too bad. The fragmented start and end of each index level are bad, but they are still fans, rather than individual triangles, so not mad. For a typical mesh, the tri fans at the start of the list will have four tris and reference six vertices – a rate of 1.5 verts per tri. Not very good, but not stupidly bad. The tri fans at the end of the list typically have six tris and reference seven verts (remember that they form a complete fan around the binned vert) – a rate of 1.17 verts per tri.
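Those two figures are easy to re-check with a trivial helper (illustrative only):

```cpp
#include <cassert>

// Vertices referenced per triangle drawn - the cache-efficiency measure
// used throughout this gem (lower is better; long strips approach 1.0).
double VertsPerTri ( int iVerts, int iTris )
{
    return double ( iVerts ) / double ( iTris );
}
```

VertsPerTri(6, 4) gives 1.5 for the start-of-list fans, and VertsPerTri(7, 6) gives roughly 1.17 for the end-of-list fans, matching the figures quoted.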
The sample code has a number of features to be aware of. It does not handle multiple objects, just multiple instances of the same object. It does not handle materials – everything is just Gouraud-shaded white. It does not try to deal sensibly with seams and edges of meshes – it simply avoids doing any collapses on them at all. It does not take normals into account when computing collapse errors, only vertex positions – the normals are simply there for display purposes. In short, the collapse-generation algorithm does only just enough work to generate representative collapse orders, so as not to skew the comparison of the various methods. All these problems are interesting, but they are beyond the scope of the sample.
It is worth noting that sitting an artist down and getting them to click on each edge to determine the collapse order is not that mad an idea. It takes less time than you might think, and keeps all the control in the artist’s hands. When the model is millions of tris, it’s a bit impractical, but at that level an automated collapse finder gets good results. All the errors are small, so it’s hard to produce obviously wrong results. Once an object gets below a few thousand triangles, even the best QEM methods start to struggle, and that sort of tri count is low enough for an artist to spend time doing the rest by hand. In the grey area in between, using a mainly automated system with the artist watching and occasionally forcing an early collapse or (more frequently) forbidding collapses that the error metric has mistakenly labelled as “low error” works quite well. (Note – this paragraph is somewhat theoretical – see the “Hindsights” section below for what we ended up actually using for VIPM in Blade 2.)
The sample code also does not attempt to choose the level of detail of objects in any particularly sensible way – it just aims roughly for a set number of triangles (or rather collapses) at a certain distance from the camera. Again, the aim is to get representative comparisons of the various methods, rather than to actually get good visual quality. In practice, the
All these methods use the same fundamental algorithm – a vertex is collapsed along an edge to another vertex. No new “average” vertex is generated. It should be noted that calculating an average vertex position is tricky (the straight linear average is not a good approximation in many cases), and many dispute whether it gives much of a useful increase in detail for the
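A minimal sketch of that collapse step on an indexed triangle list (the function name and representation here are mine, not the sample code’s): every index referencing the binned vertex is remapped to the kept vertex, and any triangle that becomes degenerate is dropped. No new vertex data is created.

```python
def edge_collapse(triangles, binned, kept):
    """Collapse vertex 'binned' onto 'kept'; return the surviving triangles."""
    survivors = []
    for tri in triangles:
        remapped = tuple(kept if i == binned else i for i in tri)
        # Tris that used both endpoints of the edge collapse to a line: drop them.
        if len(set(remapped)) == 3:
            survivors.append(remapped)
    return survivors

# A fan of four tris around vertex 0; collapse vertex 0 onto vertex 1.
tris = [(0, 1, 2), (0, 2, 3), (0, 3, 4), (0, 4, 1)]
print(edge_collapse(tris, binned=0, kept=1))   # [(1, 2, 3), (1, 3, 4)]
```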
Anyway, there are two ways to do this – add those average vertices into the global vertex buffer along with the originals, or make the vertex buffer a per-instance resource and when the collapse is made, replace one of the vertices with the new average data.
However, using something similar to the “sliding window” algorithm, but for vertices instead of triangles, would solve both problems. The vertex data would still be global, not per-instance; there is no runtime modification of it, it has no holes, and the memory use can be kept
Using the sliding window method for vertices would again allow the central region to be ordered optimally – typically simply in the order that tris use them. This means that the lookahead reads that the system memory bus does are more likely to be correct, and also allows the AGP more chance to squirt linear chunks of memory at the graphics card. Even though it doesn’t directly affect vertex cache coherency, it does increase the effective bandwidth of the AGP bus, and that is increasingly becoming the bottleneck in today’s graphics cards.

Another future possibility is to extend the code to use the “limited instance” idea. Use something like skipstrips, but instead of having one instance per on-screen object, the number of instances is set to a fixed (fairly low) number. When an object needs rendering at a certain LoD, take the instance that is currently closest to this LoD and collapse/split it until the
[GarlandHeckbert97] Garland, Heckbert. “Surface Simplification Using Quadric Error Metrics”, SIGGRAPH 97. Also from http://graphics.cs.uiuc.edu/~garland/research/quadrics.html

[Hoppe99] Hoppe, Hugues. “New quadric metric for simplifying meshes with appearance attributes”, IEEE Visualization 1999, October 1999, 59-66. Also from http://research.microsoft.com/~hoppe/

[Forsyth00] Forsyth, Tom. “Where have all the bumpmaps gone?”, Meltdown 2000. Also from http://www.eelpi.gotdns.org/papers/papers.html

HTML versions of older articles.
Comparison of VIPM Methods
Impostors – Adding Clutter
Cellular Automata for Physical Modelling
diff --git a/papers/trilight/trilight.html b/papers/trilight/trilight.html
index 35b7eec..7724760 100644
--- a/papers/trilight/trilight.html
+++ b/papers/trilight/trilight.html
A “trilight” lighting equation is a generalisation of a bunch of lighting models people have used in games over the years:
The problem with pure Lambert lighting is that 50% of your model is completely dark, so the back side doesn’t really show up very well without adding more lights. A wrap-around light is a very crude simulation of the light scattering back from the surroundings and lighting the back side to some extent. This has been used by a variety of games for ages, but the first place I’ve seen it actually documented was in an aside in an article about Half Life 2’s lighting model.

“factor” ranges from 0 to 1, with many games just hard-wiring it to 1.
res = lerp ( colour2, colour0, (N.L+1)/2 )
    = colour2 + (colour0-colour2) * (N.L+1)/2
    = (colour0+colour2)/2 + (colour0-colour2) * (N.L)/2
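That rearrangement is easy to check numerically; a throwaway sketch with scalar colour channels (lerp defined the usual way, example values mine):

```python
def lerp(a, b, t):
    return a + (b - a) * t

c0, c2 = 0.9, 0.2   # arbitrary example sky/ground channels
for ndotl in (-1.0, -0.25, 0.0, 0.5, 1.0):
    direct = lerp(c2, c0, (ndotl + 1) / 2)
    rearranged = (c0 + c2) / 2 + (c0 - c2) * ndotl / 2
    # the lerp form and the expanded form are identical for every N.L
    assert abs(direct - rearranged) < 1e-12
print("hemispherical forms agree")
```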
This has also been used in many games for ages, but the first time I remember seeing it actually documented was in a talk by Chas Boyd of Microsoft talking about more interesting light models. This is good for outdoor scenes – typically the colours are a light blue for the sky and a dark greenish/brown for the ground. These are the preset colours in the trilight demo.
Note – I’ve used colour0 and colour2, missing out colour1, to be consistent with the trilight’s terminology below.
Yet another lighting model used for ages. The first place I’ve seen it documented is in an article on Tabula Rasa’s rendering model in an upcoming GPU Gems book. This is distinct from the hemispherical lighting model because a lit sphere will have a ring around the middle between the two lights. In hemispherical lighting, this ring is the
This lighting model is a cheap approximation of the classic movie lighting of “key” and “fill” lights. The usual example is at night – a bright yellow “key” light from a streetlamp and a dark-blue “fill” light from the night sky to give shape to the dark parts of the character. In fact usually the fill is not directly opposite the key light, but this is a very similar effect and is basically free.
The idea of the trilight is to unify all the above lighting models into one. As a bonus, it also gives more control than any one of them. In the same way that I’ve never seen the above lighting models documented until fairly recently, but I know they’ve been used for ages, I suspect this model has been used by others even before I did (which I think was sometime in 2002). But I thought it was time to write the thing down so that everyone could use it.
The trilight is obviously a superset of Lambert and bi-directional – simply set colour1 and/or colour2 to black.

For N.L > 0:

trilight = colour0 * (N.L) + colour1 * (1-(N.L))
    = colour0 * (N.L) + colour1 - colour1 * (N.L)
    = (colour0-colour1) * (N.L) + colour1
    = (colour0-(colour0+colour2)/2) * (N.L) + (colour0+colour2)/2
    = ((colour0-colour2)/2) * (N.L) + (colour0+colour2)/2
    = same as hemispherical
For N.L < 0:
trilight = colour1 * (1+(N.L)) - colour2 * (N.L)
    = -colour2 * (N.L) + colour1 + colour1 * (N.L)
    = (colour1-colour2) * (N.L) + colour1
    = ((colour0+colour2)/2-colour2) * (N.L) + (colour0+colour2)/2
    = ((colour0-colour2)/2) * (N.L) + (colour0+colour2)/2
    = same as hemispherical
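Both branches of that equivalence can be verified numerically. A small sketch (scalar channels; the function names and example values are mine): with colour1 = (colour0+colour2)/2, the two-piece trilight matches hemispherical for every N.L:

```python
def trilight(c0, c1, c2, ndotl):
    # Two-branch trilight: colour0/colour1 side for N.L >= 0,
    # colour1/colour2 side for N.L < 0.
    if ndotl >= 0:
        return c0 * ndotl + c1 * (1 - ndotl)
    return c1 * (1 + ndotl) - c2 * ndotl

def hemispherical(c0, c2, ndotl):
    return (c0 + c2) / 2 + (c0 - c2) * ndotl / 2

c0, c2 = 0.9, 0.2
c1 = (c0 + c2) / 2            # the hemispherical setting of colour1
for ndotl in (-1.0, -0.5, 0.0, 0.5, 1.0):
    assert abs(trilight(c0, c1, c2, ndotl) - hemispherical(c0, c2, ndotl)) < 1e-12
print("trilight reduces to hemispherical")
```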
The trilight is very close to wrap-around lighting when colour2 = black, colour1 = factor*colour0/2. This is only an approximation, but it’s pretty close as you can see from the demo and from the following:
    = colour0 * (N.L) + factor * colour0 * (1-(N.L))/2
    = colour0 * ((N.L) + factor/2 - factor*(N.L)/2)
    = colour0 * ((N.L)*(2-factor) + factor)/2
For N.L < 0:
trilight = colour1 * (1+(N.L)) - colour2 * (N.L)
    = colour1 * (1+(N.L))
    = factor * colour0 * (1+(N.L))/2
    = colour0 * (factor/2) * ((N.L)+1)
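The same kind of numeric check confirms both simplifications for the wrap-around settings (colour2 = black, colour1 = factor*colour0/2); again a sketch with scalar channels and helper names of my own:

```python
def trilight(c0, c1, c2, ndotl):
    if ndotl >= 0:
        return c0 * ndotl + c1 * (1 - ndotl)
    return c1 * (1 + ndotl) - c2 * ndotl

c0, factor = 0.8, 0.6
c1, c2 = factor * c0 / 2, 0.0           # the wrap-around settings from the text
for ndotl in (-1.0, -0.5, 0.0, 0.5, 1.0):
    full = trilight(c0, c1, c2, ndotl)
    # the piecewise simplifications derived above
    simplified = (c0 * (ndotl * (2 - factor) + factor) / 2 if ndotl >= 0
                  else c0 * (factor / 2) * (ndotl + 1))
    assert abs(full - simplified) < 1e-12
print("wrap-around simplifications agree")
```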
You probably want to make the demo fullscreen – there’s a lot of sliders in the bottom right. From top to bottom: