Dev Log 2018-05-21
Entity Graph Completed
Entity graphs/scene graphs with parent/child relationships and relative transformations are now fully supported.
I corrected errors in matrix multiplication order as well as storage format. The second problem was difficult to find, only subtly becoming apparent when entities changed display size along with their z position, which shouldn’t have happened with the orthographic projection matrix I was using. Some people in freenode ##OpenGL helped me narrow it down to transpose. As derhass said, “matrix layout is like the USB A connector. you always get it wrong the first round”.
The storage format issue was ultimately a discrepancy between my matrix class and GLM’s, which I was mixing. My matrices are stored row major, whereas GLM’s are stored column major. Everything outside GLM was configured to expect row major, so multiplication against GLM’s matrices was yielding the wrong result. I initially tried to correct this with GLM’s transpose function, but this is what happened when I tried it:
In [8]: m = glm.ortho(-320.0,320.0,-240.0,240.0,0.0,100.0)
In [12]: m
Out[12]:
tmat4x4
[ 0.003125 | 0 | 0 | 0 ]
[ 0 | 0.00416667 | 0 | 0 ]
[ 0 | 0 | -0.02 | 0 ]
[ -0 | -0 | -1 | 1 ]
In [13]: glm.transpose(m)
Out[13]:
tmat4x4
[ 0.003125 | 0 | 0 | -0 ]
[ 0 | 0.00416667 | 0.00416667 | -0 ]
[ 0 | 0 | -0.02 | -1 ]
[ 0 | 0 | 0 | 1 ]
The extra 0.00416667 in [1][2] of the transpose is incorrect. I’m not sure if this is an issue with GLM itself or with PyGLM. Since I only need simple view and projection matrices for now, I removed PyGLM and am calculating them directly. I would like to be able to use PyGLM’s features eventually, so I will try the same transpose experiment with the C++ GLM library to isolate the error.
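For reference, here is a minimal sketch of what a correct transpose of the same orthographic matrix looks like, in plain Python with no GLM involved. The helpers `ortho_row_major` and `transpose` are hypothetical names for illustration, not ng's matrix class or PyGLM's API, and the matrix is written row major with the translation terms in the last column. Whatever the storage convention, a transpose only mirrors entries across the diagonal, so it cannot introduce a new nonzero value.

```python
def ortho_row_major(left, right, bottom, top, near, far):
    """Orthographic projection as a list of rows (translation in the last column)."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transpose(m):
    """Swap rows and columns so that result[i][j] == m[j][i]."""
    return [list(column) for column in zip(*m)]


def count_nonzero(m):
    """Number of entries that are not zero (negative zero counts as zero)."""
    return sum(1 for row in m for value in row if value != 0)


m = ortho_row_major(-320.0, 320.0, -240.0, 240.0, 0.0, 100.0)
t = transpose(m)

# The diagonal is untouched, the translation term moves from the last
# column to the last row, and the number of nonzero entries is unchanged.
assert all(t[i][i] == m[i][i] for i in range(4))
assert t[3][2] == m[2][3]
assert count_nonzero(t) == count_nonzero(m)
```

Transposing twice should also return the original matrix, which makes a quick sanity check for any matrix library.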
Working through different matrix operations on paper and with IPython this week has helped me to understand how to use a matrix to represent a system of linear equations. I’ve also gotten a better understanding of how to use matrices as a storage format for individual transformations that can be executed in series using matrix multiplication.
The entity graph is implemented by sending each entity’s vertices through a series of transformation matrices: projection * view * parent * display * translate * scale * rotate.
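The chain above can be sketched in plain Python. This is an illustration under assumptions, not ng's implementation: it uses the column-vector convention (the rightmost matrix is applied to the vertex first), and it substitutes identity matrices for projection, view, display, and rotate so the arithmetic is easy to follow. All helper names are hypothetical.

```python
from functools import reduce


def identity():
    return [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]


def translate(x, y, z):
    m = identity()
    m[0][3], m[1][3], m[2][3] = x, y, z
    return m


def scale(x, y, z):
    m = identity()
    m[0][0], m[1][1], m[2][2] = x, y, z
    return m


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def transform(m, point):
    """Apply a 4x4 matrix to a 3D point treated as the column vector (x, y, z, 1)."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(m[r][k] * v[k] for k in range(4)) for r in range(3))


# Identity stand-ins for the matrices this sketch does not model.
projection, view, display, rotate = (identity() for _ in range(4))
parent = translate(100.0, 50.0, 0.0)        # the parent's own transform
child_translate = translate(10.0, 0.0, 0.0)  # child position relative to parent
child_scale = scale(2.0, 2.0, 1.0)

# projection * view * parent * display * translate * scale * rotate
chain = [projection, view, parent, display, child_translate, child_scale, rotate]
combined = reduce(matmul, chain)

# The vertex (1, 1, 0) is scaled to (2, 2, 0), moved to (12, 2, 0) in the
# parent's space, then carried to (112, 52, 0) by the parent transform.
result = transform(combined, (1.0, 1.0, 0.0))
```

Reading the chain right to left gives the order in which each transformation touches the vertex, which is why getting the multiplication order wrong moves a child relative to the wrong origin.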
With the entity graph completed, development on another important feature begins.
Particle Systems Started
Particle systems can create many visual effects, and can also be a performance indicator for a graphics engine. I wanted to see how well ng could do this in its current state, so I started work on a simple particle system. Particles are created by an emitter. An emitter is created as follows:
self.emitter0 = self._ng.emitter_create(
    name='emitter0', delay_sec=0.1,
    template={'name': 'star',
              'position': ((635, 0, 0), (635, 480, 0)),
              'origin': (0.0, 0.0),
              'velocity': ((-50.0, 0, 0), (-70, 0, 0)),
              'texture': {'name': 'star02'}})
where delay_sec is the delay in seconds between particles, and template is the entity creation template. The creation template is the blueprint the emitter uses to create its particles. Note the two 3D values supplied for both position and velocity. This syntax means each new particle entity will be assigned a random value between the two values of these arguments. In the above example, each particle created by the emitter is assigned a position with a random y value between 0 and 480.
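One way the range syntax could be resolved is sketched below. It assumes each component is drawn independently and uniformly between the two supplied values, which the description above does not confirm, and `pick_between` is a hypothetical helper rather than part of ng.

```python
import random


def pick_between(low, high, rng=random):
    """Return a tuple with each component drawn uniformly between low and high."""
    return tuple(rng.uniform(a, b) for a, b in zip(low, high))


rng = random.Random(0)  # seeded only so repeated runs behave the same

# Same ranges as the emitter template: x is fixed at 635, y varies from 0 to 480.
position = pick_between((635, 0, 0), (635, 480, 0), rng)

# The bounds may be given in either order: x velocity falls between -70 and -50.
velocity = pick_between((-50.0, 0, 0), (-70, 0, 0), rng)
```

When both bounds of a component are equal, as with x and z here, the component is effectively constant, so a single template can mix fixed and randomized values.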
I created a demo with a scrolling star field and ships with fuel trails to test the emitters.
These initial emitters simply create many entities on a delay, so a particle created by an emitter has the same properties as any other entity created by the engine. I don’t yet know if this pattern can scale to thousands of particles or more. This will become clear over the coming days and weeks, both as I finish more rounds of performance optimization and as I learn more about how the CPU, the GPU, and the OpenGL object model each affect overall performance.
Serendipitously, implementing the initial particle system this way has been an excellent first performance test. Emitters are a great way to test frequently creating and destroying a large number of entities and examine the effects on frame rate under these conditions.
Performance Optimizations
As initially configured, the new demo keeps a few hundred entities on display at once with particle emitters. The first time I ran the demo, it highlighted performance issues. Frame completion times quickly exceeded 80-100 milliseconds/frame. These issues haven’t been noticeable with previous demos, which are simple feature tests using only a small number (< 10) of simultaneous entities.
I used the cProfile module to profile some runs of the demo, then used snakeviz to visualize the results. This was the first time I had profiled the engine, so I wasn’t surprised to see many functions with cumulative run time higher than expected. I went through the worst offending functions with line_profiler to find the operations that were both the costliest and the easiest to optimize. After each change, I repeated the profiling, analysis, and optimization process, gradually improving performance.
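The profiling loop can be sketched with the standard library alone. The functions `update_entities` and `run_demo` are hypothetical stand-ins for engine work, not ng code; the `cProfile` and `pstats` calls are the standard ones.

```python
import cProfile
import io
import pstats


def update_entities(count):
    """Hypothetical stand-in for one frame of engine work."""
    return sum(i * i for i in range(count))


def run_demo(frames=50):
    for _ in range(frames):
        update_entities(1000)


profiler = cProfile.Profile()
profiler.enable()
run_demo()
profiler.disable()

# Sort by cumulative time so the most expensive call trees come first.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()

# To browse the same data in snakeviz, write it to a file first and then
# run `snakeviz demo.prof` from a shell:
# stats.dump_stats("demo.prof")
```

Sorting by cumulative time points at the call trees worth a closer look; a line-level profiler can then be aimed at just those functions.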
There were several easy optimizations. Some of these were issues I’d already anticipated; I’d written simple, but potentially problematic code, mentally noted it as such, and moved on. Others were surprises.
Mesh VBO Optimization
The GL renderer was updating the VBO for each entity mesh every frame. The optimization is to only update the VBO when the vertices actually change, which happens only when the entity’s size or origin changes. On either change, the entity now fires an ENTITY_MODEL_INIT event. The GL renderer updates the VBO for an entity only in response to this event.
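The event-driven pattern can be sketched without any OpenGL calls. Everything here is hypothetical: `FakeVBO` merely counts uploads in place of a real buffer update, and `Renderer.on_entity_model_init` stands in for whatever handler ng registers for ENTITY_MODEL_INIT.

```python
class FakeVBO:
    """Stand-in for a GPU buffer; counts uploads instead of calling OpenGL."""

    def __init__(self):
        self.uploads = 0
        self.data = None

    def upload(self, data):
        self.data = list(data)
        self.uploads += 1


class Renderer:
    def __init__(self):
        self.mesh_vbos = {}

    def on_entity_model_init(self, entity_id, vertices):
        """Event handler: the only place mesh data is uploaded."""
        self.mesh_vbos.setdefault(entity_id, FakeVBO()).upload(vertices)

    def draw(self, entity_id):
        """Per-frame draw: reuses the existing buffer without uploading."""
        return self.mesh_vbos[entity_id].data


renderer = Renderer()
renderer.on_entity_model_init("ship", [0, 0, 32, 0, 32, 32, 0, 32])

for _ in range(100):  # 100 frames with no size or origin change
    renderer.draw("ship")

# The entity is resized, so the event fires again and triggers one more upload.
renderer.on_entity_model_init("ship", [0, 0, 64, 0, 64, 64, 0, 64])
```

After 100 frames and one resize, the buffer has been uploaded twice instead of 101 times.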
UV VBO Optimization
The GL renderer was updating the VBO for texture coordinates every frame. The optimization is to only update the VBO when texture coordinates change, which happens only when the entity’s texture or animation frame changes. The renderer now also keeps a copy of the VBO’s UV coordinates in main memory. Each frame, it compares the entity’s current coordinates to the in-memory copy and updates the VBO only if they differ. Rather than using this state polling loop, I’d prefer to handle the update with an event handler, as described in the mesh optimization above. The requisite TEXTURE_CHANGE event isn’t available yet, so I plan to refactor this later after adding it.
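The compare-before-upload idea looks roughly like this. `UVBuffer` is a hypothetical class, and the `upload` callable stands in for the real buffer update so the sketch runs without OpenGL.

```python
class UVBuffer:
    def __init__(self, upload):
        self._upload = upload  # callable that would write to the GPU buffer
        self._cached = None    # main-memory copy of what the VBO holds

    def set_coords(self, coords):
        """Upload only if the coordinates differ from the cached copy."""
        coords = tuple(coords)
        if coords == self._cached:
            return False
        self._upload(coords)
        self._cached = coords
        return True


uploads = []
uv = UVBuffer(uploads.append)

frame0 = (0.0, 0.0, 0.5, 0.0, 0.5, 1.0, 0.0, 1.0)
frame1 = (0.5, 0.0, 1.0, 0.0, 1.0, 1.0, 0.5, 1.0)

first = uv.set_coords(frame0)     # nothing cached yet, so this uploads
repeated = uv.set_coords(frame0)  # same animation frame, so this is skipped
changed = uv.set_coords(frame1)   # new animation frame, so this uploads
```

The comparison still runs every frame, which is the polling cost an event-based version would remove.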
Cache Shader Uniform Locations
The GL renderer was calling glGetUniformLocation to get uniform locations every frame. This call is expensive and the returned value is valid for the lifetime of the shader program. The optimization is to store uniform locations in main memory after linking the shader, and use the cached locations as needed instead of calling glGetUniformLocation.
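A minimal sketch of the cache follows. The lookup function is injected so the example runs without a GL context; a real renderer would pass its binding's glGetUniformLocation instead. `UniformCache`, the fake lookup, and the uniform names `mvp` and `tex` are all hypothetical.

```python
class UniformCache:
    """Looks up each uniform location once and serves it from a dict afterwards."""

    def __init__(self, program, names, lookup):
        # `lookup` is called once per name here, right after the program is linked.
        self._locations = {name: lookup(program, name) for name in names}

    def location(self, name):
        return self._locations[name]


calls = []


def fake_get_uniform_location(program, name):
    """Stand-in for glGetUniformLocation that records every call."""
    calls.append(name)
    return len(calls) - 1


cache = UniformCache(program=1, names=["mvp", "tex"],
                     lookup=fake_get_uniform_location)

for _ in range(1000):  # 1000 frames, two uniforms per frame
    cache.location("mvp")
    cache.location("tex")
```

Across 1000 simulated frames the lookup runs twice in total, once per uniform, rather than 2000 times.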
Fewer Matrix Operations
ng’s current patterns of creating and multiplying matrices are expensive (more on this in the next post). One way to mitigate this is to do less of it.
The entity update routine was recalculating every transformation matrix (rotate, scale, and translate) as well as the model matrix each time any of the local space properties changed. The optimization is to recalculate only what is necessary. For example, if the user updates the entity’s position, only the translate matrix and the model matrix are recalculated.
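As a sketch of the idea, each setter below rebuilds only its own matrix plus the model matrix. The `Entity` class, its method names, and the `rebuilt` log are hypothetical and exist only to make the saved work visible; rotation is left as an identity matrix for brevity.

```python
def identity():
    return [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]


def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


class Entity:
    def __init__(self):
        self.rebuilt = []  # log of rebuilt matrices, for illustration only
        self._translate = identity()
        self._scale = identity()
        self._rotate = identity()
        self.model = identity()

    def set_position(self, x, y, z):
        m = identity()
        m[0][3], m[1][3], m[2][3] = float(x), float(y), float(z)
        self._translate = m
        self.rebuilt.append("translate")
        self._rebuild_model()

    def set_scale(self, x, y, z):
        m = identity()
        m[0][0], m[1][1], m[2][2] = float(x), float(y), float(z)
        self._scale = m
        self.rebuilt.append("scale")
        self._rebuild_model()

    def _rebuild_model(self):
        # Model matrix composed as translate * scale * rotate.
        self.model = matmul(matmul(self._translate, self._scale), self._rotate)
        self.rebuilt.append("model")


entity = Entity()
entity.set_position(5, 0, 0)  # rebuilds translate and model, nothing else
```

A position change costs one matrix construction and two multiplications here; the scale and rotate matrices are reused as they were.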
I think there are still opportunities to reduce the number of matrix operations each frame, and will come back to this.
Improvement
After these and a few other changes, the emitter demo runs at under 50 ms/f on my development system. There is a lot more to do for performance, and I’m looking forward to doing more optimizations over the next few days.
Next?
I would like to create a side scrolling space shooter in June. With the entity graph and particle system features complete, the major features required for this are there.
The question is around performance. My hunch is that what I have in mind would run at an average of 80-100 ms/f in the current version of ng, where I would prefer it to be less than 35 ms/f. In addition to concerns around average frame rate, there are also intermittent frame rate drops in the emitter demo. Every few seconds, the time between two frames doubles, creating a visual jerk before recovering to the average frame rate.
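One simple way to spot the doubling described above is to compare each frame time against the average of the run. This is only a sketch under assumptions: `find_spikes` is a hypothetical helper, the sample times are made up for illustration and are not measurements from ng, and a factor of 2.0 is just one possible threshold.

```python
def find_spikes(frame_times_ms, factor=2.0):
    """Return (index, time) for frames taking at least `factor` times the mean."""
    mean = sum(frame_times_ms) / len(frame_times_ms)
    return [(i, t) for i, t in enumerate(frame_times_ms) if t >= factor * mean]


# Made-up sample: nineteen steady frames followed by one slow frame.
samples = [40.0] * 19 + [100.0]

average_ms = sum(samples) / len(samples)
spikes = find_spikes(samples)
```

Reporting the index along with the time makes it possible to line spikes up with whatever the engine was doing on that frame.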
It’s become clear to me that performance tuning will be an ongoing activity. This week, I’ll develop tools to analyze average and instantaneous frame rate issues and see how much I can improve them. Then I’ll re-evaluate the results and see whether I can start making a side scrolling shooter in June.