Particle systems are awesome! Not only can you create amazing effects, but you can also optimize code and push more and more pixels to the screen. This post series will cover how to design a flexible particle system and apply a bunch of optimizations to make it run faster. Flexible means that it can be used in real applications and for a variety of graphics effects.
Introduction
For some time I have been playing with my own little particle system. A previous post shows some effects that I was able to make using the system. I have not created any new effects since then; instead, I have spent the time on optimizations and improvements.
I would like to show you more, or say that I optimized the code by 100000%... but it is not that easy :) Still, I think it is valuable to share my current experience.
This post will cover the basics of the particle system and my assumptions.
Let's start!
The Series
- Introduction (this post)
- Particle Container 1 - problems
- Particle Container 2 - implementation
- Generators and Updaters
- Renderer
- Tools Optimizations
- SIMD Optimizations
- Renderer Optimizations
Big Picture
What is needed to create a particle system:
- array of particles - we need some container to keep particles. Particles are dynamic things, so we also need an efficient way of making a particle alive or dead. It seems that even std::vector is not enough for this purpose. Another question is what data one particle should contain. Should we use Array of Structs (AoS) or maybe Struct of Arrays (SoA)?
- generators/emitters - they create particles (make them alive) and set their initial parameters.
- updaters - when a particle is alive there has to be a system that updates it and manages its movements.
- a renderer - finally we need a way to push all the data to the screen and render the whole system. Rendering a particle system is an interesting topic on its own because there are lots of possible solutions and techniques.
And probably that is all for a good start.
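To make the AoS versus SoA question more concrete, here is a minimal sketch of both layouts. All names (ParticleAoS, ParticlesSoA, countAlive, wake, kill) are hypothetical and chosen only for illustration. The alive/dead handling shown here, keeping alive particles at the front of the arrays and swapping a killed particle with the last alive one, is just one common approach and an assumption of this sketch, not necessarily the container described later in the series.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// AoS: every particle keeps all of its attributes together.
struct ParticleAoS {
    float x, y, z;      // position
    float vx, vy, vz;   // velocity
    float life;         // remaining life time in seconds
};

// SoA: one contiguous array per attribute; index i identifies a particle.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<float> life;
    std::size_t countAlive = 0;  // particles [0, countAlive) are alive

    explicit ParticlesSoA(std::size_t maxCount)
        : x(maxCount), y(maxCount), z(maxCount),
          vx(maxCount), vy(maxCount), vz(maxCount),
          life(maxCount) {}

    // Make the first dead particle alive; returns false when the pool is full.
    bool wake() {
        if (countAlive >= life.size()) return false;
        ++countAlive;
        return true;
    }

    // Kill particle i by swapping it with the last alive particle,
    // so alive particles stay packed at the front (order is not preserved).
    void kill(std::size_t i) {
        if (i >= countAlive) return;
        const std::size_t last = countAlive - 1;
        std::swap(x[i], x[last]);
        std::swap(y[i], y[last]);
        std::swap(z[i], z[last]);
        std::swap(vx[i], vx[last]);
        std::swap(vy[i], vy[last]);
        std::swap(vz[i], vz[last]);
        std::swap(life[i], life[last]);
        --countAlive;
    }
};
```

With the SoA layout, an updater that only touches positions reads just the position arrays, while the AoS layout pulls every field of each particle through the cache. Which layout wins in practice depends on the access patterns of the updaters and the renderer.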
Stateless vs State preserving particle systems
When implementing a particle system it is important to notice that we can update particles in two ways:
Stateless way
It means that we compute current position/data/state from initial values and we do not store this calculated state. Take a look at this simple movement equation used in a simple particle system:
pos = pos_start + vel_start*time + 0.5*acc*time*time;
This computed pos is usually used only for rendering. In the next frame, time will change and thus we will get a different value for pos.
Lots of graphics tutorials have such particle systems. It is especially common as an example for vertex shaders. You can pass the start data of particles to the vertex shader and then update only the time value. It looks nice, but it is hard to create advanced effects using such a technique.
Pros:
- simple to use, no additional data is needed, just start values
- very fast: just create the initial data; the particle buffer needs to be updated only when a particle is killed or born.
Cons:
- only for simple movement equations
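As a minimal CPU-side sketch of the stateless approach, the function below evaluates the movement equation from the start values only. The names Vec3, StartState and positionAt are hypothetical; in the vertex shader variant mentioned above, the same formula would run per vertex with time passed in as a single changing value.

```cpp
// Simple 3D vector used only for this illustration.
struct Vec3 {
    float x, y, z;
};

// Start values of one particle; they never change after the particle is born.
struct StartState {
    Vec3 pos;  // pos_start
    Vec3 vel;  // vel_start
};

// pos = pos_start + vel_start*time + 0.5*acc*time*time
// Nothing is stored: calling this with a different time simply gives
// a different position computed from the same start values.
Vec3 positionAt(const StartState& s, const Vec3& acc, float time) {
    const float h = 0.5f * time * time;
    return Vec3{
        s.pos.x + s.vel.x * time + acc.x * h,
        s.pos.y + s.vel.y * time + acc.y * h,
        s.pos.z + s.vel.z * time + acc.z * h
    };
}
```

Because the result depends only on the start values and time, any frame can be evaluated directly without stepping through the frames before it.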
State preserving way
As the name suggests, we will store the current state of particles. We will use the previous state(s) to compute the current one. One of the most popular ways to do this is called the Euler method:
vel = vel + delta_time * acc;
pos = pos + delta_time * vel;
Pros:
- can be used to create advanced effects
Cons:
- needs storage for the internal/current state
- more computations and updates needed than in a stateless system
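The two update lines of the state preserving approach can be sketched as a per-particle step function. Vec3, Particle and eulerStep are hypothetical names used only for this illustration; acc is assumed to be a constant acceleration such as gravity, and dt is the frame time in seconds.

```cpp
// Simple 3D vector used only for this illustration.
struct Vec3 {
    float x, y, z;
};

// Current state of one particle, kept between frames.
struct Particle {
    Vec3 pos;
    Vec3 vel;
};

// One integration step, in the same order as the equations above:
//   vel = vel + delta_time * acc;
//   pos = pos + delta_time * vel;
void eulerStep(Particle& p, const Vec3& acc, float dt) {
    p.vel.x += dt * acc.x;
    p.vel.y += dt * acc.y;
    p.vel.z += dt * acc.z;

    p.pos.x += dt * p.vel.x;
    p.pos.y += dt * p.vel.y;
    p.pos.z += dt * p.vel.z;
}
```

Since the position update uses the already updated velocity, this ordering is often called semi-implicit (or symplectic) Euler. Unlike the stateless version, the result of each step depends on the previous one, so forces can change from frame to frame.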
I will leave this topic for now, but it will come back when I show the actual implementation of the system.
Assumptions/Requirements
What would I like to achieve with the system:
- Usability - the whole system will not be just a little experiment with some simple update loop; it can be used to create several different effects.
- Easy to extend - different modules or an option to create your own parts.
- Performance - should be fast enough. This is quite a vague spec, but the whole optimization part will be a great playground for testing new ideas.
- I aim for at least 100k particles running smoothly (60fps) on my system. It would be nice to have 1M, but this will not be that easy on the CPU version.
- CPU only - I know that GPU implementations are currently better, but for the experiment I choose CPU only. Maybe in the second version I will rewrite it to OpenCL or OpenGL Compute Shaders.
- The CPU version also gives a chance to experiment with CPU to GPU buffer transfers.
- So far, a simple OpenGL 3.3+ renderer
What's Next
In the next article I will write about particle data and its container used in the system.
Notes and Links
Here is a bunch of links and resources that helped me (or will help) in the implementation:
- The Software Optimization Cookbook: High Performance Recipes for IA-32 Platforms, 2nd Edition, Intel Press; 2nd edition (December 2005) - a hard-to-get book, but I won it at GDC Europe 2011 :)
- Video Game Optimization by Eric Preisz and Ben Garney
- Intel Creating a Particle System with Streaming SIMD Extensions - quite old, but a very easy-to-understand tutorial.
- Building a Million-Particle System - for
- Particle Systems From the Ground Up by Matt Greer - great article about particles for JavaScript and WebGL
- Gamasutra Building an Advanced Particle System