In my previous post about the SOIL library I mentioned adding some new features. One of them was improving mipmap generation by simply calling glGenerateMipmap (or glGenerateMipmapEXT). In this post I describe the changes needed to implement it and the benefits gained.
In short: for NPOT sizes I get around 4x faster texture loading and 2x lower memory consumption. For POT sizes loading is about 2x faster (no memory difference).
Code here: SOIL_ext @ github, changed soil.c
The problem
Add the ability to use glGenerateMipmap in the SOIL library. The old functionality, the custom software solution for mipmap generation, will be (and should be) left unchanged. The new generation method is selected by passing a new flag called SOIL_FLAG_GL_MIPMAPS. For desktop OpenGL this should be much faster than the original SOIL method: it can be hardware accelerated and it works for NPOT textures. With the standard SOIL_FLAG_MIPMAPS, SOIL first rescales the image to POT and then creates the mipmaps. All of that happens in custom code on the CPU side.
Another assumption:
Since the lib is small I do not want to introduce GLEW or other extension loading libraries. Extension loading will be done manually.
Desired usage:
texID = SOIL_load_OGL_texture("test.jpg",
SOIL_LOAD_AUTO,
SOIL_CREATE_NEW_ID,
SOIL_FLAG_GL_MIPMAPS); // <<
The solution
Since there is no GL_EXT_mipmap extension, we need to find where the function we want is defined. The easiest way is to download the latest version of glext.h and search for glGenerateMipmap. We will find two versions:
- glGenerateMipmap - in OpenGL 3.0 core or in GL_ARB_framebuffer_object
- glGenerateMipmapEXT - in GL_EXT_framebuffer_object
The code tries to load the first one; if that fails, it tries the second. If both lookups fail, we fall back to the same functionality as SOIL_FLAG_MIPMAPS.
There is actually no need to load all the functions from the extension; only one is essential. First, the code below should be added:
// soil.c
typedef void (APIENTRY *P_PFNGLGENERATEMIPMAPPROC)(GLenum target);
static P_PFNGLGENERATEMIPMAPPROC soilGlGenerateMipmap = NULL;
Then the code for loading/checking:
static int has_gen_mipmap_capability = SOIL_CAPABILITY_UNKNOWN;
static int query_gen_mipmap_capability( void );
The first snippet adds the function pointer type (the proper declaration can be found in glext.h) and then the actual function pointer. The last line declares a function that has to be invoked at some point to load and check the extension. The lookup itself is done only on the first call.
Query extension
Let us go inside query_gen_mipmap_capability():
int query_gen_mipmap_capability( void )
{
/* check for the capability */
P_PFNGLGENERATEMIPMAPPROC ext_addr = NULL;
if( has_gen_mipmap_capability == SOIL_CAPABILITY_UNKNOWN )
{
// instead of checking "GL_ARB_framebuffer_object" or
// "GL_EXT_framebuffer_object"
// we simply test the function pointer
ext_addr =
(P_PFNGLGENERATEMIPMAPPROC)
soilLoadProcAddr("glGenerateMipmap");
if(ext_addr == NULL)
{
ext_addr =
(P_PFNGLGENERATEMIPMAPPROC)
soilLoadProcAddr("glGenerateMipmapEXT");
}
if(ext_addr == NULL)
{
/* not there, flag the failure */
has_gen_mipmap_capability = SOIL_CAPABILITY_NONE;
} else
{
/* it's there! */
has_gen_mipmap_capability = SOIL_CAPABILITY_PRESENT;
soilGlGenerateMipmap = ext_addr;
}
}
return has_gen_mipmap_capability;
}
The code is quite simple: it checks whether the function pointer can be obtained on the current system. We could check for the extension itself first, but this method should be equally safe. SOIL is usually called after all OpenGL extension setup, so GL_ARB_framebuffer_object should already have been checked by then.
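For comparison, here is a minimal sketch of the other route, checking the extension name itself. It is not part of SOIL: the helper name has_extension_token is made up for illustration, and it assumes the extension list is available as one space-separated string. It only covers the string matching; obtaining the list is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of SOIL): returns 1 when `name` appears as a
   complete, space-delimited token in `ext_list`, and 0 otherwise. */
static int has_extension_token(const char *ext_list, const char *name)
{
    size_t name_len;
    const char *pos;

    assert(name != NULL);
    if (ext_list == NULL || name[0] == '\0')
        return 0;

    name_len = strlen(name);
    pos = ext_list;
    while ((pos = strstr(pos, name)) != NULL) {
        /* Accept only whole tokens, so that "GL_EXT_framebuffer_object"
           does not match inside a longer extension name. */
        int starts_token = (pos == ext_list) || (pos[-1] == ' ');
        int ends_token = (pos[name_len] == ' ') || (pos[name_len] == '\0');
        if (starts_token && ends_token)
            return 1;
        pos += 1; /* keep searching after this partial match */
    }
    return 0;
}
```

On a legacy context the list would typically come from glGetString(GL_EXTENSIONS). Core profile contexts expose the names one at a time through glGetStringi(GL_EXTENSIONS, i) instead, in which case a plain strcmp per entry is enough.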
Let us go to the soilLoadProcAddr function:
void *soilLoadProcAddr(const char *procName)
{
#ifdef WIN32
PROC p = wglGetProcAddress(procName);
if (soilTestWinProcPointer(p))
return p;
else
return NULL;
#elif defined(__APPLE__) || defined(__APPLE_CC__)
// apple specific..
#elif defined ( linux ) || defined( __linux__ )
#if !defined(GLX_VERSION_1_4)
return glXGetProcAddressARB((const GLubyte *)procName);
#else
return glXGetProcAddress((const GLubyte *)procName);
#endif
#else
return NULL; // unsupported platform
#endif
}
An interesting function is soilTestWinProcPointer:
#ifdef WIN32
static int soilTestWinProcPointer(const PROC pTest)
{
ptrdiff_t iTest;
if(!pTest) return 0;
iTest = (ptrdiff_t)pTest;
if(iTest == 1 || iTest == 2 || iTest == 3 || iTest == -1) return 0;
return 1;
}
#endif
It turns out that we cannot assume wglGetProcAddress returns either NULL or a valid pointer. On failure it may also return 1, 2, 3 or -1, so those values have to be tested as well.
Usage
Now we can use our loading code in the SOIL texture loading function. This happens in SOIL_internal_create_OGL_texture:
if( flags & SOIL_FLAG_MIPMAPS || flags & SOIL_FLAG_GL_MIPMAPS)
{
...
}
Inside the if statement we just need to write:
if ((flags & SOIL_FLAG_GL_MIPMAPS) &&
query_gen_mipmap_capability() == SOIL_CAPABILITY_PRESENT)
{
soilGlGenerateMipmap(opengl_texture_target);
}
else
{
// old functionality...
}
Benefits
In the introduction I used catchy phrases like "4x speedup" or "2x lower memory consumption". Let me explain where those results may come from.
Memory consumption
For POT sizes there will be no difference, of course. The new method creates exactly the same number of levels as the SOIL way. But for NPOT sizes the situation changes. Let us take a simple case:
- Image 540x600 RGB8 - memory needed: 540*600*3 bytes = ~950 KB
- This image will have mipmaps: 270x300, 135x150, 67x75, 33x37, 16x18, 8x9, 4x4, 2x2, 1x1 - 10 levels (including the original image).
- In total we will need around 1265 KB (33% more than with no mipmaps, of course).
- When we use the SOIL method, we first need to rescale the image to POT - the new size is 1024x1024! This is 3072 KB!
- Mipmaps: 512x512, 256x256, 128x128, 64x64, 32x32, 16x16, 8x8, 4x4, 2x2, 1x1. In total we will have 11 levels (one more than for NPOT).
- Total memory: around 4095 KB! As we can see, that is more than 3x larger than the NPOT case.
The difference is of course bigger when the input size is only a little larger than some POT size. If the input size is only a little smaller than some POT size, the difference is small. As mentioned before, for POT sizes there is no difference (no need to scale the texture).
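The arithmetic above can be reproduced with a small sketch. The helper names next_pot and mip_chain_bytes are made up for illustration and are not SOIL functions. The sketch assumes tightly packed pixels (no row padding) and that each level halves both dimensions, rounding down and clamping at 1, which is how the level sizes in the list above were derived.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest power of two that is >= v. */
static unsigned next_pot(unsigned v)
{
    unsigned p = 1;
    assert(v >= 1 && v <= 0x80000000u);
    while (p < v)
        p <<= 1;
    return p;
}

/* Hypothetical helper: bytes needed for a full mip chain, including the
   base level. Optionally reports the number of levels via levels_out. */
static size_t mip_chain_bytes(unsigned width, unsigned height,
                              unsigned bytes_per_pixel, unsigned *levels_out)
{
    size_t total = 0;
    unsigned levels = 0;

    assert(width >= 1 && height >= 1);
    for (;;) {
        total += (size_t)width * height * bytes_per_pixel;
        ++levels;
        if (width == 1 && height == 1)
            break;
        /* Halve each dimension, but never go below one pixel. */
        if (width > 1)
            width /= 2;
        if (height > 1)
            height /= 2;
    }
    if (levels_out != NULL)
        *levels_out = levels;
    return total;
}
```

For the 540x600 RGB8 case, mip_chain_bytes(540, 600, 3, &levels) gives 1295631 bytes in 10 levels, which is about 1265 KB. With next_pot(540) and next_pot(600) both equal to 1024, the rescaled chain takes 4194303 bytes in 11 levels, which is just under 4096 KB, or roughly 3.2 times more.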
Performance
The first gain comes from the smaller number of pixels to process when we use NPOT textures.
The second comes from internal optimizations: the possibility of hardware-accelerated scaling and the lower cost of driver calls (one call to glGenerateMipmap vs several calls to glTexImage).
- Image 540x600 RGB JPEG, 50 loads (new method vs original SOIL method):
  - time: 0.5s vs 3.5s
  - memory: 62MB vs 200MB (total for 50 textures)
- Image 1024x1024 RGB JPEG, 50 loads:
  - time: 1.1s vs 3.1s
  - memory: 200MB in both cases, of course
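A measurement like this could be taken with a harness along the following lines. This is only a sketch, not the code used for the numbers above: the callback type load_fn, the function time_repeated_loads and the stub counting_stub_load are all hypothetical names. It uses clock(), which reports CPU time rather than wall-clock time, and it does not wait for the GPU, so a real test would probably call glFinish() after each load.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback: performs one texture load, returns nonzero on success. */
typedef int (*load_fn)(void *user_data);

/* Calls `load` `count` times and returns the elapsed CPU time in seconds,
   or -1.0 if any call reports failure. */
static double time_repeated_loads(load_fn load, void *user_data, int count)
{
    clock_t start, end;
    int i;

    assert(load != NULL && count >= 0);
    start = clock();
    for (i = 0; i < count; ++i) {
        if (!load(user_data))
            return -1.0;
    }
    end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Stand-in for a real load: only counts how often it was called. */
static int counting_stub_load(void *user_data)
{
    int *calls = (int *)user_data;
    ++*calls;
    return 1;
}
```

In a real test the stub would be replaced by a callback that calls SOIL_load_OGL_texture with either SOIL_FLAG_MIPMAPS or SOIL_FLAG_GL_MIPMAPS, with count set to 50 to match the runs above.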
Although we usually load textures in the init phase, and thus there is no need to fight for performance at all costs, I think it is important to know that a simple improvement can give a nice speed-up. It will be significant in scenarios where textures are loaded dynamically throughout the game, or when we load a whole directory of photos to display them in a gallery. The user should see results as soon as possible.
Besides all that, it was quite an interesting experience for me :) I dug into the code and had to verify my initial thoughts :)
Notes
- g-truc: OpenGL tip: Generate mipmaps
- OpenGL Spec - glGenerateMipmap
- OpenGL ES2 Spec - glGenerateMipmap
- glGenerateMipmap opengl wiki
Once more, the link to the code: SOIL_ext @ github, changed soil.c