Wednesday, June 26, 2013

ETC1 textures loading and alpha handling with Android NDK



 In this post I will describe how to work with ETC1 textures.

 What are compressed textures good for?

 You are probably familiar with image formats like .png and .jpg. The first one employs lossless compression while the second uses lossy compression. Usually the .jpg file is smaller for the same image, at the price of decreased quality.
 Regardless of how the image is stored compressed on disk, once you load and decode it into memory it bloats to its uncompressed size. For RGBA 8888 that is 32 bits per pixel, so a 1024x1024 image used as a texture eats 4 MB of precious video memory.
  Besides these well known image formats there are also formats that keep the texture compressed on the GPU. They can drastically reduce the number of bits required per pixel.
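 To put numbers on it, here is a rough comparison for a 1024x1024 texture (ETC1 stores 8 bytes per 4x4 pixel block, as described later in this article):

 // rough GPU memory footprint of a 1024x1024 texture
 const u32 width = 1024, height = 1024;
 const u32 rgba8888Bytes = width * height * 4;             // 32 bpp -> 4 MB
 const u32 etc1Bytes     = (width / 4) * (height / 4) * 8; //  4 bpp -> 512 KB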
 Unfortunately, when coding for Android these formats are vendor specific. Currently there are four of them: ETC1, PVRTC, ATITC, S3TC. See more here.

 Only ETC1 is available on all Android devices, so the rest of the article is about this format. It lacks one important thing though - it does not support textures with alpha. Fortunately, there are ways to work around this. As mentioned before, with these formats the texture stays compressed on the GPU and is decompressed on the fly, which reduces the texture memory used as well as increases the performance of your OpenGL game (reduced data bandwidth).

 Another attraction of these texture formats comes from the fact that you do not need to include any image unpacking library like libpng in your game. Nor do you need to load images on the Java side like I described in my older post (Load images under Android with NDK and JNI).

 The drawback is that textures in their compressed form are bigger than .jpg or .png files with the same image, so more disk space is consumed. Further in the article I will also describe how I solved this.


Creating ETC1 texture

 Every GPU vendor has its own set of tools, including a tool for compressing textures. I am using the tool from ARM (download it here). After you run the tool you will get the initial screen. Open the image file with your texture (do not forget that OpenGL ES needs the width and height to be powers of 2) and you should get a screen like this:

 Select the texture(s) in the left panel and press the "Compress" icon. The compression parameters panel will pop up:


 Choose the ETC1/ETC2 tab (1.) and select PKM as the output format. PKM is a very simple format that adds a small header to the compressed data. The header looks like this:

+0: 4 bytes header "PKM "
+4: 2 bytes version "10"
+6: 2 bytes data type (always zero)
+8: 2 bytes extended width
+10: 2 bytes extended height
+12: 2 bytes original width
+14: 2 bytes original height
+16: compressed texture data

 In the ETC1 format each 4x4 pixel block is compressed into 64 bits. The extended width and height are the original dimensions rounded up to a multiple of four. If you are using power of 2 textures then the original and extended dimensions are the same.

 From these parameters you can calculate the size of the compressed data like this:

(extended width / 4) * (extended height / 4) * 8

 This formula just says: there are this many 4x4 pixel blocks and each of them is 8 bytes (64 bits) long.
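 Just for illustration, the header can also be written down as a small struct; this is only a sketch (the field names are mine) and the 16 bit values are stored big endian in the file, so they have to be byte swapped on little endian CPUs:

// sketch of the PKM v1.0 header layout (field names are mine)
struct PKMHeader
{
 char magic[4];       // "PKM "
 char version[2];     // "10"
 u16  dataType;       // always zero for ETC1
 u16  extendedWidth;  // width rounded up to a multiple of 4
 u16  extendedHeight;
 u16  originalWidth;
 u16  originalHeight;
};

// size in bytes of the compressed payload that follows the 16 byte header
u32 etc1DataSize(u16 aExtWidth, u16 aExtHeight)
{
 return (aExtWidth / 4) * (aExtHeight / 4) * 8; // 8 bytes per 4x4 block
}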

 The parameters marked 2. and 5. in the picture affect the quality of the compression. The compression takes quite a lot of time, so during development you can use a lower quality if you do not want to wait. But make sure that when finishing your game you use maximum quality - the size of the output remains the same.

 Under 3. do not forget to check that ETC1 is chosen, and under 4. choose to create a separate texture for the alpha channel. This texture has the same dimensions as the original one, but its red channel stores alpha instead of color. The green and blue channels are unused, so theoretically you can put any additional information there (but not with the tool - it would be up to you how to do it).


Loading ETC1 texture

  Now that you have the texture compressed it is time to load it into the GPU.


//------------------------------------------------------------------------
u16 TextureETC1::swapBytes(u16 aData)
{
 return ((aData & 0x00FF) << 8) | ((aData & 0xFF00) >> 8);
}

//------------------------------------------------------------------------
void TextureETC1::construct(SBC::System::Collections::ByteBuffer& unpacked)
{
 // check if data is ETC1 PKM file - should start with text "PKM " (notice the space in the end)
 // read byte by byte to prevent endianness problems
 u8 header[4];
 header[0] = (u8) unpacked.getChar();
 header[1] = (u8) unpacked.getChar();
 header[2] = (u8) unpacked.getChar();
 header[3] = (u8) unpacked.getChar();

 if (header[0] != 'P' || header[1] != 'K' || header[2] != 'M' || header[3] != ' ')
  LOGE("data are not in valid PKM format");

 swapBytes() is just a helper method; the real work is done in construct(). ByteBuffer is our simple wrapper around an array of bytes holding not only the data but also its size. It is not important here, it just improves readability.
 We start with the header to check whether the input data really is a PKM file.


 // read version - 2 bytes. Should be "10". Just skipping
 unpacked.getShort();
 // data type - always zero. Just skip
 unpacked.getShort();

 // sizes of texture follows: 4 shorts in big endian order
 u16 extWidth = swapBytes((u16) unpacked.getShort());
 u16 extHeight = swapBytes((u16) unpacked.getShort());
 u16 width = swapBytes((u16) unpacked.getShort());
 u16 height = swapBytes((u16) unpacked.getShort());

 // calculate size of data with formula (extWidth / 4) * (extHeight / 4) * 8
 u32 dataLength = ((extWidth >> 2) * (extHeight >> 2)) << 3;

 In the next step we skip the version and data type fields as we do not use them. Then the dimensions are read and converted from big endian to little endian with swapBytes(). The size of the compressed data is calculated with the formula mentioned above.


  // openGL part
  // create and bind texture - all next texture ops will be related to it
  glGenTextures(1, &mTextureID);
  glBindTexture(GL_TEXTURE_2D, mTextureID);
  // load compressed data (skip 16 bytes of header)
  glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_ETC1_RGB8_OES,
    extWidth, extHeight, 0, dataLength, unpacked.getPositionPtr());

  // set texture parameters
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

  // save size
  mWidth = extWidth;
  mHeight = extHeight;
}

 Now it is time to create the texture with OpenGL calls. We first allocate a texture ID and then send the data for it. Note that the internal format is GL_ETC1_RGB8_OES. This says that we are using the OES_compressed_ETC1_RGB8_texture extension that adds support for ETC1 textures.
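 The article relies on ETC1 being available everywhere; if you want to be defensive you can still query the extension string before uploading. A minimal sketch (needs <GLES2/gl2.h> and <cstring>):

// returns true when the driver reports the ETC1 extension (defensive check only,
// on Android devices with OpenGL ES 2.0 it should practically always succeed)
bool hasETC1Support()
{
 const char* ext = (const char*) glGetString(GL_EXTENSIONS);
 return ext != NULL && strstr(ext, "GL_OES_compressed_ETC1_RGB8_texture") != NULL;
}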

 Now we can use this texture like any other loaded from a .png.


Handling alpha with ETC1 textures

 As previously said, ETC1 does not support alpha. At texture creation time we exported the alpha into a separate texture with the same dimensions as the color one. You can employ a fragment shader to compose the final color, including alpha, from these two textures.
 A simple fragment shader doing this may look like this:


#ifdef GL_ES
precision mediump float;
#endif

uniform lowp sampler2D u_map[2];
varying mediump vec2 v_texture;

void main(void)
{
  gl_FragColor = vec4(texture2D(u_map[0], v_texture).rgb, texture2D(u_map[1], v_texture).r);
}
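 On the C++ side the two textures then have to be bound to two texture units and the u_map sampler array pointed at them. A minimal sketch (colorTextureID, alphaTextureID and program are placeholders for whatever your texture and shader classes hold):

// bind the color texture to unit 0 and the alpha texture to unit 1
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, colorTextureID);
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, alphaTextureID);

// point the u_map sampler array at texture units 0 and 1
GLint mapLocation = glGetUniformLocation(program, "u_map");
GLint units[2] = { 0, 1 };
glUniform1iv(mapLocation, 2, units);

// enable blending so the composed alpha actually takes effect
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);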

 It took me less than an hour to rewrite parts of our game to use ETC1 with alpha in a separate texture instead of the previously used .png textures. And I got a really sweet reward: the frame rate increased by about 10% on my Samsung Galaxy Tab.



Compression to the power of two

 The last thing I did not like at first was that ETC1 textures without alpha were twice as big as .jpg files (in our upcoming game Shards we use .jpg for backgrounds - no need for alpha there). I finally solved this by additionally compressing the .pkm file with the zlib library. It took me some time to find out how to use it on Android, but you can read about it here (Using zlib compression library in Android NDK) - it is really worth it.
 Now I am almost at the size of the .jpg file. When creating the texture I first uncompress the file with zlib and then pass the uncompressed .pkm to the routine described above.
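 Putting it together, loading such an asset is a two step job; here is a sketch using the ZlibPacker from the follow-up article (the loader call and the file name are placeholders from our engine):

 // load the zlib-compressed .pkm from assets (loader API is our own)
 ByteBuffer* packed = loader->loadToByteBuffer("background.pkm.z");

 // inflate it back to a plain .pkm image in memory
 ZlibPacker zlibPacker;
 ByteBuffer* pkm = zlibPacker.uncompress(*packed);
 delete packed;

 // hand the uncompressed .pkm to the loading routine described above
 TextureETC1 texture;
 texture.construct(*pkm);
 delete pkm;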


Conclusion

  ETC1 texture compression is the only one supported by all Android devices (with OpenGL ES 2.0, of course). It lacks alpha, so there is a small overhead in working around this. It pays off as you save texture memory and increase your frame rate - 10% in the case of my Samsung Galaxy Tab.




Friday, June 21, 2013

Using zlib compression library in Android NDK





 Just one day after writing the article on Crunch, the packing utility, and the way to unpack data created with it, I wondered how much overhead it would take to get into the zlib library. I was pleasantly surprised that using it is very intuitive and the implementation went without problems.


Android implementation


 The Android NDK already contains the library and the header file. The header is in the directory:
    \android-ndk-r8b\platforms\android-14\arch-arm\usr\include\zlib.h
and the library is in the directory:
    \android-ndk-r8b\platforms\android-14\arch-arm\usr\lib\libz.so

 To make it part of your NDK project you have to open the Android.mk file and add the following lines (the first one is just a comment):
  #Zlib
  LOCAL_LDLIBS    += -lz

 Now it is enough to include the header file "zlib.h" at the beginning of any source file where you want to use it.
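 For context, a minimal Android.mk could look like this (module and source names are placeholders; -llog is only needed for the LOGE macros used below):

  LOCAL_PATH := $(call my-dir)

  include $(CLEAR_VARS)
  LOCAL_MODULE    := game
  LOCAL_SRC_FILES := main.cpp ZlibPacker.cpp
  #Zlib
  LOCAL_LDLIBS    += -lz -llog
  include $(BUILD_SHARED_LIBRARY)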

 In my engine I built a simple wrapper around it to simplify calling compress / uncompress. I am again using the IPacker interface introduced in the previous article. The goal is to have a unified interface for various packing / unpacking utilities and libraries. I just want to send it a ByteBuffer (our class wrapping an array, its length and some utilities for working with it) and get another one with the result back. As a reminder, the interface methods look like this:

 // compress data and returns pointer to heap allocated ByteBuffer with it
 virtual SBC::System::Collections::ByteBuffer* compress(u8* aUnpackedData, s32 aSize) = 0;
 virtual SBC::System::Collections::ByteBuffer* compress(
   SBC::System::Collections::ByteBuffer& aUnpackedData) = 0;
 // uncompress data and returns heap allocated ByteBuffer with it
 virtual SBC::System::Collections::ByteBuffer* uncompress(u8* aPackedData, s32 aSize) = 0;
 virtual SBC::System::Collections::ByteBuffer* uncompress(
   SBC::System::Collections::ByteBuffer& aPackedData) = 0;

 There are two things I found important about zlib:
 - it is not Zip, gzip or any other archiving program (although those use the same compression algorithm and share authors with zlib). Do not expect it to open zip archives you created, for example, in WinZip. Zlib has its own internal format,
 - when you compress a piece of memory you get the result compressed into another piece of memory without any information on how many bytes the original data had. It is up to you to store that somewhere, unless you want to allocate an extra large buffer for uncompression that every file will fit in.


Compress


 Here is the implementation of the compress routines:

//------------------------------------------------------------------------
ByteBuffer* ZlibPacker::compress(u8* aUnpackedData, s32 aSize)
{
 ByteBuffer buffer;
 buffer.construct(aUnpackedData, aSize, false);

 return compress(buffer);
}

//------------------------------------------------------------------------
ByteBuffer* ZlibPacker::compress(ByteBuffer& aUnpackedData)
{
 // maximum length needed for compression
 unsigned long destLength = compressBound(aUnpackedData.getLimit());

 The first one just wraps the raw data in a ByteBuffer and passes it on (the false says that the ByteBuffer does not own the data and will not delete it on destruction). In the second one the actual work starts. First we call compressBound() with the length of the data we want to compress. It returns the maximum length required for the target buffer where the packed data will be placed. In the worst case, when the result is not compressed at all, the final size will be the size of the original data plus some zlib overhead.

 // add 4 for header (to save original unpacked data length)
 u8* out = new u8[destLength + 4];
 s32 result = ::compress(out + 4, &destLength,
   aUnpackedData.getDataPtr(), aUnpackedData.getLimit());

 if (result != Z_OK)
 {
  switch(result)
  {
  case Z_MEM_ERROR:
   LOGE("not enough memory for compression");
   break;

  case Z_BUF_ERROR:
   LOGE("not enough room in the buffer to compress the data");
   break;
  }
 }

 We allocate a new buffer for the compressed data and add 4 bytes. This will be our "minimalistic" header keeping the information on how big the original data was.
 Notice the double colon before the compress() call. It forces the compiler to use the global zlib compress() function instead of our class method. If you omit it you will get a compiler error, as none of the class compress() methods takes 4 parameters.
 We pass out + 4 as the first parameter because we want to reserve the first 4 bytes for the original data size, which is written in later.
 Also note that destLength is passed as a pointer. This is because the length of the final compressed data is handed back in it.

 // create new buffer and copy data into it
 ByteBuffer* packed = new ByteBuffer();
 packed->construct(destLength + 4, out, destLength + 4);

 // delete old buffer - not needed more
 delete [] out;

 Now the data is packed and its final packed length is known. As zlib works well, our initial maximum-sized buffer is probably too big; part of it is empty, just eating memory. So we create a new ByteBuffer with just the capacity needed for the compressed data plus the 4 byte header. This ByteBuffer construct() overload copies the requested amount of data into the new ByteBuffer.
 We can then free the old (probably partly empty) buffer.

 // set position to beginning and save unpacked size
 packed->setPosition(0);
 packed->setInt((s32) aUnpackedData.getCapacity());
 packed->setPosition(0);

 return packed;
}

 In the end we write the original (unpacked) data length into the beginning and return the result.


Uncompress


The uncompression looks similar:

//------------------------------------------------------------------------
ByteBuffer* ZlibPacker::uncompress(u8* aPackedData, s32 aSize)
{
 ByteBuffer buffer;
 buffer.construct(aPackedData, aSize, false);

 return uncompress(buffer);
}

//------------------------------------------------------------------------
ByteBuffer* ZlibPacker::uncompress(ByteBuffer& aPackedData)
{
 // read size of unpacked data
 unsigned long destLength = (u32) aPackedData.getInt();

 // create bytebuffer with sufficient capacity to hold unpacked data
 ByteBuffer* unpacked = new ByteBuffer();
 unpacked->construct(destLength);

 First we read the length of the uncompressed data. This is not part of zlib, and it is up to you to keep it somewhere (or allocate a buffer big enough to hold the biggest file you expect to process).

 s32 result = ::uncompress(unpacked->getDataPtr(), &destLength,
   aPackedData.getPositionPtr(), aPackedData.getLimit() - 4);

 if (result != Z_OK)
 {
  switch(result)
  {
  case Z_MEM_ERROR:
   LOGE("not enough memory for uncompression");
   break;

  case Z_BUF_ERROR:
   LOGE("not enough room in the buffer to uncompress the data");
   break;

  case Z_DATA_ERROR:
   LOGE("compressed data corrupted or incomplete");
   break;
  }
 }

 Here we call the zlib uncompress() function. The third parameter, aPackedData.getPositionPtr(), returns an unsigned char* pointing just after the 4 byte header! Do not forget to skip any headers you added so that you point at the raw zlib data. The position within the ByteBuffer was automatically advanced when we read the size at the very beginning of the method.

 // set the limit to the actual uncompressed length reported back by zlib
 unpacked->setLimit(destLength);

 return unpacked;
}

 Finally we set the limit of the ByteBuffer to the uncompressed length and return the result.
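 A short round-trip sketch showing how the two methods are meant to be used together (original is assumed to be a ByteBuffer loaded elsewhere):

 ZlibPacker packer;

 // deflate; the returned buffer starts with our 4 byte unpacked-length header
 ByteBuffer* packed = packer.compress(*original);

 // ... write packed to disk, load it back later ...

 // inflate; the 4 byte header tells uncompress() how big a buffer to allocate
 ByteBuffer* restored = packer.uncompress(*packed);

 delete packed;
 // use restored, then delete it as well when done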


Summary


 It is easy to make zlib work in your project. Remember that it is not an UnZip program, so just putting some zips into your assets will not work. The zlib library also supports compressing and uncompressing in the gzip (.gz) format and adds gzip file access functions to work with it easily. It is still too fresh for me, so I will have to examine it in the future.
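 Those gzip file functions follow the stdio naming (gzopen, gzread, gzclose), so reading a .gz file could look roughly like this sketch (path and buffer handling are placeholders, no error handling):

// read up to aMaxLength bytes of uncompressed data from a .gz file
int readGzFile(const char* aPath, unsigned char* aOut, int aMaxLength)
{
 gzFile file = gzopen(aPath, "rb");
 if (file == NULL)
  return -1;

 int read = gzread(file, aOut, aMaxLength);
 gzclose(file);
 return read;
}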



Thursday, June 20, 2013

Crunch - very lightweight unpacker used in our Android and bada games




UPDATE: the day after writing this article I explored the zlib compression library and found it very easy to implement. Read how to do it in "Using zlib compression library in Android NDK". My own unpacking utility, which I spent a lot of time on and describe further down, still works within our engine, and as it is very small and fast at decompression I may still have use for it in projects without zlib. The lesson for me: "think twice, cut once".


 Here is the next article on mobile games programming topics I came across when building our small games. This time it is about how to minimize the size of your game assets on disk and how to quickly unpack them without needing a Java code part or any big library.

 A common way is to pack or compress the assets to the smallest possible size to reduce the space consumed on the target device. There are lossy compression formats like .jpg for pictures, but what if lossless compression is needed? A possible solution in Android NDK development is to use the zlib library, but it brings some overhead before you get familiar with it.


Crunch - the packer

 Fortunately I found a very old program of mine, written in BASIC, that compresses data, and rewrote it in Java. The program runs very slowly but gives good results, so I am using it now for asset compression. It also has only a 6 byte header, so it is very easy to work with. The program is named Crunch and you can download it here.

 Crunch looks like this:


 You can add multiple files and delete them from the list. You can also set the number of "Window bits", which says how far back in the data matches are searched, and "Match bits", which says how long the matches can be.

 The program can only compress the data. The result starts with a 6 byte header:
+0: length of unpacked data (4 bytes)
+4: window size bits (1 byte)
+5: match size bits (1 byte)
+6: packed data
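 Written as a struct it is just this (field names are mine; the 4 byte length is read with ByteBuffer::getInt() in the decoder below):

// sketch of the 6 byte Crunch header (field names are mine)
struct CrunchHeader
{
 u32 unpackedLength; // size of the original data, used to allocate the output buffer
 u8  windowBits;     // how far back matches may reach (0 together with matchBits 0 = stored)
 u8  matchBits;      // how many bits encode a match length
};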


Unpacking data

 Now that we have our data packed, we need to decompress it in the game. To do this we first create an IPacker interface that will help us add different packing algorithms or wrap 3rd party libraries in the future.

#ifndef IPACKER_H_
#define IPACKER_H_

#include "../../System/system.h"

namespace SBC
{
namespace Engine
{

class IPacker
{
public:
 virtual ~IPacker() {};

public:
 // encodes data and returns pointer to heap allocated ByteBuffer with it
 virtual SBC::System::Collections::ByteBuffer* encode(u8* aUnpackedData, s32 aSize) = 0;
 virtual SBC::System::Collections::ByteBuffer* encode(
   SBC::System::Collections::ByteBuffer& aUnpackedData) = 0;
 // decodes data and returns heap allocated ByteBuffer with it
 virtual SBC::System::Collections::ByteBuffer* decode(u8* aPackedData, s32 aSize) = 0;
 virtual SBC::System::Collections::ByteBuffer* decode(
   SBC::System::Collections::ByteBuffer& aPackedData) = 0;
};

} /* namespace Engine */
} /* namespace SBC */
#endif /* IPACKER_H_ */

 ByteBuffer is a class I use for convenient manipulation of byte data (unsigned char / u8). It allows reading an int, short or char from the buffer as well as storing them into it.
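 The class itself is not listed in the article, so here is a rough outline of its interface as it is used in the listings (method names are taken from the calls in this article; signatures and return types are my guesses, and it lives in SBC::System::Collections):

// rough sketch of the ByteBuffer interface as used here - not the real header
class ByteBuffer
{
public:
 void construct(s32 aCapacity);                           // allocate an empty buffer
 void construct(u8* aData, s32 aSize, bool aOwner);       // wrap existing data
 void construct(s32 aCapacity, u8* aData, s32 aCopyLen);  // allocate and copy data in

 u8  getChar();   // read 1 byte at the current position and advance it
 u16 getShort();  // read 2 bytes
 u32 getInt();    // read 4 bytes
 void setInt(s32 aValue);

 u8* getDataPtr();     // start of the underlying array
 u8* getPositionPtr(); // pointer at the current read/write position
 s32 getLimit();
 s32 getCapacity();
 void setLimit(s32 aLimit);
 void setPosition(s32 aPosition);
};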

 Next we will implement this interface in class SBCDepacker:

#ifndef SBCDEPACKER_H_
#define SBCDEPACKER_H_

#include "IPacker.h"

namespace SBC
{
namespace Engine
{

class SBCDepacker: public IPacker
{
public:
 SBCDepacker();
 virtual ~SBCDepacker();

public:
 // IPacker implementation
 // encodes data and returns pointer to heap allocated ByteBuffer with it
 virtual SBC::System::Collections::ByteBuffer* encode(u8* aUnpackedData, s32 aSize);
 virtual SBC::System::Collections::ByteBuffer* encode(SBC::System::Collections::ByteBuffer& aUnpackedData);
 // decodes data and returns heap allocated ByteBuffer with it
 virtual SBC::System::Collections::ByteBuffer* decode(u8* aPackedData, s32 aSize);
 virtual SBC::System::Collections::ByteBuffer* decode(SBC::System::Collections::ByteBuffer& aPackedData);

private:
 u32 readData(s32 aNumBits, SBC::System::Collections::ByteBuffer& aPackedData);

private:
 u8 mData;
 u8 mDataBitsRest;
};

} /* namespace Engine */
} /* namespace SBC */
#endif /* SBCDEPACKER_H_ */

 It just implements the interface methods and adds a private method readData() that reads the requested number of bits from the byte array.

 Here is the implementation of the decode() and readData() methods. As we can only unpack the data, the implementation of encode() just writes an error message to the log. All the source files can be downloaded here:

//------------------------------------------------------------------------
ByteBuffer* SBCDepacker::decode(ByteBuffer& aPackedData)
{
 // clear data
 mData = 0;
 mDataBitsRest = 0;

 // read header
 s32 length = aPackedData.getInt();
 s8 winBits = aPackedData.getChar();
 s8 matchBits = aPackedData.getChar();

 // create byte array for result
 ByteBuffer* unpacked = new ByteBuffer();
 unpacked->construct(length);
 u8* dest = unpacked->getDataPtr();

 // not packed - just read
 if (!winBits && !matchBits)
 {
  memcpy(dest, aPackedData.getPositionPtr(), length);
 }
 // if packed then decode
 else
 {
  while (length)
  {
   // read 1 bit and check whether it is packed or not
   u32 result = readData(1, aPackedData);

    // packed: copy a match from data already decoded
    if (result)
    {
     // zpet = distance back into the already decoded output, shoda = match length
     s32 zpet = readData(winBits, aPackedData);
     s32 shoda = readData(matchBits, aPackedData);

     // copy byte by byte so that overlapping matches work
     for (s32 i = shoda; i > 0; --i)
     {
      *(dest) = *(dest - zpet);
      ++ dest;
     }

     length -= shoda;
    }
    // unpacked: the next 8 bits are a literal byte
   else
   {
    result = readData(8, aPackedData);

    *(dest ++) = (u8) (result & 0xFF);
    -- length;
   }
  }
 }

 // set limit to capacity
 unpacked->setLimit(unpacked->getCapacity());
 
 return unpacked;
}

//------------------------------------------------------------------------
u32 SBCDepacker::readData(s32 aNumBits, ByteBuffer& aPackedData)
{
 u32 result = 0;

 // while not requested number of bits read
 while (aNumBits > 0)
 {
  // if out of bits read next byte
  if (!mDataBitsRest)
  {
   mData = (u8) aPackedData.getChar();
   mDataBitsRest = 8;
  }

  // how many bits to read (limited with remaining bits)
  u32 num = Math::min(mDataBitsRest, aNumBits);
  // create mask for given number of bits
  u32 mask = (1 << num) - 1;

  // shift previous result and add masked values
  result = (result << num) | (mData & mask);

  // adjust variables for next read
  aNumBits -= num;
  mDataBitsRest -= num;
  mData >>= num;
 }

 return result;
}


Conclusion

 That's all. With this simple class you can unpack data previously packed with the Crunch utility like this:

 // load compressed file from disk
 ByteBuffer* compressed = aLoader->loadToByteBuffer(aIdx);

 // create unpacker and unpack data
 SBCDepacker sbcDepacker;
 ByteBuffer* unpacked = sbcDepacker.decode(*compressed);

 // delete compressed data - no more needed now as all data is in unpacked
 delete compressed;

  If you do not want or need to link the whole zlib library, or you want something really simple that works, then you may give Crunch a try. In our games we are currently using it for packing ETC1 textures. These are already compressed themselves, but the format is targeted at reducing GPU bandwidth, not minimizing disk space. They can be reduced to almost half their size with Crunch, so disk space consumption decreases significantly.