Stage3D Online Conference Slides

It was really awesome to be invited to talk about Minko today during the Stage3D online conference organized by Sergey Gonchar. He did an excellent job organizing it and I hope people enjoyed attending it as much as I enjoyed being a part of it.


You can watch the entire conference here.

As I promised, here are the slides from the presentation. They are pretty heavy because they embed some videos. Here is the outline of the presentation:

  • Community SDK
    • Scripting
    • Shaders and GPU programming
    • Scene editor
  • Professional SDK
    • Physics
    • Optimizations for the web and mobile devices

At the end of the presentation, I also demonstrated how Minko can load and display Crytek’s Sponza smoothly on the iPad and the Nexus 7 in just a few minutes of work, thanks to the editor and the optimizations granted by the MK format. You will soon hear more about this very demonstration, with a clean video showing the app but also the publishing process. This is incredibly cool since Sponza is quite a big scene with more than 50 textures – including normal maps, alpha maps and specular maps – for a total of 200+MB (only 70MB when published to MK).

Don’t forget to have a look at all the online resources for Minko.

As stated in the presentation, Minko’s editor public beta should start next week. So stay tuned!

New video: normal mapping and specular maps

Normal mapping has been available for a long time inside Minko: first as part of the minko-lighting plugin, and now in the core framework (in the dev branch only for now). Support for specular maps was added recently. Combining normal mapping and specular maps really gives good results. To show how far you can get, here is a little video demonstrating both techniques in the Minko editor:

New Documentation for Collada

Minko’s Collada plugin makes it possible to easily import Collada (*.dae) files using the assets loading API. Working with the API is quite simple and there already is a detailed tutorial about that. But the Collada format has its flaws and it is sometimes very complicated to get your exported files to work properly.

That’s why I’ve compiled a new documentation article that explains how to properly export Collada files from 3D Studio Max:

Export Collada files from 3D Studio Max on Aerys Hub

The article details the export procedure itself but also provides guidelines to make sure the exported file will display properly with Minko. It also gives details about the supported features, including materials and material properties and how the engine will use them. Similar articles will be released for other editors and formats according to the community’s needs.

Of course, as soon as the Minko editor and the MK format are available, we will strongly discourage the use of Collada files in production for many reasons (mainly performance). But you will still need to import those Collada files into the editor first… So here is a little video to show how simple importing assets is with the editor:

Yes: it’s as simple as a drag’n’drop! You will also notice that the textures load automatically and that the editor will let you play the animations and make sure everything works out of the box. The editor will be available in open beta in March…

Anamorphic Lens Flare

Update: I’ve just pushed a new SWF with a much-improved effect. I’ve tweaked things like the number of vertical/horizontal blur passes – which are now up to 3/6 – but also the flares’ brightness, contrast and dirt texture. I think it looks way better now!

Tonight’s experiment was focused on post-processing. My goal was to implement a simple anamorphic lens flare post-processing effect using Minko. It was actually quite simple to do. Here is the result:


The 1st pass applies a luminance threshold filter:

Then I use a multipass Gaussian blur with 4 passes: 3 horizontal passes and 1 vertical pass. The trick is to apply those 5 passes (1 luminance threshold pass + 4 blur passes) on a texture that is a lot taller than it is wide (32×1024 in this case). This way, everything gets stretched when the flares are composited with the rest of the backbuffer.
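
For reference, here is a minimal sketch of the first pass’ luminance threshold logic, written as plain AS3 with no Minko API involved (the Rec. 601 luma weights and the hard cutoff are my assumptions about a typical implementation):

```actionscript
// Luminance threshold: keep only the bright pixels, everything else goes black.
// Color components are assumed to be normalized in [0, 1].
function luminanceThreshold(r : Number, g : Number, b : Number,
                            threshold : Number) : Vector.<Number>
{
    // Rec. 601 luma weights.
    var luma : Number = 0.299 * r + 0.587 * g + 0.114 * b;
    var keep : Number = luma >= threshold ? 1. : 0.;

    return new <Number>[r * keep, g * keep, b * keep, 1.];
}
```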

JIT Shaders For Better Performance

The subject is really vast and complex and I’ve been trying to write an article about it for quite some time now. Recently, I made a small patch to enhance this technique and I thought it was a good occasion to summarize how it works and what its benefits are. In order to talk about this new enhancement, I would like to draw the big picture first.

The Problem

That might look like a complicated post title… but this is complex rather than really complicated. Here is how it starts: rendering a 3D object requires executing a graphics rendering program – or “shader” – on the GPU. To make it simple, let’s just say this program will compute the final color of each pixel on the screen. Thus, the operations performed by this shader will vary according to how you want your object to look. For example, rendering with a solid flat color requires different operations than rendering with per-pixel lighting.

Any programming beginner will understand that such a program will test conditions – for example whether to use lighting or not – and perform some operations according to the result of this test. Yes: that’s pretty much exactly what an “if” statement is. It might look like programming trivia to you. And it would be, if this program were not meant to be executed on the GPU…

You see, the GPU does not like branching. Not one bit (literally)! For the sake of parallelization, the GPU expects the program to have a fixed number of operations. This is the only efficient way to ensure computations can be distributed over a large number of pipelines without having to care too much about their synchronization. Thus, the GPU knows nothing about branching, and each program has a fixed number of instructions that will always be executed in the same order.

Conclusion: shader programs cannot use “if” statements. And of course, loops are out of the game too since they are pretty much pimped-out “if” statements. Can you imagine what such logic would imply for your daily programming tasks? If you simply try to, you will quickly understand that instead of writing one program that can handle many different situations, you have to write many different programs that each handle a single situation. And then manually choose which one should be launched according to your initial setup…
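
To make this concrete, here is a tiny AS3 sketch of that manual selection (the shader names and flags are hypothetical, purely for illustration):

```actionscript
// One pre-built program per configuration, picked on the CPU before drawing.
var shader : Shader;

if (lightingEnabled && hasNormalMap)
    shader = litNormalMappedShader;
else if (lightingEnabled)
    shader = litShader;
else
    shader = flatColorShader;
```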

Workarounds…

Mutables

The simplest workaround is to find “some way” to make sure useless computations do not affect the actual rendering operations. For example, you can “virtually disable” lighting by setting all lights’ diffuse/specular components to 0.

As you can imagine, this is a really suboptimal option. Performance-wise, it’s actually the worst possible idea: a lot of computations happen and most of them are likely to be useless in most cases.
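
For instance, “virtually disabling” lighting through mutables might look like this (the property names are illustrative, not an actual Minko API):

```actionscript
// The GPU still runs the full lighting math, but the lights contribute nothing.
light.diffuse  = 0.;
light.specular = 0.;
```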

If/else shader intrinsic instructions

After a few years, shaders evolved and featured more and more instructions. Those instructions are now usable through higher-level languages such as Cg or GLSL. Those languages feature “if” statements (and even loops too). How are they compiled into shader code that can run on a GPU? Do they overcome the challenges implied by parallelization?

No. They actually fit in a very straightforward and simple way: as a shader program must feature a single fixed list of instructions, the two parts of an if/else statement will both be executed. The hardware will then decide which one should be muted according to the actual result of the test performed by the conditional instructions.
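
Expressed as plain arithmetic, the flattening principle looks like this (the AS3 below is only a CPU-side illustration of what the GPU does with its compare/select instructions):

```actionscript
// Both branches are always computed; the comparison only selects the result.
var t      : Number = condition ? 1. : 0.; // on the GPU: a compare instruction
var result : Number = branchA * t + branchB * (1. - t);
```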

The bright side is that you can use this technique to have a single shader program that handles multiple scenarios. The dark side is that this shader is still very inefficient and might even exceed the maximum number of instructions for a single program. On some older hardware, the corresponding hardware instructions simply do not exist…

So even this “brand new” feature that will be introduced in Flash 11.7 and its “extended” profile is far from sufficient.

Pre-compilation

Some engines will use high-level shader programming languages (like Cg or GLSL) and a pre-compilation workflow to generate all the possible outcomes. Then, the right shader is loaded at runtime according to the rendering setup. This is the case of the Source Engine, created by Valve and used in famous games like Half-Life 2, Team Fortress 2 or Portal.

This solution is efficient performance-wise: there is always a shader that will do exactly and strictly the required operations according to the rendering setup. Plus, it does not rely on the availability of specific hardware features. But pre-compilation implies a very heavy and inefficient assets workflow.

Minko’s Solution

We’ve seen the common workarounds and each of them has very strong cons. The most robust implementation seems to be the pre-compilation option, despite the obvious workflow issues. Especially when we’re talking web/mobile applications! But the past 10 years have seen the rise of a technique that could solve this problem: Just In Time (JIT) compilation. This technique is mostly used by virtual machines – such as the JVM (Java Virtual Machine), the AVM2 (ActionScript Virtual Machine) or V8 (Chrome’s JavaScript virtual machine). Its purpose is to compile the virtual machine bytecode into actual machine opcodes at runtime in order to get better performance.

How would the same principle apply to shaders? If you consider your application as the VM and your shader code as this VM’s execution language, then it all falls into place! Indeed, your 3D application could simply compile some higher-level shader code into actual machine shader code according to the available data. For example, some shader might compile differently according to whether lighting is enabled or not, or even according to the number of lights.

With Minko, we tried to keep it as simple as possible. Therefore, we worked very hard to find a way to write shaders using AS3. As the figure above explains, the AS3 shader code you write is not executed on the GPU (because that’s simply not possible). Instead, the application acts as a virtual machine: as it gets executed at runtime, this AS3 shader code transparently generates what we call an Abstract Shader Graph (ASG). You can see it as an Abstract Syntax Tree for shaders (you can even ask Minko to output ASGs in the console as they get generated using a debug flag). This ASG is then optimized and compiled into actual shader code for the GPU.

For example: every time you call the add() method in your AS3 shader code, it will create a corresponding ASG node. This very node will be linked with the rest of the ASG as you use it in other operations, until it is finally used as the result of the shader. This result node becomes the “entry point” of the ASG.
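
Here is a very small sketch of that mechanism (the class and method below are illustrative, not Minko’s actual internals):

```actionscript
// An ASG node records an operation and its inputs instead of computing a value.
class ASGNode
{
    public var operation : String;
    public var inputs    : Vector.<ASGNode>;

    public function ASGNode(operation : String, inputs : Vector.<ASGNode>)
    {
        this.operation = operation;
        this.inputs    = inputs;
    }
}

// Executed at runtime on the CPU: add() returns a graph node, not a sum.
function add(a : ASGNode, b : ASGNode) : ASGNode
{
    return new ASGNode("add", new <ASGNode>[a, b]);
}
```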

Here is what a very simple ASG that just handles a solid flat color rendering looks like:

Here is what a (complicated) ASG that handles multiple lights looks like:

Your AS3 shader code is executed at runtime on the CPU to generate this ASG, which will be compiled into actual shader code that will run on the GPU (in the case of Flash, it will actually output AGAL bytecode that will be translated into shader machine code by the Flash Player). As such, you can easily use “if” statements that will shape the ASG. You can even use loops, functions and OOP! You just have to make sure the shader is re-evaluated any time the output might be different (for example, when the condition tested in an “if” changes). But that’s for another time…
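
As a quick illustration, here is the kind of CPU-side branching this allows (hypothetical names again; computeLighting() stands for whatever lighting sub-graph you would build):

```actionscript
// This "if" runs on the CPU while the ASG is being built: a disabled branch
// simply never appears in the compiled GPU program.
var color : ASGNode = diffuseColor;

if (lightingEnabled)
    for (var i : int = 0; i < numLights; ++i)
        color = add(color, computeLighting(i)); // loop unrolled at build time
```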

Using JIT shaders, Minko can efficiently compile shaders shaped by the actual rendering settings occurring at runtime. Thus, it combines the high performance of a pre-compilation solution with all the flexibility of JIT compilation. In my next articles, I will explain how JIT shader compilation can be efficiently automated and how multi-pass rendering can also be made more efficient thanks to this approach.

If you have questions, hit the comments or post in the Minko category on Aerys Answers!

Teasing: Simple Minko Physics Stacking Stress Test

Minko physics is probably one of the most awaited features for this new year. I will not cover it extensively right now – I’d rather post a few demos in a later post – but if you must know, it was designed to be the fastest 3D physics engine for ActionScript 3 and the Flash platform.

I will share extensive details about this new physics engine during the Stage3D online conference in February. Make sure you attend: it’s online and it’s free! 🙂

The Demo

Anyway, I just wanted to tease the community with one of the stress tests we’re using here to benchmark the engine. It’s a really simple test and it uses only a very very small subset of the available features.

Here you go (just press “space” to throw some balls):

Read more about performance after the jump…

3D Matrices Update Optimization

4×4 matrices are the heart of any 3D engine as far as math is concerned. And in any engine, how those matrices are computed and made available through the API are two critical points regarding both performance and ease of development. Minko was quite generous regarding the second point, making it easy and simple to access and watch local-to-world (and world-to-local) matrices on any scene node. Yet, the update strategy for those matrices was… naïve, to say the least.

TL;DR

There is a new 3D transforms API available in the dev branch that provides a 25000% boost on scene nodes’ matrix updates in the best cases, making it possible to display 25x more animated objects. You can read more about the changes on Answers.


Minko Weekly Roundup #4

Features

  • The JointsDebugController makes it possible to display the joints and bones of a skinned mesh in order to check if everything works as expected.
  • The VertexPositionDebugController will display the position of each vertex of a mesh.
  • The VertexNormalDebugController will display the normal of each vertex. You can use it with the VertexNormalShader to display and debug the normals of a mesh.
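
A minimal usage sketch, assuming the usual Minko pattern of attaching controllers to scene nodes via addController():

```actionscript
// Visualize the joints/bones of a skinned mesh.
skinnedMesh.addController(new JointsDebugController());

// Display the per-vertex normals of a mesh.
mesh.addController(new VertexNormalDebugController());
```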

Examples

Answers

Tutorials

Fixes

  • minko-collada will now load the vertex RGBA data when it’s available
  • min/max computation is always possible upon creation of a VertexStream regardless of its StreamUsage
  • frustum culling will now also test the center of the bounding box and not only the 8 corners

Minko Weekly Roundup #3

Projects

Wieliczka Salt Mine website and apps

Created by the Polish agency GoldenSubmarine, the Wieliczka Salt Mine website offers an interactive experience with HD 360° panoramas built with Minko. Thanks to Adobe AIR, the application is also available on mobile platforms such as the iPad, the iPhone and Android.

This awesome project just received the “FWA Mobile of the Day” award! This is the very first FWA for a project built with Minko, so we are very proud of course!

Videos

This video presents yet another example of a very cool shader built by developer Jeremy Sellam from Les Chinois, a digital agency based in Paris, France. It was built with the public beta of the ShaderLab of course!

Features

ByteArray streams

Geometry is now stored in ByteArray objects. It reduces memory consumption and should provide a significant performance boost, especially on mobile devices. You can read more about this feature in my previous blog post.

Parametric frustum culling

Minko now implements frustum culling in a very flexible fashion. You can choose on which planes, and using which volume (sphere or box), frustum culling is computed. For example, to use the bounding sphere on all planes but the box only on the near/far planes, you can write:
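
Something along these lines (the flag names below are illustrative, not Minko’s actual constants):

```actionscript
// Sphere test on all planes, box test on the near/far planes only.
mesh.frustumCulling = FrustumCulling.SPHERE
    | FrustumCulling.BOX_NEAR
    | FrustumCulling.BOX_FAR;
```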

Such flexibility should make it possible to optimize performance for every use case. It’s also worth noting that computations are kept to a minimum by storing world-space versions of the bounding sphere/box. You can view the entire code for this feature in the VisibilityController.

Per-triangle ray casting

Geometry.cast() now makes it possible to get the triangle ID (the first index of the triangle) that crosses the specified ray. Combined with Mesh.cast() and Group.cast(), it makes it very easy to perform ray casting at any level, from the root of the scene down to the geometry level.
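
A hedged usage sketch (I am assuming cast() takes a ray and returns the triangle’s first index, with a negative value for a miss):

```actionscript
// Find which triangle of a mesh a ray crosses.
var triangleId : int = mesh.geometry.cast(ray);

if (triangleId >= 0)
    trace("hit triangle starting at index", triangleId);
```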

Local/world gizmos in Minko Studio

Gizmos are the key to any WYSIWYG editor. We’ve enhanced Minko Studio’s gizmos not only by adding scale/rotation gizmos but also the possibility to choose whether those gizmos should operate in local or world space.

Smooth shadows in Minko Studio

Shadows are a very important part of any real-time 3D immersive application. Thus, being able to control the quality of the shadows while keeping interactive framerates is very important. The minko-lighting plugin already introduced soft shadows a few weeks ago, and now the corresponding options are available right inside Minko Studio! It’s now easier than ever to fine tune the lighting engine in order to get the best performance/quality ratio.

Documentation

Minko now has a new, clean and fast-growing wiki: the developers Hub. You will find lots of old and new tutorials, but also lots of new examples and projects done with Minko!

Fixes

  • Geometry.fillNormalsData() and Geometry.fillTangentsData() will reset the corresponding buffers to make sure normals/tangents are not incremented every time they are updated.
  • Picking has been completely refactored: all the known bugs have been fixed and we now use a 1×1 BitmapData/scissor rectangle to get better performance.
  • The Collada loader will now inspect 3D transforms in order to handle negative scales properly. It will also invert the z axis at the vertex level in order to perform right- to left-handed coordinate conversion. This fixes problems occurring when loading animations built using symmetry.

Don’t forget to “watch” Minko’s GitHub repository to get your daily dose of new features and fixes!

New Minko Feature: ByteArray Streams

I’ve just pushed my work from the past few weeks to GitHub and it’s a major update. But in most cases you should not have to change a single line of code. The two major changes are the activation of frustum culling – which now works perfectly well – and the use of ByteArray objects to store vertex/index stream data.

Using ByteArray instead of Vector: why are we doing this?

As you might know, Number is the equivalent of the “double” data type and, as such, Number values are stored on 64 bits. As 32 bits is all a GPU can handle regarding vertex data, this is a big waste of RAM. Using ByteArray makes it possible to store floats as floats and avoid any memory waste. The same goes for indices, stored as uint when they are actually shorts.
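
Here is what that packing looks like with the plain Flash API (the one-vertex layout is just an example):

```actionscript
import flash.utils.ByteArray;
import flash.utils.Endian;

// Pack positions as 32-bit floats instead of 64-bit Numbers.
var vertices : ByteArray = new ByteArray();
vertices.endian = Endian.LITTLE_ENDIAN; // Stage3D expects little-endian data
vertices.writeFloat(-1.); // x: 4 bytes instead of the 8 bytes of a Number
vertices.writeFloat(0.);  // y
vertices.writeFloat(1.);  // z

// Pack indices as 16-bit shorts instead of 32-bit uints.
var indices : ByteArray = new ByteArray();
indices.endian = Endian.LITTLE_ENDIAN;
indices.writeShort(0); // 2 bytes instead of the 4 bytes of a uint
```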

Another important optimization concerns the GPU upload. Using Number or uint requires the Flash Player to re-interpret every value before upload: each 64-bit Number has to be turned into a 32-bit float, each 32-bit uint has to be turned into a 16-bit short. This process is slow by itself, but it also prevents the Flash Player from simply memcopying the buffers into the GPU data. Thus, using ByteArray should really speed up the upload of the stream data to the GPU and make it as fast as possible. This difference should be even bigger on low-end and mobile devices.
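
Continuing the sketch above, the upload then boils down to the native Stage3D calls (assuming an existing Context3D; buffer sizes match the single vertex/index written above):

```actionscript
import flash.display3D.IndexBuffer3D;
import flash.display3D.VertexBuffer3D;

// The packed bytes are copied as-is: no per-value re-interpretation needed.
var vertexBuffer : VertexBuffer3D = context3D.createVertexBuffer(1, 3);
vertexBuffer.uploadFromByteArray(vertices, 0, 0, 1);

var indexBuffer : IndexBuffer3D = context3D.createIndexBuffer(1);
indexBuffer.uploadFromByteArray(indices, 0, 0, 1);
```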

Finally, it also makes it a lot faster to load external assets because it is now possible to memcopy chunks of binary files directly into vertex/index streams. It should also prove to be very, very useful for a few exclusive – and quite honestly truly incredible – features we will add in the next few months.

What does it change for you?

If you’ve never played around with the vertex/index streams’ raw data, it should not change a single thing in your code. For example, iterators such as VertexIterator and TriangleIterator will keep working just the way they did. A good example of this is the TerrainExample, which runs just fine without a single change.

If you are relying on VertexStream.lock() or IndexStream.lock(), you will find that those methods now return a ByteArray instead of a Vector. You should update your code accordingly. If you want to see a good example of ByteArray manipulation for streams, you can read the code of the Geometry.fillNormalsData() and Geometry.fillTangentsData() methods.
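
As a hedged sketch of the kind of update required (the 3-floats-per-vertex layout and the unlock() call are assumptions on my side):

```actionscript
import flash.utils.ByteArray;

// Before: lock() returned a Vector.<Number>; now it returns a ByteArray.
var data : ByteArray = vertexStream.lock();

data.position = 0;
var x : Number = data.readFloat(); // 32-bit floats must be read explicitly
var y : Number = data.readFloat();
var z : Number = data.readFloat();

vertexStream.unlock(); // assumed counterpart of lock()
```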

What’s next?

This and some recent additions should make it much easier to keep stream data in RAM without wasting too much memory, and to restore it on device context loss. It’s not implemented yet but it’s a good first step down this complicated memory-management path.

Another possible feature would be to store stream data in compressed ByteArrays. As LZMA compression is now available, it could save a lot of memory. The only price to pay would be having to uncompress the data before being able to read/write it.
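
That would boil down to the standard ByteArray API (LZMA support requires Flash Player 11.4+):

```actionscript
import flash.utils.ByteArray;
import flash.utils.CompressionAlgorithm;

var streamData : ByteArray = new ByteArray(); // holds the packed stream bytes

// Keep the stream data compressed in RAM...
streamData.compress(CompressionAlgorithm.LZMA);

// ...and pay the decompression cost only when reading/writing is needed.
streamData.uncompress(CompressionAlgorithm.LZMA);
```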