Compared to before, there is a huge reduction in size - from 396 bytes down to 288. Some of this came from discovering my own incompetence (there was an extra, unused matrix in one of the objects that composed the Entity3D) and some came from actual design changes. I suppose this is a good advertisement for checking the resulting layout of your objects - you might find extra matrices that don't need to be there!
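A minimal sketch of what such a layout check could look like. The `Vector3f`, `Matrix4f`, and `Transform3D` definitions here are hypothetical stand-ins with an invented member list, not the engine's real types; the point is only the technique of printing `sizeof` and `offsetof` and pinning the result with a `static_assert`.

```cpp
#include <cstddef>  // offsetof
#include <cstdio>   // std::printf

// Hypothetical stand-ins for the engine's math types.
struct Vector3f { float x, y, z; };
struct Matrix4f { float m[16]; };

// Hypothetical member list, for illustration only.
struct Transform3D {
    Vector3f position;
    Vector3f scale;
    Matrix4f rotation;
    Matrix4f world;
};

// Fail the build if the object is bigger than the sum of its members,
// which would indicate padding or a member that was added by accident.
static_assert(sizeof(Transform3D) == 2 * sizeof(Vector3f) + 2 * sizeof(Matrix4f),
              "Transform3D contains padding or an unexpected member");

// Print the total size and the offset of every member for inspection.
void printLayout() {
    std::printf("sizeof(Transform3D) = %zu\n", sizeof(Transform3D));
    std::printf("  position at %zu\n", offsetof(Transform3D, position));
    std::printf("  scale    at %zu\n", offsetof(Transform3D, scale));
    std::printf("  rotation at %zu\n", offsetof(Transform3D, rotation));
    std::printf("  world    at %zu\n", offsetof(Transform3D, world));
}
```

A stray extra matrix would show up here as 64 bytes that no listed member accounts for, and the `static_assert` would stop the build.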
The design changes amount to a general refactoring of the objects' contents into separate classes. The scale, rotation, and position (and their matrices) have all been moved into a Transform3D object, and the rendering-related objects are now part of the Renderable class. The respective member functions have been moved accordingly. This type of refactoring helps to consolidate related content, and it also makes it easier to reuse the same functionality in another class by simply including those new classes.
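Roughly, the composition could be sketched like this. Only the class names Transform3D, Renderable, Entity3D, and Node3D come from the text above; every member and function shown (`translate`, `visible`, `materialId`, `attachLeaf`, and so on) is an assumption made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical math type.
struct Vector3f { float x, y, z; };

// Spatial state lives in one reusable class.
struct Transform3D {
    Vector3f position{0.0f, 0.0f, 0.0f};
    Vector3f scale{1.0f, 1.0f, 1.0f};

    void translate(float dx, float dy, float dz) {
        position.x += dx;
        position.y += dy;
        position.z += dz;
    }
};

// Rendering state lives in another (members are invented placeholders).
struct Renderable {
    bool visible = true;
    int materialId = -1;
};

// An entity is composed of both pieces.
struct Entity3D {
    Transform3D transform;
    Renderable visual;
};

// A node reuses Transform3D by including it, with no inheritance involved.
struct Node3D {
    Transform3D transform;
    std::vector<Entity3D*> leaves;   // non-owning links
    std::vector<Node3D*> children;   // non-owning links

    void attachLeaf(Entity3D* leaf) {
        assert(leaf != nullptr);
        leaves.push_back(leaf);
    }
};
```

Because both classes hold a `Transform3D` by value, any code that works on a transform can be pointed at either one.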
That leads to the other big change that I have made. Previously the Node3D class was a sub-class of Entity3D, which meant both classes had virtual functions (and hence a vtable pointer in every instance). This is less than optimal: it isn't cache friendly, it makes every single Entity3D and Node3D bigger than it really needs to be (by one pointer), and it doesn't really buy you very much in either functionality or savings. So I decided to split the inheritance hierarchy and make Entity3D and Node3D their own standalone classes.
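The "one pointer" cost is easy to demonstrate with two toy types. `PlainEntity` and `VirtualEntity` below are hypothetical names, and the size check reflects how common ABIs (Itanium, MSVC) store a vtable pointer in each object rather than anything the C++ standard guarantees.

```cpp
#include <type_traits>

// No virtual functions: the object is just its data.
struct PlainEntity {
    float position[3];
};

// Same data, but with virtual functions the compiler adds a hidden
// vtable pointer to every instance.
struct VirtualEntity {
    virtual ~VirtualEntity() = default;
    virtual void update(float) {}
    float position[3];
};

static_assert(!std::is_polymorphic<PlainEntity>::value,
              "PlainEntity should carry no vtable pointer");
static_assert(std::is_polymorphic<VirtualEntity>::value,
              "VirtualEntity has virtual functions");

// ABI-dependent assumption: the polymorphic version is larger by at
// least one pointer (often more once alignment padding is added).
static_assert(sizeof(VirtualEntity) >= sizeof(PlainEntity) + sizeof(void*),
              "expected at least one pointer of overhead");
```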
The refactored objects I described above made it pretty easy to build a Node3D without inheriting from Entity3D. The only real hiccup was that the controller system had to be made template-based, but that wasn't really an issue. Overall it was a pretty easy transition, and now the Node3D has a better-defined purpose - to provide the links between objects in the scene graph. I'm pretty happy with the change so far...
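One way a template-based controller system could look is sketched below. This is a guess at the general shape, not the actual implementation: `IController`, `TranslateController`, `ControllerPack`, and the `transform.position` member are all hypothetical names. Once the two classes no longer share a base, a controller can be parameterized on the type it drives instead.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal targets; both expose the same 'transform' member.
struct Transform3D { float position[3] = {0.0f, 0.0f, 0.0f}; };
struct Entity3D { Transform3D transform; };
struct Node3D { Transform3D transform; };

// Controller interface parameterized on the type it operates on.
template <typename T>
class IController {
public:
    virtual ~IController() = default;
    virtual void update(T& target, float dt) = 0;
};

// Example controller: moves the target along x at a fixed speed.
template <typename T>
class TranslateController : public IController<T> {
public:
    explicit TranslateController(float speed) : speed_(speed) {}
    void update(T& target, float dt) override {
        target.transform.position[0] += speed_ * dt;
    }
private:
    float speed_;
};

// Owns the controllers attached to one kind of object.
template <typename T>
class ControllerPack {
public:
    void attach(std::unique_ptr<IController<T>> controller) {
        assert(controller != nullptr);
        controllers_.push_back(std::move(controller));
    }
    void update(T& target, float dt) {
        for (auto& controller : controllers_) {
            controller->update(target, dt);
        }
    }
private:
    std::vector<std::unique_ptr<IController<T>>> controllers_;
};
```

In this sketch the controllers still use virtual dispatch internally; what changes is that Entity3D and Node3D themselves stay free of virtual functions, and `TranslateController<Node3D>` and `TranslateController<Entity3D>` are generated from the same source.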
In the end, there are some objects which were simply removed from the scene graph objects. These are primarily the bounding spheres, which will be relocated into the composite shapes object. That work is still under way, so I'm sure I'll write more about it once it is ready!
The composite shapes object should, ideally, be referenced through a pointer or index into a separate array entirely. Each such object should hold a handle back to the object that owns it, which lets you iterate over your culling data in a cache-friendly manner without carrying along a lot of other baggage that you don't care about.
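As a rough illustration of that idea, the bounding data could sit in its own contiguous array, with the entity storing only an index and each shape storing a handle back to its owner. All of the names here (`CullingData`, `BoundingSphere`, `EntityHandle`, `ShapeIndex`, `visibleOwners`) are hypothetical, and a single sphere per entity tested against one plane stands in for whatever the composite shape and the real culling test turn out to be.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using EntityHandle = std::uint32_t;  // hypothetical handle type
using ShapeIndex   = std::uint32_t;  // index into the culling array

// Everything the culling pass needs, and nothing else.
struct BoundingSphere {
    float center[3];
    float radius;
    EntityHandle owner;  // handle back to the owning entity
};

// The entity only keeps an index into the separate array.
struct Entity3D {
    ShapeIndex shape;
    // transform, renderable, ... omitted
};

class CullingData {
public:
    ShapeIndex add(const BoundingSphere& sphere) {
        spheres_.push_back(sphere);
        return static_cast<ShapeIndex>(spheres_.size() - 1);
    }

    const BoundingSphere& get(ShapeIndex index) const {
        assert(index < spheres_.size());
        return spheres_[index];
    }

    // Returns the owners of all spheres that are at least partly on the
    // positive side of the plane dot(normal, p) + d = 0.
    // Assumes 'normal' has unit length.
    std::vector<EntityHandle> visibleOwners(const float normal[3], float d) const {
        std::vector<EntityHandle> result;
        for (const BoundingSphere& s : spheres_) {
            const float distance = normal[0] * s.center[0]
                                 + normal[1] * s.center[1]
                                 + normal[2] * s.center[2] + d;
            if (distance >= -s.radius) {
                result.push_back(s.owner);
            }
        }
        return result;
    }

private:
    std::vector<BoundingSphere> spheres_;  // contiguous, tightly packed
};
```

The loop in `visibleOwners` touches only 20-byte records laid out back to back, so none of the transform or rendering state gets pulled through the cache during culling.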
Other things might end up getting moved around if you refactor with that in mind. The Transform3D object (or at least the position and rotation used for calculating bounding boxes) will likely need to be included in the composite shape object for culling purposes.