Modelling Models: Enter the Real World
In a previous post, we talked about how we can use low-cost physical models to generate digital models of their real-world counterparts. This is particularly beneficial when it comes to hard-to-access objects (such as tanks), but can we rely on the outputs?
Fortunately, we were able to attend Wintermute 3, the largest UK defence AI trial ever conducted across land, sea and air. There we gathered close-up imagery of a range of interesting objects, including a T72.
A T72 in its natural habitat, on the docks.
Walking around the tank with a camera, we can extract frames and create a NeRF in the same way as we did before:
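For the curious, a rough sketch of the frame-extraction step is below. The actual toolchain and sampling rate aren't detailed in this post, so treat the OpenCV snippet and its numbers as purely illustrative; the saved frames would then go through camera pose estimation and NeRF training in the same pipeline as before.

```python
# Minimal sketch: sample frames from a walkaround video for NeRF training.
# The frame interval and file layout are placeholders, not the settings
# actually used for the T72 capture.
import cv2
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, every_n: int = 10) -> int:
    """Save every n-th frame of the video as a PNG and return the count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = 0
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(str(out / f"frame_{saved:05d}.png"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# Example: roughly 2-3 frames per second from a 25 fps walkaround clip.
# extract_frames("t72_walkaround.mp4", "frames/", every_n=10)
```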
We happen to have a NeRF of the T72 physical model already made for comparison:
We can then compare real-life views from the walkaround to our model. There are obvious differences in colour, barrel position (I used an old scan), and “accessories” such as the fuel tanks, but we can recreate the general look of the object well. Note that the NeRF from the real model has a bit of blue-sky ghosting around the features; this could be cleaned up in post-processing.
Left: Still from the base video. Centre: NeRF view from the base video. Right: NeRF view from the model.
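As an aside, one illustrative (and certainly not the only) way to knock back that ghosting in post would be a simple colour threshold over the rendered frames; the HSV bounds below are placeholders that would need tuning per scene, and another common option is simply tightening the scene bounds before rendering.

```python
# Illustrative post-processing sketch (not our actual pipeline): suppress
# blue-sky ghosting in a rendered view by masking blue-ish pixels.
import cv2
import numpy as np

def mask_sky_ghosting(render_bgr: np.ndarray) -> np.ndarray:
    """Replace blue-ish 'sky' pixels in a rendered view with plain white."""
    hsv = cv2.cvtColor(render_bgr, cv2.COLOR_BGR2HSV)
    # Blue hues in OpenCV's 0-179 hue range; thresholds are placeholders.
    sky = cv2.inRange(hsv, (90, 40, 120), (130, 255, 255))
    cleaned = render_bgr.copy()
    cleaned[sky > 0] = (255, 255, 255)
    return cleaned
```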
We can also compare both of our models against an image taken from the same camera, but from a different perspective to the video. Again, we can recreate the view well (well, good enough given my poor control in a 3D environment).
Left: Still from a different video from the same camera. Centre: NeRF view from the base video. Right: NeRF view from the model.
We can also recreate entirely different viewpoints, such as from a UAV:
Left: Still from an aerial video. Centre: NeRF view from the base video. Right: NeRF view from the model.
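If you wanted to put rough numbers on these side-by-side comparisons, something like the sketch below would do it, assuming you have a real still and a render from approximately the same viewpoint (the file names are placeholders, and scikit-image supplies the metrics). These scores only mean much for well-aligned pairs, so treat them as a sanity check rather than a benchmark.

```python
# Sketch: compare a real still against a NeRF render of roughly the same view.
import cv2
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def compare_views(real_path: str, render_path: str) -> tuple[float, float]:
    real = cv2.imread(real_path)
    render = cv2.imread(render_path)
    # Resize the render to match the still so the metrics are defined.
    render = cv2.resize(render, (real.shape[1], real.shape[0]))
    real_g = cv2.cvtColor(real, cv2.COLOR_BGR2GRAY)
    render_g = cv2.cvtColor(render, cv2.COLOR_BGR2GRAY)
    ssim = structural_similarity(real_g, render_g)
    psnr = peak_signal_noise_ratio(real, render)
    return ssim, psnr

# ssim, psnr = compare_views("still_aerial.png", "nerf_render_aerial.png")
```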
Neither is a perfect recreation if you look closely, but both give a reasonable facsimile of the real thing. Generating a 3D model from real data gives us a good baseline, but we’re limited to the configuration the object was captured in. The physical model can be adjusted and rescanned, giving more flexibility.
At this point you might ask ‘why not use 3D computer models instead?’, which is a fair question. In our experience, computer models still leave that sim-to-real gap where they’re just a little too perfect, and the artefacts introduced by the sensor system are absent. NeRFs (and other approaches) present an interesting halfway house for novel view synthesis - they use real data, warts and all, but are much more flexible than simple image augmentation approaches like CutMix.
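For reference, CutMix boils down to pasting a random patch from one training image into another and mixing the labels by patch area; a minimal sketch following the standard formulation (not any particular library's API) is below. The contrast with NeRFs is that CutMix can only recombine pixels you already captured, whereas novel view synthesis can render viewpoints you never filmed.

```python
# Minimal CutMix sketch: cut a random patch from img_b into img_a and mix
# the labels in proportion to the pasted area. Parameter names are ours.
import numpy as np

def cutmix(img_a, img_b, label_a, label_b, alpha: float = 1.0, rng=None):
    """Return a mixed image and a label blended by patch area."""
    rng = rng or np.random.default_rng()
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)                       # initial mixing ratio
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(0, h), rng.integers(0, w)    # patch centre
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # Recompute lambda from the area actually pasted.
    lam = 1 - ((y2 - y1) * (x2 - x1)) / (h * w)
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed, mixed_label
```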
The next question is, does that actually help when training vision models? Let’s find out.