DreamGaussian: Turn Text/Images Into 3D Objects!
– Speed and Efficiency: Rapid conversion from text or images to usable 3D models.
– Quality in Simplicity: Best results when handling everyday, uncomplicated objects.
– Difficulty with Complexity: Weaker performance on intricate or unusual structures.
– Practical Output: Generates mesh formats ideal for immediate application in various digital projects.
Since the launch of image-generation models, people have been creating incredible images, from intricate cityscapes to whimsical fantasy terrains. However, one domain where machine learning has yet to make its indelible mark is the realm of 3D models. That seems about to change with the release of one of the most advanced image-to-3D and text-to-3D models yet. DreamGaussian is a new 3D content generation framework born out of a collaboration between Peking University, Nanyang Technological University and Baidu Inc., and it aims to help 3D modellers, animators and the diverse array of professionals who need efficient, high-quality 3D models.
In the past, image-to-3D technologies faced a major trade-off: they either sacrificed quality for speed or offered high-fidelity models at the cost of extensive generation times. DreamGaussian hits the sweet spot. The framework generates high-quality 3D models in a short span of time, averaging around two minutes per model from an image or a text prompt, which is about ten times faster than earlier approaches that relied on Neural Radiance Fields (NeRF).
DreamGaussian Technical Details
If you have been following the world of 3D content generation, NeRF will be a familiar name. Neural Radiance Fields have been a promising method for creating 3D content. However, they come with a significant drawback: the extensive computational power and time required for rendering. NeRF operates by densely sampling points within a scene and using deep neural networks to predict the colour and opacity at each point, a process that, while effective, is notoriously demanding on resources.
DreamGaussian employs what’s known as Generative 3D Gaussian Splatting. This method simplifies the representation of 3D points by using 3D Gaussian functions, reducing the number of parameters the model needs to learn. This not only makes the model more efficient but also significantly speeds up the generation process.
To put it in simpler terms, ‘splatting’ is like throwing paint randomly on a canvas, but each splat (or dot of paint) is shaped like a bell curve. In 3D modelling, Gaussian splatting is used to create points in a 3D space that collectively form an object. It’s a bit like 3D pointillism, where you’re using dots to create an image, but here the ‘dots’ also have depth and together form 3D objects.
DreamGaussian is a much simpler approach than NeRF-based models such as DreamFusion. For text-to-3D, DreamGaussian takes approximately five minutes versus roughly one hour for DreamFusion, according to the research paper. That simplicity is one of the reasons DreamGaussian is so much faster. If, like me, you are interested in the technical details, I highly suggest checking out the research paper, ‘DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation’.
Text & Images To 3D Models
From my initial testing of the DreamGaussian framework, it does a great job of creating 3D models of a certain kind of subject. I gave it an image of a skyscraper that I had created using DALL-E 3 (though the building was an odd shape), and it wasn’t able to generate a good 3D model from it. Then I tried an image of the Burj Khalifa, the tallest building in the world, and again it struggled. What it handles well are pictures of everyday objects, especially simple ones. I’m assuming this comes down to the training data.
While the DreamGaussian framework produces competent results, particularly with its convenient mesh output ready for integration into game engines or modelling programs, it excels primarily with simpler objects. If you want to run the model yourself, head over to the GitHub page, or if you just want to test it out, try the Hugging Face Spaces demo.
Turning a text prompt or a picture into a lifelike 3D model has long been a dream for animators and game artists, since creating 3D objects by hand takes enormous time and resources. With the power of machine learning, it seems things are about to take a leap forward.
Interested in Learning More?
Check out our comprehensive courses to take your knowledge to the next level!
Browse Courses