*This article was written by Timothy B. Lee.*


New research from Google's UK-based DeepMind subsidiary demonstrates that deep neural networks have a remarkable capacity to understand a scene, represent it in a compact format, and then "imagine" what the same scene would look like from a perspective the network hasn't seen before.

Human beings are good at this. If shown a picture of a table with only the front three legs visible, most people know intuitively that the table probably has a fourth leg on the opposite side and that the wall behind the table is probably the same color as the parts they can see. With practice, we can learn to sketch the scene from another angle, taking into account perspective, shadow, and other visual effects.

A DeepMind team led by Ali Eslami and Danilo Rezende has developed software based on deep neural networks with these same capabilities—at least for simplified geometric scenes. Given a handful of "snapshots" of a virtual scene, the software—known as a generative query network (GQN)—uses a neural network to build a compact mathematical representation of that scene. It then uses that representation to render images of the scene from new perspectives—perspectives the network hasn't seen before.

The researchers didn't hard-code into the GQN any prior knowledge about the kinds of environments it would be rendering. Human beings are aided by years of experience looking at real-world objects. The DeepMind network develops a similar intuition of its own simply by examining a large number of images from similar scenes.

"One of the most surprising results [was] when we saw it could do things like perspective and occlusion and lighting and shadows," Eslami told us in a Wednesday phone interview. "We know how to write renderers and graphics engines," he said. What's remarkable about DeepMind's software, however, is that the programmers didn't try to hard-code these laws of physics into the software. Instead, Eslami said, the software started with a blank slate that was able to "effectively discover these rules by looking at images."

It's the latest demonstration of the incredible versatility of deep neural networks. We already know how to use deep learning to classify images, win at Go, and even play Atari 2600 games. Now we know these networks also have a remarkable capacity for reasoning about three-dimensional spaces.

**How DeepMind’s generative query network works**

DeepMind published a simple schematic that helps provide an intuition about how the GQN is put together.

Under the hood, the GQN is really two different deep neural networks connected together. On the left, the representation network takes in a collection of images representing a scene (together with data about the camera location for each image) and condenses these images down to a compact mathematical representation (essentially a vector of numbers) of the scene as a whole.

Then it's the job of the generation network to reverse this process: starting with the vector representing the scene, accepting a camera location as input, and generating an image of how the scene would look from that angle.

Obviously, if the generation network is given a camera location corresponding to one of the input images, it should be able to reproduce the original input image. But the network can also be given other camera positions—positions for which it has never seen a corresponding image. The GQN is able to produce images from these positions that closely match the "real" images that would be captured there.
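The two-network pipeline described above can be sketched as a toy in plain NumPy. Everything here is a stand-in assumption—random weights, flattened images, and made-up dimensions replace the trained convolutional encoder and recurrent decoder DeepMind actually used—but the data flow is the same: encode each (image, pose) pair, sum the encodings into one scene vector, then decode that vector plus a query pose into a new view.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions; the real GQN works on full images
# and 7-dimensional camera poses (position, yaw, pitch).
IMG_DIM, POSE_DIM, REPR_DIM = 64, 7, 16

# Randomly initialized weights stand in for trained parameters.
W_enc = rng.standard_normal((IMG_DIM + POSE_DIM, REPR_DIM)) * 0.1
W_dec = rng.standard_normal((REPR_DIM + POSE_DIM, IMG_DIM)) * 0.1

def represent(images, poses):
    """Representation network: encode each (image, pose) pair,
    then sum the encodings into a single compact scene vector."""
    encodings = np.tanh(np.concatenate([images, poses], axis=1) @ W_enc)
    return encodings.sum(axis=0)

def generate(scene_repr, query_pose):
    """Generation network: render a (flattened) image for a
    previously unseen camera pose from the scene vector alone."""
    return np.tanh(np.concatenate([scene_repr, query_pose]) @ W_dec)

# Three observed snapshots of a scene, each with a camera pose.
images = rng.standard_normal((3, IMG_DIM))
poses = rng.standard_normal((3, POSE_DIM))

r = represent(images, poses)                      # compact scene vector
novel_view = generate(r, rng.standard_normal(POSE_DIM))

print(r.shape, novel_view.shape)                  # (16,) (64,)
```

One design point the toy does capture: because the per-snapshot encodings are summed, the scene representation is the same no matter what order the snapshots arrive in, and extra snapshots simply refine the same fixed-size vector.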

