We asked our Head of Tommy Labs, Farhad Agzamov, about the future of augmented reality and how advances in technology might give rise to greater opportunities.
With augmented reality (AR) expected to reach 3.5 billion users by 2022, the future looks bright when it comes to adoption by both users and the brands that seek to connect with them.
But what kind of innovations can we look forward to when it comes to expanding the possibilities of augmented reality? What could augmented reality look like in 2022?
I’ll seek to answer these questions by looking at technologies emerging over the next few years. This research will provide a glimpse of AR’s future, specifically innovations in occlusion and depth estimation, the use of artificial intelligence, and the impact of 5G.
Occlusion and Depth Estimation
Current AR implementations are good at detecting flat surfaces, facial features, and other specific feature sets like pets and landmarks. However, the key thing missing is the ability to discern the depth of a scene and how a virtual object should scale depending on its distance from the camera. These implementations also lack the ability to account for occlusion, that is, whether a real object sits in front of the AR effect and should hide part of it.
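To make the two missing abilities concrete, here is a minimal sketch in Python. It assumes a simple pinhole-camera model for scaling and a per-pixel depth comparison for occlusion. The function names, the string "pixels", and the depth values are all hypothetical and chosen only for illustration; this is not how any particular AR platform implements it.

```python
from typing import List, Optional


def apparent_scale(reference_distance_m: float, distance_m: float) -> float:
    """Scale factor for a virtual object under a pinhole-camera assumption.

    On-screen size is inversely proportional to distance, so an object
    twice as far away as the reference distance is drawn at half size.
    """
    if reference_distance_m <= 0 or distance_m <= 0:
        raise ValueError("distances must be positive")
    return reference_distance_m / distance_m


def composite_with_occlusion(
    camera_row: List[str],
    scene_depth_row_m: List[float],
    effect_row: List[Optional[str]],
    effect_depth_m: float,
) -> List[str]:
    """Blend one row of an AR effect into one row of camera pixels.

    An effect pixel is drawn only where the effect exists (not None) and
    is closer to the camera than the real scene at that pixel. Otherwise
    the real-world pixel is kept, so nearer real objects hide the effect.
    """
    output = []
    for camera_px, scene_depth_m, effect_px in zip(
        camera_row, scene_depth_row_m, effect_row
    ):
        if effect_px is not None and effect_depth_m < scene_depth_m:
            output.append(effect_px)
        else:
            output.append(camera_px)
    return output


# Hypothetical scene: a wall 3 m away and a chair 1 m away.
camera = ["wall", "wall", "chair", "chair"]
scene_depth = [3.0, 3.0, 1.0, 1.0]
# The effect covers the two middle pixels and is placed 2 m away.
effect = [None, "fx", "fx", None]

result = composite_with_occlusion(camera, scene_depth, effect, effect_depth_m=2.0)
print(result)                    # ['wall', 'fx', 'chair', 'chair']
print(apparent_scale(1.0, 2.0))  # 0.5
```

The effect shows in front of the wall but is hidden behind the chair, which is only possible when a depth value is available for every pixel of the scene.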
Occlusion, depth perception, and depth estimation are mainly featured on hardware such as the Microsoft HoloLens. The HoloLens uses an array of ‘environment understanding’ sensors that let the hardware spatially map the user’s environment, allowing for fairly precise depth estimation and occlusion. However, the HoloLens costs $3,000 and is aimed at enterprise applications rather than consumers.
Apple is the first company to implement LIDAR sensors in the refreshed line of consumer-focused iPads. LIDAR works by firing a pulsed laser beam which reflects off surfaces and bounces back, using the time of travel between transmission and response to generate a depth map. The LIDAR sensor can thus provide precise depth measurements within a captured scene.
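The time-of-travel idea can be sketched in a few lines of Python. The pulse travels to the surface and back, so the distance is half the round-trip time multiplied by the speed of light. The function names and the sample timings below are hypothetical, and a real sensor would also have to deal with noise and calibration, which this sketch ignores.

```python
from typing import List

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(round_trip_s: float) -> float:
    """Distance to a surface from a laser pulse's round-trip time.

    The pulse covers the distance twice (out and back), hence the
    division by two.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time cannot be negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def depth_map_m(round_trip_times_s: List[List[float]]) -> List[List[float]]:
    """Convert a grid of round-trip times into a grid of depths in metres."""
    return [[tof_distance_m(t) for t in row] for row in round_trip_times_s]


# Hypothetical 2x2 grid of measured round-trip times, in seconds.
times = [
    [10e-9, 10e-9],
    [20e-9, 30e-9],
]
for row in depth_map_m(times):
    print([round(d, 2) for d in row])
```

A round trip of 20 nanoseconds corresponds to roughly 3 metres, which gives a sense of how finely the sensor has to measure time.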
Hardware implementations such as a LIDAR sensor would help create more realistically placed AR experiences that would account for the presence of other objects. There are rumours that Apple will include a similar sensor in the next iteration of the iPhone.
In 2015, Apple purchased Metaio, whose patents centred on capturing an image to discern spatial information, allowing objects not only to be placed realistically within a scene but also to be replaced in or removed from the real world.
The purchase of Metaio, the LIDAR sensors on the iPad, and LIDAR’s possible inclusion in the next generation of iPhones only intensify the rumours that Apple is working on its own AR headset for release sometime in 2022.
If Apple includes depth sensors on the next iPhone iteration, it could compel Google to do the same, as the market for AR is too big for these giants of the smartphone industry to ignore.
Artificial Intelligence
The use of artificial intelligence (AI) to enhance augmented reality experiences is very exciting, as image analysis, scene recognition, and related areas are active fields of research and development. These advances feed into self-driving car technologies, such as Tesla’s Autopilot program, and the research will also trickle down into consumer-level implementations of AR.
One such development has been Consistent Video Depth Estimation, which was developed in a collaboration between Facebook, the University of Washington, and Virginia Tech. I will let the videos of their research speak for themselves.
Other research has focused on estimating the light within a given scene, which allows AR effects to blend seamlessly with the environmental lighting when they are applied. This technology already exists within platform-level integrations like ARCore and ARKit.
Snapchat has been ahead of the curve in implementing machine-learning models within Lens Studio. This means we can train an AI to understand a scene by learning from predetermined inputs. This is similar to ‘deepfake’ technology, which you may have heard referenced for its ability to convincingly place people in places they’ve never been, for purposes nefarious or humorous, such as placing Nicolas Cage in various film scenes.
This allows Lens Studio creators to build effects such as neural style transfer, where one image’s characteristics are applied to a user’s image.
Machine learning (ML) also allows for precise scene segmentation, where particular parts of the scene are segmented out. A common segmentation these days is removing the background, but now one could replace the entire sky, the ground, and other elements.
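Once a model has labelled each pixel, replacing a segment is a simple masking step. The sketch below, in Python, assumes the label map has already been produced by some segmentation model; here it is hard-coded, and the function name, labels, and string "pixels" are hypothetical placeholders rather than part of any real Lens Studio API.

```python
from typing import List


def replace_segment(
    frame: List[List[str]],
    labels: List[List[str]],
    target_label: str,
    replacement: List[List[str]],
) -> List[List[str]]:
    """Swap in replacement pixels wherever the label map matches target_label.

    All three grids are assumed to have the same dimensions. Pixels with
    any other label are passed through unchanged.
    """
    return [
        [
            new_px if label == target_label else old_px
            for old_px, label, new_px in zip(frame_row, label_row, new_row)
        ]
        for frame_row, label_row, new_row in zip(frame, labels, replacement)
    ]


# Hypothetical 2x3 frame with a per-pixel label map from a segmentation model.
frame = [
    ["grey", "grey", "grey"],
    ["tree", "road", "road"],
]
labels = [
    ["sky", "sky", "sky"],
    ["tree", "ground", "ground"],
]
sunset = [
    ["orange", "pink", "purple"],
    ["orange", "pink", "purple"],
]

print(replace_segment(frame, labels, "sky", sunset))
# [['orange', 'pink', 'purple'], ['tree', 'road', 'road']]
```

The same function could replace the ground by passing "ground" as the target label, which is why a good label map makes these effects largely interchangeable.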
Beyond this, one can see how AI and ML can be trained to recognise specific objects and environmental elements. For example, an AR filter could be programmed to detect if a user is looking at a specific brand of perfume and give the user more product information on that brand.
The biggest innovator in using artificial intelligence within its product is TikTok. AI drives the entire user experience: controlling the curation of videos, powering lens filters, even influencing hashtags and music suggestions. AI is used extensively throughout the entire TikTok product and likely explains the addictive nature of its content.
Artificial intelligence will not only have an impact on AR’s future graphical capabilities but also on how it is consumed, as Facebook and Snapchat will seek to mimic the implementations TikTok has pioneered.
5G
These potential AR innovations wouldn’t matter if the data behind these experiences couldn’t be delivered, and this is where 5G comes in. At its peak, 4G can transfer about 1 gigabit of data per second (Gbps), whereas 5G can theoretically reach 20 Gbps while also reducing latency.
Current consumer AR experiences delivered via Instagram, Snapchat, or TikTok are all limited by the amount of data that can be sent to users wirelessly. An AR filter could include multiple textures, 3D models, sound effects, and so on, all of which is data that would need to be streamed down to the user.
This data restriction shapes implementations: AR filters on Snapchat and Instagram cannot exceed 4MB. This means there is a tight restriction on what kind of assets can be included (images, 3D models, sound, etc.).
5G, through its higher speed and reduced latency, would mean more assets can be used. It would also open the door to streaming AR experiences or streaming holographic video. Imagine experiencing a film trailer and moving around inside it using your device.
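A rough back-of-the-envelope calculation in Python shows what the speed difference could mean for asset budgets. It assumes the peak figures are read as roughly 1 and 20 gigabits per second and that 4MB means 4,000,000 bytes; both are assumptions on my part. Real-world throughput is far below theoretical peaks, so treat the output as a ratio rather than a prediction.

```python
BITS_PER_BYTE = 8

# Assumed theoretical peak rates, in bits per second.
PEAK_4G_BPS = 1e9
PEAK_5G_BPS = 20e9

# Assumed size of a filter at the current 4MB limit, in bytes.
FILTER_LIMIT_BYTES = 4_000_000


def transfer_time_s(size_bytes: float, rate_bps: float) -> float:
    """Seconds needed to move size_bytes at a steady rate of rate_bps."""
    if rate_bps <= 0:
        raise ValueError("rate must be positive")
    return size_bytes * BITS_PER_BYTE / rate_bps


def payload_bytes(time_budget_s: float, rate_bps: float) -> float:
    """Bytes that can be delivered within time_budget_s at rate_bps."""
    return rate_bps * time_budget_s / BITS_PER_BYTE


time_4g = transfer_time_s(FILTER_LIMIT_BYTES, PEAK_4G_BPS)
time_5g = transfer_time_s(FILTER_LIMIT_BYTES, PEAK_5G_BPS)
print(f"4MB filter at 4G peak: {time_4g * 1000:.1f} ms")
print(f"4MB filter at 5G peak: {time_5g * 1000:.1f} ms")

# How large a filter could be if it had to arrive in the same time as today.
bigger_filter = payload_bytes(time_4g, PEAK_5G_BPS)
print(f"Same wait on 5G: {bigger_filter / 1e6:.0f} MB")
```

Under these assumptions the same wait that delivers a 4MB filter today would deliver about twenty times as much data, which is where the room for richer textures, models, and sound comes from.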
The most compelling benefit of improved data transfer is that heavy processing of complex effects and experiences can be offloaded to the cloud, removing the burden from the device. Captured video from the user could be sent to the cloud, where powerful servers integrate an AR effect and stream the complete composite back to the user.
All in all, the arrival of 5G will greatly elevate augmented reality experiences for all users.
Innovations in depth estimation, artificial intelligence, and the adoption of 5G will all mean more compelling and realistic AR experiences in the next few years.
Motivated by heavy investments from tech giants including Apple, Google, Facebook, TikTok, and others, augmented reality will continue to grow.
At the same time, the way we interact with augmented reality will shift, as new devices may have more sensors, more processing power, and closer integration with headsets.
Google Glass was perhaps ahead of its time, but it is still going as a business-focused product. Microsoft already has a second iteration of the HoloLens, and Apple is rumoured to have its own headset in production, competing with Facebook’s Oculus.
This shift is, in my opinion, the greatest advancement for AR, as these experiences are greatly limited by a smartphone screen. When augmented reality can be seamlessly experienced through a compact, stylish pair of glasses, it will move much closer to the awe-inspiring realm of mixed reality.