XRAI - Hyperlocal AI for Spatial/Mobility Applications

simon.micollier@gmail.com

We’re revolutionizing the Metaverse by making AI accessible to virtual reality developers. Our platform empowers anyone, regardless of AI expertise, to seamlessly integrate Virtual Assistants and Recommenders into the Metaverse. Think of us as the HelloFresh for AI in immersive digital reality, delivering automation to VR/AR users. With a focus on high-potential data-driven companies, our 100% API, SaaS solution solves the mobility/spatial deep learning challenge. We envision a future where our technology enables high-accuracy Spatial AI compute, bringing intelligence to Extended Reality and shaping the future of immersive experiences.

Figure: mobility guidance to optimize your path to checkpoints.


Introduction

Problem: The Future is the Metaverse

The future of the Internet lies in the Metaverse, where virtual and augmented reality, powered by artificial intelligence, are reshaping how we interact with digital content. Intelligent Virtual Assistants (IVAs) and Recommender systems are rapidly becoming part of our everyday lives, from vocal assistants like Apple’s Siri to navigating city streets with Google’s Maps Live View. These technologies simplify our daily tasks by providing data and content seamlessly.

However, as the Metaverse emerges, there’s a gap. There haven’t been widespread experiments to integrate IVAs and Recommenders into the Metaverse. Yet, as this digital space evolves, incorporating these technologies is inevitable.

Solution: Making AI Accessible in the Metaverse

Our mission is to bridge this gap. We aim to help Metaverse developers, even those without AI expertise, build AI features such as Virtual Assistants and Recommenders. Our goal is to enable AI to be seamlessly integrated into Metaverse platforms, making it part of users’ daily experiences, whether at home, at work, or during their mobility journeys.

We’ve successfully implemented a specialized artificial intelligence model trained on hyperlocal data points, collected through our Immersive web application.

Think of us as the HelloFresh for AI in immersive digital reality, delivering AI capabilities to Virtual Reality (VR) and Augmented Reality (AR) users. Our focus is to automate digital labor, ensuring users stay in “Fresh Digital Health.”

Context: VR/AR in 2022

In the current landscape, VR devices are making their way to end-users, with companies like Meta/Facebook investing in their widespread adoption. AR devices, such as Smart Glasses, are yet to gain momentum due to display limitations and pricing. However, we anticipate a shift, with Smart Glasses potentially replacing or extending smartphones by 2025.

The way users consume digital content is evolving, emphasizing the need for VR/AR applications. These applications merge the digital and real worlds, providing a unique programming interface for managing audio/display, sensors, and inputs. This paradigm shift enhances user experiences, making them faster, better, and capable of achieving the seemingly impossible.

Mobility/Spatial AI for Paradigm Shift

Our approach includes running Mobility/Spatial AI Deep Learning Models on top of VR/AR applications. This approach is driven by:

  1. Hyperlocal Data: Leveraging data from spatial applications for high accuracy AI.

  2. Niche Expertise: Recognizing the scarcity of skills in combining 3D, AI, and creative elements.

  3. Understanding End-users: Providing contextualized, high-accuracy AI services directly within the user’s experience.

Use Cases Unlocking Services

Our technology opens doors to new services in various sectors:

  • Outdoor Pedestrian Tasks: Enhancing sports performance, gamification, and more.

  • Health Services Tasks: Enabling surgical act control and monitoring, acrophobia detection, and more.

  • Indoor Services Tasks: Facilitating Drone AI Pathfinding, Super Human augmented applications, in-person navigation in shops, and more.

  • Gaming & Social Services Tasks: LLM- and ChatGPT-powered non-player characters, space control and monitoring.

  • Immersive Payment Services: Introducing zero-friction virtual card payments, transaction control, and monitoring.

While these business cases can be implemented, their success will depend on the adoption of VR/AR devices.

Problem: Mobility/Spatial Deep Learning as a Service

Solution: 100% API, SaaS Solutions

We address the challenge of mobility/spatial deep learning as a service with a 100% API solution. Our Software as a Service (SaaS) offerings specialize in solving this problem, focusing on high-potential data-driven companies.

Vision: Enabling High Accuracy Spatial AI Compute

Our vision is to serve as an enabler for high-accuracy Spatial AI compute, bringing AI to Extended Reality (XR) and enabling intelligent meta-worlds. In essence, we strive to make AI an integral part of the immersive experiences in the evolving digital landscape.

Key Features

  • Problem to Solve:

    • Addressing the challenge of Mobility/Spatial Deep Learning as a Service.

  • Significance:

    • VR/AR applications heavily rely on Sensors and Inputs, unlocking substantial hyperlocal data on end-users’ devices—the future of Software and applications.

    • Consumer businesses seek AI-enabled VR/AR content for enhanced user engagement, while corporations aim for performance improvement and manual work automation through AI.

  • Target Users:

    • Consumer Businesses: Enhance Metaverse spaces with StopShop or Entertainment, delivering the value of AI and 3D through AI-enabled Spaces.

    • Corporate Businesses: Optimize Digitized Spaces like Warehouses or Medical facilities with Operators/Employees, leveraging AI and 3D for data-driven Control & Monitoring operations.

  • Implementation Approach:

    • Technical Expertise: Offer a 100% API solution, enabling businesses to seamlessly integrate technical capabilities into their applications.

    • AI Model Specialization: Provide an End-to-End AI Learning pipeline, ensuring performant, scalable, and multi-model (Recommenders, Classifiers, and Generators) solutions for XR datasets, training challenges, and execution.

    • Expertise and Agility: Demonstrate expertise in the field and a commitment to agile development for swift advancements in the flow of innovation.

Room Task Example

Video: the Tensor Core room task, FP16 resolution vs FP32 resolution.

Technical Approach with Product Specialized Modules

Hyper-Grid

Representing Real World & Virtual Spaces as multi-dimensional grids in which your operators/employees and end-users operate. For example, a digital twin parcel of a Hangar block, a Fantasy corner, and a Virtual room in the Metaverse.
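As a rough illustration of the idea (the cell size and function name below are hypothetical, not our production schema), a space can be discretized so that every operator or end-user position maps to a grid cell index:

import math

CELL_SIZE = 0.5  # hypothetical cell edge length, in metres

def to_grid_cell(x, y, z, cell_size=CELL_SIZE):
    """Map a continuous 3D position to a discrete Hyper-Grid cell index."""
    return (math.floor(x / cell_size),
            math.floor(y / cell_size),
            math.floor(z / cell_size))

# Two nearby positions in a digital-twin hangar fall into the same cell:
print(to_grid_cell(1.20, 0.10, 3.40))   # (2, 0, 6)
print(to_grid_cell(1.45, 0.05, 3.30))   # (2, 0, 6)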

XRAI Learnings

Training the model on First-Person user session samples so it can think and act through predictions, recommendations, contextual actions, and choices along the operator/end-user journey.

XRAI Inference

Serving the results of AI computation for different applications, including User situation detection in Mobility, Travel, Health, Security, and Consumer use cases.

XRAI Com Interface

Communicating immersive feedback to your spatial application through the Digital Ears and Digital Eyes of your users.

Model Capabilities

Model Architecture

  • Problem we’re Addressing:

    • Creating a powerful and adaptable AI system to address the challenges of Mobility/Spatial Deep Learning as a service across various business scenarios.

  • Why It’s Essential:

    • Document Signal: Text, Audio, and Image documents have defined structures, like 2D waves for sound and 2D/3D grids for images. However, User Spatial Sessions* documents are unique. In our representation, they consist of sets of Spatial Datapoints with 6DOF* coordinates, as users experience the environment in a First-Person (camera) perspective.

      • Each Datapoint lacks meaning when considered independently, requiring a connection in a Spatial Graph using our Hypergrid geometric data structure. This approach ensures a meaningful document structure for precise model training.

    • Our model specializes in delivering the best results for spatial/mobility computations, showcased in this article, accelerating advancements in this field through our AI specialization.

Definitions for Non-Technical Audience

  • 6DOF: Stands for 6 degrees of freedom, giving users the ability to explore the digital or real world in a first-person view. It means users can move in six different ways—forward/backward, up/down, left/right, pitch, yaw, and roll.

  • User Spatial Session: Imagine it as one activity in the Metaverse, like a unique experience you have. It has a start time, an end time, and a specific identifier, making it a distinct journey or interaction within the virtual space.
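For illustration only (the field names below are hypothetical, not our exact schema), a User Spatial Session can be sketched as a set of 6DOF datapoints with an identifier and a start/end time:

from dataclasses import dataclass, field
from typing import List

@dataclass
class SpatialDatapoint:
    """One 6DOF sample along the user's first-person journey."""
    x: float                    # forward/backward
    y: float                    # up/down
    z: float                    # left/right
    pitch: float
    yaw: float
    roll: float
    timestamp: float
    activity: str = ""          # e.g. "view_product", "add_to_cart"

@dataclass
class UserSpatialSession:
    """One distinct journey in the Metaverse: a start time, an end time, an identifier."""
    session_id: str
    start_time: float
    end_time: float
    datapoints: List[SpatialDatapoint] = field(default_factory=list)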

XRAI Model Overview

XRAI Model is like a highly trained guide for Spatial AI—it uses advanced technologies to understand and navigate the virtual world. Think of it as the brains behind making virtual reality smart and interactive. Here’s a simple breakdown:

Architecture

XRAI employs a Structural Geometric Neural Network (GraphNN) as its core structure. It also integrates other models for processing Image, Text, and Audio signals, enhancing its abilities to provide feedback to users.
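As a minimal, plain-PyTorch sketch of the kind of layer involved (assuming a dense adjacency matrix; the real GraphNN layers differ), one message-passing step averages neighbour features before a learned transform:

import torch
import torch.nn as nn

class SimpleGraphLayer(nn.Module):
    """One message-passing step: aggregate neighbour features, then transform."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_features, adjacency):
        # adjacency: [num_nodes, num_nodes], node_features: [num_nodes, in_dim]
        degree = adjacency.sum(dim=-1, keepdim=True).clamp(min=1)
        aggregated = adjacency @ node_features / degree   # mean over neighbours
        return torch.relu(self.linear(aggregated + node_features))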

Hypergrid

It’s like a helper for the XRAI Model, sorting out data into spatial (related to space) and non-spatial categories.

  • For spatial data (like where you move), it connects points to create a visual map for the specialized Model processor.

  • For non-spatial data (other info), it organizes it into a table-like structure for easy handling.
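For illustration (the field names are assumptions, not our actual schema), the split could look like this: spatial keys go toward the graph builder, everything else lands in a flat table:

import pandas as pd

SPATIAL_KEYS = {"x", "y", "z", "pitch", "yaw", "roll", "timestamp"}   # assumed field names

def split_session(records):
    """Separate raw session records into spatial points and tabular metadata."""
    spatial = [{k: r[k] for k in SPATIAL_KEYS if k in r} for r in records]
    non_spatial = pd.DataFrame(
        [{k: v for k, v in r.items() if k not in SPATIAL_KEYS} for r in records]
    )
    return spatial, non_spatial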

Specialized Techniques

XRAI is a Spatial AI expert, using various techniques (fancy tools like Data Normalization, GAN, Auto-Encoders, and more) to provide top-notch spatial AI computations. It’s like having a high-tech assistant for virtual adventures.

Accelerated Hardware

XRAI loves speed! It supports accelerated hardware and speeds up Mobility/Spatial AI Deep Learning through software specialization. But keep in mind, it’s hungry for data.

Monitoring and Control

XRAI isn’t just smart; it’s also self-aware. It introduces tools and metrics to watch over every step of the training and decision-making process. This ensures it stays sharp and can quickly adapt in case things go off track.
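A toy sketch of such a guardrail (the benchmark value below is made up): compare each epoch’s validation metric against a reference and flag drift as soon as it appears:

BENCHMARK_ACCURACY = 0.80  # hypothetical reference value

def check_training_health(epoch, val_accuracy, tolerance=0.05):
    """Warn as soon as the validation metric drifts too far below the benchmark."""
    if val_accuracy < BENCHMARK_ACCURACY - tolerance:
        print(f"[epoch {epoch}] accuracy {val_accuracy:.2f} is off track, consider a rollback")
        return False
    return True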

Who Benefits?

XRAI is here for all the developers working on spatial/mobility applications. They can focus on making cool stuff while XRAI handles the complex Spatial AI parts—like a helpful sidekick for VR/AR and AI enthusiasts.

How It Works

XRAI understands space and movement (Spatial Data), uses geometric tricks (Spatial geometric operators), and applies its knowledge (XRAI Model) for advanced 3D/6DOF AI Compute capabilities during training and real-world tasks.

Now, you have a glimpse of how XRAI makes the virtual world smarter and more interactive without needing to dive into the technical details! 🚀

Model Architecture

Dataset Exploration

Our mobile web app prototype, developed using Three.js and WebXR, serves as a playground for user testing data. Accessible on WebXR-enabled browsers like Mozilla WebXR Viewer for iOS, it provides a glimpse into the future where headsets like Apple Vision Pro or Meta Quest are mainstream. Although the technical code is a work in progress, the prototype serves its purpose well and can be enhanced further with technologies like TypeScript and react-three-fiber.

Purpose of Development

The aim was to create a mobile application sandbox to collect user testing data, especially given the lack of a suitable public dataset for our Mobility and Spatial AI use case. The prototype resembled a Stop Shop supporting both AR and VR, displaying a 3D Fantasy corner and merchant products in a digital environment.

User Interaction

End-users could engage with the digital environment, seeing calls to action like Display or Add to Cart for 3D products. Although we didn’t implement the Payment API, users could further explore personalized retail products recommended by the system, such as a motorcycle or a jacket.

Development Output

Our development resulted in a spatial interface for end-users, incorporating visual displays, audio sounds, and integration with the XRAI Model for inference.

Dataset Creation

  • Dataset Problem:

    • User Spatial Sessions, a collection of spatial datapoints containing features like End-User Spatial Position coordinates, Digital Eye vector, and End-User Activities.

    • User Sessions include non-spatial data like User Agent, OS, and metadata.

    • Data was dynamically sent by our Mobile App and stored in MongoDB. To build our dataset, we developed a database connector, downloading User Spatial Sessions from around 20 testing users.

  • Choosing Our Dataset:

    • No public dataset was available earlier, necessitating the creation of our own.

    • Capitalizing on our experience, we aim to assist others in bringing their datasets to the platform and challenging our model.

  • Who Benefits:

    • Spatial Application Developers commercializing VR/AR applications.

    • Spatial Hardware Developers for specific industrial cases like Drone, Healthcare, or Sports.

  • Dataset Creation Steps:

    • Filtering non-representative samples.

    • Subsetting to challenge the most complex samples through clusters (e.g., similar user sessions in-shop with a time spent between 10 and 20 seconds, only viewing products without digital interactions).

    • Blacklisting personal privacy features.

    • Normalizing sample features.

This dataset is a glimpse into the future, where advanced headsets will produce a wealth of datapoints, allowing us to refine and enhance our models for even more immersive experiences.
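As a simplified sketch of those steps (the collection, field names, and thresholds below are assumptions, not our production values), sessions are pulled from MongoDB, filtered, stripped of privacy-sensitive fields, and normalized:

import pandas as pd
from pymongo import MongoClient

PRIVACY_FIELDS = ["user_agent", "os", "ip"]           # blacklisted personal-privacy features

client = MongoClient("mongodb://localhost:27017")     # placeholder connection string
sessions = pd.DataFrame(list(client["xrai"]["user_spatial_sessions"].find()))

# Subsetting example from the text: in-shop sessions lasting between 10 and 20 seconds
sessions = sessions[sessions["duration_s"].between(10, 20)]

# Blacklist personal-privacy features
sessions = sessions.drop(columns=PRIVACY_FIELDS, errors="ignore")

# Normalize numeric sample features to the [0, 1] range
numeric = sessions.select_dtypes("number")
sessions[numeric.columns] = (numeric - numeric.min()) / (numeric.max() - numeric.min())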

Dataset developer form:

If you are developing a Mobility/Spatial application, get in touch with us; it is never too soon to establish a contact touchpoint!

Here are some references to help you get started building your own dataset:

Model Training

  • What’s the Problem?

    • Training needs to work across different spaces and adapt to various devices and sensors.

    • Data needs special preparation before training due to different devices and polymorphic data.

    • The 6DOF neural network used for spatial understanding demands a lot of computing power.

  • Why Solve Training?

    • Training needs to be universal, whether it’s a user with a smartphone in one space or someone with smart glasses in another.

    • The process is demanding, especially for the complex 6DOF neural network.

    • We need a robust training system that can handle different scales effortlessly.

  • Who Uses It?

    • Data Scientists and Developers.

How It Works

  • Achieving Seamless Training

    • It’s like Continuous Training for our smart AI system.

    • Continuous Monitoring ensures we stay on track, comparing against set benchmarks.

    • Techniques like space segmentation, normalization, and resolution adjustments maintain consistency across spaces.

    • Fine-tuning allows us to enhance the model for specific dataset clusters or new datasets.

    • Adjusting model hyperparameters is possible through a simple model structure file definition.

    • The model flows smoothly with parallelization, saving, and deployment.

    • Configuration details like batch size (3), optimizer (Adam), learning rate gamma (0.001), and weight decay (0.01) can be customized.

Training the AI model becomes a well-managed, adaptable, and efficient process, making our spatial understanding technology reliable and effective.
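Mapped onto PyTorch, those defaults look roughly like the sketch below; the model is a placeholder, and interpreting “learning rate gamma” as an ExponentialLR decay factor is our assumption:

import torch
from torch.utils.data import DataLoader

model = torch.nn.Linear(16, 4)      # placeholder for the actual XRAI model

# Values quoted above: batch size 3, Adam, gamma 0.001, weight decay 0.01
optimizer = torch.optim.Adam(model.parameters(), weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.001)

# loader = DataLoader(train_dataset, batch_size=3, shuffle=True)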


Figure: an end-user reference space.

Figure: a digital reconstruction.

Figure: an end-user session (and user act).

Figure: end-user session and user act precision-resolution processing.

The resulting data structure is converted into a geometric graph of nodes, and our algorithm handles the complexity of the adjacency graph.

The graph data structure is then ingested directly by our Graph Neural Network model.
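A hedged sketch of that conversion (the neighbourhood radius is arbitrary, not our production setting): each datapoint position becomes a node, nearby positions are linked as edges, and the resulting edge list is what the graph model consumes:

import torch

def build_edge_index(positions, radius=0.5):
    """Connect datapoints that lie within `radius` of each other (adjacency as an edge list)."""
    dist = torch.cdist(positions, positions)              # [N, N] pairwise distances
    src, dst = torch.nonzero(dist < radius, as_tuple=True)
    mask = src != dst                                      # drop self-loops
    return torch.stack([src[mask], dst[mask]])             # [2, num_edges]

positions = torch.rand(32, 3)                              # toy 3D positions of one session
edge_index = build_edge_index(positions)
# node features + edge_index are then fed to the Graph Neural Network model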

API

  • Swag it

API Endpoints
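The concrete endpoints are still being documented; as a sketch of their shape (the route path and payload fields below are hypothetical), the Python/FastAPI backend wraps the model API:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="XRAI API")

class InferenceRequest(BaseModel):
    session_id: str
    datapoints: list            # serialized 6DOF datapoints

@app.post("/v1/inference")      # hypothetical route
def run_inference(request: InferenceRequest):
    # Here the request would be forwarded to ModelAPI.inference(...)
    return {"session_id": request.session_id, "predictions": []}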

Developer Package

Raw Data


from abc import ABC, abstractmethod

import torch
import torchaudio


class ModelAPI(ABC):
    """Common interface implemented by every XRAI model backend."""

    @abstractmethod
    def loadModel(self, model=None):
        """Load the model weights and return (bundle, model, device)."""
        raise NotImplementedError

    @abstractmethod
    def train(self, *inputs):
        """Train or fine-tune the model on the given input files."""
        raise NotImplementedError

    @abstractmethod
    def inference(self, input):
        """Run inference on one input and return the model output."""
        raise NotImplementedError

    # Planned extensions of the interface (not yet implemented):
    # load_checkpoint, eval_step, save_model, test_run,
    # eval_model_performance, train_log, build_report


class VisionModelAPI(ModelAPI):
    def __init__(self):
        super().__init__()

    def loadModel(self, model=None):
        # Lazy imports keep the API importable without the vision dependencies.
        from src.serve import get_bundle, get_model, device
        bundle = get_bundle()
        model = get_model(bundle)
        return bundle, model, device

    def inference(self, input):
        from src.serve import inference
        return inference(input)

    def train(self, views_input_file, engaged_input_file):
        from src.XRPredict import train_model_checkpoint
        return train_model_checkpoint(views_input_file, engaged_input_file)


class AudioModelAPI(ModelAPI):
    def __init__(self):
        super().__init__()

    def loadModel(self, model=None):
        # Wav2Vec2 ASR pipeline from torchaudio, moved to GPU when available.
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
        model = bundle.get_model().to(device)
        return bundle, model, device

    def defineModelSpec(self, spec=None):
        class GreedyCTCDecoder(torch.nn.Module):
            """Greedy CTC decoder: keep the most likely label at each time step."""

            def __init__(self, labels, blank=0):
                super().__init__()
                self.labels = labels
                self.blank = blank

            def forward(self, emission: torch.Tensor) -> str:
                indices = torch.argmax(emission, dim=-1)              # [num_seq,]
                indices = torch.unique_consecutive(indices, dim=-1)   # collapse repeats
                indices = [i for i in indices if i != self.blank]     # drop blanks
                return ''.join([self.labels[i] for i in indices])

        return GreedyCTCDecoder

    def train(self):
        return

    def inference(self, input):
        return

    def vec_matching(self, input_1, input_2=None):
        """Compute the distance between one vectorized input and the persisted index.
        Optionally, compute the distance between input_1 and input_2.
        Return the top-k matches."""
        return
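A possible usage sketch of the audio path (the audio file name is a placeholder, and the resampling step assumes the recording rate differs from the bundle’s):

import torch
import torchaudio

api = AudioModelAPI()
bundle, model, device = api.loadModel()

GreedyCTCDecoder = api.defineModelSpec()
decoder = GreedyCTCDecoder(labels=bundle.get_labels())

waveform, sample_rate = torchaudio.load("sample.wav")        # placeholder audio file
waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)

with torch.inference_mode():
    emission, _ = model(waveform.to(device))

transcript = decoder(emission[0])
print(transcript)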

Results

  • What is the problem to solve?

    • Metrics used: accuracy, confusion matrix, loss, AUC/ROC, precision, recall, and F1.

    • Which metrics were tracked and optimized.

    • Results on other datasets.

  • Why we want to solve it.

  • Who will be using it.

  • How this will be achieved:

    • Using the dataset.

    • Using the monitoring tool.

    • Using a scalable Python/PyTorch FastAPI backend.

| Exp. | Dataset1 Accuracy | Dataset1 F1 | Dataset2 Accuracy | Dataset2 F1 |
|---|---|---|---|---|
| DecisionTree | 🌮 | 🌮 | 🌮 | 🌮 |
| NN | 🌮 | 🌮 | 🌮 | 🌮 |
| Groundtruth | 🌮 | 🌮 | 🌮 | 🌮 |
| Exp1 | 0.81 | 🌮 | 🌮 | 🌮 |
| Exp1 + Finetuning | 0.91 | 🌮 | 🌮 | 🌮 |
| Exp2 | 0.96 | 🌮 | 🌮 | 🌮 |


Due to time and space constraints in the project, we could not expand the experiments to all possible use cases of XRAI. We plan to include those in a future study and to add new capabilities to XRAI that would give more value.

Try out

Ethical Concerns

Tracking can be sensitive under local regulation, but our hyperlocal approach is based on anonymized hyperlocal data, which remains effective at capturing end-user behaviours and profiles.

Conclusion

AI-enabled companies have to migrate their AI implementations to the new VR/AR paradigm shift: new architecture, new products, new compute clusters. It is therefore faster to provide them with quick-to-implement AI solutions.

Companies that are not yet AI-enabled need to adopt AI for their business cases, including direct VR/AR AI Deep Learning solutions.