---
title: XRAI - Hyperlocal AI for Spatial/Mobility Applications
description: We're revolutionizing the Metaverse by making AI accessible to virtual reality developers.
lang: en-US
---

# XRAI - Hyperlocal AI for Spatial/Mobility Applications

We're revolutionizing the Metaverse by making AI accessible to virtual reality developers. Our platform empowers anyone, regardless of AI expertise, to seamlessly integrate Virtual Assistants and Recommenders into the Metaverse. Think of us as the HelloFresh for AI in immersive digital reality, delivering automation to VR/AR users.

With a focus on high-potential data-driven companies, our 100% API, SaaS solution solves the mobility/spatial deep learning challenge. We envision a future where our technology enables high-accuracy Spatial AI compute, bringing intelligence to Extended Reality and shaping the future of immersive experiences.

![Capture](https://i.postimg.cc/nVKNCmsV/DHQ-copie.png)
:alt: mobility guidance to optimize your path to checkpoints

---

- 👉 [Try the XRAI demo](https://www.halogem.co/)
- 👉 Visit the XRAI [project page](https://www.halogem.co/solutions/)
- 👂 Try the WebXR Sound Feedback [mobile web app](https://rd.halogem.co/fb/arplayer) with the 3D Voice Speaker
- 👁 Try the WebXR Vision Feedback [mobile web app](https://rd.halogem.co/fb/arplayer) with the 3D Model Recommender
- 💎 Visit the GEM Documentation [tutorial](https://threejs.org/docs/#api/en/renderers/webxr/WebXRManager)
- 🧠 Try the XRAI demo on [Colab](https://colab.research.google.com/)
## Introduction

**Problem: The Future is the Metaverse**

The future of the Internet lies in the *Metaverse*, where virtual and augmented reality, powered by artificial intelligence, are reshaping how we interact with digital content. *Intelligent Virtual Assistants (IVAs)* and *Recommender systems* are rapidly becoming part of our everyday lives, from voice assistants like Apple's Siri to navigating city streets with Google Maps Live View. These technologies simplify our daily tasks by delivering data and content seamlessly.

However, as the Metaverse emerges, there is a gap: there have been no widespread efforts to integrate IVAs and Recommenders into it. Yet, as this digital space evolves, incorporating these technologies is inevitable.

**Solution: Making AI Accessible in the Metaverse**

Our mission is to bridge this gap. We help Metaverse developers, even those without AI expertise, create AI, including *Virtual Assistants and Recommenders*. Our goal is to make AI a seamless part of Metaverse platforms and of users' daily experiences, whether at home, at work, or on the move.

We have implemented a specialized artificial intelligence model trained on hyperlocal data points collected through our *Immersive web application*. Think of us as the *HelloFresh for AI* in immersive digital reality, delivering AI capabilities to *Virtual Reality (VR)* and *Augmented Reality (AR)* users. Our focus is automating digital labor, keeping users in *"Fresh Digital Health."*

**Context: VR/AR in 2022**

In the current landscape, VR devices are reaching end-users, with companies like *Meta/Facebook* investing in their widespread adoption. AR devices such as *Smart Glasses* have yet to gain momentum because of display limitations and pricing. However, we anticipate a shift, with Smart Glasses potentially replacing or extending smartphones by 2025.

The way users consume digital content is evolving, and VR/AR applications are central to that shift. These applications merge the digital and real worlds, providing a unique programming interface for managing audio/display, sensors, and inputs. This paradigm shift makes user experiences faster, richer, and capable of things that were previously impossible.

**Mobility/Spatial AI for the Paradigm Shift**

Our approach runs *Mobility/Spatial AI Deep Learning Models* on top of VR/AR applications. It is driven by:

1. **Hyperlocal Data:** Leveraging data from spatial applications for high-accuracy AI.
2. **Niche Expertise:** Recognizing the scarcity of skills that combine *3D, AI, and creative elements*.
3. **Understanding End-users:** Providing contextualized, high-accuracy AI services directly within the user's experience.

**Use Cases Unlocking Services**

Our technology opens doors to new services in various sectors:

- **Outdoor Pedestrian Tasks:** Enhancing sports performance, gamification, and more.
- **Health Services Tasks:** Enabling control and monitoring of surgical acts, acrophobia detection, and more.
- **Indoor Services Tasks:** Facilitating drone AI pathfinding, super-human augmented applications, in-store navigation, and more.
- **Gaming & Social Services Tasks:** LLMs and ChatGPT for non-player characters, space control and monitoring.
- **Immersive Payment Services:** Introducing zero-friction virtual card payments, transaction control, and monitoring.
While these business cases can be implemented, their success will depend on the adoption of VR/AR devices.

**Problem: Mobility/Spatial Deep Learning as a Service**

**Solution: 100% API, SaaS Solutions**

We address the challenge of *mobility/spatial deep learning as a service* with a 100% API solution. Our *Software as a Service (SaaS)* offerings specialize in solving this problem, focusing on high-potential data-driven companies.

**Vision: Enabling High-Accuracy Spatial AI Compute**

Our vision is to serve as an enabler of high-accuracy Spatial AI compute, bringing AI to *Extended Reality (XR)* and enabling intelligent meta-worlds. In essence, we strive to make AI an integral part of immersive experiences in the evolving digital landscape.

### Key Features

- **Problem to Solve:**
  - Addressing the challenge of *Mobility/Spatial Deep Learning as a Service*.
- **Significance:**
  - VR/AR applications rely heavily on sensors and inputs, unlocking substantial hyperlocal data on end-users' devices, the future of software and applications.
  - Consumer businesses seek AI-enabled VR/AR content for stronger user engagement, while corporations aim for performance improvement and automation of manual work through AI.
- **Target Users:**
  - *Consumer Businesses:* Enhance Metaverse spaces such as a Stop Shop or entertainment venue, delivering the value of AI and 3D through AI-enabled Spaces.
  - *Corporate Businesses:* Optimize digitized spaces such as warehouses or medical facilities with operators/employees, leveraging AI and 3D for data-driven control & monitoring operations.
- **Implementation Approach:**
  - *Technical Expertise:* Offer a 100% API solution, enabling businesses to seamlessly integrate technical capabilities into their applications.
  - *AI Model Specialization:* Provide an end-to-end AI learning pipeline, ensuring performant, scalable, and multi-model (Recommenders, Classifiers, and Generators) solutions for XR datasets, training challenges, and execution.
  - *Expertise and Agility:* Demonstrate expertise in the field and a commitment to agile development for swift advancement in the flow of innovation.

## Room Task Example

VIDEO of the Tensor Core task (FP16 resolution / FP32 resolution)

### Technical Approach with Product Specialized Modules

#### Hyper-Grid

Representing real-world and virtual spaces as multi-dimensional grids in which your operators/employees and end-users operate: for example, a digital-twin parcel of a hangar block, a Fantasy corner, or a virtual room in the Metaverse.

#### XRAI Learnings

Training the model on the given first-person user session samples to *Think* and *Act* through predictions, recommendations, contextual actions, and choices along the operator/end-user journey.

#### XRAI Inference

Serving the results of AI computation for different applications, including user situation detection in Mobility, Travel, Health, Security, and Consumer use cases.

#### XRAI Com Interface

*Communicating* immersive feedback to your spatial application through the Digital Ears and Digital Eyes of your users.

![Model Capabilities](https://lucid.app/publicSegments/view/2a4f2ae5-caea-4a00-8ced-51a654651e9e/image.png)

## Model Architecture

- **Problem We're Addressing:**
  - Creating a powerful and adaptable AI system to address the challenges of Mobility/Spatial Deep Learning as a service across various business scenarios.
- **Why It's Essential:**
  - **Document Signal:** Text, audio, and image documents have well-defined structures, like 1D waveforms for audio and 2D/3D grids for images.
    However, **User Spatial Session** documents are unique. In our representation, they consist of sets of spatial datapoints with **6DOF** coordinates, as users experience the environment from a first-person (camera) perspective.
  - Each datapoint lacks meaning when considered independently, so datapoints must be connected in a Spatial Graph using our Hypergrid geometric data structure. This approach ensures a meaningful document structure for precise model training.
  - Our model specializes in delivering the best results for spatial/mobility computations, showcased in this article, accelerating advancements in this field through our AI specialization.

## Definitions for a Non-Technical Audience

- **6DOF:** Stands for 6 degrees of freedom, giving users the ability to explore the digital or real world in a first-person view. It means users can move in six different ways: forward/backward, up/down, left/right, pitch, yaw, and roll.
- **User Spatial Session:** Think of it as one activity in the Metaverse, like a unique experience you have. It has a start time, an end time, and a specific identifier, making it a distinct journey or interaction within the virtual space.

## XRAI Model Overview

The **XRAI Model** is like a highly trained guide for Spatial AI: it uses advanced technologies to understand and navigate the virtual world. Think of it as the brains behind making virtual reality smart and interactive. Here's a simple breakdown:

### Architecture

XRAI employs a Structural Geometric Neural Network (GraphNN) as its core structure. It also integrates other models for processing image, text, and audio signals, enhancing its ability to provide feedback to users.

### Hypergrid

The Hypergrid is a helper for the XRAI Model, sorting data into spatial (related to space) and non-spatial categories (a minimal sketch of this idea appears after the architecture diagram below).

- For spatial data (like where you move), it connects points to create a visual map for the specialized model processor.
- For non-spatial data (other info), it organizes everything into a table-like structure for easy handling.

### Specialized Techniques

XRAI is a Spatial AI expert, using various techniques (tools like data normalization, GANs, auto-encoders, and more) to provide top-notch spatial AI computations. It's like having a high-tech assistant for virtual adventures.

### Accelerated Hardware

XRAI loves speed! It supports accelerated hardware and speeds up Mobility/Spatial AI deep learning through software specialization. But keep in mind, it's hungry for data.

### Monitoring and Control

XRAI isn't just smart; it's also self-aware. It introduces tools and metrics to watch over every step of the training and decision-making process. This ensures it stays sharp and can quickly adapt if things go off track.

### Who Benefits?

XRAI is here for all the developers working on spatial/mobility applications. They can focus on making cool stuff while XRAI handles the complex Spatial AI parts, like a helpful sidekick for VR/AR and AI enthusiasts.

### How It Works

XRAI understands space and movement (spatial data), uses geometric tricks (spatial geometric operators), and applies its knowledge (the XRAI Model) for advanced 3D/6DOF AI compute capabilities during training and real-world tasks.

Now you have a glimpse of how XRAI makes the virtual world smarter and more interactive, without needing to dive into the technical details! 🚀

![Model Architecture](https://lucid.app/publicSegments/view/db41eb70-db8b-4abf-839e-77dfa8a851a5/image.png)
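To make the Hypergrid idea above a little more concrete, here is a minimal, hypothetical sketch of snapping the 6DOF datapoints of a User Spatial Session onto a grid and connecting occupied cells into a spatial graph. Everything here (the function name, the cell size, the neighbour rule, and the simulated session) is an illustrative assumption, not the actual XRAI implementation.

```python
import numpy as np

def session_to_spatial_graph(datapoints, cell_size=0.5):
    """Toy Hypergrid step: snap 6DOF datapoints to grid cells and connect neighbouring cells.

    datapoints: array of shape (N, 6) -- x, y, z position plus pitch, yaw, roll.
    Returns (nodes, edges): per-cell node features and index pairs of adjacent cells.
    """
    datapoints = np.asarray(datapoints, dtype=float)
    positions = datapoints[:, :3]

    # 1. Quantize positions into grid cells (the "hypergrid" resolution).
    cells = np.floor(positions / cell_size).astype(int)

    # 2. Keep one node per occupied cell, averaging the datapoints that fall inside it.
    unique_cells, inverse = np.unique(cells, axis=0, return_inverse=True)
    nodes = np.stack([datapoints[inverse == i].mean(axis=0)
                      for i in range(len(unique_cells))])

    # 3. Connect cells that are grid neighbours (Chebyshev distance of at most 1).
    edges = []
    for i in range(len(unique_cells)):
        for j in range(i + 1, len(unique_cells)):
            if np.abs(unique_cells[i] - unique_cells[j]).max() <= 1:
                edges.append((i, j))

    return nodes, np.array(edges)

# Example: a short simulated first-person walk through a virtual room.
session = np.random.rand(50, 6) * [4.0, 4.0, 2.0, 360.0, 360.0, 360.0]
nodes, edges = session_to_spatial_graph(session)
print(nodes.shape, edges.shape)  # e.g. (num_cells, 6) and (num_edges, 2)
```

The node/edge arrays produced this way are the kind of graph structure that a graph neural network core such as the GraphNN described above can ingest.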
## Dataset Exploration

Our mobile web app prototype, developed using Three.js and WebXR, serves as a playground for collecting user testing data. Accessible in WebXR-enabled browsers like the [Mozilla WebXR Viewer](https://apps.apple.com/fr/app/webxr-viewer/id1295998056) for iOS, it provides a glimpse into a future where headsets like the Apple Vision Pro or Meta Quest are mainstream. Although the technical code is a work in progress, the prototype serves its purpose well and can be enhanced further with technologies like TypeScript and react-three-fiber.

### Purpose of Development

The aim was to create a mobile application sandbox to collect user testing data, especially given the lack of a suitable public dataset for our Mobility and Spatial AI use case. The prototype resembled a Stop Shop supporting both AR and VR, displaying a 3D Fantasy corner and merchant products in a digital environment.

### User Interaction

End-users could engage with the digital environment, seeing calls to action like Display or Add to Cart for 3D products. Although we didn't implement the Payment API, users could further explore personalized retail products recommended by the system, such as a motorcycle or a jacket.

### Development Output

Our development resulted in a spatial interface for end-users, incorporating visual displays, audio sounds, and integration with the XRAI Model for inference.

### Dataset Creation

- **Dataset Problem:**
  - User Spatial Sessions are collections of spatial datapoints containing features like end-user spatial position coordinates, the Digital Eye vector, and end-user activities.
  - User Sessions also include non-spatial data like User Agent, OS, and metadata.
  - Data was sent dynamically by our mobile app and stored in MongoDB. To build our dataset, we developed a database connector, downloading User Spatial Sessions from around 20 test users.
- **Choosing Our Dataset:**
  - No public dataset was available, so we had to create our own.
  - Capitalizing on this experience, we aim to help others bring their datasets to the platform and challenge our model.
- **Who Benefits:**
  - Spatial application developers commercializing VR/AR applications.
  - Spatial hardware developers for specific industrial cases like drones, healthcare, or sports.
- **Dataset Creation Steps** (see the sketch below):
  - Filtering non-representative samples.
  - Subsetting to challenge the most complex samples through clusters (e.g., similar in-shop user sessions with a time spent between 10 and 20 seconds, only viewing products without digital interactions).
  - Blacklisting personal privacy features.
  - Normalizing sample features.

This dataset is a glimpse into the future, where advanced headsets will produce a wealth of datapoints, allowing us to refine and enhance our models for even more immersive experiences.
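As an illustration of the four dataset creation steps listed above, here is a minimal, hypothetical sketch using pandas. The column names (`num_datapoints`, `duration_s`, `num_interactions`) and the privacy blacklist are assumptions for illustration, not the actual XRAI schema or pipeline.

```python
import pandas as pd

PRIVACY_BLACKLIST = ["user_agent", "ip_address", "device_id"]  # assumed field names

def build_dataset(raw_sessions: pd.DataFrame) -> pd.DataFrame:
    """Toy version of the four dataset-creation steps described above."""
    df = raw_sessions.copy()

    # 1. Filter non-representative samples (e.g. sessions too short to be meaningful).
    df = df[df["num_datapoints"] >= 10]

    # 2. Subset a challenging cluster: in-shop sessions lasting 10-20 s with no interaction.
    cluster = df[df["duration_s"].between(10, 20) & (df["num_interactions"] == 0)]

    # 3. Blacklist personal/privacy features.
    cluster = cluster.drop(columns=[c for c in PRIVACY_BLACKLIST if c in cluster.columns])

    # 4. Normalize numeric sample features to zero mean / unit variance.
    numeric = cluster.select_dtypes("number").columns
    cluster[numeric] = (cluster[numeric] - cluster[numeric].mean()) / cluster[numeric].std()

    return cluster
```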
**Dataset developer form:** If you are developing a Mobility/Spatial application, come hang out with us; it is never too soon to establish a first contact touchpoint!

Here are some references to help you get started on making your own dataset:

- WebXR [Explainer](https://immersive-web.github.io/webxr/explainer.html)
- WebXR for your iOS application with the [Mozilla WebXR Viewer](https://apps.apple.com/fr/app/webxr-viewer/id1295998056), and for Android devices
- WebXR for your [Oculus Quest](https://skarredghost.com/2022/01/05/how-to-oculus-spatial-anchors-unity-2/) with [PWA web applications](https://web.dev/pwas-on-oculus-2/)
- XR for your device OS with [Unity XR](https://docs.unity3d.com/Manual/XR.html), ARKit, ARCore

## Model Training

- **What's the Problem?**
  - Training needs to work across different spaces and adapt to various devices and sensors.
  - Data needs special preparation before training because of heterogeneous devices and polymorphic data.
  - The 6DOF neural network used for spatial understanding demands a lot of computing power.
- **Why Solve Training?**
  - Training needs to be universal, whether it's a user with a smartphone in one space or someone with smart glasses in another.
  - The process is demanding, especially for the complex 6DOF neural network.
  - We need a robust training system that can handle different scales effortlessly.
- **Who Uses It?**
  - Data Scientists and Developers.

### How It Works

- **Achieving Seamless Training**
  - It works like Continuous Training for our smart AI system.
  - Continuous Monitoring ensures we stay on track, comparing against set benchmarks.
  - Techniques like space segmentation, normalization, and resolution adjustments maintain consistency across spaces.
  - Fine-tuning allows us to enhance the model for specific dataset clusters or new datasets.
  - Model hyperparameters can be adjusted through a simple model structure file definition.
  - The model flows smoothly through parallelization, saving, and deployment.
  - Configuration details like batch size (3), optimizer (Adam), learning rate gamma (0.001), and weight decay (0.01) can be customized; a sketch follows this section.

Training the AI model becomes a well-managed, adaptable, and efficient process, making our spatial understanding technology reliable and effective.

---

This is an end-user Reference Space:

![sim](https://i.postimg.cc/qvyqWQbX/Capture-d-e-cran-2022-02-03-a-17-20-05.png)

This is a Digital Reconstruction:

![sim](https://i.postimg.cc/Y9ZjJVHR/Capture-d-e-cran-2022-02-03-a-17-17-47.png)

This is an end-user Session (and User Act):

![sim](https://i.postimg.cc/SxfKtkPy/all-sessions.png)

This is an end-user Session and User Act precision-resolution processing:

![voxels](https://i.postimg.cc/T36w9pC7/all-voxels.png)

The resulting data structure is converted into geometric graph nodes, and our algorithm handles the complexity of the adjacency graph. The graph data structure is then ingested directly by our Graph Neural Network model, as sketched below.
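To make the training configuration above concrete, here is a minimal training-loop sketch assuming PyTorch and PyTorch Geometric. The model, the random session graphs, and the feature dimensions are illustrative placeholders, not the actual XRAI Model, and the "learning rate gamma" from the configuration is read here as a base learning rate of 0.001 (an assumption).

```python
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GCNConv, global_mean_pool


class SessionGNN(torch.nn.Module):
    """Tiny stand-in for the GraphNN core: two graph convolutions + a session-level head."""

    def __init__(self, in_dim=6, hidden=64, num_classes=2):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, num_classes)

    def forward(self, data):
        x = F.relu(self.conv1(data.x, data.edge_index))
        x = F.relu(self.conv2(x, data.edge_index))
        x = global_mean_pool(x, data.batch)  # one embedding per user session graph
        return self.head(x)


def random_session(num_nodes=12):
    """Placeholder session graph: 6DOF features per datapoint plus a binary label."""
    x = torch.randn(num_nodes, 6)
    edge_index = torch.randint(0, num_nodes, (2, 4 * num_nodes))
    y = torch.randint(0, 2, (1,))  # e.g. "engaged" vs "not engaged"
    return Data(x=x, edge_index=edge_index, y=y)


loader = DataLoader([random_session() for _ in range(30)],
                    batch_size=3, shuffle=True)            # batch size 3, as in the config
model = SessionGNN()
optimizer = torch.optim.Adam(model.parameters(),
                             lr=1e-3, weight_decay=0.01)   # Adam, lr 0.001, weight decay 0.01

model.train()
for epoch in range(5):
    total = 0.0
    for batch in loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(batch), batch.y)
        loss.backward()
        optimizer.step()
        total += loss.item()
    print(f"epoch {epoch}: mean loss {total / len(loader):.4f}")
```

In the actual pipeline, the session graphs would come from the Hypergrid processing step described earlier rather than from random placeholders.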
## API - Swag it

| API Endpoints | Developer Package | Raw Data |
|--|--|--|
| ![API endpoints](https://i.postimg.cc/FzY3S39W/Capture-d-e-cran-2022-05-09-a-21-26-35.png) | ![Developer package](https://i.postimg.cc/nry5Yp5M/Capture-d-e-cran-2022-05-18-a-21-27-57.png) | ![Raw data](https://i.postimg.cc/bJbFyKdh/Capture-d-e-cran-2022-05-18-a-21-31-32.png) |

```python
from abc import ABC, abstractmethod

import torch
import torchaudio


class ModelAPI(ABC):
    """Abstract contract shared by every model exposed through the API."""

    @abstractmethod
    def loadModel(self, input):
        """Load the model bundle and return (bundle, model, device)."""
        return

    @abstractmethod
    def train(self, input):
        """Train or fine-tune the model from the given input files."""
        return

    @abstractmethod
    def inference(self, input):
        """Run the model on one input and return its prediction."""
        return

    # Further abstract methods of the contract, kept commented out for now:
    """
    @abstractmethod
    def load_checkpoint(): return

    @abstractmethod
    def eval_step(): return

    @abstractmethod
    def save_model(): return

    @abstractmethod
    def test_run(): return

    @abstractmethod
    def eval_model_performance(): return

    @abstractmethod
    def train_log(): return

    @abstractmethod
    def build_report(): return
    """


class VisionModelAPI(ModelAPI):

    def __init__(self):
        super().__init__()

    def loadModel(self, model=None):
        from src.serve import get_bundle, get_model, device
        bundle = get_bundle()
        model = get_model(bundle)
        return bundle, model, device

    def inference(self, input):
        from src.serve import inference
        out = inference(input)
        return out

    def train(self, views_input_file, engaged_input_file):
        from src.XRPredict import train_model_checkpoint
        model = train_model_checkpoint(views_input_file, engaged_input_file)
        return


class AudioModelAPI(ModelAPI):

    def __init__(self):
        super().__init__()

    def loadModel(self, model=None):
        device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
        model = bundle.get_model().to(device)
        return bundle, model, device

    def defineModelSpec(self, spec):
        class GreedyCTCDecoder(torch.nn.Module):
            def __init__(self, labels, blank=0):
                super().__init__()
                self.labels = labels
                self.blank = blank

            def forward(self, emission: torch.Tensor) -> str:
                indices = torch.argmax(emission, dim=-1)  # [num_seq,]
                indices = torch.unique_consecutive(indices, dim=-1)
                indices = [i for i in indices if i != self.blank]
                return ''.join([self.labels[i] for i in indices])

        return GreedyCTCDecoder

    def train(self):
        return

    def inference(self, input):
        return

    def vec_matching(self, input_1, input_2=None):
        '''Compute the distance between one vectorized input_1 and the persisted index.
        Optional: can compute the distance between vector1 and vector2.
        Returns the top-k matches.'''
```

## Results

- What is the given problem to solve?
- What metrics did we use? (see the sketch after the table below)
  - Accuracy
  - Confusion matrix
  - Loss
  - AUC/ROC
  - Precision
  - Recall
  - F1
  - Tracked/optimized
  - Other dataset
- Why do we want to solve it?
- Who will be using it?
- How will this be achieved?
  - Using the dataset
  - Using the monitoring tool
  - Using a scalable backend: Python/PyTorch, FastAPI

| | Dataset1 | | Dataset2 | |
|--|--|--|--|--|
| Exp. | Accuracy | F1 | Accuracy | F1 |
| DecisionTree | 🌮 | 🌮 | 🌮 | 🌮 |
| NN | 🌮 | 🌮 | 🌮 | 🌮 |
| Groundtruth | 🌮 | 🌮 | 🌮 | 🌮 |
| Exp1 | 0.81 | 🌮 | 🌮 | 🌮 |
| Exp1 + Finetuning | 0.91 | 🌮 | 🌮 | 🌮 |
| Exp2 | 0.96 | 🌮 | 🌮 | 🌮 |

![Results screenshot](https://i.postimg.cc/QdTQB9Vf/Screenshot-from-2022-02-15-15-56-31.png)
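As a small illustration of the metrics listed above, here is a hedged sketch of how Accuracy, Precision, Recall, F1, ROC AUC, and the confusion matrix could be computed for one experiment. It assumes scikit-learn is available alongside the Python/PyTorch backend, and the labels and scores are made-up placeholders, not values from the table.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Placeholder ground truth and predictions for one experiment (e.g. "engaged" vs "not engaged").
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]  # model probabilities for the positive class

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("ROC AUC  :", roc_auc_score(y_true, y_score))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```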
Due to time and space constraints in the project, we could not expand the experiments to all the possible use cases of XRAI. We plan to include those in a future study and add new capabilities to XRAI.

## Try out

## Ethical Concerns

Tracking can be sensitive under local regulation, but our hyperlocal approach, based on anonymous hyperlocal data, is effective and can help capture end-user behaviours and profiles.

## Conclusion

For AI-enabled companies: they have to migrate their AI implementation to the new VR/AR paradigm (new architecture, new product, new compute cluster), so it is faster to provide them with quick-to-implement AI solutions.

For companies that are not yet AI-enabled: they have to enable AI for their business cases, including direct VR/AR AI deep learning solutions.