
Press Release
Long Beach, CA, June 15, 2026. MultiSet AI today announced two releases at AWE USA 2026 that collapse the cost and complexity of building spatial infrastructure: a single 360-to-VPS pipeline that turns one consumer 360 camera capture into both a machine-readable VPS map and a human-readable Gaussian Splat, and VPS Gen2, a new attention-based positioning engine that raises localization performance in exactly the environments where visual positioning historically failed.
Together they change the entry economics of enterprise spatial computing. A 60-second walk through a space with a camera that costs around $500 now produces the two outputs every spatial operation needs: coordinates machines can localize against, and a photorealistic 3D scene people can view, review, and share.
One 360 video capture now produces both a centimeter-accurate Visual Positioning System map and a 3D Gaussian Splat from a single upload. Devices, wearables, and robots get a positioning layer; the people who manage the space get a visual twin. Both outputs are generally available today.

The pipeline joins MultiSet's existing ingest paths rather than replacing them. Teams already bring E57 point clouds, LiDAR scans, Matterport exports, Gaussian Splats, and phone captures into the same platform. No proprietary capture app. No prescribed scanner. The hardware a team already owns is the hardware that works.
The pipeline accepts raw .insv footage from Insta360 X4 and X5 cameras and returns a live, localizable map in five steps.
The math is simple. Roughly 60 seconds of 360 video maps one thousand square feet of space. A warehouse re-scan that once meant scheduling survey-grade rigs costing $50,000 or more now means handing a field tech a camera they can carry in one hand. The accuracy gap still matters for engineering-grade survey work. For AR indoor navigation, asset finding, AR work instructions, robot localization, and training, it no longer does.
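As a rough planning aid, the coverage figure above can be turned into a capture-time estimate. The sketch below is illustrative only: the function names are hypothetical, and it assumes the quoted rate of about 1,000 square feet per 60 seconds holds uniformly, which real sites with shelving, turns, or multiple floors may not match.

```python
import math

# Assumption: the rate quoted in this release, roughly 60 seconds of
# 360 video per 1,000 square feet. Actual walking coverage varies by site.
SECONDS_PER_1000_SQFT = 60


def estimate_capture_seconds(area_sqft: float) -> int:
    """Return a rough capture duration in whole seconds for a given floor area."""
    if area_sqft < 0:
        raise ValueError("area_sqft must be non-negative")
    return math.ceil(area_sqft * SECONDS_PER_1000_SQFT / 1000)


def format_duration(seconds: int) -> str:
    """Format a duration in seconds as minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes} min {secs:02d} s"


if __name__ == "__main__":
    # Hypothetical 50,000 sq ft warehouse.
    print(format_duration(estimate_capture_seconds(50_000)))
```

Under that assumption, a hypothetical 50,000 square foot warehouse works out to about 50 minutes of walking capture.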

VPS Gen2 is a ground-up upgrade to MultiSet's positioning engine, built on an attention-based mechanism that weights the genuinely distinctive features in a scene and down-weights the generic, repeated ones. Repetition is precisely what defeats older systems: long parallel corridors, train platforms, hospital wings, parking levels, and retail aisles look nearly identical from view to view, and positioning engines collapse them onto the same pose.
MultiSet evaluated Gen2 on a benchmark spanning more than 70 datasets and 16,068 test queries, holding every variable fixed except the positioning engine itself. Overall recall climbed from 54.6% to 61.8%. The gains scale with difficulty: the hardest, most repetitive environments saw recall lifts of 15 to 22 points. On the toughest transit scene in the benchmark, recall jumped from 49% to 68% while median position error dropped from 6.0 meters to 1.8 meters.
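For readers unfamiliar with the metrics, the sketch below shows one common way recall and median position error are computed from per-query results. It is an illustration, not MultiSet's published methodology: the success threshold, the function names, and the convention that a query with no returned pose counts as a miss are all assumptions.

```python
from statistics import median
from typing import Optional, Sequence


def recall(errors_m: Sequence[Optional[float]], threshold_m: float) -> float:
    """Fraction of all queries localized within threshold_m meters.

    A value of None means the engine returned no pose for that query;
    it counts as a miss (an assumption made for this sketch).
    """
    if not errors_m:
        raise ValueError("at least one query is required")
    hits = sum(1 for e in errors_m if e is not None and e <= threshold_m)
    return hits / len(errors_m)


def median_error(errors_m: Sequence[Optional[float]]) -> Optional[float]:
    """Median position error in meters over queries that returned a pose."""
    localized = [e for e in errors_m if e is not None]
    return median(localized) if localized else None


if __name__ == "__main__":
    # Made-up errors for four queries; the third returned no pose.
    sample = [0.1, 0.4, None, 2.0]
    print(recall(sample, threshold_m=0.5), median_error(sample))
```

Read this way, the overall change from 54.6% to 61.8% is a lift of 7.2 percentage points, and the transit scene's move from 49% to 68% is a lift of 19 points, inside the 15 to 22 point range cited for the hardest environments.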

[Chart: single-frame query mode. For false-positive rate, lower is better.]
Gen2 also addresses the most dangerous failure mode in positioning: the confident, wrong pose that downstream systems act on. New reject-gating turns most would-be false positives into either correct localizations or honest rejections. In a constrained-indoor stress test, the false-positive rate fell from 73% to 13%.
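The idea behind reject-gating can be illustrated with a simple confidence threshold. This is a minimal sketch and not Gen2's actual mechanism, which the release does not describe: the `Estimate` type, the confidence score, the threshold, and the definition of false-positive rate as the share of all queries that yield an accepted but wrong pose are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Estimate:
    confidence: float  # hypothetical score in [0, 1] reported with a pose
    correct: bool      # ground-truth label, available only during evaluation


def gate(estimates: List[Estimate], min_confidence: float) -> List[Estimate]:
    """Keep only estimates confident enough to pass downstream; reject the rest."""
    return [e for e in estimates if e.confidence >= min_confidence]


def false_positive_rate(estimates: List[Estimate], min_confidence: float) -> float:
    """Share of all queries that produce an accepted but wrong pose (assumed definition)."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    accepted = gate(estimates, min_confidence)
    wrong = sum(1 for e in accepted if not e.correct)
    return wrong / len(estimates)


if __name__ == "__main__":
    # Made-up evaluation set: one correct pose and three wrong ones.
    results = [
        Estimate(0.9, True),
        Estimate(0.8, False),
        Estimate(0.3, False),
        Estimate(0.2, False),
    ]
    print(false_positive_rate(results, min_confidence=0.0))  # no gating
    print(false_positive_rate(results, min_confidence=0.5))  # with gating
```

A plain threshold like this can only convert wrong poses into rejections. The release says Gen2 also converts some of them into correct localizations, which a threshold alone would not do.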
Every figure above was measured in single-frame query mode. MultiSet's multi-query localization API, which fuses a short burst of frames with device SLAM data, stacks further gains on top. The full benchmark methodology is published in the companion engineering post.
In 2025, the AREA's independent enterprise benchmark ranked MultiSet the Most Robust VPS tested, out of the box. Gen2 extends that lead. The platform today spans 16M+ square feet mapped across 2,500+ locations in 120+ countries, with 500K+ device interactions. A Fortune 100 industrial customer running MultiSet in private cloud production reports 2.5× technician productivity on asset finding and 4× faster mean time to repair on critical incidents.
The same binary runs in public cloud, private cloud, on-premise, and fully air-gapped environments, with SDKs across Unity, native iOS and Android, WebXR, Meta Quest, Meta Ray-Ban smart glasses, and ROS 2. Maps created from 360 video gain every platform capability, including Map Versioning, which lets teams re-scan a space while anchors and content carry forward.
The 360-to-VPS pipeline and Gaussian Splat output are generally available today for Insta360 X4 and X5 captures. VPS Gen2 is live for all new maps, across every query API, with no code changes to existing integrations. Developers can start at developer.multiset.ai; setup steps are in the Insta360 setup documentation.
MultiSet is demonstrating the full pipeline live at AWE USA in Long Beach, June 15 to 18.
What is VPS Gen2?
VPS Gen2 is MultiSet's next-generation Visual Positioning System engine. It uses an attention-based mechanism that prioritizes distinctive scene features, raising overall recall from 54.6% to 61.8% across a 70+ dataset benchmark and cutting the false-positive rate from 73% to 13% in a constrained-indoor stress test.
Which 360 cameras does the pipeline support?
Insta360 X4 and X5 are supported at general availability. The pipeline accepts raw .insv files; no proprietary capture app is required.
How much space does one 360 capture map?
Roughly 1,000 square feet per 60 seconds of 360 video, processed into a centimeter-scale VPS map.
Do existing integrations need code changes for VPS Gen2?
No. Gen2 applies to all new maps created on the platform, across every query API, with no code changes to existing integrations.
Is the Gaussian Splat output generally available?
Yes. The Gaussian Splat output ships generally available today alongside the VPS map from the same 360 capture.
MultiSet AI builds the independent, scan-agnostic Visual Positioning System for enterprise AR and robotics: the spatial substrate for Physical AI. Teams bring any scan format, deploy in any environment from cloud to air-gapped, and localize devices at sub-5cm accuracy across every major platform. Learn more at multiset.ai.