The Ultimate Guide to Markerless Object Tracking with ModelSet
- Shadnam Khan
- Aug 9
- 10 min read
Updated: Sep 2
TL;DR
ModelSet turns a textured GLB/GLTF (or CAD converted to GLB) into a robust, markerless object anchor you can ship across Unity, WebXR, iOS, Android, Vision Pro, Quest. (Start with ModelSet: Object Anchoring and How to Create a ModelSet)
Prep matters: author true scale (1 unit = 1 m), set +Y up, and use a high-detail texture. That’s the difference between “rock-solid” and “why is this wobbling?”
For browser apps, you’ll pair our pose with WebXR Anchors (e.g., XRAnchor) to keep content glued as tracking updates.
If you come from “model targets” land: traditional pipelines often rely on guide views and cloud-trained “advanced” targets; ModelSet aims for minutes-to-publish and simpler authoring.
Azure’s Spatial Anchors retired on Nov 20, 2024. If you’re migrating, this post walks you through a modern, multi-platform path.
Why Object Tracking (and Why Now)?
Anchors are the “atomic unit” of believable AR: a pose (position + orientation) the system keeps aligned to the real world as it updates its understanding over time. In WebXR parlance, that’s literally the XRAnchor concept: a maintained pose that survives tracking updates so your 3D content doesn’t slide off reality.
Most AR stacks do great with planes (floors, tables) and hit tests. But what if you need your overlay pinned to a specific wrench head, a calibration port, or a museum artifact label reliably, without stickers or special markers? That’s markerless, model-based object tracking: recognize this physical object by its shape + texture, then provide a stable 6-DoF anchor on it.
There’s also a strategic reason. With Azure Spatial Anchors retired in November 2024, teams are looking for cross-platform, standards-aware alternatives that work across mobile, headsets, and importantly the browser. The market is consolidating on a few sensible primitives (poses, anchors, maps), so our goal with ModelSet was to make object-level anchoring dead simple and highly portable.
What is Markerless Object Tracking (and How ModelSet Implements It)?
ModelSet lets you upload a textured 3D model (GLB/GLTF, or CAD converted to GLB) of a real-world object and get back a ModelSet code you can drop into any MultiSet SDK. Your app recognizes the object and locks digital content to it - markerless and with centimeter-level stability - then you deploy that same experience to Unity, WebXR, iOS, Android, Vision Pro, Meta Quest and more. You can run it in our cloud, your private cloud, on device or fully on-premises.
Glossary
Markerless object tracking SDK — a toolkit that recognizes and tracks a rigid object using its geometry/visual features (no QR/fiducials).
Object anchor — a maintained pose tied to a recognized object; in WebXR, this pose is often managed with XRAnchor objects.
Model targets / model-based tracking — recognition of objects via 3D models; historically associated with “guide views,” and in some engines cloud-trained advanced databases for 360° recognition.
360° View vs Side View — our two ModelSet tracking modes; 360° recognizes from any angle; Side View is optimized for lateral viewpoints with faster processing.
The State of the Art (and Why We Made Different Choices)
If you’ve used “model target” systems, you’ve likely spent hours wrangling datasets, guide views, and training queues. The Vuforia Engine, for example, supports “Advanced Model Targets” that are produced via cloud-based deep learning training and organized into databases with recognition ranges and user volumes. It’s powerful—but it’s also operationally heavy when all you want is to upload a model and ship across runtimes.
We architected ModelSet for speed-to-first-anchor and portability:
Upload textured GLB/GLTF, choose 360° or Side View, wait minutes (typ. <10), get your ModelSet code, and ship.
Keep the authoring mental model identical across Unity and WebXR (and native) so your team doesn’t relearn concepts per platform. ModelSet Unity sample scenes are intentionally thin so you can read them in an afternoon.
Respect enterprise constraints: hybrid cloud/private/on-prem; credentials and analytics are documented; WebXR flows use camera intrinsics + frame with options to keep processing inside your perimeter.
Are we the only ones who think “upload model → detect object” should feel this straightforward? Not at all, the broader industry is converging on the pattern. You’ll see marketing on some sites emphasizing “upload a 3D model, detect in real time - no custom training required.”
Where We Differ Are:
MapSet + ModelSet — Area Tracking and Model Tracking in the same SDK, same scene that scales across indoor and outdoor, multi-floor and robust in any environment and lighting condition. Think localizing in a room in a large facility, navigating across the 50,000 square meter production line and detecting various objects that require maintenance in the camera FOV with pinpoint AR overlays of maintenance steps, all within the same session.
Enterprise Readiness — Enterprise-grade privacy and true multi-runtime reach.
Deep Dive: How ModelSet Works (and How to Nail It)
Step 1 — Author a Production-Ready Model
This is 80% of the game. Every stability complaint I’ve ever debugged started with the asset:
Scale: Author at real-world scale. 1 unit = 1 meter. Your users don’t operate in “arbitrary units.” Cameras, FOV, and distances do.
Orientation: Set +Y up. You’ll thank yourself when you’re aligning overlays across Unity, native, and WebXR.
Texture: Provide a high-detail albedo/diffuse texture. Feature-poor, mirror-like surfaces are a drift factory. Decals and labels are your friend.
Geometry: Keep silhouette-critical detail, but don’t ship micro-geometry no one can see.
File size: Aim < 20 MB per object model for faster processing and iterations.
Symmetry note: If the object is rotationally or bilaterally symmetrical, the tracker has fewer unique features to disambiguate pose. Tweak the texture (subtle label, serial plate, orientation marks) so there’s always some “tell” at the expected viewing distance.
CAD teams: Convert to GLB/GLTF, unwrap UVs, keep even texel density on the faces that will be seen in your workflow.
Step 2 — Choose a Tracking Type
360° View (default): Recognizes the object from any angle - top, bottom, sides. This is what you want if users can walk around it.
Side View: Optimized for fixed installs where users view mainly from the sides (machines on the floor, wall panels). It typically processes faster since you’re not training for full top/bottom views you don’t need.
Step 3 — Upload, Process, Publish
Create a ModelSet in the MultiSet dashboard → drag-drop .glb/.gltf → pick 360° or Side View → kick off processing. Most jobs complete in under 10 minutes, after which you’ll see the status flip to Active/Ready and receive a ModelSet code for your app. (This is a great time to make a coffee, not lunch.)
Step 4 — Integrate in Your Runtime
Unity: Import our SDK, open ModelSet Tracking in Sample Scenes, paste your ModelSet code, place your AR overlays as children of the object anchor. Pro tip: the reference mesh is for authoring; remove it from the final build—localization doesn’t need it.
WebXR (browser): In WebXR, you capture camera intrinsics and a frame, call localization, then maintain alignment using XRAnchor so the pose stays glued as the session updates.
iOS / Android (native): Use our native guides to integrate with ARKit/ARCore, authenticated via project credentials.
Step 5 — Author Overlays That Feel “Native to Reality”
Author overlays at true scale.
Use hierarchical anchors for sub-parts (bolt heads, ports).
Add simple occlusion checks; avoid labels poking through metal.
Set animation/FX budgets conservatively—readability beats spectacle in industrial lighting.
Step 6 — Validate in the Field
Take your actual device to the actual environment: the lighting, standoff distances, reflections, and operator movement patterns you expect. Spend ten minutes circling the object, vary angles, and time a 2–3 minute operation. If you see jitter, look first at texture detail (is it readable at distance?) and glare.
Step 7 — Choose Hosting and Privacy
Public cloud for speed.
Private cloud when workloads must stay in-VPC.
On-prem when models/frames can’t leave your network (with optional offline mode). Learn more under On-Premises Localization.
On-Device when speed and security are of equal importance.
A Quick Compare (Mental Models, Not Vendors)
There are a few ways to solve “object anchors” today:
Model-based / upload-a-model: you bring a GLB/CAD/scan, the platform processes it, and you get recognition + pose. This is where ModelSet lives, and you’ll see other vendors espouse “upload model → detect, no custom training” as the value prop as well.
Trained advanced targets: some stacks use cloud-trained deep learning to produce 360° “advanced” model-target databases with one or more guide or “advanced” views; it’s effective, but the tooling and workflows can get heavy.
Plain plane/hit-test AR: great for general placement, not tied to your specific object; awesome for furniture-on-floor, weak for aligning torque specs to this machine.
If your requirement is “explain assembly step #3 on this exact SKU” or “show me this artifact’s story,” model-based wins.
Accuracy & Performance (The Honest Take)
When our customers ask “how accurate is it?”, they rarely want a lab number—they want to know whether a 5 cm misalignment will cause a wrong button press or a safety hazard.
In recommended conditions (true scale, +Y, textured mesh, balanced light, typical distances), ModelSet delivers sub-5 cm anchoring on mainstream mobile devices and modern headsets. If you need to harden this for a ruggedized site, two practical things move the needle most:
Texture density at the viewing distance you expect (don’t texture for the marketing render, texture for the operator’s eyes);
Mitigating symmetry with distinguishers (labels, serial plates, asymmetric details).
On the browser, remember WebXR anchors are your friend, use them to maintain alignment with the runtime’s evolving world understanding. MDN’s XRAnchor coverage and the Immersive Web anchors explainer are nice conceptual references here.
The Classic Pain Points (and Fixes)
“It recognizes, but orientation flips.” Symmetry is the usual suspect. Add a unique mark on one side, or increase texture contrast where the camera actually looks.
“Stable up close, noisy at 2 meters.” Your texture lacks feature scale at distance. Increase macro features (decals, panel edges) and retest at 1–2 m.
“Great in the lab, drifts on site.” Lighting and specular reflections. Reduce glare, consider a matte sticker in the texture, and re-export.
“Processing takes too long.” Do you actually need top/bottom visibility? Side View trims unnecessary views and often processes faster, especially for fixed installs.
WebXR Object Anchors (How It All Clicks Together)
I keep this part boring on purpose because it’s easy to hand-wave. Your WebXR flow is:
Start an AR session
Capture camera intrinsics and a frame
Call localization (ModelSet) → receive a pose
Create an XRAnchor at that pose
Keep your content aligned using anchor updates each frame
The mental model aligns with MDN’s XRAnchor docs and the Immersive Web Anchors module: an anchor is simply a pose the system keeps tracking so your content doesn’t slip as the runtime’s world model updates. If you want to see ours in action, the WebXR Integration doc and GitHub sample make this very literal.
Unity: A Fast Path to “It Works”
Unity remains the fastest way to get an on-device, production-ish test. Our ModelSet Tracking sample is designed so you can:
Paste the ModelSet code
Drag your content onto the object anchor
Build to your target (mobile/headset)
Validate drift in minutes
One easy-to-miss line in our docs: the reference mesh is for authoring and alignment - you don’t need to ship it in your final build. This keeps builds lean and avoids shipping IP unnecessarily.
Security, Hosting, and Analytics (for Your IT Team)
Credentials: create per-project IDs/secrets; rotate as policy dictates.
Hosting: pick cloud for speed, private cloud for isolation, or on-prem when nothing leaves the network. The on-prem page shows how to keep media + localization inside your perimeter (and support offline scenarios).
Analytics: map counts, storage, and VPS API call stats are visible so you can capacity-plan and spot anomalies early.
A Worked Example: The Torque Wrench on My Bench
Goal: The overlay walks a trainee through setting 27 bolts, shows “+” and “–” nudges on the dial, and highlights the objects to be 'wrenched' during calibration.
Asset Prep: I took the CAD, simplified the mesh, added a matte label on the handle (orientation tell), authored 1 unit = 1 m, +Y up, and exported GLB. (If you’re starting from a scan, I’d decimate + retexture vs. pushing a raw, noisy mesh.)
Tracking Type: Since the wrench is mostly viewed on a bench from the sides, I chose Side View. Faster to process, no need to model top/bottom recognition perfectly.
Upload & Process: Created a ModelSet, uploaded GLB, and waited a few minutes while processing finished. Copied the ModelSet code when I saw “Active.”
Unity Integration: Opened ModelSet Tracking sample, added overlays (step labels, torque value), parented them to the object anchor, removed the reference mesh from the final build. Test on iPhone: solid at 0.2–0.8 m standoff.
WebXR Parity: Built the same overlay flow for browser using camera intrinsics + frame → pose → XRAnchor. Watching the anchor keep the pose while I walked around is still a little magic.
Outcome: Consistent, sub-5 cm alignment across mobile and browser, no stickers, no guide screens, no fuss.
Where This Sits in the Ecosystem (and Migration Notes)
If you’re migrating off vendor-specific cloud anchor stacks: ASA is gone (Nov 20, 2024 retirement) and some associated samples are deprecated. A pragmatic migration is to adopt ModelSet for object-specific anchoring and MultiSet VPS for broader scene/venue localization, then standardize your browser layer on WebXR Anchors semantics. This keeps your “write once” mental model and opens up on-prem/self-hosted options that don’t exist in many legacy stacks.
If you’re evaluating “upload model → detect object” style solutions generally: you’ll see similar front-door messaging elsewhere (upload GLB/FBX/OBJ, detect in real time, no custom training). Those claims rhyme with what devs want. Where we obsess is enterprise-grade privacy, on-prem viability, and a unified approach across Unity/WebXR/native so your team’s mental model doesn’t fragment.
Production Checklist (The One I Keep Taped to My Monitor)
GLB/GLTF uses meters; bounds match the real object (measure it!)
+Y up (document your forward axis for designers)
Texture carries detail at real viewing distance; avoid mirror-finish traps
Symmetry mitigated (decals/labels)
File size reasonable (aim < 20 MB)
Tracking Type chosen for actual usage (360° vs Side View)
Tested on target device, in real lighting, at real standoff
Hosting mode selected (cloud / private / on-prem) with IT sign-off
What to Build Next (and How to Measure It)
ModelSet & Object Tracking Use Cases:
Maintenance & Inspection: SOPs pinned to machines
Warehouse Ops & Training: pick guidance, part verification
Field Service: offline-first, on-prem if needed
Exhibits & Retail: no QR codes, just walk up and it recognizes
ModelSet & Object Tracking Metrics:
Session success rate (anchor acquired < X seconds)
Mean overlay error at Y cm standoff (spot-check)
Drift over 3-minute task (millimeters/second, practical benchmark)
Support tickets tagged “tracking” / week
Build size and processing latency (keep iterating assets)
VPS API calls and map/model storage (plan budgets early)
Where to Go From Here
Docs
- ModelSet: Object Anchoring → overview and concepts.
- How to Create a ModelSet → prep checklist, 360° vs Side View, processing.
- Unity SDK: Sample Scenes → ModelSet Tracking.
- WebXR Integration → intrinsics + frame, sample repo.
- Credentials / On-Prem / Analytics for enterprise rollouts.
Product Page
Video
Starter Tasks
- Convert an existing CAD to GLB, add believable texture, and try Side View first if your use case is fixed-view.
- Ship a Unity prototype in a day; then replicate in WebXR for browser parity.
Closing Thoughts (Founder Hat On)
I don’t romanticize CV problems anymore. Most “AR drift” tickets are asset prep issues in disguise, and most “SDK limitations” are just mismatched expectations. ModelSet won’t make bad textures good, but with a sane asset and the right tracking mode, you’ll get sub-5 cm, repeatable anchors that survive demos and production.
My biggest advice? Texture for the distance you ship, not the distance you render. And never go to site without validating under the actual lights your users work in. When you’re ready, spin up your first ModelSet, embed our sample in your project, and ship something your operators don’t have to baby. I’ll be the one at the back of the demo, watching your labels stay pinned while you walk around the machine with a grin.
Comments