Milestone Systems has released a new Vision Language Model (VLM) purpose-built for traffic understanding and real-world video intelligence. Powered by NVIDIA Cosmos Reason, the VLM underpins two new offerings: Video Summarization for XProtect® Video Management Software and Hafnia VLM as a Service (VLMaaS) for third-party integrations.
Designed to reduce video overload and manual review, the new solutions enable faster insights, automated reporting, and scalable AI-powered video intelligence across traffic and public safety environments.
Video Summarization For XProtect
Modern video systems generate massive volumes of footage, making manual review time-consuming and inefficient. Milestone’s new Video Summarization tool addresses this challenge by converting raw video into concise, searchable text summaries directly inside the XProtect Smart Client.
According to Milestone, early deployments indicate that video summarization can reduce false-alarm fatigue among operators by up to 30%, allowing teams to focus on relevant incidents rather than noise or irrelevant motion.
The generative AI-powered plug-in analyzes submitted video clips and produces natural-language descriptions within seconds, based on user-defined prompts.
Key Capabilities
- Convert video segments into structured text summaries within XProtect Smart Client
- Search summaries by video content rather than timestamps or manual tags
- Bookmark and filter summaries to streamline review workflows
- Trigger automated summaries using existing XProtect rules, alarms, and events
- Filter out irrelevant motion to highlight valid events
- Access region-specific, sovereign VLMs starting with the US and EU
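Searching by content rather than by timestamp is what turns the summaries into something operators can query. The sketch below illustrates the general idea with a simple case-insensitive keyword match over stored text summaries. It is a generic illustration only: the `ClipSummary` record and the `search_summaries` function are hypothetical and do not reflect how the XProtect plug-in actually stores or indexes summaries.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ClipSummary:
    """Hypothetical record: one natural-language summary of a video clip."""
    camera: str
    start: datetime
    text: str


def _words(text):
    """Lower-case the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_summaries(summaries, query):
    """Return the summaries whose text contains every word of the query.

    Matching is by what the summary says happened in the clip, not by
    timestamp or manual tag. An empty query matches every summary.
    """
    terms = _words(query)
    return [s for s in summaries if terms <= _words(s.text)]
```

For example, given summaries reading "A red truck stops in the bus lane." and "A red car turns left.", the query `"red truck"` would return only the first clip, while `"red"` would return both. A production system would more likely use a full-text or semantic index, but the principle of querying the generated text is the same.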
The Video Summarization tool is free to download, installs in minutes, and operates on a pay-per-prompt model.
VLM As A Service For Developers
Milestone also introduced Hafnia VLM as a Service (VLMaaS), providing developers, integrators, and partners with API access to production-ready vision language intelligence.
VLMaaS eliminates the complexity of building and managing AI infrastructure, enabling rapid development of AI-powered applications regardless of existing analytics maturity. The platform supports everything from MVP testing to enterprise-scale deployments.
Milestone reports that VLMaaS can require up to 70 times less development effort than independently fine-tuning a custom VLM.
Key Capabilities
- High-accuracy vision language model optimized for traffic environments
- Built on NVIDIA Cosmos Reason and traffic-focused fine-tuning
- Prompt-based instructions for traffic-related operations
- API-first delivery via HTTPS
- Fine-tuned regional models for US and EU markets
- Designed for standalone applications or Milestone ecosystem integrations
- 100% responsibly sourced training data with auditable lineage
- GDPR- and EU AI Act-compliant fine-tuning process
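Because VLMaaS is described as API-first over HTTPS with prompt-based instructions and regional US and EU models, a client call would plausibly combine a video reference, a prompt, and a region choice. The sketch below only builds such a request with Python's standard library; it does not send it. The endpoint URL, the field names (`video_url`, `prompt`, `region`), and the bearer-token authentication are all hypothetical assumptions, since the announcement does not publish the API schema. Consult the official VLMaaS documentation for the real interface.

```python
import json
import urllib.request

# Hypothetical endpoint: not a published Milestone URL.
API_URL = "https://api.example.com/v1/vlm/query"


def build_vlm_request(api_key, video_url, prompt, region="eu"):
    """Build (but do not send) an HTTPS POST for a prompt-based VLM query.

    All field names and the auth scheme are illustrative assumptions,
    not the documented VLMaaS request format.
    """
    # The announcement mentions regional models for the US and EU only.
    if region not in ("us", "eu"):
        raise ValueError("region must be 'us' or 'eu'")
    payload = {"video_url": video_url, "prompt": prompt, "region": region}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Against a real endpoint, the request would be sent with `urllib.request.urlopen(req, timeout=30)` and the JSON response parsed with `json.load`. Keeping request construction separate from sending makes the payload easy to inspect and test before any pay-per-use calls are made.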
Pricing follows a pay-per-use API model, eliminating large upfront investments.
Developers can register for early access at:
https://hafnia.milestonesys.com/
Industry Perspective
Andrew Burnett, Acting Chief Technology Officer at Milestone Systems, said the new offerings directly address long-standing industry challenges.
“With the Vision Language Model as a Service and Video Summarization for XProtect, we’re tackling some of the most challenging bottlenecks: video overload and time-consuming manual work. Operators get immediate insight directly within XProtect; builders get API-first access to production-ready intelligence without bespoke training or heavy infrastructure.”
He added that the model's specialization in real-world traffic video, together with responsibly sourced training data, allows customers to deploy with confidence and extract more value from their existing systems.
Cities such as Genoa, Italy, and Dubuque, Iowa, are among the early adopters, using the new capabilities to advance intelligent traffic management initiatives.
Built On Responsible AI And Real-World Data
Both solutions are powered by Milestone’s Hafnia VLM, fine-tuned on 75,000 hours of responsibly sourced real-world traffic video from Europe and the United States. Data preparation leverages NVIDIA Cosmos Curator, with deployment across cloud and regional data centers.
According to Milestone, this combination of NVIDIA Cosmos Reason and domain-specific training positions Hafnia as one of the industry's most advanced video AI platforms.

Milestone Systems is a world leader in data-driven video technology used across industries including manufacturing, airports, law enforcement, retail, and traffic management. Its portfolio includes XProtect video management software, BriefCam AI-powered analytics, and Arcules cloud VSaaS solutions that help organizations learn from the past, understand the present, and predict the future.
https://www.milestonesys.com/news/
Frequently Asked Questions (FAQs)
What is Milestone’s Vision Language Model (VLM)?
Milestone’s VLM is a generative AI model designed to analyze and interpret real-world traffic video, producing searchable summaries and enabling advanced video intelligence.
What is Video Summarization for XProtect?
It is a plug-in for XProtect Smart Client that converts video clips into text summaries, helping operators review footage faster and reducing false-alarm fatigue by filtering out irrelevant motion.
What is Hafnia VLM as a Service?
Hafnia VLMaaS provides API access to Milestone’s vision language model, allowing developers to embed video intelligence into applications without managing AI infrastructure.
Is the training data compliant with regulations?
According to Milestone, yes. The model is fine-tuned on 100% responsibly sourced data with auditable lineage, and the fine-tuning process is compliant with GDPR and the EU AI Act.
How is pricing structured?
Video Summarization operates on a pay-per-prompt model, while VLMaaS uses pay-per-use pricing based on API calls.
Source: milestonesys.com