Milestone Systems has introduced a new Vision Language Model (VLM) focused on traffic monitoring and video intelligence, helping cities, operators, and developers make sense of large volumes of video data faster and more efficiently. The AI model is powered by NVIDIA Cosmos Reason and is designed to turn complex video footage into clear, actionable insights.
The VLM powers two new offerings: Video Summarization for XProtect Video Management Software and Hafnia Vision Language Model as a Service (VLMaaS) for developers and third-party platforms. Together, these solutions aim to reduce manual video review, improve operational efficiency, and modernise how traffic and surveillance footage is analysed.
The Video Summarization tool, a generative AI plug-in for the XProtect Smart Client, automatically analyses camera footage and produces short text summaries describing what is happening in a video. Instead of watching hours of footage, users can upload a short clip, enter a simple prompt, and receive a summary in seconds. Early testing shows the tool can reduce false alarm fatigue by up to 30 percent, allowing operators to focus on real incidents rather than background noise or irrelevant motion.
Users can search video summaries by content instead of timestamps, bookmark important events, and link summaries to existing alerts and rules within XProtect. The plug-in is free to download, installs quickly, and follows a pay-per-use model, meaning customers only pay when they generate summaries.
For developers, Hafnia VLM as a Service provides API access to ready-to-use AI video intelligence. Built on NVIDIA technology and fine-tuned for traffic scenarios, VLMaaS allows teams to add video summarisation and prompt-based video understanding to applications without building or managing AI infrastructure. Milestone says this can reduce development effort by a factor of up to 70 compared with training a model from scratch.
Andrew Burnett, Acting CTO at Milestone Systems, said the launch addresses video overload by giving operators instant insights and developers fast access to enterprise-ready AI. Cities such as Genoa, Italy, and Dubuque, Iowa, are already exploring these capabilities to improve traffic management and road safety, reinforcing the growing role of AI in smart city operations.
Both solutions are powered by Milestone’s Hafnia VLM, which was trained on 75,000 hours of real-world traffic video from the US and Europe, using responsibly sourced and fully auditable data. The platform complies with GDPR and the EU AI Act and supports regional data hosting.
