Hyperbolic Fusion Boosts Video Anomaly Detection

Wednesday, June 10, 2026

PoinCLIP‑VAD is a new system for spotting unusual events in videos without detailed frame‑by‑frame labels, a setting known as weakly supervised video anomaly detection. Traditional methods struggle because they represent visual and textual cues in a flat, Euclidean space, which makes it hard to separate subtle differences between normal and abnormal scenes.

Hyperbolic Embedding

PoinCLIP‑VAD maps both video frames and their text descriptions into a curved, hyperbolic space, modeled as the Poincaré ball. In this geometry, distances grow rapidly toward the boundary of the ball, so there is far more room to keep dissimilar items apart than in a flat space of the same dimension, giving the model a richer way to encode hidden relationships. The approach does not depend on pre‑defined hierarchies, so it can learn structure directly from the data.
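
To make the geometry concrete, here is a minimal Python sketch of the Poincaré ball with curvature -1. It is not the PoinCLIP‑VAD code: the function names exp_map_origin and poincare_distance, the fixed curvature, and the clipping constant are illustrative assumptions. It shows one common way to map a Euclidean feature vector into the ball, and how the same Euclidean gap counts for more near the boundary than near the center.

```python
import math


def exp_map_origin(v, max_norm=1.0 - 1e-5):
    """Map a Euclidean vector into the open unit ball (exponential map at the origin).

    Assumes curvature -1. The radius is clipped to max_norm so the result
    never lands exactly on the boundary, where distances are undefined.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    radius = min(math.tanh(norm), max_norm)
    return [radius * x / norm for x in v]


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Two pairs with the same Euclidean gap of 0.1, at different radii.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])

# A feature vector with a large norm still ends up inside the ball.
mapped = exp_map_origin([3.0, 4.0])
```

In this sketch, near_boundary comes out several times larger than near_origin even though both pairs are 0.1 apart in Euclidean terms, which is the extra "room" the hyperbolic embedding offers.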

Two‑Stage Architecture

  1. Classification Module – Provides an overall score that flags potentially anomalous clips.
  2. Fine‑Grained Alignment Block – Compares video content with its textual description, using negative Poincaré distance as the similarity score (the smaller the distance, the stronger the match), tightening the link between what is seen and what is described.

This two‑stage process helps the system learn from weak supervision more effectively than earlier methods.
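
The alignment idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the published architecture: the function name alignment_probs, the softmax over candidate descriptions, the temperature parameter, and the toy 2‑D embeddings are all hypothetical. The only element taken from the description above is the use of negative Poincaré distance as the similarity between a video embedding and a text embedding.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball (curvature -1)."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def alignment_probs(video_emb, text_embs, temperature=1.0):
    """Turn negative Poincaré distances into a distribution over text descriptions.

    The softmax and the temperature are assumptions made for this sketch.
    """
    logits = [-poincare_distance(video_emb, t) / temperature for t in text_embs]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings, already inside the unit ball (hypothetical values).
video = [0.5, 0.1]
texts = [
    [0.55, 0.1],   # description that lies close to the clip
    [-0.6, 0.2],   # description that lies far from the clip
]
probs = alignment_probs(video, texts)
```

Here the first description receives the larger probability because it sits closer to the clip in hyperbolic distance. A training loss could then push matching clip and description pairs together, though the exact objective used by PoinCLIP‑VAD is not described in this article.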

Performance

Benchmark      Metric   Result
UCF‑Crime      AUC      90.62%
XD‑Violence    AP       86.93%

These results indicate that the hyperbolic representation improves both detection accuracy and consistency when aligning visual and language signals under limited labeling.
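
For readers unfamiliar with the two metrics, the sketch below shows how AUC (area under the ROC curve) and AP (average precision) can be computed from anomaly scores and binary labels. It is a plain‑Python illustration with made‑up scores, not the benchmarks' evaluation scripts; the function names are hypothetical, and the AP version assumes there are no tied scores and at least one positive label.

```python
def roc_auc(scores, labels):
    """Probability that a random anomalous item outscores a random normal one.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each true anomaly."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits


# Made-up anomaly scores for four frames; label 1 marks an anomalous frame.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]

auc = roc_auc(scores, labels)           # 3 of 4 positive/negative pairs are ordered correctly
ap = average_precision(scores, labels)  # precision 1/1 and 2/3 at the two hits
```

Both metrics depend only on how the scores rank the frames, so they can be evaluated without choosing a decision threshold.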

Takeaway

PoinCLIP‑VAD demonstrates that rethinking the underlying geometry can unlock better performance in video anomaly detection, especially when precise annotations are scarce.
