Hyperbolic Fusion Boosts Video Anomaly Detection

The new system, PoinCLIP‑VAD, tackles weakly supervised video anomaly detection: spotting unusual events in videos without detailed frame‑by‑frame labels. Traditional methods struggle because they embed visual and textual cues in a flat Euclidean space, which makes it hard to separate subtle differences between normal and abnormal scenes.

Hyperbolic Embedding

PoinCLIP‑VAD maps both video frames and their text descriptions into a curved, hyperbolic space known as the Poincaré ball. In this geometry, distances grow rapidly as points approach the boundary of the ball, so there is far more room to spread items apart than in a Euclidean space of the same dimension. That gives the model a richer way to encode latent relationships. The approach does not depend on pre‑defined hierarchies, so it can learn structure directly from the data.
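To make the geometry concrete, here is a minimal sketch of the two standard Poincaré ball operations: the exponential map at the origin, which carries a Euclidean feature vector into the ball, and the geodesic distance between two points in the ball. The function names, the curvature parameter `c`, and the sample vectors are illustrative assumptions, not details taken from PoinCLIP‑VAD.

```python
import math


def expmap0(v, c=1.0):
    """Map a Euclidean vector (tangent at the origin) into the Poincaré ball
    of curvature -c. Hypothetical helper name, standard formula."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points strictly inside the ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    arg = 1.0 + 2.0 * c * sq_diff / ((1.0 - c * sq_x) * (1.0 - c * sq_y))
    # Guard against rounding pushing the argument just below 1.
    return math.acosh(max(arg, 1.0)) / math.sqrt(c)


# The same Euclidean gap (0.1) is much longer near the boundary of the ball.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.8, 0.0], [0.9, 0.0])
print(near_origin < near_boundary)
```

The comparison at the end illustrates the property described above: two pairs of points with the same Euclidean separation are much farther apart hyperbolically when they sit close to the boundary.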

Two‑Stage Architecture

Classification Module – Provides an overall score that flags potentially anomalous clips.
Fine‑Grained Alignment Block – Compares video content with its textual description, using negative Poincaré distance as the similarity score, which tightens the link between what is seen and what is described.

This two‑stage process helps the system learn from weak supervision more effectively than earlier methods.
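The alignment idea can be sketched as follows: score a video embedding against several candidate text embeddings by negative Poincaré distance, so that closer pairs receive higher scores. The softmax normalization, the `temperature` value, the function names, and the sample embeddings are assumptions made for illustration; the source states only that negative Poincaré distance is used for the comparison.

```python
import math


def poincare_distance(x, y):
    """Geodesic distance in the unit Poincaré ball (curvature -1)."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_x) * (1.0 - sq_y))
    return math.acosh(max(arg, 1.0))


def alignment_probs(video_emb, text_embs, temperature=0.1):
    """Turn negative distances into a distribution over text candidates.
    The softmax and temperature are illustrative choices, not the paper's."""
    logits = [-poincare_distance(video_emb, t) / temperature for t in text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical embeddings that already lie inside the unit ball.
video = [0.50, 0.10]
texts = [[0.50, 0.12], [-0.40, 0.30]]  # first description is the close match
probs = alignment_probs(video, texts)
print(probs)
```

Because the score is a negated distance, the text embedding nearest to the video embedding always receives the largest probability.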

Performance

Benchmark	Metric	Result
UCF‑Crime	AUC	90.62 %
XD‑Violence	AP	86.93 %

These results suggest that the hyperbolic representation improves both detection accuracy and the consistency of visual and language alignment under limited labeling.

Takeaway

PoinCLIP‑VAD demonstrates that rethinking the underlying geometry can unlock better performance in video anomaly detection, especially when precise annotations are scarce.
