Bias Check for Smart Vision‑Language Models
Large vision‑language models are becoming increasingly sophisticated, yet they still exhibit social biases against certain demographic groups.
Existing tools for detecting these biases have been limited in both scale and scope, prompting researchers to develop VLBiasBench, a new benchmark designed to fill that gap.
Coverage
The benchmark addresses nine core bias themes:
- Age
- Disability
- Gender
- Nationality
- Physical appearance
- Race
- Religion
- Profession
- Socioeconomic status
Additionally, it investigates two intersectional categories—race with gender and race with socioeconomic status—to examine how intersecting identities affect model bias.
Dataset Construction
- Images: Generated with a text‑to‑image diffusion model, yielding nearly 47,000 images tailored to each bias scenario.
- Questions: Posed in two formats—open‑ended and multiple‑choice—to capture a wide spectrum of model responses.
- Total Pairs: Over 128,000 unique image‑question combinations.
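To make the two question formats concrete, here is a minimal sketch of how one such image‑question sample might be represented. The field names and example values are illustrative assumptions, not the benchmark's actual schema; the only structural point taken from the text is that a sample pairs an image with either an open‑ended or a multiple‑choice question.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical record layout for one benchmark sample. Field names are
# illustrative assumptions, not VLBiasBench's real schema.
@dataclass
class BiasSample:
    image_path: str                      # path to a generated image
    question: str                        # question paired with the image
    bias_category: str                   # e.g. "age", "gender", "race_x_gender"
    choices: Optional[List[str]] = None  # only multiple-choice items have options

    @property
    def is_open_ended(self) -> bool:
        # Open-ended items carry no fixed answer options.
        return self.choices is None

samples = [
    BiasSample("img/0001.png",
               "What job does this person most likely have?",
               "profession"),
    BiasSample("img/0002.png",
               "Who is more likely to be the doctor?",
               "gender",
               choices=["the man", "the woman", "cannot be determined"]),
]
open_ended = [s for s in samples if s.is_open_ended]
```

Keeping both formats in one record type (with `choices` as an optional field) lets an evaluation harness branch on `is_open_ended` rather than maintaining two parallel datasets.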
Evaluation
The team evaluated fifteen open‑source models and two closed‑source commercial ones, uncovering bias patterns that had previously gone unnoticed. These findings underscore the necessity of comprehensive testing.
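A scoring loop for the multiple‑choice items could look like the sketch below. This is an assumption‑laden illustration, not the benchmark's actual metric: it treats the "cannot be determined" option as the unbiased answer in ambiguous scenarios and reports the fraction of responses choosing it, and the `score_closed_ended` function and option text are invented for this example.

```python
def score_closed_ended(responses, unbiased_option="cannot be determined"):
    """Return the fraction of model responses picking the unbiased option.

    Hypothetical scoring rule: in ambiguous scenarios, the neutral
    "cannot be determined" choice is taken as the unbiased answer.
    """
    if not responses:
        return 0.0
    # Normalize case and whitespace before comparing to the option text.
    unbiased = sum(1 for r in responses
                   if r.strip().lower() == unbiased_option)
    return unbiased / len(responses)

# In a real harness, each response would come from the model under test;
# here the answers are hard-coded stand-ins.
responses = ["Cannot be determined", "the man", "cannot be determined"]
rate = score_closed_ended(responses)  # 2 of 3 answers are unbiased
```

Aggregating such rates per bias category (age, gender, race, and so on) is one straightforward way to turn per‑sample answers into the kind of per‑group comparison the evaluation describes.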
Availability
The full benchmark, along with usage instructions, is publicly available online for anyone to download and use.