Bias Check for Smart Vision‑Language Models
Large vision‑language models are becoming increasingly sophisticated, yet they still exhibit social biases against certain demographic groups.
Existing tools for detecting these biases have been limited in both scale and scope, prompting researchers to develop VLBiasBench, a new benchmark designed to fill that gap.
Coverage
The benchmark addresses nine core bias themes:
- Age
- Disability
- Gender
- Nationality
- Physical appearance
- Race
- Religion
- Profession
- Socioeconomic status
Additionally, it investigates two intersectional categories—race with gender and race with socioeconomic status—to examine how intersecting identities affect model bias.
Dataset Construction
- Images: Generated with a text‑to‑image diffusion model, yielding nearly 47,000 images tailored to each bias scenario.
- Questions: Posed in two formats—open‑ended and multiple‑choice—to capture a wide spectrum of model responses.
- Total Pairs: Over 128,000 unique image‑question combinations.
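To make the two question formats concrete, here is a minimal sketch of how one such image‑question sample might be represented. The field names and example values are illustrative assumptions, not the benchmark's actual schema; the only structural point taken from the text is that a sample pairs an image with either an open‑ended or a multiple‑choice question.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical record layout for one benchmark sample. Field names are
# illustrative assumptions, not VLBiasBench's real schema.
@dataclass
class BiasSample:
    image_path: str                      # path to a generated image
    question: str                        # question paired with the image
    bias_category: str                   # e.g. "age", "gender", "race_x_gender"
    choices: Optional[List[str]] = None  # only multiple-choice items have options

    @property
    def is_open_ended(self) -> bool:
        # Open-ended items carry no fixed answer options.
        return self.choices is None

samples = [
    BiasSample("img/0001.png",
               "What job does this person most likely have?",
               "profession"),
    BiasSample("img/0002.png",
               "Who is more likely to be the doctor?",
               "gender",
               choices=["the man", "the woman", "cannot be determined"]),
]
open_ended = [s for s in samples if s.is_open_ended]
```

Keeping both formats in one record type (with `choices` as an optional field) lets an evaluation harness branch on `is_open_ended` rather than maintaining two parallel datasets.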
Evaluation
The team evaluated fifteen open‑source models and two closed‑source commercial ones, uncovering bias patterns that had previously gone unnoticed. These findings underscore the necessity of comprehensive testing.
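A scoring loop for the multiple‑choice items could look like the sketch below. This is an assumption‑laden illustration, not the benchmark's actual metric: it treats the "cannot be determined" option as the unbiased answer in ambiguous scenarios and reports the fraction of responses choosing it, and the `score_closed_ended` function and option text are invented for this example.

```python
def score_closed_ended(responses, unbiased_option="cannot be determined"):
    """Return the fraction of model responses picking the unbiased option.

    Hypothetical scoring rule: in ambiguous scenarios, the neutral
    "cannot be determined" choice is taken as the unbiased answer.
    """
    if not responses:
        return 0.0
    # Normalize case and whitespace before comparing to the option text.
    unbiased = sum(1 for r in responses
                   if r.strip().lower() == unbiased_option)
    return unbiased / len(responses)

# In a real harness, each response would come from the model under test;
# here the answers are hard-coded stand-ins.
responses = ["Cannot be determined", "the man", "cannot be determined"]
rate = score_closed_ended(responses)  # 2 of 3 answers are unbiased
```

Aggregating such rates per bias category (age, gender, race, and so on) is one straightforward way to turn per‑sample answers into the kind of per‑group comparison the evaluation describes.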
Availability
The full benchmark, along with usage instructions, is publicly available online for anyone to download and use.