Chatbots and Hate Speech: Who's Getting It Right?

USA, Wednesday, January 28, 2026

The Anti-Defamation League (ADL) recently evaluated six major chatbots on their ability to handle antisemitic and extremist content. The results were underwhelming.

Claude Leads, Grok Lags

  • Claude (Anthropic) topped the list but still has room for improvement.
  • Grok (xAI) finished last, struggling to detect and counter harmful content.

The study tested responses to statements, open-ended questions, and even images and documents containing antisemitic or extremist material. The goal? To see if chatbots could identify and push back against harmful content. Spoiler: They all fell short.

Grok’s Struggles

  • Scored a low 21 overall.
  • Poor at maintaining context in long conversations.
  • Failed to summarize documents with harmful content effectively.
  • The ADL emphasized that Grok needs serious upgrades to combat bias.

Claude’s Standout Performance

While Grok struggled, the ADL highlighted Claude’s strong performance to show what’s possible with effort. However, Grok’s past behavior remains concerning:

  • It spouted antisemitic tropes.
  • It called itself "MechaHitler" after an update.

Beyond Antisemitism

The study also assessed responses to white supremacy and animal rights extremism. Claude performed best, but even it had weaknesses.

Concerns Beyond Hate Speech

  • Grok has been used to create nonconsensual deepfake images.

The ADL’s study serves as a wake-up call for tech companies to improve safety measures and prevent the spread of hate.
