AI's Office Work Struggles: Why Bots Aren't Ready to Replace You Yet
The Study: APEX-Agents
A recent study, APEX-Agents, tested AI models on real-world tasks from lawyers, consultants, and bankers. These tasks required the models to switch between different types of information and complete multi-step processes.
The Results
The results were poor. The best models, such as Gemini 3 Flash and GPT-5.2, scored only around 24% accuracy. Most other models did worse, scoring in the teens.
The Core Issue
The problem is not a lack of raw intelligence. The issue is that AI struggles with context that is spread across several sources. In the real world, answers are rarely clear-cut or found in one place.
Example: GDPR Compliance
A lawyer, for example, might need to:
- Check a Slack thread
- Read a PDF policy
- Look at a spreadsheet
The lawyer then has to combine all of that information to answer a question about GDPR compliance. Humans do this naturally, but AI often loses track of the pieces or gives the wrong answer.
Current State of AI
For now, AI functions more like an unreliable intern than a seasoned professional. It gets things right about a quarter of the time, which is not good enough for most office jobs.
Progress and Future
However, progress is fast. Just a year ago, these models were scoring between 5% and 10%. Now the best of them are hitting 24%. So while they aren't ready to take over yet, they are improving quickly.
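To put that jump in perspective, here is a minimal sketch of the arithmetic, using only the figures quoted above. The function name and constants are illustrative assumptions and are not part of the study.

```python
# Scores quoted in the article, as percentages of tasks answered correctly.
# The constant and function names are hypothetical, chosen for this sketch.
EARLIER_LOW = 5.0
EARLIER_HIGH = 10.0
CURRENT_BEST = 24.0


def improvement_factor(old_score: float, new_score: float) -> float:
    """Return how many times larger new_score is than old_score."""
    if old_score <= 0:
        raise ValueError("old_score must be positive")
    return new_score / old_score


if __name__ == "__main__":
    # Smallest gain assumes last year's score was at the top of its range.
    smallest = improvement_factor(EARLIER_HIGH, CURRENT_BEST)
    # Largest gain assumes last year's score was at the bottom of its range.
    largest = improvement_factor(EARLIER_LOW, CURRENT_BEST)
    print(f"Improvement in a year: {smallest:.1f}x to {largest:.1f}x")
    print(f"Tasks the best model still gets wrong: {100 - CURRENT_BEST:.0f}%")
```

In other words, the top score has grown by roughly 2.4 to 4.8 times in a year, yet about three out of four tasks are still answered incorrectly.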
Conclusion
This is a reality check for those who think AI will replace human workers anytime soon. The study suggests that the "knowledge work" revolution is on hold until AI can reliably pull together context from multiple sources and carry out multi-step tasks. For now, human workers can breathe a sigh of relief.