AI's Office Work Struggles: Why Bots Aren't Ready to Replace You Yet

The Study: APEX-Agents

A recent study, APEX-Agents, tested AI models on real-world tasks drawn from the work of lawyers, consultants, and bankers. The tasks required the models to move between different types of information and complete multi-step processes.

The Results

The results were poor. The best models, such as Gemini 3 Flash and GPT-5.2, scored only around 24% accuracy. Most other models did worse, scoring in the teens.

The Core Issue

The problem is not a lack of intelligence. The issue is that AI struggles with context. In real work, the information needed for an answer is spread across several sources, and answers are not always clear-cut.

A lawyer, for example, might need to:

Check a Slack thread
Read a PDF policy
Look at a spreadsheet

The lawyer then has to combine all of that information to answer a question about GDPR compliance. Humans do this naturally, but AI often loses track of the context or gives the wrong answer.
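
The lawyer's task can be pictured as a small data structure. The sketch below is purely illustrative: the `Source` class, the `missing_sources` helper, and the sample text are hypothetical and are not taken from APEX-Agents. It shows only that an answer built from some of the sources, but not all of them, is working from incomplete context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One piece of context the lawyer has to consult (hypothetical)."""
    kind: str      # e.g. "slack_thread", "pdf_policy", "spreadsheet"
    content: str   # made-up sample text, for illustration only


# The three sources named in the example above.
REQUIRED_KINDS = ("slack_thread", "pdf_policy", "spreadsheet")


def missing_sources(consulted: list[Source]) -> list[str]:
    """Return the required source kinds that were never consulted."""
    seen = {source.kind for source in consulted}
    return [kind for kind in REQUIRED_KINDS if kind not in seen]


# An assistant that read the policy and the spreadsheet but skipped the chat.
consulted = [
    Source("pdf_policy", "Customer records are kept for 24 months."),
    Source("spreadsheet", "Row 12: record created 30 months ago."),
]
print(missing_sources(consulted))  # ['slack_thread']
```

Skipping even one source, here the Slack thread, means the final answer is missing part of the picture, however well the other two were read.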

Current State of AI

For now, AI functions more like an unreliable intern than a seasoned professional. It gets things right about a quarter of the time, which is not good enough for most office jobs.

Progress and Future

However, progress is fast. Just a year ago, these models were scoring between 5% and 10%. Now they are hitting 24%. So, while they aren't ready to take over yet, they are improving quickly.
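
To put that pace in numbers, the short Python calculation below works out the gain implied by the figures above. It is only arithmetic on the article's rounded scores (5% to 10% a year ago, about 24% now); the constant and function names are illustrative.

```python
# Scores quoted in the article, in percent (rounded figures).
LAST_YEAR_LOW = 5.0
LAST_YEAR_HIGH = 10.0
CURRENT_BEST = 24.0


def improvement(old: float, new: float) -> tuple[float, float]:
    """Return (gain in percentage points, multiple of the old score)."""
    return new - old, new / old


for old in (LAST_YEAR_HIGH, LAST_YEAR_LOW):
    points, multiple = improvement(old, CURRENT_BEST)
    print(f"from {old:.0f}% to {CURRENT_BEST:.0f}%: "
          f"+{points:.0f} points, {multiple:.1f}x")
```

Depending on which end of last year's range you start from, the best score has grown by roughly 2.4 to 4.8 times. Even so, it still leaves about three out of four tasks failed.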

Conclusion

This is a reality check for those who think AI will replace human workers anytime soon. The study suggests that the "knowledge work" revolution is on hold until AI can handle multi-step tasks that draw on several sources of information. For now, human workers can breathe a sigh of relief.