IBM Research

Published:

I had an amazing time at IBM in summer 2025 developing a benchmark for multi-turn conversations that evaluates how effectively AI agents reason over documents and invoke external tools.