IBM Research
Published: May 19, 2025
I had an amazing time at IBM in summer 2025 developing a benchmark for multi-turn conversations that evaluates how effectively AI agents reason over documents and invoke external tools.