AI agents are emerging as a promising research direction with potential real-world applications. These agents use foundation models such as large language models (LLMs) and vision language ...
LMArena, a popular benchmark for large language models, has been accused of giving preferential treatment to AIs made by big tech firms, potentially enabling them to game their results. When you ...
A new paper from AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI benchmark Chatbot Arena, of helping a select group of AI companies achieve ...