--- Summary:

  • Today we’re launching SWE-Lancer—a new, more realistic benchmark to evaluate the coding performance of AI models.
  • SWE-Lancer includes over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts.
  • Link: https://t.co/c3pFcL41uK (posted by OpenAI (@OpenAI), February 18, 2025)

--- Full Article:

Author: OpenAI
Profile: https://twitter.com/OpenAI
Source: https://x.com/OpenAI/status/1891911123517018521

--- Embedded Post (converted):

Today we’re launching SWE-Lancer—a new, more realistic benchmark to evaluate the coding performance of AI models. SWE-Lancer includes over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts. https://t.co/c3pFcL41uK

OpenAI (@OpenAI), February 18, 2025