--- Summary:
- Today we’re launching SWE-Lancer—a new, more realistic benchmark to evaluate the coding performance of AI models.
- SWE-Lancer includes over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts.
- https://t.co/c3pFcL41uK (OpenAI, @OpenAI, February 18, 2025)
--- Full Article:
Author: OpenAI
Profile: https://twitter.com/OpenAI
Source: https://x.com/OpenAI/status/1891911123517018521
--- Embedded Post (converted):
Today we’re launching SWE-Lancer—a new, more realistic benchmark to evaluate the coding performance of AI models. SWE-Lancer includes over 1,400 freelance software engineering tasks from Upwork, valued at $1 million USD total in real-world payouts. https://t.co/c3pFcL41uK
OpenAI (@OpenAI), February 18, 2025