Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Send us example gnarly bugs, published by Beth Barnes on December 10, 2023 on The AI Alignment Forum.
Tl;dr: Looking for hard debugging tasks for evals, paying the greater of $60/hr or $200 per example.
METR (formerly ARC Evals) is interested in producing hard debugging tasks for models to attempt as part of an
agentic capabilities evaluation. To create these tasks, we're seeking repos containing extremely tricky bugs. If you send us a codebase that meets the criteria for submission (listed below), we will pay you $60/hr for time spent putting it into our required format, or $200, whichever is greater. (We won't pay for submissions that don't meet these requirements.) If we're particularly excited about your submission, we may also be interested in purchasing IP rights to it.
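As a rough illustration of the payment rule above, here is a minimal sketch in Python. The function name `bounty_payment` is hypothetical, and the sketch assumes the hourly rate applies to the hours you report spending on formatting; the $60/hr and $200 figures come from the post.

```python
HOURLY_RATE_USD = 60        # hourly rate stated in the post
MINIMUM_PAYMENT_USD = 200   # flat amount stated in the post


def bounty_payment(hours_spent: float) -> float:
    """Return the greater of the hourly total and the flat amount.

    Hypothetical helper for illustration only; it is not part of any
    official METR tooling.
    """
    if hours_spent < 0:
        raise ValueError("hours_spent must be non-negative")
    return max(HOURLY_RATE_USD * hours_spent, MINIMUM_PAYMENT_USD)


if __name__ == "__main__":
    for hours in (1, 3, 4, 10):
        print(f"{hours} h -> ${bounty_payment(hours)}")
```

Under these assumptions the flat $200 applies to anything up to about 3.3 hours of formatting work (200 / 60), and the hourly rate takes over beyond that.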
We expect to want about 10-30 examples overall depending on the diversity. We're likely to be putting bounties on additional types of tasks over the next few weeks.
Criteria for submission:
Contains a bug that would take at least 6 hours for a skilled programmer to solve, and ideally >20hrs
Ideally, has not been posted publicly in the past, and you are able to guarantee it won't be posted publicly in the future.
(Though note that we may still accept submissions from public repositories given that they are not already in a SWE-bench dataset and meet the rest of our requirements. Check with us first.)
You have the legal right to share it with us (e.g. please don't send us other people's proprietary code or anything you signed an NDA about)
Ideally, the codebase is written in Python but we will accept submissions written in other languages.
Is in the format described in this doc:
Gnarly Bugs Submission Format
Please send submissions to gnarly-bugs@evals.alignment.org in the form of a zip file. Your email should include the number of hours it took for you to get the code from its original state into our required format. If your submission meets our criteria and format requirements, we'll contact you with a payment form. You're also welcome to email gnarly-bugs@evals.alignment.org with any questions, including if you are unsure whether a potential submission would meet the criteria.
If you would do this task at a higher pay rate please let us know!
(Also, if you are interested in forking SWE-bench to support non-Python codebases, please contact us.)
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.