The Nonlinear Library: Alignment Forum

AF - Send us example gnarly bugs by Beth Barnes

2023-12-10

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Send us example gnarly bugs, published by Beth Barnes on December 10, 2023 on The AI Alignment Forum.
Tl;dr: Looking for hard debugging tasks for evals, paying the greater of $60/hr or $200 per example.
METR (formerly ARC Evals) is interested in producing hard debugging tasks for models to attempt as part of an agentic capabilities evaluation. To create these tasks, we're seeking repos containing extremely tricky bugs. If you send us a codebase that meets the criteria for submission (listed below), we will pay you $60/hr for time spent putting it into our required format, or $200, whichever is greater. (We won't pay for submissions that don't meet these requirements.) If we're particularly excited about your submission, we may also be interested in purchasing IP rights to it.
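As a rough illustration of the payment rule above, here is a minimal sketch in Python. The function and constant names are hypothetical and not part of METR's process; only the $60/hr rate and the $200 minimum come from the post, and the sketch assumes the submission has already been accepted.

```python
HOURLY_RATE_USD = 60      # rate stated in the post
MINIMUM_PAYOUT_USD = 200  # flat amount stated in the post


def payout_usd(hours_formatting: float) -> float:
    """Return the greater of the hourly payment and the flat amount.

    Hypothetical helper: assumes the submission met the criteria,
    since the post says non-qualifying submissions are not paid.
    """
    if hours_formatting < 0:
        raise ValueError("hours must be non-negative")
    return max(HOURLY_RATE_USD * hours_formatting, MINIMUM_PAYOUT_USD)


if __name__ == "__main__":
    for hours in (1, 3, 5):
        print(f"{hours} h of formatting work -> ${payout_usd(hours)}")
```

Under this reading, the flat $200 applies up to 3 hours 20 minutes of formatting time, after which the hourly rate pays more.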
We expect to want about 10-30 examples overall depending on the diversity. We're likely to be putting bounties on additional types of tasks over the next few weeks.
Criteria for submission:
Contains a bug that would take a skilled programmer at least 6 hours to solve, and ideally more than 20 hours.
Ideally, has not been posted publicly in the past, and you are able to guarantee it won't be posted publicly in the future. (Though note that we may still accept submissions from public repositories, provided that they are not already in a SWE-bench dataset and meet the rest of our requirements. Check with us first.)
You have the legal right to share it with us (e.g. please don't send us other people's proprietary code or anything you signed an NDA about).
Ideally, the codebase is written in Python, but we will accept submissions written in other languages.
Is in the format described in this doc: Gnarly Bugs Submission Format
Please send submissions to gnarly-bugs@evals.alignment.org in the form of a zip file. Your email should include the number of hours it took for you to get the code from its original state into our required format. If your submission meets our criteria and format requirements, we'll contact you with a payment form. You're also welcome to email gnarly-bugs@evals.alignment.org with any questions, including if you are unsure whether a potential submission would meet the criteria.
If you would do this task at a higher pay rate, please let us know!
(Also, if you are interested in forking SWE-bench to support non-Python codebases, please contact us.)
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
