Position Summary
In this remote, hourly contractor role, you will design complex, challenging mathematics problems intended to test and stretch the limits of advanced AI models, and evaluate how well those models reason through them. Tasks are intentionally difficult and time-intensive, typically requiring three or more hours each to construct, solve, and document fully. Tasks may include:
Designing original, high-difficulty mathematics problems that probe the boundaries of current AI reasoning, including problems that state-of-the-art (SOTA) models are likely to solve incorrectly
Developing complete, rigorous solutions and step-by-step reasoning that serve as reference answers
Evaluating AI-generated solutions for mathematical correctness and reasoning quality, identifying errors in method, conceptual gaps, and weak justification, and explaining corrections clearly in writing
Rating and comparing AI responses based on c...