Job Description

Role Overview
Expert mathematicians are invited to author and verify high-quality open-ended prompts for AI model evaluation. In this role, you will craft and review challenging, unambiguous mathematical problems across core subdomains, assessing AI reasoning quality and helping establish rigorous evaluation standards for frontier language models.
Task Types
You will be assigned one of two task types:
Authoring Task: Create 5 original, open-ended prompts in your assigned subdomain at varying difficulty levels (undergraduate, advanced undergraduate, or graduate/professional). Prompts should require human judgment to evaluate the quality of the AI's response, for example its chain-of-thought reasoning or proof construction.
Verification Task: Review 5 authored prompts for clarity, scope alignment, difficulty accuracy, and uniqueness. Edit prompts and difficulty ratings where needed.
Mathematics Subdomains Covered
Probability & Statistics, Algebra (including Linear Algebra), Ordinary/Partial Differential Equations & Dynamical Systems, Geometry, Graph Theory, Number Theory.
Key Responsibilities
Author clear, unambiguous, open-ended mathematical prompts that elicit evaluable AI responses.
Verify prompts are within the scope of the assigned subdomain and correctly rated for difficulty.
Ensure all 5 prompts in a task are sufficiently distinct from one another and span varying difficulty levels.
Apply expert judgment to assess the depth and quality of mathematical reasoning required.
Edit prompts and difficulty assignments where standards are not met.
Ideal Qualifications
Master's degree or higher in Mathematics, Applied Mathematics, Statistics, or a closely related field.
2–6 years of professional or research experience in a quantitative field.
Strong command of graduate-level mathematical concepts including proof writing, analysis, and formal reasoning.
Experience in academic research, mathematical competition design, or quantitative industry roles is a plus.
Excellent written English and ability to craft precise, well-scoped technical questions.
Work Terms
Expected commitment: 10+ hours/week. Asynchronous, fully remote work.

Mathematics Model Prompt Evaluator Job at SaidGig, Remote