Links

What I read.

AI Progress & Benchmarks

Epoch AI ↗
Research institute investigating key trends and questions about the trajectory and governance of AI.
Artificial Analysis ↗
Independent model evaluations to understand the AI landscape and choose the best model for your use case.
ARC Prize ↗
Nonprofit dedicated to accelerating AGI development through human-calibrated benchmarks that measure the gap between human and AI capabilities.
METR ↗
Model Evaluation and Threat Research: measuring the ability of AI systems to complete long, complex tasks.
Vals.ai ↗
Independent benchmarks of LLMs for tasks that mimic real industry use cases in legal, finance, math, and more.
LifeArchitect.ai ↗
Comprehensive AI model rankings, a timeline of AI and language models, and research on AI IQ testing.
AI Futures Project ↗
A small research group forecasting the future of AI.
Vending-Bench ↗
Benchmark evaluating AI models on managing a simulated vending machine business over one year, testing long-term coherence and strategic negotiation.

AI Agents & Agentic Engineering

Prompting & Learning

Vibe Coding