AI Resource Hub

A curated, no-fluff index of live AI dashboards, model cards, evals, policy and safety resources, chip and energy data, and price signals that I use to separate hype from reality. Updated: 18 Aug 2025.

TL;DR

Start with live model leaderboards and release notes to see what’s shipping now.
Use incidents & safety trackers to stress-test adoption plans.
Watch compute, chips, energy, and GPU prices for where supply is heading.
Cross-check claims with benchmarks, system cards, and official model cards.

How to use this page

Bookmark it.
Scan the “Live” sections first (Leaderboards, Release notes, GPU price signals).
Click the 🇦🇺 section for local nuance, regulators, and datasets.

Live leaderboards & evals (what’s hot right now)

LMSYS Chatbot Arena (live H2H) — crowd-sourced, quick reality check on model quality
SWE-bench Verified — end-to-end code-fixing benchmark with unit tests
HELM (Stanford CRFM) — broad evals across tasks with methodology notes
MLPerf — hardware/system performance (vendor-submitted)
ProphetArena — live probabilistic forecasting leaderboard for LLMs
FutureBench (leaderboard) — evaluates agents on predicting real-world future events
Artificial Analysis — Intelligence & leaderboards — compare models by intelligence, price, speed & latency

How to use: Arena = “vibes + breadth”, MLPerf = “hardware truth”, SWE-bench = “agentic coding realism”, ProphetArena/FutureBench = “can it forecast?”, AA = “how smart + how much?”.

Model release notes & changelogs (source of truth)

Tips: Read release notes before the blog hype; they list deprecations, limits, and pricing changes.

Official model cards & open weights

Tips: sanity-check safety scopes, context limits, modalities, and licence constraints.

Incidents, red-teaming & security

Why it matters: helps you plan safety and alignment protocols and quantify risks.

Policy, standards & governance

Use: map internal controls and vendor due diligence to these policies and standards.

Compute, chips & energy (follow the supply)

Epoch AI — compute trends & database — training compute, parameters, datasets
NVIDIA Blackwell · AMD Instinct · Intel Gaudi
U.S. BIS export controls
AEMO — NEM data · 2024 Integrated System Plan

How it helps: chip and grid constraints often explain model availability and API limits better than press releases. For now, gains in model capability are closely tied to growth in compute, chip availability, and hardware innovation.

GPU price signals (live)

RunPod pricing — on-demand GPU rates (H100/H200/4090 etc.)
Vast.ai pricing — marketplace rates (spot/interruptible)
GPUs.io · ComputePrices · GPUCompare

Watch: falling rental prices can pre-signal “capacity relief” and cheaper fine-tunes.
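One rough way to turn "falling rental prices" into a number is to compare the average rate over the most recent window with the average over the window before it. The sketch below assumes you have already collected daily USD per GPU-hour samples yourself. The function name, the window size, and the sample prices are all hypothetical and do not come from any of the pricing sites above.

```python
from statistics import mean


def pct_change_in_price(samples, window=7):
    """Percent change of the latest window's mean price versus the prior window's mean.

    `samples` is a chronological list of prices (oldest first).
    A negative result means prices are falling.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    recent = mean(samples[-window:])
    previous = mean(samples[-2 * window:-window])
    return (recent - previous) / previous * 100.0


# Hypothetical daily rates in USD per GPU-hour (made-up numbers, oldest first).
prices = [2.50] * 7 + [2.25] * 7
change = pct_change_in_price(prices)
print(f"Week-over-week change: {change:.1f}%")
```

A sustained negative reading across several providers is a stronger hint than a single dip, since marketplace rates for spot or interruptible capacity can be noisy.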

Benchmarks to watch for agents & reasoning

ARC Prize (ARC-AGI-2) — human-easy, AI-hard abstraction tasks
Kaggle leaderboard
GAIA, MATH, AIME, AgentBoard — see HELM index for roll-ups

Rule of thumb: prefer evals with transparent task lists, cost accounting, and reproduction kits.

Legal dockets (copyright/IP reality)

Why here: legal direction shapes training data access, indemnities, and enterprise risk posture.

Sustainability & emissions

MLCO2 Impact — rough but useful carbon estimates for ML runs

Use it for back-of-the-envelope estimates of the footprint of training and fine-tuning plans.
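The back-of-the-envelope arithmetic usually has the shape energy (kWh) = GPUs × power × hours × data-centre overhead, then emissions = energy × grid carbon intensity. The sketch below is a minimal illustration of that arithmetic and is not necessarily how MLCO2 Impact computes its figures. Every input value in the example (GPU count, 700 W draw, PUE of 1.2, 0.5 kg CO2e per kWh) is a placeholder assumption to replace with your own hardware and region data.

```python
def estimate_emissions_kg(gpu_count, gpu_power_watts, hours, pue, grid_kg_co2e_per_kwh):
    """Rough CO2e estimate (kg) for a training or fine-tuning run.

    gpu_power_watts: assumed average draw per GPU, in watts.
    pue: power usage effectiveness of the data centre (1.0 means no overhead).
    grid_kg_co2e_per_kwh: assumed carbon intensity of the local grid.
    """
    energy_kwh = gpu_count * (gpu_power_watts / 1000.0) * hours * pue
    return energy_kwh * grid_kg_co2e_per_kwh


# Placeholder inputs: 8 GPUs at 700 W for 24 hours, PUE 1.2, 0.5 kg CO2e/kWh.
run_kg = estimate_emissions_kg(
    gpu_count=8,
    gpu_power_watts=700,
    hours=24,
    pue=1.2,
    grid_kg_co2e_per_kwh=0.5,
)
print(f"Estimated footprint: {run_kg:.1f} kg CO2e")
```

The estimate ignores CPU, networking, and embodied hardware emissions, so treat it as a lower-bound order of magnitude rather than a reportable figure.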

🇦🇺 Australian perspective

Local edge: align deployments with the Australian Privacy Principles (APPs), safety guidance, and critical infrastructure constraints.

“Follow along” — trustworthy research & market primers

NIST, UK AISI, Stanford CRFM, Epoch, MLCommons
State of AI Report

Contribute a link

Spotted a must-have resource or a broken link? Ping me on LinkedIn or X.