AI Resource Hub
A curated, no-fluff index of AI resources: live dashboards, model cards, evals, policy, safety, chips/energy, and price signals I use to separate hype from reality. Updated: 18 Aug 2025.
TL;DR
- Start with Live model leaderboards & release notes for what’s shipping now.
- Use incidents & safety trackers to stress-test adoption plans.
- Watch compute, chips, energy, and GPU prices for where supply is heading.
- Cross-check claims with benchmarks, system cards, and official model cards.
How to use this page
- Bookmark it.
- Scan the “Live” sections first (Leaderboards, Release notes, GPU price signals).
- Click the 🇦🇺 section for local nuance, regulators, and datasets.
Live leaderboards & evals (what’s hot right now)
- LMSYS Chatbot Arena (live H2H) — crowd-sourced, quick reality check on model quality
- SWE-bench Verified — end-to-end code-fixing benchmark with unit tests
- HELM (Stanford CRFM) — broad evals across tasks with methodology notes
- MLPerf — hardware/system performance (vendor-submitted)
- ProphetArena — live probabilistic forecasting leaderboard for LLMs
- FutureBench (leaderboard) — evaluates agents on predicting real-world future events
- Artificial Analysis — Intelligence & leaderboards — compare models by intelligence, price, speed & latency
How to use: Arena = “vibes + breadth”, MLPerf = “hardware truth”, SWE-bench = “agentic coding realism”, ProphetArena/FutureBench = “can it forecast?”, AA = “how smart + how much?”.
Model release notes & changelogs (source of truth)
- OpenAI — API changelog
- ChatGPT release notes
- Anthropic — release notes overview · API · System prompts
- Google — Vertex AI GenAI release notes
Tips: Read release notes before the blog hype; they list deprecations, limits, and pricing changes.
Official model cards & open weights
- Meta — Responsible Use & model cards
- Mistral — models overview
- DeepSeek-R1 paper (reasoning via RL)
- Qwen / Tongyi model family
Tips: sanity-check safety scopes, context limits, modalities, and licence constraints.
Incidents, red-teaming & security
- AI Incident Database (AIID)
- MIT AI Incident Tracker
- OECD AI Incidents Monitor (AIM) · AIAAIC Repository
- OWASP GenAI
- Embrace the Red: Prompt Injection Hacks
Why it matters: real incident records help you plan safety and alignment protocols and quantify risk before you deploy.
Policy, standards & governance
- EU AI Act — official text (EUR-Lex)
- NIST AI RMF + GenAI Profile
- ISO/IEC 42001 — AI management system
- UK AI Safety/Security Institute — evaluations
Use: map internal controls and vendor due diligence to these frameworks.
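One way to start that mapping is a simple coverage check. The sketch below tags each internal control with the NIST AI RMF core functions it supports (Govern, Map, Measure, Manage) and lists the functions nothing covers yet. The control names and their tags are hypothetical placeholders, and real mappings would normally go down to subcategory level.

```python
# The four core functions of the NIST AI RMF.
NIST_AI_RMF_FUNCTIONS = ("Govern", "Map", "Measure", "Manage")

# Hypothetical internal controls, each tagged with the functions it supports.
controls = {
    "model-inventory": {"Govern", "Map"},
    "pre-release-red-team": {"Measure"},
    "vendor-security-questionnaire": {"Govern"},
}


def uncovered_functions(controls, functions=NIST_AI_RMF_FUNCTIONS):
    """Return the framework functions that no control is mapped to."""
    covered = set().union(*controls.values())
    return [name for name in functions if name not in covered]


gaps = uncovered_functions(controls)
print(gaps)  # ['Manage']
```

The same structure works for vendor due diligence: treat each vendor attestation as a control and check what it leaves uncovered.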
Compute, chips & energy (follow the supply)
- Epoch AI — compute trends & database — training compute, parameters, datasets
- NVIDIA Blackwell · AMD Instinct · Intel Gaudi
- U.S. BIS export controls
- AEMO — NEM data · 2024 Integrated System Plan
How it helps: chip supply and grid constraints often explain model availability and API limits better than press releases. For now, gains in model capability track closely with increases in compute, which in turn depend on chip availability and innovation.
GPU price signals (live)
- RunPod pricing — on-demand GPU rates (H100/H200/4090 etc.)
- Vast.ai pricing — marketplace rates (spot/interruptible)
- GPUs.io · ComputePrices · GPUCompare
Watch: falling rental prices can pre-signal “capacity relief” and cheaper fine-tunes.
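A minimal sketch of how to turn that "watch" into a number, assuming you log one rental rate per day by hand or from a pricing page. It compares the average of the most recent window with the window before it. The rates below are made-up illustrative values, not real quotes from any provider, and `price_trend` is a hypothetical helper name.

```python
from statistics import mean


def price_trend(rates, window=7):
    """Fractional change between the latest window's mean and the prior window's mean."""
    if len(rates) < 2 * window:
        raise ValueError("need at least 2 * window observations")
    previous = mean(rates[-2 * window:-window])
    latest = mean(rates[-window:])
    return (latest - previous) / previous


# Hypothetical daily on-demand rates in USD per GPU-hour (illustrative only).
h100_rates = [2.80, 2.80, 2.75, 2.70, 2.70, 2.60, 2.60, 2.50]

change = price_trend(h100_rates, window=4)
print(f"{change:+.1%}")  # -5.9%
```

A sustained negative value across several providers is a stronger signal than a drop on a single marketplace, where spot pricing can swing day to day.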
Benchmarks to watch for agents & reasoning
- ARC Prize (ARC-AGI-2) — human-easy, AI-hard abstraction tasks
- Kaggle leaderboard
- GAIA, MATH, AIME, AgentBoard — see HELM index for roll-ups
Rule of thumb: prefer evals with transparent task lists, cost accounting, and reproduction kits.
Legal dockets (copyright/IP reality)
Why here: legal direction shapes training data access, indemnities, and enterprise risk posture.
Sustainability & emissions
- MLCO2 Impact — rough but useful carbon estimates for ML runs
Use it for back-of-the-envelope estimates of the footprint of training and fine-tuning plans.
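For a rough sense of the arithmetic, the sketch below uses the common power × time × grid-intensity approach. It is not necessarily the exact method the MLCO2 calculator applies, and every input (GPU count, wattage, PUE, grid intensity) is an assumed placeholder you would swap for your own figures.

```python
def training_emissions_kg(gpu_count, gpu_power_watts, hours, pue, grid_kg_co2e_per_kwh):
    """Rough CO2e estimate: facility energy use times grid carbon intensity."""
    # IT energy in kWh, scaled by PUE to include cooling and other overhead.
    energy_kwh = gpu_count * (gpu_power_watts / 1000) * hours * pue
    return energy_kwh * grid_kg_co2e_per_kwh


# Assumed inputs: 8 GPUs drawing 700 W each for 24 hours,
# data-centre PUE of 1.2, grid intensity of 0.5 kg CO2e per kWh.
estimate = training_emissions_kg(8, 700, 24, 1.2, 0.5)
print(f"{estimate:.1f} kg CO2e")  # 80.6 kg CO2e
```

Grid intensity dominates the result, so the same run can differ several-fold between regions; check the local figure before comparing plans.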
🇦🇺 Australian perspective
- DISR — AI adoption & ecosystem · Critical technologies
- OAIC · eSafety · ACCC
- National AI Centre (CSIRO)
Local edge: align deployments to the Australian Privacy Principles (APPs), safety guidance, and critical infrastructure constraints.
“Follow along” — trustworthy research & market primers
Contribute a link
Spotted a must-have resource or a broken link? Ping me on LinkedIn or X.