Reflections on the AI Engineering World Fair 2025
Last week's AI Engineering World Fair was another great event for those of us building AI systems. It revealed where the industry currently stands and highlighted how we are collectively grappling with the realities and potential of artificial intelligence.
Several themes particularly resonated with me - here are some reflections on immediate implications for both startups and enterprises, with an Australian context in mind.
1. Mo' Models / Mo' Context
A key insight, emphasised by Artificial Analysis and echoed by leaders from OpenAI, Google, and Microsoft, is the evolving landscape of AI models. We're seeing a diverse model ecosystem emerge, balancing trade-offs in generalisation vs. specialisation, cost, speed, and reasoning.
While the pursuit of scaling continues, there's a notable shift toward context-driven solutions. Bespoke agents designed for specific sectors like law, finance, healthcare, and accounting are increasingly harnessing proprietary data to drive innovation, efficiency, and enhanced user experiences.
Technologies such as GraphRAG and retrieval-augmented generation underline the value of integrating structured data and context layers around foundational models, significantly boosting performance and reliability.
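To make the context-layer idea concrete, here is a minimal retrieval-augmented generation sketch. The corpus, the naive keyword-overlap scorer, and the prompt wrapper are all illustrative stand-ins, not a production retrieval stack (which would use embeddings, chunking, and a real vector store).

```python
# Minimal RAG sketch: retrieve relevant context, then wrap it around the
# user's question before calling a model. All data here is illustrative.

CORPUS = {
    "leave-policy": "Employees accrue four weeks of annual leave per year.",
    "remote-work": "Remote staff must attend the quarterly on-site planning week.",
    "expenses": "Claims over $500 require pre-approval from a manager.",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(
        CORPUS.values(),
        key=lambda doc: len(terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Prepend retrieved context to the question, grounding the model."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("How many weeks of annual leave do employees accrue?")
```

The structure, not the scoring, is the point: a context layer curated from proprietary data sits between the user and the foundation model.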
Implication: Organisations should prioritise their domain-specific contextual data as a competitive advantage. Success hinges on effectively curating, owning, and leveraging this data rapidly and efficiently. This is equally true in Australia, where many of our products and markets rely on bespoke context (e.g. remote living vs urban living).
2. AI Teams Are Learning To Work Differently
The conference discussions indicated evolving organisational approaches for AI teams. Factory AI’s emphasis on documentation and context preparation, alongside Graphite’s insights into evolving CI/CD practices, exemplifies the broader shifts in AI development processes.
AI development and traditional software engineering are increasingly intertwined. Continuous Integration and Continuous Delivery (CI/CD) processes are adapting to treat AI outputs—such as prompts, agent behaviours, and evaluation suites—as first-class deliverables.
The rise of the AI Product Owner role highlights a critical shift as AI moves from experimental to core business functionality. Teams are expanding traditional agile methodologies (like the two-pizza model) to incorporate AI "members," underscoring the necessity of meticulous context documentation and robust knowledge management practices.
Australian Perspective: This aligns closely with our local experience; planning, documentation, experimentation, evaluation, and productionisation of AI systems are undergoing significant transformation, driven particularly by opportunities like parallelisation (see below for more). There's a faster way to build value for organisations, powered by a new way of working and new tooling.
3. Agentic IDEs Are Mainstream, But Agentic Coding Is Not There Yet
AI-enabled IDEs such as Cursor and Windsurf are now mainstream: the vast majority of developers use them. Autonomous agentic coding tools (e.g. OpenAI Codex, Google Jules, Factory AI), however, remain at an early stage, reminiscent of GPT-3's initial launch. While these autonomous coding agents are demonstrably effective at certain tasks (e.g. simple bug fixing), they require careful scoping, management, and rigorous oversight.
The way forward: Companies should adopt agentic coding incrementally, starting with narrowly scoped tasks, while gradually extending autonomy under robust governance and evaluation frameworks. Thinking carefully ahead to security (e.g. internet access for environments) is key. It's one thing to build non-critical PoCs with this tooling, another to expose enterprise repos to external parties.
4. Integration Is All You Need
The Model Context Protocol (MCP) featured prominently at this year's conference as an emerging standard for enabling seamless integration of diverse AI tools and systems. Companies like Anthropic, Google, Microsoft, and numerous startups emphasised advancements in MCP, showcasing significant improvements in interoperability and protocol maturity.
Recent enhancements announced at the conference, including streamable HTTP conversations and improved authentication and security, signify MCP’s rapid evolution. Looking forward, standardised MCP implementations will enable smoother tool integrations, reduced complexity in agent orchestration, and improved observability—benefiting both developers and enterprise adopters.
Future considerations: MCP and tool integration is potentially the single biggest lever for companies to leverage AI. The LLMs today are good enough for a wide range of use cases - MCP and tool integration protocols provide new ways to accelerate value creation and return on investment. Enterprises investing early in MCP standards and infrastructure, allowing for security, have the potential to gain a strategic advantage. Connecting AI to internal systems offers a safer pathway to building value.
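For readers new to the protocol, MCP runs over JSON-RPC 2.0, and invoking a tool uses the `tools/call` method. The sketch below constructs that request envelope; the tool name (`search_tickets`) and its arguments are hypothetical examples, and a real client would also handle initialisation, transport, and the response.

```python
import json

# Sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool.
# The envelope shape follows MCP's tools/call method; the tool and its
# arguments are illustrative placeholders.

def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialise an MCP tools/call request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

message = make_tool_call(1, "search_tickets", {"query": "payroll outage", "limit": 5})
```

The appeal for enterprises is precisely this uniformity: any internal system exposed as an MCP server becomes callable by any compliant agent through the same envelope.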
5. Evals! Evals! Evals!
One of the biggest pain points of the last 12 months has been building and managing the right set of evaluation metrics across AI systems. With data, models, and tooling changing constantly, and with enterprises needing observability and auditability to satisfy a range of legal, regulatory, and risk requirements, evaluation methodologies and tooling are vital.
It was good to see a range of thinking, tooling and patterns emerge to help with this. We seem to be off the ground floor and a common language is emerging. Key themes included automated evaluation pipelines, hierarchical evaluative models, and the concept of evaluations as an integral part of CI/CD processes. Companies such as Weights & Biases and Braintrust.dev highlighted innovative approaches using AI to augment manual evaluation processes, enabling faster iteration and improved accuracy.
Notably, the community emphasised the idea of evaluations evolving into standardised "unit tests" for AI outputs. This shift underscores the necessity of proactive monitoring to mitigate model drift and ensure continuous alignment with business objectives.
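The "evals as unit tests" idea can be sketched very simply: an evaluation written as an ordinary test function, runnable in CI alongside the rest of the suite. Here `model_answer` is a stub standing in for a real model or agent call, and the specific checks (a required fact, a length cap, a banned claim) are illustrative.

```python
# Sketch: an evaluation expressed as a unit test, so CI fails when the
# model's output drifts. model_answer is a placeholder for a real call.

def model_answer(question: str) -> str:
    """Stub: replace with a real model/agent invocation."""
    return "Refunds are processed within 14 days of the return being received."

def test_refund_answer():
    answer = model_answer("How long do refunds take?")
    assert "14 days" in answer                 # factual grounding check
    assert len(answer) < 300                   # style/length guardrail
    assert "guarantee" not in answer.lower()   # banned-claims check
```

In practice these deterministic checks sit alongside model-graded and statistical evaluations, but wiring even the simple ones into CI catches drift early.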
Implications: An evaluation / supervisory strategy for AI systems is mandatory for enterprises, and should start at the PoC / Pilot phase and carry into production systems. The tooling and patterns are emerging now to accelerate the visibility, monitoring, and controllability of these systems. Evaluation leads to controllability, which leads to reliability.
6. Parallelisation: The Next Big Unlock for Teams?
A meta-theme that stood out to me was parallelisation and its potential as a transformative capability. We saw impressive demonstrations and discussions from teams such as Dagger, Morph Labs, OpenAI, Google Jules, and Factory AI, each showcasing how parallelisation significantly reduces cycle time for engineers and boosts exploratory capacity. The concept: with the right infrastructure, tooling, and processes, developers and their AI can trial multiple variations of a target outcome in parallel, and potentially cheaply as the cost of LLMs comes down. The ability to spin up environments and systems at will to trial variations (e.g. product, coding, or architecture variations) and test winning ideas is a major opportunity. So is the ability for engineers to fix and ship trivial bugs and tickets on the go, which represents a major productivity uplift.
While challenges around merging, arbitration, and cost control persist, these appear solvable to me through disciplined engineering and strategic infrastructure investments as we look to the next 6-12 months of development, in what is still a nascent industry.
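The core loop is easy to sketch: launch several candidate runs concurrently, score each, and keep the winner. In this sketch `run_variant` and `score` are hypothetical stand-ins for launching an agent in an isolated environment and evaluating its output; a real setup would handle sandboxing, merging, and cost caps.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of parallel variant trials: run N candidates concurrently,
# score each result, keep the best. All names below are illustrative.

VARIANTS = ["baseline", "refactor-a", "refactor-b", "refactor-c"]

def run_variant(name: str) -> dict:
    """Stub: launch an agent/environment for one variant, collect results."""
    return {"variant": name, "tests_passed": len(name)}  # placeholder metric

def score(result: dict) -> int:
    """Stub evaluation: in practice, an eval suite would score each run."""
    return result["tests_passed"]

with ThreadPoolExecutor(max_workers=len(VARIANTS)) as pool:
    results = list(pool.map(run_variant, VARIANTS))

winner = max(results, key=score)
```

Note how the evaluation theme above feeds directly into this: without a trustworthy scoring function, parallel trials have no principled way to pick a winner.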
I was also encouraged by the trend of using parallelisation to help with technical debt and to improve user experiences more quickly (e.g. web accessibility).
Prediction: Parallelisation is not ready today, but is a key trend to watch, and I would bet on it becoming a foundational AI concept for most teams and their architectures, mirroring the transformative impact CI/CD has had on software delivery. Early investment in orchestration frameworks, cost governance, and evaluative merging processes for development / product focused companies could offer substantial competitive advantages.
Big Questions for the next 6 months
Given the rapid pace of AI advancement, there are a number of big questions I am paying attention to:
How will new regulations shape the AI landscape?
What evaluation and auditing standards will become industry norms?
Will genuine interoperability emerge, or will vendor lock-in persist, across models and AI systems?
How can we ethically and effectively manage increasingly autonomous AI, especially in sensitive sectors?
How do we address the environmental impacts of growing AI workloads?
How much more will the role of engineer / product owner evolve as software development is enabled by AI?
Links to Conference