Beyond the Hype: Building Agentic Systems for the Enterprise
1. Beyond the Hype
A few weeks ago, I was experimenting with Claude Code when I had a real 'this is it' moment. In just an hour, I had set up 15 agents to traverse the internet and Australian financial services regulatory frameworks, producing a full mapping from regulatory standards to obligations and a control framework for a financial services client case — work that once took me three to four weeks with three people in my last start-up. Ironically, just a week later, running a similar pattern on a complex sales pipeline report for my own company, I hit hallucinations and errors; the work would have been faster to do manually.
That’s the paradox: agentic systems are poised to transform enterprise work, but without the right guardrails, they’ll automate your mistakes and cost you time. The challenge — and the opportunity — lies in building them so they’re not just impressive in a demo, but dependable in production. Adoption has tipped — around 71% of organisations now use genAI in at least one function, and the AI Index 2025 shows sustained investment and deployment momentum (McKinsey, 2025; Stanford AI Index, 2025).
I wanted to use this blog to summarise some of the learnings my team and I have picked up building enterprise AI systems.
2. The Promise and the Pitfalls
Why agentic systems can deliver value now:
Agentic systems — AI that can plan, choose tools, take actions, and adapt based on feedback — can finally go beyond answering questions to getting things done (a minimal sketch of that loop follows the examples below). Some examples include:
- Automating complex workflows across CRM, ERP, and SaaS tools.
- Monitoring live business metrics and kicking off remediation without waiting for the next meeting or stand-up.
- Surfacing “unknown unknowns” — correlations or risks a human might never spot.
- Understanding nuanced conversations and sentiment (customer transcripts, internal meetings) and helping companies take stronger action, such as next-best-action prompts in call centres or accelerating post-meeting deliverable drafting and workflows.
- Creating insights and actions across data repositories — from internal SharePoint and financial systems to risk platforms and internal communications systems like Slack.
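To make that "plan, choose tools, act, adapt" loop concrete, here is a minimal sketch of an agent loop in Python. The model call and both tools are hypothetical placeholders rather than any specific vendor SDK; production systems add guardrails, retries, and human review on top of a loop like this.

```python
# Minimal, illustrative agent loop: plan -> choose tool -> act -> adapt on feedback.
# `call_model` and the tools are hypothetical stand-ins, not a real SDK.
from typing import Callable

def lookup_customer(customer_id: str) -> str:
    """Stand-in for a CRM lookup."""
    return f"Customer {customer_id}: onboarding stalled at the KYC step."

def create_ticket(summary: str) -> str:
    """Stand-in for an ERP/SaaS action."""
    return f"Ticket created: {summary}"

TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_customer": lookup_customer,
    "create_ticket": create_ticket,
}

def call_model(goal: str, history: list[str]) -> dict:
    """Placeholder for an LLM call that returns the next step as structured output."""
    if not history:
        return {"tool": "lookup_customer", "input": "C-1042"}
    if len(history) == 1:
        return {"tool": "create_ticket", "input": "Unblock KYC for C-1042"}
    return {"done": True, "answer": history[-1]}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_steps):               # hard step budget: the simplest guardrail
        step = call_model(goal, history)
        if step.get("done"):
            return step["answer"]
        observation = TOOLS[step["tool"]](step["input"])
        history.append(observation)          # feed results back so the agent can adapt
    return "Stopped: step budget exhausted"   # fail safe rather than loop forever

print(run_agent("Work out why onboarding for C-1042 stalled and raise a ticket"))
```

The explicit tool registry and the hard step budget are deliberate: they are the simplest versions of the guardrails discussed in the rest of this post.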
The challenges with implementing AI systems:
Many enterprises are failing after the proof-of-concept stage¹. The jump from 'it works on my laptop' to 'it works safely in production' involves more than better prompts. You need:
- People: roles that bridge AI, technology, risk, and ops.
- Process: governance that empowers but keeps the right guardrails, and teams that select the right use cases and know how to embed AI into existing business processes and roles.
- Systems: agentic architecture that’s observable, auditable, and adaptable — for each agent or swarm of agents working as a team on business problems.
Some of the bigger pitfalls:
- Over-autonomy too soon — skipping the human-in-the-loop stage because “it looks good in testing.”
- Weak grounding — agents making confident decisions on stale or incomplete data.
- Opaque behaviour — no clear logs of why it did what it did.
- Change fatigue — teams feeling blindsided by workflows they no longer fully control.
Some powerful AI transformation examples we are working on:
- Regulatory compliance automation — AI agents can map regulatory obligations to internal control frameworks and complete assurance activities in days instead of weeks, freeing compliance teams to focus on higher-value risk assessment and policy design.
- Solution Delivery Lifecycle (SDLC) — we are reimagining the solution delivery lifecycle in the financial services sector. AI has the potential to augment every stage — from project management and requirements gathering to build, test, and deploy. We have seen AI systems dramatically improve quality, cut SDLC deliverable drafting time, provide new insights and speed up delivery. What was once a sequential process is becoming one where teams can work through many SDLC tasks in parallel, allowing teams to work faster, catch issues earlier, and deliver more value sooner.
- Mining sector feasibility studies — we are helping mining companies accelerate capital project feasibility studies by using AI to synthesise vast technical, legal, and environmental data. These studies demand expertise across many domains, including geology, engineering, environmental and community law, risk, and regulation. Our AI agents produce faster, more comprehensive outputs that capture more risks and analysis, enabling human experts to focus on the more nuanced or specific matters.
- Logistics field and back-office automation — we are working with a major logistics provider to merge image and ERP data to automate field technician installation work and on-site compliance paperwork. AI also streamlines back-office operations, compliance checks, and workflows. What was once a multi-step, multi-department process is now a seamless, real-time experience on local devices — cutting costs, improving quality, and reducing risk.
- Contact centre transformation — we are working to reimagine contact centre experiences for customers and human agents. AI creates a single view of the customer, enabling faster discovery, insight and support, and drives automation through integration with existing workflows. Increasingly, we see AI also serving as the first point of contact for lower-interaction experiences, improving efficiency while maintaining service quality for customers who wish to work with an AI rather than wait.
3. Three Practical Lenses for Getting It Right
Lens 1 — Business Benefits & Team Design
The tech is only half the story. The real magic is when business and tech teams co-design the problem and solution space. That means:
- Start with a measurable business outcome — “reduce customer onboarding time by 30%” beats “try agentic AI.” Focus on the many ways agents could be tested to contribute to achieving that outcome, rather than locking into a single hypothesis that might succeed or fail in proof-of-concept. Test in the shortest iterations possible (ideally hours or days).
- Shape the team:
- Business Owner — represents the business or technology outcome, provides subject matter expertise, and validates results without necessarily dictating the solution.
- AI Product Owner — owns the outcome and context.
- Agent Engineers — build orchestrations, tools, and safeguards. May include data scientists, system engineers, infrastructure experts, prompt engineers, or back-end developers as needed.
- Risk & Privacy Partner — signs off on scopes, manages dual-control points. This role may not always be required, but is often essential in more complex enterprise contexts.
- Ops/Platform — ensures observability, cost tracking, and rollback paths. This role may not be essential in early-stage experiments, but becomes increasingly important as projects mature into later-stage or production environments.
- UX/UI Designer — redesigns both the business process and the user interface, refining how users interact with the system. Reimagines workflows as technology fills gaps once handled manually, while ensuring a seamless, intuitive experience for users who may not be AI experts. This role may not be critical early on, but becomes increasingly valuable as solutions mature and adoption scales.
- Treat governance as empowerment — enabling teams to push boundaries while holding them accountable for delivering business outcomes quickly. Teams must show that proofs of concept have a clear, rapid path to a product outcome with ROI, and stay close to data that benchmarks how their system will outperform existing BAU costs or quality measures.
- Take care with the approach — traditionally, enterprises have delivered projects in waterfall style, or as agile digital projects with the UI and outcomes defined upfront. Given the emerging nature of this technology, we’ve found it’s better to run projects more like data projects: first securing data quality and building pipelines, then stress-testing the technology to reveal where automation and process gains deliver the most value. Only then do we define the transformed process, deciding how humans and AIs will work together. Surprises often arise over what the machine does better than the human, leading to process change. The UI and digital experience are best left until the final days or weeks of a proof of concept, which can challenge stakeholders seeking early certainty. This demands strong leadership and confidence in experienced teams.
Lens 2 — Experiment, Test, Learn, Realise the Benefit
Don’t try to automate the whole pie. Find thin slices where speed and accuracy can be measured. For each:
- Golden tasks — the 10–20 examples that define “success” in your domain.
- Instrumentation from day one — trace every action, input, and retrieved context.
- Evaluation gates — the agent only “levels up” when tests pass (e.g. accuracy, cost takeout, and safety).
Think of this phase like building muscle memory. By running controlled reps, the system — and the team — learns how to handle edge cases before going live. Where evaluation is rigorous, we’re seeing measurable wins.
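As an illustration of how golden tasks, instrumentation, and evaluation gates fit together, here is a minimal sketch of a gate in Python. The tasks, the exact-match scoring rule, and the accuracy and cost thresholds are hypothetical; in practice golden tasks come from domain experts and scoring is usually richer (rubrics, LLM-as-judge, human review).

```python
# Sketch of an evaluation gate over "golden tasks". Tasks, scoring, and thresholds
# are illustrative assumptions, not a prescribed framework.
from dataclasses import dataclass

@dataclass
class GoldenTask:
    prompt: str
    expected: str  # the answer a domain expert has signed off on

GOLDEN_TASKS = [
    GoldenTask("Map APRA CPS 234 to our information-security controls", "CTRL-IS-01"),
    GoldenTask("Which obligation covers breach notification timelines?", "OBL-BN-07"),
    # ...the 10-20 tasks that define "success" in your domain
]

def score(agent_answer: str, expected: str) -> bool:
    """Naive exact-match scoring; real gates often use rubrics or LLM-as-judge."""
    return expected.lower() in agent_answer.lower()

def evaluation_gate(run_agent, accuracy_threshold: float = 0.9,
                    cost_budget_usd: float = 5.0) -> bool:
    """The agent only 'levels up' in autonomy when the gate passes."""
    passed, total_cost = 0, 0.0
    for task in GOLDEN_TASKS:
        answer, cost = run_agent(task.prompt)  # run_agent returns (answer, cost_usd)
        total_cost += cost
        if score(answer, task.expected):
            passed += 1
    accuracy = passed / len(GOLDEN_TASKS)
    print(f"accuracy={accuracy:.0%} cost=${total_cost:.2f}")
    return accuracy >= accuracy_threshold and total_cost <= cost_budget_usd

if __name__ == "__main__":
    def stub_agent(prompt: str):  # trivially passing stub, for illustration only
        return ("CTRL-IS-01 OBL-BN-07", 0.01)
    print("gate passed:", evaluation_gate(stub_agent))
```

Run on every change, the way you would run a test suite, a gate like this is what allows an agent to take on more autonomy without anyone relying on gut feel.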
Lens 3 — Build, Monitor, and Manage in the Real World
Once you’ve proven through proofs of concept and experimentation that the idea can deliver business benefits and is achievable, you move into building the production system. The question evolves from 'can it do the job?' to 'can it do the job safely, consistently, and transparently?'
That means:
Architecture: The goal is to design systems that are secure, performant, fast to change, and simple to retire. In a fast-moving world, this ensures we can evolve rapidly without over-investing too early or risking obsolescence.
Interoperability: MCP connectors and other integration patterns keep you vendor-portable (MCP Spec, 2024; Microsoft MCP Support, 2025). Some of these patterns are new, so care is needed on security. Most models are already smart enough for most enterprise needs, so gains in raw intelligence with new model releases may not be obvious to most users. The real value lies in safely linking these increasingly agentic, longer-running models to core systems — where safe interoperability, not model IQ alone, drives business benefit.
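One way to keep that portability in practice is to route every tool call through a thin, vendor-neutral interface, with MCP connectors or other integrations hidden behind it. The sketch below is illustrative; the class and tool names are assumptions, not a specific SDK.

```python
# Sketch of a vendor-neutral tool interface so the orchestration layer never depends
# on one provider's tool-calling format. Names are illustrative, not a real SDK;
# an MCP connector, a REST client, or a message queue could sit behind each adapter.
from typing import Protocol

class Tool(Protocol):
    name: str
    description: str
    def invoke(self, arguments: dict) -> str: ...

class CrmLookupTool:
    name = "crm_lookup"
    description = "Fetch a customer record from the CRM."
    def invoke(self, arguments: dict) -> str:
        # In production this might call an MCP server or an internal API.
        return f"record for {arguments['customer_id']}"

class ToolRegistry:
    """Single integration point: swapping vendors means swapping adapters, not agents."""
    def __init__(self) -> None:
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, arguments: dict) -> str:
        if name not in self._tools:
            raise KeyError(f"Unknown tool: {name}")  # fail loudly rather than guess
        return self._tools[name].invoke(arguments)

registry = ToolRegistry()
registry.register(CrmLookupTool())
print(registry.call("crm_lookup", {"customer_id": "C-1042"}))
```

Swapping a model provider or an integration pattern then means swapping an adapter, not rewriting agents, which is exactly the flexibility the architecture goal above calls for.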
Knowledge you can trust: The key to agentic systems is trusted knowledge, built on the right architecture — whether RAG, GraphRAG (Microsoft Research, 2024), fine-tuning, or other patterns. Users must be able to rely on data, trace its source, and verify it when needed. Pipelines and interfaces should centre on trust, with reconciliation and validation at every stage. Without this, adoption, business value, and safety are quickly undermined.
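A small pattern that helps here is keeping provenance attached to every retrieved chunk, so answers can always be traced to a source and a freshness date. The sketch below is a simplified illustration; the retriever, document path, and dates are made up, and the model prompt is omitted.

```python
# Sketch of provenance-aware retrieval: every chunk carries its source and age so
# answers can be traced and verified. Retriever and documents are hypothetical.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str        # e.g. a SharePoint URL or document ID
    last_updated: str  # staleness is a real failure mode, so keep it visible

def retrieve(query: str) -> list[Chunk]:
    """Stand-in for a vector or graph retriever that returns chunks with provenance."""
    return [
        Chunk("Breaches must be notified within 72 hours.",
              "sharepoint://policies/incident-response.docx", "2025-03-01"),
    ]

def answer_with_citations(query: str) -> str:
    chunks = retrieve(query)
    context = "\n".join(f"[{i + 1}] {c.text}" for i, c in enumerate(chunks))
    citations = "\n".join(f"[{i + 1}] {c.source} (updated {c.last_updated})"
                          for i, c in enumerate(chunks))
    # The model would be asked to answer only from the numbered context, and the
    # citations are returned alongside the answer for human verification.
    return f"Answer grounded in:\n{context}\n\nSources:\n{citations}"

print(answer_with_citations("What is our breach notification timeline?"))
```

Whether the retrieval layer is plain RAG, GraphRAG, or something else, surfacing the source and last-updated date alongside the answer is what lets users verify it and keeps trust intact.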
Observability: Great systems have great observability, and in AI this must be shared by business and tech teams. Architecturally, we need to track system performance and evaluations at both model and system levels for performance and risk. On the business side, as agentic systems handle more processes, humans should monitor these AI “teams” like a supervisor: reviewing patterns, processes, inputs, and outputs. This kind of business observability — uncommon in tools like Microsoft Copilot or ChatGPT — is vital for enterprise adoption (LangSmith, Langfuse).
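In practice, that starts with every agent run emitting structured events tied together by a trace ID, covering model calls, tool calls, and outputs, so both engineers and business supervisors can review what happened. The sketch below writes JSON-lines events to a local file; dedicated tools such as LangSmith or Langfuse provide managed versions of this, and the field names here are illustrative rather than their schemas.

```python
# Sketch of structured tracing for agent actions, assuming a simple JSON-lines log.
# Field names are illustrative, not the schema of any particular observability tool.
import json
import time
import uuid

def log_event(trace_id: str, step: str, payload: dict) -> None:
    event = {
        "trace_id": trace_id,    # ties every step of one agent run together
        "timestamp": time.time(),
        "step": step,            # e.g. "model_call", "tool_call", "output"
        **payload,
    }
    with open("agent_trace.jsonl", "a") as f:
        f.write(json.dumps(event) + "\n")

# One agent run, three events an engineer or a business supervisor can review later.
trace_id = str(uuid.uuid4())
log_event(trace_id, "model_call", {"prompt_tokens": 812, "cost_usd": 0.004})
log_event(trace_id, "tool_call", {"tool": "crm_lookup", "input": {"customer_id": "C-1042"}})
log_event(trace_id, "output", {"answer_preview": "Onboarding stalled at KYC...",
                               "flagged_for_review": False})
```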
Governance frameworks: Align with the NIST AI RMF or ISO/IEC 42001, and in Australia with OAIC privacy guidance and the ACSC Essential Eight. These standards are strong foundations, but enterprises should also define AI controls tailored to their own business and technology activities.
If your firm were hiring a person tomorrow, it would have a hiring, onboarding, training, performance management, and offboarding process. Successful AI adoption is largely about thinking of enterprise AI in similar human terms (without falling into the unnecessary trap of anthropomorphising). AI initiatives demand the same care and governance as building operations for people, to ensure AI-enabled processes and systems are productive, safe, and aligned with organisational goals.
4. From Flashy Demos to Trusted Digital Colleagues
Agentic systems will change how enterprises execute work, improving outcomes and reducing cost and risk. The way humans and systems interact will fundamentally change.
The companies who succeed here will be the ones who:
- Anchor projects in business value and safe autonomy.
- Invest early in people, process, and observability.
- See governance as a feature, not a blocker.
For the first time in the history of organisational design, executives face a unique challenge. Traditionally, it’s been about managing people who work with tools; increasingly, organisational design will be about managing humans and AI systems that work together to use tools. These AI systems bring growing agency and autonomy, with the ability to dramatically accelerate business outcomes while also introducing new risks. AI adoption should not be managed as a back-office tech project — it’s a fundamental shift in organisational design.
Don’t anthropomorphise the AI, but viewing it as a team member can help clarify its value and the implementation considerations.
¹ Gartner predicts that at least 30% of generative AI projects fail after the proof-of-concept stage ([Gartner](https://www.gartner.com/en/articles/2025-trends-for-tech-ceos)).