As I build agents every day, I get to see Agentic Frameworks evolve different approaches to the same core issues of agentic orchestration. Fundamentally, the differences come down to how the framework designers think about the common building blocks of an agentic system. Unfortunately, they all come with disparate terminology and adopt jargon that makes this seem more complicated than it really is. I feel this is creating a bit of decision paralysis, especially for beginners. This series of blogs and videos is my attempt to surface the patterns underlying all agentic systems.
This post is a companion to a video I posted a couple of days ago; I encourage you to watch it!
Here’s the experiment setup
I implemented the same agentic system using three different Frameworks:
- LangGraph
- OpenAI’s Agents SDK
- Google’s Agent Development Kit (ADK)
In all cases, the agent uses a “supervisor” architecture: a single agent receives most user prompts and ultimately decides whether to delegate a task to other, more specialized agents. In this case, I have one Google agent, capable of reading and sending emails, and one Slack agent, capable of reading and sending messages on Slack. I enforce explicit human-in-the-loop (HITL) approval in all “send” tools.
And of course, since these tools are real integrations, I implemented them using Arcade.dev.
I intend to add more frameworks to the list, but reaching 3 is enough to see the emerging pattern of agent orchestration. It’s also a nice setup for a short but thorough comparison.
I will compare these frameworks across multiple aspects, the first being Human-in-the-Loop.
What is Human-in-the-loop (HITL)?
As with many things in the software engineering and computer science world, this is one of those terms that has a flexible definition depending on the context.
Wikipedia defers to the DoD’s 1998 definition:
> “A model requiring human interaction”
Which basically means pretty much anything you can imagine. To have a useful definition in the context of agentic systems, I want to provide a “hard” definition of HITL:
> An autonomous model with steps that mandate human interaction
Not extremely different from 1998, but now we can at least say that if we mark a tool as “needs human interaction before running” and the agent doesn’t ask for permission to run the tool, we have a problem.
How does each framework approach HITL?
Fortunately, it is possible to strictly enforce HITL flows in all the frameworks I tested. However, each takes a unique approach.
What’s pretty universal about how the frameworks approach this is that they always involve a tool that will get information from the (human) user. This gives us a clue into the “natural” feature of LLMs that will enable HITL constructs: function-calling (i.e., tools). So at the core of it, the question can be mapped to how easy (or hard) it is to control the flow around a tool call.
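In pseudocode, strict HITL enforcement boils down to a gate around every sensitive tool call, something like this framework-agnostic sketch (all names are illustrative):

```python
# The universal shape of strict HITL: a gate that runs before every
# sensitive tool call and can block execution entirely.
def call_tool_with_hitl(tool, args, needs_approval, ask_human):
    if needs_approval(tool) and not ask_human(tool, args):
        return "Action blocked: the user denied approval."
    return tool(**args)
```

Each framework differs mainly in where you’re allowed to place that gate, and how much plumbing it takes to get there.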
Google’s Agent Development Kit
This framework approaches HITL with callbacks. At the time of writing, however, HITL is not properly documented. Their repo does have a sample called human_in_loop, but its suggested approach is “enforcing” a call to ask for approval through prompt engineering, which doesn’t fit my definition above: the agent could hallucinate and simply call the function without ever asking.
What I like about their approach is its simplicity. Callbacks allow you to intercept and control the flow of information based on the context before and after the tool call. If you do your checks in the before callback and return None, the real tool is invoked and the flow continues as usual. If you want to intercept, return something else, like a string or a dictionary, and that value will be treated as if it came from the tool.
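Here’s a minimal sketch of that mechanism, assuming recent ADK import paths; the stubbed tool and the ask_user_for_approval helper are mine, not ADK’s:

```python
from typing import Optional

from google.adk.agents import Agent
from google.adk.tools.base_tool import BaseTool
from google.adk.tools.tool_context import ToolContext

SENSITIVE_TOOLS = {"send_email"}  # tools gated behind HITL approval

def send_email(to: str, body: str) -> dict:
    """Sends an email (stubbed out for this sketch)."""
    return {"status": "sent", "to": to}

def ask_user_for_approval(name: str, args: dict) -> bool:
    # Hypothetical helper: in a real app this would surface a UI prompt.
    return input(f"Allow {name}({args})? [y/N] ").strip().lower() == "y"

def before_tool(tool: BaseTool, args: dict, tool_context: ToolContext) -> Optional[dict]:
    # Returning None lets the real tool run; returning a dict skips the
    # tool and is treated as if it were the tool's output.
    if tool.name in SENSITIVE_TOOLS and not ask_user_for_approval(tool.name, args):
        return {"result": "Action blocked: the user denied approval."}
    return None

agent = Agent(
    name="supervisor",
    model="gemini-2.0-flash",
    tools=[send_email],
    before_tool_callback=before_tool,
)
```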
What I don’t like about this is that I need to handle the “marking” of the tools outside of the agentic orchestration code. If I want a specific before callback for each tool, I have to handle that routing myself. Not great ergonomics, in my opinion.
OpenAI’s Agents SDK
This framework does not support this kind of control flow very well. You can enforce HITL by code injection: that is, creating a custom function tool that is compatible with the Agents SDK, then wrapping its bound on_invoke_tool function so that the actual call runs inside your own control-flow logic. On top of this inconvenience, it’s awkward to return something from the wrapper as if it came from the tool.
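To make the wrapping concrete, here’s roughly what I mean; a sketch assuming a tool created with the SDK’s function_tool decorator, where ask_user_for_approval is a hypothetical helper of mine:

```python
import functools

from agents import function_tool

@function_tool
def send_email(to: str, body: str) -> str:
    """Sends an email (stubbed out for this sketch)."""
    return f"Email sent to {to}."

def ask_user_for_approval(name: str, args_json: str) -> bool:
    # Hypothetical helper: in a real app this would surface a UI prompt.
    return input(f"Allow {name}({args_json})? [y/N] ").strip().lower() == "y"

# function_tool returns a FunctionTool whose on_invoke_tool is an async
# function taking the run context and the JSON-encoded arguments.
original_invoke = send_email.on_invoke_tool

@functools.wraps(original_invoke)
async def gated_invoke(ctx, args_json: str) -> str:
    if not ask_user_for_approval(send_email.name, args_json):
        # The awkward part: this string is returned as if the tool produced it.
        return "Action blocked: the user denied approval."
    return await original_invoke(ctx, args_json)

send_email.on_invoke_tool = gated_invoke  # inject our own control flow
```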
What I like about their approach is that it forces you to learn Python more deeply (if I didn’t know about functools this post would have been very different).
What I don’t like about this is that I consider control flow an essential part of designing agentic systems. I know we want to have more autonomous agents that do all the useful things and everything, but if I’m deploying something to production it better allow me to say “WAIT, DO NOT SEND THAT EMAIL” without me having to jump through too many hoops.
LangGraph
LangGraph is the only framework (so far) that has specific documentation for HITL. Their approach to this problem relies on interrupts and the underlying idea of a graph state that can be resumed at any point in time. Those are sophisticated enough that I won’t explain them thoroughly here, but I strongly recommend reading about them in the LangGraph docs.
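Still, to give a flavor of the pattern, here’s a minimal sketch: an interrupt inside a tool, a checkpointer so the paused graph can be resumed, and a Command(resume=...) to continue. The model string and the payload shape are my choices, not something the docs prescribe:

```python
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent
from langgraph.types import Command, interrupt

def send_email(to: str, body: str) -> str:
    """Send an email, pausing for human approval first."""
    # interrupt() suspends the graph and surfaces this payload to the caller;
    # execution resumes here with the value passed via Command(resume=...).
    decision = interrupt({"tool": "send_email", "to": to, "body": body})
    if decision != "approve":
        return "Action blocked: the user denied approval."
    return f"Email sent to {to}."

agent = create_react_agent(
    "openai:gpt-4o",  # any chat model id accepted by init_chat_model
    tools=[send_email],
    checkpointer=MemorySaver(),  # interrupts require a checkpointer to resume
)

config = {"configurable": {"thread_id": "demo"}}
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Email bob@example.com saying hi"}]},
    config,
)
# The run is now paused; result["__interrupt__"] carries the payload above.
resumed = agent.invoke(Command(resume="approve"), config)
```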
This is my favorite approach by far. I believe interrupts are the correct abstraction for this (they are a type of explicit control flow primitive), and I believe graphs are the correct modeling tool for multi-agent systems. Of the three, this one was the easiest to implement: I simply followed the docs and, tada, I had a working agent at the other end.
What I don’t like about LangGraph/LangChain is that I think there are too many layers of abstraction in the framework, and it feels verbose and slightly bloated. However, that is not what I’m evaluating in this post, so I’ll leave that rant for another time.
So, which one is the best framework then?
Well, like many things in life: ✨it depends✨
If you’re shipping to production today, definitely LangGraph. If you’re exploring and learning, I’d recommend Google ADK (for now). It has the right ideas, and the implementation is at such an early stage that it’s easier for beginners to understand the architecture behind the framework and “see” the patterns more clearly. If you want to explore more, give the OpenAI Agents SDK a chance as well; it’s not amazing for human-in-the-loop, but it’s valuable to see how they approach agentic orchestration.
Try it today!
The code and resources for this experiment are open-source.
You will need:
- An Arcade.dev API key
- An AgentOps API key (for tracing and observability)

Then clone the repo.
Happy building!