February 12, 2025 9:00 AM

LangChain: From Chains to Threads

LangChain is one of the most exciting frameworks to emerge in AI development. It took what was once a scattered, low-level process — managing LLM calls, chaining outputs, handling retrieval — and wrapped it in a structured, developer-friendly abstraction. It made things easier. Before LangChain, you had to stitch everything together manually, from memory management to prompt templating. Now, with a few lines of code, you could assemble an LLM-powered pipeline in a fraction of the time.
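
To make that concrete, here’s roughly what such a pipeline looks like in LangChain’s expression language. This is a minimal sketch: it assumes the langchain-openai package is installed and an OpenAI API key is set in the environment.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# A three-step pipeline: prompt template -> model -> output parser,
# composed with LangChain's "|" operator.
prompt = ChatPromptTemplate.from_template(
    "Summarize the following in one sentence:\n\n{text}"
)
llm = ChatOpenAI(model="gpt-4o-mini")
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"text": "LangChain wraps LLM calls in composable steps."}))
```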

But as AI applications evolve beyond simple workflows into fully interactive, real-time systems, cracks are starting to show. LangChain works great when you’re treating an LLM like a function in a structured data flow, but things break down when you try to use it for complex, stateful, interactive agents. Developers are running into issues with debugging, performance, and flexibility. Some feel it abstracts too much, taking away control, while others find that it doesn’t abstract enough, still requiring manual glue code for critical features like prompt engineering and tool integration.

Most of these pain points — performance, gaps in the documentation, debugging ergonomics, API flexibility — will be addressed over time. But there’s a deeper issue, one that may be harder to fix.

LangChain Was Forged in the Data Pipeline World

LangChain wasn’t designed in isolation — it was built in the data pipeline world, where the tool of choice for every data engineer was the Jupyter notebook. Jupyter was an innovative tool, making pipeline programming easy to experiment with, iterate on, and debug. It was a perfect fit for machine learning workflows, where you preprocess data, train models, analyze outputs, and fine-tune parameters — all in a structured, step-by-step fashion.

When LLMs arrived, they were slotted into this existing pipeline mindset. At first, they were treated just like any other step in an ML workflow. You pass some text into an LLM, get an output, maybe filter or analyze it, and move on to the next step. The integrations were built fast. Now, with LangChain, you could call an LLM inside your Jupyter notebook and easily chain together multiple LLM-powered steps. And at first, it was amazing.

But LLMs aren’t just another step in a pipeline. They aren’t image classifiers or search indexes. They are interactive reasoning engines, capable of engaging in open-ended conversations, dynamically adapting to inputs, and maintaining context over long interactions. And that’s where the chain model starts to feel limiting.

Applications Are Not Built as Chains

Real-world applications don’t follow a simple chain of execution. They aren’t even structured as directed acyclic graphs. Instead, they are event-driven systems, where multiple subsystems interact concurrently and respond dynamically to user input.

Modern applications — whether it’s an AI-powered chatbot, a search assistant, or a customer service agent — aren’t built like an ML pipeline. Instead, they’re structured around threads, stateful interactions, and concurrent processes. A request comes in, background tasks execute, a database is queried, and new information is surfaced dynamically. These are complex, multi-layered architectures where LLMs are just one component of a larger system.
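
To illustrate the shape of the problem (every name and timing below is hypothetical), here is an event-driven sketch in plain asyncio where the LLM call is just one concurrent task among several:

```python
import asyncio

async def search_documents(query: str) -> list[str]:
    await asyncio.sleep(0.1)  # stand-in for a vector-store query
    return [f"doc matching {query!r}"]

async def call_llm(prompt: str) -> str:
    await asyncio.sleep(0.2)  # stand-in for a provider API call
    return f"answer to {prompt!r}"

async def handle_user_message(msg: str, events: asyncio.Queue) -> None:
    # Retrieval and the LLM call run concurrently, not as chain steps.
    docs, reply = await asyncio.gather(search_documents(msg), call_llm(msg))
    await events.put({"reply": reply, "sources": docs})

async def main() -> None:
    events: asyncio.Queue = asyncio.Queue()
    await handle_user_message("What changed in the last release?", events)
    print(await events.get())

asyncio.run(main())
```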

LangChain’s chain-based abstraction works well for structured LLM workflows, like document processing or batch summarization. But as soon as an agent needs to track context, handle multiple concurrent tasks, or adapt dynamically to changing inputs, the model becomes a constraint rather than an enabler.

This is why we’re already seeing developers look for alternatives. Goose, a new agent framework, wasn’t built in LangChain — it was written in Rust, optimized for speed and scalability, with a focus on real-time AI applications. The fact that developers are already reaching for lower-level, more application-friendly architectures suggests that the limitations of LangChain’s chain model are becoming more apparent.

What Would an AI Application Framework Look Like?

I’m not sure. AI development is moving so fast that anything we define today might be obsolete in a few months. But if we step back and think about the challenges developers face when building AI-powered applications, a few key ideas stand out. These aren’t definitive answers, but they point toward the kinds of abstractions that might be needed to move beyond the limitations of chain-based frameworks.

LLM-Managed Prompt Optimization

Right now, most AI applications rely on handwritten, static prompt templates, but this approach is fragile. LLMs are absurdly sensitive to minor variations in phrasing, word order, and formatting, often producing wildly different outputs for nearly identical instructions. Developers spend an unreasonable amount of time tweaking prompts manually, even though frameworks like DSPy already exist to let models iteratively optimize their own instructions through automated evaluation and search. A modern AI framework should treat prompts as dynamic artifacts, allowing LLMs to refine their own instructions over time instead of requiring manual intervention.
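
As a sketch of the idea (this is not DSPy’s actual API, and complete() is a hypothetical stand-in for any chat-completion call), a naive self-optimization loop might look like this:

```python
# Hypothetical stand-in for a chat-completion call to any provider.
def complete(prompt: str) -> str:
    raise NotImplementedError("wire up your provider here")

def score(template: str, examples: list[tuple[str, str]]) -> float:
    # Fraction of examples where the output contains the expected answer.
    hits = 0
    for question, expected in examples:
        output = complete(template.format(question=question))
        hits += expected.lower() in output.lower()
    return hits / len(examples)

def optimize(seed: str, examples: list[tuple[str, str]], rounds: int = 5) -> str:
    best, best_score = seed, score(seed, examples)
    for _ in range(rounds):
        # Ask the model to propose a clearer rewrite of its own prompt.
        candidate = complete(
            "Rewrite this prompt to be clearer and more precise. "
            "Keep the {question} placeholder.\n\n" + best
        )
        candidate_score = score(candidate, examples)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best
```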

Seamless LLM-Agnostic Integration

A proper AI framework should be model-agnostic, seamlessly supporting OpenAI, Anthropic, Mistral, and fine-tuned proprietary models without major architectural changes. The AI ecosystem is evolving too fast for developers to lock themselves into a single provider, and switching between models should require minimal code changes. APIs should be abstracted in a way that makes model selection flexible, allowing applications to test multiple providers and dynamically switch based on cost, latency, or accuracy.
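
One way to get there is a thin provider-agnostic interface. The sketch below assumes the official openai and anthropic Python SDKs, API keys in the environment, and model names that may need updating over time:

```python
from typing import Protocol

class ChatModel(Protocol):
    def generate(self, prompt: str) -> str: ...

class OpenAIChat:
    def __init__(self, model: str = "gpt-4o-mini") -> None:
        from openai import OpenAI
        self._client, self._model = OpenAI(), model

    def generate(self, prompt: str) -> str:
        resp = self._client.chat.completions.create(
            model=self._model,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content or ""

class AnthropicChat:
    def __init__(self, model: str = "claude-3-5-sonnet-latest") -> None:
        import anthropic
        self._client, self._model = anthropic.Anthropic(), model

    def generate(self, prompt: str) -> str:
        resp = self._client.messages.create(
            model=self._model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.content[0].text

def answer(model: ChatModel, question: str) -> str:
    # Application code depends only on the protocol, so swapping
    # providers is a one-line change at construction time.
    return model.generate(question)
```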

Built-In Auditing and Observability

AI applications are inherently unpredictable, and developers need deep visibility into how models behave in production. LangSmith already does a great job of providing structured logging, tracing, and analytics for LLM interactions, enabling developers to inspect failure patterns and debug issues effectively. Any AI framework should offer built-in observability tools, capturing input-output mappings, latency metrics, and contextual reasoning logs to ensure model performance can be systematically improved.
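
At the framework level, the primitive could be as simple as a tracing wrapper around every model call. This is a hypothetical sketch; a real system would ship these JSON events to a trace store instead of printing them:

```python
import functools
import json
import time
import uuid

def traced(llm_fn):
    # Record each call as a structured event: input, output, latency, errors.
    @functools.wraps(llm_fn)
    def wrapper(prompt: str, **kwargs):
        event = {"id": str(uuid.uuid4()), "prompt": prompt}
        start = time.perf_counter()
        try:
            output = llm_fn(prompt, **kwargs)
            event["output"] = output
            return output
        except Exception as exc:
            event["error"] = repr(exc)
            raise
        finally:
            event["latency_ms"] = round((time.perf_counter() - start) * 1000, 1)
            print(json.dumps(event))
    return wrapper

@traced
def call_llm(prompt: str) -> str:
    return f"echo: {prompt}"  # stand-in for a real provider call

call_llm("hello")
```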

Robust Tool and API Integration

LLMs are not powerful in isolation — their real value comes from invoking external tools, searching databases, calling APIs, and interfacing with structured knowledge sources. A robust AI framework should provide first-class support for tool integrations, making it easy to define, test, and validate API interactions. Implicit function bindings should be introspectable and debuggable, ensuring that AI agents reliably interact with external systems without requiring excessive boilerplate.
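
Here is a minimal sketch of what introspectable tool bindings could look like. All names are hypothetical, and a real framework would also derive a JSON schema from each signature for the model to consume:

```python
import inspect

TOOLS: dict[str, dict] = {}

def tool(fn):
    # Register the function along with its introspectable metadata.
    TOOLS[fn.__name__] = {
        "fn": fn,
        "signature": str(inspect.signature(fn)),
        "doc": inspect.getdoc(fn) or "",
    }
    return fn

@tool
def get_weather(city: str) -> str:
    """Return the current weather for a city."""
    return f"Weather in {city}: 18C, clear"  # stand-in for a real API call

def invoke_tool(name: str, **kwargs):
    spec = TOOLS[name]
    # Validate arguments against the signature before calling, so bad
    # bindings fail loudly instead of letting the agent silently drift.
    inspect.signature(spec["fn"]).bind(**kwargs)
    return spec["fn"](**kwargs)

print(TOOLS["get_weather"]["signature"])       # (city: str) -> str
print(invoke_tool("get_weather", city="Lisbon"))
```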

Streaming Output as a First-Class Feature

Traditional application frameworks assume a request-response cycle where a function is called, a JSON response is returned, and processing continues. AI applications don’t always work this way — LLMs frequently stream responses token by token, requiring a framework to handle incremental processing rather than blocking execution until the full response is available. Streaming is particularly important for chat interfaces, multi-agent interactions, and real-time summarization tasks, where downstream processes need access to partial results before the full response has completed. Any AI framework should make streaming a core abstraction, ensuring that applications can handle real-time token-by-token processing naturally.
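
As a sketch, with a fake token source standing in for a provider’s streaming API:

```python
import asyncio
from typing import AsyncIterator

async def stream_llm(prompt: str) -> AsyncIterator[str]:
    # Stand-in for a provider's token stream.
    for token in ["The ", "answer ", "is ", "42."]:
        await asyncio.sleep(0.05)
        yield token

async def main() -> None:
    partial = ""
    async for token in stream_llm("What is the answer?"):
        partial += token
        # A UI or downstream consumer can act on partial output here,
        # long before the full response has completed.
        print(partial)

asyncio.run(main())
```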

Intelligent Retry Mechanisms

Unlike traditional API calls, where retry logic is simple (e.g., if the request fails, retry with the same input), LLM calls are far more complex to retry. Failures in AI applications often result in partially incorrect or low-confidence responses, meaning a simple retry isn’t enough. Instead, applications may need to evaluate the output before deciding to retry, sometimes requiring another LLM to assess whether the result meets certain quality thresholds. A robust AI framework should provide built-in support for failure detection, adaptive retry logic, and self-correcting mechanisms, allowing models to recognize and recover from mistakes dynamically.
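
A sketch of that pattern, with complete() once again a hypothetical stand-in for a provider call and a second model call acting as the judge:

```python
# Hypothetical stand-in for a chat-completion call to any provider.
def complete(prompt: str) -> str:
    raise NotImplementedError("wire up your provider here")

def looks_acceptable(question: str, answer: str) -> bool:
    # A second LLM call judges the first output against a quality bar.
    verdict = complete(
        f"Question: {question}\nAnswer: {answer}\n"
        "Does the answer fully address the question? Reply YES or NO."
    )
    return verdict.strip().upper().startswith("YES")

def answer_with_retries(question: str, max_attempts: int = 3) -> str:
    prompt = question
    for _ in range(max_attempts):
        answer = complete(prompt)
        if looks_acceptable(question, answer):
            return answer
        # Retry with feedback folded in, not the identical input.
        prompt = (
            f"{question}\n\nYour previous answer was judged incomplete:\n"
            f"{answer}\n\nTry again, and be more specific."
        )
    return answer  # best effort after exhausting retries
```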

The Real Question: Who Evolves First?

LangChain is at a crossroads. Either it evolves into a true AI application framework, or existing application frameworks will integrate LLM abstractions and render it unnecessary. The next phase of AI development isn’t about chains — it’s about event-driven, stateful, and streaming-capable architectures.

And the real question is: Who will get there first?

Patrick Chan

