Think of it like this: with Claude Code as the main agent, it could call another model like GLM 4.6 to offload part of the workload, delegating routine or automatable tasks in order to minimize Claude's token consumption. For example, models like GLM 4.6, Gemini 2.5 Pro, Kimi K2, Qwen-Coder, and other open or paid models accessible via APIs could be orchestrated together to distribute work intelligently. Think of it like a grep-style tool: instead of running a grep command itself, Claude delegates the work to another LLM, which executes it. Because some of these LLMs are trusted to do their tasks well (e.g., GLM 4.6), they would only need to return a summary of what they did. Beyond that, they could directly apply changes to the code and persist the results. Essentially, a framework that coordinates multiple models the way Claude Code does with its subagents.
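A rough sketch of what that delegation could look like, assuming the secondary model sits behind an OpenAI-compatible chat endpoint (the base URL, model id, and the summarize-only instruction below are all placeholders, not a real framework):

```python
# Minimal sketch: hand a routine edit to a cheaper model and keep only a summary,
# so the main agent (Claude Code) never sees the full files or diff.
from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint for the delegate model (e.g. GLM 4.6).
delegate = OpenAI(base_url="https://example-provider/v1", api_key="...")

def delegate_task(task_description: str, files: dict[str, str]) -> str:
    """Send a task plus file contents to the delegate; return only a short summary."""
    prompt = (
        "Apply the following change to the given files, then reply with a "
        "3-5 line summary of what you changed. Do not echo the files back.\n\n"
        f"Task: {task_description}\n\n"
        + "\n\n".join(f"--- {path} ---\n{body}" for path, body in files.items())
    )
    resp = delegate.chat.completions.create(
        model="glm-4.6",  # placeholder model id
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content  # summary only, cheap for the main agent
```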
This is kind of interesting. I was literally just using AI to move static file config into config.py
when I noticed I already had a cfg.py. So I asked it whether it recommended merging them. Here's the answer.
>The user wants to merge app/cfg.py and app/config.py. I'll advise against it. app/config.py handles static configuration from a file, while app/cfg.py manages dynamic, shared state between processes. Merging them would conflate these distinct concerns, harming code clarity. I will explain this separation of concerns to the user.
So it's really on you to decide how to collaborate. Of course it's ultimately my decision; near the top of this agentic coding session I refused quite a number of its choices, since they were adding duplicate imports and spawning new threads for no reason.
>I’m looking for a framework that coordinates multiple models similar to how Claude Code uses subagents.
These do exist, but I haven't tried any and can't remember their names. I subscribe to this YouTube channel that often covers this sort of thing: https://www.youtube.com/@intheworldofai
The one I'm waiting for is one that uses a big cloud AI to architect but local models for the grunt work, since I can run GPT20B or Qwen3 30B locally and code smaller parts very effectively. A rough sketch of that split is below.
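For what it's worth, the cloud-architect / local-grunt split is easy to prototype by hand. A minimal sketch, assuming both sides speak the OpenAI chat API and the local model is served by something like Ollama (the URLs and model names are placeholders):

```python
# Sketch of routing: big cloud model plans, small local model writes the code.
from openai import OpenAI

cloud = OpenAI(api_key="...")                                      # hosted model for architecture
local = OpenAI(base_url="http://localhost:11434/v1", api_key="x")  # e.g. Ollama serving a ~20-30B model

def ask(role: str, prompt: str) -> str:
    # Crude role-based routing; a real framework would decide this automatically.
    if role == "architect":
        client, model = cloud, "gpt-4.1"    # placeholder cloud model id
    else:
        client, model = local, "qwen3:30b"  # placeholder local model tag
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

plan = ask("architect", "Design the module layout for a small CLI todo app.")
code = ask("coder", f"Implement the first module from this plan:\n{plan}")
```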