Are there any open models that can actually compete with proprietary ones like GPT 5.5 Extended Thinking or Claude Opus 4.7? I am getting really good results with those in their chat interfaces for coding tasks. They sometimes spend 30-45 minutes working on my task and have an internal container they are doing tool calls on, like cloning a repository and compiling their code, and can find online documentation. Their answers are very good and usually correct for very complex tasks requiring specific protocols.
So I would like to know how well we can replicate this using open models since I want more control over how it runs, and privacy. Do any of you hook in agentic capabilities into your local models? How do you do it, and which models give you good results?
Pretend I have unlimited resources (local llama.cpp, sufficient fast storage/memory, and unlimited time to wait for a good response).


I run a quant of Qwen 35B A3B (
Qwen3.6-35B-A3B-GGUF:UD_Q4_K_XL) at the moment, using Opencode and llama.cpp. I’m getting useful work out of it - but it’s of course not Claude. My hardware is a 5060Ti with 16GB VRAM and then ~20GB or so of system mem is getting used as well.It’s important to put boundaries on less capable models though, so I have two plugins in Opencode as well that really makes a big difference to the results:
@tarquinen/opencode-dcp@latestandsuperpowers@git+https://github.com/obra/superpowers.git.I want to work in small steps with good control over what the models do so it’s not very similar to what you describe with just having them run away for half an hour and do everything.