Qwen 3.5 with Ollama performing poorly locally?
opencode with ollama's qwen 3.5 is so dumb?
I installed Ollama's qwen3.5 as my model for opencode (to run things locally), but my experience with it was horrible. When switching to plan mode, qwen3.5 just keeps saying "user switched to plan mode" and then asking me "how can I help you?"
Did anyone succeed in running models locally such that they perform similarly to Big Pickle?
## Summary
Based on the code review and community feedback, I can explain what's happening with your Qwen 3.5 + Ollama setup in OpenCode's plan mode:
### The Core Issue
The problem you're experiencing isn't directly a bug in OpenCode itself, but rather a fundamental mismatch between how Qwen 3.5 is configured in Ollama and how the model was trained. When you switch to plan mode, the model gets confused because:
1. **Tool Calling Format Mismatch** - Ollama's Qwen 3.5 implementation uses the wrong tool-calling format. According to a detailed investigation I found (GitHub issue #14493 in the ollama repo), Qwen 3.5 was trained on XML-style tool calling (`<function=name><parameter=key>value</parameter></function>`), but Ollama sends it JSON-style format instead. This causes the model to hallucinate responses like "how can I help you?" instead of properly executing plan mode instructions.
2. **Thinking Support Issues** - Qwen 3.5 is trained with thinking capabilities (`enable_thinking=true`), but Ollama's implementation has bugs in how it handles thinking blocks (`<think>`/`</think>`), especially with multi-turn conversations in plan mode.
3. **Missing Penalty Sampling** - Ollama's Go runner (where Qwen 3.5 is forced to run) doesn't implement the repetition penalties that Qwen 3.5's model card recommends (`presence_penalty=1.5`), leading to repetition loops.
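To make the format mismatch concrete, here is a schematic side-by-side of the two styles. This is an illustration only (the exact wrappers depend on the chat template in use), and the `read_file`/`path` tool shown is a hypothetical example:

```text
What the model was reportedly trained to emit (XML-style):

  <function=read_file>
    <parameter=path>src/index.ts</parameter>
  </function>

What Ollama reportedly sends and expects back (JSON-style):

  {"name": "read_file", "arguments": {"path": "src/index.ts"}}
```

When the template feeds the model one convention and the runtime parses for the other, tool calls silently fail to parse, and the model falls back to plain chat replies like "how can I help you?"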
### Why It Performs Poorly vs "Big Pickle"
The search results show that users who've successfully run Qwen 3.5 locally report:
- Qwen 3.5 27B (Q3_XXS quantization) works reasonably well when properly configured with llama.cpp (not Ollama), achieving 76%+ accuracy on SWE-bench
- Big Pickle (the model from OpenCode Zen referenced as "big-pickle" in the codebase) significantly outperforms local models because it's specifically optimized for agentic tool-calling and has proper instruction-following
### Recommendations to Improve Your Setup
1. **Switch from Ollama to llama.cpp** - Based on the testing data, llama.cpp handles Qwen 3.5's tool-calling better than Ollama:
   - Use Qwen 3.5 27B (smaller, more reliable) rather than 35B
   - Use Q3_XXS or Q4_K_M quantization
   - Expect ~40-50 tokens/sec generation speed
2. **Configure OpenCode properly for local models:**

   ```json
   {
     "agent": {
       "plan": {
         "model": "local-llm/qwen3.5",
         "temperature": 0.1,
         "topP": 1
       },
       "build": {
         "model": "local-llm/qwen3.5",
         "temperature": 0.55
       }
     }
   }
   ```

3. **Wait for Ollama fixes** - The tool-calling bugs I found are being tracked and may be fixed in future Ollama versions.
4. **Accept a hybrid approach** - The most successful setups run Qwen 3.5 locally for build/implementation (where it's strong) and a cloud model (Claude/GPT) for plan mode (where it struggles). This gives you both cost savings and quality.
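If you take the hybrid route, the per-agent split might look like this. A sketch only: the provider/model IDs below are placeholders for whatever providers you actually have configured in OpenCode, so substitute your own:

```json
{
  "agent": {
    "plan": {
      "model": "anthropic/claude-sonnet-4-5"
    },
    "build": {
      "model": "local-llm/qwen3.5",
      "temperature": 0.55
    }
  }
}
```

The idea is that plan mode leans hardest on instruction-following and tool-calling discipline, which is exactly where the local setup is failing, while build mode tolerates a weaker model because each step is smaller and more concrete.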
### Key Reference Points in the Code
The plan mode system reminder is in `packages/opencode/src/session/prompt/plan-reminder-anthropic.txt`, and the plan mode enforcement is in `packages/opencode/src/session/prompt.ts`, around lines 235-360.
Would you like me to help you set up llama.cpp locally, or explore other configuration options for your OpenCode setup?