How I Added a Second GPU to Run Qwen3.6-27B in 64K Context Fully in VRAM
One of the things I’ve been trying to optimize lately is running larger local models with higher context windows without spilling over into system RAM.
I’ve been running Ollama with Qwen3.6-27B on my NVIDIA RTX 3090, which has 24GB of VRAM. It works great, but once I pushed the context window above 32K, I got a pretty nasty split between CPU and GPU memory.
It worked, but it was noticeably slower.
As anyone running local models knows, once you leave pure VRAM and start involving system RAM, performance drops fast. It’s usable, but it’s definitely not what you want (especially for coding, agent workflows, or anything with a lot of long-running context).
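If you want to see the same behavior on your own machine, here’s a rough sketch using the official ollama Python client (this assumes a local Ollama server is running; the model tag and the 64K value are illustrative, not a recommendation). After a request like this, `ollama ps` will show how the loaded model is split between CPU and GPU.

```python
# Minimal sketch: request a large context window and let Ollama decide
# where the model and KV cache end up. Assumes the "ollama" Python
# package is installed and the model tag below has been pulled.
import ollama

response = ollama.chat(
    model="qwen3:27b",           # illustrative tag -- use whatever you actually pulled
    messages=[{"role": "user", "content": "Hello"}],
    options={"num_ctx": 65536},  # 64K context; if the KV cache no longer fits
                                 # in VRAM, part of the load spills to system RAM
)
print(response["message"]["content"])
```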
I wanted to stay fully in VRAM.
The problem is that in 2026, GPU prices are still absolutely ridiculous.
I was complaining about this to my wife, explaining that I really didn’t want to do a full upgrade right now, when she asked a very good question:
“Can’t you just use your old GPU from your old computer and combine it with this one?”
Honestly, that was such a good suggestion I was annoyed I hadn’t thought of it first.