2026-02-26 09:50:33 CET

captjack on Nostr:

single RTX 3090. 24 GB VRAM. Qwen3.5-35B-A3B. 4-bit quant, 113 tokens per second at the full 262K context. running Claude Code locally: no API, no subscription, no proxy.

told it what it is: 30 Mamba2 layers, 10 attention layers, 256 experts, 8 active per token. said "build something that shows off what you can do."
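for anyone who hasn't seen MoE routing up close, here's a toy sketch of the "256 experts, 8 active per token" idea. this is a generic top-k gate in plain Python, not the actual Qwen routing code; the gate scores are fake random numbers.

```python
import math, random

def route(logits, k=8):
    # pick the top-k experts by gate score, then softmax over just those k
    topk = sorted(range(len(logits)), key=lambda i: logits[i])[-k:]
    m = max(logits[i] for i in topk)
    w = [math.exp(logits[i] - m) for i in topk]
    s = sum(w)
    return topk, [x / s for x in w]

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(256)]  # fake gate scores, one per expert
experts, weights = route(scores)
print(len(experts), round(sum(weights), 6))  # 8 1.0
```

every token gets its own top-8, so only ~3B of the 35B parameters fire per step. that's why it's fast on one card.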

it visualized its own architecture. interactive. tokens flowing through layers. 256 experts lighting up on routing. served in the browser from the same GPU running inference. single prompt.

then i said level up. 3D. Three.js. separate files. flythrough camera. clickable layers. it planned first, scaffolded 6 files, hit one API bug, fixed it itself, then optimized for smooth framerate. two iterations to a working 3D neural network explorer.

llama.cpp just merged a native Anthropic endpoint. Claude Code points at localhost. the whole setup is two commands. no LiteLLM. no proxy config.
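roughly the shape of those two commands. model filename, port, and context size here are illustrative, not copied from the post; ANTHROPIC_BASE_URL is Claude Code's base-URL override.

```shell
# 1. serve the model locally (llama-server from llama.cpp;
#    the .gguf filename is a placeholder for your 4-bit quant)
llama-server -m qwen3.5-35b-a3b-q4_k_m.gguf -c 262144 --port 8080

# 2. point Claude Code at the local Anthropic-style endpoint
ANTHROPIC_BASE_URL=http://localhost:8080 claude
```

that's it. no translation layer in between, since the server speaks the Anthropic API natively now.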

JUST A SMALL EXAMPLE OF WHAT'S POSSIBLE LOCALLY