{"key":"local_ai_coding_setup_guidance_2026_04_05","title":"Local AI Coding Setup Guidance for Jason on 2026-04-05","content":"On 2026-04-05, discussed how to reduce paid ChatGPT/Codex usage by shifting routine coding work to a locally hosted model on the user's Windows PC running LM Studio.\n\nUser hardware context:\n- CPU: AMD Ryzen 9 5900XT, 16 cores / 32 threads\n- GPU: XFX Radeon RX 7900 GRE with 16 GB VRAM\n- System RAM: 16 GB DDR4 3200\n\nAssessment of hardware:\n- This machine is strong enough to run useful local coding models in LM Studio.\n- The GPU is good enough for small-to-mid coding models and is the key enabler.\n- The weakest point is system RAM at 16 GB; it is enough to start, but it limits comfort and headroom for larger models, larger contexts, and background multitasking.\n- Practical target model size is 7B to 14B. Larger models are possible only with more tradeoffs and are not recommended as the default workflow on this machine.\n\nRecommended model strategy:\n- Do not chase one giant model.\n- Use a 2-model or 3-model local stack.\n- Best practical local coding setup for this machine:\n  1. Qwen2.5-Coder 14B as the main coding model\n  2. Qwen3 14B as the broader planning / mixed coding + reasoning model\n  3. 
Qwen2.5-Coder 7B as the fast lightweight fallback\n\nSpecific model recommendations discussed:\n- Qwen2.5-Coder 14B\n  - Best first model if the main goal is coding\n  - Good for writing code, refactors, shell scripts, config edits, and routine bugfixes\n  - Recommended first quant to try: Q4_K_M, then Q5_K_M if performance remains acceptable\n- Qwen3 14B\n  - Better all-around model for coding plus general reasoning and assistant work\n  - Good for architecture conversations, broader repo understanding, and mixed planning + implementation tasks\n  - Recommended first quant: Q4_K_M\n  - Use lower reasoning / no_think mode for routine work\n- Qwen2.5-Coder 7B Instruct\n  - Fast fallback model for simple code edits, scripts, command generation, config updates, and low-latency tasks\n  - Recommended quant: Q5_K_M or Q6_K if performance is still acceptable\n\nLM Studio guidance that was discussed:\n- Start with manageable context lengths, around 8192, instead of trying to maximize context immediately.\n- Use GPU offload aggressively / maximally if available.\n- Keep temperature low for coding work, roughly 0.2 to 0.3.\n- Keep prompts file-scoped and task-scoped.\n- Do not load giant repo context unless necessary.\n\nRecommended practical usage split:\n- Use local models for:\n  - file inspection\n  - code drafting\n  - repetitive refactors\n  - shell scripts\n  - config editing\n  - draft documentation\n  - routine debugging\n- Use frontier paid models like ChatGPT/Codex for:\n  - subtle bugs\n  - architecture decisions\n  - migration plans\n  - high-confidence review\n  - ambiguous or high-risk failures\n\nImportant distinction explained to the user:\n- A model by itself does not install software, modify files, or operate like Codex automatically.\n- The model can write code on its own, but it can execute installation / shell / file operations only if the host application gives it tool access.\n- Without tools, LM Studio with a local model is only a text generator / chat interface.\n- 
With tools, it can become an agent-like assistant.\n\nDirect answer given about local Qwen models and agentic coding:\n- Yes, local Qwen models can write code.\n- Yes, they can potentially modify files and run commands if connected to tooling.\n- No, plain LM Studio chat by itself is not the same thing as a full coding agent like Codex in this environment.\n- If the user wants the local system to behave more like Codex — inspect files, edit code, run installs, test, retry — they need an agent framework around the model, not just the model alone.\n\nAgent/tooling options discussed conceptually:\n- LM Studio + MCP can help with retrieval and file visibility, but plain chat is not the best fit for end-to-end agentic coding.\n- More capable agent-style options would be things like:\n  - OpenCode + local model\n  - Aider + local model\n  - OpenHands or similar agent shells\n- Conclusion given: for “write code and install it like Codex,” plain LM Studio chat is not the best primary solution.\n\nPractical recommendation given to the user:\n- Use local Qwen as a first-pass coding assistant / coding agent for routine work.\n- Let it edit files and run safe commands only if placed behind an agent framework or tool layer.\n- Keep the human in the loop for installs, service restarts, production-impacting changes, and anything high-risk.\n- Use paid frontier models only for escalation and review.\n\nGuidance related to reducing usage rate on paid plans:\n- The best way to reduce Plus / ChatGPT usage is not only MCP.\n- MCP helps retrieve project information and avoid repeated pasting, but it does not directly remove message caps.\n- The stronger strategy is:\n  - local model for cheap routine work\n  - paid model for hard blockers\n  - projects / stable project context / file-based context to reduce repetition\n  - avoid using heavy reasoning modes for ordinary coding turns\n\nOverall conclusion stored from this discussion:\n- The user’s machine is absolutely capable of useful 
local AI coding work.\n- The most realistic and practical starting point is Qwen2.5-Coder 14B, with Qwen3 14B as a secondary broader reasoning model and Qwen2.5-Coder 7B as a fast fallback.\n- Local models can meaningfully reduce ChatGPT/Codex usage.\n- To approach Codex-like “write code and install it” behavior, the user needs a tool-enabled agent framework around the local model, not just LM Studio chat alone.\n- The best next implementation step after this planning would be to choose the actual local agent stack: LM Studio + MCP only, OpenCode + local model, Aider + local model, or a similar agent framework.\n\n---\nStored after discussion on local coding models, LM Studio suitability, hardware fit, model recommendations, and the difference between plain chat use and tool-enabled agentic coding.","summary":"On 2026-04-05, discussed how to reduce paid ChatGPT/Codex usage by shifting routine coding work to a locally hosted model on the user's Windows PC running LM Studio.\n\nUser hardware context:\n- CPU: AMD Ryzen 9 5900XT, 16 cores / 32 threads\n- GPU: XFX Radeon RX 7900 GRE with 16 GB VRAM\n- System RAM: 16 GB DDR4 3200\n\nAssessment of hardware:\n- This machine is strong enough to run useful local coding models in LM Studio.\n- The GPU is good enough for small-to-mid coding models and is the key enabler.\n- The weakest point is system RAM at 16 GB; it is enough to start, but it limits comfort and headroom for larger models, larger contexts, and background multitasking.\n- Practical target model size is 7B to 14B. Larger models are possible only with more tradeoffs and are not recommended as the default workflow on this machine.\n\nRecommended model strategy:\n- Do not chase one giant model.\n- Use a 2-model or 3-model local stack.\n- Best practical local coding setup for this machine:\n  1. Qwen2.5-Coder 14B as the main coding model\n  2. Qwen3 14B as the broader planning / mixed coding + reasoning model\n  3. 
Qwen2.5-Coder 7B as the fast lightweight fallback\n\nSpecific model recommendations discussed:\n- Qwen2.5-Coder 14B\n  - Best first model if the main goal is coding\n  - Good for writing code, refactors, shell scripts, config edits, and routine bugfixes\n  - Recommended first quant to try: Q4_K_M, then Q5_K_M if performance remains acceptable\n- Qwen3 14B\n  - Better all-around model for coding plus general reasoning and assistant work\n  - Good for architecture conversations, broader repo understanding, and mixed planning + implementation tasks\n  - Recommended first quant: Q4_K_M\n  - Use lower reasoning / no_think mode for routine work\n- Qwen2.5-Coder 7B Instruct\n  - Fast fallback model for simple code edits, scripts, command generation, config updates, and low-latency tasks\n  - Recommended quant: Q5_K_M or Q6_K if performance is still acceptable\n\nLM Studio guidance that was discussed:\n- Start with manageable context lengths, around 8192, instead of trying to maximize context immediately.\n- Use GPU offload aggressively / maximally if available.\n- Keep temperature low for coding work, roughly 0.2 to 0.3.\n- Keep prompts file-scoped and task-scoped.\n- Do not load giant repo context unless necessary.\n\nRecommended practical usage split:\n- Use local models for:\n  - file inspection\n  - code drafting\n  - repetitive refactors\n  - shell scripts\n  - config editing\n  - draft documentation\n  - routine debugging\n- Use frontier paid models like ChatGPT/Codex for:\n  - subtle bugs\n  - architecture decisions\n  - migration plans\n  - high-confidence review\n  - ambiguous or high-risk failures\n\nImportant distinction explained to the user:\n- A model by itself does not install software, modify files, or operate like Codex automatically.\n- The model can write code on its own, but it can execute installation / shell / file operations only if the host application gives it tool access.\n- Without tools, LM Studio with a local model is only a text generator / chat interface.\n- 
With tools, it can become an agent-like assistant.\n\nDirect answer given about local Qwen models and agentic coding:\n- Yes, local Qwen models can write code.\n- Yes, they can potentially modify files and run commands if connected to tooling.\n- No, plain LM Studio chat by itself is not the same thing as a full coding agent like Codex in this environment.\n- If the user wants the local system to behave more like Codex — inspect files, edit code, run installs, test, retry — they need an agent framework around the model, not just the model alone.\n\nAgent/tooling options discussed conceptually:\n- LM Studio + MCP can help with retrieval and file visibility, but plain chat is not the best fit for end-to-end agentic coding.\n- More capable agent-style options would be things like:\n  - OpenCode + local model\n  - Aider + local model\n  - OpenHands or similar agent shells\n- Conclusion given: for “write code and install it like Codex,” plain LM Studio chat is not the best primary solution.\n\nPractical recommendation given to the user:\n- Use local Qwen as a first-pass coding assistant / coding agent for routine work.\n- Let it edit files and run safe commands only if placed behind an agent framework or tool layer.\n- Keep the human in the loop for installs, service restarts, production-impacting changes, and anything high-risk.\n- Use paid frontier models only for escalation and review.\n\nGuidance related to reducing usage rate on paid plans:\n- The best way to reduce Plus / ChatGPT usage is not only MCP.\n- MCP helps retrieve project information and avoid repeated pasting, but it does not directly remove message caps.\n- The stronger strategy is:\n  - local model for cheap routine work\n  - paid model for hard blockers\n  - projects / stable project context / file-based context to reduce repetition\n  - avoid using heavy reasoning modes for ordinary coding turns\n\nOverall conclusion stored from this discussion:\n- The user’s machine is absolutely capable of useful 
local AI coding work.\n- The most realistic and practical starting point is Qwen2.5-Coder 14B, with Qwen3 14B as a secondary broader reasoning model and Qwen2.5-Coder 7B as a fast fallback.\n- Local models can meaningfully reduce ChatGPT/Codex usage.\n- To approach Codex-like “write code and install it” behavior, the user needs a tool-enabled agent framework around the local model, not just LM Studio chat alone.\n- The best next implementation step after this planning would be to choose the actual local agent stack: LM Studio + MCP only, OpenCode + local model, Aider + local model, or a similar agent framework.\n\n---\nStored after discussion on local coding models, LM Studio suitability, hardware fit, model recommendations, and the difference between plain chat use and tool-enabled agentic coding.","status":"active","namespace":"projects","namespace_name":"projects","namespace_tier":"shared","tags":[]}