I ditched LM Studio for llama.cpp and my native LLM doesn't feel like downgrading.

LM Studio has been my standard run during my stay runs local LLMsit’s not something I’ve experienced anymore, long enough to call it part of my daily flow. The appeal of LM Studio is that it has a GUI, one-click installs, and no command-line stuff to go through. Development is not my domain, not even using the terminal, so a user-friendly running was important to me and that’s what makes me comfortable a self-hosted AI first of all.

But I’m more based on local models I’m actually starting to get more and more into the limits of what LM Studio can do for things that I don’t want a cloud chatbot to touch. Some models are not fully functional, so some flagship features remain untouched in newer models because my runner doesn’t support them yet. A co-worker mentioned llama.cpp to me a while ago and I initially considered it a developer option, but when I finally thought about it, it became clear that I was protecting myself from something more accessible than I thought.

Want to stay up-to-date on the latest developments in artificial intelligence? The XDA AI Insider newsletter drops every week with deep dives, tool tips, and hands-on coverage you won’t find anywhere else on the site. subscribe by change your ballot.

A terminal based runner that I ran for no reason

I got it up and running in five minutes

For the longest time, llama.cpp lived in my mind as an option you could graduate to if you really knew what you were doing. Every installation guide I’ve gone through in the past opens with something about installing the compiler and I’d close the tab before the page would load. None of these were true for my use case. The GitHub release page It has pre-built binaries for Windows, Mac and Linux, separately built depending on your hardware. Literally all it took was to download, unzip, run a command in the terminal, and that was it.

llama.cpp is an open source C++ runtime for handling large language models natively, built by Georgi Gerganov in March 2023, shortly after Meta dropped the LLaMA weights. And in fact, llama.cpp is the main support engine for LM Studio, Ollama, and most other native AI programs you’ve heard of. They’re essentially wraps built around it, so going direct cuts out the middleman. It also ships with a llama-server and a built-in web UI that you access through your browser, so the actual conversation can happen in a clean GUI.

There are real reasons to use it over a GUI runner. Wrappers add overhead, so compared to LM Studio on the same model of llama.cpp on the same hardware, it runs noticeably faster in the 5-20% range, depending on your setup. llama.cpp also tends to support newer models first because it’s the upstream project that everything builds on, while LM Studio and Ollama have to wait for an update cycle. That way, you don’t have to wait for your runner to catch up to use the newest outdoor weights.

Ollama is still the easiest way to start domestic LLMs, but the worst way to continue them

Ollama is great to get you started… just don’t stay.

What drove me away from LM Studio

I’m kind of getting over it

Setting context and settings for qwen in lm studio

I’m actually not going to stop using LM Studio completely, and it’s not a bad tool. But the more I relied on native models for actual workflows, the more I started to run into things that LM Studio didn’t yet support – and the model I wanted to use was one of them.

This the model was Gemma 4 E4B. Its standout feature is native audio input – automatic speech recognition and speech-to-text translation in multiple languages – and this capability is only available on two trim variants, the E2B and E4B. None of the larger Gemma 4 models have this. So if your runner doesn’t sound, you’re using a multimodal model with one of its modes turned off. LM Studio doesn’t support sound at all, which means I haven’t really been able to use Gemma to its full potential, and that’s what finally prompted me to give llama.cpp a spin.

What actually changed after switching to Llama.cpp

A few unexpected victories along the way

The first thing I noticed was something I hadn’t tried to fix and had actually forgotten: Gemma 4’s reasoning combines its reasoning with the response in LM Studio. It was no more. llama.cpp’s WebUI puts the reasoning in a separate collapsible box, so you see the answer and can expand the reasoning if you want. I left myself to live with this bug in LM Studio, but llama.cpp fixed it.

The sound side was the real reason I switched and it works well. You can’t hit the record button without doing some workarounds, but downloading WAV files is a bit faster than setting it up anyway. Its audio analysis is perfect based on my playback tests of the same prompts I sent in text – it interpreted them the same way. Image analysis also meets my expectations. It goes beyond text and can also interpret organic objects and understand the context of a photograph.

But the value I actually got was from the built-in functions. It has custom session system prompts, a cleaner way to connect MCP servers, and more parameter control than LM Studio. Since I’m not using llama.cpp for development, but just for research and daily tasks, these controls did most of the developing for me.

Most of the settings outside of the regular settings (temp, min-p, repeat, and presence penalty) aren’t very relevant to my workflow, but I’ve found settings that actually make a noticeable difference. DRY (Don’t Repeat Yourself) is a smarter version of the repetition penalty – it only kicks in when all phrases start to repeat, single characters, so I don’t see the side effect of the pattern unnaturally avoiding common words like “the”.

Mirostat and Dynamic Temperature should also be noted. Both adjust how the model selects tokens in real-time based on the actual response, rather than locking everything to a fixed value beforehand. This makes for more coherent prose over longer sessions. But these are better suited for more technically creative workflows.

7 things I wish I knew when starting self-hosted LLMs

I’ve been hosting LLMs for a long time and these are all things I’ve learned over time and wish I’d known at the start.

I should have changed sooner

I switched because the model I wanted to use was ahead of the tool I was working with and llama.cpp was actually able to continue. It’s likely that outdoor models will continue even as they become more capable, so it’s worth having a runner that doesn’t fall behind. Also, although llama.cpp is aimed primarily at developers, you don’t have to be one to use it. Once the GUI is up and running, it’s basically a regular chat, but faster, cleaner, and with more controls. LM Studio still has its place for casual use, but llama.cpp has really been an improvement.

Source link

I ditched LM Studio for llama.cpp and my native LLM doesn’t feel like downgrading.

A terminal based runner that I ran for no reason

I got it up and running in five minutes

Ollama is still the easiest way to start domestic LLMs, but the worst way to continue them

What drove me away from LM Studio

I’m kind of getting over it

What actually changed after switching to Llama.cpp

A few unexpected victories along the way

7 things I wish I knew when starting self-hosted LLMs

I should have changed sooner

Leave a ReplyCancel Reply

‘Other ideas… just had to fall by the wayside’: Fallout: New Vegas director says Obsidian had to limit RPG’s scope due to time constraints

Merck and Mastercard are seeing real agent AI results. Both say plumbing came first.

Websites have a new way to spy on visitors: analyzing their SSD activity

A terminal based runner that I ran for no reason

I got it up and running in five minutes

Ollama is still the easiest way to start domestic LLMs, but the worst way to continue them

What drove me away from LM Studio

I’m kind of getting over it

What actually changed after switching to Llama.cpp

A few unexpected victories along the way

7 things I wish I knew when starting self-hosted LLMs

I should have changed sooner

Leave a ReplyCancel Reply

Trending now

‘Other ideas… just had to fall by the wayside’: Fallout: New Vegas director says Obsidian had to limit RPG’s scope due to time constraints

Merck and Mastercard are seeing real agent AI results. Both say plumbing came first.

Websites have a new way to spy on visitors: analyzing their SSD activity