I stopped trusting cloud AI with my personal files and switched to this local setup instead


I use AI tools a large part of my productivity and creative tasks now, so uploading files to them has become second nature at this point. Things like course notes, research papers, screenshots, image references, and just random PDFs. Bringing in outside context is, in my opinion, one of the most useful and underappreciated aspects of AI. But among my design notes and article research are more personal documents like health reports and bank statements.

My tech background is in audio engineering, design and video editing, so I’m not a crowd talking about server infrastructure. A few years ago, I could upload personal files anywhere without a second thought. To be honest, I’m even more motivated now because the file analysis and RAG are so well done. But once I started to understand a little more about how servers work, I really didn’t. Once something leaves your device, it’s pretty much in your hands, and “we process your data securely” means something different to every company — and not always what it seems.

Cloud AI and our files have a longer relationship than we think

There are terms and conditions, but nobody actually reads all of them

Offers twins with file

When you upload a file to Cloud AI, it’s not just read and thrown away. The model has to live somewhere while processing it, and that’s their infrastructure, which are servers that you don’t see at all. Even after you delete something from your chat history, traces are likely to remain in backups and logs. It is almost impossible for you to completely remove it from a distributed system like this.

Take ChatGPT for example. It stores files separately from your chat history, so deleting a chat doesn’t even delete the document – they’re managed independently and stay there until you specifically delete them. Also, standard accounts retain chat history indefinitely unless you manually delete it, and even then it takes up to 30 days to actually delete it in the background. By default, Claude doesn’t use your data for training, and deleted conversations are deleted for 30 days – but if you’ve opted to upgrade the model at any point, the retention period extends to five years.

Gemini keeps conversations for 18 months by default, but if a person views one of your reviewer sessions, that data is kept separately for up to three years, regardless of whether you delete it. NotebookLM is probably the cleanest of the bunch, it doesn’t track your downloads at all and files stay in place until you delete them. It’s still Google infrastructure though, so the same caveats apply – ie your data is on their remote servers and may be subject to internal policies, backups, legal holds or whatever.

My point is that none of these tools know what type of file you just downloaded, there is no personal file filter. The bank statement and the general research paper are handled in the same way and I find it a bit difficult to ignore.


Running DeepSeek on Radha Orion O6

I don’t pay for ChatGPT, Confusion, Gemini or Claude – instead I stick to self-hosted LLMs

No sense relying on AI tools when my native LLMs can handle everything

How can I manage my personal files now?

The local install I went through

My actual setup for anything private is a native LLM via LM Studio. Nothing I write to it ever leaves my machine, so there are no retention windows or terms of service for how my entries will be used sequentially. The privacy aspect isn’t even why I started local LLMs to be completely honest (it was curiosity in the first place), but it quickly became clear that it was a good reason to continue using local models. For many, using local AI for sensitive documents is the whole point of going local. For reference, mine the base domestic model is the Qwen 3.5 9Bbut i still use gpt-oss 20B from time to time.

The more capable than I expected to work with documents enters. LM Studio has had built-in document support for a while now – add a file and if it fits in the model’s context window, everything will be loaded directly into the query. If it’s too long for that, it switches to RAG, breaks up the document, and draws the most relevant sections based on what you’re asking. It’s not as seamless as NotebookLM for very long files, but for a shorter health report or contract, it handles it without a problem. I have it too Joined the Brave Quest MCPso if the model needs external context or up-to-date information to make sense of something in the document, it can pull it from the web mid-conversation without me changing tools or adding my personal information to a search engine.

That’s almost all. Also, I’m not signed in to a Microsoft account so it doesn’t affect me, but it’s worth checking if you are – OneDrive syncs certain folders automatically by default, so your files may not be as local as you think. At first I just didn’t want to deal with Microsoft’s setup instructions, but now I only keep a local Windows account on purpose. I use it to sync with my other devices Synchronizationit exposes minimal metadata and performs no server-side encryption or telemetry unless explicitly enabled. As for backup: my tried and true, encrypted little physical hard drive. No need for large cloud storage for text and PDF files.


Accessing Open Notebook from the Web UI

4 reasons Open Notebook is the best standalone NotebookLM alternative

You no longer need to share your research data with Google

The context ceiling is real

But documents worth preserving are generally not long documents

gwen couldn't finish the answer

The honest limitation is context. I can move my native model to a 60k context window with a small GPU. That’s a lot, or at least it sounds like a lot until you factor in the back-and-forth for a work session, especially if you’re dealing with information you don’t generally understand, like genetics testing. So it’s worth paying attention to your prompts to make sure you get better responses without wasting verses on repeat requests.

Another thing is long documents. Something like NotebookLM will almost always handle it more reliably because it has a context ceiling that’s hard to compete with natively. The solution for these situations is simple: split the files. There are self-administered tools for this, e.g OmniToolsso you can control the data. I’ve only had to do this once or twice with a private document; most of the time the types of documents I’m talking about here are quite small.

It’s worth it for files that are important to install

For most files it doesn’t matter where they are processed, at least not for me. But there’s a specific category it does, and it doesn’t satisfy the “trust us, we de-identify it” answer after reading the privacy policy. A local installation isn’t without limitations – your hardware will really dictate what you can run and how long your sessions can last. But now I almost have a new criterion: if I don’t feel comfortable having a stranger see it, I probably shouldn’t upload it to cloud AI.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *