Many businesses want to harness the power of AI but are terrified of leaking proprietary data to public clouds or burning through massive API costs. The solution isn’t a better cloud contract, it’s moving the intelligence to the edge.
I have written extensively how the power of running an LLM locally on your own machine can open up the world of AI without fear of cost or security. I have been running locally extensively for a variety of different services, but of late I had an opportunity to really use it at volume for cataloging images.
From Italy to Innovation: A Case Study in Image Cataloging
We recently returned from a business trip to Italy, as my wife overseen the fall collection for Kelley Kouture her high-end footwear company. Tagging along, I took over a thousand photos, of various footwear, scenes, factories, all with the mindset for stock photos for use in upcoming social media posts as Kelley demands authentic posts.
Instead of just dumping into one big folder, or uploading to Google Photos, I decided to have a little fun and build a Media Library for her, as a place to manage all her digital assets. Simply uploading them wasn’t good enough, I thought, let’s run them through AI to catalog each image. I wanted a description to be produced, as well as tags that would allow it to be easy to search across the photos as she decides what to post next.
I chose to bypass public AI APIs not just for the sake of the challenge, but for data sovereignty. When dealing with proprietary assets like a brand’s entire visual catalog or private internal documents sending that data over the wire to a third-party server creates a permanent security footprint. By running the model locally, the data never leaves the local network. It stays within the ‘walled garden’ of your own infrastructure.
How well would a local AI running an open source LLM handle the load? As it turns out – pretty darn well.
I vibed up a plugin for my Media Library to call Google’s open source Gemma 4 12B model, hosted on LM Studio. Developed my prompt and then let it run. Each image cost me just under 2 seconds, with my GPU running around 70%. My CPU was untouched. Not a single byte left my network.



The actual results, descriptions/tags, where indistinguishable from the results from the latest Claude/Gemini spot tests I performed. Result – and that is thousands of less photos we didn’t send to the public models for their training.
The Benefits of Embedded Models vs. Public APIs (Cost, Security, Privacy)
Imagine what you can do with your product if you embedded AI inside of it? Not call a public API AI endpoint, but actually ship AI with your product.
Shipping local models with your desktop product is feasible today. Adding an embedded general purpose model like Gemma 4 (~9GB) or a model that is specifically tuned for your purpose is just extra disk space. Who cares especially if its adding value to your user. They will thank you.
Your user does not have to worry about burning through expensive token credits, or being locked into a subscription service contract. You don’t have to worry about hosting infrastructure or tracking usage.
This is happening now. Google has been quietly stuffing your machine with a 4GB model with the latest Chrome installs, all with the goal of moving AI processing away from them and closer to you. Ethically, them silently doing this is a little questionable, but for products that specifically call it out as a feature, no problems.
Beyond just saving on token costs, local AI eliminates the ‘Risk Premium.’ When you use a public API, you are essentially paying a premium for convenience while accepting a risk of data exposure. By embedding a model like Gemma 4 directly into your product, you eliminate that risk entirely. You aren’t just cutting costs; you are insulating your company from the liabilities of third-party data processing.
Do not let anyone tell you local AI is not viable. It is not only viable, but accessible now. Open source models are more than good enough for the vast majority of use cases you would probably find yourself needing. As I have noted historically you literally have millions of models to choose from.
Real-World Applications for Local AI
Applications where you can harness the power of local AI, offloading processing and keeping your client data secure by embedding the model include:
- PDF/WORD document scanning
- Long form text clean up
- Image cataloging for ecommerce, social media, or gallery applications
- Data/Reporting Analysis
- Source code analysis/generation
- Architectural Review
- Voice to Text
For some of these, you can even expose the underlying prompt and let the user be completely creative, without worrying about them running up costs in your ecosystem.
The key to success is using the right model for the task you are asking of it – not by simply throwing the most general purpose model you can have. Not everyone will be able to run the embedded models, but choose wisely, and it will just be a timing issue.
In an era of increasing regulation and cyber-threats, ‘Data Sovereignty’ is no longer a luxury; it’s a requirement. When you host your own LLM, you maintain 100% ownership of the data flow. There are no logs kept by third parties, no training sets being built on your proprietary information, and no risk of your private data leaking into a public model’s output. For industries like legal, healthcare, or high-end manufacturing, this is the only way to innovate with AI without compromising compliance.
If you’re still weighing the cost of API tokens against the security of local deployment, it’s time to look at what local models can do for your workflow.






Leave a Reply