What's the 2025 stack for a self-hosted photo library with local AI?

Question

First of all, this is purely a personal learning project for me, aiming to combine three of my passions: photography, software engineering, and my family memories. I have a large collection of family photos and want to build an interactive experience to explore them, ala Google or Apple Photo features.My goal is to create a system with smart search capabilities, and one of the most important requirements is that it must run entirely on my local hardware. Privacy is key, but the main driver is the challenge and joy of building it myself (an obviously learn).The key features I'm aiming for are:Automatic identification and tagging of family members (local face recognition).Generation of descriptive captions for each photo.Natural language search (e.g., "Show me photos of us at the beach in Luquillo from last summer").I've already prompted AI tools for a high-level project plan, and they provided a solid blueprint (eg, Ollama with LLaVA, a vector DB like ChromaDB, you know it). Now, I'm highly interested in the real-world human experience. I'm looking for advice, learning stories, and the little details that only come from building something similar.What tools, models, and best practices would you recommend for a project like this in 2025? Specifically, I'm curious about combining structured metadata (EXIF), face recognition data, and semantic vector search into a single, cohesive application.Any and all advice would be deeply appreciated. Thanks!

crobibero · Accepted Answer

I think Immich checks a lot of thesehttps://immich.app/

iforgotpassword · Answer

I currently use photoprism, but it's moving rather slowly. Facial recognition misses a lot of faces, the automatic clustering works fine at first but once you tagged a few thousand faces the implementation grinds to a halt and the background worker runs for hours pegging single cpu core.
The dev is really reluctant of accepting external contributions, which has driven away a lot of curious folks willing to contribute.
Immich seems to be the other extreme. Moving really fast with a lot of contributors, but stuff occasionally breaks, the setup is fiddly, but the Ai features are 100x more powerful. I just don't like the ui as much as photoprism. I with there was some kind of blend of the two, on a middle ground of their dev philosophies.

ranger_danger · Answer

https://immich.app/https://ente.io/https://photonix.org/https://github.com/LibrePhotos/librephotoshttps://github.com/photoprism/photoprism

mossTechnician · Answer

This may not interest you, but Ente checks most of these boxes for me. It has face recognition and AI-based object search out of the box, and you can self-host their open-source server without any restrictions. The models they used might be useful for your project.

coffeecoders · Answer

I have been building something like this but for personal use.

As of now, I use SentenceTransformer model to chunk files, blip for captioning (“Family vacation in Banff, February 2025”)) and mtcnn with InsightFace for face detection. My index stores captions, face embeddings, and EXIF metadata (date, GPS) for queries like “show photos of us in Banff last winter.” I’m working on integrating ChromaDB for faster searches.

Eventually, I aim to store indexes as:

{

  "filename": "/Vacation/Banff/Wife.jpg",

  "chunk_id": 0,

  "text": "Family at Banff, February 2025",

  "caption_embedding": [0.1, 0.2, ...],

  "face_embeddings": [{"name": "NT", "embedding": [0.3, 0.4, ...]}, ...],

  "exif": {
     
     "DateTimeOriginal": "2025:02:15",

     "GPSCoordinates": "18.387, -65.992"

    }

}

I also built an UI (like Spotlight Search) to search through these indexes.

Code (in progress): https://github.com/neberej/smart-search

nico · Answer

I don't know about the photo-management aspects. However, I've had very good experiences running gemma3 (4b and 12b) locally via ollama
I've used gemma to process pictures and get descriptions and also to respond questions about the pictures (eg. is there a bicycle in the picture?). Haven't tried it for face recognition, but if you already have identified someone in one photo, it can probably tell you if the person in that photo is also in another photo
Just one caveat, if you are processing thousands of pictures, it will take a while to process them all (depending on your hardware and picture size). You could also try creating a processing pipeline, first extracting faces or bounding boxes of the faces with something like opencv, and then passing those to gemma3
Please post repo link if you ever decide to open source

nicoburns · Answer

It's not self-hosted, but https://ente.io/ is an independent commercial solution with E2E encrypted cloud storage and local AI (EDIT: apparently you can also self-host)

elevaet · Answer

I think a really valuable feature in a photo library app would be something that can identify sets of very similar or identical photos and decide which one is the "best" and offer to discard the rest.I must be wasting so much storage on the 4 photos I took in a row of the family pose, or derivatives that got shared on whatsapp and then stored back to my gallery, and so on, and I know I'm not the only one.

wooben · Answer

I've been running Nextcloud in Docker with the Recognize and Memories apps for about a year and half now. It's in an off-lease refurbished Dell Precision tower from 2018.
I'm using docker compose to include some supporting containers like go-vod (for hardware transcoding), another nextcloud instance to handle push notifications to the clients, and redis (for caching). I can share some more details, foibles and pitfalls if you'd like.
I initiated a rescan last week, which stacks background jobs in a queue that gets called by cron 2 or 3 times a day. Recognize has been cranking through 10k-20k photos per day, with good results.
I've installed a desktop client on my dad's laptop so he can dump all of the family hard drives we've accumulated over the years. The client does a good job of clearing up disk space after uploading, which is a huge advantage in my setup. My dad has used the OneDrive client before, so he was able to pick up this process very quickly.
Nextcloud also has a decent mobile client that can auto-upload photos and videos, which I recently used to help my mother-in-law upload media from her 7-year-old iPhone.

gavin_gee · Answer

i swear the single best feature for me would be:take my photo catalog stored in google photos, apple pictures, Onedrive, Amazon photos. collate into a single store, dedupe. Then build a proper timeline and geo/map view for all the photos.

mlunar · Answer

It's a pretty deep rabbit hole. For semantic search CLIP and cosine similarity are just fine. SmolVLM(2) mentioned by spacecadet looks interesting though. I haven't integrated face recognition myself, but [deepface] seemed pretty complete.
I focused more on fast rendering in [photofield] (quick [explainer] if you're interested), but even the hacked up basic semantic search with CLIP works better than it has any right to. Vector DBs are cool, but what is cooler is writing float arrays to sqlite :)
[deepface]: https://github.com/serengil/deepface
[photofield]: https://github.com/SmilyOrg/photofield
[explainer]: https://lnar.dev/blog/photofield-origins/

pmetras · Answer

If you want light configuration requirements with no database, you can try to enhance https://gitlab.com/paolobenve/myphotoshare. MyPhotoShare is a static photo-gallery where AI features have been added through extensions of the parser. This is a one-developer project, mostly Python and JavaScript, and he is open to contributions.

sneak · Answer

I believe Ente supports all of this, and can be self-hosted. All of the AI stuff is done locally.
I pay them for service/storage as it’s e2ee and it doesn’t matter to me if they or I store the encrypted blobs.
They also have a CLI tool you can run from cron on your NAS or whatever to make sure you have a complete local copy of your data, too.
https://ente.io - if you use the referral code SNEAK we both get additional free storage.

ksec · Answer

Slightly Off Topic: I have always wanted (old) Apple to make Time Machine / Personal Cloud where Data is stored and processed in my property. While only offering Subscription based storage as long term storage Cloud backup and software update.For Features. I dont know why there's isn't a tag for Screen Caps. I made lots of them and I want to group them together.

gerdesj · Answer

Nextcloud with a few addons. Now this might look like overkill for your use case but I get the impression that you might want to go further in future.
Stock NC gets you a very solid general purpose document management system and with a few addons, you basically get self hosted SharePoint and OneDrive without the baggage. The images/pictures side of things has seen quite a lot of development and with some addons you get image classification with fairly minimal effort.
The system as a whole will quite happily handle many 100,000 files with pretty rubbish hardware, if you are happy to wait for batch jobs to run or you throw more hardware at it and speed up the job schedules.
NC has a stock phone app which works very well these days, including camera folder uploads. There are several more apps that integrate with the main one to add optional functionality. For example notes and voip.
It is a very large and mature setup with loads of documentation and hence extensible by a determined hacker if something is missing.

chrisgd · Answer

This is my dream. I started building something that would upload all my photos from my phone to my desktop, back them up somewhere and then present them 6 at a time on a local website solely so you could look at them again and decide if you wanted to keep them. Heart any you wanted to keep, favorite some, and delete the rest then show me 6 more.The addition of an AI tool is a great idea.

ssnepenthe · Answer

The gallery I use has an "internals" page in their docs: https://docs.home-gallery.org/internals/It gives a sort of high level system overview that might provide some useful insights or inspiration for you.

weinzierl · Answer

In addition to all of that I want an AI solution that pre-selects good images for me, so I do not have to go through all of them manually. Similar to Apple Memories or Featured Photos. Is there anything self-hosted like that?

simonw · Answer

There are some spectacular local models for generating text descriptions of images now. I suggest starting with Mistral Small 3.2, Gemma 3 and Qwen 2.5VL - all available via Ollama.I expect we will see a Qwen 3VL soon.

ciaranmca · Answer

I have used https://www.photoprism.app/ and have found the face recognition to work quite well.

ggm-at-algebras · Answer

Dedupe over edited photos, and handling highly approximate date information are my "nobody has this right yet" criteria.

slackpad · Answer

Haven&rsquo;t tried it yet (I&rsquo;d love to find something like this too) but I saw a conference talk on https://docs.voxel51.com/ that looked pretty interesting. It is kind of a data frame for images with a GUI for exploring them. They make it pretty easy to rip various models over your images to add tags, and to evaluate the results.

spacecadet · Answer

I built this same solution for myself last year, used Hugging Face's "SmolVLM". It works surprisingly well. I use the model to generate verbose descriptions of each image, embed the descriptions using another model, which I also use for the query embedding.The stack is hacky, since it was mostly for myself...

joesweetsox · Answer

Are any of these systems doing true image based entity resolution? It seems like its only pair-wise similarity checking. If you are trying to index say 20 years of family photos how do they do linking kindergardeners to thier adult images?

stormfather · Answer

I would try the Qwen models before LLaVaDo you need the embeddings to be private? Or just the photos?

SirFatty · Answer

It looks as you are primarily using a phone to view and share? We often (visually) share via our living room TV (via attached computer). Is that something you're looking to incorporate?

xnx · Answer

https://www.digikam.org/ does a lot of what you're looking for.

ProfessorZoom · Answer

I'm also curious as to the best local high quality background removal, such as for gradation images where people are wearing tassels

sigmonsays · Answer

i'm still old school syncthing + photoprism. Perhaps I should give immich a better look

owebmaster · Answer

The Browser. Just pure JavaScript, HTML, CSS and WebGPU running on a bulletproof sandbox.

kkfx · Answer

Photoprism and Immich

What's the 2025 stack for a self-hosted photo library with local AI?

I think Immich checks a lot of these
https://immich.app/

https://immich.app/
https://ente.io/
https://photonix.org/
https://github.com/LibrePhotos/librephotos
https://github.com/photoprism/photoprism