TF-IDF? No. Just kidding. Information retrieval, text mining, ML/DL?! What is going on with this field? Every other resource seems outdated. What is the state of the art?
Reading some of these posts:
https://www.dr-josiah.com/2010/07/building-search-engine-using-redis-and.html
https://stevenloria.com/tf-idf/
https://stories.algolia.com/a-search-engine-in-css-b5ec4e902e97
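Several of the posts above build a vector-space search engine around TF-IDF. The core idea fits in a few functions, sketched here under stated assumptions (a naive lowercase tokenizer, raw term counts, idf = log(N / df), cosine similarity). It is not the code from any of those posts, and the function names are hypothetical:

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_idf(docs):
    # idf(t) = log(N / df(t)), where df(t) = number of docs containing t.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf_vector(text, idf):
    # Raw term frequency times idf; terms never seen in the corpus are dropped.
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(a, b):
    # Cosine similarity of two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, docs):
    # Return indices of docs with a non-zero score, best match first.
    idf = build_idf(docs)
    q = tfidf_vector(query, idf)
    scores = [(cosine(q, tfidf_vector(d, idf)), i) for i, d in enumerate(docs)]
    return [i for s, i in sorted(scores, reverse=True) if s > 0]
```

Note that a term appearing in every document gets idf 0 and therefore cannot help ranking, which is the point of the weighting.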
I’d start off by not doing state of the art, because it’s overkill for an “MVP”. And if you don’t need proper browser rendering of pages, there are open-source crawlers out there, like Nutch, that might work. If you’re making one yourself, the outdated academic papers and presentations by search companies are a good resource, as the basic ideas of crawling and indexing haven’t changed much (even if ranking and other components have changed a lot). A search engine is really a set of related components, and there are many examples out there to use as inspiration for your MVP.
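The indexing component mentioned above is classically an inverted index: a map from each term to the set of documents containing it. A minimal in-memory sketch, assuming whitespace-free alphanumeric tokens and boolean AND queries only (the class and method names are hypothetical; real engines store compressed, sorted posting lists on disk):

```python
import re
from collections import defaultdict

class InvertedIndex:
    """Hypothetical minimal in-memory inverted index: term -> set of doc ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, text):
        # Store the document and record doc_id in the posting list of each term.
        self.docs[doc_id] = text
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[term].add(doc_id)

    def query_and(self, query):
        # Boolean AND: documents that contain every query term.
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return set()
        # Intersect starting from the shortest posting list to keep sets small.
        lists = sorted((self.postings.get(t, set()) for t in terms), key=len)
        result = set(lists[0])
        for posting in lists[1:]:
            result &= posting
        return result
```

Usage: after `idx.add("c", "a crawler feeds the index")`, the query `idx.query_and("crawler index")` returns `{"c"}`. Ranking is a separate step layered on top of the candidate set.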
https://danluu.com/sounds-easy/
It describes some of the difficulties in building the next Google Search much better than I could.
Nor is it some site-wide search.
It is not a business model either.
Not a "privacy first but no results" search.
Something that works.
TLDR: my regex search engine needs result ranking to be more useful before I consider showing it to other humans.
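One way to add ranking to a regex search engine, sketched under an assumption the post does not state: treat each regex as a single "term", count its matches per document as term frequency, and score with BM25 (k1 = 1.2, b = 0.75 are common defaults, and the idf uses the Lucene-style `log(1 + ...)` variant so it stays positive). The function name is hypothetical:

```python
import math
import re

def bm25_rank_regex(patterns, docs, k1=1.2, b=0.75):
    """Rank docs matching any of `patterns` with BM25, one regex = one term."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    n = len(docs)
    lengths = [len(d.split()) for d in docs]  # doc length in whitespace tokens
    total = sum(lengths)
    avgdl = total / n if n and total else 1.0
    # tf[i][j] = number of matches of pattern j in doc i
    # (assumes patterns that cannot match the empty string).
    tf = [[len(rx.findall(d)) for rx in compiled] for d in docs]
    df = [sum(1 for row in tf if row[j] > 0) for j in range(len(compiled))]
    scores = []
    for i, row in enumerate(tf):
        score = 0.0
        for j, f in enumerate(row):
            if f == 0:
                continue
            idf = math.log(1 + (n - df[j] + 0.5) / (df[j] + 0.5))
            # Length normalization: long docs need more matches for the same score.
            norm = f + k1 * (1 - b + b * lengths[i] / avgdl)
            score += idf * f * (k1 + 1) / norm
        if score > 0:
            scores.append((score, i))
    # Highest score first; ties broken by document index.
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]
```

The rarer pattern dominates: with `[r"err\w*", r"disk"]`, a short log line containing both "error" and "disk" outranks a longer one that only mentions "disk" twice.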