Forum Moderators: phranque

Recommendation for Site Search

smallcompany

10:47 pm on Jan 25, 2025 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hello,

I was looking into site search options, but got puzzled as many looked like overkill. I mainly looked at open source, for one reason only: so I could try it first. I don't mind paying for it if I know "that is it."
Google's Programmable Search Engine is easy and OK, except that I would not want any external links to show, and Google serves its own "search at Google" link.
As I was going through some of the solutions, I noticed many were search only, so you need another "thing" to crawl your site(s). I came across something called OpenSearchServer, which does both, but it looked very outdated to me. I like the all-in-one concept though.
Finally, there is security to think about, right?

This would be for sites hosted on an Apache/Ubuntu platform, with up to 300 static (if that matters) HTML pages. The preferred solution would be a single one that collects the data and handles the internal site search for visitors. Best of all would be one installation with multiple search instances for multiple websites. These websites do not get heavy traffic, so the load would be OK for sure. Yes, I prefer a local installation over a cloud-based one.

I wonder about your experience, as most sites feature site search these days.

Thank you

NickMNS

5:37 pm on Jan 26, 2025 (gmt 0)

I don't know of any products or services, but one could build a pretty good system using vector search. You take your 300 pages, use an embedding model to convert the content to vectors, and store the vectors in a vector database. Then create a search form on your website, take the query, embed it, search for it in the DB, and return the results to a search results page. You can use open source models for the embedding and for the DB, but the problem is that it takes a pretty powerful and thus costly machine to make it work smoothly, so you'd be better off using a cloud service. So there can be a cost. I've used Cloudflare; they have a pretty generous free tier that would likely cover your use case. There are many options available now.

To be clear, vector search is semantic search: if you have a recipe site and search for "Apple pie", it would return results that are semantically similar, like "Apple Strudel", "Apple Crumble", "Peach Cobbler", etc.
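
For anyone who wants to see the shape of that pipeline, here is a minimal sketch in Python. It is not a real semantic search: `embed` is a toy stand-in (plain word counts) for an embedding model, the in-memory `index` dict stands in for a vector database, and the page URLs and text are made up. With a real embedding model in place of `embed`, the rest of the flow (embed the pages, embed the query, rank by similarity) stays the same.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    A real embedding model would return a dense vector that captures meaning;
    this one only matches on shared words.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical pages; in practice these would be the text of the 300 HTML files.
pages = {
    "/apple-pie.html": "Classic apple pie with a butter crust",
    "/apple-strudel.html": "Apple strudel with raisins and cinnamon",
    "/contact.html": "Contact us by email or phone",
}

# "Vector database": here just a dict of URL -> vector, built once.
index = {url: embed(text) for url, text in pages.items()}


def search(query: str, top_k: int = 3) -> list[str]:
    """Embed the query and return the URLs of the most similar pages."""
    q = embed(query)
    scored = [(cosine(q, vec), url) for url, vec in index.items()]
    scored = [(score, url) for score, url in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in scored[:top_k]]


results = search("apple pie")
```

Because the toy `embed` only counts words, "apple pie" finds the strudel page through the shared word "apple", not through meaning. A real embedding model is what would also surface something like "Peach Cobbler".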

thecoalman

7:45 pm on Jan 27, 2025 (gmt 0)

Static as in static content or static files? As far as I know it doesn't work for static .html files, but if the content is DB driven, Sphinx can be installed. You need to be able to install it, which usually requires a VPS or better, and allocate RAM to it. Lightning fast results even for large content sites.

Marshall

11:28 am on Jan 28, 2025 (gmt 0)

Check out Zoom Search

Mark_A

3:31 pm on Feb 17, 2025 (gmt 0)

Hi smallcompany, keep looking. When I made my own sites I installed a site search which both spidered the site and delivered the searches. I can't remember what it was called, but it was popular with small to medium sites where the webmasters were not overly technical, and you could wrap the search and results page in your own website header and footer so it really looked like part of your site. If I can dig it out I will, but my files from that period are probably gone. Oh, and did I mention, I am pretty sure it was free as well!

smallcompany

8:44 pm on Feb 19, 2025 (gmt 0)

Thanks very much. I am playing with Lunr.js, seems quite manual, lol, but may work fine for a smaller website. I'll continue searching for sure.
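
For reference, the core of a small keyword search like this is an inverted index: a map from each word to the pages that contain it, built once ahead of time. Below is a rough Python sketch of that idea only; it is not Lunr.js's API, and the site names and page text are made up. It keeps one index per site, which is one way to picture "one installation, multiple search instances."

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: term -> set of URLs containing that term."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in docs.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the URLs that contain every term of the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


# Hypothetical sites and pages, one separate index per site.
site_a = {
    "/index.html": "Welcome to our small company",
    "/pricing.html": "Pricing for small teams",
}
site_b = {
    "/about.html": "About the company history",
}
indexes = {
    "site-a.example": build_index(site_a),
    "site-b.example": build_index(site_b),
}

hits_a = search(indexes["site-a.example"], "small company")
hits_b = search(indexes["site-b.example"], "small")
```

A library adds the parts that make this pleasant to use (stemming, ranking, field weights), but for a few hundred static pages the index itself stays tiny.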

JennyWilson

11:17 am on Mar 3, 2025 (gmt 0)

Top Contributors Of The Month



On Apache/Ubuntu, Typesense or Meilisearch are usually worth considering for self-hosting: both are fast, lightweight, and pretty easy to set up. They index and search content with no need for a separate crawler, provided you can generate a structured data set (e.g., in JSON).
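
As a rough illustration of the "structured data set" step, here is a sketch that turns a static HTML page into a JSON record using only the Python standard library. The field names (`url`, `title`, `content`) and the sample page are assumptions for illustration; each engine has its own rules for document IDs and schemas, so check its documentation before importing.

```python
import json
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collect the <title> and the visible text of a page, skipping scripts and styles."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks: list[str] = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_to_record(url: str, html: str) -> dict[str, str]:
    """Build one search record (hypothetical field names) from a page's HTML."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "content": " ".join(parser.chunks),
    }


# Made-up sample page; in practice you would loop over the .html files on disk.
sample_html = (
    "<html><head><title>Apple Pie</title><style>p{color:red}</style></head>"
    "<body><h1>Apple Pie</h1><p>Bake for 45 minutes.</p></body></html>"
)
record = page_to_record("/apple-pie.html", sample_html)
records_json = json.dumps([record], indent=2)
```

Running something like this over all 300 files after each site update would give one JSON file per site, ready to import into the search engine.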

With Apache Nutch and Solr, you can set up a crawler and a search engine, but if all you have is 300 static pages, that might be overkill. Another good alternative is Sphinx. From a security standpoint, make sure your search endpoints are not exposed unnecessarily, and restrict access to admin/config areas. Hopefully, this helps!