[seeking solution] self hosted search engine
My latest Google search replacement recently made a decision that basically forces me to turn off ad block in order to click results. I was wondering if there was any self hosted solution that is fairly easy to deploy in TrueNAS SCALE, or if it is even worth doing. Bonus points if it's federated somehow. I'll deal with bad results if it needs time to grow as a project.
I also want to add that what little self hosting I've done so far has felt like cutting out a festering cancer and it feels so good to be in control of my online life again. Thanks so much for the guidance since the Rexxit. Finding out that you could easily self host a Reddit replacement with other people was what got me going into this to begin with.
Perhaps https://github.com/searxng/searxng ?
It looks like a few people are recommending this, so just a quick note in case people are unaware:
If you want to avoid being tracked, this is not a good solution. Searxng is a meta search engine, meaning it is effectively a proxy: you search on Searxng, it searches multiple sites and sends all the results back to you. If you use a public instance, you may be protected from the actual search engine*, because many people will use the same instance, and your queries will be mixed in with all of them. If you self host, however, all the searches will be your own - there is then no difference between using Searxng and just going to the site yourself.
*The caveat with using the public instances is while you may be protected from the upstream engine, you have to trust the admins - nothing stops them from tracking you themselves (or passing your data on).
Despite the claims in their docs, I would not consider this a privacy tool. If you are just looking for a good search engine, this may work, and it gives you flexibility and power to tune it yourself. But it's probably not going to do anything good for your privacy, above and beyond what you can get from other meta search engines like Startpage and DuckDuckGo, or other "private" search engines like Brave.
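To make the "meta search engine is effectively a proxy" point concrete, here is a minimal sketch of the merge step only. It is not SearXNG's actual code or ranking method; the function names, the engine names, and the reciprocal-rank scoring are assumptions chosen for illustration, and the upstream results are hard-coded rather than fetched.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivial variants collapse together."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    return host + parts.path.rstrip("/")


def merge_results(per_engine: dict[str, list[str]]) -> list[str]:
    """Merge ranked URL lists from several engines into one ranked list.

    Hypothetical scoring: each engine adds 1/rank to a URL's score
    (rank starts at 1), so URLs that several engines return, or that
    rank highly, end up near the top.
    """
    scores = defaultdict(float)
    first_seen = {}  # normalized key -> first original URL we saw
    for urls in per_engine.values():
        for rank, url in enumerate(urls, start=1):
            key = normalize(url)
            scores[key] += 1.0 / rank
            first_seen.setdefault(key, url)
    ranked = sorted(scores, key=lambda k: scores[k], reverse=True)
    return [first_seen[k] for k in ranked]


# Stand-in for what two upstream engines returned for one query.
upstream = {
    "engine_a": ["https://example.org/a", "https://example.org/b"],
    "engine_b": ["https://www.example.org/b/", "https://example.org/c"],
}
merged = merge_results(upstream)
```

Note that nothing in this step hides who is asking: on a self-hosted instance, every upstream request still comes from your own IP.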
OP isn't asking for a secure search engine though, they're asking for one without ads that they can control themselves. Also, while SearXNG and other meta search engines won't necessarily protect you from data harvesting, they will protect you from tracking cookies and the absolute trash mountain of fake results (imo especially noticeable with Google search).
google's results got so bad recently I had to turn it off in my searxng instance
Use Yandex.ru, if you are looking for free access to the content in English. https://www.reddit.com/r/Piracy/comments/nd7w7s/lpt_if_you_cant_find_a_torrent_via_google_because/
They are explicitly trying to move away from Google, and are looking for a new option because their current solution is forcing them to turn off ad-blocking. Sounds to me like they are looking for a private option. Plus, given the forum in which we are having the discussion (Lemmy), even if OP is not specifically concerned with privacy, it seems likely other users are.
As for cookies, searxng can't do any more than your browser (possibly with extensions) can do, and relying on your browser here is a much better solution, because it protects you on all sites, rather than just on your chosen search engine.
"Trash mountain" results is a whole separate issue - you can certainly tune the results to your liking. But literally the second sentence of their GitHub headline is touting no tracking or profiling, so it seems worth bringing attention to the limitations, and that's all I'm trying to do here.
You're partially right about self hosting, but it still strips out the user tracking scripts and only provides the pure results, and you can make SearXNG route over Tor.
I noted in another comment that SearXNG can't do anything about the trackers that your browser can't do, and solving this at the browser level is a much better solution, because it protects you everywhere, rather than just on the search engine.
Routing over Tor is similar. Yes, you can route the search from your SearXNG instance to Google (or whatever upstream engine) over Tor, and hide your identity from Google. But then you click a link, and your IP connects to the IP of whatever site the results link to, and your ISP sees that. Knowing where you land can tell your ISP a lot about what you searched for. And the site you connected to knows your IP, so they get even more information - they know every action you took on the site, and everything you viewed. If you want to protect all of that, you should just use Tor on your computer, and protect every connection.
This is the same argument for using Signal vs WhatsApp - yes, in WhatsApp the conversation may be E2E encrypted, but the metadata about who you're chatting with, for how long, etc is all still very valuable to Meta.
To reiterate/clarify what I've said elsewhere, I'm not making the case that people shouldn't use SearXNG at all, only that their privacy claims are overstated, and if your goal is privacy, all the levels of security you would apply to SearXNG should be applied at your device level: Use a browser/extension to block trackers, use Tor to protect all your traffic, etc.
I'm not an expert but one could funnel all web traffic through a VPN if they needed right? Gaining possibly even more obscurity and shifting the trust to a company vs a small user
(whether that's an upgrade in privacy or not is relative)
You mean between their instance and the final search engines? Or between them and a public instance of searxng?
In either case, I'm not sure it buys you anything in terms of privacy you wouldn't get by using the VPN and going directly to the search engines.
Seconding this. I use it and it works fairly well.
I had no idea that was what that was. Learn something new every day.
I'm really happy with my searxng instance
Like others, I use searxng.
But you can also try whoogle and librex
I am using searxng
You can customize a lot and the results are good imo
Now I need to figure out how to make it public/accessible so I can use it when I'm out and about.
Easiest way if it's only for yourself is using tailscale
Set up a VPN. Safest / best way to do it
Whoogle is great. https://hub.docker.com/r/benbusby/whoogle-search
Search engines take a LOT of work to run, which is why there's so few of them. You can self-host a search engine that indexes one site, but not one that indexes the entire internet lol. The closest you'll find is SearxNG as others mentioned. It's not a search engine itself though; it just uses other search engines.
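For a sense of what "a search engine that indexes one site" means at its smallest, here is a rough sketch of an inverted index over a handful of pages. The page paths and text are made up, and a real setup would also need a crawler, ranking, and persistent storage; this only shows the core lookup idea.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page paths that contain it."""
    index = defaultdict(set)
    for path, text in pages.items():
        for token in tokenize(text):
            index[token].add(path)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    matches = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        matches &= index.get(token, set())
    return matches


# Made-up pages standing in for one small site.
pages = {
    "/install": "How to install the self hosted search app",
    "/wiki": "Self hosted wiki setup notes",
}
site_index = build_index(pages)
hits = search(site_index, "hosted search")
```

Doing this for one site is easy; the hard part the comment refers to is crawling and storing the whole web, which this sketch does not attempt.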
Yacy is pretty great.
Yes, Yacy is what you want OP (https://yacy.net). It's rather pathetic that people are still trying to be a parasite, but wanting to do so anonymously. Roll up your sleeves and commit your resources to making community search engines work. You have the control.
Instead of a 'normal' search engine, you could take a look at a GPT-like replacement. Maybe there is one that also protects your privacy, and it can certainly be used to find what normal search engines could find.
Huh... so there's currently no open source search engine out there? I see a few crawlers, and some UIs the crawlers can use, but no one project consolidating the two.