mgdigital

3 Posts – 13 Comments

Joined 1 year ago

The DHT is basically the wild west - EVERYTHING is on there (but that is also the power of it). Bitmagnet is attempting to overlay some order on it, make it more easily usable, and automatically filter the truly harmful content. Once the core features are more fleshed out, chapter 2 will hopefully look more like a fediverse with curation and moderation. There's still lots to be done but it's getting there!

Bitmagnet Allows People to Run Their Own Decentralized Torrent Indexer Locally

mgdigital@lemmy.world to

Selfhosted@lemmy.world – 188 points – 9 months ago

torrentfreak.com

Hi, this is a great point and one that I've already given consideration to. I'll address separately the issue of the primary datastore, i.e. Postgres, and the Redis dependency:

Postgres as the only option for the data store

There are 2 reasons for this:

Performance: while SQLite could offer a simpler, embedded data store, it simply doesn't have the performance and features of Postgres. Bitmagnet has a faceted search engine and is write-intensive (it will be discovering ~5k torrents per hour and writing these to the database along with associated metadata). As such, its database may not be suitable for running on older hardware. A SQLite adapter, if one were developed, may simply not be up to the job (although as I haven't attempted this I can't say what the performance would be like). That said, Bitmagnet itself is not especially resource intensive: you could probably run it on a Raspberry Pi but point it to a Postgres instance on some more powerful hardware. At this stage I've only been running it on an M2 Mac Mini with Postgres located on its SSD, so I would be interested to know people's mileage on other hardware.
Development, support and maintenance overhead: I'm a lone developer and this project is already too big for one person. A SQLite adapter, if feasible performance-wise, I think could only happen if other contributors joined the project as my to-do list is already pretty long. It would have to achieve feature parity with the Postgres implementation which makes use of several Postgres-specific features and extensions. It would also mean a longer testing cycle and therefore probably a slower release cadence. That said, if there was enough demand and assistance then I'd be open to looking into the feasibility of this once the rest of the application is a little more mature and the current database schema more finalised.

Redis dependency

Redis is currently used only for the asynchronous task queue. I would like to have put this in Postgres, but there simply is not a good out-of-the-box solution that works well with Postgres and GoLang, and is actively maintained. I looked at quite a few queuing libraries and eventually settled on asynq (https://github.com/hibiken/asynq), which is a great library and does the job well - but could really do with support for non-Redis backends.

Using Redis here was a pragmatic decision that allowed me to make progress, rather than an optimal one. I guess I could have built a simple Postgres-based queue myself but that would have been a distraction and probably sub-optimal compared with a mature/separately developed library. It remains an option. Since I looked into this a new project has sprung up which I'm keeping an eye on - https://www.tork.run/ - it has a Postgres backend and looks like it might be up to the job, but is very new.

So yes, I'm very aware that the additional Redis dependency is not ideal and it may well disappear at some point.
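
For illustration only, here is a minimal sketch of what a "simple Postgres-based queue" could look like. It is not Bitmagnet's implementation (which uses asynq and Redis). The table name `queue_tasks`, its columns, and the function name are hypothetical. The sketch relies on Postgres's `FOR UPDATE SKIP LOCKED`, which lets several workers claim different rows without blocking each other.

```python
# Hypothetical schema, assumed for this sketch only:
#   queue_tasks(id, payload, status, created_at)
CLAIM_SQL = """
UPDATE queue_tasks
SET status = 'processing'
WHERE id = (
    SELECT id FROM queue_tasks
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_next_task(conn):
    """Claim the oldest pending task, or return None if nothing is pending.

    `conn` is assumed to be a DB-API connection to Postgres whose cursors
    work as context managers (as psycopg cursors do).
    """
    with conn.cursor() as cur:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()  # (id, payload), or None when no row was claimed
    conn.commit()
    return row
```

A real queue would also need retries, timeouts for tasks stuck in 'processing', and scheduling, which is part of why a mature library is attractive.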

Yes, see https://bitmagnet.io/tutorials/servarr-integration.html

Hi, and thanks!

As a priority I'd like to gather some more rigorous performance benchmarks, but I can give you some hand-wavy stats now: Bitmagnet is currently fluctuating between 2-10% CPU usage on my M2 Mac Mini, and is using ~120MB of memory after running for around 48 hours. Overall, the GoLang implementation seems pretty efficient to me considering how much I know is going on in the background.

Disk space usage of the database: this will be highly dependent on 2 configuration options, the first of which I've only just added in the just-released version. Copied from the configuration page of the website:

dht_crawler.save_files (default: true): If true, file metadata from the DHT crawler will be saved to the database. This provides more rich information about a torrent, but will use a lot more disk space. If disk space is at a premium you may want to consider disabling this.
dht_crawler.save_pieces (default: false): If true, the DHT crawler will save the pieces bytes from the torrent metadata. The pieces take up quite a lot of space, and aren’t currently very useful, but they may be used by future features.

For me, 24 hours of crawling uses ~2.5GB of database disk space for metadata on the ~120k torrents it has discovered. Yep, that sounds like a lot; however, 90% of that is taken up by the files metadata, and could have been saved by setting dht_crawler.save_files to false. In fact I may set this to false by default and allow users to opt in to the full-fat torrent info.
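
As a rough back-of-the-envelope check, the figures above can be worked through as follows. This is a sketch that uses only the approximate numbers quoted in this comment; the variable names are illustrative and the results are estimates, not benchmarks.

```python
# Approximate figures quoted in the comment above.
db_bytes_per_day = 2.5e9     # ~2.5GB of database growth per 24 hours
torrents_per_day = 120_000   # ~120k torrents discovered per 24 hours
files_share = 0.90           # ~90% of the space is files metadata

# Average database footprint per torrent, with files metadata saved.
per_torrent_kb = db_bytes_per_day / torrents_per_day / 1e3

# Estimated daily growth if dht_crawler.save_files were set to false.
without_files_gb = db_bytes_per_day * (1 - files_share) / 1e9

print(f"{per_torrent_kb:.1f} KB per torrent with files metadata")
print(f"{without_files_gb:.2f} GB per day with save_files disabled")
```

On these numbers, each torrent costs roughly 21KB with files metadata, and disabling save_files would bring growth down to around a quarter of a gigabyte per day.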

I've also imported the entire RARBG backup (the SQLite one, see tutorial on the Bitmagnet website). This, along with all the associated metadata from TMDB, took around 4GB of database space, which seems quite acceptable considering it's basically every movie and TV show. Note that this does NOT include the metadata on individual files as I described above.

A priority feature for me (detailed on the website) is smart deletion: this would allow you to automatically discard a lot of data that can be automatically determined to be of no interest, and therefore greatly reduce disk space demands.

4 months in, my database takes up around 50GB; for the size of a few hi-res movies it's worth it for me...

Hi, yep that's expected. Torrents will only move out of "Unknown" once the classifier is able to categorise them. The classifier currently only supports movie and TV show content, and can recognise these with quite high accuracy assuming a well-named torrent (and a badly named torrent is unlikely to be a high quality release). The other content types (music, games etc) can currently only be populated via an import (see the tutorial on the website). A priority feature is classifiers for other content types - however we will likely always have a lot of torrents ending up in "Unknown" given the poor naming of many crawled items. Another roadmap feature, smart deletion, could help in future with getting rid of all the rubbish whose contents cannot be inferred from the torrent name.

Hi, the default port is 3333, which should be exposed if you're using the example configuration here: https://bitmagnet.io/setup/installation.html - I'm not sure what the app is in your screenshot but the provided config definitely exposes that port and is tested on Docker for Mac.

Hi, yes this is mentioned on the installation page of the website, below the Docker instructions. The app can be installed Dockerless using go install; if you choose this option you'll have to provide and configure Postgres and Redis instances for the app to connect to. That said, Docker is the recommended and easiest option.

I've never used I2P but I don't see why not!

Yep

Scraping torrent sites will be avoided as it'll be prohibitively slow and break the self-sufficiency concept - we'll infer as much as possible from the torrent meta info alone. You could have a guess at the bitrate from the file sizes. Sonarr/Radarr will already do this for you with quality profiles, I think.
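
A guess at the bitrate from file size could be sketched like this. It assumes the runtime is known from some other source, which the torrent meta info itself does not provide; the function name and example values are hypothetical.

```python
def estimate_bitrate_mbps(size_bytes: int, runtime_minutes: float) -> float:
    """Average bitrate in megabits per second for a file of known runtime.

    This is an average over the whole file (video, audio and container
    overhead combined), so it is only a rough indicator of quality.
    """
    if runtime_minutes <= 0:
        raise ValueError("runtime_minutes must be positive")
    bits = size_bytes * 8
    seconds = runtime_minutes * 60
    return bits / seconds / 1e6


# Example: a 4 GB file with an assumed runtime of 120 minutes.
print(f"{estimate_bitrate_mbps(4_000_000_000, 120):.2f} Mbps")
```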

There's a PR currently open for multi-platform builds so should have this sorted soon

Can you find something you wouldn’t find otherwise?

Yes, quite a lot of content that's otherwise difficult to find on the public trackers. Also public trackers can be shut down.

Introducing Bitmagnet: A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration

mgdigital@lemmy.world to

Selfhosted@lemmy.world – 521 points – 1 year ago

bitmagnet.io

RARBG selfhosted: A self hosted Torznab API for the RARBG backup, compatible with Prowlarr/Radarr/Sonarr etc

mgdigital@lemmy.world to

Selfhosted@lemmy.world – 3 points – 1 year ago

GitHub - mgdigital/rarbg-selfhosted: A self-hosted Torznab API for the RARBG backup, compatible with Prowlarr, Radarr, Sonarr etc.

github.com