robots.txt is a suggestion

tekeous@usenet.lol to Lemmy Shitpost@lemmy.world – 616 points –

time to fill the site's code with randomly generated garbage text that humans will not see but crawlers will gobble up?

I don't think it's a bad idea, but it's largely dependent on the crawler. I can't speak for AI-based crawlers, but typical scraping either targets specific elements on a page or grabs the whole page and parses it for what you're looking for. In both cases, your content is already scraped and added to the pile. Overall, I have to wonder how long "poisoning the well" is going to work. Take this with a grain of salt, though; I work on detecting bots for a living.
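
A rough sketch of the two scraping styles mentioned above, using only Python's standard library. The page, the `article` class name, and the helper names are all made up for illustration; real crawlers vary a lot, so treat this as a toy and not as how any particular bot works.

```python
from html.parser import HTMLParser

# Hypothetical page: one real paragraph plus hidden filler text.
PAGE = """
<html><body>
  <p class="article">Real content for human readers.</p>
  <div style="display:none">zxqv blorp randomly generated filler</div>
</body></html>
"""


class TextCollector(HTMLParser):
    """Collects text, optionally only from inside <p class="article">."""

    def __init__(self, only_article=False):
        super().__init__()
        self.only_article = only_article
        self.inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "p" and ("class", "article") in attrs:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside = False

    def handle_data(self, data):
        text = data.strip()
        if text and (self.inside or not self.only_article):
            self.chunks.append(text)


def extract(html, only_article=False):
    parser = TextCollector(only_article)
    parser.feed(html)
    parser.close()
    return parser.chunks


targeted = extract(PAGE, only_article=True)  # element-targeted scrape
whole_page = extract(PAGE)                   # grab everything, parse later
```

In this toy case the targeted scrape never touches the hidden filler, while the whole-page scrape picks it up along with the real paragraph. Either way, the real content ends up in the pile.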

I work on detecting bots for a living.

You should just tell people you're a blade runner.

I'm a blade runner. 😁

You see a turtle, upended on the hot asphalt. As you pass it, you do not stop to help. Why is that?

also that job title is cool as fuck

I agree and I wish I was actually that cool. I just look at data all day and write rules. 🫠

Until you realize that you are paying for access fees/network