How does Lemmy decide what goes in the hot feed?

ipodjockey@lemmy.world to No Stupid Questions@lemmy.world – 288 points – 1 year ago

Title.

And here is the relevant source code:

https://github.com/LemmyNet/lemmy/blob/main/migrations/2021-01-05-200932_add_hot_rank_indexes/up.sql#L4-L13

And this is a great thing about open-source software.

Want to know how something works? Want to know the implications of something, or whether it is artificially manipulated? You can go directly to the code.

With closed-source software, how does the algorithm work, and is it honest rather than manipulated for other gains? Nobody knows except the people who run it, and bad stuff can be hidden away.

Can someone who knows PL/pgSQL help parse this line:

return floor(10000*log(greatest(1,score+3)) / power(((EXTRACT(EPOCH FROM (timezone('utc',now()) - published))/3600) + 2), 1.8))::integer;
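To make that line easier to parse, here is a rough Python re-expression of it. It is an illustrative sketch, not Lemmy code, and the function name `hot_rank` and its parameters are hypothetical. It assumes the age is in hours, since `EXTRACT(EPOCH FROM interval)` yields seconds and the expression divides by 3600, and it uses `math.log10` because PostgreSQL's single-argument `log()` is the base-10 logarithm (`ln()` is the natural log).

```python
import math
from datetime import datetime, timedelta, timezone


def hot_rank(score: int, published: datetime, now: datetime) -> int:
    """Hypothetical Python reading of the SQL hot-rank expression (not Lemmy code).

    Both datetimes are assumed to be timezone-aware UTC values.
    """
    # EXTRACT(EPOCH FROM interval) yields seconds; /3600 turns them into hours.
    age_hours = (now - published).total_seconds() / 3600
    # greatest(1, score + 3): scores of -2 or lower all map to log(1) = 0.
    # PostgreSQL's one-argument log() is base 10, hence math.log10.
    numerator = 10000 * math.log10(max(1, score + 3))
    # "Gravity": the denominator grows as (age in hours + 2) ** 1.8.
    denominator = (age_hours + 2) ** 1.8
    # floor(...)::integer: round down, then keep it as an integer.
    return math.floor(numerator / denominator)


now = datetime(2023, 6, 1, tzinfo=timezone.utc)
for hours in (0, 1, 24, 240):
    print(hours, hot_rank(25, now - timedelta(hours=hours), now))
```

The fixed `now` is only there to make the example deterministic; the SQL uses the current time via `now()`.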

It seems to me that the issue might be that the function returns an integer. If the scaling factor (10000) is not large enough, floor() will return zero for tons of posts: any post where the expression inside floor() evaluates to less than one. All of those posts would end up with the same rank. This could explain why we start seeing randomly sorted old posts after a certain score threshold. Maybe it would be better not to round here, or to dramatically increase the scaling factor?
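A small sketch of the tie described above, using made-up posts (the helper names and sample data are hypothetical) and the same formula with an hours-based age and a base-10 log. Once the raw value drops below 1, every post floors to 0. Unless a query adds a tie-breaker, PostgreSQL does not guarantee any particular order among rows with equal sort keys, which would fit the apparently random ordering.

```python
import math


def raw_rank(score: int, age_hours: float) -> float:
    """Hot-rank expression before floor()/::integer (base-10 log, age in hours)."""
    return 10000 * math.log10(max(1, score + 3)) / (age_hours + 2) ** 1.8


def floored_rank(score: int, age_hours: float) -> int:
    """The same expression after floor(...)::integer, as in the SQL."""
    return math.floor(raw_rank(score, age_hours))


# Made-up old posts: (label, score, age in hours), about 83 days old each.
old_posts = [("A", 500, 2000), ("B", 50, 2000), ("C", 5, 2000)]

for label, score, age in old_posts:
    # Every floored rank is 0, so sorting on it cannot tell these posts apart,
    # while the raw values still put A above B above C.
    print(label, floored_rank(score, age), raw_rank(score, age))

# Sorting on the unrounded value keeps a meaningful order among old posts.
by_raw = sorted(old_posts, key=lambda p: raw_rank(p[1], p[2]), reverse=True)
print([label for label, _, _ in by_raw])
```

Dropping the rounding keeps the ordering. A larger scaling factor only pushes the cutoff further out, because for any fixed factor the (age + 2)^1.8 denominator eventually outgrows the numerator.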

I'm not sure what the units of the post age are here, though. Probably hours, given the division by 3600? And is log() the natural log or base 10 by default?

In any case, something must still be going wrong. If I'm doing the math correctly, a post with a score of +25 should take approximately 203 hours (assuming log base 10) before its raw rank drops below 1 and gets floored to zero, joining all of the really old posts. So we should be seeing every post from the last 8.5 days with a +25 score before we see any of these really old posts... But that isn't what's happening.
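The 203-hour estimate can be checked by solving 10000 * log(score + 3) / (t + 2)^1.8 = 1 for t, which gives t = (10000 * log(score + 3))^(1/1.8) - 2. The function name below is hypothetical; passing `math.log` instead of `math.log10` shows how much the choice of base would matter.

```python
import math


def hours_until_rank_floors_to_zero(score: int, log=math.log10) -> float:
    """Age in hours at which 10000*log(max(1, score+3)) / (age+2)**1.8 falls to 1.

    Solving 10000*L / (t + 2)**1.8 = 1 for t gives t = (10000*L)**(1/1.8) - 2.
    For score <= -2 the numerator is 0, so this returns -2 (rank is 0 from the start).
    """
    numerator = 10000 * log(max(1, score + 3))
    return numerator ** (1 / 1.8) - 2


t10 = hours_until_rank_floors_to_zero(25)            # PostgreSQL log(): base 10
t_e = hours_until_rank_floors_to_zero(25, math.log)  # natural log, for comparison
print(f"base 10: {t10:.1f} h ({t10 / 24:.2f} days)")
print(f"base e:  {t_e:.1f} h ({t_e / 24:.2f} days)")
```

With base 10 this lands at roughly 203 hours (about 8.5 days), matching the estimate above; with the natural log the cutoff would come noticeably later.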