In search of the least viewed article on Wikipedia

ray@lemmy.ml to Technology@lemmy.ml – 171 points –
colinmorris.github.io


Really enjoyed the read. Thanks for sharing. I’m surprised by the random page implementation.

Usually each record in a database has an integer primary key, assigned sequentially as pages are created. The “random page” function could then pick a random integer between zero and the largest page index. If that index isn’t in use (because the page was deleted), you could either try again with a new random number or march up to the next non-empty index.
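For illustration, here is a rough Python sketch of the "try again with a new random number" option. It is only a guess at how such a function could look, not how Wikipedia does it: `random_page_retry`, `existing_ids`, and `max_id` are made-up names, and a plain set stands in for the database index.

```python
import random

def random_page_retry(existing_ids, max_id, rng=random):
    """Draw keys uniformly from 0..max_id until one belongs to an existing page.

    existing_ids is a set of keys still in use (a stand-in for the real index).
    """
    if not existing_ids:
        raise ValueError("no pages to choose from")
    while True:
        key = rng.randint(0, max_id)  # randint includes both endpoints
        if key in existing_ids:       # a deleted page leaves a gap, so redraw
            return key

# Hypothetical data: keys 2, 5, 6, 7 and 8 belong to deleted pages.
existing_ids = {0, 1, 3, 4, 9}
page = random_page_retry(existing_ids, max_id=9)
```

Because every draw is uniform over the key range and gaps are simply rejected, each surviving page should be equally likely. The cost is extra draws when many keys are empty.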

Marching up to the next non-empty key would skew the distribution—pages preceded by more empty keys would show up more often under “random”.
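To put numbers on the skew, here is a small Python sketch that computes the exact probability of landing on each page under the march-up rule. The function name and the example keys are invented for illustration, and it assumes the random draw covers zero through the largest existing key.

```python
from collections import Counter
from fractions import Fraction

def march_up_distribution(existing_ids):
    """Exact chance of each page when a draw on an empty key moves up
    to the next key that is still in use."""
    keys = sorted(existing_ids)
    max_id = keys[-1]  # assumption: draws range over 0..largest existing key
    hits = Counter()
    for drawn in range(max_id + 1):
        # First existing key at or above the drawn key.
        target = next(k for k in keys if k >= drawn)
        hits[target] += 1
    total = max_id + 1
    return {k: Fraction(n, total) for k, n in hits.items()}

# Hypothetical example: keys 2, 3 and 4 were deleted.
dist = march_up_distribution({0, 1, 5})
```

With these keys, pages 0 and 1 each come up with probability 1/6, while page 5 absorbs the three empty keys before it and comes up with probability 2/3.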

Fun fact, that concept is used in computer security exploits: https://en.wikipedia.org/wiki/NOP_slide

For choosing an article, it would be better to just pick a new random number.

That said, there are probably more efficient ways to pick a random record from a database, for example periodically reindexing, or sorting the existing records in random order (if the database supports it).
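As a sketch of the "sort by random" idea, here is what it could look like with SQLite through Python's built-in `sqlite3` module. SQLite is used purely as an example; the `pages` table, its columns, and `random_page_sql` are hypothetical and say nothing about Wikipedia's actual schema.

```python
import sqlite3

def random_page_sql(conn):
    """Let the database shuffle the rows and return the first one."""
    return conn.execute(
        "SELECT id, title FROM pages ORDER BY RANDOM() LIMIT 1"
    ).fetchone()

# Hypothetical in-memory table with gaps in the key sequence.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO pages (id, title) VALUES (?, ?)",
    [(1, "Alpha"), (4, "Beta"), (9, "Gamma")],
)
row = random_page_sql(conn)  # an (id, title) tuple for one existing page
```

This sidesteps the gap problem, since only existing rows are considered. Whether it is actually cheaper than redrawing keys likely depends on the database and the table size, because the query may have to assign a random value to every row before picking one.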