only in the podcasts I listen to
Yes definitely. Many of my fellow NLP researchers would disagree with those researchers and philosophers (not sure why we should care about the latter’s opinions on LLMs).
it’s using tokens, which are more like concepts than words
You’re clearly not an expert so please stop spreading misinformation like this.
The temperature parameter, I think. You divide the logits by the temperature before feeding them to the softmax function. A larger (resp. smaller) temperature results in a higher- (resp. lower-) entropy distribution.
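For anyone curious, a minimal sketch of that in NumPy (toy logits; the function name is just for illustration):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Divide the logits by the temperature before the softmax.
    # Larger temperature -> flatter distribution (higher entropy),
    # smaller temperature -> sharper distribution (lower entropy).
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()          # subtract max for numerical stability
    exps = np.exp(scaled)
    return exps / exps.sum()

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, temperature=0.5))  # sharper
print(softmax_with_temperature(logits, temperature=2.0))  # flatter
```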