LLM myths 2: perplexity and surprise

Sun, 27 Oct 2024 20:58:38 -0400

Given a language model, “perplexity” is defined as the exponential of the mean negative log-likelihood per token. The name was perhaps coined to evoke the human sense of confusion (the higher the number, the more confused), and in many cases that reading is reasonable. Sometimes we also refer to the negative log-likelihood of a single token as “surprise”. Both are well-established terms in academia and industry, and they get the concept across to a wide audience. But when people build their intuition by relying too much on these terms, subtle misconceptions creep in.
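To make the two quantities concrete, here is a minimal sketch that computes per-token surprise and perplexity from a list of token probabilities. It assumes the common convention that perplexity is the exponential of the mean negative log-likelihood (natural log). The function names and the example probabilities are made up for illustration and do not come from any particular library.

```python
import math

def token_surprises(token_probs):
    """Negative log-likelihood ("surprise") of each token, in nats."""
    return [-math.log(p) for p in token_probs]

def perplexity(token_probs):
    """Exponential of the mean per-token negative log-likelihood."""
    surprises = token_surprises(token_probs)
    return math.exp(sum(surprises) / len(surprises))

# Hypothetical probabilities the model assigned to each observed token.
probs = [0.5, 0.25, 0.125]
print(token_surprises(probs))
print(perplexity(probs))
```

Under this convention, a model that assigns probability 1/k to every token has perplexity k, which is why the number is often read as an "effective number of choices" per token.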

LLM myths 1: why does LLM generate infinite loops

Sun, 27 Oct 2024 19:57:30 -0400

Looping is fairly common when sampling from an LLM. We normally do not want it to happen, and there have been many tricks for suppressing it, such as repetition penalties or hard-coded loop detection, but their effectiveness is debatable. Explanations of this phenomenon seem scarce in the literature, and at first it might feel like just another bug in our day-to-day data/model engineering, without anything deep behind it.

But for modern LLMs without ad-hoc outer logic to guard their output, sampling loops have been with us all along. For example, you can easily induce a DeepSeek model into a loop by writing this prompt (as of Oct 30th, 2024): “Write me a bunch of bullet points.”
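As an illustration of the kind of hard-coded loop detection mentioned above, here is a minimal sketch that checks whether the end of a generated token sequence consists of the same block repeated several times. The function name, `max_period`, and `min_repeats` are hypothetical choices for this example, not how any specific model or serving stack does it.

```python
def find_trailing_loop(tokens, max_period=50, min_repeats=3):
    """Return the length of a block repeated at least `min_repeats` times
    at the end of `tokens`, or None if no such block is found."""
    n = len(tokens)
    for period in range(1, max_period + 1):
        span = period * min_repeats
        if span > n:
            break  # not enough tokens to hold this many repeats
        tail = tokens[n - span:]
        block = tail[:period]
        # The tail must be exactly `block` repeated `min_repeats` times.
        if all(tail[i] == block[i % period] for i in range(span)):
            return period
    return None

# Hypothetical token ids: the pair (3, 4) repeats three times at the end.
print(find_trailing_loop([1, 2, 3, 4, 3, 4, 3, 4]))
print(find_trailing_loop([1, 2, 3, 4, 5]))
```

A check like this only catches exact repeats. Loops that vary slightly on each pass (for example, numbered bullet points) would slip through, which is one reason the effectiveness of such guards is debatable.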

LLM Myths on Honglu Fan
