• by dtagames on 10/30/2024, 2:39:37 PM

    No LLM is ever a canonical source of knowledge. Everything in the training corpus is folded into the model's weights, which shift the probability that any particular token comes next after your prompt. In other words, no retrieval is happening, so there's no "correct" information to retrieve. Only prediction is happening.
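
    To make that concrete, here's a toy Python sketch -- invented numbers, not a real model -- of what "only prediction" means. The counts are hypothetical; the point is that the answer is sampled from a frequency-derived distribution, not looked up:

        import random

        # Toy sketch: "knowledge" is just a distribution over next tokens,
        # derived from how often each continuation appeared in the corpus.
        # The counts below are invented for illustration.
        next_token_counts = {
            "is not supported": 900,  # hypothetical: old answers dominate
            "is supported": 100,      # hypothetical: newer answers are rarer
        }

        total = sum(next_token_counts.values())
        probs = {tok: n / total for tok, n in next_token_counts.items()}

        # "Answering" is sampling, not retrieval.
        prompt = "LaTeX math rendering in GitHub Markdown"
        answer = random.choices(list(probs), weights=list(probs.values()))[0]
        print(prompt, answer)  # usually the old answer, purely from frequency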

    If the bulk of the corpus contains the old, outdated information, you're more likely to get that back from your prompt simply because those patterns carried more weight during training than the newer info did. Sometimes you can add extra emphasis to your prompt to pull the prediction closer to what you want. For example, if an answer refers to the old version, you can say, "That was correct for the previous version, but I'm asking about the version of GitHub that supports LaTeX math rendering in Markdown after May 2022." Adding those extra words to your prompt might surface what you want -- if it's in there to begin with.
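
    Extending the same toy sketch (again with invented numbers): the extra words condition the distribution, which is all that "adding emphasis" does under the hood:

        import random

        # Hypothetical conditional weights: the phrase "after May 2022"
        # co-occurs with newer text in the corpus, so its presence in the
        # prompt shifts probability toward the newer continuation.
        def next_token_weights(prompt: str) -> dict:
            if "after May 2022" in prompt:
                return {"is supported": 0.85, "is not supported": 0.15}
            return {"is supported": 0.10, "is not supported": 0.90}

        prompt = ("Does GitHub Markdown render LaTeX math? "
                  "I'm asking about the version after May 2022.")
        weights = next_token_weights(prompt)
        answer = random.choices(list(weights), weights=list(weights.values()))[0]
        print(answer)  # now usually the newer answer -- if it's in the model at all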