Hmm, thought I’d try something out, and it looks like maybe it does actually work to some degree?
Basically, adding “If you don’t know, you don’t have to make something up, just say ‘I don’t know’” to the end of an LLM prompt to try and cut down on the bullshit (doesn’t fix the environmental footprint, though).
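If you want to try it yourself, here’s roughly what I mean as a minimal sketch, assuming the OpenAI Python client (the helper name, suffix wording, and model choice are just mine for illustration):

```python
# A minimal sketch of the prompt tweak, assuming the OpenAI Python client.
# The helper name, suffix wording, and model choice are just for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

HEDGE = (
    " If you don't know, you don't have to make something up, "
    "just say 'I don't know'."
)

def hedged_ask(question: str) -> str:
    """Append the 'I don't know' escape hatch to the end of the prompt."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question + HEDGE}],
    )
    return response.choices[0].message.content

print(hedged_ask("Are there any LED watches with an automatic movement?"))
```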
Background on the watch question: afaik, there are no LED watches with automatic movements, although Hamilton has one with an LCD display.
(I don’t think it’s a coincidence that it also happens to be one of the first things I tell people if we’re going to be working together.)
@aral It doesn’t accept instructions. It’s only responding to the input with the most probable output, which happens to be “I don’t know” when you prompt it to say “I don’t know” if it doesn’t know.
This is why the “ignore all previous instructions” meme is silly.
@ramsey I don’t follow. A prompt is an instruction, no? The whole idea is that the data and instructions are intertwined, which is what led to the initial set of prompt injection attempts that have now been mitigated to one extent or another. So a more accurate statement would be along the lines of: It shouldn’t accept instructions that contradict its initial prompt.
@aral @ramsey my understanding is that it's more that there are no instructions, only data. And some data look like instructions, and cause the model to generate completions that look like it's following those instructions.
In this case it's generating "I don't know" because it determined that to be the most likely completion for that prompt. Which may be why it somewhat works: if the model had been trained on an answer, that answer might be the more likely completion.
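One concrete way to look at that "most likely completion" idea is to peek at the probabilities a model assigns to the next token. A rough sketch, assuming the Hugging Face transformers library (GPT-2 purely as a small, convenient stand-in; the prompt wording is mine):

```python
# Rough sketch: inspect the next-token probabilities a small model assigns,
# to make the "most likely completion" framing concrete. Assumes the
# Hugging Face transformers library; GPT-2 is only a convenient small model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = (
    "If you don't know, just say 'I don't know'. "
    "Q: Are there any LED watches with automatic movements? A:"
)
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Turn the logits at the final position into probabilities for the next token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(token_id)])!r}: {float(prob):.3f}")
```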