LLMs and Reasoning
I had a Twitter thread about the reasoning capabilities of LLMs; here I have converted it into a blog post.
Quite a few people seem to be doing this: asking GPT-4 to generate stories and getting quite impressed by the results. But a far surer sign of intelligence in machines would be the ability to spot plot holes in a given story, rather than to generate stories themselves.
I have tried this with ChatGPT, but it is not very good at it. For example, there are a couple of well-known plot holes or glitches in the Harry Potter books (later editions of the books corrected them). I tried prompting the bot with the edition number, book name, and chapter name.
In the Goblet of Fire, in the chapter Priori Incantatem, the order in which the pale shadows come out of Voldemort’s wand in the graveyard is wrong. Lily Potter should have come out before James Potter, but in the book, the order is reversed. ChatGPT does not spot this.
I later pointed to the passage directly and asked the bot whether it spotted anything amiss or inconsistent between this part of the story and the general Harry Potter canon. ChatGPT kept failing, with a lot of over-polite apologies thrown in.
If the LLM had truly understood the story and knew the canon, it should have been able to spot and explain the plot hole. I then gave it a small passage of my own writing with a few subtle and not-so-subtle plot inconsistencies thrown in.
Even in this case, the bot was not able to reason through the logical inconsistencies in the passage I gave it.
All this matters, and raises concerns, because Indian courts have started using ChatGPT to summarize and assess the worldwide view on granting bail to the accused in certain scenarios.
The tweets below show more examples of how LLMs do poorly on benchmarks that require them to reason, not merely recall the data available on the internet.
[Dec 3rd] LLM alarmism spiked again last week because Geoffrey Hinton lent his voice to the cause of the doomers. So I went back and tested ChatGPT again, and I was surprised that experts like Hinton believe LLMs can think and understand as humans do.
Some examples follow. I asked ChatGPT to solve simple problems, like “count the frequency of every letter in a given word.” It failed consistently at this task, which even a primary school kid can perform easily.
But when asked for code to perform the same task, it produced accurate, functional code.
But when asked to run this code and report the result, it again got the answer wrong.
So basically the LLM cannot solve a simple problem “by hand”, but it can generate code to do that same operation. This is strong evidence that the LLM does not really understand what it is generating; it is only fetching the code from somewhere in its training data.
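To show how small the task is, here is a minimal sketch of a letter-frequency counter. It is an illustration only, not the code ChatGPT produced; the function name `letter_frequency` and the choice to ignore case and non-letters are assumptions made for this example.

```python
from collections import Counter


def letter_frequency(word: str) -> dict[str, int]:
    """Count how often each letter occurs in `word` (hypothetical helper)."""
    # Assumption: count alphabetic characters only, and treat "A" and "a" as the same letter.
    letters = [ch.lower() for ch in word if ch.isalpha()]
    # Counter tallies each letter; dict() gives a plain mapping in first-seen order.
    return dict(Counter(letters))


print(letter_frequency("Mississippi"))
# {'m': 1, 'i': 4, 's': 4, 'p': 2}
```

Doing the same thing “by hand” means walking through the word one letter at a time and keeping a tally, which is exactly what the code above does.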
Thinking and reasoning abilities would involve the following steps:
- understanding the question
- figuring out the logic
- working through the solution for small test cases
- then writing the code for the solution.
An LLM can do steps 1 and 4 but is incapable of steps 2 and 3. I gave it more algorithmic puzzles, such as “reverse the letters in a given word”. As expected, it failed again. It can reverse numbers accurately, but fails consistently with letters and characters.
Its Python code for the same task, however, is correct.
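For reference, reversing a word takes only a line or two of Python. The sketch below is illustrative and is not ChatGPT's actual output; both function names are hypothetical. The second version spells out the step-by-step procedure a person would follow by hand.

```python
def reverse_letters(word: str) -> str:
    """Reverse a word using slice notation (step of -1 walks the string backwards)."""
    return word[::-1]


def reverse_letters_by_hand(word: str) -> str:
    """Reverse a word one letter at a time, as a person would on paper."""
    reversed_chars = []
    for ch in word:
        # Put each new letter in front of the ones already processed.
        reversed_chars.insert(0, ch)
    return "".join(reversed_chars)


print(reverse_letters("reason"))          # nosaer
print(reverse_letters_by_hand("reason"))  # nosaer
```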
So an LLM is a bit like the semi-literate hawker on the street who sells calculators that can perform mathematical operations or solve equations, while the hawker himself does not even know what sine and cosine functions are.
What is funny is that people expect these same LLMs to win math olympiads, and they are even offering cash prizes for it.
Unless the solutions to the math olympiad problems already exist somewhere on the internet, I doubt an LLM will be able to ‘solve’ them.