Featured image: JahnmitJa/Flickr CC BY SA 2.0
ChatGPT is all the rage, but does it live up to our expectations? And are our expectations realistic?
I remember my first trip to Japan in the 1990s. I had worked all over Europe, and being fortunate enough to speak a few languages, most of the time I had been able to get at least some sense from notices, menus and various other written material. My British colleagues, in contrast, were usually entirely out of their depth, whether we were in Portugal, Austria, Italy or Sweden. But in Tokyo, I quickly discovered what that must have felt like for them. Even after a week, the furthest I got was recognizing the same combinations of two or three characters on the doors of vans and taxis. But I had no idea what they meant.
I was like an unwitting participant in a real-life Chinese Room experiment (albeit in Japan). This thought experiment, conceived by philosopher John Searle, hypothesizes a computer program that behaves as if it understands Chinese: in response to a string of Chinese characters as input, it produces another string of Chinese characters as output, in such a way that it passes the Turing test (i.e., what it produces is indistinguishable from what a human would produce). A human could use the computer program's instructions in the same way, to respond to a string of Chinese characters with another string of Chinese characters. Even though it appears as if the computer program, or the human using it, understands Chinese, neither does, Searle argued. Maybe, if I had stayed in Japan long enough to learn many more such patterns of characters from perusing Japanese papers and books, I might have been able to construct rules similar to the computer's, and appear to understand Japanese.
Intelligent bullshit
This is how easy it is to (pretend to) know Chinese! (photo: via YouTube)
John Searle described the Chinese Room in 1980, presumably not expecting to live to see such a computer program actually come into existence. Yet here we are: ChatGPT, the "intelligent chatbot" launched at the end of 2022, has caused a great deal of interest and commotion. (If you have not yet tried it out, give it a go – it is quite spectacular.)
To only say that it produces coherent answers to all kinds of questions wouldn't do it justice. It is witty (it tells jokes and writes humorous text), knowledgeable (it explains complex concepts from just about any scientific domain), creative (it produces lists of ideas and writes lyrics in the style of existing artists) and intelligent (it managed to pass several exams).
It also has some flaws, but only a curmudgeon would hold that against an artificial intelligence program that is so human-like that one genuinely feels compelled to say "please" and "thank you" when conversing with it. If it is human occasionally to err, then should we not grant that privilege to artificial intelligence chatbots, too?
Maybe not. I asked it why some languages have two grammatical genders, while others have three. It replied that French has three: masculine, feminine, and neuter. Intrigued, I asked it for some examples of the latter, and it produced "l'oeuf, le soleil, la pluie, le bonheur, l'amour". (In case you do not speak French, none of these words are neuter, as there is no such gender in French.) So, why does the 'neuter' word soleil take the masculine article le, I wondered. ChatGPT duly obliged: "The word 'soleil' is a masculine noun because it ends in '-il'." This rule is, of course, utter nonsense.
The technical term for what ChatGPT was producing here is bullshit. The philosopher Harry Frankfurt wrote a splendid diminutive volume on the topic, in which he compares and contrasts bullshit and bullshitting with lies and lying. The latter is deliberately speaking untruths; the former simply has no regard for the truth: "It is just this lack of connection to a concern with truth – this indifference to how things really are – that I regard as the essence of bullshit," Frankfurt writes. He even puts his finger on the reason why ChatGPT cannot help but bullshit: "Bullshit is unavoidable whenever circumstances require someone to talk without knowing what he is talking about." A human may have motives to bullshit that ChatGPT doesn't – they may want to impress others, or feel obliged to express an opinion, for example. ChatGPT bullshits simply because it does not and cannot know the truth.
Doomed to bullshit
ChatGPT derives what passes for knowledge from its training. Economist David Smerdon explains the process in a Twitter thread. The program is built on top of a so-called Large Language Model, a computer program that predicts the most probable next words given a particular sequence, based on millions of texts that ChatGPT has 'read'. It is, in effect, no more than a (very clever and very fast) autocomplete engine, like the one in your smartphone.
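To make the "autocomplete" idea concrete, here is a deliberately tiny sketch in Python – not how ChatGPT actually works (real Large Language Models are neural networks trained on subword tokens, not word-pair counts) – that learns which word most often follows which in a toy corpus, and then continues a prompt by repeatedly picking the most probable next word:

```python
# Toy next-word prediction: a bigram "autocomplete", for illustration only.
from collections import defaultdict, Counter

corpus = (
    "the sun is a star . the sun is bright . "
    "the moon is not a star . the moon is bright at night ."
).split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def continue_text(prompt: str, length: int = 6) -> str:
    """Greedily append the most probable next word, one word at a time."""
    words = prompt.split()
    for _ in range(length):
        candidates = following.get(words[-1])
        if not candidates:          # nothing ever followed this word
            break
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(continue_text("the moon"))    # greedy continuation: "the moon is bright . the sun is"
```

A model like this has no notion of whether anything it says is true; it only knows which words tend to follow which – which is exactly why, scaled up a billionfold, the output can be perfectly fluent and still be bullshit.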
But the most likely words do not necessarily convey the truth. Smerdon describes how ChatGPT completely invents the most-cited economics paper of all time – much like it invented the rule for establishing that a French noun is masculine. It first predicts the most likely title, then the most likely author for a paper with this title (and the most likely co-author), and comes up with a non-existent paper, claiming it has been cited more than 30,000 times according to Google Scholar.
ChatGPT's makers recently announced that the latest version has improved factuality. But unless it is told, for every possible fact it might refer to, whether that fact is true or not, it has no way of knowing. We should therefore not expect it to give us correct, objective information. It may well do so much more often than not, but we have no way of determining when it is being truthful and when it is bullshitting.
"A pie is a dessert, but not when it is a pork pie, you dummy! " (photo:
Drew McLellan/Flickr CC BY NC 2.0
When humans converse, there is a thought in the mind of person A, which is converted into language, and transmitted to person B (where it is converted back into thought). With ChatGPT, that first step is missing: there is no thought, and not even a mind for it to be in. The meaning of the sentences that it produces is entirely the result of the probabilistic prediction of the successive words. ChatGPT itself has no conception of reality or truth. Furthermore, it uses only textual context to generate its responses, and cannot rely on situational context. For example, I asked it why my friend might have been upset when, after dinner, I served him pork pie for dessert. It listed various technically possible reasons (not liking pie, not being hungry, being allergic to gluten, etc.), but failed to point out that pork pie, while definitely a pie, is not normally considered to be a dessert.
Comparing ChatGPT with MusicLM, Google's model that generates music from text descriptions, is remarkably instructive. The absence of any initial thought (beyond the user-supplied input prompt) in generating music is blatantly clear, and the less formulaic the desired output, the poorer the melodic, harmonic and rhythmic quality. It truly is the musical equivalent of bullshit. How come we spot the musical bullshit so much more quickly than the bullshit text from ChatGPT? Perhaps it is because music is not intended to convey truth, but emotion. It is easier to fake truth with plausible text than to fake emotion. If there is no genuine emotion to start with, then the bullshitting will be apparent very quickly.
What it can and cannot do for us
In the mid-1960s, computer scientist Joseph Weizenbaum created Eliza, a program capable of parsing and producing natural language and simulating an empathic psychotherapist. Its responses simply reiterated what the user said and appeared to probe further ("I feel sad" – "What specifically makes you feel sad?"). It was intended to illustrate how superficial communication between people could be, but the illusion was so powerful that many people took it seriously. Sixty years on, ChatGPT is much more powerful, but ultimately just as much an illusion of intelligent thought. Maybe it is, unintentionally, telling us how much shallow bullshit humans actually produce and get to read all day long.
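For the curious, the core of an Eliza-style exchange can be faked with a handful of pattern-matching rules – a loose Python sketch of the idea, not Weizenbaum's original program, which used considerably more elaborate scripts:

```python
import re

# Eliza-style rules: match a pattern in the user's input
# and reflect part of it back as a probing question.
RULES = [
    (re.compile(r"i feel (.*)", re.IGNORECASE), "What specifically makes you feel {0}?"),
    (re.compile(r"i am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"my (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]

def eliza_reply(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(match.group(1).rstrip(".!? "))
    return "Please go on."   # fallback when nothing matches

print(eliza_reply("I feel sad"))           # -> What specifically makes you feel sad?
print(eliza_reply("My job is stressful"))  # -> Tell me more about your job.
```

There is no understanding anywhere in this loop; the impression of an attentive listener is supplied entirely by the person typing.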
Nonetheless, tools like ChatGPT can be very useful, provided we acknowledge their limitations. Ethan Mollick, a professor of innovation and entrepreneurship at the University of Pennsylvania, has been an enthusiastic explorer of and experimenter with the latest AI tools. He documents how they can help us write stuff, come up with ideas, make computer programs, and learn skills, as well as make images and videos.
But one area where ChatGPT has caused much unrest is education: it is very easy to get it to produce a plausible-looking essay on just about any subject (just watch out for the fake citations!). Should we really worry? Yes, if students' assignments can easily be produced by such models. This is where the problem lies: if we set the bar for essays, exams, articles and so on so low that any old bullshit is good enough, then ChatGPT will certainly allow students to cheat – but what it produces will "resemble the kind of answers students give when they are winging it". The challenge for teachers and professors is to redesign their assignments, and make sure they probe the students' thinking, not their ability (or that of the chatbot they use) to string words together in a plausible manner.
Human knowledge cannot be captured in words – and certainly not in words alone. Yet words are all ChatGPT can handle. If it appears human, it is because it can bullshit like the best of us (and arguably better).
What it has done, though, is give us a benchmark for bullshit. If we produce output that is no better than what ChatGPT would produce, we are just bullshitting. We humans can, and should, be expected to do better.