
Monday, 3 April 2023

The Chinese Room Experiment ... and Today’s AI Chatbots


By Keith Tidman

 

It was back in 1980 that the American philosopher John Searle formulated the so-called ‘Chinese room thought experiment’ in an article, his aim being to emphasise the bounds of machine cognition and to push back against what he viewed, even back then, as hyperbolic claims surrounding artificial intelligence (AI). His purpose was to make the case that computers don’t ‘think’, but rather merely manipulate symbols in the absence of understanding.

 

Searle subsequently went on to explain his rationale this way: 


‘The reason that no computer can ever be a mind is simply that a computer is only syntactical [concerned with the formal structure of language, such as the arrangement of words and phrases], and minds are more than syntactical. Minds are semantical, in the sense that they have … content [substance, meaning, and understanding]’.

 

He went on to point out, by way of further explanation, that the prevailing technological metaphor for representing and trying to understand the brain has repeatedly shifted over the centuries: from Leibniz, who compared the brain to a mill, to Freud, who compared it to ‘hydraulic and electromagnetic systems’, to the present-day computer. None, frankly, has yet served as anything like a good analog of the human brain, given what we know today of the neurophysiology, experiential pathways, functionality, expression of consciousness, and emergence of mind associated with the brain.

 

In a moment, I want to segue to today’s debate over AI chatbots, but first, let’s recall Searle’s Chinese room argument in a bit more detail. It began with a person in a room, who accepts pieces of paper slipped under the door and into the room. The paper bears Chinese characters, which, unbeknownst to the people outside, the monolingual person in the room has absolutely no ability to translate. The characters unsurprisingly look like unintelligible patterns of squiggles and strokes. The person in the room then feeds those characters into a digital computer, whose program (metaphorically represented in the original description of the experiment by a ‘book of instructions’) searches a massive database of written Chinese (originally represented by a ‘box of symbols’).

 

The powerful computer program can hypothetically find every possible combination of Chinese words in its records. When the computer spots a match with what’s on the paper, it makes a note of the string of words that immediately follow, printing those out so the person can slip the piece of paper back out of the room. Because of the perfect Chinese response to the query sent into the room, the people outside, unaware of the computer’s and program’s presence inside, mistakenly but reasonably conclude that the person in the room has to be a native speaker of Chinese.

 

Here, as an example, is what might have been slipped under the door, into the room: 


什么是智慧 


Which is the Mandarin translation of the age-old question ‘What is wisdom?’ And here’s what might have been passed back out, the result of the computer’s search: 


了解知识的界限


Which is the Mandarin translation of ‘Understanding the boundary/limits of knowledge’, an answer (among many) convincing the people gathered in anticipation outside the room that a fluent speaker of Mandarin was within, answering their questions in informed, insightful fashion.
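To make the purely syntactic character of that exchange concrete, here is a minimal sketch in Python. It is only a toy stand-in for the room’s ‘book of instructions’ (the names instruction_book and person_in_room are invented for illustration): the program matches the incoming characters and hands back whatever string is paired with them, without understanding either.

```python
# A toy stand-in for the 'book of instructions' in Searle's Chinese room:
# a purely syntactic lookup, with no understanding of what either string means.
# A real 'box of symbols' would hold a vast number of such pairings; the one
# entry below is the example from the text.

instruction_book = {
    "什么是智慧": "了解知识的界限",  # 'What is wisdom?' -> 'Understanding the limits of knowledge'
}

def person_in_room(slip_of_paper: str) -> str:
    """Hand back whatever string the instruction book pairs with the input,
    exactly as the person in the room would, translating nothing."""
    return instruction_book.get(slip_of_paper, "")  # blank slip if no match is found

print(person_in_room("什么是智慧"))  # prints: 了解知识的界限
```

The point of the toy is not its machinery but its poverty: the lookup succeeds without any grasp of wisdom, knowledge, or Chinese.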

 

The outcome of Searle’s thought experiment seemed to satisfy the criteria of the famous Turing test, designed by the computer scientist and mathematician Alan Turing in 1950 (he himself called it ‘the imitation game’). The controversial challenge he posed with the test was whether a computer could think like — that is, exhibit intelligent behaviour indistinguishable from — a human being, and whether anyone could tell the difference.


It was in an article for the journal Mind, called ‘Computing Machinery and Intelligence’, that Turing himself set out the ‘Turing test’, which inspired Searle’s later thought experiment. After first expressing concern with the ambiguity of the words machine and think in a closed question like ‘Can machines think?’, Turing went on to describe his test as follows:

The [challenge] can be described in terms of a game, which we call the ‘imitation game’. It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The aim of the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either ‘X is A and Y is B’ or ‘X is B and Y is A’. The interrogator is allowed to put questions to A and B thus:

C: Will X please tell me the length of his or her hair?


Now suppose X is actually A, then A must answer. It is A’s object in the game to try and cause C to make the wrong identification. His answer might therefore be: ‘My hair is shingled, and the longest strands are about nine inches long’.


In order that tone of voice may not help the interrogator, the answers should be written, or better still, typewritten. The ideal arrangement is to have a teleprinter communicating between the two rooms. Alternatively, the question and answers can be repeated by an intermediary. The object of the game for the third player (B) is to help the interrogator. The best strategy for her is probably to give truthful answers. She can add such things as ‘I am the woman, don’t listen to him!’ to her answers, but it will avail nothing as the man can make similar remarks.


We now ask the question, ‘What will happen when a machine takes the part of A in this game?’ Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, ‘Can machines think?’  

Note that as Turing framed the inquiry at the time, the question arises of whether a computer can ‘be made to play satisfactorily the part of A in the imitation game, the part of B being taken by a [person]?’ The word ‘imitation’ here is key, allowing for the hypothetical computer in Searle’s Chinese room experiment to pass the test — albeit importantly not proving that computers think semantically, which is a whole other capacity not yet achieved even by today’s strongest AI.

 

Let’s fast-forward a few decades and examine the generative AI chatbots whose development much of the world has been enthusiastically tracking in anticipation of what’s to come. When someone engages with the AI algorithms powering the bots, the AI seems to respond intelligently. The result is either back-and-forth conversation with the chatbots, or the use of carefully crafted natural-language prompts to get the bots to write speeches, correspondence, school papers, corporate reports, summaries, emails, computer code, or any number of other written products. These end products rest on the bots having been ‘trained’ on the massive body of text on the internet, and the output sometimes gets reformulated by the bot in response to the user’s revised prompts.

 

It’s as if the chatbots think. But they don’t. Rather, the chatbots’ capacity to leverage the massive mounds of information on the internet to produce predictive responses is far more analogous to what the computer was doing in Searle’s Chinese room forty years earlier, with long-term implications for developmental advances in neuroscience, artificial intelligence and computer science, philosophy of language and mind, epistemology, and models of consciousness, awareness, and perception.

 

In the midst of this evolution, generative AI will expand AI’s reach across the varied domains of modern society: education, business, medicine, finance, science, governance, law, and entertainment, among them. So far, so good. Meanwhile, despite advances in machine learning, errors, biases, and nonsensical output in algorithmic decision-making, should they occur, are more problematic in some domains (like medicine, the military, and lending) than in others. It’s important to remember, though, that gaffes of any magnitude, type, and regularity can quickly erode trust, no matter the field.

 

Sure, current algorithms, natural-language processing, and the underlying engineering are more complex than when Searle first presented the Chinese room argument. But chatbots still don’t understand the meaning of content. They don’t have knowledge as such. Nor do they venture much by way of beliefs, opinions, predictions, or convictions, leaving swaths of important topics off the table. Reassembly of facts scraped from myriad sources is more the recipe of the day. Even then, errors and eyebrow-raising incoherence occur, including inexplicably incomplete and spurious references.

 

The chatbots revealingly write their output by brute-force matching of the words provided in the prompts with strings of words located online, including the words shown probabilistically to follow, predictively building their answers through a form of pattern recognition. Their performance still reflects a computational theory of mind rather than genuine thinking. Sure, what the bots produce would pass the Turing test, but today that’s surely a pretty low bar.
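As a rough sketch of the kind of next-word prediction being described (a toy bigram model in Python, a deliberately tiny stand-in for the vastly larger statistical machinery of a real chatbot, with invented names such as train and generate), the program below counts which word tends to follow which in its training text and then extends a prompt by repeatedly sampling a likely continuation; nothing in it understands what any of the words mean.

```python
from collections import Counter, defaultdict
import random

# Toy 'training corpus'; a real chatbot is trained on a massive body of text.
training_text = (
    "wisdom is understanding the limits of knowledge "
    "and knowledge is justified true belief"
)

def train(text: str) -> dict:
    """Count, for each word, how often each other word immediately follows it."""
    follows = defaultdict(Counter)
    words = text.split()
    for current_word, next_word in zip(words, words[1:]):
        follows[current_word][next_word] += 1
    return follows

def generate(follows: dict, prompt: str, length: int = 5) -> str:
    """Extend the prompt by repeatedly sampling a probable next word."""
    output = prompt.split()
    for _ in range(length):
        candidates = follows.get(output[-1])
        if not candidates:
            break  # no known continuation; stop
        words, counts = zip(*candidates.items())
        output.append(random.choices(words, weights=counts)[0])
    return " ".join(output)

model = train(training_text)
print(generate(model, "wisdom is"))  # e.g. 'wisdom is understanding the limits of knowledge'
```

Scaled up by many orders of magnitude and refined with far subtler statistics, this is still prediction over patterns of words, which is the point: nowhere does meaning enter in.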

 

Meantime, people have argued that AI’s writing reveals telltale markers: it lacks the nuance of varied cadence, phraseology, word choice, modulation, creativity, originality, and individuality, as well as the curation of appropriate content, that human beings often display when they write. At the moment, anyway, the products of chatbots tend to have a formulaic feel, a shortcoming the algorithms have yet to remedy.

 

Three decades after first unspooling his ingenious Chinese room argument, Searle wrote, ‘I demonstrated years ago … that the implementation of the computer program is not itself sufficient for consciousness or intentionality [mental states representing things]’. Both then and now, that’s true enough. We’re barely closing in on completing the first lap. It’s all still computation, not thinking or understanding.


Accordingly, the ‘intelligence’ one might perceive in Searle’s computer, as its program searches for patterns that match the Chinese words, is very much like the ‘intelligence’ one might misperceive in a chatbot’s answers to natural-language prompts. In both cases, what we may misinterpret as intelligence is really a deception of sorts, because in both cases what’s really happening, despite the large differences in sophistication arising from the passage of time, is little more than brute-force searching of massive amounts of information in order to predict what the next words likely should be. Often the prediction gets it right, but sometimes it gets it wrong, with good, bad, or trifling consequences.

 

I propose, however, that the development of artificial intelligence, particularly what is called ‘artificial general intelligence’ (AGI), will eventually get us there: an analog of the human brain, with an understanding of semantic content. Next to it, today’s chatbots will look like mere novelties, as the ‘neural networks’ of a feasibly self-optimising artificial general intelligence match or elastically stretch beyond human cognition, and the hotbed issues of what consciousness is get rethought.