Two months back, a company called OpenAI released its chatbot, ChatGPT, to the public. ChatGPT is a so-called Large Language Model (LLM), trained on nearly 600 gigabytes of text of all kinds drawn from the World Wide Web, that learns to complete any text prompt by predicting the next word, and the word after that, and so on.
The purported aim of the system is to put the word-level "auto-complete" functionality found on cellphones on steroids, so that it can complete entire paragraphs. The powers of LLMs of this type have long been known in the technology sector, thanks to ChatGPT's predecessor, GPT-3, from OpenAI and similar systems from other Big Tech companies.
Nevertheless, ChatGPT’s wide deployment has caught the public’s fancy in an unprecedented way, with over 1 million users playing with the system within the first week. Even in a year of magical developments in so-called generative Artificial Intelligence (AI) — with such jaw-dropping technologies as DALL-E, Stable Diffusion and Midjourney — the impact of ChatGPT on the public imagination is staggering. The initial shock and awe at ChatGPT’s ability to write coherently, even poetically, on any topic quickly gave way to angst about the many disruptions it may cause.
Teachers and school boards have worried that the surprisingly coherent text which ChatGPT produces in seconds will likely make essay-writing questions — a staple of many high school and college tests — obsolete, with students prompting ChatGPT to produce essays for them. Some school districts have outright banned ChatGPT, and a race is on to “detect” text written by LLMs like it. Others worry that all written endeavors — including journalism — will be rendered obsolete thanks to ChatGPT and its successors. Already, there are instances of media outlets using ChatGPT to write some explainer stories.
Big Tech itself wonders if ChatGPT will soon make search engines obsolete. The argument is that if ChatGPT can provide a coherently written answer to any query, who needs Google? Microsoft has introduced a version of its Bing search engine that uses an LLM as a backend to answer user queries. To get our bearings amidst all this hype, it is worth taking a step back to understand the inherent capabilities and limitations of LLMs like ChatGPT.
As mentioned above, ChatGPT is trained to provide the completion of the current dialog state that is statistically most likely given its web-scale training data. The completion is done in the context of a moving window which, in the case of ChatGPT, is 4,096 tokens or about 3,000 words. Such a large context window, coupled with a massive learned network with 175 billion tunable parameters, engenders a surprising level of local coherence in the text produced, making it look like ChatGPT is logically answering the questions posed by the user.
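To make the mechanism concrete, here is a minimal sketch, in Python, of the completion loop such a model runs. It is an illustration of the general idea only; the "model" function, the sampling scheme and the token counts are stand-ins, not anything OpenAI has published.

    import random

    CONTEXT_WINDOW = 4096  # ChatGPT's moving window, in tokens (roughly 3,000 words)

    def complete(prompt_tokens, model, max_new_tokens=200):
        """Autoregressive completion: predict one token at a time, then repeat."""
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            # Only the most recent CONTEXT_WINDOW tokens influence the prediction.
            window = tokens[-CONTEXT_WINDOW:]
            # The model maps the window to a probability distribution over the
            # next token, e.g. {"the": 0.21, "a": 0.07, ...}.
            distribution = model(window)
            # Sample in proportion to estimated likelihood; nothing in this loop
            # checks whether the continuation is true, only that it is plausible.
            next_token = random.choices(
                list(distribution), weights=list(distribution.values())
            )[0]
            tokens.append(next_token)
        return tokens

Everything the system "knows" has to be squeezed into the weights behind that model function; the loop itself never consults the web or any source document.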
Nevertheless, it is important to understand that ChatGPT has no concept of truth or falsity. Unlike search engines which give users pointers to documents authored by humans (who know the difference between truth and falsity), LLMs do not index the documents they are trained on. Instead, they learn the patterns and correlations between words and phrases — the information that is stored in the billions of weights defining their trained networks. What this means is that when ChatGPT is “completing” a prompt, it is not doing this by copy-pasting documents but, rather, by constructing a plausible completion on the fly, using its network.
Indeed, it is this very inability to remember and reproduce verbatim that impresses users (and frightens teachers using old-school plagiarism detection tools), as it in a way helps ChatGPT generalize beyond the text it is trained on.
Of course, human memory is itself not veridical — we don't store and retrieve experiences verbatim but, instead, stitch them together on demand (thus leading to false memories and unreliable witnesses). However, unlike humans, who can (sometimes) verify their memories against external sources, LLMs focus just on the statistical likelihood of the completion provided. They do not have any model of the world we inhabit beyond this.
Thus, in the case of ChatGPT, all meaning and accuracy — beyond plausible completion in the context of training data — is very much in the eye of the beholder. ChatGPT itself is neither lying nor telling the truth; it is simply "afactual." We may see it as capturing the distribution of plausible realities, rather than the single reality we all inhabit. So, ChatGPT can give a highly relevant-sounding answer to any query, whether it involves grade-school essays, text summarization requests or questions involving reasoning and planning — but there are no guarantees about the accuracy of its answers. Where does all of this leave today's hype about putting ChatGPT in charge of search?
The temptation to get a direct answer to your query, rather than having to click and read the documents to which a search engine points you, is indeed very high. Unfortunately, left to themselves, LLMs reconstruct (and, in so doing, effectively hallucinate) the completions to your queries. The fact that their context-sensitive completions are "generally" accurate is not enough since, after all, most searches are done when we don't already know the answer and hope to trust the search engine's results. Indeed, the recent demos of the ChatGPT-backed Bing, seen at first as pretty impressive, were later found by fact-checkers to be riddled with errors.
Given the rising tide of misinformation, the last thing we need is Google and Bing turning into authoritative fountains of misinformation themselves — something already illustrated by recent attempts to make LLMs directly answer queries.
On the other hand, if LLMs are used to parse the query, call the search engine (which does retrieval rather than reconstruction) and summarize the results, the inaccuracy can be reduced while still giving users a question-and-answer experience. Not surprisingly, Google reportedly has been doing some of this with BERT, one of the first LLMs.
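In rough code terms, the difference between treating the LLM as the answer engine and treating it as a front end to a retriever looks something like the sketch below; the function names are hypothetical, not any product's actual API.

    def answer_with_retrieval(user_question, llm, search_engine, top_k=5):
        """Retrieval-augmented answering: the facts come from retrieved,
        human-authored documents; the LLM only rephrases and summarizes."""
        # 1. Use the LLM merely to turn the question into a good search query.
        search_query = llm("Rewrite as a search query: " + user_question)
        # 2. Retrieval rather than reconstruction: fetch indexed documents.
        documents = search_engine(search_query)[:top_k]
        # 3. Ask the LLM to summarize only what the retrieved excerpts say, and
        #    to note which excerpt supports each claim.
        excerpts = "\n\n".join(doc.text for doc in documents)
        prompt = ("Using only the excerpts below, answer the question and note "
                  "which excerpt supports each claim.\n\n" + excerpts +
                  "\n\nQuestion: " + user_question)
        return llm(prompt)

The summary can still be garbled, but at least the answer is tethered to documents a user can click through and check.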
It also is worth noting the cultural upheavals that LLMs like ChatGPT are bringing about; the angst about essay plagiarism is but the tip of the iceberg. Throughout history, we humans have conflated form (syntax, physical beauty) with content (truth, character). We of course knew these were imperfect surrogates but stuck with them anyway, as they made life easy.
The strongest of these — that a well-written piece must somehow be true — lingers on. It makes teachers' lives easy by allowing them to grade essays based on form features such as grammar and the notorious "five-paragraph essay structure," without having to spend time delving into the originality of the arguments. News consumers could assume that a well-written news story was probably true (as the tech news site CNET found to its chagrin!). The rise of ChatGPT and other LLMs makes the continued use of these facile surrogates all but untenable, and therein lies the real reason for at least some of the angst.
There are, of course, frantic attempts to postpone the inevitable by developing techniques to identify text generated by LLMs. As with deep fakes, however, the early techniques have been shown to be rife with false positives — with the classifier provided by OpenAI itself flagging original Shakespeare text as "AI generated." There are more robust ideas in the wings, based on the idea of white lists: forcing LLMs to use words outside a secret white list with lower probability. These, however, would need buy-in from the companies building the LLMs, and it is not clear what incentives those companies will have, especially if they are selling writing assistance services.
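The gist of such a white-list scheme can be sketched as follows; this is a simplified illustration of the general idea rather than any vendor's actual watermark.

    import hashlib
    import random

    def white_list(previous_token, vocabulary, secret_key, fraction=0.5):
        """Pick a pseudo-random, secret subset of the vocabulary, seeded by the
        previous token, so that generator and detector can agree on it."""
        seed = hashlib.sha256((secret_key + ":" + previous_token).encode()).hexdigest()
        return set(random.Random(seed).sample(sorted(vocabulary),
                                              int(len(vocabulary) * fraction)))

    def watermarked_next_token(distribution, previous_token, secret_key, boost=2.0):
        """Nudge sampling toward white-listed tokens, so watermarked text favors
        them far more often than chance would predict."""
        allowed = white_list(previous_token, distribution.keys(), secret_key)
        weights = [p * boost if tok in allowed else p
                   for tok, p in distribution.items()]
        return random.choices(list(distribution), weights=weights)[0]

Anyone holding the secret key can later test whether a suspect essay uses white-listed words suspiciously often; without the model providers' cooperation, though, there is no key to test against.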
In a world with easy access to ChatGPT, everyone can produce good form — if only with trite or erroneous content. Ironically, a poorly expressed original idea may well become the best shibboleth of the humanity of the writer.
The broad but shallow linguistic competence exhibited by ChatGPT is both frightening and exhilarating, because we know that many of us are so easily taken in by it. Used as assistance tools for humans, with appropriate guard rails, LLMs can indeed improve our lives. The trick, as we have seen, is to resist the rushed deployment of such tools in autonomous modes and in end-user-facing applications.
Be that as it may, to the extent that these alien intelligences force us to recalibrate our ideas of the hallmarks of intelligence and avoid over-reliance on form and beauty as facile surrogates for content and character, it is perhaps not an entirely bad thing.
Subbarao Kambhampati, PhD, is a professor of computer science at Arizona State University and the past president of the Association for the Advancement of Artificial Intelligence. He can be followed on Twitter @rao2z.
Source: The Hill