GPTs and other gollems
On The A.I. Dilemma, Dec. 2023

In March of this year (2023), Tristan Harris and Aza Raskin, co-founders of the Center for Humane Technology and "tech ethicists", gave a talk under the title The A.I. Dilemma[1]. The talk, which addressed the dangers of artificial intelligence now that OpenAI had unleashed ChatGPT and large language models (LLMs) onto the general public, was followed by a presentation at CogX Festival in September[2]. The technology itself was not new, but public awareness soared after ChatGPT was made publicly available. Harris and Raskin's talk refitted the subject and title of the 2020 documentary they featured in, The Social Dilemma, and applied the same critical framework to artificial intelligence in the age of LLMs. In The Social Dilemma, the built-in goals of social media algorithms were seen to have had the unintended and perverse consequences of corrupting everything from national politics and security to the electoral process, human values, children's identity, and media.

While their 2020 charges against social media are unquestionably disquieting (I had already mostly dumped social media and plan to use any parental pull I might have to thwart my two young children's introduction to it), applying the same perspective to their indictment of artificial intelligence has, at the very least, led to some category mistakes that introduce serious misconceptions about what LLMs actually do, misconceptions I think we would benefit from clarifying. While I'm quite certain that Harris and Raskin are candid in their convictions, the consequence of their views, along with those of Max Tegmark at the Future of Life Institute[3] among others, is that they trade a clear-eyed look at the real and specific potential dangers of artificial intelligence for a fearmongering that deflects attention both from those dangers and from the tremendous public value the technology might provide, or already provides, in health, hunger, information, verification, access to education, environmental threats, and industrial automation.

On the whole, there has perhaps been no technological discovery for which the general response did not fall somewhere on a spectrum from unease to outcry. The technophobes, neo-Luddites, fuddy-duddies, and conservative sticks-in-the-mud leery of every conceivable variety of converging catastrophe have a way of seeing monsters under every bed, and artificial intelligence is no exception. Following the charges they brought against social media, Harris and Raskin's particular hobgoblin is fueled by a goal to "INCREASE MY CAPABILITIES AND ENTANGLE MYSELF WITH SOCIETY!" (cf. social media's more basic tactic of entanglement: "MAXIMIZE ENGAGEMENT!"), which, in their telling, will result in reality collapse, trust collapse, automated fake religions, automated cyberweapons, automated lobbying, and synthetic relationships, among a profusion of other regrettable societal setbacks. They even give their bogeyman a name: Gollems, or Gollem-class A.I.s, borrowing from Jewish folklore a mnemonic for Generative Large Language Multi-Modal Models (GLLMMs). What Harris and Raskin's inclination sacrifices in perspective is that goals, of the kind driving social media's like-button engagement algorithms, are absent in the case of large language models and in machine learning algorithms more generally.
They find themselves caught in a critical framework that does not admit of nuance, denies us a more subtle discussion, and, at worst, misrepresents artificial intelligence as a techno-pessimist's catastrophe just waiting to happen. This is not to say that we should live gleefully on the edge, throwing caution to the wind while developing new technologies. But responsible engineering requires a sober assessment of the nature of the problem that confronts us.

So what is this monster?

Generative A.I. and large language models are the specific technologies that Harris and Raskin find concerning. While artificial neural network architectures have been around in various forms for the better part of a century, advances in computing power have only very recently allowed research in neural networks to flourish in ways that the general public might notice. In 2017, a new neural network architecture was developed at Google[4], built around a mechanism called attention. These architectures are called transformers, and they account for most large language model architectures today. They allowed earlier architectures to evolve into enormous, general-purpose, pre-trained models known as large language models. These pre-trained models are built upon titanic volumes of textual data and boast hundreds of billions of parameters (thinking of them as model variables or coefficients is close to the mark), giving them prodigious flexibility. This flexibility is how they manage to synthesize what Harris and Raskin describe as the previously non-overlapping and growing set of artificial intelligence domains (computer vision, speech recognition, and robotics among them), fields so distinct that each had its own language for speaking about its domain.

A more familiar example helps to illustrate how LLMs are more like tools than goal-directed agents. A common and well-understood machine learning method is linear regression: say you have fit a model on some data. Once you have trained your model, and assuming it generalizes well to its domain, you can use it to predict an output given any input. The small amount we lose in subtlety allows us to see LLMs as performing a similar task: once the herculean work of pre-training an LLM has been done, a person or application can use that model to produce an output (such as generated text) given an input (a prompt). The output is the result of the user's input; it is not based on any predefined objective. What this helps to expose is that large language models are tools, not use-case-specific applications in and of themselves.

The first point we now have under our belt is this: LLMs are general-purpose tools. If we stick with the terms tool and application here, a tool differs from an application in two obvious ways: (1) compared to an application, a tool solves a much broader set of problems, and (2) tools are not designed with specific goals in mind. Sure, a screwdriver has a narrow range of means by which it can accomplish anything. But the anything it might accomplish is an undefined and unconstrained set, which could equally be said of a chainsaw, ladder, or spatula. Large language models are examples of tools. Though they are themselves individual model instances (familiar LLMs such as LaMDA, PaLM, GPT, and Gemini), they are commonly mistaken for something closer to traditional machine learning algorithms, when they in fact behave more like the models those algorithms produce.
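To make the linear regression analogy concrete, here is a minimal sketch in Python. The data and numbers are invented for illustration, and the final commented-out llm.generate call is hypothetical, not any particular library's real API; the scikit-learn calls are real. The point is only the train-once, use-as-a-tool pattern that a modest regression model and a pre-trained LLM share:

```python
# A minimal sketch of the "train once, then use as a tool" pattern.
# Data and numbers are invented purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

# The expensive, one-time step: fitting the model
# (for an LLM, the analogous step is pre-training on titanic volumes of text).
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])  # inputs
y_train = np.array([2.1, 3.9, 6.2, 8.1])          # observed outputs
model = LinearRegression().fit(X_train, y_train)

# The cheap, repeatable step: hand the trained model an arbitrary input and
# get an output. Nothing about this step encodes an objective of its own.
print(model.predict(np.array([[10.0]])))  # roughly [20.3]

# An LLM is used analogously at this level of abstraction:
# completion = llm.generate(prompt)  # hypothetical call: prompt in, generated text out
```

At this level of abstraction, inference is simply a tool being handed an input; nothing in that step carries a goal of its own.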
It is in this way that Harris and Raskin seem to be confusing LLMs with social media. The algorithms that steer social media, by contrast, are built around a specific goal: maximizing engagement. Data signals like a click on a Facebook or Twitter/X like button, or the amount of time spent on a TikTok video[5], drive the recommender engines that in turn drive engagement and retention. Or, as TikTok sees it, "user value" and "long-term user value"[6]. Every time a user clicks a like button or, on average, hovers a few seconds longer on a particular video, they are providing the platform with rich feedback about what categories of content will win their continued attention. These pieces of positive feedback tell the platform exactly what the user will tend to engage with more in the future, and that is precisely what the platform's feed then feeds them. The user is unwittingly, or only somewhat wittingly, party to the creation of their own private, algorithmically curated and diminished content diet.

There is a parallel that we have, so far, glossed over. In machine learning pipelines, data are collected, cleaned, and collated, then fed into an algorithm. The algorithm takes all of the information those data contain (observations, records, rows) and creates a mathematical abstraction of them: a model. This pipeline of data, algorithm, and output model is a shared feature of both social media engagement algorithms and large language models. So where is the distinction that we claim Harris and Raskin miss?

Where we begin to see social media as a use-case-specific application, in contrast to a general-purpose tool, is in the inputs. The first difference is the range of information fed into each: social media engagement data concern themselves only with clicks, hovers, and views, while LLMs' training consumption seems almost indiscriminate by comparison. This largely explains the much broader domain of applicability of LLMs, and their seemingly limitless flexibility, while social media algorithms remain narrowly fixated on serving you something from the slender range of formats available to them. But this is the less interesting point for our purposes here. The more salient point is this: while social media data signals are limited in class and form, as we have seen, their source includes you.

In 2018, Mark Zuckerberg testified in a joint Senate Commerce and Judiciary committee hearing and found himself the lucky recipient of one of the most widely known poorly researched questions ever publicly uttered[7]. Senator Orrin Hatch of Utah pressed Zuckerberg on his prior commitment to keep Facebook free, asking "how do you sustain a business model in which users don't pay for your service?" Zuckerberg wryly quipped through a poorly concealed smile, "Senator, we run ads." This fact, painfully obvious to anyone with an internet connection, is the same one that declaws Harris and Raskin's Gollem. The two primary components driving social media's inertia are user engagement and advertising. The more users are drawn to consume content curated by social media feeds, the more advertising dollars are generated through those ad views. Those ad dollars are then converted back into feature development and algorithmic optimization that further drive the acquisition and retention of users. And this is where the commonalities between social media and generative LLMs end. What of the goals of increased capabilities and entanglement with society?
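Before answering, it may help to make the contrast concrete. What follows is a deliberately simplified Python sketch; all names, numbers, and the commented-out model call are invented for illustration. The social media loop keeps updating a per-user model from engagement signals, while an LLM's weights stay fixed at inference time and each call sees only the conversation text the caller sends along:

```python
# Deliberately simplified, invented sketch: the engagement loop vs. the stateless LLM call.

# --- Social media: every click, hover, and view updates a per-user preference model ---
user_prefs = {}  # topic -> learned weight for a single user

def record_engagement(topic, seconds_watched):
    # Positive feedback nudges the model toward serving more of the same.
    user_prefs[topic] = user_prefs.get(topic, 0.0) + 0.1 * seconds_watched

def rank_feed(candidate_topics):
    # The feed optimizes an explicit goal: predicted engagement.
    return sorted(candidate_topics, key=lambda t: user_prefs.get(t, 0.0), reverse=True)

record_engagement("cooking", 45)   # the user lingered on a cooking video...
record_engagement("politics", 3)   # ...and skipped past a political one
print(rank_feed(["politics", "cooking", "gardening"]))  # cooking now floats to the top

# --- LLM: the weights do not change at inference time; each call sees only the text sent ---
conversation = []  # the only "memory" is the transcript the caller resends every time

def ask_llm(prompt):
    conversation.append({"role": "user", "content": prompt})
    # reply = pretrained_model.generate(conversation)  # hypothetical call: the model
    # attends over this conversation alone; yesterday's prompts, yours or anyone
    # else's, are not part of it.
    reply = "..."  # placeholder standing in for the generated text
    conversation.append({"role": "assistant", "content": reply})
    return reply
```

Nothing in the second half of the sketch collects a signal about what will keep you coming back; that loop simply is not there.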
What are LLMs' like buttons and scroll time? There are none. While social media algorithms continue to train on user behavioral data, finely tuning content delivery to users' preferences, LLMs are pre-trained and do not consider prompts submitted yesterday when responding to new ones. While social media algorithms take into account each and every user action and behavior, LLMs use attention only to produce outputs relevant to a user's prompt and the preceding turns of a specific conversation. While social media algorithms aim to lure users into watching "just one more video", LLMs are trained to provide relevant outputs given any prompt.

There are still, undoubtedly, many areas in which we would do well to proceed cautiously, most of which we have not yet even recognized as problems worth considering: bias exacerbated by systems that are, so far, not well understood and that are growing more complex; concentration of power and general access to the boon of artificial intelligence; job displacement and education in quickly changing professional environments; legal and regulatory challenges related to intellectual property rights and liability as more content is generated by A.I.; the proliferation of misinformation; and hallucinations in A.I.-assisted medical diagnoses and treatments. All of these are legitimate and specific concerns that should be confronted with clear-eyed sanity rather than by appealing to bogeymen and bugbears to arouse panic-induced action. We will not succeed in halting technological progress, and we will certainly not succeed in steering it or harnessing its benefits if we do not take the time to better understand what it is we are trying to help guide.

Notes

1. "The A.I. Dilemma - March 9, 2023", Harris, T. and Raskin, A., Center for Humane Technology, https://youtu.be/xoVJKj8lcNQ
2. "Tristan Harris: Beyond the AI dilemma | CogX Festival 2023", Harris, T., CogX Festival, https://youtu.be/e5dQ5zEuE9Q
3. "Pause Giant AI Experiments: An Open Letter", Future of Life Institute, https://futureoflife.org/open-letter/pause-giant-ai-experiments/
4. "Attention Is All You Need", Vaswani, Shazeer, Parmar et al., https://arxiv.org/abs/1706.03762
5. "Inside TikTok's Algorithm: A WSJ Video Investigation", Wall Street Journal, https://www.wsj.com/articles/tiktok-algorithm-video-investigation-11626877477
6. "How TikTok Reads Your Mind", Ben Smith, The New York Times, https://www.nytimes.com/2021/12/05/business/media/tiktok-algorithm.html
7. 2018 joint Senate Commerce and Judiciary committee hearing, https://youtu.be/n2H8wx1aBiQ