16 May 2023

You’re Probably Underestimating AI Chatbots

STEVEN LEVY

IN THE SPRING of 2007, I was one of four journalists anointed by Steve Jobs to review the iPhone. This was probably the most anticipated product in the history of tech. What would it be like? Was it a turning point for devices? Looking back at my review today, I am relieved to say it’s not an embarrassment: I recognized the device’s generational significance. But for all the praise I bestowed upon the iPhone, I failed to anticipate its mind-blowing secondary effects, such as the volcanic melding of hardware, operating system, and apps, or its hypnotic effect on our attention. (I did urge Apple to “encourage outside developers to create new uses” for the device.) Nor did I suggest we should expect the rise of services like Uber or TikTok or make any prediction that family dinners would turn into communal display-centric trances. Of course, my primary job was to help people decide whether to spend $500, which was super expensive for a phone back then, to buy the damn thing. But reading the review now, one might wonder why I spent time griping about AT&T’s network or the web browser’s inability to handle Flash content. That’s like quibbling over what sandals to wear just as a three-story tsunami is about to break.

I am reminded of my failure of foresight when reading about the experiences people are having with recent AI apps, like large language model chatbots and AI image generators. Quite rightly, people are obsessing over the impact of a sudden cavalcade of shockingly capable AI systems, though scientists often note that these seemingly rapid breakthroughs have been decades in the making. But as when I first pawed the iPhone in 2007, we risk failing to anticipate the potential trajectories of our AI-infused future by focusing too much on the current versions of products like Microsoft’s Bing chat, OpenAI’s ChatGPT, Anthropic’s Claude, and Google’s Bard.

This fallacy can be clearly observed in what has become a new and popular media genre, best described as prompt-and-pronounce. The modus operandi is to have a chatbot attempt some task formerly limited to humans and then, often disregarding the caveats provided by the inventors, take the stunt to an extreme. The great sports journalist Red Smith once said that writing a column is easy—you just open a vein and bleed. But would-be pundits now promote a bloodless version: You just open a browser and prompt. (Note: this newsletter was produced the old-fashioned way, by opening a vein.)

Typically, prompt-and-pronounce columns involve sitting down with one of these way-early systems and seeing how well it replaces something previously limited to the realm of the human. In a typical example, a New York Times reporter used ChatGPT to answer all her work communications for an entire week. The Wall Street Journal’s product reviewer decided to clone her voice (hey, we did that first!) and appearance using AI to see if her algorithmic doppelgängers could trick people into mistaking the fake for the real thing. There are dozens of similar examples.

Generally, those who stage such stunts come to two conclusions: These models are amazing, but they fall miserably short of what humans do best. The emails fail to pick up workplace nuances. The clones have one foot dragging in the uncanny valley. Most damningly, these text generators make things up when asked for factual information, a phenomenon known as “hallucinations” that is the current bane of AI. And it’s a plain fact that the output of today’s models often has a soulless quality.

In one sense, it’s scary—will our future world be run by flawed “mind children,” as roboticist Hans Moravec calls our digital successors? But in another sense, the shortcomings are comforting. Sure, AIs can now perform a lot of low-level tasks and are unparalleled at suggesting plausible-looking Disneyland trips and gluten-free dinner party menus, but—the thinking goes—the bots will always need us to make corrections and jazz up the prose.

Yet it’s folly to draw definitive conclusions based on these early versions of the technology, including the shotgun blast of AI updates announced by Google this week. Folks, that comfort is an illusion. Today’s chatbots are taking baby steps in a journey that will rise to Olympic-level strides. Oren Etzioni, former CEO of nonprofit research lab the Allen Institute for AI, told me this week that they’re already getting better. One force driving that trend is the millions of users—including reporters trying to goad the systems into doing nutty things that make good copy—exposing areas in need of improvement. Computer scientists, whose specialty is optimizing, after all, have been hard at work addressing the flaws.

“Hallucinations have already dropped substantially,” Etzioni says. The next wave of evolution will come as scientists figure out how to level up the quality and capabilities of these algorithmic super brains. “I expect continued breathtaking innovation in the near future because it's just so early in the cycle of understanding and building these generative technologies,” he says.

I guarantee you that 10 years from now, tasking a large language model with business communications is going to be nothing like using today’s beta versions. The same goes for search, writing a college essay, or running a political ad campaign. Not to mention churning out a network sitcom—which is why, in part, screenwriters are now on strike. The Writers Guild understands that GPT-4 can’t crank out an acceptable version of Young Sheldon right now, but GPT-19 might actually make that series funny.

This doesn’t necessarily mean that humans are doomed. As with previous technological breakthroughs, the sweet spot might lie in collaborations that automate drudge work and leave the most nuanced forms of creation to humans. As the tech improves, our new era will be marked by a fuzzy borderline between copilot and autopilot. My guess is that over time, more and more activities will cross over to the autopilot side. It won’t be long before AI circa 2023 looks like television sets from the early 1950s. Or the iPhone before the App Store, which launched a year after the device appeared.

Before I get off the phone with Etzioni, I ask him to envision AI as a motion picture. In that movie, how far along in the narrative would we currently be? He thinks for a moment before answering, but when he responds, there is little doubt in his voice. “We have just watched the trailer,” he says. “The movie has not even started.”

Time Travel

It’s almost exactly 16 years since I tested the iPhone. My June 2007 review appeared in Newsweek. I liked it!

Apple is already working hard at improving this first version of the iPhone. I think the best way to make it more valuable would be to encourage outside developers to create new uses for it, and Apple has indicated that they are welcoming Web-based applications geared to their new device. But as the Google Maps program shows, the results of a separate client application created for the iPhone can be spectacular, and I think the company will do well to keep those coming …

Bottom line: In a sense, the iPhone has already made its mark. Even those who never buy one will benefit from its advances, as competitors have already taken Apple's achievements as a wake-up call to improve their own products. But for all its virtues, the iPhone is still a risky venture because it's yet to be proven that, despite the wow factor, millions of people are ready to pay several hundred dollars more than the going rate for phones—and in some cases, paying even more to bail out of their current mobile contracts. There's also a potential backlash from those sick of the hype. During our iPhone conversation, however, Jobs professed that he wasn't concerned about inflated hopes, and certainly not whether he would meet his own projections of 10 million sold in 2008: "I think we're going to blow away the expectations."

Certainly all those people lining up to buy iPhones will find their investment worthwhile, if only for the delight they get from dazzling their friends. They will surely appreciate the iPhone's features and the way they are intertwined to present a unified experience. But in the future—when the iPhone has more applications and offers more performance, with a lower price—buyers will find even more value. So smart consumers may well wait for that day. But meanwhile they can only look with envy as the person sitting next to them on the subway, or standing ahead of them in the Whole Foods line, is enjoying the phone that finally fulfills the promise of people-friendly palm-top communication and computing.

Ask Me One Thing

Chris asks, “Why aren't media outlets and AI critics doing more to point out the terrible environmental cost in creating large language models and AI in general?”

Hi, Chris. There’s no question that the new breed of AI models requires a lot of computation, which means huge energy consumption. And though there has been reporting on the subject, you’re right that most articles concentrate on the technology, not the environmental impact. Maybe that’s because compared to mining Bitcoin and flying airplanes, the energy consumed by LLMs isn’t overwhelming. One study calculated that the energy Google consumed training those models in 2019, for instance, was less than 0.005 percent of its total consumption—training a single model used the equivalent of providing electricity to about 100 homes. Another study figured that training one big LLM churned out 626,155 pounds of CO2 emissions—roughly equal to what five automobiles would produce over their lifetimes. Running an LLM after it has been trained uses much less energy, though of course when millions are making queries, the servers get a workout.

When you calculate the total amount of energy consumed by AI, the numbers get bigger, and they will grow even more as we routinely use those power-hungry applications. Currently, an estimated 15 percent of Google’s data center energy goes to AI activities. That’s enough to power the city of Atlanta for a year. Mitigating this somewhat is the fact that big companies like Google and Microsoft try very hard to keep energy consumption in their data centers to a minimum. Google aims to run all its facilities and data centers on carbon-free energy by 2030. But there’s no way around it—ChatGPT, Bard, and Bing collectively draw tons of electricity as they chat with their millions of users. Nonetheless, I do think the ultimate measure of AI will come from its impact on the way we work, learn, and entertain ourselves.
