[This post and its associated comments were originally published on Cohost.]
If you pay any attention at all to AI-related news, you know that today OpenAI released GPT-4, the newest version of its large language model (LLM), and made it available through the paid version of its web-based ChatGPT chatbot. I had already tried out the free version of ChatGPT (based on the earlier GPT-3.5 model) and decided it was worth spending $20 for the paid version (ChatGPT Plus) to find out what all the hype was about.
Here are a few of my hype-free thoughts:
First, a minor comment: I really wish that people demoing LLMs would stop doing things like telling the LLM to make all words in the response start with the same letter, or having it produce the response in rhymed verse—both examples from the GPT-4 developer livestream today. These feel like fancy parlor tricks of minimal relevance for how LLMs might actually be used in real life, like showing off a border collie by having it dance on its hind legs.
Second, the most interesting application of GPT-4 I saw today was the Be My Eyes Virtual Volunteer, an app that leverages the new ability of GPT-4 to analyze images and provide a text description of what’s in them. The original Be My Eyes app allowed blind or otherwise visually impaired users to call upon the help of sighted human volunteers: the user would point their smartphone at something and the human volunteer would describe what was shown on camera. The Virtual Volunteer, as its name implies, substitutes GPT-4 for a human.
(Incidentally, this app illustrates an interesting point about how LLM-enabled apps will be tailored to the needs and expectations of users. In the brief demo video of Virtual Volunteer, the audio produced by the app seems very robotic and difficult to understand. This is a feature, not a bug. Experienced users of screen reader software can comprehend computer-generated text-to-speech “spoken” at speeds that would astound the typical sighted person, and in this context having the computer voices sound “natural” is a secondary consideration at best.)
Third, I tried out ChatGPT Plus with GPT-4 for what I would most likely use it for, namely helping me write online essays like this one, and gave it a number of example tasks. (I’m omitting the GPT-4 responses for reasons of length. If there’s real interest I’ll post the entire set of transcripts somewhere.)
In the first task I copied in the entire text of my post “I fought the power law and the power law won” and asked for it to be summarized. The resulting three-paragraph summary was pretty good, hitting all the main points I made. I could definitely see using this feature of GPT-4 as a way to summarize other people’s essays and news stories to see if it’s worth my reading them in full.
In the second task I asked the LLM to explain the meaning of the analogy “AI is to the world of ideas as index funds were to investing,” the topic of another of my posts. Here GPT-4 didn’t do much better than the free version of ChatGPT had, failing to explain the analogy at more than a surface level.
I also tried asking about the meaning of a metaphor I was planning to feature in a future post. Here GPT-4 actually produced some useful and interesting meanings, but again not the particular meaning I intended.
As a final task I asked GPT-4 to explain the concept of a “fixation index,” a measure of distance between populations originally used in genetics and then adapted in the study of cultural evolution (a recent interest of mine). Here GPT-4 did very well, producing definitions of the fixation index that made sense in both contexts, and detailed explanations of how the measure might be calculated in both contexts.
But (and there’s often a “but” when it comes to LLMs), I can’t really use the GPT-4 output to help me learn about the concept, because based on others’ experiences with GPT-3 and similar LLMs I can’t trust that the output from GPT-4 is completely correct. I’d have to go back to the original source material and look up definitions and explanations there, and do some work to convince myself that the original source’s explanation matches what GPT-4 produced. But if I’m doing all that, why would I bother with GPT-4?
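For readers who haven't met the fixation index before, here is a minimal sketch of one common textbook form, F_ST = (H_T - H_S) / H_T, where H_S is the average expected heterozygosity within populations and H_T is the expected heterozygosity of the pooled population. This is not GPT-4's output, and it rests on simplifying assumptions of my own choosing: a single trait with two variants, equally weighted populations, and hypothetical function names. The same arithmetic can be applied to the frequency of a cultural trait instead of an allele, but check the definition against your own source before relying on it.

```python
def expected_heterozygosity(p):
    """Chance that two random draws differ, for a two-variant trait at frequency p."""
    return 2.0 * p * (1.0 - p)


def fixation_index(freqs):
    """F_ST = (H_T - H_S) / H_T for equally weighted populations.

    freqs: the frequency of one variant in each population (values in [0, 1]).
    Assumption: one trait with two variants and equal population weights.
    """
    if not freqs:
        raise ValueError("need at least one population")
    mean_freq = sum(freqs) / len(freqs)
    # H_T: heterozygosity if all populations were pooled into one.
    h_total = expected_heterozygosity(mean_freq)
    if h_total == 0.0:
        # Every population is fixed for the same variant: no differentiation.
        return 0.0
    # H_S: average heterozygosity within the individual populations.
    h_within = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fixation_index([0.5, 0.5]))  # identical populations
    print(fixation_index([0.2, 0.8]))  # moderately different populations
    print(fixation_index([0.0, 1.0]))  # populations fixed for opposite variants
```

The value runs from 0, when the populations have identical frequencies, to 1, when each population is fixed for a different variant.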
Some final thoughts:
Given the contempt heaped upon the heads of humanities majors by those promoting the “shape rotators vs. wordcels” meme, I find it amusing that GPT-4 does quite well on various physics and math tests, and on exams like the bar exam that call for a combination of rote knowledge and deductive ability, but has difficulty with advanced English language and literature tests. One speculation I’ve seen is that current LLMs are not that great at reasoning with abstract and fuzzy concepts, which sounds plausible based on my own experience.
But of course it’s possible that future LLMs, be they GPT-5 or others, will overcome this lack, which will only further intensify the confident, even giddy, pronouncements that LLMs will replace humans in most if not all professions involving mental activities. Certainly corporate executives across the business landscape will attempt to do just that (nobody ever got fired for pursuing reductions in labor costs as the primary if not only path to profitability), but how that scenario plays out remains to be seen.
As for me, I’ll pay my $20 a month for at least a little while longer to see how GPT-4 might help improve my own writing and research. But based on my experience thus far I suspect I may end up saving the money and spending it either on more specialized LLM-enabled tools or just on more volumes of manga.
AndreL (@AndreL) - 2023-03-15 13:29
phind.com has an LLM which cites its sources. The catch is that it also generates plausible nonsense, complete with plausible citations.