Anything you can do, an AI can do better?
There are a lot of discussions happening about AI. Is it a good thing, is it made ethically, is it biased, and the big one: can it take someone’s job? It certainly has its limitations. If you try to ask it questions about facts, there’s a possibility that it’ll ‘hallucinate’, which is the term coined for when AI just gets it wrong and makes stuff up. It can confidently tell you the answer to anything you care to ask it, though whether it’s correct is tough to say.
But it also has its capabilities. For example, it can generate good code for accomplishing simple tasks. And what seems to be a great strength of these AI is reading information and generating a summary of it — reading a long text so you don’t have to. And that made me wonder: can it do what I do? Can an AI take a text and proofread it? Can it take my job?
Well, I had to find out.
In the red corner
When I started working on this blog there was only one AI to talk about: ChatGPT. I was going to put it through its paces and really test how well it could solve word puzzles. But things changed quickly as the generative AI field ballooned. So instead I’m going to be looking at four different brands of AI. Alongside ChatGPT there are Microsoft Bing, Google Bard and the late entry, Grammarly GO.
They’re all going to be given a couple of test pieces with a mix of errors in them to see what they can pick up and what they miss. And also to get an idea of the flavour of each AI, I’ll be asking them each to invent a word.
Its name is first though it arrived last
Grammarly GO is a recent add-on that leaves the main Grammarly interface unchanged. I heard about Grammarly GO from a YouTube ad, likely served to me because I’m constantly looking up obscure word definitions and grammar rules. It’s marketed as a tool to help you write boring work emails faster. I suppose there are people who would appreciate that, though the long-term implications are a bit weird to think about: AI writing emails that humans don’t read, instead using other AI to reply.
I had to be very careful with using Grammarly GO, as the free tier heavily restricts how many prompts you can enter each month. Paying gets you more, but still a finite amount.
I began by asking Grammarly GO to invent a word. It refused to do it. So, not a great start. But I suppose that’s part of it being a professional tool for professionals. All it’s missing is a ‘this is not a toy’ sticker.
Moving on, I copied in one of my test pieces, this one 750 words long, and gave the AI the instruction ‘proofread this’. It rewrote the piece as a single-paragraph summary. While summarisation is a strength of AI, I was disappointed in how it had interpreted the instruction.
Next, I entered my second test piece, this one 250 words, and asked the AI to give me a list of changes it would make to correct it to UK English. In response, Grammarly GO reprinted the entire piece with a few changes. It was able to pick up and correct most of the errors, though there were a few it missed. It’s not very good at continuity, so a character’s hair colour can change suddenly, for example. But I had to reread the entire piece and compare it to the original to figure that out.
A fireside ChatGPT
While not the first chatbot in the world, it’s certainly the most famous. With record-setting user growth, it’s likely to become one of those brand names synonymous with its whole field, the way Google, Coca-Cola and Xerox are. It’s uncertain whether ChatGPT itself will be the product that makes it, though. It’s free to use with unlimited usage. It has clear limitations in the length of input and output it can handle, and its knowledge base has a hard stop in 2021, which is going to be a growing problem as time advances. For example, I asked it how to generate an image with AI, and while it could give me an overview of how an image generator would work, it couldn’t name any apps, as its knowledge base cuts off before the release of DALL-E 2 and Midjourney. There is a paid option for the upgraded version, GPT-4, which can handle more text at once and supposedly makes fewer mistakes, but I can’t say, as I didn’t try it out.
I asked ChatGPT to invent a word, and it presented me with ‘blissify’, to make something blissful. I thought that was a nice sentiment.
When asked to proofread one of my test pieces, it was able to reprint the text with corrections on the first try, but missed the same continuity problem as Grammarly GO and also mixed up two very similar verbs. Since it had followed instructions correctly the first time, I tried asking it to list the changes it had made to the text, and that’s where it fell over. It output a list, but it was hallucinating hard, listing changes it hadn’t previously made and leaving changes it had made out of the list. So, no good. With the second test piece, I tried getting ChatGPT to just list the changes it would make. It decided to ignore me and reprint the text with some corrections. I guess it had a lane and was sticking to it?
No relation to Chandler Bing
Microsoft Bing, more than either of the previous AIs, is a big deal. While it is basically built on GPT-4, it has something the others don’t: internet connectivity. So when you ask it questions, it can search the whole of the internet as its knowledge base. Where ChatGPT can’t tell you about events after 2021, Bing can tell you what’s going on today. It’s completely free to use, and while you can ask as many things as you want, there is a limit on how long your prompts can be: only 4,000 characters. You couldn’t even fit a decent short story in there.
My first question, as usual, was to ask Bing to make up a word. It gave me ‘fleem’, but didn’t supply a definition; instead it asked me to come up with one myself. An interesting twist.
When it came to asking Bing to proofread, I was limited to using only the 250-word test piece, so I entered it twice over two sessions. First, I asked it to proofread the piece. It reprinted the text with corrections, catching most of the errors. But it missed the continuity error and the similar verbs. When I asked it to list the changes it had made to the piece, Bing started hallucinating. And I got similar results in the second session, when I asked it to list changes without making them first. Most suggestions were okay, but some weren’t needed, and other changes it had made when reprinting were missing from the list.
Not quite a modern Shakespeare
Bard is built on a different language model from ChatGPT and Bing, and has been in development for years, though its rollout was rushed in the aftermath of ChatGPT’s release. It’s completely free to use with no limits, as Google is trying to avoid losing search market share to Microsoft. But it fails to be a better product.
When I asked Bard to invent a word, it returned ‘bardify’, meaning to transform something into a poem or song. I can’t help but wonder if that was hard-coded in, considering which AI generated it.
When asked to proofread the test pieces, Bard returned the now familiar song and dance. It could reprint the piece with most of the errors corrected, but it missed the continuity problems. When asked to list what changes it had made, it started hallucinating and wasn’t helpful. When I asked it to just list the changes it would make to a piece, it gave another list that was mostly good but contained some odd ideas, as well as missing one of the continuity problems. Then, unprompted, it reprinted the piece, making changes it hadn’t listed and skipping changes it had.
A good workman doesn’t blame his tools, but…
What I found with all these AI tools was their limitations: the restrictions on how much text you can enter at a time, the fact that none of them could catch every error in the text, the across-the-board failure to produce an accurate list of made or suggested changes, and the basic inability to follow instructions.
It’s important to remember that despite their label, these AI are not thinking machines. Because they can output convincing text, it’s easy for us to anthropomorphise them. I’ve done it several times in this blog: ‘the AI decided that’, ‘I asked the AI’, ‘the AI replied with’. When given an input, they just output the most statistically likely response, one word at a time. There isn’t even any pre-planning before they start generating text; they simply put one foot in front of the other and go on a journey.
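That one-word-at-a-time process can be illustrated with a toy sketch. This is not how any of the products above actually work internally; it’s a deliberately tiny stand-in model (the word table and probabilities are invented for illustration) that just shows the core idea: at each step, pick the most likely next word given what’s been written so far, with no plan for the sentence as a whole.

```python
# Toy next-word table: for each word, a list of (next_word, probability)
# pairs. Entirely made up for illustration -- a real language model
# learns these probabilities from vast amounts of text.
model = {
    "the": [("cat", 0.6), ("dog", 0.4)],
    "cat": [("sat", 0.7), ("ran", 0.3)],
    "sat": [("down", 1.0)],
}

def generate(start, max_words=4):
    """Greedily pick the most probable next word, one step at a time.

    There is no lookahead or planning: each word depends only on
    what has already been generated.
    """
    words = [start]
    while len(words) < max_words and words[-1] in model:
        candidates = model[words[-1]]
        # Choose the candidate with the highest probability.
        words.append(max(candidates, key=lambda pair: pair[1])[0])
    return " ".join(words)

print(generate("the"))  # "the cat sat down"
```

Real systems pick from tens of thousands of possible words with some randomness mixed in, which is why the same prompt can produce different answers, but the step-by-step nature is the same.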
This lack of real intelligence is, I think, what limits how good they are at proofreading. They can output a recreation of a small piece with most of the errors corrected, but with no change log. So the only way to know what they did is to manually compare the output with what you input, in effect proofreading it yourself. The most efficient way to use these AI is to trust them completely, assuming that the output will be ‘good enough’. On the one hand, that’s how you get text sent to you by a human that includes the phrase ‘As an AI model…’, which looks unprofessional. On the other, I think a person who would send AI-generated text without reading it themselves wouldn’t have employed a human proofreader in the first place, so there’s no loss of business on my part there.
Where to next?
As noted above, some people will try to replace human proofreaders with AI because what it returns is ‘good enough’. A more practical use of AI is to give it to writers for solving quick proofing questions: ‘What’s the word for [abstract concept]?’, ‘Which of these is grammatically correct, [sentence A] or [sentence B]?’ A form of self-service in the writing industry. But both of these use cases are going to run into problems in the long term. Only one of the AIs covered here seems to have a pricing model that is sustainable against the sheer amount of computing power required to run them. This compounds with the theory about the long-term use of AI to summarise internet searches. If the search engines give users decent summaries of what’s on existing websites, no one is going to click through and give those websites ad revenue, so eventually the websites go out of business or put up a paywall. This means that there’s nothing left for the AI to summarise, causing the whole system to collapse.
I believe that we are still in the early stages of an AI bubble, and when it pops the cost of using these AI is going to skyrocket, to the point that it’s cheaper to get humans to do the work again.
But what do you think? Do you believe that AI will wipe out entire industries, or is it a passing fad like crypto and the metaverse? Let me know in the comments.
This blog was written by a human.