LLMs and Generative AI (OpenAI’s ChatGPT) in SEO
OpenAI’s ChatGPT writes so well that it has drawn massive public interest and sparked endless debate in online communities. Many writers fear being replaced by AI tools such as ChatGPT. Yet that may not happen anytime soon: ChatGPT lacks creativity, uniqueness, and that ‘human touch’ when it comes to writing.
However, it can be an excellent addition to a writer’s toolset for generating new ideas, finding inspiration, doing research, reducing mundane tasks, improving grammar, and much more. Many writers have already jumped on the bandwagon and started using ChatGPT in their day-to-day work.
The quality of ChatGPT content is astounding, so the question of using it for SEO deserves a closer look.
How ChatGPT Can Do What It Does
“Google may be only a year or two away from total disruption,” Gmail creator Paul Buchheit tweeted, adding that AI will be able to “instantly do what would take many minutes for a human” to do using a search engine like Google.
In a nutshell, ChatGPT is built on a type of machine learning model called a large language model (LLM).
A large language model is an artificial intelligence model trained on vast amounts of text to predict the next word in a sequence.
The more data it is trained on, the more kinds of tasks it is able to accomplish.
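To make “predicting the next word” concrete, here is a toy sketch in Python. It is a deliberate simplification and an assumption throughout: GPT models use neural networks over sub-word tokens, not word-pair counts, and the function names below are hypothetical. It does show the core idea, and why more training text helps. The model can only predict continuations for words it has actually seen.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigram_model(corpus: str) -> Dict[str, Counter]:
    """Count, for every word, which words followed it in the training text."""
    words = corpus.lower().split()
    follows: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next_word(model: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it never appeared."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram_model(corpus)
print(predict_next_word(model, "the"))  # cat
print(predict_next_word(model, "dog"))  # None (never seen in training)
```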
Sometimes large language models develop unexpected abilities.
Stanford University writes about how an increase in training data enabled GPT-3 to translate text from English to French, even though it wasn’t specifically trained to do that task.
Large language models like GPT-3 (and GPT-3.5, which underlies ChatGPT) are not trained to do specific tasks.
They are trained with a wide range of knowledge which they can then apply to other domains.
This is similar to how a human learns. For example, if a person learns carpentry fundamentals, they can apply that knowledge to build a table even though they were never specifically taught how to do it.
GPT-3 works similarly to a human brain in that it contains general knowledge that can be applied to multiple tasks.
The Stanford University article on GPT-3 explains:
“Unlike chess engines, which solve a specific problem, humans are “generally” intelligent and can learn to do anything from writing poetry to playing soccer to filing tax returns.
In contrast to most current AI systems, GPT-3 is edging closer to such general intelligence…”
ChatGPT is a sibling of another large language model, InstructGPT, which was trained to follow instructions from humans and to give long-form answers to complex questions.
This instruction-following ability lets ChatGPT write an essay on virtually any topic, in whatever way is specified.
It can write an essay within constraints such as a word count and the inclusion of specific points.
ChatGPT can write essays on virtually any topic because it is trained on a wide variety of text that is available to the general public.
There are, however, limitations to ChatGPT that are important to know before deciding to use it on an SEO project.
The biggest limitation is that ChatGPT is unreliable for generating accurate information. It is inaccurate because the model is only predicting which words should come next in a sentence or paragraph on a given topic. It is not concerned with accuracy.
That should be a top concern for anyone interested in creating quality content.
1. Programmed to Avoid Certain Kinds of Content
ChatGPT is specifically programmed not to generate text on topics such as graphic violence and explicit sex, or content that is harmful, such as instructions on how to build an explosive device.
2. Unaware of Current Events
Another limitation is that it is not aware of any content created after 2021.
So if your content needs to be up to date and fresh, ChatGPT in its current form may not be useful.
3. Has Built-in Biases
An important limitation to be aware of is that it is trained to be helpful, truthful, and harmless.
Those aren’t just ideals; they are intentional biases built into the machine.
The training to be harmless seems to steer the output away from negativity.
That’s a good thing, but it also subtly pulls an article away from a tone that might ideally be neutral.
In a manner of speaking, one has to take the wheel and explicitly tell ChatGPT to drive in the desired direction.
Here’s an example of how the bias changes the output.
I asked ChatGPT to write a story in the style of Raymond Carver and another one in the style of mystery writer Raymond Chandler.
Both stories had upbeat endings that were uncharacteristic of either writer.
To get output that matched my expectations, I had to guide ChatGPT with detailed directions to avoid upbeat endings and, for the Carver-style story, to avoid resolving the plot, because Raymond Carver’s stories often end without a resolution.
The point is that ChatGPT has biases and that one needs to be aware of how they might influence the output.
4. ChatGPT Requires Highly Detailed Instructions
ChatGPT requires detailed instructions to output higher-quality content that has a greater chance of being original or of taking a specific point of view.
The more instructions it is given, the more sophisticated the output will be.
This is both a strength and a limitation to be aware of.
The fewer instructions a request contains, the more likely its output is to resemble the output of another, similar request.
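One way to keep requests detailed is to assemble them from explicit constraints instead of typing a one-line ask. The sketch below is hypothetical: `build_prompt` and its wording are illustrations, not an OpenAI API or a tested recipe. It only builds the text you would paste into ChatGPT.

```python
from typing import Optional, Sequence


def build_prompt(
    topic: str,
    word_count: int,
    points: Sequence[str] = (),
    tone: Optional[str] = None,
    avoid: Sequence[str] = (),
) -> str:
    """Turn explicit requirements into a detailed, multi-line request (hypothetical helper)."""
    lines = [f"Write an article of about {word_count} words on: {topic}."]
    if tone:
        lines.append(f"Tone: {tone}.")
    if points:
        lines.append("Cover these points:")
        lines.extend(f"- {point}" for point in points)
    if avoid:
        lines.append("Avoid:")
        lines.extend(f"- {item}" for item in avoid)
    return "\n".join(lines)


bare = build_prompt("watering houseplants", 600)
detailed = build_prompt(
    "watering houseplants",
    600,
    points=["signs of overwatering", "how drainage affects watering"],
    tone="neutral and practical",
    avoid=["an upbeat sales-style conclusion"],
)
print(bare)
print(detailed)
```

The detailed version pins down the points, tone, and things to avoid. That is the kind of steering the Carver and Chandler experiment above required.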
As a test, I copied a query that multiple people had posted about on Facebook, along with the output they received.
When I asked ChatGPT the exact same query, it produced a completely original essay that followed a similar structure.
The articles were different, but they shared the same structure and touched on similar subtopics, using 100% different words.
ChatGPT does not always pick the single most likely next word. It chooses each word with some randomness, weighted by how probable each candidate is, so it makes sense that it doesn’t plagiarize itself.
But the fact that similar requests generate similarly structured articles highlights the limitations of simply asking, “give me this.”
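In language models, this randomness is typically implemented as weighted sampling. The model scores candidate words, a softmax turns the scores into probabilities (often adjusted by a “temperature” setting), and one word is drawn at random according to those probabilities. The sketch below uses made-up candidate words and scores. OpenAI has not published ChatGPT’s exact decoding settings, and real models sample over tokens rather than whole words.

```python
import math
import random
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_word(words: Sequence[str], scores: Sequence[float],
                     temperature: float, rng: random.Random) -> str:
    """Draw one word at random, weighted by its probability."""
    probs = softmax(scores, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


# Hypothetical candidates and scores for the word after "SEO is ..."
words = ["important", "essential", "changing", "dead"]
scores = [2.0, 1.6, 1.2, 0.1]

rng = random.Random(0)
print([sample_next_word(words, scores, 0.8, rng) for _ in range(5)])
print([round(p, 3) for p in softmax(scores, temperature=1.0)])
```

Lower temperatures concentrate probability on the top-scoring word, which makes outputs more repetitive. Higher temperatures spread the probability out. In both cases the most likely structure tends to recur while the exact wording varies, which is consistent with the Facebook test described above.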
5. Can ChatGPT Content Be Identified?
Researchers at Google and other organizations have worked for many years on algorithms for detecting AI-generated content.
There are many research papers on the topic, and I’ll mention one from March 2022 that used output from GPT-2 and GPT-3.
The research paper is titled Adversarial Robustness of Neural-Statistical Features in Detection of Generative Transformers (PDF).
The researchers tested what kind of analysis could detect AI-generated content that had been processed by algorithms designed to evade detection.
They tested evasion strategies such as using BERT to replace words with synonyms and adding misspellings, among others.
They discovered that some statistical features of AI-generated text, such as Gunning Fog Index and Flesch Index scores, were useful for predicting whether a text was computer-generated, even if that text had been processed by an algorithm designed to evade detection.
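Both scores come from standard published formulas, shown below as a small Python sketch. It assumes that “Flesch Index” means the Flesch Reading Ease score. It also uses a crude vowel-group syllable counter and treats every word of three or more syllables as a “complex word” for the Gunning Fog Index, so results will differ somewhat from dedicated readability tools. This only computes the two features. It is not the paper’s detector.

```python
import re
from typing import Dict


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability_scores(text: str) -> Dict[str, float]:
    """Compute Flesch Reading Ease and the Gunning Fog Index for a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    complex_share = sum(1 for s in syllables if s >= 3) / len(words)

    return {
        # 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # 0.4 * ((words / sentences) + 100 * (complex words / words))
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_share),
    }


print(readability_scores("The cat sat on the mat. The dog ran."))
```

A detector would typically feed features like these into a trained classifier alongside other signals. How the paper combined them is beyond this sketch.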
6. Invisible Watermarking
Of more interest is that OpenAI researchers have developed cryptographic watermarking that will aid in detection of content created through an OpenAI product like ChatGPT.
A recent article called attention to a talk by OpenAI researcher Scott Aaronson, available as a video titled Scott Aaronson Talks AI Safety.
The researcher says that ethical AI practices such as watermarking could become an industry standard in the way that robots.txt became a standard for ethical crawling.
He stated:
“…we’ve seen over the past 30 years that the big Internet companies can agree on certain minimal standards, whether because of fear of getting sued, desire to be seen as a responsible player, or whatever else.
One simple example would be robots.txt: if you want your website not to be indexed by search engines, you can specify that, and the major search engines will respect it.
In a similar way, you could imagine something like watermarking — if we were able to demonstrate it and show that it works and that it’s cheap and doesn’t hurt the quality of the output and doesn’t need much compute and so on — that it would just become an industry standard, and anyone who wanted to be considered a responsible player would include it.”
The watermarking that the researcher developed is based on cryptography. Anyone who has the key can test a document to see whether it carries the digital watermark showing that it was generated by an AI.
The signal can be encoded in how punctuation is used or in word choice, for example.
He explained how watermarking works and why it’s important:
“My main project so far has been a tool for statistically watermarking the outputs of a text model like GPT.
Basically, whenever GPT generates some long text, we want there to be an otherwise unnoticeable secret signal in its choices of words, which you can use to prove later that, yes, this came from GPT.
We want it to be much harder to take a GPT output and pass it off as if it came from a human.
This could be helpful for preventing academic plagiarism, obviously, but also, for example, mass generation of propaganda — you know, spamming every blog with seemingly on-topic comments supporting Russia’s invasion of Ukraine, without even a building full of trolls in Moscow.
Or impersonating someone’s writing style in order to incriminate them.
These are all things one might want to make harder, right?”
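The idea of a secret signal in word choice can be illustrated with a toy keyed watermark. This is a hypothetical sketch, not OpenAI’s scheme, which has only been described at a high level. Whenever several next words are equally acceptable, the generator picks the one that scores highest under a keyed hash (HMAC-SHA256) of the previous word and the candidate. Someone holding the key can then average those hash scores over a text. Unmarked text averages about 0.5, while marked text scores noticeably higher. The vocabulary, the key, and the “equally acceptable candidates” are stand-ins for a real model.

```python
import hashlib
import hmac
import random
from typing import List


def keyed_score(key: bytes, previous: str, candidate: str) -> float:
    """Pseudorandom value in [0, 1) derived from the secret key and a word pair."""
    message = f"{previous}|{candidate}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def generate(key: bytes, rng: random.Random, vocab: List[str], length: int,
             watermark: bool, options: int = 4) -> List[str]:
    """Produce `length` words; each step offers `options` equally good candidates."""
    words = [rng.choice(vocab)]
    for _ in range(length - 1):
        candidates = rng.sample(vocab, options)  # stand-in for the model's choices
        if watermark:
            # Bias the choice toward words that the secret key scores highly.
            nxt = max(candidates, key=lambda c: keyed_score(key, words[-1], c))
        else:
            nxt = rng.choice(candidates)
        words.append(nxt)
    return words


def detect(key: bytes, words: List[str]) -> float:
    """Average keyed score over consecutive word pairs (about 0.5 if unmarked)."""
    scores = [keyed_score(key, prev, cur) for prev, cur in zip(words, words[1:])]
    return sum(scores) / len(scores)


key = b"hypothetical-secret-key"
vocab = [f"word{i}" for i in range(50)]
rng = random.Random(7)
marked = generate(key, rng, vocab, 200, watermark=True)
plain = generate(key, rng, vocab, 200, watermark=False)
print(f"marked text:   {detect(key, marked):.2f}")  # well above 0.5
print(f"unmarked text: {detect(key, plain):.2f}")   # close to 0.5
print(f"wrong key:     {detect(b'another-key', marked):.2f}")  # close to 0.5
```

The last line shows why the key matters: without it, the word choices in marked text look like noise.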
The researcher shared that watermarking defeats algorithmic efforts to evade detection.
Limitations of ChatGPT for SEO
1. AI Content is Detectable
Many people say that there’s no way for Google to know if content was generated using AI.
I can’t understand why anyone would hold that opinion, because detecting AI is a problem that has more or less already been solved.
Even content that deploys anti-detection algorithms can be detected (as noted in the research paper I linked to above).
Detecting machine generated content has been a subject of research going back many years, including research on how to detect content that was translated from another language.
Autogenerated Content Violates Google’s Guidelines?
In April 2022, Google’s John Mueller said that AI-generated content violates Google’s guidelines:
“For us these would, essentially, still fall into the category of automatically generated content which is something we’ve had in the Webmaster Guidelines since almost the beginning.
And people have been automatically generating content in lots of different ways. And for us, if you’re using machine learning tools to generate your content, it’s essentially the same as if you’re just shuffling words around, or looking up synonyms, or doing the translation tricks that people used to do. Those kind of things.
My suspicion is maybe the quality of content is a little bit better than the really old school tools, but for us it’s still automatically generated content, and that means for us it’s still against the Webmaster Guidelines. So we would consider that to be spam.”
Google recently updated the “auto-generated” content section of their developer page about spam.
The page was created in October 2022 and updated near the end of November 2022.
The changes reflect a clarification about what makes autogenerated content spam.
It initially said this:
“Automatically generated (or “auto-generated”) content is content that’s been generated programmatically without producing anything original or adding sufficient value;”
Google updated that sentence to include the word “spammy”:
“Spammy automatically generated (or “auto-generated”) content is content that’s been generated programmatically without producing anything original or adding sufficient value;”
That change appears to clarify that being automatically generated does not, by itself, make content spammy. It’s the lack of value-adds and the general “spammy” qualities that make such content problematic.
ChatGPT Could Get Content Watermarking
Lastly, the OpenAI researcher said (a few weeks prior to the release of ChatGPT) that watermarking was “hopefully” coming in the next version of GPT.
So ChatGPT may at some point be upgraded with watermarking, if it isn’t already watermarked.
Optimized use of AI
The best use of AI tools is for scaling SEO in a way that makes a worker more productive. That usually consists of letting the AI do the tedious work of research and analysis.
Summarizing webpages to create a meta description could be an acceptable use, as Google specifically says that’s not against its guidelines.
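If an AI tool drafts a page summary for a meta description, a small post-processing step can keep it tidy. The sketch below rests on assumptions. Google does not publish a fixed length limit, so the 155-character default is only a common rule of thumb. `trim_meta_description` is a hypothetical helper that collapses whitespace, trims at a word boundary, and adds an ellipsis.

```python
def trim_meta_description(summary: str, max_chars: int = 155) -> str:
    """Collapse whitespace and cut at a word boundary so the result fits max_chars."""
    text = " ".join(summary.split())
    if len(text) <= max_chars:
        return text
    cut = text[: max_chars - 1]  # leave room for the one-character ellipsis
    if text[max_chars - 1] != " " and " " in cut:
        cut = cut[: cut.rfind(" ")]  # don't end mid-word
    return cut.rstrip(" ,;:") + "…"


print(trim_meta_description("  A short   summary. "))
print(trim_meta_description("word " * 100))
```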
Using ChatGPT to generate an outline or a content brief might be an interesting use.
Handing off content creation to an AI and publishing it as-is might not be the most effective use of AI if it isn’t first reviewed for quality, accuracy and helpfulness.