Is Perplexity AI a Plagiarism Culprit or the Future of Search?

Key takeaways

  • Perplexity AI controversy: Perplexity AI faces plagiarism accusations for allegedly scraping content from publishers such as Forbes and Wired without proper authorisation or credit.
  • Impact on content creation: Uncredited AI-generated content risks devaluing original work and disincentivising in-depth research, threatening the sustainability of content creation.
  • AI and search rankings: Google is cracking down on low-quality AI content in its search rankings, which helps protect intellectual property, but platforms like Perplexity AI operate outside Google’s ecosystem.
  • AI data arms race: Partnerships such as the one between Reddit and Google show how fiercely AI search engines compete for vast amounts of data, risking an imbalance in information access.
  • Future of AI search engines: Transparency in information collection and rewarding content creators are crucial for AI tools to avoid undermining the quality and sustainability of online content.

Robots.txt might be a boring web protocol, but it represents much more

A company updating its robots.txt file shouldn’t be newsworthy. For the uninitiated, robots.txt is essentially a website’s guidebook for search engines and other web crawlers, dictating what they can and cannot explore. It’s about as thrilling as a documentary on train schedules. However, Reddit’s announcement that it is updating its robots.txt file has sparked considerable buzz. This isn’t your everyday update; it’s a response to serious allegations of plagiarism by AI companies such as Perplexity AI.

Perplexity AI is an AI-driven search engine that answers queries conversationally. In the words of its founder Aravind Srinivas, “It’s almost like Wikipedia and ChatGPT had a kid.”

Far from being a poor orphan, Perplexity AI is valued at $3bn and backed by investors such as Jeff Bezos. Yet accusations that Perplexity AI is unlawfully scraping data have led Amazon, which provides cloud services to Perplexity AI, to investigate whether the company has violated Amazon’s terms of service.

Yet these allegations are secondary to the real issue at hand: does the alleged plagiarism by Perplexity AI threaten the future of content creation and search?

Perplexity AI is accused of plagiarism

The allegations of plagiarism against Perplexity AI explained

Alleged plagiarism by Perplexity AI has been the focus of investigations by Wired and Forbes, which found that the AI-powered search engine accessed their content thousands of times without authorisation.

To understand the allegations, though, it helps to consider what Perplexity AI intends to be. Rather than a Google competitor, it aims to be an answer engine, similar to ChatGPT, that summarises data and information for people seeking an answer.

In May 2024, Perplexity AI took this capability further by launching its “Pages” feature, which generates articles automatically, complete with pictures and sources. These articles can be shared, including on Perplexity AI’s own platform via its Discovery tab.

Forbes has alleged that Perplexity published a story and an AI-generated podcast suspiciously similar to a Forbes article, even reusing the same images, without properly crediting Forbes. When Forbes investigated further, it found similar issues in other Perplexity articles, affecting numerous publishers.

Wired then added further fuel to the fire. It accused Perplexity AI of disregarding instructions in Wired’s robots.txt file that blocked the answer engine from crawling its website. According to Wired, this means Perplexity AI can scrape information, and train its AI algorithms, on data explicitly blocked from third-party crawlers. In effect, creators’ work feeds the AI without proper referencing or compensation.
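To make Wired’s complaint concrete, here is a minimal sketch of how a robots.txt block on a single crawler is written and how a well-behaved crawler would check it before fetching a page, using Python’s standard-library urllib.robotparser. The file contents, the example.com URL and the use of “PerplexityBot” as the blocked user agent are assumptions chosen for illustration; this is not Wired’s actual file or Perplexity’s actual crawler logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named AI crawler from the whole site,
# while every other crawler may fetch anything except /private/.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    url = "https://example.com/articles/some-story"
    for agent in ("PerplexityBot", "Googlebot"):
        # can_fetch only reports what the site asks for; nothing enforces it.
        print(agent, rp.can_fetch(agent, url))
    # Prints:
    # PerplexityBot False
    # Googlebot True
```

The named crawler is refused while Googlebot falls through to the catch-all `*` group, which only closes /private/. The check happens entirely on the crawler’s side, so a bot that never runs it faces no technical barrier at all, which is why alleged non-compliance matters so much.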

Alleged plagiarism by Perplexity AI puts the future of content creation at risk

The ramifications for content creators are clear. While AI-generated summaries and content can be valuable tools, the potential for misuse is concerning. 

Firstly, uncredited content devalues original work. If tools like Perplexity AI can copy content verbatim and present it as original, they undermine content creators and risk disincentivising them. Consequently, in-depth research, analysis and insights are discouraged. AI will change the search engine and content creation industries, but we must protect the basic incentives for rich insights.

For instance, imagine a health tech company that spends significant sums collecting data to use in its content. Rather than being rewarded with new customers, it sees the traffic and credit go to tools like Perplexity AI. Such a scenario isn’t sustainable. It is therefore a problem not only for content creation but also for how search engines and modern businesses work.

Secondly, it’s important to acknowledge that AI chatbots are here to stay, and they will transform how we access information. Yet without proper attribution, we risk a loss of trust. This can breed misinformation, with people coming to expect half-truths from these tools and having no way to verify where the information came from.

The Perplexity AI plagiarism scandal should concern the search engine industry

Google has already started cracking down on low-quality AI content in its search results, so relying on AI can harm your search rankings. While this may frustrate those hoping to scale their content creation quickly, the move should be applauded. It helps protect the intellectual property of content creators and SEOs alike.

However, platforms like Perplexity AI operate outside Google’s ecosystem. If such tools continue to grow unchecked, we risk a vicious cycle in which tools designed to improve access to information end up shrinking the pool of original content, degrading the information ecosystem as a whole.

The AI data arms race

Yet copying content is not the only risk to emerge from the Perplexity AI scandal. Reddit’s robots.txt announcement may look like a move to block companies like Perplexity AI, but it could also be a way to protect its existing partnerships.

In February 2024, Reddit and Google signed a partnership enabling the search giant to use the platform as training data for its AI. Access to such data is critical for any emerging AI search engine: the more data it has, the better the system.

So, is this the start of a data war between AI search engines? It’s certainly possible. But, as seen with ChatGPT’s business model, this may deepen information inequality and concentrate too much power in the hands of a few.

The hope for the AI era is that access to information will be democratised. Yet early signs suggest otherwise. Instead, the AI search race looks set to be won by the best-funded players, with content creators treated as an afterthought.

What will this mean for the future?

With the surge of new AI search engines, we face a survival-of-the-fittest scenario. Google, a ten-ton gorilla, has a commanding head start, owning more than 80% of the global search market. Not only does it have deep pockets, but it has also rolled out AI tools such as AI Overviews and Gemini. This makes it a formidable opponent. Despite being well-funded, new contenders like Perplexity AI face a steep uphill battle just to survive.

Yet the alleged plagiarism by Perplexity AI serves as a warning. AI companies must be transparent about how they collect information. Moreover, rewarding content creators should be a key focus for all AI search tools. Without this basic ingredient, they risk discouraging creators while undermining a basic principle of AI: the better the quality of the information, the more effective the AI becomes.

The robots.txt file is also clearly no longer fit for purpose. Launched in 1994, in the dial-up era, the robots.txt protocol is about as useful against AI as a fax machine. Because compliance is voluntary, it simply cannot handle the complexities of AI on its own. Therefore, we need new rules that reflect this AI information age.

AI-powered search is exciting, but we must ensure it’s done right. We need a future where search engines actually enhance, not exploit, the web and content creators.
