At Delante, before we SEOs swing into action, we always run an in-depth analysis of the client’s website. Only this way are we able to spot possible issues early enough to get them fixed before they play havoc with website performance.
The unpalatable truth is that, on most occasions, the clients that come to us have no clue there is something wrong with their website. The same applies to the case I’m going to use as an example today.
When the client asked us to help make their website more appealing to search engines, they didn’t know that it was the CMS that had been rocking the boat. Simply put, Joomla was making the client’s efforts futile by creating duplicate content every time a URL was updated.
Completely unaware of this issue, the client was making the problem even more serious by – ironically – doing the right thing, which was setting SEO-friendly URLs for new pages. As you may guess, it harmed SEO pretty badly.
Read on to learn how we managed to identify and come to grips with the Joomla duplicate content problem, and see how introducing a few fundamental changes to the CMS can significantly boost organic traffic.
By the time you finish reading this article, you will have learned:
- why it’s crucial for you to know the quirks and limitations of the tools you use
- how the Joomla content management system duplicates the pages you set custom URLs for
- what problems are caused by duplicate content, and what friendly URLs are
- how to spot duplicate content and find the source of the problem
- how to solve a duplicate content problem caused by duplicate pages with different URL structures
- how to protect your website from the possible Joomla automatic content duplication that may happen in the future
The tools you use for your day-to-day operations shape the way you do particular things. What I mean is that before settling on a tool, you should learn how it works and what exactly you can use it for. Only if you explore the range of possibilities it offers can you make a conscious decision.
It’s worth realizing, though, that there is no such thing as a perfect do-it-all tool. Whatever you go for, keep in mind that each tool has its limitations. Things get serious when you’re unaware of the weak points that make your work less effective – just as happened with one of my clients.
Due to a quirk of the CMS the client used – Joomla – their domain filled up with duplicate content. Even though the client did everything by the book and followed the best practices for creating URLs, their website suffered from low visibility and, hence, low traffic.
Although the client has great marketing and dev teams that handle their CMS well in the majority of cases, they didn’t recognize the issue with Joomla’s automatic content duplication. They were unaware that the CMS of their choice was responsible for dragging the website down in the SERPs.
Wanting to improve the website’s performance in search engines, the client contacted us for help. Luckily, it didn’t take us long to look into the problem and find the main culprit – the Joomla content management system, or rather the automatic content duplication it was causing.
Joomla Automatic Content Duplication
How does it happen that Joomla duplicates content automatically?
Here’s what we found out.
Whenever our client added a new page to the domain, they set a custom URL to make it readable both for users and bots. Creating such an easy-to-understand address is definitely worth doing, as it describes the content of a page, improving UX and SEO. However, the client didn’t know about one of the CMS’s functionalities, and that’s what caused them trouble.
Joomla automatically assigns a URL with parameters to each new page added to a domain, so it looks like this:
When the admin changes this query string, making it more friendly for users and bots, like here:
the CMS simply duplicates the page, creating another place with the exact same web content. The only thing that differentiates the two pages is the URL. This is basically how the duplicate content was created on its own, dragging the website’s rankings down.
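To illustrate with hypothetical addresses (the client’s actual URLs are shown in the screenshots; example.com and the ID values below are made up, while the parameter names are the ones Joomla typically generates):

```
https://example.com/index.php?option=com_content&view=article&id=42&Itemid=101   <- assigned automatically by Joomla
https://example.com/blog/how-to-write-seo-friendly-urls                          <- set manually by the admin
```

Both addresses serve the exact same article, and to a search engine they count as two separate pages.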
Truth be told, the automatic content duplication isn’t that problematic as long as you’re aware it takes place. Our client had no clue about it, meaning the more content they created, the more duplication there was. In other words, the new articles published on the client’s website didn’t serve their purpose.
We learned about this issue while running an SEO audit, the obligatory stage we always go through before creating an SEO strategy for our clients. This way we’re able to spot such bugs and issues and fix them in order to lay a solid foundation for our further actions.
Before going deeper and analyzing this case fully, let’s briefly define two key terms – just for the sake of clarity.
An SEO-friendly URL should:
- be short and simple
- be readable for the user
- describe the content of the page
- include a keyword the page is optimized for
Such easy-to-understand URLs not only have a positive impact on SEO but also contribute to a higher CTR. In other words, SEO-friendly URLs improve UX and increase the chance of getting more referral links.
Here are examples of good and bad URLs:
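To make the criteria concrete, here’s a hypothetical pair for a page optimized for the keyword “duplicate content”:

```
Good: https://example.com/blog/duplicate-content
Bad:  https://example.com/index.php?option=com_content&view=article&id=17
```

The first address is short, readable, descriptive, and contains the keyword; the second reveals nothing about the page’s content.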
To put it in a nutshell, content duplication is identical content used in more than one place on the internet. There are two types of content duplication:
- external (cross-domain duplicates) – the same piece of content is published on two different domains
- internal – the same piece of content is published on two pages within one domain
As a result, when a user types a phrase into a search engine, Google may not display the page you want in the results.
Moreover, ever since the Panda algorithm was introduced, having duplicate content on your website can contribute to a lower ranking. Sadly, it also impacts the remaining pages within a domain, even the ones containing unique content.
All in all, duplicate content has nothing but downsides. For that reason, it should be avoided, and when found, it should be deleted.
How We Found the Joomla Duplicate Content Problem
Now that all possible doubts are cleared up, we can move on to how running an audit helps us discover problems with websites that even their owners don’t know they’re dogged by.
Regardless of the industry our client works in, we always investigate each website thoroughly. This way we’re able to eliminate even the tiniest problems right away, minimizing the risk of setbacks. Otherwise, all our attempts to improve SEO may run into the sand.
Joomla Automatic Content Duplication – Site: Analysis
One of the basic steps we take during an SEO audit is checking whether all target keywords bring up the client’s website in the SERPs. We do that by typing the site: operator into the Google search bar.
This operator allows us to narrow down the results to just the domain of our choice. The prime goal is to verify which pages are displayed in the SERPs for the keywords our client wants to be identified with. Sometimes, however, the site: operator helps us find a problem that would otherwise remain undiscovered.
Here’s how you can do this. Type the following into the Google search bar: site:yourdomain.com keyword. Press Enter and you will see a list of pages within that domain that appear in the SERPs for the given keyword.
To make it even clearer, let’s use the Delante domain to check which pages come up for the word duplication.
As you can see, each page covers the topic of duplication and each URL is SEO-friendly.
Now, let’s move on to our client’s case.
When I used the operator with the target keywords, I noticed that the page titles and H1s were okay. However, when I inspected the URLs, it turned out that none of them were SEO-friendly. All of them featured a question mark followed by a string of irrelevant characters.
After checking more target and branded keywords, I realized the problem was far more serious. I had to dig deeper, so I turned to Screaming Frog – an advanced SEO audit tool. The findings were pretty striking.
Joomla Automatic Content Duplication – Screaming Frog Analysis
To confirm my suspicions, I opened Screaming Frog.
Actually, I wanted to do two things:
- check out whether the duplicate content was really an issue, and
- get a list of all non-optimized URLs with parameters.
Screaming Frog’s crawler collects a wide range of information, which was crucial for me when auditing the client’s website.
If you’d like to check whether duplicate content is what’s harming your website, go through the following steps:
STEP 1 Open Screaming Frog. Enter your website URL (top of the screen) and click the gray Start button (see the screenshot below).
STEP 2 To verify if the duplicate content issue exists on your website, look at the tabs on the right. Pay close attention to the values shown in Page titles and H1, and later check the level of duplicate content on the left side panel.
STEP 3 The Page Titles tab allows you to verify more than just the title – you can also use it to analyze the meta descriptions (see the first screenshot).
As shown in the screenshots, there is no duplicate content problem on the website as we have already fixed it. However, before we did that, all of the elements marked on the screenshot indicated a huge problem with duplicate content.
If you notice that the titles and headings are duplicated, treat it as a red flag – this is a sign that the same piece of writing is published on a few pages.
Finally, Screaming Frog gives you a complete list of URLs of the website you’re verifying (see the second screenshot: the list of URLs is provided on the left). Having it at your disposal, you can see which addresses were set by Joomla automatically. In other words, you will get the complete list of the URLs with parameters (see the “bad address” above).
How to Solve the Joomla Automatic Content Duplication Issue
The moment I noticed the first signs of duplicate content (via site:) and confirmed them in Screaming Frog, I had a strong suspicion that it was Joomla causing the problem.
Here is what happens.
Every time a new page is added to a website, Joomla automatically creates a parameterized URL and assigns it to that page. When the URL is then customized manually, Joomla duplicates the page, creating two identical entities within one domain.
I contacted the client (let me add that the client is an easygoing and sure-footed person, which made our cooperation harmonious and effective) to find out whether my theory was right. It turned out that, unaware of this Joomla quirk, the client had been setting SEO-friendly URLs and thereby causing duplicate content completely unintentionally.
Having the whole picture, I could move on to dealing with the mess made by the Joomla content management system.
Joomla Automatic Content Duplication – Gathering All the Duplicated URLs
The very first step I had to take was collecting all the URLs with parameters in one place. To do so, I used Screaming Frog, which not only shows the percentage of duplicate content but also lists the non-optimized addresses.
Once I had them, I transferred the list to a spreadsheet:
In this simple way, I created a worksheet that made it easier for my client and me to work on the Joomla automatic content duplication.
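If you’d like to script this step instead of copying rows by hand, a minimal sketch could look like the following (the file name and sample URLs are hypothetical; Screaming Frog lets you export the crawled URL list, which you would load in place of the sample):

```python
import csv
from urllib.parse import urlparse

# Hypothetical sample standing in for a Screaming Frog URL export
urls = [
    "https://example.com/blog/seo-friendly-urls",
    "https://example.com/index.php?option=com_content&view=article&id=42&Itemid=101",
    "https://example.com/index.php?option=com_content&view=category&id=7",
]

# Keep only the addresses that carry a query string - the ones Joomla generated
with_params = [u for u in urls if urlparse(u).query]

# Write them to a spreadsheet-friendly CSV, leaving columns to fill in manually
with open("duplicated-urls.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["URL with parameters", "SEO-friendly counterpart", "Priority"])
    for url in with_params:
        writer.writerow([url, "", ""])

print(len(with_params), "parameterized URLs found")
```

The resulting CSV opens directly in any spreadsheet tool, ready for the prioritization step described below.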
Joomla Automatic Content Duplication – Removing Duplicated Content
There were a few ways to get rid of the duplicate content that Joomla had created unprompted.
1st SOLUTION It seemed that the easiest way to deal with the “bad” addresses would be to simply redirect them all to the homepage. This would require just one simple rule in the code, which the client’s developer could introduce in a snap. However, such a solution wouldn’t be effective. After all, all the “bad” links would bring users and bots to one place – the homepage.
That’s why we had to come up with a 2nd SOLUTION. Redirecting the pages with the “bad” URL structure to their copies with an SEO-friendly URL structure would make the most sense from the search engine optimization standpoint. Yet it wouldn’t be efficient: manually matching each pair of corresponding pages would be an almost never-ending job.
Taking the page title as the indicator wasn’t always a good way to go. Sometimes finding the duplicate content required analyzing a larger group of pages, which wouldn’t be time-efficient.
Trying to find a less time-consuming way to deal with the issue, we came up with a 3rd SOLUTION. This one seemed easy – using the robots.txt file to disallow crawling of the pages with the “bad” URL structures. It would allow us to make global changes, but since the number of duplicate pages was pretty high, the process would – again – take too long.
However, this time it was the crawler that would need to process all the information and go through the following stages:
- find the blocked pages
- gradually drop them from the index
- find the new pages with an SEO-friendly URL structure
The thing was that we had to speed up the process because some of the duplicate pages were crucial for our client’s business. Therefore, we came up with yet another, 4th SOLUTION, one that would cut down the time needed to remove the duplicate content while maximizing the benefits of the process.
- We provided our client with the list of non-optimized URLs that directed visitors and bots to the duplicate content (the same list you saw in Joomla Automatic Content Duplication – Gathering All the Duplicated URLs).
- From that list, the client selected the most important pages, in a way sorting out the sheep (important pages) from the goats (less important pages).
- This way we had an exact number of pages of the utmost importance that we had to focus on first. We found their duplicates manually and asked our dev team to redirect each one of them individually.
- For the remaining, less important pages that didn’t generate much organic traffic or many impressions, we used the third solution mentioned above – we blocked crawlers from accessing them via the robots.txt file. We could do that without affecting the results negatively.
This is the final version of the URL list:
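For the high-priority pages handled by the dev team, a one-to-one 301 redirect in Apache’s .htaccess (a common setup for Joomla hosting) can look like the sketch below. The URLs are hypothetical; note that the query string has to be matched with RewriteCond, since RewriteRule alone never sees it:

```apache
RewriteEngine On

# Redirect one auto-generated Joomla URL to its SEO-friendly duplicate
RewriteCond %{QUERY_STRING} ^option=com_content&view=article&id=42&Itemid=101$
RewriteRule ^index\.php$ /blog/seo-friendly-urls? [R=301,L]
# The trailing "?" drops the old query string from the target URL
```

One such pair of lines is needed per redirected page, which is exactly why this approach only made sense for the most important URLs.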
Even though it’s better to leave the redirection process to a web developer, you can make the changes in the robots.txt file on your own. By doing so, you not only solve the existing problem but also prevent it from occurring and messing with your website again.
Let me tell you how you can do this.
Joomla Automatic Content Duplication – Disallow in robots.txt as a Way of Problem Solving and Future-Proofing
I strongly advise you to prevent crawlers from accessing the pages with non-optimized URL structures. This way, every single “bad” URL will be ignored by Google’s bots, so only the SEO-friendly URLs get crawled and added to the index. Simply put, by modifying the robots.txt file, you steer clear of the problems caused by Joomla’s automatic content duplication.
Naturally, you may leave this task to your web developer, yet if you have a spare 5 minutes and access to the files on your FTP server, you can try doing it yourself.
To get access to your FTP, use either Total Commander or FileZilla. Since I prefer the former, I’m going to use this file manager to find the exact location of robots.txt in the directory structure.
Here’s how you can do it:
STEP 1 Open Total Commander and log into your FTP.
STEP 2 Choose the folder containing your website files.
I had to cover up the filenames but I’m sure you know which folder you keep the website files in.
STEP 3 Open the public_html folder.
STEP 4 Find robots.txt and open it with Notepad. Once you save your changes, the updated version of robots.txt will be automatically uploaded back to the FTP server.
In the screenshot below, you can see the basic version of robots.txt. It contains only a sitemap reference and a few disallow rules that keep bots out of the CMS’s system folders.
STEP 5 To deny crawlers access to the pages with “bad” URL structures (the ones responsible for content duplication), you need to use the Disallow: directive.
To do this globally, you need to find the elements the “bad” addresses have in common – ones that, at the same time, don’t appear in the “good” URLs.
In the example below, there were a few mutual elements, so every single one of them had to be introduced to the robots.txt.
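One way to pin down those mutual elements is to compare the query-parameter names used by the two groups of addresses – a sketch with made-up sample URLs:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical samples: auto-generated "bad" URLs vs. URLs that must stay crawlable
bad_urls = [
    "https://example.com/index.php?option=com_content&view=article&id=42&Itemid=101",
    "https://example.com/index.php?option=com_content&view=category&id=7&Itemid=5",
    "https://example.com/index.php?task=view&id=9",
]
good_urls = [
    "https://example.com/blog/seo-friendly-urls",
    "https://example.com/services?utm_source=newsletter",  # a legitimate parameter we must not block
]

def param_names(urls):
    """Collect every query-parameter name appearing in a list of URLs."""
    names = set()
    for url in urls:
        names.update(parse_qs(urlparse(url).query, keep_blank_values=True))
    return names

# Parameters that appear only in the "bad" group are safe to disallow
blockable = param_names(bad_urls) - param_names(good_urls)
print(sorted(blockable))  # ['Itemid', 'id', 'option', 'task', 'view']
```

Only parameters unique to the “bad” group should end up in robots.txt; blocking a parameter that legitimate pages also use would hide those pages from crawlers too.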
STEP 6 In Notepad, type Disallow: followed by the pattern that denies crawlers access to the pages in question.
In my case, the part that denies crawlers access to the duplicated pages looks like this:
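As a sketch based on the three parameters named in the next paragraph (Joomla itself usually spells the first one Itemid, and robots.txt matching is case-sensitive, so check the spelling your site actually uses), the rules could look like this. Google supports the * wildcard in robots.txt:

```
User-agent: *
# Keep crawlers away from Joomla's auto-generated, parameterized URLs
Disallow: /*?*Itemid=
Disallow: /*?*view=
Disallow: /*?*task=
```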
That’s it. You’ve just disallowed bots from crawling the duplicate pages.
As you can see, in our case there are three elements – index.php?itemid, view, and task – used in the rules that keep crawlers away. These three parameters were automatically added to the duplicated URLs, causing the duplicate content problem.
Effects of Solving the Issue of Joomla Automatic Content Duplication
Coming back to my client’s case, the redirects and the disallow rules were introduced at the beginning of June 2022. Removing the duplicate content and using SEO-friendly URLs to make the target keywords easier to identify brought almost immediate effects.
As you can see in the screenshot below, it took just a couple of weeks to see a significant increase in organic traffic. It remains high even during the low summer season – for example, comparing last year’s July to this year’s July, organic traffic increased by 19%.
What’s more, we see constant growth in the website’s visibility (see the second chart), which wouldn’t have taken place if the problem of duplicate content hadn’t been dealt with.
The green line marks the moment we started working on the client’s website.
Even if you’re an amazing driver, you may not be aware of all the quirks of your car that can throw you for a loop when least expected. Analogously, even if you’re an amazing online business owner, the quirks of your CMS that you’re unaware of may cause a sudden decrease in visibility, an unexplained fall in conversion rate, or an unexpected drop in rankings.
You may do everything by the book: publish unique, value-adding content on your website, make sure all your URLs are SEO-friendly, and include keywords in every single heading on your website. Yet there may be an issue with your content management system that scuttles all your hard work.
Therefore, if you want to go a long way with your online business strategy, you need to get your website checked by the SEOs – the same way you get your car checked by an auto mechanic before setting off on a long journey.
You may also try checking it yourself. If you suspect that the Joomla content management system is messing with your website by duplicating your web content, you now know what to do. You know the tools to use and the steps to take to get back on track.