Cleaning the sitemap file


How to check sitemap

Spring is the season for clean-up tasks. As we clean our apartments, garages, and gardens, we should also think about cleaning up our websites. Detailed instructions on how to conduct an SEO audit can be found in our article ‘How to perform a website on-page SEO audit’. In this text, we will focus on cleaning the sitemap file, a file containing the URLs of the pages of a website.


Mapping the link structure

The sitemap file should map a website’s link structure. It should not contain any redirects, broken links, broken hosts, links to different domains, or non-canonical addresses. Let’s start by looking for unwanted redirects in the sitemap file.
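Before checking any of these problems outside of Pulno, the URLs have to be read out of the sitemap. The following is a minimal sketch in Python, assuming a standard XML sitemap that uses the sitemaps.org namespace; the function name `sitemap_urls` and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample sitemap, used only to show the input format.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>http://example.com/old-page</loc></url>
</urlset>"""

if __name__ == "__main__":
    for url in sitemap_urls(SAMPLE):
        print(url)
```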


Redirects in the sitemap file

There are two ways to detect redirects in Pulno. One is to use the filter available in the panel: click the advanced filter option, choose pages linked in the sitemap file only, and search by response code: 301, 302, 307, 308. URLs that return a response code starting with ‘3’ should be removed from the sitemap.

sorting by 301 and 302 response code

A link to the list of addresses with redirects can also be found in the sidebar.

information about redirects in sitemap

Redirects waste crawl budget and make it harder for search engines to understand the structure of the website. Google recommends placing only canonical versions of URLs in the sitemap file, so including redirecting URLs is strongly discouraged.
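The same check can be approximated with a short script. This is only a sketch, not how Pulno works internally: it assumes a HEAD request is enough to see the response code, and the helper names `head_status` and `find_redirects` are invented for this example. The request is sent with `http.client`, which does not follow redirects, so the original 3xx code stays visible.

```python
import http.client
from urllib.parse import urlsplit

# The redirect codes listed in the filter described above.
REDIRECT_CODES = {301, 302, 307, 308}


def head_status(url: str, timeout: float = 10.0) -> int:
    """Send one HEAD request and return the raw status code (no redirect following)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def find_redirects(urls, get_status=head_status) -> dict[str, int]:
    """Map each redirecting URL to its status code.

    `get_status` can be swapped for a stub so the logic is testable offline.
    """
    redirects = {}
    for url in urls:
        code = get_status(url)
        if code in REDIRECT_CODES:
            redirects[url] = code
    return redirects
```

Passing the status lookup as a parameter keeps the filtering logic separate from the network call, so it can be run against cached crawl results as well.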


Broken pages

The next step is finding broken pages. Such pages return either the 404 or the 410 response code.

sorting by 404 and 410 response code

As in the previous example, links to broken pages can be found in the sidebar, in the ‘Visibility’ section.

information about broken links in sitemap

Page errors are similar to redirects: each broken link in the sitemap makes search engines spend resources on visiting non-existent pages. For that reason, removing the URLs of broken pages from the sitemap file is strongly recommended.
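Once the broken URLs are known, they can be dropped from the file. The sketch below assumes a standard XML sitemap and a dictionary of already collected response codes (for example, exported from a crawl); the function name `drop_broken` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BROKEN_CODES = {404, 410}


def drop_broken(xml_text: str, statuses: dict[str, int]) -> str:
    """Return the sitemap XML without <url> entries whose status is 404 or 410.

    `statuses` maps a URL to the response code observed for it.
    """
    # Keep the sitemap namespace as the default one when serializing.
    ET.register_namespace("", NS)
    root = ET.fromstring(xml_text)
    for url_el in list(root.findall(f"{{{NS}}}url")):
        loc = url_el.findtext(f"{{{NS}}}loc", default="").strip()
        if statuses.get(loc) in BROKEN_CODES:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")


# Hypothetical input used for illustration only.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/gone</loc></url>
</urlset>"""

if __name__ == "__main__":
    print(drop_broken(SAMPLE, {"https://example.com/gone": 410}))
```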


Lack of https in sitemap

Based on data from a million of the most popular websites, the traffic of more than half of the domains goes through the encrypted https protocol. In the six months from February to August 2018, the number of websites with https in their URLs increased by over 13%. Unfortunately, switching to https does not always go hand in hand with updating the sitemap file. Pulno analyzes the links in the sitemap and notifies the user if it contains obsolete URLs.

information about links without https in sitemap
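A quick way to spot such obsolete entries yourself is to look at the scheme of each URL. This is a minimal sketch with invented helper names (`http_urls`, `to_https`); it assumes the https version of each page actually exists, which should be verified before the sitemap is rewritten.

```python
from urllib.parse import urlsplit, urlunsplit


def http_urls(urls) -> list[str]:
    """Return the URLs that still use the unencrypted http scheme."""
    return [u for u in urls if urlsplit(u).scheme == "http"]


def to_https(url: str) -> str:
    """Return the same URL with the scheme switched to https (other schemes untouched)."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    return urlunsplit(parts._replace(scheme="https"))


if __name__ == "__main__":
    urls = ["https://example.com/", "http://example.com/about"]
    for old in http_urls(urls):
        print(old, "->", to_https(old))
```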


Canonical pages and the blocking of indexing

Per Google’s recommendation, the sitemap file should include links to canonical versions of pages. For that reason, make sure the sitemap contains no links to pages whose canonical tag points to a different page. Additionally, we check whether pages are blocked by robots.txt, the robots meta tag, or the X-Robots-Tag header. When checking your robots.txt file, you should also make sure that it contains a link to the sitemap file. Instructions on what the link should look like can be found in this article.
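Two of these checks can be sketched with Python’s standard library: reading robots.txt with `urllib.robotparser` and extracting the canonical link from a page’s HTML. The helper names and the sample robots.txt are assumptions made for this example, and the robots meta tag and X-Robots-Tag header are not covered here.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def parse_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class CanonicalParser(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.canonical is None:
            attributes = dict(attrs)
            if "canonical" in (attributes.get("rel") or "").lower().split():
                self.canonical = attributes.get("href")


def canonical_of(html: str):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


# Hypothetical robots.txt used for illustration only.
ROBOTS = """User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""

if __name__ == "__main__":
    robots = parse_robots(ROBOTS)
    print("Sitemaps listed:", robots.site_maps())
    print("Blocked:", not robots.can_fetch("*", "https://example.com/private/page"))
```

A sitemap URL is a candidate for removal when `can_fetch` returns `False` for it, or when `canonical_of` returns an address different from the URL itself.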



External links

The sitemap file should not link to websites other than the one on which it is located. This applies to both the www. and the non-www. version of the address. Removing any addresses of external websites from the sitemap file is strongly recommended.
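A rough way to find such entries is to compare the host of every URL with the host of the sitemap itself. The sketch below assumes an exact host comparison, so the www. and non-www. versions count as different hosts; that reading of the rule, as well as the function name `external_urls`, is an assumption made for this example.

```python
from urllib.parse import urlsplit


def external_urls(urls, sitemap_url: str) -> list[str]:
    """Return the URLs whose host differs from the host serving the sitemap."""
    own_host = urlsplit(sitemap_url).hostname
    return [u for u in urls if urlsplit(u).hostname != own_host]


if __name__ == "__main__":
    urls = [
        "https://example.com/a",
        "https://www.example.com/b",
        "https://other.org/c",
    ]
    print(external_urls(urls, "https://example.com/sitemap.xml"))
```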


Conclusion

The sitemap file strongly indicates to search engines which pages should be indexed. Website analysis with Pulno helps keep a website in order and thus send clearer signals to Google’s robots. This translates into more effective use of the crawl budget and better website optimization.



Jacek Wieczorek is the co-founder of Pulno. Since 2006, he has been optimizing and managing websites that generate traffic counted in hundreds of thousands of daily visits. 

