Spring is the time of various clean-up tasks. As we clean our apartments, garages, and gardens, we should also think about cleaning up our website. Detailed instructions on how to conduct an SEO audit can be found in our article: ‘How to perform a website on-page SEO audit’. In this text, we will focus on cleaning a sitemap file - a file containing the URLs of all pages of a website.
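For readers who want to inspect a sitemap by hand, the list of URLs can be pulled out of the file with a few lines of Python. This is only a minimal sketch: it assumes a standard XML sitemap that uses the sitemaps.org namespace, and both the function name `extract_sitemap_urls` and the `example.com` sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_sitemap_urls(xml_text: str) -> list[str]:
    """Return every non-empty <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


# Hypothetical sample; example.com stands in for a real website.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc></url>
</urlset>"""

for url in extract_sitemap_urls(SAMPLE_SITEMAP):
    print(url)
```

The resulting list is the raw material for every check described in the rest of this text.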
The sitemap file should map a website’s link structure. It should not contain any redirects, broken links, broken hosts, links to different domains or non-canonical addresses. Let’s start by looking for unwanted redirects in the sitemap file.
Pulno offers two ways to detect redirects. One is the filter available in the panel: click the advanced filter option, choose pages linked in the sitemap file only, and search by response code: 301, 302, 307, 308. URLs that return a response code starting with ‘3’ should be removed from the sitemap.
A link to the list of addresses with redirects can also be found in the sidebar.
Redirects waste crawl budget and make it harder for search engines to understand the structure of the website. Google recommends placing only canonical versions of URLs in the sitemap file and strongly advises against listing URLs that redirect.
"I'd list the URLs you want indexed. We use sitemaps in part to select canonical URLs; if you tell us about one URL in the sitemap, but redirect to another URL, how are we supposed to know which one you really want?" (🍌 John 🍌, @JohnMu, August 21, 2018)
The next step is finding broken pages. Such pages return either the 404 or the 410 response code.
As with redirects, links to broken pages can be found in the sidebar, in the ‘Visibility’ section.
Page errors cause a similar problem to redirects. Each broken link in the sitemap makes search engines spend resources on visiting a page that does not exist. For that reason, removing the URLs of broken pages from the sitemap file is strongly recommended.
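Outside of Pulno, the same two checks (redirects and broken pages) can be approximated with the Python standard library. The sketch below is not how Pulno works internally; `fetch_status`, `classify` and `urls_to_remove` are hypothetical helper names, and the code assumes the server answers HEAD requests the same way it answers GET, which is not always the case.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = {301, 302, 307, 308}  # the codes used in the panel filter
BROKEN_CODES = {404, 410}              # the codes returned by broken pages


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request and return the raw status code.

    http.client never follows redirects, so a 301 is reported as 301.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        return conn.getresponse().status
    finally:
        conn.close()


def classify(status: int) -> str:
    """Label a status code as 'redirect', 'broken' or 'other'."""
    if status in REDIRECT_CODES:
        return "redirect"
    if status in BROKEN_CODES:
        return "broken"
    return "other"


def urls_to_remove(statuses: dict[str, int]) -> dict[str, str]:
    """Keep only the URLs whose status marks them as redirect or broken."""
    labelled = {url: classify(code) for url, code in statuses.items()}
    return {url: label for url, label in labelled.items() if label != "other"}
```

A typical use would be to call `fetch_status` for every URL in the sitemap, collect the results in a dictionary, and pass that dictionary to `urls_to_remove` to get the addresses that should be dropped from the file.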
Data from a million of the most popular websites shows that more than half of the domains serve their traffic over the encrypted https protocol. In the six months from February to August 2018, the number of websites with https in their URL increased by over 13%. Unfortunately, switching to https does not always go hand in hand with a sitemap file update. Pulno analyzes the links in the sitemap and notifies the user if it finds obsolete URLs.
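A quick way to spot addresses left over from before an https migration is to compare the scheme of each sitemap URL with the scheme the site uses now. The following is a minimal sketch under that assumption; the function name `find_obsolete_scheme_urls` and the sample addresses are invented for illustration.

```python
from urllib.parse import urlsplit


def find_obsolete_scheme_urls(urls: list[str], expected_scheme: str = "https") -> list[str]:
    """Return the URLs whose scheme differs from the one the site uses now."""
    return [url for url in urls if urlsplit(url).scheme != expected_scheme]


# Hypothetical sitemap entries for a site that has moved to https.
sample_urls = [
    "https://example.com/",
    "http://example.com/old-page/",
    "https://example.com/blog/",
]

print(find_obsolete_scheme_urls(sample_urls))
```

Every address this returns should be replaced in the sitemap with its https counterpart, provided that counterpart is the canonical version of the page.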
Per Google’s recommendation, the sitemap file should include links to canonical versions of pages. For that reason, make sure the sitemap does not list any page whose canonical tag points to a different page. Additionally, we check whether pages are blocked in robots.txt, by the robots meta tag or by the X-Robots-Tag header. When checking your robots.txt file, you should also make sure that it contains a link to the sitemap file. Instructions on what the link should look like can be found in this article.
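The robots.txt part of this check can be sketched with Python's built-in `urllib.robotparser`. The example assumes Python 3.8 or newer (needed for `site_maps()`), and the helper names and the sample robots.txt content are hypothetical. It covers robots.txt only; the robots meta tag and the X-Robots-Tag header would need a separate look at each page's HTML and response headers.

```python
from urllib.robotparser import RobotFileParser


def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def declared_sitemaps(parser: RobotFileParser) -> list[str]:
    """Return the Sitemap: entries, or an empty list if there are none."""
    return parser.site_maps() or []


def blocked_urls(parser: RobotFileParser, urls: list[str], user_agent: str = "*") -> list[str]:
    """Return the sitemap URLs that robots.txt forbids for the given user agent."""
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt content for example.com.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
"""

robots = load_robots(SAMPLE_ROBOTS)
print(declared_sitemaps(robots))
print(blocked_urls(robots, ["https://example.com/", "https://example.com/private/report.html"]))
```

An empty result from `declared_sitemaps` means robots.txt does not point to the sitemap at all, and any address returned by `blocked_urls` is a page the sitemap asks search engines to index while robots.txt tells them to stay away.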
"#TFW you're checking a sitemap for a client site audit and find:
▪️None of the sitemap URLs are indexable... 😐
▪️Because they're canonicalised... 🤨
▪️To a series of very similar URLs... 😣
▪️That all return 404s... 😖" (Ⓞⓟⓣⓘⓜⓘⓢⓔⓨ, @Optimisey, April 18, 2019)
The sitemap file should not link to websites other than the one on which it is located. This holds for both the www. and the non-www. version of the address. Removing any addresses of external websites from the sitemap file is strongly recommended.
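External addresses are easy to find by comparing the host of each listed URL with the host that serves the sitemap. The sketch below makes one assumption worth stating clearly: it treats the www. and non-www. hosts as different websites, so a sitemap on `www.example.com` that lists `example.com` URLs is flagged. The function name `find_external_urls` and the sample addresses are illustrative only.

```python
from urllib.parse import urlsplit


def find_external_urls(urls: list[str], sitemap_url: str) -> list[str]:
    """Return the URLs whose host differs from the host serving the sitemap.

    Assumption: www. and non-www. hosts count as different websites.
    """
    site_host = urlsplit(sitemap_url).hostname or ""
    return [url for url in urls if (urlsplit(url).hostname or "") != site_host]


# Hypothetical data: the sitemap lives on the www. version of the site.
sample_sitemap_url = "https://www.example.com/sitemap.xml"
sample_entries = [
    "https://www.example.com/about/",
    "https://example.com/contact/",
    "https://other-site.org/landing/",
]

print(find_external_urls(sample_entries, sample_sitemap_url))
```

If the www. and non-www. versions should be treated as one site instead, the hosts can be normalized before the comparison, but the addresses in the sitemap should still match the canonical version.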
The sitemap file is a strong signal to search engines about which pages should be indexed. Analyzing a website with Pulno helps keep it in order and thus sends clearer signals to Google's robots. This translates into more effective use of the crawl budget and better website optimization.
Cleaning a sitemap file is important and it doesn't have to be difficult. Check how to do it with Pulno!