Why server log analysis is so important for SEO
We’ve created this guide to help our readers understand what a server log is, why it is so important for SEO, and how it contributes to the success of your website.
The Technical Bit
What is a server log?
A server log is a file that is automatically generated by your server, recording every request made to your website. It’s essentially a register of who or what is visiting your site. What is great about this file is that you can export the data for analysis. The exported data can look a little confusing at first, but it’s very valuable, and a technical SEO expert can decipher it for you. You’ll see a file full of data, displayed fairly similarly to our example below.
Why server log analysis is important
Search Engine Identification
Server logs are tremendously helpful when conducting a technical review of the site, as they help you to understand which search engines are accessing your site. This could be Google, Bing, Baidu, Yandex, or any number of search engines. This is useful as it can help you understand which search engines you need to better optimise your site for.
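As a minimal sketch of how this works in practice, the Python below counts requests per search-engine crawler by looking for known bot tokens in the user-agent string. The sample log lines (Apache/NGINX "combined" format) and the token list are hypothetical and not exhaustive; for a production check you would also verify crawlers via reverse DNS, since a user agent can be spoofed.

```python
from collections import Counter

# Known crawler tokens to look for in the user-agent string (not exhaustive).
BOT_TOKENS = ["Googlebot", "bingbot", "Baiduspider", "YandexBot", "DuckDuckBot"]

# Hypothetical sample lines in Apache/NGINX "combined" log format.
log_lines = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /products HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '40.77.167.2 - - [10/Oct/2023:14:01:10 +0000] "GET /about HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
]

def bots_seen(lines):
    """Count requests per search-engine crawler, identified by user-agent token."""
    counts = Counter()
    for line in lines:
        for token in BOT_TOKENS:
            if token in line:
                counts[token] += 1
    return counts

print(bots_seen(log_lines))  # e.g. Counter({'Googlebot': 1, 'bingbot': 1})
```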
Maximising Page Indexation
Once you know which search engines are visiting your site, you can also see which pages they are crawling. This tells you which pages the search engines have found and which ones they haven’t. You can then investigate why certain pages haven’t been found, fix the underlying issues, and get those pages crawled and indexed. More indexed pages means more traffic and, hopefully, more sales and leads.
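One simple way to find the pages a crawler hasn’t reached is to compare the paths it requested against the URLs you expect it to find. A rough sketch, assuming "combined"-format log lines and a hypothetical sitemap URL set:

```python
import re

# Matches the request path in an Apache/NGINX "combined" log line.
REQ_RE = re.compile(r'"[A-Z]+ (\S+) HTTP')

# Hypothetical sample log lines.
log_lines = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /products HTTP/1.1" 200 900 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2023:13:56:02 +0000] "GET /about HTTP/1.1" 200 700 "-" "Googlebot/2.1"',
]

def crawled_paths(lines, bot="Googlebot"):
    """Paths a given crawler has requested, per the server log."""
    return {m.group(1) for line in lines if bot in line
            for m in [REQ_RE.search(line)] if m}

# URLs we expect to be crawled, e.g. taken from the XML sitemap (hypothetical).
site_urls = {"/", "/products", "/about", "/contact"}

never_crawled = site_urls - crawled_paths(log_lines)
print(sorted(never_crawled))  # pages Googlebot has not requested yet
```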
Crawl Budget Discovery
You can use the server log file to work out what your crawl budget is by seeing how many of your pages are crawled per day by Google, or any other search engine. This gives you a sense of how valuable your site is to Google by showing what percentage of your pages get crawled. Once you make changes to your site, you can check whether your crawl budget has increased or decreased, and therefore whether your website is deemed more or less valuable after the changes. The bigger your crawl budget, the better your site is performing.
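Counting a crawler’s requests per day is a straightforward way to put a number on this. A minimal sketch, again assuming "combined"-format lines and hypothetical data:

```python
import re
from collections import Counter

# Captures the date portion, e.g. "10/Oct/2023", from the timestamp field.
DATE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4})')

# Hypothetical sample log lines spanning two days.
log_lines = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /a HTTP/1.1" 200 1 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2023:18:12:01 +0000] "GET /b HTTP/1.1" 200 1 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [11/Oct/2023:09:30:12 +0000] "GET /a HTTP/1.1" 200 1 "-" "Googlebot/2.1"',
]

def daily_crawls(lines, bot="Googlebot"):
    """Requests per day by a given crawler."""
    counts = Counter()
    for line in lines:
        if bot in line:
            m = DATE_RE.search(line)
            if m:
                counts[m.group(1)] += 1
    return counts

print(daily_crawls(log_lines))
```

Tracking these daily totals before and after a site change gives you a concrete signal of whether your crawl budget moved.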
Removing Temporary Redirects
You can use the server log file to quickly identify all of the temporary redirects on your site, also known as 302 redirects. Temporary redirects are bad for SEO because the value of the old page does not flow to the new target page while the redirect is marked as temporary. Using the server log, you can identify these issues and convert the 302 redirects into permanent 301 redirects. This is often much quicker than crawling the site yourself with a crawler.
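Pulling the 302s out of a log file takes only a few lines. A rough sketch with hypothetical sample lines:

```python
import re

# Captures the request path and status code from a "combined"-format line.
REQ_STATUS_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+" (\d{3})')

# Hypothetical sample log lines.
log_lines = [
    '1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /old-page HTTP/1.1" 302 0 "-" "UA"',
    '1.2.3.4 - - [10/Oct/2023:13:55:40 +0000] "GET /home HTTP/1.1" 200 512 "-" "UA"',
]

def temporary_redirects(lines):
    """URLs that answered with a 302 - candidates to convert to a 301."""
    return sorted({m.group(1) for line in lines
                   for m in [REQ_STATUS_RE.search(line)] if m and m.group(2) == "302"})

print(temporary_redirects(log_lines))  # ['/old-page']
```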
Page Status Identification
In an ideal world, every page would return a 200 response code. If you’re not sure what a response code is, it is a numeric code that tells users (and crawlers) the status of a page. You’ve probably encountered a 404 error page before. You can analyse the server log to find the response codes of all your pages, which will help you find the pages that aren’t working properly. Check out our list of common response codes below.
200 – Webpage is fine
301 – Permanent redirects (Page value passed on)
302 – Temporary redirects (Page value not passed on)
404 – Broken page
500 – Internal server error
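To see how your pages are responding overall, you can tally the status codes across the whole log. A minimal sketch, assuming "combined"-format lines and hypothetical data:

```python
import re
from collections import Counter

# Captures the three-digit status code that follows the request field.
STATUS_RE = re.compile(r'HTTP/[\d.]+" (\d{3})')

# Hypothetical sample log lines.
log_lines = [
    '1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "UA"',
    '1.2.3.4 - - [10/Oct/2023:13:55:40 +0000] "GET /gone HTTP/1.1" 404 0 "-" "UA"',
    '1.2.3.4 - - [10/Oct/2023:13:55:44 +0000] "GET /shop HTTP/1.1" 200 900 "-" "UA"',
]

def status_counts(lines):
    """Tally of response codes across all logged requests."""
    return Counter(m.group(1) for line in lines
                   for m in [STATUS_RE.search(line)] if m)

print(status_counts(log_lines))
```

A sudden spike in 404s or 5xx codes in this tally is usually the first sign that something on the site is broken.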
Stopping Crawl Budget Waste
Using server log analysis, we can identify which unnecessary pages are being crawled by search engines and either disallow them in the robots.txt file, if we don’t want them crawled at all, or adjust the XML sitemap so they are crawled less regularly. This means you don’t waste your crawl budget, and you ensure the most important pages are regularly crawled.
Deleting Duplicate URLs
If there are duplicate URLs, usually caused by URL parameters, this can again affect the amount of crawl budget you are given. Search engines view the duplicate parameter pages as low value, and that reflects on your site as a whole. A high volume of parameterised, duplicate pages can potentially lead to a penalty from any search engine, or at the very least a reduction in the amount of crawl budget you get.
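Spotting parameter-driven duplicates in crawled URLs is mostly a matter of grouping them by their base path. A rough sketch with hypothetical URLs:

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical URLs extracted from the server log.
crawled_urls = [
    "/products?sort=price",
    "/products?sort=name",
    "/products",
    "/about",
]

def parameter_duplicates(paths):
    """Group crawled URLs by their base path; any group with more than one
    variant is a likely parameter-driven duplicate."""
    groups = defaultdict(set)
    for url in paths:
        groups[urlsplit(url).path].add(url)
    return {base: variants for base, variants in groups.items() if len(variants) > 1}

print(parameter_duplicates(crawled_urls))
```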
Working Out Page Crawl Priority
Understanding which content has a higher crawl priority than other web pages is essential. The server log allows us to do this by showing the most regularly crawled pages, giving us a better idea of which content is being crawled most often and what may need to be re-submitted or edited. It might be the case that you only need to fix technical issues on those pages rather than change the content. Either way, fixing the problem, whatever it is, will help improve the page’s crawl priority.
Understanding Bot Behaviour
Server logs allow you to figure out the behaviour of each of the crawl bots, so you can identify which ones prefer your site and which ones you need to optimise for. By doing this, you can create pages which are friendly for all crawl bots. Google themselves have said that “having fewer crawl issues on a site helps Googlebot understand that the website is a healthy site, resulting in more crawl budget.” So, by optimising your website for all search engines, particularly Google, your site is going to perform better in the rankings.
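One simple way to characterise bot behaviour is to count each crawler’s hits per site section. A minimal sketch, assuming the (bot, path) pairs have already been extracted from the log; the data here is hypothetical:

```python
from collections import defaultdict, Counter

# Hypothetical (bot, path) pairs already extracted from the server log.
requests = [
    ("Googlebot", "/blog/post-1"),
    ("Googlebot", "/blog/post-2"),
    ("Googlebot", "/products/widget"),
    ("bingbot", "/products/widget"),
]

def section_preferences(pairs):
    """Count hits per top-level site section, per crawler."""
    prefs = defaultdict(Counter)
    for bot, path in pairs:
        section = "/" + path.strip("/").split("/")[0]
        prefs[bot][section] += 1
    return prefs

prefs = section_preferences(requests)
print({bot: dict(counts) for bot, counts in prefs.items()})
```

Comparing these tallies across bots shows at a glance which parts of the site each search engine favours.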
What Should I Do Next?
If you’re interested in someone taking a look at your server log and analysing your site, please contact us or give us a call on +44 (0)20 8977 2994. We can then send you a report on how your site is doing based on the server log analysis. We can also combine this with a full analysis of the site if you’d like a complete MOT. Off the back of this, we can create a strategy covering what to fix and how to improve your overall website to attract more traffic.
If you want to do a server log analysis yourself please have a look at the next section for more information on how to do it.
Analysing Server Log Files By Yourself
How do we turn a server log file into something more understandable?
There are multiple tools, shown below, that can help you understand log files.
How are we able to get the log file data?
There are several types of server, and each stores its log data differently. Here are general guides to finding and managing log data on three of the most popular server types.
Accessing Apache log files (Linux) http://httpd.apache.org/docs/2.4/logs.html
Accessing NGINX log files (Linux) http://nginx.com/resources/admin-guide/logging-and-monitoring/
Accessing IIS log files (Windows) http://www.iis.net/learn/manage/provisioning-and-managing-iis/configure-logging-in-iis
Please be aware that if your site receives a high volume of traffic, the log files are likely to be very large and your computer might not be able to cope with them.
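One practical way around very large log files is to process them line by line, rather than loading the whole file at once, and to pre-filter just the crawler lines into a smaller file you can analyse comfortably. A minimal sketch (the file paths and bot token are hypothetical; the demo uses a tiny temporary file standing in for a multi-gigabyte log):

```python
import os
import tempfile

def filter_bot_lines(src, dst, token="Googlebot"):
    """Stream a large log file line by line, copying only crawler lines to a
    smaller file; the full log is never loaded into memory."""
    kept = 0
    with open(src, encoding="utf-8", errors="replace") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if token in line:
                fout.write(line)
                kept += 1
    return kept

# Demo: build a tiny sample log in a temporary directory.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "access.log")
dst = os.path.join(tmp_dir, "googlebot.log")
with open(src, "w", encoding="utf-8") as fh:
    fh.write('1.1.1.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 1 "-" "Mozilla/5.0"\n')
    fh.write('66.249.66.1 - - [10/Oct/2023:13:55:40 +0000] "GET /a HTTP/1.1" 200 1 "-" "Googlebot/2.1"\n')

print(filter_bot_lines(src, dst))  # number of Googlebot lines kept
```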