One, the necessity of statistical analysis of search engine spider crawling:
Whether spiders can crawl a site smoothly is the precondition for it being included by search engines. Only when we know which search engines crawl the site, which pages they fetch, and what responses they receive can we improve the website accordingly. That makes reviewing spider crawl logs very important, yet also very painful, especially for SEOers and new webmasters. For example, it is commonly said that if a crawled page returns "200 0 64" (an IIS log pattern: HTTP status 200, sub-status 0, Win32 status 64), the page is likely to be deleted from the search engine's index, and a HEAD request that returns 404 also indicates deletion. If we can spot these patterns in the logs, we can adjust the site in time according to the actual situation. Likewise, 301/302 redirects and 404 errors encountered during spider crawling are issues webmasters need to watch. Therefore, analyzing spider crawl logs is necessary.
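The status-code checks described above can be automated. The following is a minimal sketch (in Python, for illustration only); the function name and the exact field layout are assumptions, not something given in the article, and the "200 0 64" rule applies specifically to IIS-style logs.

```python
# Sketch: classify one spider log entry by its status fields, so that
# pages possibly deleted by the search engine (IIS "200 0 64", or a
# HEAD request answered with 404) and redirects (301/302) stand out.

def classify(method, status, sub_status=None, win32_status=None):
    """Return a short label for one log entry's status fields."""
    if status == 200 and sub_status == 0 and win32_status == 64:
        return "possible deletion (200 0 64)"  # IIS-specific pattern
    if status == 404:
        if method == "HEAD":
            return "possible deletion (HEAD 404)"
        return "missing page (404)"
    if status in (301, 302):
        return "redirect (301/302)"
    return "ok"
```

A log analyzer would run each parsed entry through such a classifier and report anything that is not "ok" to the webmaster.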
Two, how spider crawl statistics can be collected:
Because search engine spiders do not execute JS (only a very small number of crawlers do), Flash, img, or similar tags when crawling a site, third-party statistics services (such as Ajiang, Chinaz, Yahoo, and Google statistics systems) cannot record spider visits. Spider crawling can instead be analyzed in the following ways: 1. Use a server-side script in PHP or ASP to inspect the visitor's USER_AGENT and dynamically track and record spider visits. This does achieve the goal, but its shortcomings are obvious:
a) Heavier server burden. For a site with a lot of content and weight, spider crawling is very frequent, and inserting tracking code into every page adds extra load to the server.
b) Because search engines prefer static pages, many websites use a CMS to generate static files, in which server-side tracking code cannot run, so those visits cannot be counted. A Hunan SEO company suggested calling the statistics script from static files via an img or script tag; I tested this method for a month and it recorded nothing, because spiders do not fetch the resources referenced by those tags.
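The USER_AGENT approach in method 1 can be sketched as follows. The article mentions PHP/ASP; this is a language-agnostic illustration in Python, and the spider signature list, function names, and log format are all assumptions for the sketch.

```python
# Minimal sketch of method 1: record a visit only when the User-Agent
# looks like a known search engine spider. The signature list below is
# illustrative; real deployments must keep it up to date.
from datetime import datetime, timezone

SPIDER_SIGNATURES = ("Googlebot", "Baiduspider", "Yahoo! Slurp", "bingbot")

def spider_name(user_agent):
    """Return the matching spider signature, or None for normal visitors."""
    for sig in SPIDER_SIGNATURES:
        if sig.lower() in user_agent.lower():
            return sig
    return None

def record_spider_visit(user_agent, url, status):
    """Build a tab-separated log line for a spider visit, or None if not a spider."""
    name = spider_name(user_agent)
    if name is None:
        return None
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return "\t".join([stamp, name, url, str(status)])
```

Every dynamic page would call `record_spider_visit()` and append the returned line to a log file; doing this work on every request is exactly the extra server load criticized in point a), and static pages never execute it at all, which is point b).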
2. Use third-party log analysis tools, such as awstats on Linux or Webalizer on Windows. Their shortcomings are also obvious. For example, if you are a virtual-host user, a large log file is generated every day, and downloading it for every analysis is very painful. At the same time, these tools are too professional and not well suited to ordinary webmasters.
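As a lightweight middle ground between methods 1 and 2, the raw access log can be summarized directly with a short script instead of a full analyzer. The sketch below assumes an Apache "combined" log format and a small hard-coded spider list; both are illustrative assumptions.

```python
# Sketch: count spider requests per (spider, status) pair from an
# Apache combined-format access log, as a lightweight alternative to
# full analyzers such as awstats or Webalizer.
import re
from collections import Counter

# Matches: ... "METHOD url HTTP/x.y" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"[A-Z]+ \S+ HTTP/[\d.]+" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
SPIDERS = ("Googlebot", "Baiduspider", "bingbot")

def summarize(log_lines):
    """Return a Counter of (spider, status) pairs for spider requests only."""
    counts = Counter()
    for line in log_lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        for spider in SPIDERS:
            if spider in m.group("agent"):
                counts[(spider, int(m.group("status")))] += 1
                break
    return counts
```

Run daily over the current log, this yields exactly the per-spider status breakdown (200s, 404s, 301/302s) that section one argues webmasters should be watching.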
3. If you have a better method for analyzing spider crawling, please share it with fellow webmasters.
Three, notes on developing a dedicated log analysis tool for search engine spider crawl statistics:
1. We analyze log >