Sunday, May 29, 2005

Did Google visit your website today?

I'm always curious to know how often Googlebot visits my website. Most of my traffic comes from Google search, so it matters that Googlebot visits often and indexes as many pages as possible. Googlebot is Google's web-crawling robot. It collects documents from the web to build a searchable index for the search engine.

Since this blog is hosted on Blogger, I have no access to the web server logs, and the only way to find out whether Google visited my site is to check the date on the Google cache.

But it looks like there is a better way of doing things: I just came across a little-known but very powerful search syntax called "daterange". Google does mention it in the API documentation, but very few people know about it.

Remember: A date-range search has nothing to do with the creation date of the content and everything to do with the indexing date of the content. And this is exactly what I was looking for.

If you want to limit your results to documents that Google indexed within a specific date range, you can use the "daterange:" query term. The "daterange:" query term must be in the following format: daterange:<start_date>-<end_date>

where <start_date> = Julian date indicating the start of the date range
<end_date> = Julian date indicating the end of the date range

The catch is that each date must be expressed as a Julian date, that is, the number of days elapsed since January 1, 4713 BC. For example, the Julian date for August 1, 2001 is 2452122. You can use this online tool to convert a calendar date to a Julian date.
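If you would rather not depend on an online converter, the conversion is a single addition. Here is a minimal Python sketch. The function names and the offset constant are my own, and the offset is an assumption calibrated so that August 1, 2001 comes out as 2452122, matching the example above. Astronomical tables that count Julian days from noon give a number one higher for the same calendar date.

```python
from datetime import date

# Assumption: offset between Python's date ordinal (0001-01-01 is day 1)
# and the Julian date convention in the example, where 2001-08-01 is 2452122.
ORDINAL_TO_JULIAN = 1721424


def to_julian(day: date) -> int:
    """Convert a calendar date to a whole-number Julian date."""
    return day.toordinal() + ORDINAL_TO_JULIAN


def from_julian(julian: int) -> date:
    """Convert a whole-number Julian date back to a calendar date."""
    return date.fromordinal(julian - ORDINAL_TO_JULIAN)


print(to_julian(date(2001, 8, 1)))  # 2452122
print(from_julian(2452122))         # 2001-08-01
```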

This simple form lets you run a date-range search on Google. Rather than constructing fancy queries such as "life daterange:2453461-2453491", simply enter the number of days back. For example, to search for life in the past 20 days, type life in the query box and 20 in the days back box.
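The "days back" idea is easy to reproduce yourself. The sketch below is only a guess at what such a form does, not its actual code. The function name `daterange_query` and its parameters are hypothetical, and it assumes the same day-count convention as the example in this post (August 1, 2001 = 2452122).

```python
from datetime import date
from typing import Optional

# Assumption: offset calibrated so that 2001-08-01 maps to 2452122.
ORDINAL_TO_JULIAN = 1721424


def daterange_query(terms: str, days_back: int, today: Optional[date] = None) -> str:
    """Build a query string limited to the last `days_back` days."""
    if today is None:
        today = date.today()
    end = today.toordinal() + ORDINAL_TO_JULIAN
    start = end - days_back
    return f"{terms} daterange:{start}-{end}"


# Thirty days back from May 1, 2005 reproduces the fancy query quoted above.
print(daterange_query("life", 30, today=date(2005, 5, 1)))
# life daterange:2453461-2453491
```

Passing `today` explicitly keeps the result repeatable. Leave it out to count back from the current date.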

I used the query below to find the pages on my site that Google indexed the day before.

http://www.google.com/search?q=site:http://labnol.blogspot.com%20daterange:2453517-2453518

The number of results returned is the number of pages on my site that Google indexed yesterday.
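To avoid working out the numbers by hand every day, the search URL can be generated. This is a rough Python sketch under a few assumptions: the function name `indexed_url` is made up, the Julian offset is calibrated to the August 1, 2001 = 2452122 example, and the site is passed without the http:// prefix. It only builds the URL. It does not fetch or count the results.

```python
from datetime import date
from urllib.parse import urlencode

# Assumption: offset calibrated so that 2001-08-01 maps to 2452122.
ORDINAL_TO_JULIAN = 1721424


def indexed_url(site: str, day: date) -> str:
    """Build a Google search URL for pages of `site` in the one-day range ending on `day`."""
    end = day.toordinal() + ORDINAL_TO_JULIAN
    start = end - 1
    query = f"site:{site} daterange:{start}-{end}"
    # urlencode escapes the colons and turns the space into "+".
    return "http://www.google.com/search?" + urlencode({"q": query})


# May 28, 2005 gives the same range as the query above: 2453517-2453518.
print(indexed_url("labnol.blogspot.com", date(2005, 5, 28)))
```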

And don't forget that there are a few simple things you can do to help the Googlebot understand your web site as fully as possible. Read this great article at Scribbling.net.