11.02.2004

FW: Robots.text At Whitehouse.gov

Very interesting reading - the whitehouse robots.txt file.
This is a text file that tells robots which parts of a site they should not crawl (web search sites use spiders, or 'bots', to crawl web sites and index information).  More info at www.robotstxt.org.


From: J-Walk Blog
Posted At: Friday, October 29, 2004 12:16 PM
Posted To: J-Walk Blog

From The Inquirer: White House site has oddities, like Bush site.

According to Internet consultant Dave Bender, from Minnesota, Bush's web team have done some strange things to the White House web site (www.whitehouse.gov). It has apparently been configured to prevent Internet search engines from capturing historic snapshots of what is posted on the site.

The technical details are in a file that web sites often have in their uppermost directory called 'robots.txt'. It contains directives that Internet search engines, like Google and Yahoo, read to determine what the site owner would like indexed by the search engine.
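
For anyone curious how a crawler reads those directives, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the "ExampleBot" agent name, the example.com host, and the allowed() helper are all made up for illustration; this is not the actual whitehouse.gov file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
# This is NOT the real whitehouse.gov file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /drafts/
"""


def allowed(path, robots_txt=SAMPLE_ROBOTS_TXT, agent="ExampleBot"):
    """Return True if the given rules let `agent` fetch `path`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    # can_fetch() expects a URL; the host is a placeholder.
    return parser.can_fetch(agent, "http://www.example.com" + path)


print(allowed("/index.html"))              # not under a Disallow rule
print(allowed("/archive/2004/page.html"))  # under Disallow: /archive/
```

A well-behaved crawler runs a check like this before requesting each page. Note that the file is only a request: nothing technically stops a bot from ignoring it.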

Here's a link to the robots.txt file at whitehouse.gov. It's a list of all the directories that they don't want indexed by search engines.

Why so many?

He said that the only reason he could think of is that it is designed to prevent a plugged-in reporter from checking a page on the site and comparing it with the cached version to see what's changed.

A concerned voter might want to see if the White House has changed its position on one thing or another.
