It seems lately that everyone is talking about duplicate content penalties, and the adverse effects they can have on your site, be it AOM or anything else. I’ve talked before about the DCP, and made some suggestions to help. Another thing that you should have in your arsenal is a robots.txt file, so we’ll discuss that in a little more detail now.
Search engines use spiders, which, as you should know by now, are little programs that traverse the web, following links, examining content, and reporting what they find back to the servers from whence they came. These programs are also known as ‘bots (short for robots) because they act much like a robot – they tirelessly do their job, looking and reporting.
As such, a robots.txt file is a way of communicating with these programs. It contains instructions about which bots are allowed to crawl a directory, and which files (if any) they are allowed to examine in that directory. The point is that you can exclude certain files or URLs from their gaze. This can be an important tool to help you avoid a duplicate content penalty, at least for some pages of your AOM site.
(A side note to satisfy the more technical of you out there: While bots are supposed to read and respect a robots.txt file, it’s common knowledge that many don’t – usually bots with a less-than-honest reason to be burrowing into your files. Blocking harmful bots that may be eating into your bandwidth is important, but it’s not part of this discussion. The main search engine bots (Google, Yahoo, MSN, etc.) by and large respect robots.txt, and since we’re discussing dupe content, let’s leave the evil bots out of this for now.)
There are a lot of sites that can explain how to create a robots.txt file, with all kinds of subtle little commands and exclusions. Let’s just look at a very basic structure that you can use with AOM.
The first thing you need is a notice that the commands that follow should be obeyed by every bot that visits:
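In a robots.txt file, that notice is a single line:

```
User-agent: *
```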
The command ‘User-agent’ is robot-speak for ‘Dear robots’. The asterisk is a wildcard meaning ‘This Means You’.
Next comes the ‘Disallow’ command. This is important; it means Do NOT examine whatever follows this. With Associate-O-Matic, you would include any page you don’t want the bot to read. Good examples are the View Cart and Advanced Search pages. These generally don’t vary much from site to site, and could mean you’re penalized for having the same shopping cart page as thousands of other AOM sites. So…
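The Disallow lines might look something like this. (The file names and URL parameters below are placeholders for illustration – check the actual URLs your own AOM site produces for its View Cart and Advanced Search pages and use those instead.)

```
Disallow: /cart.php
Disallow: /index.php?cart=
Disallow: /search.php
Disallow: /index.php?search=
```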
I’m including both the PHP file containing the info, and the resulting AOM page, just to make doubly sure. Put it all together and you get something like this:
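A minimal robots.txt along those lines would read as follows. (Again, the paths here are illustrative – substitute the file and page URLs from your own site.)

```
User-agent: *
Disallow: /cart.php
Disallow: /index.php?cart=
Disallow: /search.php
Disallow: /index.php?search=
```

Upload this as a plain text file named robots.txt in the root directory of your site, since that is the only place bots will look for it.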
Often, when a bot visits a site that has no robots.txt file, the failed request generates a message in the error log. It’s not a critical situation, but having the file in place will at least keep those messages out of your logs. One less thing to have to read. And more importantly, it may be one less thing to make your AOM site look like so many others, at least to the search engines. It’s no substitute for original content, but why take a hit just for your shopping cart or shipping info pages?