
Robots to the Rescue!

It seems lately that everyone is talking about duplicate content penalties and the adverse effects they can have on your site, be it an Associate-O-Matic (AOM) site or anything else. I’ve talked before about the duplicate content penalty and made some suggestions to help. Another thing you should have in your arsenal is a robots.txt file, so we’ll discuss that in a little more detail now.

Search engines use spiders, which, as you should know by now, are little programs that traverse the web, looking for links, examining content and reporting what they find back to the servers from whence they came. The programs are also known as ‘bots (short for robots) because they act much like a robot – they tirelessly do their job, looking and reporting.

As such, a robots.txt file is a way of communicating with these programs. It contains instructions about which bots may crawl your site, and which files or directories (if any) they should skip. The point is that you can exclude certain files or URLs from their gaze. This can be an important tool to help you avoid a duplicate content penalty, at least for some pages of your AOM site.

(A side note to satisfy the more technical of you out there: While bots are supposed to read and respect a robots.txt file, it’s common knowledge that many don’t – usually bots that have a less-than-honest reason to be burrowing into your files. Trying to ban harmful bots that may be eating into your bandwidth, etc. is important, but not part of this discussion. The main search engine bots (Google, Yahoo, MSN, etc.) by and large respect robots.txt. Since we’re discussing dupe content, let’s leave evil bots out of this for now.)

There are a lot of sites that can explain how to create a robots.txt file, with all kinds of subtle commands, exclusions, and so on. Let’s just look at a very basic structure that you can use with AOM.

The first thing you would need would be a notice that the following commands should be obeyed by all bots that visit:

User-agent: *

The command ‘User-agent’ is robot-speak for ‘Dear robots’. The asterisk is a wildcard meaning ‘This Means You’ – the rules that follow apply to every bot.
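As an aside, the wildcard can be replaced with a specific crawler’s name to give that one bot its own rules, while everyone else gets a separate set. A quick sketch (Googlebot is the token Google’s crawler identifies itself with; the path here is just an illustration – an empty Disallow means nothing is blocked):

User-agent: Googlebot
Disallow: /shop.php?a=advanced

User-agent: *
Disallow:

Each bot obeys the most specific group that names it, so here Googlebot would skip the Advanced Search page while all other bots could crawl everything.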

Disallow:

This is important; it means ‘do NOT examine whatever follows the colon’. (Left empty, as shown, it blocks nothing.) With Associate-O-Matic, you would include any page you don’t want the bot to read. Good examples are the View Cart and Advanced Search pages. These generally don’t vary much from site to site, and could mean you’re penalized for having the same shopping cart page as thousands of other AOM sites. So…

Disallow: /shop.php?a=cartview
Disallow: /shop.php?a=advanced

(Note that each command must be on a separate line, and the domain name is not needed – the paths are relative to the root of your site.) Other good things to include would be your privacy policy or shipping information pages, if you use the same ones on every site:

Disallow: /ship.php
Disallow: /page-shippinginfo.html
Disallow: /ppolicy.php
Disallow: /page-privacypolicy.html

I’m including both the PHP file containing the info and the resulting AOM page, just to make doubly sure. Put it all together and you get something like this:

User-agent: *
Disallow: /shop.php?a=cartview
Disallow: /shop.php?a=advanced
Disallow: /ship.php
Disallow: /page-shippinginfo.html
Disallow: /ppolicy.php
Disallow: /page-privacypolicy.html

That would be a good start. You can always add to it later if need be. It tells every bot that visits your site not to bother with the View Cart, Advanced Search, Shipping Info and Privacy Policy pages. You would save this as a plain text file named robots.txt in the main directory of your site (where the admin.php and shop.php files are located).
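If you want to double-check that the file blocks exactly what you intend before uploading it, Python’s standard library ships a parser for robots.txt. A minimal sketch using the rules above (example.com is a placeholder domain; the product URL with `c=books` is just a made-up example of a normal AOM page):

```python
from urllib import robotparser

# The robots.txt rules from the article, as one string.
rules = """\
User-agent: *
Disallow: /shop.php?a=cartview
Disallow: /shop.php?a=advanced
Disallow: /ship.php
Disallow: /page-shippinginfo.html
Disallow: /ppolicy.php
Disallow: /page-privacypolicy.html
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The View Cart page and shipping info should be off-limits to all bots.
print(rp.can_fetch("*", "https://example.com/shop.php?a=cartview"))  # False
print(rp.can_fetch("*", "https://example.com/ship.php"))             # False

# An ordinary catalog page is still crawlable.
print(rp.can_fetch("*", "https://example.com/shop.php?c=books"))     # True
```

This is the same matching logic well-behaved bots apply: a URL is blocked if it starts with any Disallow path for the matching User-agent group.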

Often, when a bot requests a robots.txt file that doesn’t exist, a ‘file not found’ message is generated in your error log. It’s not a critical situation, but having the file in place will at least keep those messages out of your logs. One less thing to have to read. And more importantly, it may be one less thing to make your AOM site look like so many others, at least to the search engines. It’s no substitute for original content, but why take a hit just for your shopping cart or shipping info pages?

5 thoughts on “Robots to the Rescue!”

  1. Nipon Ekanarongpun

    One more question. If I put

    Disallow: /aom/

    What happens?

  2. mcarp555

    The simple answer is bots that obey robots.txt won’t look in the /aom directory.

  3. […] robots.txt – The file that contains the list of files and/or directories you do not want scanned. You can exclude some or all bots from some or all files. A basic overview of this file can be read here. […]

  4. Basically, do we need to put disallow rules for the aom and ioncube folders in robots.txt?

    For easy guidance, can you give us details on how we should use robots.txt with AOM v5.3.0?
    Then we can just copy, paste and upload it to our server host and not be bothered with robots.txt any more. I need a robots.txt from a successful AOM user, please.

    Another question: how do we get indexed by the search engines faster? It seems my domain is not yet indexed.

    Thanks

  5. mcarp555

    You can disallow the /aom and /ioncube folders if you want, but I don’t think it’s crucial to do so.

    I would recommend using a robots.txt file with any version of AOM. You can look through the AOM forum for information on what other users are putting into theirs.
