Sitemaps, the DCP and G00gle

There was a post in the AOM forum recently about sitemaps and the dreaded duplicate content penalty. It made me stop and think that there are several conflicting schools of thought regarding these areas, and I’ve touched on my perspective a few times in the forum. Perhaps I should expand somewhat on my viewpoint here.

First, it’s heresy to the commonly-held view, but I do not believe in sitemaps for AOM that run into the hundreds of thousands of pages (or items). In fact, I think it’s counter-productive. Here are several reasons why:

  1. It’s a waste of time and space; creating a sitemap that basically is a duplicate of Amazon’s entire catalog is a horrible chore that can take hours, if not close to days. And it takes up considerable space on your server that could be used for other purposes.
  2. Creating a sitemap this big will increase your chance of a duplicate content penalty (DCP), because it gives search engine spiders thousands of chances to compare pages on your site with Amazon and all the other AOM sites out there.
  3. And if you are targeting a specific niche, say the famous ‘red widget’, why have thousands of pages of products that have nothing to do with red widgets? This will dilute the effectiveness of your keywords, since they’ll show up on pages that are totally non-relevant to your target niche. Why would pages from a Wii site be featuring refrigerators? How well will those pages rank against your competition?
  4. Something that search engines love to see is organic growth in sites. It shows expansion, and is what you would expect to see in sites that are meant to be viewed by humans, not set up solely for spiders or bots. A domain that suddenly appears one day with a sitemap weighing in at 400MB or 2GB will look suspiciously like an attempt to ‘game’ the system. A small sitemap will allow spiders to search out dynamic links, giving your site the appearance of natural growth.
  5. As products go in and out of stock on Amazon, you will accumulate an ever-growing percentage of broken links or ‘this product is no longer available’ pages. How will links to software for Windows 95 help your sales? How often can you generate a new sitemap to keep the links fresh, especially if it’s several MB or more in size?

My sitemaps only contain the home page, category pages and perhaps the ‘View Cart’ link. I take out things like the privacy policy and shipping info, since those are basically generic pages that don’t vary much from one site to another. I also block them with my robots.txt file to keep them from showing up as a potential DCP area. I let the spiders do their job and hunt for any other links. I don’t need 2,400 pages indexed to make a site successful.
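For anyone who would rather script this than edit files by hand, here is a minimal sketch in Python of a sitemap that small, plus a matching robots.txt. Everything specific in it is an assumption for illustration: the domain `example.com`, the category and cart paths, and the helper names `build_sitemap` and `build_robots_txt` are hypothetical and are not part of AOM. It only shows the shape of the two files.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, paths):
    """Return sitemap XML (as a string) with one <url> entry per path."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    return ET.tostring(urlset, encoding="unicode")


def build_robots_txt(blocked_paths, sitemap_url):
    """Return robots.txt text that disallows the generic pages for all bots."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in blocked_paths]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical site layout: home page, a few categories, the cart.
    pages = ["/", "/category/red-widgets", "/category/blue-widgets", "/cart"]
    # Hypothetical generic pages kept out of the index.
    generic = ["/privacy-policy", "/shipping-info"]

    print(build_sitemap("https://example.com", pages))
    print(build_robots_txt(generic, "https://example.com/sitemap.xml"))
```

The point of keeping the list of pages in one short variable is that the whole sitemap stays small enough to read at a glance, and regenerating it takes no time at all.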

Okay, you may say; all well and good, but even so, what prevents my sites from being hit with DCPs once the spiders do start to winnow out product links? A good question, and I admit that it’s impossible to make an AOM site totally immune to the possibility. After all, I’m selling the same products as thousands of other sites, many with the same exact code. And I have had sites plummet from page 1 to some online black hole after a re-indexing. What can you do?

AOM gives you several options. One is to keep all the ‘nofollow’ links checked, to prevent getting non-relevant pages lumped into your site (see point #3, above). But one of the best things to do is use the custom box. All my sites since the beginning of this year have had several paragraphs of original text on the home page, sprinkled with juicy keywords, and every category has several sentences or a paragraph of original text targeting that category, also with keywords included.

If you want to get picky, you can’t avoid duplicate content; we all use the same English words in all our sites (non-English sites have the same situation in whatever language they’re in). Does the big “G” penalize you for having “the” or even “I” on your page? Of course not. Nobody knows exactly where they draw the line, though. So having some content that’s the same as other sites is not really the problem. It’s all in the ratios. If all your content is clearly a duplicate of someone else’s, then somebody is copying somebody, or that’s how it will be seen. But if only part of it is a copy, then perhaps it’s coincidence. Or even a requirement. How many sites that discuss Shakespeare include copies of his work?

If you add original content into a custom box, you decrease the ratio of ‘obviously duplicate content’ to less than 100%. What the allowable limit is, I don’t know. Probably very few people actually do. But certainly anything less than 100% is better than 100%. A block of original, keyword-laden text will help make your page look less like a copy of thousands of others. And with luck, it will appear relevant to the niche you’re aiming at. Having your main keywords at the beginning of your title will also be very important.
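To put a rough number on that ratio, here is a small sketch in Python. It is an assumption from start to finish: nobody outside the search engines knows how they actually measure duplication, so this simply counts words as a crude stand-in, and the function name `duplicate_ratio` and the sample text are made up for illustration.

```python
def duplicate_ratio(shared_text, original_text):
    """Fraction of a page's words that come from shared (feed) content.

    A crude word-count proxy only; this is NOT how any search engine
    is known to score duplicate content.
    """
    shared_words = len(shared_text.split())
    original_words = len(original_text.split())
    total = shared_words + original_words
    return shared_words / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical page: product blurbs from the feed plus a custom box.
    feed = "Red widget with chrome finish. Ships in two days. " * 20
    custom_box = "Why I only stock red widgets, and how to pick one. " * 5

    print(f"Without custom box: {duplicate_ratio(feed, ''):.0%}")
    print(f"With custom box:    {duplicate_ratio(feed, custom_box):.0%}")
```

However you choose to measure it, the direction is the same: every sentence of your own writing pushes the figure down from 100%.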

So, those are some of the principles I stick to with my sites. But why should you listen to me instead of the established orthodoxy that says you need a huge sitemap? Right now I’ve got at least three stores ranked #1 in Google for their respective keywords. None of them have sitemaps of more than a dozen links. They all have unique category content that I wrote. I don’t worry about how many pages I have indexed. And I get cart reports and orders every day.


One thought on “Sitemaps, the DCP and G00gle”

  1. Great suggestion, I understand more about sitemap.

    Thank You.
