Thursday, May 5, 2011

Fat Pandas and Thin Content

If you’ve been hit by the Panda update or are just worried about its implications, you’ve probably read a lot about “thin” content. We spend our whole lives trying to get thin, and now Google suddenly hates us for it. Is the Panda update an attempt to make us all look like pandas? Does Google like a little junk in the trunk?

It’s confusing and it's frustrating, especially if you have real money on the line. It doesn’t help that “thin” content has come to mean a lot of things, and not every definition has the same solution. To try to unravel this mess, I'm going to present 7 specific definitions of “thin” content and what you can do to fatten them up.

Quality: A Machine’s View

To make matters worse, “thin” tends to get equated with “quality” – if you’ve got thin content, just increase your quality. It sounds good, on the surface, but ultimately Google’s view of quality is defined by algorithms. They can’t measure the persuasiveness of your copy or the manufacturing standards behind your products. So, I’m going to focus on what Google can measure, specifically, and how they might define “thin” content from a machine’s perspective.

1. True Duplicates (Internal)

True internal duplicates are simply copies of your own pages that make it into the search index, almost always a result of multiple URLs that lead to the same content. In Google’s eyes, every URL is a unique entity, and every copy makes your content thinner:

[Image: internal duplicates]

A few duplicates here and there won’t hurt you, and Google is able to filter them out, but when you reach the scale of an e-commerce site and have 100s or 1000s of duplicates, Google’s “let us handle it” mantra fails miserably, in my experience. Although duplicates alone aren’t what the Panda update was meant to address, these duplicates can exacerbate every other thin content issue.

The Solution

Get rid of them, plain and simple. True duplicates should be canonicalized, usually with a 301-redirect or the canonical tag. Paths to duplicate URLs may need to be cut, too. Telling Google that one URL is canonical while still linking to 5 versions of it on your own site will only prolong your problems.
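Just to make that concrete, here’s a rough sketch in Python (the parameter names and URL patterns are purely my own examples, not anything Google prescribes) of how a site might collapse duplicate-spawning URL variants into one canonical address, the same address you’d 301 to and echo in the canonical tag:

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

# Query parameters that spawn duplicate URLs without changing the content --
# hypothetical names; swap in whatever your platform actually appends.
STRIP_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}

def canonical_url(url: str) -> str:
    """Map the many variants of a URL to a single canonical form."""
    parts = urlparse(url.lower())  # simplification: lower-cases host, path, and query
    query = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in STRIP_PARAMS)
    path = parts.path.rstrip("/") or "/"  # treat /widgets/ and /widgets as one page
    return urlunparse(("https", parts.netloc, path, "", urlencode(query), ""))

# Any request for a non-canonical variant would 301 to this URL, and the page
# itself can echo the same value in its <link rel="canonical"> element.
print(canonical_url("http://www.example.com/Widgets/?utm_source=feed&color=red"))
# -> https://www.example.com/widgets?color=red
```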

2. True Duplicates (Cross-site)

Google is becoming increasingly aggressive about cross-site duplicates, which may differ by their wrapper but are otherwise the exact same pieces of content across more than one domain:

[Image: cross-site duplicates]

Too many people assume that this is all an issue of legitimacy or legality – scrapers are bad, but syndication and authorized duplication are fine. Unfortunately, the algorithm doesn’t really care. The same content across multiple sites is SERP noise, and Google will try to filter it out.

The Solution

Here’s where things start to get tougher. If you own all of the properties or control the syndication, then a cross-domain canonical tag is a good bet. Choose which version is the source, or Google may choose for you. If you’re being scraped and the scrapers are outranking you, you may have to build your authority or file a DMCA takedown. If you’re a scraper and Panda knocked you off the SERPs, then go Panda.
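For what it’s worth, the cross-domain canonical tag itself is just one element in the syndicated copy’s &lt;head&gt;. A tiny helper (the URL here is made up, obviously) might look like this:

```python
from html import escape

def cross_domain_canonical(source_url: str) -> str:
    """The <link rel="canonical"> element a syndicated copy should carry in its
    <head>, pointing back at the original article on the source domain."""
    return f'<link rel="canonical" href="{escape(source_url, quote=True)}" />'

# A partner republishing your article would include this on their copy:
print(cross_domain_canonical("https://www.example.com/blog/fat-pandas-and-thin-content"))
# -> <link rel="canonical" href="https://www.example.com/blog/fat-pandas-and-thin-content" />
```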

3. Near Duplicates (Internal)

Within your own site, “near” duplicates are just that – pages which vary by only a small amount of content, such as a couple of lines of text:

[Image: internal near duplicates]

A common example is when you take a page of content and spin it off across 100s of cities or topics, changing up the header and a few strategic keywords. In the old days, the worst that could happen was that these pages would be ignored. Post-Panda, you risk much more severe consequences, especially if those pages make up a large percentage of your overall content.

Another common scenario is deep product pages that only vary by a small piece of information, such as the color or size of the product. Take a T-shirt site, for example – any given style could come in dozens of combinations of gender, color, and size. These pages are completely legitimate from a user perspective, but once they multiply into the 1000s, they may look like low-value content to Google.

The Solution

Unfortunately, this is a case where you might have to bite the bullet and block these pages (such as with META NOINDEX). For the second scenario, I think that can be a decent bet. You might be better off focusing your ranking power on one product page for the T-shirt instead of every single variation. In the geo-keyword example, it’s a bit tougher, since you built those pages specifically to rank. If you’re facing large-scale filtering or devaluation, though, blocking those pages is better than the alternative. You may want to focus on just the most valuable pages and prune those near duplicates down to a few dozen instead of a few thousand. Alternatively, you’ve got to find a way to add content value, beyond just a few swapped-out keywords.
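Before you prune or NOINDEX anything, it helps to know just how “near” your near duplicates really are. Here’s a small, illustrative sketch (the shingle size and sample copy are my own assumptions) that scores two pages by the word sequences they share:

```python
import re

def shingles(text: str, size: int = 3) -> set:
    """Break page copy into overlapping word n-grams ("shingles")."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(page_a: str, page_b: str) -> float:
    """Jaccard similarity of the two pages' shingle sets: 1.0 means an exact copy."""
    a, b = shingles(page_a), shingles(page_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Two "city" pages that differ only by a swapped-in keyword:
chicago = "We offer the best widget repair in Chicago. Our widget repair team has served Chicago for ten years."
boston = "We offer the best widget repair in Boston. Our widget repair team has served Boston for ten years."
print(f"{similarity(chicago, boston):.2f}")  # a noticeably elevated score; longer spun pages land close to 1.0
```

Run something like that across your geo pages or product variants and you’ll quickly see which templates are carrying almost no page-specific copy.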

4. Near Duplicates (Cross-site)

You can also have near duplicates across sites. A common example is a partnered reseller who taps into their suppliers’ databases to pull product descriptions. Add multiple partners, plus the original manufacturer’s site, and you end up with something like this:

[Image: cross-site near duplicates]

While the sites differ in their wrappers and some of their secondary content, they all share the same core product description (in red). Unfortunately, it’s also probably the most important part of the page, and the manufacturer will naturally have a ranking advantage.

The Solution

There’s only one viable long-term solution here – if you want to rank, you’ve got to build out unique content to support the borrowed content. It doesn’t always take a lot, and there are creative ways to generate content cost-effectively (like user-generated content). Consider the product page below:

[Image: unique content illustration]

The red text is the same, but here I’ve supplemented it with 2 unique bits of copy: (1) a brief editorial description, and (2) user reviews. Even a 1-2 sentence lead-off editorial that’s unique to your site can make a difference, and UGC is free (although it does take time to build).

Of course, the typical argument is “I don’t have the time or money to create that much unique content.” This isn’t something you have to do all at once – pick the top 5-10% of your best sellers and start there. Give your best products some unique content and see what happens.

5. Low Unique Ratio

This scenario is similar to internal near-duplicates (#3), but I’m separating it out because I find it manifests in a different way on a different set of sites. Instead of repeating body content, sites with a low ratio of unique content end up with too much structure and too little copy:

[Image: low unique content]

This could be a result of excessive navigation, mega-footers, repeated images or dynamic content – essentially, anything that appears on every page and isn’t body copy.

The Solution

Like internal near-duplicates, you’ve got to buckle down and either beef up your unique content or consider culling some of these pages. If your pages are 95% structure with 1-2 sentences of unique information, you really have to ask yourself what value they provide.
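One rough way to audit this (purely illustrative; the sample pages below are made up) is to treat any line of text that repeats on every page as structure, then measure how much of each page is left over:

```python
def boilerplate_lines(pages):
    """Lines of text that appear on every page are treated as shared structure
    (navigation, mega-footer, repeated promos) rather than unique body copy."""
    line_sets = [{line.strip() for line in p.splitlines() if line.strip()} for p in pages]
    return set.intersection(*line_sets) if line_sets else set()

def unique_ratio(page, shared):
    """Share of this page's text (by character count) that is not boilerplate."""
    lines = [line.strip() for line in page.splitlines() if line.strip()]
    total = sum(len(line) for line in lines)
    unique = sum(len(line) for line in lines if line not in shared)
    return unique / total if total else 0.0

# Made-up pages: a navigation line, one line of body copy, a footer line.
pages = [
    "Acme Widgets | Home | Products | About | Contact\n"
    "Blue widget, 3-inch, anodized, fits most standard sprockets.\n"
    "Free shipping over $50 | Privacy | Terms",
    "Acme Widgets | Home | Products | About | Contact\n"
    "Red widget, 5-inch.\n"
    "Free shipping over $50 | Privacy | Terms",
]
shared = boilerplate_lines(pages)
for p in pages:
    print(f"{unique_ratio(p, shared):.0%} unique")  # low numbers flag pages that are mostly structure
```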

6. High Ad Ratio

You’ve all seen this site, jam-packed with banner ads of all sizes and AdSense up and down both sides (and probably at the top and bottom):

[Image: too many ads]

Not coincidentally, these pages also tend to have very little unique content in play, but Google can take an especially dim view of loading up on ads with nothing to back them up.

So, how much is too much? Last year, an affiliate marketer posted a very interesting conversation with an AdWords rep. Although this doesn’t technically reveal anything about the organic algorithm, it does tell us something about Google’s capabilities and standards. The rep claims that Google views a quality page as having at least 30% unique content, and it can only have as much space devoted to ads as it does to unique content. More importantly, it strongly suggests that Google can algorithmically measure both content ratio (#5) and ad ratio.
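Taking the rep’s comments at face value (and to be clear, these are unconfirmed rules of thumb, not published ranking factors), the check itself is trivial to express:

```python
def passes_claimed_thresholds(unique_content_pct: float, ad_pct: float) -> bool:
    """Apply the two rules of thumb attributed to the AdWords rep: at least 30%
    of the page is unique content, and ads take up no more space than that
    unique content does. Illustrative only -- not published ranking rules."""
    return unique_content_pct >= 0.30 and ad_pct <= unique_content_pct

# Hypothetical page breakdowns, as fractions of rendered page area:
print(passes_claimed_thresholds(unique_content_pct=0.40, ad_pct=0.25))  # True
print(passes_claimed_thresholds(unique_content_pct=0.20, ad_pct=0.45))  # False: too thin, too many ads
```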

The Solution

You’ve got to scale back, or you’ve got to build up your content. Testing is very important here. Odds are good that, if your site is jammed with ads, some of those ads aren’t getting much attention. Collect the data, find out which ones, and cut them out. You might very well find that you not only improve your SEO, but you also improve the CTR on your remaining ads.

7. Search within Search

Most large (and even medium-sized) sites, especially e-commerce sites, have pages and pages of internal search results, many reachable by links (categories, alphabetical, tags, etc.):

[Image: search within search]

Google has often taken a dim view of internal search results (sometimes called “search within search”, although that term has also been applied to Google’s direct internal search boxes). Essentially, they don’t want people to jump from their search results to yours – they want search users to reach specific, actionable information.

While Google certainly has their own self-interest in mind in some of these cases, it’s true that internal search can create tons of near duplicates, once you tie in filters, sorts, and pagination. It’s also arguable that these pages create a poor search experience for Google users.

The Solution

This can be a tricky situation. On the one hand, if you have clear conceptual duplicates, like search sorts, you should consider blocking or NOINDEXing them. Having the ascending and descending version of a search page in the Google index is almost always low value. Likewise, filters and tags can often create low-value paths to near duplicates.

Search pagination is a difficult issue and beyond the scope of this post, although I’m often in favor of NOINDEXing pages 2+ of search results. They tend to convert poorly and often look like duplicates.
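As a purely hypothetical sketch (the parameter names are my own; substitute whatever your platform actually appends), the logic for internal search URLs might boil down to something like this:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical parameter names; substitute whatever your platform appends.
SORT_PARAMS = {"sort", "order", "dir"}
PAGE_PARAM = "page"

def robots_meta(url: str) -> str:
    """Pick a robots meta value for an internal search/listing URL: keep page 1
    of the unsorted view indexable, NOINDEX the rest, but still let crawlers
    follow the links through to the actual product or article pages."""
    params = parse_qs(urlparse(url).query)
    is_sorted = any(p in params for p in SORT_PARAMS)
    past_page_one = int(params.get(PAGE_PARAM, ["1"])[0]) > 1
    if is_sorted or past_page_one:
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'

print(robots_meta("https://www.example.com/search?q=t-shirts"))             # index, follow
print(robots_meta("https://www.example.com/search?q=t-shirts&sort=price"))  # noindex, follow
print(robots_meta("https://www.example.com/search?q=t-shirts&page=3"))      # noindex, follow
```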

A Few Words of Caution

Any change that would massively reduce your search index is something that has to be considered and implemented carefully. While I believe that thin content is an SEO disadvantage and that Google will continue to frown on it, I should also note that not all of these scenarios are necessarily reflected in the Panda update. These issues do reflect longer-standing Google biases and may exacerbate Panda-related problems.

Unfortunately, we’ve seen very few success stories of Panda recovery at this stage, but I strongly believe that addressing thin content, increasing uniqueness, and removing your lowest value pages from the index can have a very positive impact on SEO. I’d also bet good money that, while the Panda algorithm changes may be adjusted and fine-tuned, Google’s attitude toward thin content is here to stay. Better to address content problems now than find yourself caught up in the next major update.

Sad panda image licensed from iStockPhoto (©2010).

