# Thin content: how to identify and fix it using Google Analytics

In my experience, severe Panda-related hits tend to boil down to a root cause of either duplicate content, thin content, or extremely poor user experience.

As I’ve already covered many of the other areas involved in recovering from Panda this month, I wanted to focus on thin content – what it is, how to spot it, and most importantly, how to fix it using Google Analytics.

Here is an illustration of what thin content looks like. When I search for a two bedroom house in East Sussex, and scroll down to the one hundredth result, I begin to see results like this one where I’m directed to a page that offers absolutely no value whatsoever.

So how can you find out whether your site is being affected by thin content?

Well, as of a few days ago you can now find out in the Webmaster Tools ‘manual actions’ tab. However, I have to admit I am skeptical of this – I’ve looked at several sites in WMT that I know have thin content issues, and yet no notifications have come up.

I recently had the challenge of fixing thin content issues on a 1.5m page site with approximately 75,000 pages of what I would describe as low quality thin content.

While I’m sure there are numerous clever ways of coming to the same conclusion, I’m going to share my approach which I hope will at least provide a starting point to help you identify and rectify your thin content.

## 1. Define thin content quantitatively

The hardest part about identifying thin content is getting past the subjective nature of what is or isn’t considered ‘thin’.

In my analysis I decided to create a weighted formula that shortlisted a page for being considered thin if it had all of the following characteristics:

• A bounce rate between 95 and 99.99% (here’s why I excluded pages with a 100% bounce rate).
• An average time on page between 0.1 and five seconds.

You can use the following spreadsheet formula to work this out, where B2 holds the page’s bounce rate and C2 holds its average time on page in seconds:

=IF(AND(B2>=95%, B2<100%, C2>=0.1, C2<=5), "Thin", "Not Thin")
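If you would rather run the check outside a spreadsheet, the two criteria listed above can be sketched in Python. This is only a rough illustration: the column names (`page`, `bounce_rate`, `avg_time_on_page`) and the sample rows are made up, and a real Google Analytics export will need its headers and percentage formatting adapted first.

```python
import csv
import io


def is_thin(bounce_rate, avg_time_on_page):
    """Apply the two shortlist criteria.

    bounce_rate is a percentage (0-100); avg_time_on_page is in seconds.
    Pages with a 100% bounce rate are deliberately excluded.
    """
    return 95 <= bounce_rate < 100 and 0.1 <= avg_time_on_page <= 5


def shortlist(csv_text):
    """Return the pages whose rows meet both criteria."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["page"]
        for row in reader
        if is_thin(float(row["bounce_rate"]), float(row["avg_time_on_page"]))
    ]


# Made-up sample data, for illustration only.
SAMPLE = """page,bounce_rate,avg_time_on_page
/jobs/audio-production,97.5,3.2
/jobs/live-sound,100,0
/blog/how-to-get-hired,62.0,95.4
"""

print(shortlist(SAMPLE))  # ['/jobs/audio-production']
```

The thresholds are hard-coded to match the definition above; on a different site you would probably want to tune them.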

Once you’ve got this shortlist of pages that are performing poorly, you can begin looking for trends. Which types of pages, or sections of your site are causing trouble?

Try to find common patterns in the URL structure, and get an understanding from a user’s perspective why these pages might be causing people to bounce straight away.
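One quick way to look for URL patterns is to count shortlisted pages by their top-level section. The sketch below assumes that the first path segment is a meaningful grouping on your site, which may not be true for every URL structure; the example URLs are invented.

```python
from collections import Counter
from urllib.parse import urlparse


def section_of(url):
    """Return the first path segment of a URL, e.g. '/jobs' for '/jobs/a'."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return "/" + parts[0] if parts else "/"


def count_by_section(urls):
    """Count how many shortlisted URLs fall under each top-level section."""
    return Counter(section_of(u) for u in urls)


# Invented shortlist of thin pages, for illustration only.
thin_urls = [
    "/jobs/audio-production",
    "/jobs/live-sound",
    "http://example.com/jobs/mastering",
    "/tags/vinyl",
]

for section, count in count_by_section(thin_urls).most_common():
    print(section, count)
```

A section that dominates this list is a good candidate for a closer manual look.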

## 2. Rectifying thin content

There is no right or wrong way to rectify thin content, so let me go through various options with an example.

Below are the metrics for a page on MusicJobBoard.com, a site that I use for testing purposes from time to time. As you can see, this page combines a high bounce rate with a low time on page, which qualifies it to be shortlisted as ‘thin content’ by my definition above.

Here is the page, looking rather thin.

Job boards like this one are typically susceptible to thin content: if no one posts a job in a certain category, the category page can remain indexed despite providing a poor result for someone looking for what the page would usually offer.

In this case, we could apply a rule that would noindex the page if it reached 0 results. I personally don’t like doing this, but on larger sites I’ve seen it work as a pretty effective strategy for keeping low quality results out of the SERPs.
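As a rough sketch of such a rule, the function below picks the robots meta tag for a category page from its result count. The function name is hypothetical, and using `follow` alongside `noindex` is my assumption rather than something the approach requires; how the tag gets into your page template depends on your platform.

```python
def robots_meta(result_count):
    """Return the robots meta tag for a category page.

    Pages with no results are kept out of the index; everything else
    is left indexable.
    """
    if result_count == 0:
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'


print(robots_meta(0))
print(robots_meta(12))
```

Because the tag is derived from the live result count, the page becomes indexable again as soon as a job is posted in that category.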

Alternatively, we could design the page in a way that provided value even if no jobs were present – e.g. providing links to see similar jobs in audio production, or even offering some cool information on average salaries for this type of job, as Indeed does.

Another option, in this instance, would be to try to find an ongoing job listing for every category page, i.e. a recording studio that is willing to receive CVs on an ongoing basis.

One interesting option, which I’ve seen used by several property aggregator services, is to redirect people to the next best result, i.e. rather than showing me an empty page with 0 property listings in street X, send me to the page on street Y 200 meters away.
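I don’t know how those aggregators implement this, but the idea can be sketched as picking the nearest page that actually has listings. Everything here is hypothetical: the page records, the `xy` coordinates (assumed to be in meters on a flat grid), and the function name.

```python
import math


def next_best_page(current, pages):
    """Return the URL of the nearest other page with at least one listing.

    Returns None when no other page has any listings.
    """
    candidates = [
        p for p in pages
        if p["listings"] > 0 and p["url"] != current["url"]
    ]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda p: math.dist(current["xy"], p["xy"]))
    return nearest["url"]


# Invented street pages, for illustration only.
pages = [
    {"url": "/street-x", "xy": (0, 0), "listings": 0},
    {"url": "/street-z", "xy": (50, 0), "listings": 0},
    {"url": "/street-y", "xy": (200, 0), "listings": 3},
    {"url": "/street-w", "xy": (900, 0), "listings": 5},
]

print(next_best_page(pages[0], pages))  # /street-y
```

In practice you would probably also cap the distance, so that visitors are not sent somewhere irrelevant when nothing nearby has listings.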

A more agreed-upon approach is to merge your pages. Rather than having a page on every single street in the UK, with many being empty, you could merge your street pages to post code pages, or town pages.

This goes against the idea of targeting the long tail with dedicated pages, but without quality in place I think it’s fair to say that that ship has sailed anyway.

One final option is simply to remove your poorly performing pages. If there really is no point merging the pages or trying to improve them, it may be worth considering hacking off the low quality content and investing your efforts in improving your best content instead.

## Final thoughts

Thin content is a tricky issue to define and tackle, which is perhaps why it’s not covered in quite as much detail as more objective site quality issues. I’d love to hear how others are tackling it, and whether there are any other creative solutions that I’ve missed above.

Feel free to leave a comment below, send me a tweet, or drop me an email on marcus (at) ventureharbour.com.

Marcus Taylor is Director at Venture Harbour and a guest blogger on Econsultancy. You can follow Marcus on Twitter or Google Plus.

1. Carmen Mardiros

Digital Analytics Consultant at Clear Clues

11:01AM on 9th September 2013

For those thinking of carrying this out, I'd recommend first creating a segment as follows:

Source / medium contains google / organic
Keyword does not contain Your Brand Name
Visitor Type is New Visitor

This ensures you're removing biases from your data which can paint a misleading picture as to what counts as "thin" content.

Example: Say some of your landing pages attract a high volume of branded traffic and/or returning visitors. These visitors already have a high propensity for engagement. If you include this segment in your analysis then your pages will appear to perform better than they are.

The goal is to simply determine whether your pages pass the "first impression test" for organic, brand unaware visitors. Other visitor types muddle the picture. By excluding these you are simply raising the bar for what counts as quality.

I have been working on a Google Apps Script that scans a list of pages and returns:

- word count in content div area
- any ordered/unordered lists (this is a biggie. Pages with lots of links but "thin" content tend to have a lower bounce rate because they naturally encourage clickthrough. They still count as "thin" content but you won't see the Bounce Rate symptom in GA)
- number of total links within content area

I then cross reference this with GA data for more context. I find it paints a more complete picture.

Thanks for sharing this Marcus.

2. Nick Stamoulis of Brick Marketing

2:21PM on 9th September 2013

I think your formula is a great way of quantifying something that is hard to measure. Thin content tends to be some of the deeper pages that got put up and forgotten about; the pages you never really rely on. Just keep in mind that you don't want to look at data in a silo. If one of your key pages has a higher bounce rate it might not be because the content is thin. Your reference post is great.

3. Ant Robinson

7:29PM on 9th September 2013

Great article; however, the suggestion in the opening paragraphs that WMT could identify thin content is misguided. Panda is algorithmic in its approach to penalising thin content and therefore, by definition, won't be picked up by a 'Manual Action' report!

4. Marcus Taylor

Director at Venture Harbour

8:00PM on 9th September 2013

Great tip, Carmen. Very handy.

Ant - the change is only a few days old, I personally haven't seen the notifications in WMT, but apparently they are coming up for some people. See here: http://searchenginewatch.com/article/2292495/Thin-Content-With-Little-or-No-Added-Value-Manual-Action-Google-on-How-to-Fix-It

8:11PM on 9th September 2013

Great article, Marcus. It amazes me how often knowledge of Excel can help marketers make better use of exported data.

I have a big question about SEO and jobs boards, as I've been tinkering in this industry myself.

You mention one of the issues with Panda is duplicate content.

However, when it comes to running jobs boards, I find that recruiters will often paste the exact same ad copy when posting jobs on websites. For example, if you compare job post descriptions on Guardian Jobs, Monster.com and Reed and run them through Copyscape then you'll find most of them have 90% duplicate content.

This then means that even if you're lucky enough to grow a successful job board with 100s or 1,000s of job postings, you will have a HUGE amount of duplicate content on your site.

What's the answer to this? No-index? Hope you have enough authority to beat the Panda issue (like the other sites do)? Or force recruiters to write original job descriptions (which would likely turn many of them off, especially for a smaller site)?

6. Visakan Veerasamy

Marketing at ReferralCandy

4:49AM on 10th September 2013

Wow, I love that people are concerned enough about this to approach it with such a systematic approach. I can't imagine having to work with 1.5 million pages. What a challenge! Humbling to think about. Thanks for sharing.

7. mahender nath

Internet marketing at www.emperiumepos.com

3:33PM on 11th September 2013

Great tips! Bookmarked.

8. Josh Trenser

5:47AM on 12th September 2013

Hopefully these will help me fix my Panda-penalized sites. Thanks for sharing!!!