How to Decide – Robots.txt Or Meta Robots?

Robots.txt & meta robots: two similar ways to achieve a similar task – but in practice they're significantly different!

Robots.txt was my first introduction to blocking pages from Google: it took a relatively small change to implement and it was very simple to explain when working with third-party developers. It's all too easy to use badly, however, and robots.txt became known as the "lazy option". The obvious knee-jerk reaction? Meta robots (noindex, follow) everything – something made A LOT easier with plugins like Yoast (for those WordPress users out there).
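
For reference, the two approaches look something like this (the /filters/ path is purely illustrative):

    # robots.txt - compliant crawlers won't fetch these URLs at all
    User-agent: *
    Disallow: /filters/

    <!-- meta robots - the page can still be crawled, but it won't be indexed,
         and the links on it are still followed -->
    <meta name="robots" content="noindex, follow">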

The result? Neither method gets selected for its effectiveness; it gets picked because it's the easiest option. Nightmare!

The problem? Many people I've worked with never truly understood the problem they needed to solve, and therefore couldn't select the correct solution.

But it can be done really, really easily.

Crawl or Index Issue?

When deciding how to block/limit Google (or other search engines) from your site, this is really the most crucial question for you to ask. Do you have an issue with crawl budget? Or is too much of your site being indexed?

Put even more simply, is Google seeing too little or too much of your site?

I've got to be careful not to oversimplify too much – in fact, identifying & fixing these problems is a pretty tricky process.

The above is made worse by the fact that it usually affects very large sites. To rectify the situation you have to make changes which impact a lot of pages, meaning they take longer to take effect. That's even worse news if you do something wrong – putting it right again will take longer still!

You have been warned.

Is it a Crawl Budget Issue?

Much has been written about this over the last 12-18 months – Dawn Anderson or Barry Adams are good people to start with if you want a more in-depth introduction. In a nutshell, you have a big crawl budget issue when your site is x7-x10 larger than what gets crawled daily. Easy enough to explain.

Common issues:

  • Google has told us what crawl budget is and acknowledged it is indeed an important thing to be aware of – but it's typically sparse on details
  • The majority of us don't understand this as well as we should
  • Establishing your own crawl budget is very hard
  • Selling the concept of crawl budget to a client is simple… until they start demanding specifics
  • Once you identify it, it's tempting to assume Google crawls the site in a linear fashion from top to bottom – it doesn't.
  • Elements which create crawl budget issues are sometimes needed – AMP links and hreflang can both eat major crawl budget, but blocking them off defeats the point of using them.
  • And even when you do explain it, clients can think you're making it up – as Dawn Anderson put it:

    "That said, they think you're making it up as a tall story when you start talking about robots and crawlers. You're just the daft geek then"

    — Dawn Anderson (@dawnieando) August 28, 2017

The good news? If you have under 1,000 pages you probably don't have a problem – Google should be able to handle it.

Establishing Crawl Budget Issues

Spotting where you have a crawl budget issue is relatively straightforward – start a crawl using your favourite software and look at where it gets stuck. If you've ever used Screaming Frog or similar, you'll see this most commonly with layered navigation, filters/sorting options and with some event/calendar functionality.

If your crawl gets stuck like this, you've potentially got some crawl issues.
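
To illustrate why this happens, here's a rough Python sketch with made-up numbers – every combination of filter values on a layered-navigation page becomes another crawlable URL:

    # Purely illustrative facet counts for a hypothetical category page
    facets = {
        "colour": 12,
        "size": 8,
        "brand": 25,
        "sort": 4,
    }

    urls_per_category = 1
    for options in facets.values():
        urls_per_category *= options + 1   # +1 for "this facet not selected"

    # 13 * 9 * 26 * 5 = 15,210 crawlable URL variants for a single category
    print(urls_per_category)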

Logfiles are one of the most sure-fire ways to determine whether you have an issue or not – you just need a Log File Analyser, Deep Crawl/Botify or some grep skills. The full process is beyond the scope of this post – read here if you're interested – but, as above, you're looking for where Google spends its time when it's not needed.
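
As a very rough sketch of the idea (not the full process), assuming a standard combined-format access log – the file name, regex and report size below are all illustrative:

    import re
    from collections import Counter

    request = re.compile(r'"(?:GET|POST) (?P<path>\S+) HTTP')
    hits = Counter()

    with open("access.log") as log:
        for line in log:
            if "Googlebot" not in line:        # crude user-agent filter
                continue
            match = request.search(line)
            if match:
                # Strip the query string so parameter variants group together
                hits[match.group("path").split("?")[0]] += 1

    # If the most-requested paths are filter/sort/calendar URLs rather than
    # pages you actually care about, crawl budget is being wasted there
    for path, count in hits.most_common(20):
        print(count, path)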

    For the "quick & dirty" way to understand if you need to be worrying about crawl budget Joost has you covered:

  • Look at the average pages crawled per day in GSC Crawl Stats
  • Take your site's total page count & divide it by the average crawled per day
  • If you get 10, you have x10 as many pages as Google crawls in a day – a pretty big crawl problem
  • So the above example:

    9,781 / 1,466 ≈ 6.7 – nearly x7 the daily crawl.

As a rough guide – if your site is over 1,000 crawled pages and you're scoring over x5, you're going to want to investigate further.
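
In code form, the same "quick & dirty" check looks like this (using the example numbers above):

    def crawl_ratio(total_pages, avg_crawled_per_day):
        """How many times bigger the site is than Google's daily crawl."""
        return total_pages / avg_crawled_per_day

    ratio = crawl_ratio(9_781, 1_466)
    print(round(ratio, 1))                 # 6.7 - nearly x7 the daily crawl

    if 9_781 > 1_000 and ratio > 5:
        print("Worth investigating crawl budget further")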

Do You Have Index Issues?

This one is altogether easier to diagnose, using search operators (site:, inurl:, intitle: etc.) to see where key, top-level duplication may be taking place.
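
For example (example.com and the parameter/title values are placeholders):

    site:example.com                         - roughly how many pages are indexed
    site:example.com inurl:sort=             - indexed filter/sort URL variants
    site:example.com intitle:"black dresses" - pages sharing the same title template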

Most decent SEO audit tools can also get you close to this – when I'm feeling lazy I usually turn to Siteliner. The real key question here is: is your site bigger than it should be? If the answer is yes and Google's index is more bloated than it should be, you've got an index problem!

Crawl/Index Problems Aren't Mutually Exclusive

Having made this relatively easy so far, here's the curve-ball – you can have crawl & index issues at the same time.
