Getting the technical aspects of SEO right can be rewarding and have a significant impact on your bottom line. The bigger the site, the more important technical SEO becomes. I’m going to show you how deindexing over 80% of our indexed pages led to great results.
Just to be clear, this is not an ultimate guide to technical SEO; we would need several days to go through that.
It’s nice to improve crawling speed and similar metrics that will make you a rockstar at conferences and on SEO podcasts, but revenue is what matters most to every ecommerce website.
In 2017, our primary focus was getting crawling and indexing under control, because we didn’t want to rely on Google’s judgment (which is often right but can also be wrong). The goal was to increase revenue from organic traffic to category pages.
We started with technical changes in early spring and saw the first results in the summer: category pages (PLPs) generated 23.5% more revenue from organic traffic than during the same period the previous year.
As the year progressed, revenue kept growing, culminating in a 54.9% increase in revenue from organic traffic to category pages during the holidays. The positive trend continued, and we are up over 70% year over year in the first weeks of 2018.
Yes, we also saw an increase in organic traffic, but traffic is not your ultimate goal, so I’m not going to talk about it at all.
Content is not everything. Manage crawling and indexing.
At the beginning of the year, we had over 500,000 URLs indexed by Google but we decided to deindex over 400,000 URLs (mainly category pages) during the year and finished the year with only about 100,000 URLs indexed.
We deindexed over 80% of the URLs.
Why did we do that? Because search engines indexed tons of useless and duplicate category URLs.
We wanted to help search engines understand the structure of the site.
Before you decide to deindex URLs, look into Google Analytics to see if these URLs drive organic traffic.
How did we know it was the right thing to do? We simply looked at the percentage of indexed URLs that generated organic traffic, and the number was depressing: only 8.55% of indexed URLs had generated at least one session in a month. That’s a painfully low number.
After several months of hard work, the percentage grew to 49.7%. There’s still work to be done and our goal is somewhere around 60%, but we’re getting there.
Every website’s sweet spot is different. This website is strongly affected by seasonality (part of the assortment targets summer, another part targets winter), so it would be foolish to expect that number to reach 90%. However, 80-90% is a realistic target for certain types of non-seasonal businesses.
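You can estimate this metric for your own site. Here is a minimal sketch, assuming you have exported a list of indexed URLs (e.g., from a crawl or Search Console) and the list of URLs that recorded at least one organic session in Google Analytics; the function name and sample data are hypothetical:

```python
def active_index_ratio(indexed_urls, urls_with_sessions):
    """Return the percentage of indexed URLs that drove at least
    one organic session in the chosen period."""
    indexed = set(indexed_urls)
    active = indexed & set(urls_with_sessions)
    return 100 * len(active) / len(indexed) if indexed else 0.0

# Toy data: four indexed URLs, two of which earned organic sessions.
indexed = ["/shoes", "/shoes?color=red", "/shirts", "/shirts?sort=price"]
with_sessions = ["/shoes", "/shirts"]
print(f"{active_index_ratio(indexed, with_sessions):.1f}%")  # 50.0%
```

Run this monthly and track the trend; the absolute target depends on your assortment and seasonality, as noted above.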
Wild Query Strings
Like every ecommerce website, the site used URL parameters extensively. After digging into Google Analytics data, we discovered that some category pages had as many as 58 unique URLs. Most of them were crawlable, indexable, and lacked a clear canonical strategy.
If that’s the case, the first thing you want to do is collect all parameters used on the site. Collect as much data as you can (Google Search Console, Google Analytics, log files, Screaming Frog, etc.) and extract all parameters. Then spend time with your developers and write down the functionality of every single one. We found almost 150 URL parameters.
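Once you have the URL dumps, extracting and counting the parameters is straightforward. A minimal sketch, assuming the URLs from all sources are collected into one plain list:

```python
from urllib.parse import urlsplit, parse_qsl
from collections import Counter

def collect_parameters(urls):
    """Count every query-string parameter seen across a list of URLs."""
    counts = Counter()
    for url in urls:
        for key, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True):
            counts[key] += 1
    return counts

# Toy sample; in practice, feed in URLs from GSC, GA, logs, crawls, etc.
urls = [
    "https://example.com/shoes?color=red&sort=price",
    "https://example.com/shoes?color=blue",
    "https://example.com/shirts?utm_source=newsletter",
]
print(collect_parameters(urls))
```

Sorting the resulting counter by frequency gives you a prioritized worksheet to go through with your developers.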
You will find that some of them are not needed, some of them are a result of legacy systems, etc.
For each parameter, ask one simple question: does this parameter change the content seen by the user?
If the answer is no, there’s a high chance that you don’t need that parameter because there are better ways to track things. Usually, you don’t want these URLs to be crawled and indexed by search engines.
If the answer is yes and the change is meaningful, these URLs should be crawlable and indexable. If the change is insignificant, such as merely reordering products on a category page, you don’t want search engines to discover those parameters.
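To make the classification concrete, here is a hedged sketch of how the answers might be recorded; the parameter names and the helper function are hypothetical, not our actual inventory:

```python
# Hypothetical inventory: for each parameter, record whether it changes
# the content seen by the user, and whether that change is meaningful.
PARAMETERS = {
    "color":      {"changes_content": True,  "meaningful": True},   # real refinement
    "sort":       {"changes_content": True,  "meaningful": False},  # reordering only
    "utm_source": {"changes_content": False, "meaningful": False},  # tracking
}

def should_be_indexable(name):
    """A parameter earns crawlable, indexable URLs only if it changes
    content in a meaningful way."""
    info = PARAMETERS[name]
    return info["changes_content"] and info["meaningful"]

print([p for p in PARAMETERS if should_be_indexable(p)])  # ['color']
```

Keeping this inventory in version control makes it easy to revisit decisions as the site and its parameters evolve.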
Once the mapping and classification were done, we configured all URL parameters in Google Search Console (Crawl >> URL Parameters) and Bing Webmaster Tools (Configure My Site >> Ignore URL Parameters).
These tools are powerful and let you give clear instructions to both search engines in no time. But treat them as a short-term solution; I still recommend fixing these issues directly in the code of the site.
There’s no one-size-fits-all solution. It may make sense to prevent crawling of a new parameter that hasn’t been discovered by search engines yet. If thousands of URLs with that parameter are already indexed, you should be thinking about a “noindex” or canonical tag instead of crawling restrictions.
Faceted navigation is another common troublemaker on the ecommerce websites we deal with.
Every combination of facets and filters creates a unique URL. This is a good thing and a bad thing at the same time, because it creates tons of great landing pages but also tons of super specific landing pages no one cares about.
You can easily get thousands of URLs if you apply facets and filters to a category page.
- Seven brands (Adidas, Nike, Puma …)
- Four genders (Men, Women, Boys, Girls)
- Five average ratings (1 star, 2 stars, 3 stars …)
- Ten colors (White, Black, Red …)
This simple example doesn’t even offer refinement by specific product features, yet it already creates 7 × 4 × 5 × 10 = 1,400 URLs.
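The arithmetic is just the product of the facet sizes, and it gets worse once you account for facets that can also be left unselected. A quick sketch (the facet sizes are taken from the example above):

```python
from math import prod

# Number of values per facet, from the example above.
facets = {
    "brand": 7,    # Adidas, Nike, Puma, ...
    "gender": 4,   # Men, Women, Boys, Girls
    "rating": 5,   # 1 star ... 5 stars
    "color": 10,   # White, Black, Red, ...
}

# One value selected from every facet:
print(prod(facets.values()))  # 1400

# If every facet may also be left unselected, the count of filtered
# URLs (excluding the bare category page itself) grows further:
print(prod(n + 1 for n in facets.values()) - 1)  # 2639
```

And that still ignores multi-selection within a facet (e.g., Adidas and Nike at once), which inflates the combinations even more.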
We used the following tree to decide if a specific refinement should be a facet or a filter.
Knowing the negative impacts of having each of these URLs indexed, we made sure that facets and filters were treated differently.
Facets:

- Are discoverable, crawlable, and indexable by search engines;
- Contain self-referencing canonical tags;
- Are not discoverable if multiple items from the same facet are selected (e.g. Adidas and Nike t-shirts).
Filters:

- Are not discoverable;
- Contain a “noindex” tag;
- Use URL parameters that are configured in Google Search Console and Bing Webmaster Tools.
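As an illustration (this is not the site’s actual implementation), the rules above can be expressed as a small decision function; the function name and return values are hypothetical:

```python
def robots_directives(is_facet, multiple_from_same_facet=False):
    """Hypothetical sketch: pick indexing directives for a
    faceted-navigation URL based on the rules above."""
    if is_facet and not multiple_from_same_facet:
        # Single-facet selections are indexable landing pages.
        return {"meta_robots": "index,follow", "canonical": "self"}
    # Filters, and multi-selections within one facet, stay out of the index.
    return {"meta_robots": "noindex,follow", "canonical": None}

print(robots_directives(is_facet=True))
print(robots_directives(is_facet=True, multiple_from_same_facet=True))
print(robots_directives(is_facet=False))
```

Whatever the exact logic, the point is to make it deterministic in the templates so every facet/filter URL gets consistent directives.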
One may argue that a canonical tag referring to a category is a better solution, but it didn’t work for us because we had other issues with canonical tags and Google tended to ignore them.
Because thousands of URLs with filters were already indexed, we couldn’t prevent crawling of those URLs (search engines would never have discovered the noindex tags). If search engines have yet to discover such URLs, you can block crawling via robots.txt instead.
Every website is different, and there’s no solution that fits all.
Tons of websites are trying to apply advanced technical SEO when they should be getting the basics right first.
Getting rid of duplicate pages and consolidating signals to one canonical URL is not rocket science and doesn’t sound as sexy as structured data, RankBrain or voice search, but it’s still a great way to improve rankings, traffic, and ultimately revenue.