If you’ve worked with any type of site that deals with a large number of listings, you’ve likely come across faceted navigation.
Faceted navigation is a widely accepted UX pattern that helps users discover what they’re looking for in less time. The downside is that it comes with many potential SEO complications that you’ll need to mitigate.
In this guide, you’ll learn:
Faceted navigation (or faceted search) is a type of navigation found on the category/archive pages of sites that deal with many listings. Its purpose is to help users find what they’re looking for more easily using multiple filters based on listing attributes.
Many people refer to faceted navigation as simply ‘filters.’
You will most commonly find this type of navigation on the category pages of:
But it’s common on many other large websites, too.
Faceted navigation works by filtering listings on category pages by their attributes. As mentioned, listings will often be:
Attributes vary by site, but common examples include:
Once site admins have given relevant attributes to listings, the site then displays these attributes to the user in a list:
What happens when a user selects a filter varies, but generally one of four things will happen.
The first two options have a similar UX but a different UX pattern to option three.
Which UX pattern you use depends on whether a user is likely to use more than one filter. If users tend to apply multiple filters, it makes sense to only apply the filters and update the listings once they opt to apply them.
Once filters are applied, the URL can also optionally update to reflect the selection. What happens with the URL at this stage can also vary:
The types of issues you’ll be looking to prevent or fix with a faceted navigation implementation include:
Unfortunately, faceted navigation potentially creates a near-infinite number of facet combinations and indexable URLs. If you have issues with one of these, the SEO impact tends to be high.
Below are some examples of how these issues occur and their impact on your site’s SEO.
Duplicate content is when the same or similar content is accessible at multiple URLs. Filters are notorious for creating URLs with duplicate content en masse. Duplication is mainly due to filter pages being close copies of the original page, just with different listings.
While duplicate content isn’t necessarily a negative ranking signal, it can cause issues with:
See the ecommerce site currys.co.uk as an example. We start on their HP PC Monitors page. It’s a reasonably standard ecommerce layout, with a header, listings, and faceted search above the fold:
And then below the product listings, some content about HP monitors:
Now apply a filter for ‘4k monitors’.
You’ll see the product listings update, the H1 change, and the URL go from:
But if you scroll back to the bottom of the page, the same block of content exists below the listings.
This is just one example of duplication on the site. Scale this across every filter available, and you’ll quickly have millions of duplicate pages for Google to try and consolidate into one canonical page.
Index bloat is when search engines index pages on your site that don’t have search value.
Only allowing Google to index quality pages is critical as having low-quality pages indexed can impact Google’s overall view of your site, as explained by John Mueller in this video:
Faceted navigation can potentially create millions of indexable URLs with no unique content to them. It can also create page variants that provide no value to users using search engines.
Here’s an example:
AO.com has a category page dedicated to freestanding washing machines:
A user may visit this page and decide they want to filter for:
Thanks to the filters, the site has returned precisely the washing machine that suits that user’s needs.
But would a user ever search in Google for something that precise?
The answer is resoundingly a no.
We know this because there are only an estimated 90 searches per month for ‘freestanding washing machines’ in the UK, so it’s extremely unlikely that there’ll be many (if any) searches for something even more specific like ‘large silver samsung freestanding washing machine with quick wash feature and energy rating A.’
Having pages like this indexed that do not cater to search demand and are low quality can put your site at risk of being impacted negatively by an algorithm.
Google can only dedicate a finite amount of resources to crawling the pages on your site. This is known as your crawl budget.
Managing crawl budget isn’t something that Google deems a priority unless you have a large site (1M+ unique pages) or medium site (10K+ unique pages) with very rapidly changing content.
Given that advice, if you only have a few thousand categories and products, you might think you don’t need to worry about crawl budget management.
That could be very wrong.
Some faceted navigation implementations will create a crawlable link for each facet combination available.
Ignoring the potential index bloat issues, this also means you’re potentially generating millions of URLs for Google to crawl, so you’ll quickly make crawl budget management something to consider.
You can find an example of this on the next.co.uk site:
When you inspect the HTML of a facet, you’ll see a link in the HTML:
Once you’ve followed that link, you can then check the HTML of another facet like the blue one:
You can see how the facets combine to create an entirely new URL to be crawled.
Now consider all the potential combinations of different filters. You can quickly see how a site with facet issues becomes a problem for a search engine to crawl.
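To get a feel for the scale, here is a rough back-of-the-envelope sketch in Python. The facet names and value counts are made up for illustration, and it assumes each facet can either be left unset or set to exactly one value. Sites that allow multi-select within a facet will produce far more URLs.

```python
from math import prod

# Hypothetical facets and the number of values each one offers.
facet_values = {"brand": 20, "colour": 12, "size": 8, "price_band": 6, "rating": 5}


def count_filter_combinations(facets: dict[str, int]) -> int:
    """Count filtered URLs when each facet is unset or set to one value."""
    # Each facet contributes (values + 1) choices: one per value, plus "not applied".
    # Subtract 1 to exclude the unfiltered category page itself.
    return prod(n + 1 for n in facets.values()) - 1


print(count_filter_combinations(facet_values))
```

Under those assumptions, five modest facets on a single category page already produce over 100,000 crawlable filter combinations.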
Faceted navigation can also dilute the PageRank passed around your site.
This is because PageRank is divided by the total number of links on the page. This presents an inherent issue with faceted navigation, as many implementations add a large number of internal links to every category page.
So rather than PageRank passing to important product or category pages, it’ll pass to the links found within your filters, which in most cases won’t help improve search traffic.
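As a simplified illustration, the sketch below uses the classic textbook PageRank model, where a page splits its (damped) PageRank evenly across its outgoing links. The link counts and the 0.85 damping factor are assumptions for the example, and Google’s real systems are more complex than this.

```python
def pagerank_share_per_link(page_rank: float, outlinks: int, damping: float = 0.85) -> float:
    """PageRank passed through each link in the simplified textbook model."""
    return damping * page_rank / outlinks


# Hypothetical category page: 50 links to products and categories.
before = pagerank_share_per_link(1.0, outlinks=50)

# The same page after faceted navigation adds 200 filter links.
after = pagerank_share_per_link(1.0, outlinks=50 + 200)

print(f"per link before: {before:.4f}, after: {after:.4f}")
```

In this model, each product link receives a fifth of what it did before the filter links were added.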
Recommended reading: Google PageRank is NOT Dead: Why It Still Matters
There are always obvious telltale signs of faceted navigation issues; here are some steps to discover if your filters could be impacting your SEO.
A great tactic to check for signs of index bloat quickly is to use the site: search operator. While not the most accurate way, it’s quick and easy to do.
It works by simply prepending ‘site:’ to your domain name, like below.
Take note of the number of results Google returns. Does that seem higher than the number of URLs you know to be available on your site?
If it is, that’s the first sign you have issues with index bloat.
GSC’s coverage report is another great way to uncover crawling and indexing issues quickly.
Just head to the ‘Coverage’ report within GSC and select ‘Valid’ on the chart for a more accurate figure on the number of pages Google has indexed:
If this seems high, or you’ve recently implemented faceted search, and it’s shot up, this points towards the index bloat issues mentioned previously.
But how do we know if filters caused it?
Accurate XML sitemaps help diagnose issues here. If you’ve uploaded those to GSC, the table below the chart will split the indexed URLs down into:
That means we can look at ‘Indexed, not submitted in sitemap’ pages to see unwanted pages Google is indexing:
This example is for a betting site that lets you filter for locations and tournaments. We can see here Google is indexing unwanted URLs.
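If you export the indexed URLs, you can run the same comparison yourself. Here is a minimal sketch, assuming you have the sitemap XML as a string and a plain list of indexed URLs. The function names and sample URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set[str]:
    """Extract every <loc> value from a standard XML sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def indexed_not_in_sitemap(indexed: list[str], sitemap_xml: str) -> list[str]:
    """Return indexed URLs that were never submitted in the sitemap."""
    return sorted(set(indexed) - sitemap_urls(sitemap_xml))


# Hypothetical sample data.
sample_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/football/</loc></url>"
    "</urlset>"
)
sample_indexed = [
    "https://example.com/football/",
    "https://example.com/football/?location=uk&tournament=cup",
]
print(indexed_not_in_sitemap(sample_indexed, sample_sitemap))
```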
Another helpful way to discover potential issues is to filter for ‘Excluded’ URLs:
Investigating ‘Crawled — currently not indexed’ URLs can give you insights into pages Google is discovering but has decided not to index.
Google won’t index everything they crawl. If the page is of low quality like many facet pages are, they may decide not to index it.
In this example, we know there are 1,000 additional pages Google has discovered that they may index in the future. You can also view the URLs table to see faceted URLs by clicking on this report.
The above is a relatively mild example of issues with faceted navigation highlighted in GSC. Over time, these issues can scale to hundreds of thousands of URLs being discovered but not indexed (showing the potential crawling problems):
Or potentially hundreds of thousands of URLs being indexed when they shouldn’t be:
Using a site search and GSC is a great way to quickly get data on an issue, but neither will surface all indexable/indexed URLs, making it hard to spot trends and understand the scale of the problem.
Site auditing tools like the Ahrefs’ Site Audit can help remedy that by giving you detailed information on the URLs discovered from crawling the site.
The below example is a site with faceted navigation issues causing crawl budget wastage, and you can spot that with only a couple of clicks.
First, head to the Indexability report in the left sidebar.
Next, take a glance at the ‘Indexability distribution’ chart, and you’ll see if something looks off.
From a partial crawl, Site Audit found 39 non-indexable URLs for every indexable URL. Given that this isn’t a full crawl of the site, we could expect that the ratio of indexable to non-indexable URLs will likely worsen as the crawl continues.
The above highlights a tremendous amount of crawl budget wastage, and it’s also an excellent example of a crawler trap, where technical issues create an almost infinite number of URLs that are irrelevant for search but that a bot will still crawl.
If your faceted navigation is causing index bloat, the chart you’ll see here will look a bit different. Rather than a large number of non-indexable URLs, you’ll see vast numbers of indexable URLs on the chart like the below.
To confirm this is a faceted navigation issue, select the non-indexable portion of the chart and scan the list. You’ll now see a table of all the non-indexable pages crawled.
Here is where you’ll need to spot a pattern.
What’s causing crawlers to find all of these non-indexable pages?
If the vast majority of the URLs returned in the table are faceted URLs, you’ve found yourself a faceted navigation issue.
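If you export that table, a quick way to spot the pattern is to count which query parameters appear most often. This is only a sketch: it assumes your facets are parameter-based, and the sample URLs and parameter names are invented.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def parameter_frequency(urls: list[str]) -> Counter:
    """Count how many URLs contain each query parameter name."""
    counts: Counter = Counter()
    for url in urls:
        # Use a set so a parameter repeated in one URL is only counted once.
        keys = {key for key, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
        counts.update(keys)
    return counts


# Hypothetical export of non-indexable URLs.
exported = [
    "https://example.com/washing-machines/?color=silver&size=large",
    "https://example.com/washing-machines/?color=white",
    "https://example.com/washing-machines/",
]
print(parameter_frequency(exported).most_common())
```

Parameters that dominate the list are the likely culprits.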
Now that you know how to check for faceted navigation problems, here’s how to fix them.
If you’re facing indexing issues but no alarming crawl budget issues (and don’t have a huge site), the best solution is arguably to use the canonical tag. It consolidates link signals for similar/duplicate pages into the URL you specify as the canonical.
If you have links to a facet page, which then canonicalizes to the non-facet page, those link signals aren’t lost; search engines will pass them to the category page, which may help it to rank.
Here’s an example of how to implement this…
Say this is the URL of your category page:
Your facet URLs work with parameters, so when someone applies some filters, the URL looks like this:
On the facet URL above, you’d simply add a canonical tag pointing back to the category page, so your canonical tag would look like this:
<link rel="canonical" href="https://example.com/washing-machines/samsung/" />
Or like this in your HTTP headers:
Link: <https://example.com/washing-machines/samsung/>; rel="canonical"
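If your templates are generated server-side, one way to produce that tag is to strip the facet parameters from the requested URL. The sketch below assumes parameter-based facets. The parameter names in `FACET_PARAMS` and both function names are hypothetical, so adapt them to your own setup.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical facet parameter names; replace with the ones your site uses.
FACET_PARAMS = {"color", "size", "feature"}


def canonical_url(url: str) -> str:
    """Drop facet parameters (and any fragment), keeping everything else."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FACET_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link> element for the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}" />'


print(canonical_link_tag("https://example.com/washing-machines/samsung/?color=silver&size=large"))
```

Non-facet parameters, such as pagination, are left in place on purpose. Whether those should also be canonicalized is a separate decision.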
While this seems like a nice and easy fix for a serious SEO problem, as always, there are some potential issues, the main one being that Google may ignore your canonical tag.
This is simply because canonical tags are suggestions to search engines, not directives. So if Google, for some reason, thinks that you’ve implemented the tag incorrectly, they may decide to ignore it.
The common reasons Google will decide to ignore your canonical tag suggestions are:
If you don’t see the number of valid URLs in your coverage reports decreasing after implementing the canonical tag, move on to step two.
If canonicalization didn’t fix the indexing issues, the URL parameters report in GSC is arguably the best way to optimize crawling. It lets you tell Google how to handle the parameters in your URLs and helps them crawl more efficiently.
The downside is that this method only works if your faceted navigation uses URL parameters. (If that’s not the case for you, go to step three).
Using the URL parameters report is pretty straightforward. Just add a parameter, then tell Google how it affects page content and if there are any exceptions to the rule that they should crawl.
If you’re already blocking them from being crawled via robots.txt, this won’t make any difference.
If you’re facing crawl budget issues and you don’t need signals to consolidate, you’ll want to use robots.txt to block Google from crawling any faceted URLs.
To block crawling of a URL with the robots.txt, add a disallow rule like the below:
In the example above, I’ve added two wildcards (*) around the parameter. If your faceted navigation works by appending directories, your rule will look like this:
There are two instances when the robots.txt doesn’t work well:
You should also be aware that blocking crawling doesn’t necessarily prevent Google from indexing the blocked URLs. Generally speaking, Google will drop blocked URLs from the index, but only if they have no backlinks and few followed internal links pointing to them. In other words, they’ll drop out as long as nothing else is signalling to Google that those URLs are valuable.
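Before deploying a wildcard rule, it’s worth checking which URLs it would actually catch. Below is a simplified matcher, assuming Google-style pattern syntax where `*` matches any sequence of characters and a trailing `$` anchors the end of the URL. It ignores `Allow` rules and rule precedence, and the example patterns and paths are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with ".*" where the wildcards were.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow_patterns: list[str]) -> bool:
    """True if any disallow pattern matches from the start of the path."""
    return any(robots_pattern_to_regex(p).match(path_and_query) for p in disallow_patterns)


# Hypothetical rules: one parameter-based, one directory-based.
rules = ["*color=*", "/*/colour-*"]
for path in ["/washing-machines/?color=silver", "/jeans/colour-blue/", "/washing-machines/"]:
    print(path, is_disallowed(path, rules))
```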
If blocking crawling doesn’t fully eliminate indexing issues caused by faceted search, nofollowing internal links to those URLs may solve the problem.
There are typically two sources of these links:
For faceted search links, applying a blanket nofollow is easy enough with a bit of basic coding. However, this probably isn’t the best idea if you have canonical tags on faceted URLs and/or faceted URLs that you want Google to index. Reason being, if Google ends up not crawling these links because they’re nofollowed, it can cause other indexing issues.
The alternative is to pick and choose the facets that you nofollow. That’s a bit harder to implement from a technical standpoint, but it can be worth it if you want to target long-tail queries with faceted search (more on that later).
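As a rough idea of what picking and choosing could look like in a template helper, here’s a sketch that only leaves links followed for facets on an allowlist. The allowlist contents, the function name, and the URLs are all hypothetical.

```python
from html import escape

# Hypothetical allowlist of (facet, value) pairs you want Google to crawl.
INDEXABLE_FACETS = {("fit", "skinny"), ("fit", "bootcut")}


def facet_link(href: str, facet: str, value: str, label: str) -> str:
    """Render a facet link, adding rel="nofollow" unless it is allowlisted."""
    rel = "" if (facet, value) in INDEXABLE_FACETS else ' rel="nofollow"'
    return f'<a href="{escape(href)}"{rel}>{escape(label)}</a>'


print(facet_link("/jeans/?fit=skinny", "fit", "skinny", "Skinny"))
print(facet_link("/jeans/?price=0-50", "price", "0-50", "Under 50"))
```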
The main downside of this approach is that it has been less useful since Google started treating rel="nofollow" as a hint, meaning it’s not a directive like the robots.txt is.
However, Google will use an internal nofollow to indicate that the URL within the href attribute isn’t that important and Google should deprioritize crawling it.
John Mueller has confirmed this:
[…] we will continue to use these internal nofollow links as a sign that you’re telling us:
- These pages are not as interesting.
- Google doesn’t need to crawl them.
- They don’t need to be used for ranking, for indexing.
This approach doesn’t correct the dilution of PageRank. PageRank is still distributed between all links on the page, even those with the nofollow attribute. If you want to fix that, you’ll need to implement proper canonicalization.
For links elsewhere on your website, your best bet is to just remove them.
You can find internal links to problematic faceted URLs using Ahrefs’ Site Explorer:
You can then simply look for ‘followed’ internal links elsewhere on your site and remove them.
If you’re still facing indexing issues after following the steps above, then your last port of call is the noindex tag.
The benefit of the noindex tag is that it’s a surefire way to prevent the indexing of facet pages. The downside is you don’t consolidate ranking signals, and over time, Google may stop crawling internal links on a noindexed page, meaning no passage of ranking signals.
Still, this is a good way of getting faceted URLs out of Google’s index if all else fails.
To implement this, simply add either a meta robots tag in the <head> of a faceted URL:
<meta name="robots" content="noindex">
Or the X-Robots-Tag header within the HTTP headers of a faceted URL:
You then need to remove/adjust any crawl blocks for the URL in robots.txt or the URL Parameters tool. Fail to do this, and Google will never see the noindex directive—meaning that the page will stay indexed.
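For the header variant, the logic usually lives in your application or server config. Here’s a minimal framework-agnostic sketch that decides whether a response should carry `X-Robots-Tag: noindex`. It assumes parameter-based facets, and the parameter names and function name are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical facet parameter names; replace with your own.
FACET_PARAMS = {"color", "size", "feature"}


def robots_headers(url: str) -> dict[str, str]:
    """Return extra response headers for a URL: noindex if any facet is applied."""
    query_keys = {key for key, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
    if query_keys & FACET_PARAMS:
        return {"X-Robots-Tag": "noindex"}
    return {}


print(robots_headers("https://example.com/washing-machines/?color=silver"))
print(robots_headers("https://example.com/washing-machines/"))
```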
Recommended reading: Robots Meta Tag & X‑Robots-Tag: Everything You Need to Know
From the previous section, you will have realized that correcting all the potential issues faceted navigation can create isn’t easy.
Every approach to fix both indexing and crawling has some downsides or complications.
But there is a better way.
Suppose you’re implementing a new faceted navigation configuration or creating one for the first time. In that case, you can circumvent all of the above issues while still making the most of the UX benefits.
Here’s how to do that.
First, build your faceted navigation with AJAX and don’t add any <a href=…> internal links.
By doing that, users get a great experience due to the page not reloading whenever they filter, and Google won’t see any internal links to facet pages, meaning:
Here’s an example.
I’ve implemented faceted navigation with the WP Grid Builder WordPress plugin on a resource I created called SEO Toolbelt.
It looks like this:
When you right-click and inspect element on any of the checkboxes to apply a filter, you’ll see they don’t include an <a href=…> link on them, preventing Google from crawling any additional URLs.
Because of that, I’ve circumvented having to even think about crawl budget wastage from faceted navigation.
Next, we need to make sure that when a user clicks a filter, the URL changes.
I recommend doing this as we’ve materially changed the contents of the page, and ideally, if a user bookmarks, links to the page, or shares a URL with a friend, the contents of the URL will still reflect the filters they applied when they bookmarked/shared/linked the page.
There are two ways to do this:
The best solution is URL hashes, as Google tends to ignore anything after the hash in the URL.
WP Grid Builder uses parameters, so after applying the filter, the URL changes to be something like this:
If you access that URL, you’ll see the filtered grid of tools is updated to reflect the applied filters.
In this instance, as I’m using URL parameters, I’ll also need to add a canonical tag to the version of the URL without parameters, so this URL:
Given that these parameter versions of URLs aren’t internally linked to and are much less likely to receive external links from other sites (which is the only way Google would discover them), we’re at low risk of them being ignored.
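To see the difference between the two approaches, the sketch below strips the fragment from a URL, which is a rough stand-in for how a hash-based filter state collapses back to a single URL while a parameter-based one stays distinct. The URLs are hypothetical, and this is not a claim about exactly how Google normalizes URLs.

```python
from urllib.parse import urldefrag


def without_fragment(url: str) -> str:
    """Return the URL with everything from the '#' onwards removed."""
    return urldefrag(url).url


# Hypothetical filtered URLs for the same listing page.
hash_version = "https://example.com/tools/#browser=chrome"
param_version = "https://example.com/tools/?browser=chrome"

print(without_fragment(hash_version))
print(without_fragment(param_version))
```

The hash version reduces to the unfiltered URL on its own, whereas the parameter version remains a separate URL and so needs the canonical tag.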
In some cases, a filtered version of a page may be helpful for search.
For example, there are filters for ‘Firefox’ and ‘Chrome’ on my SEO browser extensions page. Both of these pages have some search potential.
So we want to make sure that they have indexable URLs created. The best way to do that is by making sure you have alternate crawl paths to those pages. I’ve done that by adding sub-navigation links to indexable versions of those filter pages at the top of the page.
Those sub-collections are generated based upon the same attributes that create the faceted version of the page, but I have to ‘opt-in’ to making them.
This implementation has achieved a few things:
As you can see, this is significantly simpler to manage SEO-wise and doesn’t come with the drawbacks of the earlier fixes.
So far, I’ve positioned faceted navigation as something that just causes SEO complications. However, you can also use faceted navigation as a way to get more traffic by pairing it with a long-tail keyword strategy.
I can’t overstate how incredibly beneficial getting this right can be. Ahrefs data shows that 99.84% of keywords get fewer than 1,000 searches per month, and account for 39.33% of total search demand:
Facet URLs are ideal for capturing long-tail traffic, given how facets create more specific versions of pages targeting broader queries.
First, I will run you through the steps to spot opportunities to capture more long-tail traffic with faceted navigation; then, I’ll explain some implementation considerations.
To start with, you’ll need to identify keyword opportunities with Ahrefs Keyword Explorer. Doing this is incredibly easy.
Enter the name of a category you already have on your site, like ‘high rise jeans.’
Head to the ‘Matching Terms’ report.
Use the terms sidebar and change over to ‘Parent Topics.’
By doing this, the tool will group all keywords with a similar SERP together. You can then scan this list and pick out potential facet pages that’d be worth making indexable. Here are some I’ve spotted from checking the screenshot above:
Next, we need to make these pages both crawlable and indexable to Google.
This can work in a few different ways depending on your type of faceted navigation.
If you’ve implemented faceted navigation that isn’t the ideal setup and does have internal links to each facet, for these URLs you’ll need to make sure that:
Precisely what you need to do above depends on your implementation, but the important part is that search engines can both crawl and index these pages.
You’ll need to create a sub-category page for the ideal faceted navigation setup mentioned in the previous section.
You’ll need to do this because the faceted navigation isn’t generating internal links, so you can’t use it to create these pages for you.
Most ecommerce platforms support creating sub-categories, but ideally, you want additional functionality to base the sub-category’s products upon a filtered version of the parent category, mainly to save having to merchandise each sub-category manually. This way, you get the benefits of quickly generating pages like faceted navigation does while still circumventing SEO complications.
For example, if we’re creating a ‘high rise skinny jeans’ sub-category, we’d want to inherit the ‘high rise jeans’ product listings but only show products that also have the ‘skinny’ attribute applied.
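In code terms, the idea is roughly the following. This is a sketch only: the `Product` type, the attribute names, and the sample products are hypothetical, and your ecommerce platform will have its own way of expressing this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    attributes: frozenset[str]


def sub_category_listings(parent_listings: list[Product], required_attribute: str) -> list[Product]:
    """Inherit the parent category's listings, keeping only those with the attribute."""
    return [p for p in parent_listings if required_attribute in p.attributes]


# Hypothetical 'high rise jeans' category.
high_rise_jeans = [
    Product("Jean A", frozenset({"high rise", "skinny"})),
    Product("Jean B", frozenset({"high rise", "straight"})),
    Product("Jean C", frozenset({"high rise", "skinny", "black"})),
]

high_rise_skinny = sub_category_listings(high_rise_jeans, "skinny")
print([p.name for p in high_rise_skinny])
```

Because the sub-category is derived from the parent, new ‘skinny’ products added to ‘high rise jeans’ show up automatically without manual merchandising.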
This is an obvious one, but you’ll want to do the fundamental SEO optimizations, such as:
The main complications here tend to be with configurations where you’re opting a facet page out of the default indexing and crawling controls in place.
This is simply because, technically, facet pages are inherently dynamic and aren’t the same as creating a new sub-category.
Custom functionality would be required to ensure critical on-page optimizations are possible with faceted URLs.
Hopefully, you now understand not just the inherent risks of faceted navigation for SEO but also the significant opportunities it presents to optimize for long-tail search.
Got a question on faceted navigation? Tweet me.
Source: ahrefs.com, originally published on 2021-09-09 11:00:30