
Why Google Search Console Indexes Your Filter Pages as Separate URLs (And How to Stop It)
If GSC is reporting /blog?tag=react and /blog?category=nextjs as separate discovered pages, your site has a crawl budget leak. Here's exactly what causes it in Next.js and how to fix it properly.
If you've built a Next.js blog with tag or category filters, you've probably opened Google Search Console one morning and found a pile of URLs you never meant to index: things like /blog?tag=react, /blog?category=nextjs, /blog?series=fullstack. GSC is reporting them as "Discovered - currently not indexed" or "Crawled - currently not indexed," and you're wondering how they got there.
This is a crawl budget leak, and it's more common than you'd think. Here's what's actually happening and the exact fixes I applied to my own portfolio.
Why This Happens
Googlebot is a link-following machine. When it crawls one of your blog posts, it reads every link on the page. If your blog post has a list of clickable tag chips that look like this:
<Link href={`/blog?tag=${encodeURIComponent(tag)}`}>
  #{tag}
</Link>
…then from Google's perspective, those are valid pages worth crawling. Every tag on every post becomes a new URL to discover and evaluate. If you have 30 posts with an average of 5 tags each, that's potentially 150 parameterized URLs appearing in GSC, all of them duplicates of /blog with a filter applied.
This isn't a bug in Googlebot. It's doing exactly what it's designed to do. The problem is that the links look like navigation to real, distinct pages.
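In the rendered HTML, each chip is nothing special, just an ordinary anchor, which is all Googlebot needs to queue a new URL:

<a href="/blog?tag=react">#react</a>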
The Wrong Fixes (and Why They Don't Work)
Before getting to the solution, it's worth understanding two approaches that seem right but create their own problems.
robots.txt Disallow
Adding disallow: /blog?* to your robots.txt looks like it should prevent Google from crawling filter URLs. And technically, it does. But it creates a worse problem.
If you're also using noindex meta tags or robots metadata in your Next.js generateMetadata to mark filtered pages as non-indexable, Google can only read those tags if it can actually crawl the page. Block it via robots.txt and you've made the noindex invisible. Google logs the URL as "Blocked by robots.txt" and may still list it in GSC, just without being able to confirm the noindex.
The two signals fight each other. Google's guidance is explicit: don't use robots.txt to block pages you're trying to noindex. Google needs to crawl the page to read the instruction not to index it.
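For concreteness, this is roughly what the conflicting setup looks like in an App Router robots.ts. It's an illustrative sketch of the anti-pattern, not a config to copy:

// app/robots.ts -- the anti-pattern: blocking crawling hides the noindex
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      // Googlebot can no longer fetch /blog?tag=..., so it never sees the
      // noindex that generateMetadata sets on those filtered views.
      disallow: ['/blog?*'],
    },
  }
}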
noindex Alone (Without Stopping Discovery)
Setting index: false in your generateMetadata when filters are present is correct and necessary, but it doesn't stop Google from discovering the URLs. It only stops them from appearing in search results. GSC will still show them in coverage reports as "Excluded: noindex," which is less alarming but still a waste of crawl budget.
If you have thousands of possible tag/category combinations, this compounds over time.
The Right Fix: Stop Discovery at the Source
The actual problem is that Googlebot is following links it shouldn't. The fix is rel="nofollow" on tag links, which tells crawlers not to follow the link, so the URL isn't discovered through your own pages in the first place.
<Link
  href={`/blog?tag=${encodeURIComponent(tag)}`}
  rel="nofollow"
>
  #{tag}
</Link>
Human users still click through to filtered views; Next.js client-side routing handles it normally. But Googlebot won't follow these links during crawling, so it never discovers the parameterized URLs.
This is the correct layer to fix it. Filter URLs aren't real pages with unique content. They're UI states. They should never have been crawlable in the first place.
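If you have more than one filter type (tag, category, series), it's easy to forget the rel attribute on one of them. A small wrapper keeps it in one place; FilterLink here is a hypothetical helper, not something from the original codebase:

// Hypothetical FilterLink wrapper: every filter chip goes through one component,
// so rel="nofollow" can't be forgotten when a new filter type is added.
import Link from 'next/link'
import type { ReactNode } from 'react'

type FilterLinkProps = {
  param: 'tag' | 'category' | 'series'
  value: string
  children: ReactNode
}

export function FilterLink({ param, value, children }: FilterLinkProps) {
  return (
    <Link href={`/blog?${param}=${encodeURIComponent(value)}`} rel="nofollow">
      {children}
    </Link>
  )
}

Rendering <FilterLink param="tag" value={tag}>#{tag}</FilterLink> produces the same chip as before, with the crawl hint applied consistently.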
Keep noindex As a Backstop
Even with rel="nofollow" on links, Google can still discover these URLs through other means: someone sharing a filter link on a forum, a sitemap misconfiguration, or a tool that follows JavaScript navigation. So you should still have noindex as a safety net.
In Next.js App Router, generateMetadata makes this clean:
export async function generateMetadata({ searchParams }: BlogPageProps): Promise<Metadata> {
  const params = await searchParams
  const isFiltered =
    params.tag ||
    params.category ||
    params.series ||
    (typeof params.page === 'string' && parseInt(params.page) >= 1)

  return {
    ...createPageMetadata({
      title: 'Blog',
      description: 'Technical articles on web development...',
      path: '/blog',
    }),
    ...(isFiltered && {
      robots: { index: false, follow: true },
    }),
  }
}
A few things to notice here:
- follow: true on the noindex: You're telling Google not to index the page but still follow links on it. This is correct for filter pages because they contain links to real blog posts you do want indexed.
- >= 1, not > 1, for the page param: /blog?page=1 is identical content to /blog, so it should get noindex too. An easy mistake is checking > 1, which leaves page one as an indexable duplicate.
- The canonical from createPageMetadata: The canonical URL always points to /blog regardless of what params are present. This is a belt-and-suspenders signal reinforcing the noindex (a sketch of what such a helper might look like follows below).
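createPageMetadata itself isn't shown in this post; the following is a minimal sketch of what such a helper might look like. SITE_URL and the exact fields are assumptions, not the original implementation:

// Hypothetical sketch of createPageMetadata. The key detail is that the
// canonical always targets the clean path, never a query-string variant.
import type { Metadata } from 'next'

const SITE_URL = 'https://example.com' // placeholder; substitute your own domain

type PageMetadataInput = { title: string; description: string; path: string }

export function createPageMetadata({ title, description, path }: PageMetadataInput): Metadata {
  return {
    title,
    description,
    alternates: { canonical: `${SITE_URL}${path}` },
    openGraph: { title, description, url: `${SITE_URL}${path}` },
  }
}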
Don't Use robots.txt to Disallow Filter URLs
After fixing the actual causes, remove any disallow: /blog?* from your robots.txt:
// app/robots.ts
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      // Do NOT add /blog?* here. It prevents Google from reading the noindex.
      disallow: ['/api/', '/_next/'],
    },
  }
}
Let the noindex metadata do this job. It was designed for it.
What to Expect in GSC After Deploying
GSC doesn't update overnight. Here's the rough timeline:
- Days 1–7: No change. Google hasn't re-crawled the affected URLs yet.
- Weeks 2–4: Parameterized URLs start appearing in "Excluded" with "noindex" reason instead of "Discovered." This is progress: Google crawled them, read the noindex, and respected it.
- Month 2–3: As Google re-crawls your blog posts and finds no followed links to filter URLs, it stops seeing them as worth revisiting. They drop out of the coverage report.
If users have shared filter links externally (on Reddit, in newsletters, etc.), those URLs may persist longer in GSC since Google has external signals pointing to them. That's fine. The noindex will prevent them from appearing in search results regardless.
Summary
The root cause is that tag/category filter links look like regular navigation links to Googlebot. Three things fix it properly:
rel="nofollow"on filter links: stops discovery at the crawl layernoindexingenerateMetadatafor filtered/paginated views: catches any URLs Google finds through other means- Remove
disallowon query params from robots.txt: so noindex is readable when needed
Don't fight GSC with robots.txt. Tell it what you mean at the metadata layer and stop pointing it at URLs you don't want it to see.