
Why Google Search Console Indexes Your Filter Pages as Separate URLs (And How to Stop It)
If GSC is reporting /blog?tag=react and /blog?category=nextjs as separate discovered pages, your site has a crawl budget leak. Here's exactly what causes it in Next.js and how to fix it properly.
If you've built a Next.js blog with tag or category filters, you've probably opened Google Search Console one morning and found a pile of URLs you never meant to index: things like /blog?tag=react, /blog?category=nextjs, /blog?series=fullstack. GSC is reporting them as "Discovered - currently not indexed" or "Crawled - currently not indexed," and you're wondering how they got there.
This is a crawl budget leak, and it's more common than you'd think. Here's what's actually happening and the exact fixes I applied to my own portfolio.
Why This Happens
Googlebot is a link-following machine. When it crawls one of your blog posts, it reads every link on the page. If your blog post has a list of clickable tag chips that look like this:
<Link href={`/blog?tag=${encodeURIComponent(tag)}`}>
  #{tag}
</Link>
…then from Google's perspective, those are valid pages worth crawling. Every tag on every post becomes a new URL to discover and evaluate. If you have 30 posts with an average of 5 tags each, that's potentially 150 parameterized URLs appearing in GSC, all of them duplicates of /blog with a filter applied.
This isn't a bug in Googlebot. It's doing exactly what it's designed to do. The problem is that the links look like navigation to real, distinct pages.
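In the rendered HTML, each chip is nothing special, just an ordinary anchor, which is all Googlebot needs to queue a new URL:

<a href="/blog?tag=react">#react</a>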
The Wrong Fixes (and Why They Don't Work)
Before getting to the solution, it's worth understanding two approaches that seem right but create their own problems.
robots.txt Disallow
Adding disallow: /blog?* to your robots.txt looks like it should prevent Google from crawling filter URLs. And technically, it does. But it creates a worse problem.
If you're also using noindex meta tags or robots metadata in your Next.js generateMetadata to mark filtered pages as non-indexable, Google can only read those tags if it can actually crawl the page. Block it via robots.txt and you've made the noindex invisible. Google logs the URL as "Blocked by robots.txt" and may still list it in GSC, just without being able to confirm the noindex.
The two signals fight each other. Google's guidance is explicit: don't use robots.txt to block pages you're trying to noindex. Google needs to crawl the page to read the instruction not to index it.
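For concreteness, this is roughly what the conflicting setup looks like in an App Router robots.ts. It's an illustrative sketch of the anti-pattern, not a config to copy:

// app/robots.ts -- the anti-pattern: blocking crawling hides the noindex
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      // Googlebot can no longer fetch /blog?tag=..., so it never sees the
      // noindex that generateMetadata sets on those filtered views.
      disallow: ['/blog?*'],
    },
  }
}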
noindex Alone (Without Stopping Discovery)
Setting index: false in your generateMetadata when filters are present is correct and necessary, but it doesn't stop Google from discovering the URLs. It only stops them from appearing in search results. GSC will still show them in coverage reports as "Excluded: noindex," which is less alarming but still a waste of crawl budget.
If you have thousands of possible tag/category combinations, this compounds over time.
The Right Fix: Stop Discovery at the Source
The actual problem is that Googlebot is following links it shouldn't. The fix is rel="nofollow" on tag links, which tells crawlers not to follow the link, so the URL isn't discovered through your own pages in the first place.
<Link
  href={`/blog?tag=${encodeURIComponent(tag)}`}
  rel="nofollow"
>
  #{tag}
</Link>
Human users still click through to filtered views; Next.js client-side routing handles it normally. But Googlebot won't follow these links during crawling, so it never discovers the parameterized URLs.
This is the correct layer to fix it. Filter URLs aren't real pages with unique content. They're UI states. They should never have been crawlable in the first place.
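If you have more than one filter type (tag, category, series), it's easy to forget the rel attribute on one of them. A small wrapper keeps it in one place; FilterLink here is a hypothetical helper, not something from the original codebase:

// Hypothetical FilterLink wrapper: every filter chip goes through one component,
// so rel="nofollow" can't be forgotten when a new filter type is added.
import Link from 'next/link'
import type { ReactNode } from 'react'

type FilterLinkProps = {
  param: 'tag' | 'category' | 'series'
  value: string
  children: ReactNode
}

export function FilterLink({ param, value, children }: FilterLinkProps) {
  return (
    <Link href={`/blog?${param}=${encodeURIComponent(value)}`} rel="nofollow">
      {children}
    </Link>
  )
}

Rendering <FilterLink param="tag" value={tag}>#{tag}</FilterLink> produces the same chip as before, with the crawl hint applied consistently.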
Keep noindex As a Backstop
Even with rel="nofollow" on links, Google can still discover these URLs through other means: someone sharing a filter link on a forum, a sitemap misconfiguration, or a tool that follows JavaScript navigation. So you should still have noindex as a safety net.
In Next.js App Router, generateMetadata makes this clean:
export async function generateMetadata({ searchParams }: BlogPageProps): Promise<Metadata> {
  const params = await searchParams
  const isFiltered =
    params.tag ||
    params.category ||
    params.series ||
    (typeof params.page === 'string' && parseInt(params.page) >= 1)

  return {
    ...createPageMetadata({
      title: 'Blog',
      description: 'Technical articles on web development...',
      path: '/blog',
    }),
    ...(isFiltered && {
      robots: { index: false, follow: true },
    }),
  }
}
A few things to notice here:
- follow: true on the noindex: You're telling Google not to index the page but still follow links on it. This is correct for filter pages because they contain links to real blog posts you do want indexed.
- >= 1, not > 1, for the page param: /blog?page=1 is identical content to /blog, so it should get noindex too. An easy mistake is checking > 1, which leaves page one as an indexable duplicate.
- The canonical from createPageMetadata: The canonical URL always points to /blog regardless of what params are present. This is a belt-and-suspenders signal reinforcing the noindex (a sketch of what such a helper might look like follows below).
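createPageMetadata itself isn't shown in this post; the following is a minimal sketch of what such a helper might look like. SITE_URL and the exact fields are assumptions, not the original implementation:

// Hypothetical sketch of createPageMetadata. The key detail is that the
// canonical always targets the clean path, never a query-string variant.
import type { Metadata } from 'next'

const SITE_URL = 'https://example.com' // placeholder; substitute your own domain

type PageMetadataInput = { title: string; description: string; path: string }

export function createPageMetadata({ title, description, path }: PageMetadataInput): Metadata {
  return {
    title,
    description,
    alternates: { canonical: `${SITE_URL}${path}` },
    openGraph: { title, description, url: `${SITE_URL}${path}` },
  }
}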
Don't Use robots.txt to Disallow Filter URLs
After fixing the actual causes, remove any disallow: /blog?* from your robots.txt:
// app/robots.ts
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: {
      userAgent: '*',
      // Do NOT add /blog?* here. It prevents Google from reading the noindex.
      disallow: ['/api/', '/_next/'],
    },
  }
}
Let the noindex metadata do this job. It was designed for it.
What to Expect in GSC After Deploying
GSC doesn't update overnight. Here's the rough timeline:
- Days 1–7: No change. Google hasn't re-crawled the affected URLs yet.
- Weeks 2–4: Parameterized URLs start appearing in "Excluded" with "noindex" reason instead of "Discovered." This is progress: Google crawled them, read the noindex, and respected it.
- Month 2–3: As Google re-crawls your blog posts and finds no followed links to filter URLs, it stops seeing them as worth revisiting. They drop out of the coverage report.
If users have shared filter links externally (on Reddit, in newsletters, etc.), those URLs may persist longer in GSC since Google has external signals pointing to them. That's fine. The noindex will prevent them from appearing in search results regardless.
Summary
The root cause is that tag/category filter links look like regular navigation links to Googlebot. Three things fix it properly:
rel="nofollow"on filter links: stops discovery at the crawl layernoindexingenerateMetadatafor filtered/paginated views: catches any URLs Google finds through other means- Remove
disallowon query params from robots.txt: so noindex is readable when needed
Don't fight GSC with robots.txt. Tell it what you mean at the metadata layer and stop pointing it at URLs you don't want it to see.