As a web developer, getting your site indexed by search engines is critical. But what happens when even an AI assistant, built to explore the open web, can’t “see” your content? That was the bizarre and frustrating journey I recently embarked on while trying to ensure aiprofiles.co.uk was fully accessible to AI crawlers.

AI Profiles is built to deliver semantically rich, structured data for LLMs like Gemini and ChatGPT, creating detailed, AI-friendly business profiles. So the irony of it being invisible to some AI crawlers wasn’t lost on me.

In the end, I had to move my DNS from Cloudflare to Amazon Route 53 to make it work.

Chapter 1: The Mysterious “Content Not Available” Error

My attempts to ask Gemini whether it could access the site were met with a polite but unhelpful reply along the lines of:

I am sorry, but I was unable to access the URL you provided. The website may be restricted by a paywall, require a login, or have other technical barriers that prevent me from accessing its content.

The most likely cause is that your site’s server-side rendering is tied to a session and is setting cookies for every request. My tool, like many crawlers, does not maintain a session or accept cookies, which prevents the page from fully rendering.

To ensure your site is crawlable, it’s crucial that it serves a static, cookie-less, and session-independent version of the homepage to bots.

The site was live, working in every browser, and ChatGPT could access it just fine. This discrepancy was the first clue that something unusual was happening.

Initial Suspects (False Leads)

robots.txt
This is always the first place to check. To my surprise, the robots.txt served online contained numerous rules I hadn’t defined—some explicitly blocking bots. My local file was correct, which meant something upstream was altering the response. A closer look revealed a Cloudflare beta feature called “Instruct AI bot traffic with robots.txt”. It had been automatically enabled.

While the intention (helping manage AI traffic) is understandable, silently injecting rules into my robots.txt felt like overreach. I disabled the feature immediately. Problem solved? Not quite.

Gemini still couldn’t access the site. Digging deeper into the Cloudflare dashboard, I found another setting: Block AI bots. I disabled it too—even though ChatGPT hadn’t been blocked by it. Cloudflare offers several new tools to control AI bots, and for most site owners these are welcome protections. But for a site designed to feed structured data to LLMs, they were working against me.

I’m a big fan of Cloudflare. It’s an outstanding service, and the quality of what they provide, even on the free plan, is astonishing. I particularly love the aggressive caching options and the powerful DDoS/security features that are just a few clicks away. It was an obvious choice to manage DNS for aiprofiles.co.uk.

(For a good explainer of Cloudflare’s AI-bot features, see this video.)

Chapter 2: The Cache-Control Conundrum and mod_pagespeed

Next, I turned to the HTTP headers. My pages were serving two conflicting Cache-Control headers:

1. Cache-Control: max-age=3600, public
2. Cache-Control: max-age=0, no-cache, s-maxage=10

When a response carries conflicting Cache-Control directives, caches generally honour the most restrictive ones, so the no-cache won out, telling browsers and crawlers not to store the content.
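
A quick way to see exactly what an anonymous client is served is a bare curl request against the live site, with no cookies or session in play. For example:

# Fetch only the response headers and filter for the caching-related ones
curl -sI https://aiprofiles.co.uk/ | grep -iE 'cache-control|cf-cache-status|set-cookie'

Both Cache-Control lines show up side by side in the output, which makes the conflict easy to spot.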

After some digging, the culprit emerged: Apache’s mod_pagespeed module was injecting the second, problematic header.

Fixes

Disable mod_pagespeed: Turning it off immediately removed the extra header. I may re-enable it later and configure it specifically to stop rewriting caching headers, but for now, switching it off was the easiest fix.
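
In Apache terms, both options are a one-line directive. A sketch (directive names per the mod_pagespeed documentation; worth double-checking against your installed version):

# Option 1: turn the module off entirely
ModPagespeed off

# Option 2: keep the module but stop it rewriting caching headers
ModPagespeedModifyCachingHeaders off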

Use Laravel’s built-in middleware: Instead of custom code, I used Laravel’s SetCacheHeaders middleware for a single, unambiguous header:

Route::get('/', [PageController::class, 'home'])
    ->name('home')
    ->middleware('cache.headers:public;max_age=3600');

With mod_pagespeed disabled and a clean header in place, Cloudflare began returning cf-cache-status: HIT. Unfortunately, Gemini still couldn’t reach the site.

Chapter 3: Cookies, Livewire, and Phantom Sessions

Even with proper caching and a friendly robots.txt, Gemini still reported “content not available.” My browser’s network tab revealed Set-Cookie headers for XSRF-TOKEN and aiprofiles_session, showing that Laravel was creating a session for every request—even for bots.

Gemini suggested this could be the reason it was unable to access the site. Crawlers are apparently wary of content that might be personalised (I’m not sure that’s entirely true, but that’s what it told me), and the presence of session cookies implied the content was specific to a user.

I considered serving a completely static version of the page before Livewire or session logic even loaded. A custom middleware might have looked like this:

namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;

class BotResponse
{
    public function handle(Request $request, Closure $next)
    {
        // Default to an empty string so str_contains() never receives null.
        $userAgent = $request->header('User-Agent', '');

        // A far-from-exhaustive list of crawler user-agent fragments.
        $knownBots = ['Googlebot', 'ChatGPT-User', 'meta-externalagent'];

        foreach ($knownBots as $bot) {
            if (str_contains($userAgent, $bot)) {
                // Serve a pre-rendered, cookie-free view with a cacheable header.
                return response()
                    ->view('pages.static-bot-profile')
                    ->header('Cache-Control', 'public, max-age=3600');
            }
        }

        return $next($request);
    }
}

While technically valid, this felt too close to cloaking, which search engines can penalize. I kept looking.

Chapter 4: The Final Cloudflare Battle

To isolate the problem, I created a simple test.html page—no Laravel, no cookies. Gemini still couldn’t access it. That pointed squarely to Cloudflare itself.

Checking Security → Events in the Cloudflare dashboard revealed the smoking gun: Facebook’s crawler (meta-externalagent) had been blocked by Managed Rules → Suspicious Activity. Cloudflare’s WAF and bot-management features were silently intercepting requests from newer AI crawlers.

I tried creating custom rules to override the managed ones, but Cloudflare’s default posture toward emerging AI traffic is understandably cautious. For my use case—where open access is essential—this was an uphill battle.
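
For anyone attempting the same, the shape of such a rule is an expression matching the crawlers you want to let through, paired with a Skip or exception action for the relevant security products. A sketch (the user agents are illustrative, and whether it actually takes precedence over the managed rules depends on how your exceptions are configured):

(http.user_agent contains "meta-externalagent")
or (http.user_agent contains "GPTBot")
or (http.user_agent contains "Google-Extended")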

The Ultimate Solution: DNS Migration

In the end, I bypassed Cloudflare entirely. I changed my domain’s nameservers to Amazon Route 53, removing Cloudflare’s proxy and security layer. Once DNS propagation completed, Gemini confirmed full access.
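
Mechanically, that means creating a hosted zone in Route 53 and recreating the records so they resolve straight to the origin, with nothing proxying in between. In zone-file terms it boils down to something like this (the IP, TTL, and www alias are placeholders):

; DNS now points directly at the origin server, with nothing in front of it
aiprofiles.co.uk.        300  IN  A      203.0.113.10
www.aiprofiles.co.uk.    300  IN  CNAME  aiprofiles.co.uk.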

Lessons Learned

Cloudflare remains an excellent service, but its strength in security can be a double-edged sword. If your business model depends on open AI or bot access, Cloudflare’s default rules may work against you.

  • Always test with multiple crawlers (Googlebot, Gemini, ChatGPT, etc.); see the sketch after this list.
  • Check the served robots.txt, not just the file on disk.
  • Inspect headers for unexpected Cache-Control values.
  • Review Cloudflare’s AI/bot settings and managed firewall logs.
  • Don’t assume that “it works for me” means it works for everyone.
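
A rough first pass at the first two checks can be scripted by spoofing a few crawler user agents (strings trimmed for brevity). It is only an approximation, since bot verification is typically done by IP rather than user agent, but it surfaces obvious blocks quickly:

# Compare what different "crawlers" are served; spoofed user agents, so only indicative
for ua in "Googlebot/2.1" "GPTBot/1.0" "meta-externalagent/1.1"; do
  echo "== $ua =="
  curl -s -o /dev/null -w '%{http_code}\n' -A "$ua" https://aiprofiles.co.uk/
done

# Check the robots.txt actually being served, not just the file on disk
curl -s https://aiprofiles.co.uk/robots.txt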

For this particular site, Route 53 was the simplest path to reliability. But for most projects, Cloudflare is still a fantastic platform—just be aware of the invisible battles it might be fighting on your behalf.

Further Reading