As a web developer, getting your site indexed by search engines is critical. But what happens when even an AI assistant, built to explore the open web, can’t “see” your content? That was the bizarre and frustrating journey I recently embarked on while trying to ensure aiprofiles.co.uk was fully accessible to AI crawlers.
AI Profiles is built to deliver semantically rich, structured data for LLMs like Gemini and ChatGPT, creating detailed, AI-friendly business profiles, so the irony of it being invisible to some AI crawlers wasn’t lost on me.
In the end, I had to move my DNS from Cloudflare to Amazon Route 53 to make it work.
Chapter 1: The Mysterious “Content Not Available” Error
My attempts to ask Gemini whether it could access the site were met with a polite but unhelpful reply along the lines of:
I am sorry, but I was unable to access the URL you provided. The website may be restricted by a paywall, require a login, or have other technical barriers that prevent me from accessing its content.
The most likely cause is that your site’s server-side rendering is tied to a session and is setting cookies for every request. My tool, like many crawlers, does not maintain a session or accept cookies, which prevents the page from fully rendering.
To ensure your site is crawlable, it’s crucial that it serves a static, cookie-less, and session-independent version of the homepage to bots.
The site was live, working in every browser, and ChatGPT could access it just fine. This discrepancy was the first clue that something unusual was happening.
Initial Suspects (False Leads)
robots.txt
This is always the first place to check. To my surprise, the robots.txt served online contained numerous rules I hadn’t defined—some explicitly blocking bots. My local file was correct, which meant something upstream was altering the response.
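A quick way to catch this kind of upstream tampering is to compare the robots.txt the outside world receives with the file on disk. Here’s a minimal sketch using Laravel’s HTTP client (run it from artisan tinker; the crude string comparison is just for illustration):

use Illuminate\Support\Facades\Http;

// What crawlers actually receive, after any proxy/CDN has had its say
$served = Http::get('https://aiprofiles.co.uk/robots.txt')->body();

// What is actually deployed in the application's public directory
$local = file_get_contents(public_path('robots.txt'));

if (trim($served) !== trim($local)) {
    dump('Served robots.txt differs from the file on disk:');
    dump($served);
}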
A closer look revealed a Cloudflare beta feature called “Instruct AI bot traffic with robots.txt”. It had been automatically enabled.

While the intention (helping manage AI traffic) is understandable, silently injecting rules into my robots.txt felt like overreach. I disabled the feature immediately. Problem solved? Not quite.
Gemini still couldn’t access the site. Digging deeper into the Cloudflare dashboard, I found another setting: Block AI bots. I disabled it too—even though ChatGPT hadn’t been blocked by it. Cloudflare offers several new tools to control AI bots, and for most site owners these are welcome protections. But for a site designed to feed structured data to LLMs, they were working against me.
I’m a big fan of Cloudflare. It’s an outstanding service, and the quality of what they provide, even on the free plan, is astonishing. I particularly love the aggressive caching options and the powerful DDoS/security features that are just a few clicks away. It was an obvious choice to manage DNS for aiprofiles.co.uk.
(For a good explainer of Cloudflare’s AI-bot features, see this video.)
Chapter 2: The Cache-Control Conundrum and mod_pagespeed
Next, I turned to the HTTP headers. My pages were serving two conflicting Cache-Control headers:
1. Cache-Control: max-age=3600, public
2. Cache-Control: max-age=0, no-cache, s-maxage=10
When multiple Cache-Control headers conflict, no-cache usually wins, telling browsers and crawlers not to cache the content.
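curl -I shows the duplication from the command line, but since this is a Laravel app, here’s a small tinker sketch (my own diagnostic, not part of the eventual fix) that dumps every Cache-Control value the origin returns:

use Illuminate\Support\Facades\Http;

$response = Http::get('https://aiprofiles.co.uk');

// headers() returns name => [values]; more than one value for Cache-Control
// means two different layers are each setting their own header
foreach ($response->headers() as $name => $values) {
    if (strtolower($name) === 'cache-control') {
        dump($values);
    }
}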
After some digging, the culprit emerged: Apache’s mod_pagespeed module was injecting the second, problematic header.
Fixes
Disable mod_pagespeed: Turning it off immediately removed the extra header. I might re-enable it at some point, configured specifically to stop it rewriting caching headers, but for now switching it off was the easiest fix.
Use Laravel’s built-in middleware: Instead of custom code, I used Laravel’s SetCacheHeaders middleware for a single, unambiguous header:
Route::get('/', [PageController::class, 'home'])
->name('home')
->middleware('cache.headers:public;max_age=3600');
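In string form, public;max_age=3600 maps to a single Cache-Control: max-age=3600, public header on the response, assuming nothing later in the stack overrides it.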
With mod_pagespeed disabled and a clean header in place, Cloudflare began returning cf-cache-status: HIT. Unfortunately, Gemini still couldn’t reach the site.
Chapter 3: Cookies, Livewire, and Phantom Sessions
Even with proper caching and a friendly robots.txt, Gemini still reported “content not available.” My browser’s network tab revealed a Set-Cookie header for XSRF-TOKEN and aiprofiles_session, showing that Laravel was creating a session for every request—even for bots.
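To see what the network tab showed from a crawler’s point of view, a cookie-less request with a bot User-Agent is enough. A rough sketch (the User-Agent string here is just an example):

use Illuminate\Support\Facades\Http;

// Fetch the homepage the way a crawler would: no cookies, a bot User-Agent
$response = Http::withHeaders(['User-Agent' => 'ChatGPT-User'])
    ->get('https://aiprofiles.co.uk');

// Any Set-Cookie value here means the app opens a session even for anonymous bots
dump($response->header('Set-Cookie'));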
Gemini suggested this could be why it couldn’t access the site: crawlers may avoid content they consider personalised (I’m not convinced that’s true, but that’s what it told me), and the presence of session cookies implied the content was specific to a user.
I considered serving a completely static version of the page before Livewire or session logic even loaded. A custom middleware might have looked like this:
namespace App\Http\Middleware;

use Closure;
use Illuminate\Http\Request;

class BotResponse
{
    public function handle(Request $request, Closure $next)
    {
        $userAgent = $request->header('User-Agent', '');

        // Crawlers that should get the pre-rendered, session-free page
        $knownBots = ['Googlebot', 'ChatGPT-User', 'meta-externalagent'];

        foreach ($knownBots as $bot) {
            if (str_contains($userAgent, $bot)) {
                // Serve a static Blade view with a single cache-friendly header,
                // bypassing sessions, cookies, and Livewire entirely
                return response()
                    ->view('pages.static-bot-profile')
                    ->header('Cache-Control', 'public, max-age=3600');
            }
        }

        return $next($request);
    }
}
While technically valid, this felt too close to cloaking, which search engines can penalize. I kept looking.
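For completeness, the non-cloaking alternative would have been to stop Laravel from opening a session on the homepage at all, by stripping the session and CSRF middleware from that one route. I didn’t ship this, since it would break Livewire and anything else that needs a session on the page, but a rough sketch looks like this (class names as in a default pre-Laravel-11 app):

// routes/web.php
Route::get('/', [PageController::class, 'home'])
    ->name('home')
    ->middleware('cache.headers:public;max_age=3600')
    // Skip the parts of the 'web' group that create the session and CSRF cookies
    ->withoutMiddleware([
        \Illuminate\Session\Middleware\StartSession::class,
        \Illuminate\View\Middleware\ShareErrorsFromSession::class,
        \Illuminate\Cookie\Middleware\AddQueuedCookiesToResponse::class,
        \App\Http\Middleware\VerifyCsrfToken::class, // ValidateCsrfToken in Laravel 11+
    ]);

With those removed, the response should no longer carry the XSRF-TOKEN or aiprofiles_session cookies, at the cost of any per-visitor state on that page.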
Chapter 4: The Final Cloudflare Battle
To isolate the problem, I created a simple test.html page—no Laravel, no cookies. Gemini still couldn’t access it. That pointed squarely to Cloudflare itself.
Checking Security → Events in the Cloudflare dashboard revealed the smoking gun: Facebook’s crawler (meta-externalagent) had been blocked by Managed Rules → Suspicious Activity. Cloudflare’s WAF and bot-management features were silently intercepting requests from newer AI crawlers.
I tried creating custom rules to override the managed ones, but Cloudflare’s default posture toward emerging AI traffic is understandably cautious. For my use case—where open access is essential—this was an uphill battle.
The Ultimate Solution: DNS Migration
In the end, I bypassed Cloudflare entirely. I changed my domain’s nameservers to Amazon Route 53, removing Cloudflare’s proxy and security layer. Once DNS propagation completed, Gemini confirmed full access.
Lessons Learned
Cloudflare remains an excellent service, but its strength in security can be a double-edged sword. If your business model depends on open AI or bot access, Cloudflare’s default rules may work against you.
- Always test with multiple crawlers (Googlebot, Gemini, ChatGPT, etc.).
- Check the served robots.txt, not just the file on disk.
- Inspect headers for unexpected Cache-Control values.
- Review Cloudflare’s AI/bot settings and managed firewall logs.
- Don’t assume that “it works for me” means it works for everyone.
For this particular site, Route 53 was the simplest path to reliability. But for most projects, Cloudflare is still a fantastic platform—just be aware of the invisible battles it might be fighting on your behalf.
Further Reading
- Cloudflare Docs: Cloudflare bot solutions – Official guide to controlling AI bots and understanding Cloudflare’s new bot management tools.
- Cloudflare Community Discussions on AI Bot Blocking – Real-world cases of Cloudflare unintentionally blocking AI crawlers.
- AWS Guide: Migrating DNS from Cloudflare to Amazon Route 53 – Step-by-step instructions for moving your DNS to Route 53.
- Amazon Route 53 Developer Guide – In-depth documentation on managing DNS in Route 53 after migration.
- OpenAI GPTBot Documentation – Learn how to allow or block OpenAI’s web crawler for AI model training.
- Google Search Central: Overview of Google Crawlers – Understand how different bots (including AI-related ones) index websites.
- List of Google’s common crawlers – A reference list of the crawlers Google uses to build its search indexes, perform product-specific crawls, and run analysis.
- Cloudflare Blog: Bots & AI Scraping – Cloudflare’s own perspective on blocking, managing, and allowing AI scraping.