I was browsing Instagram recently and a reel popped up that grabbed my attention. It was all about NLWeb and how it was the future of the web. I don’t remember specifically why, but the more I heard about the project, the more it piqued my interest.
I’d also seen another reel suggesting that websites might not be important going forward on the agentic web. I’ve noticed traffic from Google has been dropping over the past year, and I think a large part of that is probably due to people’s changing habits for finding the information they want. More people (myself included) are going to chat clients like ChatGPT to find both information and resources.
So, what is NLWeb?
“NLWeb is an open project developed by Microsoft that aims to make it simple to create a rich, natural language interface for websites using the model of their choice and their own data. Our goal is for NLWeb, short for Natural Language Web, to be the fastest and easiest way to effectively turn your website into an AI app, allowing users to query the contents of the site by directly using natural language, just like with an AI assistant or Copilot.”
Sounds good so far. The project itself is developed by the guy behind the RSS, RDF and Schema.org web standards – R.V. Guha. What I also really liked was the fact it uses existing structured data, such as Schema.org, which felt both familiar and easy to understand.
“Every NLWeb instance is also a Model Context Protocol (MCP) server, allowing websites to make their content discoverable and accessible to agents and other participants in the MCP ecosystem if they choose. Ultimately, we believe NLWeb can play a similar role to HTML in the emerging agentic web.”
As a big-time user of AI chat clients myself, I’m very interested in optimising our sites to maximise their appearance in AI responses, so this second part sounds ideal.
Getting Started
The whole project is available to clone on GitHub. The suggested first step is to get it set up locally. It’s largely a Python project, so for me that meant installing Python. Thanks to some ChatGPT instructions, this was fairly straightforward. I’m on Windows 11.
✅ Step 1: Download Python
Go to the official Python website:
https://www.python.org/downloads/windows/
✅ Step 2: Update the PATH
I wasn’t asked during install if I wanted to add Python to the PATH. However, I knew I needed to in order to run Python from anywhere on my system.
For me it installed in:
C:\Users\myuser\AppData\Local\Programs\Python\Python313
Once that was in my path, I was able to run Python from the command prompt.
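If you want to check the PATH change has taken effect, opening a new command prompt and running the following should confirm it (I’d also suggest adding the Scripts subfolder alongside the install folder, since that’s where pip lives):

where python
python --version
pip --version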
What to Put in Your .env File
Step 4 of their guide tells you to copy the .env.template file and create your own .env file, which is fine. However, it happily asks you to complete the required variables without really providing much help. I’ve used the OpenAI API before, so this wasn’t a massive jump, but a little more guidance on how to get an API key would have been useful.
If you don’t know how to get an API key:
- Sign up or log in at https://platform.openai.com/signup
- Go to API Keys
- Click Create new secret key. Copy and store it safely.
- Paste it into your .env file:

OPENAI_API_KEY=your-key-here
Missing Requirements
Steps 5 and 6 are where the problems began. There are a couple of gotchas if you’re not using the default Azure OpenAI LLM.
Update your config files in code/config:

- config_llm.yaml: Update the first line to openai.
- config_embedding.yaml: Also update the provider.
- config_retrieval.yaml: Change this to qdrant_local.
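For reference, after those edits the top of each file looked roughly like this. I’m sketching it from memory rather than quoting the repo, so treat the key names as approximate and match them to whatever is already in your copies of the files:

# code/config/config_llm.yaml (key name may differ in your checkout)
preferred_provider: openai

# code/config/config_embedding.yaml
preferred_provider: openai

# code/config/config_retrieval.yaml
preferred_provider: qdrant_local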
I ran check_connectivity.py, but got:
Error importing required libraries: No module named 'openai'
Please run: pip install -r requirements.txt
I had already run it, but you also need to uncomment this line in requirements.txt:

# openai>=1.12.0

I did this and re-ran pip install -r requirements.txt.
Missing module named ‘pywintypes’
I ran the check_connectivity.py script again. Another error:
Failed to load client for qdrant: No module named 'pywintypes'
Solution:

- Go to ...\site-packages\pywin32_system32
- Copy pywintypes313.dll and pythoncom313.dll
- Paste them into ...\site-packages\win32\
Then test it from a Python prompt:

python
>>> import pywintypes
I did this from inside the code subdirectory as described in the setup guide.
Retriever API connectivity check failed for qdrant_local
The check failed because the collection nlweb_collection doesn’t exist yet.
Despite failing the check_connectivity.py test, I suspected the next step in the guide, which explains how to ingest data, would probably resolve this, so I pressed on:
python -m tools.db_load https://feeds.libsyn.com/121695/rss Behind-the-Tech
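If you want to confirm the ingest actually created the collection, a quick check with the qdrant-client library along these lines should do it. The storage path below is a placeholder rather than the real one, so point it at wherever your qdrant_local data ends up (the retrieval config should tell you):

from qdrant_client import QdrantClient

# Open the same on-disk store NLWeb's qdrant_local uses ("path/to/qdrant_data" is a placeholder).
client = QdrantClient(path="path/to/qdrant_data")

# List collections and check that nlweb_collection now exists.
names = [c.name for c in client.get_collections().collections]
print("nlweb_collection present:", "nlweb_collection" in names)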
Then you can fire up the webserver:
python app-file.py
So, What Did I Think?
The webserver was fired up at localhost:8000, and I visited it in Chrome to check it out. A cool little chat window was presented to me, and I found myself completely at a loss as to what to ask… I checked the source material and thought a question like “Tell me something about Kevin Scott…” might be a good start. I got a bunch of results back, but rather than a single nicely worded response in natural language, it seemed more like a list of search results I had to try and interpret myself. I suspect this particular UI was geared more towards transparency than polish and was more search-like than chat-like. It was also really slow to respond. I tried a few other queries and got no response at all.
At this point – after all the promise and effort of getting it set up – I was a little underwhelmed. There are a number of other static UIs available, so once I’d managed to locate the filenames manually (by finding the document root and seeing the names of the .html files), I tried a few out, asking similarly basic questions about Kevin and sometimes Bill (this is a Microsoft project, after all).
My favourite interface was static/str_chat.html. It allowed me to choose the mode of response and attempted to provide a summary at the top, rather than just a random list of “results.”
A Word of Warning
What I didn’t realise, however, was that behind the scenes, my console was going crazy with errors about reaching rate limits on the API. That’s why it was going slow:
LLM Error (openai): RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1 in organization org-sdfsdfsd on tokens per min (TPM): Limit 30000, Used 30000, Requested 623. Please try again in 1.246s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
2025-06-18 10:25:58,497 - llm_wrapper - ERROR - error:151 - Error with provider openai: LLM call failed: RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1 in organization org-sdfsdfsd on tokens per min (TPM): Limit 30000, Used 30000, Requested 623. Please try again in 1.246s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
It was also racking up the usage in my OpenAI account. In no time at all I’d used over a million tokens across more than 1,000 requests, just from a few quick test searches. I wasn’t aware how much data would be sent with each query. The token concept in general is confusing, but I hadn’t expected it to hit the API this hard straight out of the box. My usage was showing around $0.50 at this point, and it kept climbing as I refreshed the page.
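To get a feel for the arithmetic, here’s a rough back-of-the-envelope sketch in Python. The per-token prices are placeholders rather than OpenAI’s actual rates (check their pricing page for the model you use), but it shows how quickly retrieval-heavy prompts add up:

# Back-of-the-envelope cost estimate. Prices are illustrative placeholders,
# not OpenAI's current rates - look them up for your model.
PRICE_PER_1K_INPUT_TOKENS = 0.002   # assumed USD per 1,000 prompt tokens
PRICE_PER_1K_OUTPUT_TOKENS = 0.008  # assumed USD per 1,000 completion tokens

def query_cost(input_tokens, output_tokens):
    """Estimated cost of a single chat query in USD."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS

# A retrieval-heavy query can send tens of thousands of context tokens per request.
print(f"{query_cost(30_000, 500):.3f} USD per query")
print(f"{query_cost(30_000, 500) * 100:.2f} USD for 100 such queries")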
Slightly alarmed, I quickly disabled my API key and stopped the webserver. I checked my OpenAI billing account in a slight panic and noticed there was at least a $120 hard usage limit, so presumably that would be the worst-case scenario. It’s been a few hours now and the usage is at $0.85. So I think I’ve hopefully got away with it. I’m just glad I noticed before hitting many more pointless queries about Kevin and Bill.
What Have I Learned?
This isn’t something I’ll be rolling out anytime soon, and my foray into the world of NLWeb might go on pause for the moment. I love the concept, and I’ll continue to embrace and enhance the schema.org markup we have on our sites, making that available to any bot that wishes to utilise it. But there is no way I could ever give users on my website such an easy way to abuse my credit card. Bots alone could drain even a healthy budget dry in hours. I appreciate this is just a test environment, and in production you could put safeguards in place to limit the quota. But I’m not even sure I have a budget for this, let alone one that costs this much.
It also shows me how big corporations – and the cloud infrastructure in general – are happy to think of new ways to separate you from your hard-earned cash. Maybe this rate of spending is fine for big corporations like Microsoft, but it’s certainly nothing I, or most of our clients, would be happy with.
I think the project has great potential, and the idea in general is a good one, but I’m going to need to learn a lot more about solutions that are far more cost-effective, or perhaps entirely free, before we’re able to launch these kinds of features on our sites. One possible approach might be to skip the LLM API altogether and instead run a small language model locally, such as Mistral or Gemma through a tool like LM Studio, which would be free once set up.
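I haven’t actually tried this yet, but since the project pulls in the standard openai Python package, one possible route is pointing an OpenAI-compatible client at a local server instead of api.openai.com. LM Studio, for example, can expose an OpenAI-style endpoint on localhost. The port and model name below are assumptions based on its defaults, and I haven’t checked whether NLWeb’s config lets you override the base URL:

from openai import OpenAI

# Talk to a local OpenAI-compatible server instead of api.openai.com.
# http://localhost:1234/v1 is LM Studio's usual default; most local servers ignore the API key.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # placeholder - use whatever model identifier your local server reports
    messages=[{"role": "user", "content": "Tell me something about Kevin Scott."}],
)
print(response.choices[0].message.content)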
This is a new area for me, and there’s clearly still lots to learn. I’ll be watching this project going forward to see how things develop.