Adding Knowledge to Your Agent

Out of the box, AI models only know what they were trained on: general knowledge up to their training cutoff date. But what if you want your AI to answer questions about YOUR company's policies, YOUR product manual, or YOUR research papers? That's where Document Stores come in.

What is a Document Store?

A Document Store is like a smart filing cabinet for your AI. You put documents in, and the AI can quickly search through them to find relevant information when answering questions. This technique, retrieving relevant text and passing it to the model along with the question, is called RAG (Retrieval-Augmented Generation).

The RAG Process in Plain English

Here's what happens behind the scenes:

1. 📄 You upload a document (e.g., a 50-page PDF)
↓
2. ✂️ The document gets split into small chunks (like paragraphs)
↓
3. 🔢 Each chunk gets converted into numbers (embeddings)
↓
4. 📦 Those numbers are stored in a vector database
↓
5. ❓ When someone asks a question...
↓
6. 🔍 The question is also converted to numbers
↓
7. 🎯 The system finds the chunks most similar to the question
↓
8. 🧠 The AI reads those chunks and writes an answer

Step-by-Step: Creating a Document Store

Step 1: Create the Store

  1. Click "Document Stores" in the left sidebar
  2. Click "+ Add New"
  3. Give it a descriptive name (e.g., "Company HR Policies" or "Product Manual")
  4. Click "Save"

Step 2: Add a Document Loader

  1. Click on your newly created Document Store to open it

  2. Click "+ Add Document Loader"

  3. Choose the type that matches your document:

    • PDF File: For PDF documents
    • DOCX File: For Word documents
    • Text File: For plain text files
    • Web Scraper (Cheerio/Playwright): For web pages
    • CSV File: For spreadsheet data
    • And many more...
  4. Upload your file or enter the URL

Adding Metadata

You can add custom metadata to your documents (like department: "HR" or version: "2024"). This helps you filter and organize your data later.
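
To see why metadata is useful, here is a minimal sketch of filtering chunks by metadata before (or alongside) a similarity search. The `Chunk` class, the `filter_chunks` helper, and the sample data are all hypothetical; your vector store will have its own filter syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def filter_chunks(chunks, **required):
    """Keep chunks whose metadata matches every required key/value pair."""
    return [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in required.items())
    ]


# Made-up chunks tagged with the kind of metadata described above.
chunks = [
    Chunk("Annual leave is 20 days.", {"department": "HR", "version": "2024"}),
    Chunk("Annual leave is 18 days.", {"department": "HR", "version": "2023"}),
    Chunk("Use the VPN when working remotely.", {"department": "IT", "version": "2024"}),
]

current_hr = filter_chunks(chunks, department="HR", version="2024")
print([c.text for c in current_hr])  # ['Annual leave is 20 days.']
```

Without the `version` tag, the outdated 2023 policy would compete with the current one in search results.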

Step 3: Choose a Text Splitter

The text splitter breaks your document into smaller, searchable pieces. This is important because:

  • AI models have a limit on how much text they can process at once
  • Smaller chunks mean more precise search results
  • Overlap between chunks reduces the chance that information spanning a chunk boundary is lost

Recommended settings for most documents:

  • Splitter: Recursive Character Text Splitter
  • Chunk Size: 1500 characters (about 1-2 paragraphs)
  • Chunk Overlap: 200 characters (ensures context isn't lost between chunks)

How to Choose Chunk Size

  • Short, factual content (FAQs, product specs): Use smaller chunks (500-1000 characters)
  • Long, narrative content (policies, manuals): Use larger chunks (1500-2000 characters)
  • Technical documentation: Use medium chunks (1000-1500 characters)
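
To make chunk size and overlap concrete, here is a simplified sketch of a fixed-size character splitter. Note that this is not the Recursive Character Text Splitter recommended above, which additionally tries to break on paragraph and sentence boundaries; the sketch only shows how the two numbers interact. The function name `split_text` is made up for the example.

```python
def split_text(text: str, chunk_size: int = 1500, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A 4000-character stand-in document.
sample = "".join(str(i % 10) for i in range(4000))
pieces = split_text(sample)
print([len(p) for p in pieces])            # [1500, 1500, 1400]
print(pieces[0][-200:] == pieces[1][:200])  # True: the last 200 characters repeat
```

With the recommended settings, each new chunk starts 1300 characters after the previous one, so a sentence that straddles a boundary appears in full in at least one chunk as long as it is shorter than the overlap.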

Step 4: Preview Your Chunks

Before processing, click "Preview" to see how your document will be split. Check that:

  • Important information isn't cut in half
  • Chunks are a reasonable size
  • Metadata is attached correctly

If the chunks don't look right, adjust the chunk size and overlap, then preview again.
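
If you export or copy your chunks, the same checks can be roughly automated. The sketch below is a hypothetical helper (not part of the product) that reports chunk sizes and flags chunks that do not end with sentence punctuation, which is a crude hint that a sentence was cut in half.

```python
def preview_report(chunks: list[str], max_size: int = 1500) -> dict:
    """Summarize chunk sizes and flag chunks that look cut off mid-sentence."""
    lengths = [len(c) for c in chunks]
    cut_off = [
        i for i, c in enumerate(chunks)
        if c.strip() and c.strip()[-1] not in ".!?:\"')"
    ]
    return {
        "count": len(chunks),
        "min_len": min(lengths, default=0),
        "max_len": max(lengths, default=0),
        "oversized": [i for i, n in enumerate(lengths) if n > max_size],
        "possibly_cut_off": cut_off,
    }


# Made-up chunks: the second one stops in the middle of a sentence.
report = preview_report([
    "Refunds are issued within 14 days.",
    "To request a refund, contact support and provide your",
    "order number.",
])
print(report["possibly_cut_off"])  # [1]
```

The punctuation check is only a heuristic: headings and list items often end without a period, so treat flagged chunks as candidates to inspect by eye.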

Step 5: Process the Document

Click "Process" to split your document into chunks. You'll see a list of all the chunks with their content and metadata. You can:

  • Edit individual chunks to fix errors
  • Delete irrelevant chunks
  • Add new chunks manually

Step 6: Configure Upserting

Now you need to store these chunks in a searchable format:

  1. Click "Upsert" (or the upsert configuration button)
  2. Select Embeddings: Choose an embedding model
    • OpenAI Embeddings (text-embedding-ada-002): Widely used and a reasonable default
    • Google Embeddings: Good alternative
  3. Select Vector Store: Choose where to store the data
    • In-Memory: Quick for testing (data disappears on restart)
    • Pinecone: Cloud-based, production-ready
    • Qdrant: High-performance vector database
    • Upstash: Serverless, easy to set up
  4. Record Manager (optional): Prevents duplicate data when you re-upsert
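
To illustrate what the optional Record Manager is for, here is a minimal sketch of duplicate detection by content hash. It is an assumption about the general idea, not the actual implementation: the class name and method are hypothetical, and a real record manager usually also tracks sources and cleans up records for deleted content.

```python
import hashlib


class RecordManager:
    """Hypothetical in-memory record manager keyed by a hash of chunk content."""

    def __init__(self):
        self._seen: set[str] = set()

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def new_chunks(self, chunks: list[str]) -> list[str]:
        """Return only chunks not recorded before, and record them."""
        fresh = []
        for text in chunks:
            key = self._key(text)
            if key not in self._seen:
                self._seen.add(key)
                fresh.append(text)
        return fresh


manager = RecordManager()
first = manager.new_chunks(["Policy A", "Policy B"])
second = manager.new_chunks(["Policy A", "Policy B", "Policy C"])
print(len(first), len(second))  # 2 1
```

On the second upsert only "Policy C" would be embedded and stored, which avoids duplicate search results and saves embedding calls.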

Step 7: Upsert!

Click the "Upsert" button. This will:

  1. Convert all your chunks into number representations (embeddings)
  2. Store them in your chosen vector database
  3. Make them searchable by your AI agents

Step 8: Test Your Knowledge Base

Click the "Retrieval Query" button to test:

  1. Type a question related to your document
  2. See which chunks are returned
  3. Verify the results are relevant

Using Your Document Store in Flows

  1. Add an Agent node to your flow
  2. In the Agent's settings, scroll to Knowledge (Document Stores)
  3. Click "+ Add" and select your Document Store
  4. Write a description of what the knowledge contains (this helps the AI know when to search it)

In a Chatflow

  1. Add a Document Store (Vector) node or the specific Vector Store node
  2. Connect it to your chain along with the matching embedding model

Tips for Better Knowledge Bases

1. Quality In = Quality Out

The AI can only be as good as the documents you give it. Make sure your documents are:

  • Accurate and up-to-date
  • Well-organized with clear headings
  • Free of irrelevant content (remove headers, footers, page numbers if possible)

2. Write Good Knowledge Descriptions

When adding a Document Store to an Agent, the description tells the AI when to search it. Be specific:

❌ Bad: "Company information"

✅ Good: "Contains the complete HR policy handbook for 2024, including leave policies, benefits, code of conduct, and employee onboarding procedures"

3. Use Multiple Document Stores

Instead of putting everything in one store, organize by topic:

  • "Product Catalog": for product questions
  • "HR Policies": for employee questions
  • "Technical Documentation": for technical support

4. Keep Documents Updated

When your source documents change, re-upsert to keep the knowledge base current. You can use the Refresh API to automate this.

5. Test with Real Questions

After setting up, test with the actual questions your users will ask. If the AI gives wrong answers, check:

  • Are the right chunks being retrieved? (Use the Retrieval Query test)
  • Is the chunk size appropriate?
  • Does the document actually contain the answer?
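
One way to make this kind of testing repeatable is to keep a small list of real questions together with the chunk you expect each one to retrieve, then measure how often the expected chunk shows up in the top results. The sketch below uses a toy keyword-overlap retriever as a stand-in for your actual retrieval step; the chunk ids, questions, and function names are all invented for illustration.

```python
import re


def words(text: str) -> set[str]:
    """Lowercase word set used by the toy retriever."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    """Toy retriever: rank chunk ids by number of words shared with the question."""
    q = words(question)
    ranked = sorted(chunks, key=lambda cid: len(q & words(chunks[cid])), reverse=True)
    return ranked[:k]


def hit_rate(test_cases: list[tuple[str, str]], chunks: dict[str, str], k: int = 2) -> float:
    """Fraction of questions whose expected chunk id appears in the top-k results."""
    hits = sum(expected in retrieve(q, chunks, k) for q, expected in test_cases)
    return hits / len(test_cases)


chunks = {
    "leave": "Employees receive 20 days of paid vacation per year.",
    "expenses": "Expense reports must be submitted within 30 days.",
    "remote": "Remote work requires manager approval.",
}
test_cases = [
    ("How many vacation days do employees get?", "leave"),
    ("When are expense reports due?", "expenses"),
    ("Am I allowed to telecommute?", "remote"),
]
print(round(hit_rate(test_cases, chunks, k=1), 2))  # 0.67
```

The third question misses because the toy retriever only matches exact words and "telecommute" never appears in the document. A real embedding model often handles synonyms better, but the same pattern applies: a miss tells you to look at the retrieved chunks first, before blaming the model's answer.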

What's Next?

Your agent now has knowledge! Next, let's give it the ability to take action by connecting tools and APIs.

👉 Connecting Tools & APIs