Train on Your Website

Train your AI agent on content from any website. There are two ways to add URLs for training: add them directly, or discover them with the website crawler.

Quick Start: Add URLs Directly

The fastest way to begin training is by adding URLs directly:

  1. Enter the website URL you want to train on (must start with https://)
  2. Click the "+" button to add a new URL source
  3. Click "Start Training" to begin immediately

This method works best when you already know exactly which pages to include and want to start training right away.
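If you prepare a list of URLs before pasting them in, a quick pre-check can catch entries that do not meet the https:// requirement. The following is a minimal sketch, not the product's own validation; the function name `is_valid_training_url` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def is_valid_training_url(url: str) -> bool:
    """Return True if the URL uses the https scheme and names a host."""
    parts = urlsplit(url.strip())
    return parts.scheme == "https" and bool(parts.netloc)


# Hypothetical example URLs, for illustration only.
candidates = [
    "https://example.com/docs/getting-started",
    "http://example.com/pricing",  # rejected: http, not https
    "example.com/blog",            # rejected: no scheme at all
]

accepted = [url for url in candidates if is_valid_training_url(url)]
print(accepted)
```

Only the first URL passes the check; the other two would need to be rewritten with an https:// prefix before being added.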

[Screenshot: URL quick training]

Advanced: Website Crawling

For more comprehensive training, use the website crawler to automatically discover and select multiple pages from a website:

Step 1: Enter Base URL

  1. Enter the base URL of the website you want to crawl
  2. Click "Crawl Website" to open the crawler settings
[Screenshot: URL crawling]

Step 2: Configure Crawl Settings

The crawler offers several configuration options:

Sitemap Options

  • Use Sitemap: Enable this to automatically find and use the website's sitemap
  • Custom Sitemap URL: Provide a specific sitemap URL if you know it
  • If no sitemap is provided, the system will attempt to guess the sitemap location
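To make the sitemap options concrete, here is a rough sketch of what sitemap discovery and parsing can look like. It is an assumption-laden illustration: the locations tried in `candidate_sitemap_urls` are common web conventions, not the crawler's documented guess list, and both function names are hypothetical. The `<urlset>`/`<loc>` structure follows the public sitemaps.org format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# XML namespace used by standard sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def candidate_sitemap_urls(base_url: str) -> list[str]:
    """Build conventional sitemap locations for a site (assumed, not documented)."""
    return [urljoin(base_url, path) for path in ("/sitemap.xml", "/sitemap_index.xml")]


def page_urls_from_sitemap(xml_text: str) -> list[str]:
    """Extract every <loc> entry from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


# A tiny hand-written sitemap, so the sketch runs without network access.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/setup</loc></url>
</urlset>"""

print(candidate_sitemap_urls("https://example.com/docs/"))
print(page_urls_from_sitemap(SAMPLE_SITEMAP))
```

If the site publishes its sitemap somewhere unusual, none of the guessed locations will match, which is the case the Custom Sitemap URL field covers.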

Crawling Options

  • Enable Crawling: Toggle this to crawl linked pages from the base URL
  • URL Match Patterns: Specify patterns to match specific URLs (e.g., /blog/*, /docs/*)
  • Cookies: Add authentication cookies if needed to access protected content
  • Max Pages: Set the maximum number of pages to crawl (adjust based on your needs)
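The interplay between match patterns and the page limit can be sketched as follows. This assumes that patterns such as /blog/* behave like shell-style globs applied to the URL path; the crawler's exact matching rules are not specified here, and `filter_urls` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def filter_urls(urls: list[str], patterns: list[str], max_pages: int) -> list[str]:
    """Keep URLs whose path matches any pattern, stopping at max_pages."""
    selected: list[str] = []
    for url in urls:
        if len(selected) >= max_pages:
            break
        path = urlsplit(url).path or "/"
        # With glob matching, "*" also spans "/" so /blog/* covers nested paths,
        # but it does not match the bare "/blog" page itself.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            selected.append(url)
    return selected


# Hypothetical crawl results, for illustration only.
discovered = [
    "https://example.com/blog/post-1",
    "https://example.com/pricing",
    "https://example.com/docs/setup",
    "https://example.com/blog/post-2",
]

print(filter_urls(discovered, ["/blog/*", "/docs/*"], max_pages=2))
```

With a limit of 2, the second blog post is never reached even though it matches, which is why a low Max Pages value can hide pages you expected to see.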
[Screenshot: Chatislav URL crawl modal]

Step 3: Review and Select URLs

After the crawler completes:

  1. Review the discovered URLs in the results list
  2. Use search to filter through the found URLs
  3. Select/deselect URLs using:
    • Individual checkboxes for specific URLs
    • "Select All" to choose all URLs at once
    • Range selection (From/To) to select URLs by position
    • "URL Only" filter to exclude file downloads
  4. Click "Add" to add selected URLs to your training sources
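The range selection and "URL Only" filter described above can be pictured with a short sketch. Two assumptions are made here that the interface may not share: From/To positions are treated as 1-based and inclusive, and a file download is recognized by a small, hypothetical list of file extensions.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Assumed set of extensions that mark a URL as a file download.
FILE_EXTENSIONS = {".pdf", ".zip", ".docx", ".xlsx", ".png", ".jpg"}


def is_file_download(url: str) -> bool:
    """Guess whether a URL points at a file rather than a web page."""
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    return suffix in FILE_EXTENSIONS


def select_range(urls: list[str], start: int, end: int) -> list[str]:
    """Return the URLs at positions start..end (1-based, inclusive)."""
    return urls[max(start, 1) - 1 : end]


def url_only(urls: list[str]) -> list[str]:
    """Drop URLs that look like file downloads."""
    return [url for url in urls if not is_file_download(url)]


# Hypothetical crawl results, for illustration only.
results = [
    "https://example.com/",
    "https://example.com/files/guide.pdf",
    "https://example.com/docs/setup",
    "https://example.com/blog/post-1",
]

in_range = select_range(results, 2, 3)
print(url_only(in_range))
```

Here the range picks the second and third results, and the filter then drops the PDF, leaving only the documentation page.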
[Screenshot: URL list]

Step 4: Final Training Setup

Once URLs are added to your sources:

  1. Review your URL list: all added URLs appear in your training sources
  2. Configure individual URL settings if needed (match patterns, language, etc.)
  3. Click "Start Training" to begin the training process
[Screenshot: URLs added]

Best Practices

URL Selection

  • Start with key pages: Focus on the most important content first
  • Use match patterns: Leverage URL patterns to include/exclude specific sections
  • Review content quality: Ensure selected pages contain relevant, high-quality information

Crawl Settings

  • Set reasonable page limits: Crawl only as many pages as you need, so the system is not overwhelmed
  • Use sitemaps when available: Sitemaps provide the most efficient way to discover all relevant pages
  • Test with small batches: Start with a smaller number of pages to test the quality of extracted content

Content Optimization

  • Check accessibility: Ensure URLs are publicly accessible, or add the authentication cookies needed to reach protected content
  • Avoid duplicate content: Remove similar or duplicate pages to improve training efficiency
  • Focus on text content: Pages with primarily text content work best for training
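For the duplicate-content advice, one simple pre-check is to normalize URLs before adding them, so that trivially different addresses for the same page are caught. This is a sketch under assumed normalization rules (case-insensitive host, trailing slash and fragment ignored); it only detects URL-level duplicates, not pages with similar text, and the function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Reduce a URL to a comparison key: lowercase host, no trailing slash, no fragment."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each normalized key, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


# Hypothetical source list, for illustration only.
sources = [
    "https://example.com/docs/",
    "https://Example.com/docs#intro",
    "https://example.com/pricing",
]

print(dedupe(sources))
```

The second entry is dropped because it differs from the first only in host capitalization, the trailing slash, and a fragment.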