Train on Your Website
Train your AI agent on content from any website. There are two ways to add URLs for training:
Quick Start: Add URLs Directly
The fastest way to begin training is by adding URLs directly:
- Enter the website URL you want to train on (must start with https://)
- Click the "+" button to add a new URL source
- Click "Start Training" to begin immediately
This method is perfect when you know exactly which pages you want to include and want to start training right away.
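If you prepare a list of URLs ahead of time, it can help to check the https:// requirement before pasting them in. The following is a minimal sketch using only the Python standard library; the function name `is_valid_training_url` is hypothetical and the product's own validation may be stricter or different.

```python
from urllib.parse import urlparse


def is_valid_training_url(url: str) -> bool:
    """Return True if the URL uses the https scheme and names a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme == "https" and bool(parsed.netloc)


candidates = [
    "https://example.com/pricing",
    "http://example.com/pricing",  # rejected: scheme is http, not https
    "example.com/pricing",         # rejected: no scheme at all
]

# Keep only the URLs that satisfy the https:// requirement.
valid = [u for u in candidates if is_valid_training_url(u)]
print(valid)
```

Running a check like this first avoids adding sources one by one only to find that some are rejected.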

Advanced: Website Crawling
For more comprehensive training, use the website crawler to automatically discover and select multiple pages from a website:
Step 1: Enter Base URL
- Enter the base URL of the website you want to crawl
- Click "Crawl Website" to open the crawler settings

Step 2: Configure Crawl Settings
The crawler offers several configuration options:
Sitemap Options
- Use Sitemap: Enable this to automatically find and use the website's sitemap
- Custom Sitemap URL: Provide a specific sitemap URL if you know it
- If no sitemap is provided, the system will attempt to guess the sitemap location
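To see what a sitemap contributes before crawling, you can inspect one yourself. The sketch below is illustrative only: the paths in `COMMON_SITEMAP_PATHS` are conventional locations and an assumption, not the product's documented guessing logic, and both function names are hypothetical. It parses a sitemap supplied as a string, so it runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Conventional sitemap locations (an assumption, not the crawler's actual list).
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml"]


def candidate_sitemap_urls(base_url: str) -> list[str]:
    """Build likely sitemap URLs at the root of the site."""
    return [urljoin(base_url, path) for path in COMMON_SITEMAP_PATHS]


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Extract every <loc> value from a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    locs = []
    for element in root.iter():
        # Strip the "{namespace}" prefix so namespaced and bare tags both match.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            locs.append(element.text.strip())
    return locs


sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/docs/intro</loc></url>"
    "</urlset>"
)

print(candidate_sitemap_urls("https://example.com/docs/"))
print(urls_from_sitemap(sample))
```

If your sitemap lives somewhere unusual, supplying it through Custom Sitemap URL avoids relying on any guessing.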
Crawling Options
- Enable Crawling: Toggle this to crawl linked pages from the base URL
- URL Match Patterns: Specify patterns to match specific URLs (e.g., /blog/*, /docs/*)
- Cookies: Add authentication cookies if needed to access protected content
- Max Pages: Set the maximum number of pages to crawl (adjust based on your needs)
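To get a feel for how match patterns and a page limit interact, here is a rough sketch. It assumes glob-style matching against the URL path using Python's `fnmatch`, which may not be the exact pattern syntax the crawler uses; `filter_urls` and the sample URLs are hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def filter_urls(urls, patterns, max_pages):
    """Keep URLs whose path matches any pattern, stopping at max_pages."""
    kept = []
    for url in urls:
        if len(kept) >= max_pages:
            break  # page limit reached; ignore the remaining URLs
        path = urlparse(url).path or "/"
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            kept.append(url)
    return kept


discovered = [
    "https://example.com/",
    "https://example.com/blog/launch",
    "https://example.com/docs/setup",
    "https://example.com/docs/api/auth",
    "https://example.com/careers",
]

# With a limit of 2, only the first two matching URLs are kept.
print(filter_urls(discovered, ["/blog/*", "/docs/*"], max_pages=2))
```

Note that in `fnmatch` the `*` wildcard also matches `/`, so `/docs/*` covers nested paths such as `/docs/api/auth` in this sketch. Check how the crawler treats nested paths before relying on that behavior.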

Step 3: Review and Select URLs
After the crawler completes:
- Review the discovered URLs in the results list
- Use search to filter through the found URLs
- Select/deselect URLs using:
  - Individual checkboxes for specific URLs
  - "Select All" to choose all URLs at once
  - Range selection (From/To) to select URLs by position
  - "URL Only" filter to exclude file downloads
- Click "Add" to add selected URLs to your training sources
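The selection tools above can be pictured as simple list operations. This sketch assumes that From/To positions are 1-based and inclusive, and that "URL Only" excludes links ending in a known file extension; both are assumptions about the interface, and `DOWNLOAD_EXTENSIONS`, `is_page_url`, and `select_range` are hypothetical names.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical set of extensions treated as file downloads.
DOWNLOAD_EXTENSIONS = {".pdf", ".zip", ".docx", ".xlsx", ".png", ".jpg"}


def is_page_url(url: str) -> bool:
    """Return True unless the URL path ends in a known download extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix not in DOWNLOAD_EXTENSIONS


def select_range(urls, start, end):
    """Return the URLs at 1-based positions start..end, inclusive."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return urls[start - 1:end]


results = [
    "https://example.com/",
    "https://example.com/docs/setup",
    "https://example.com/files/guide.pdf",
    "https://example.com/blog/launch",
]

print(select_range(results, 2, 3))               # positions 2 and 3
print([u for u in results if is_page_url(u)])    # file download removed
```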

Step 4: Final Training Setup
Once URLs are added to your sources:
- Review your URL list: all added URLs appear in your training sources
- Configure individual URL settings if needed (match patterns, language, etc.)
- Click "Start Training" to begin the training process

Best Practices
URL Selection
- Start with key pages: Focus on the most important content first
- Use match patterns: Leverage URL patterns to include/exclude specific sections
- Review content quality: Ensure selected pages contain relevant, high-quality information
Crawl Settings
- Reasonable page limits: To avoid overwhelming the system, don't crawl more pages than you need
- Use sitemaps when available: Sitemaps provide the most efficient way to discover all relevant pages
- Test with small batches: Start with a smaller number of pages to test the quality of extracted content
Content Optimization
- Check accessibility: Ensure URLs are publicly accessible or provide necessary authentication
- Avoid duplicate content: Remove similar or duplicate pages to improve training efficiency
- Focus on text content: Pages with primarily text content work best for training
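One common source of duplicate content is the same page listed under slightly different URLs. The sketch below shows one way to collapse such variants before adding them as sources. The normalization rules (lowercase host, no trailing slash, no fragment) are assumptions chosen for illustration, and the helper names are hypothetical. It catches only URL-level duplicates, not distinct pages with similar text.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Reduce a URL to a comparison key: lowercase host, no trailing slash, no fragment."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe(urls):
    """Keep the first URL seen for each normalized key, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


sources = [
    "https://example.com/docs/",
    "https://Example.com/docs",
    "https://example.com/docs#install",
    "https://example.com/docs?page=2",
]

print(dedupe(sources))
```

Query strings are kept in the key here because they often select different content; drop them only if you know they do not matter for your site.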