What Is a Robots.txt File?
A robots.txt file is a plain text file placed in your website's root directory that instructs search engine crawlers and bots which pages or sections they can access. It follows the Robots Exclusion Protocol, originally proposed in 1994 and formally standardized as RFC 9309 in 2022. Every major search engine — Google, Bing, Yahoo, Yandex, and DuckDuckGo — reads and respects robots.txt directives before crawling your site.
The file must be located at https://yourdomain.com/robots.txt, and it is the first file well-behaved crawlers check when they visit your website. Without a robots.txt file, crawlers assume they can access everything. With one, you can control crawl budget, protect private areas like admin panels and staging environments, prevent duplicate content from being indexed, and — increasingly important — block AI bots from scraping your content for training data.
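If you want to see how crawlers interpret a given file, Python's built-in urllib.robotparser can answer "may this bot fetch this URL?" directly. A minimal sketch, using example.com as a placeholder domain:

```python
from urllib.robotparser import RobotFileParser

# example.com is a placeholder; point this at your own domain.
rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetches and parses the live robots.txt over HTTP

# Ask whether a given crawler may fetch a given URL under those rules.
print(rp.can_fetch("Googlebot", "https://example.com/admin/"))
print(rp.can_fetch("GPTBot", "https://example.com/blog/"))
```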
Our free robots.txt generator lets you create properly formatted robots.txt files using a visual editor — no need to memorize the syntax. Choose from presets, add custom rules for 20+ crawlers, and download your file ready to upload. You can verify your site's HTTP response headers with our HTTP Header Checker and detect your website's CMS platform using the CMS Detector.

How to Create a Robots.txt File
Our robots.txt generator makes creating a properly formatted file easy in four steps:
Choose a Preset or Start from Scratch
Select from four quick presets — Allow All, Block All, Standard (blocks /admin/ and /api/), or Block AI Bots — or start with a blank slate and build your rules from scratch.
Add Crawler Rules
Select a user-agent (Googlebot, Bingbot, GPTBot, ClaudeBot, etc.) and set Allow or Disallow directives for specific paths. Add as many rules as you need for fine-grained control.
Set Sitemap URL and Crawl Delay
Optionally add your XML sitemap URL so crawlers can discover all your pages. Set a crawl-delay value to limit how frequently bots request pages from your server.
Download and Upload to Your Server
Preview the generated output, then copy to clipboard or download the robots.txt file. Upload it to your website's root directory so it's accessible at yourdomain.com/robots.txt.
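After uploading, it's worth confirming the file is actually served from the root with a 200 status and a plain-text content type. A minimal check, assuming Python is available and using example.com as a placeholder:

```python
from urllib.request import urlopen

# Placeholder domain; replace with your own site after uploading the file.
with urlopen("https://example.com/robots.txt") as resp:
    print(resp.status)                        # expect 200
    print(resp.headers.get("Content-Type"))   # ideally text/plain
    print(resp.read().decode("utf-8", "replace")[:300])  # first directives
```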
Robots.txt Directives Reference
Every robots.txt file uses a set of directives to communicate with crawlers. Here are the six core directives you need to know:
User-agent
Specifies which crawler the rules apply to. Use * for all bots, or a specific name like Googlebot, Bingbot, or GPTBot. Each rule block starts with a User-agent directive.
User-agent: Googlebot
Disallow
Tells the crawler not to access a specific path or directory. An empty Disallow (Disallow:) means nothing is blocked. This is the most commonly used directive in robots.txt.
Disallow: /admin/
Allow
Explicitly permits access to a path, overriding a broader Disallow rule. Useful for allowing specific files within a blocked directory. Supported by Google and Bing.
Allow: /admin/public/
Crawl-delay
Requests crawlers to wait N seconds between requests. Helps reduce server load. Google ignores this directive (use Search Console instead), but Bing and Yandex respect it.
Crawl-delay: 10
Sitemap
Points crawlers to your XML sitemap for better page discovery. Placed outside any User-agent block. You can include multiple Sitemap directives for separate sitemaps.
Sitemap: https://example.com/sitemap.xml
Host
Historically used by Yandex to specify the preferred domain version (www vs non-www). Yandex deprecated this directive in 2018 in favor of canonical tags and 301 redirects. Rarely needed today.
Host: https://example.com
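Putting these directives together, a complete file might look like the sketch below. The domain and paths are placeholders, and Host is omitted because it is deprecated:

User-agent: *
Disallow: /admin/
Allow: /admin/public/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml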
Common Robots.txt Examples
Here are four common robots.txt configurations you can use as starting points. Our generator includes these as one-click presets:
Allow All Crawlers
Allows all bots to crawl your entire site. This is the most permissive configuration.
User-agent: *
Allow: /
Block All Crawlers
Blocks all bots from crawling any page. Useful for staging or development sites.
User-agent: *
Disallow: /
Block AI Bots Only
Allows search engines but blocks AI training crawlers from scraping your content.
User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
WordPress Standard
Common setup for WordPress sites blocking admin, API, and common private paths.
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-json/
Disallow: /trackback/

Sitemap: https://example.com/sitemap.xml
How to Block AI Bots with Robots.txt
With the rise of AI language models, many website owners want to prevent AI companies from scraping their content for training data. Robots.txt is the primary way to communicate this preference. Major AI companies have created specific user-agent identifiers for their crawlers, and most respect robots.txt directives.
Here are the AI bot user-agents you can block using our robots.txt generator:
GPTBot
Operated by OpenAI. Used to crawl pages for GPT model training. Blocking this prevents your content from being used in future GPT models.
ChatGPT-User
Operated by OpenAI. Used when ChatGPT fetches pages via Browse mode. Blocking this prevents ChatGPT from reading your content live.
ClaudeBot
Operated by Anthropic. Web crawler used for Claude AI training. Respects robots.txt directives for content exclusion.
Claude-User
Operated by Anthropic. Used when Claude fetches pages via user-initiated browsing. Block both ClaudeBot and Claude-User for complete Anthropic coverage.
Google-Extended
Operated by Google. Controls whether Google uses your content for AI training (separate from Googlebot). Blocking this stops AI training use while keeping search indexing.
CCBot
Operated by Common Crawl. Builds the open datasets used by many AI companies. Widely used as a source of AI training data.
PerplexityBot
Operated by Perplexity. Web crawler for Perplexity's AI-powered search engine. Respects robots.txt directives.
Bytespider
Operated by ByteDance. Crawler used for AI training and content indexing. Associated with TikTok and other ByteDance products.
Use our "Block AI Bots" preset above to add all these rules with one click. Note that blocking AI bots does not affect your search engine rankings — Google-Extended is separate from Googlebot, so blocking AI training does not impact Google Search indexing.
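The preset covers GPTBot, ClaudeBot, Google-Extended, and CCBot. If you want to opt out of every crawler listed above, one possible expanded version looks like this (trim the list to match your own policy, and keep in mind that operators may rename or add user-agents over time):

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Bytespider
Disallow: /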
Robots.txt vs Other Access Control Methods
Robots.txt is just one way to control crawler access. Here's how it compares to other methods:
| Method | Type | Scope | Enforcement | Best For |
|---|---|---|---|---|
| robots.txt | Text file | Site-wide | Advisory | Controlling crawl behavior, blocking sections |
| .htaccess | Server config | Directory | Mandatory | Hard blocking by IP, user-agent, or pattern |
| Meta robots | HTML tag | Per page | Advisory | Noindex, nofollow on specific pages |
| X-Robots-Tag | HTTP header | Per response | Advisory | Noindex for PDFs, images, non-HTML files |
For most websites, a combination of robots.txt (for crawler guidance) and meta robots tags (for per-page indexing control) provides the best coverage. Use our HTTP Headers tool to check if your server sends X-Robots-Tag headers, and our DNS Lookup to verify your domain's DNS configuration.
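If you prefer to script the header check, a minimal sketch using Python's standard library prints whatever X-Robots-Tag header the server sends. The URL is a placeholder; replace it with a page or PDF on your own site:

```python
from urllib.request import Request, urlopen

# Placeholder URL; replace with a page or PDF on your own site.
req = Request("https://example.com/report.pdf", method="HEAD")
with urlopen(req) as resp:
    # Prints None if the server sends no X-Robots-Tag header.
    print(resp.headers.get("X-Robots-Tag"))
```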

Robots.txt Best Practices
Follow these best practices when creating and maintaining your robots.txt file:
Always Include a Sitemap
Add a Sitemap directive pointing to your XML sitemap. This helps crawlers discover all your important pages, even those with few internal links.
Block Sensitive Directories
Block paths like /admin/, /api/, /private/, /staging/, and /tmp/. Keep in mind that robots.txt isn't a security measure, and a blocked URL can still appear in search results (without its content) if other sites link to it, so pair it with authentication or a noindex directive for anything truly sensitive.
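As a sketch, the corresponding block might look like this (keep only the paths that actually exist on your site):

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /private/
Disallow: /staging/
Disallow: /tmp/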
Use Specific User-Agent Rules
Instead of blocking everything for all bots, use targeted rules. For example, block AI bots specifically while allowing search engines full access for better SEO.
Test Before Deploying
Use Google Search Console's robots.txt tester to verify your rules work as expected. A single typo can accidentally block your entire site from being indexed.
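You can also sanity-check a draft file locally before uploading it. A minimal sketch, assuming Python, parses the draft in memory and confirms that key pages stay crawlable while blocked sections do not (the paths shown are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Draft rules to sanity-check before uploading (hypothetical paths).
draft = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /staging/
"""

rp = RobotFileParser()
rp.parse(draft.splitlines())

# Key pages should stay crawlable; blocked sections should not.
print(rp.can_fetch("Googlebot", "https://example.com/"))           # expected: True
print(rp.can_fetch("Googlebot", "https://example.com/wp-admin/"))  # expected: False
```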
Don't Block CSS or JavaScript
Googlebot needs access to CSS and JS files to render your pages correctly. Blocking these can hurt your SEO rankings because Google can't see your page as users do.
Keep It Simple and Maintain It
Avoid overly complex rules. Review your robots.txt periodically as your site structure changes. Remove rules for paths that no longer exist.
Related Tools
Complement your robots.txt configuration with these free tools for SEO, security, and website analysis:
Check HTTP response headers including security headers, X-Robots-Tag, and caching directives.
Detect the CMS platform, JavaScript frameworks, and technologies used by any website.
Check all DNS records (A, AAAA, CNAME, MX, NS, TXT) for any domain name.
Verify SSL certificate validity, expiration, and security configuration of any website.
Trace HTTP redirect chains and verify 301/302 redirects are configured correctly.
Analyze internal and external links on any web page for SEO and broken link detection.
Create DMARC records for email authentication with an interactive wizard.
Detect your browser user agent string, OS, device type, and rendering engine.
Frequently Asked Questions About Robots.txt
What is a robots.txt file?
A robots.txt file is a plain text file in your website's root directory that tells crawlers which pages they can and cannot access. It follows the Robots Exclusion Protocol. All major search engines respect robots.txt directives.
How do I create a robots.txt file?
Use our free generator above: choose a preset or add custom rules, set your sitemap URL and crawl-delay, then download the file and upload it to your website's root directory at yourdomain.com/robots.txt.
Where should I place the robots.txt file?
It must be in your website's root directory, accessible at https://yourdomain.com/robots.txt. Each subdomain needs its own robots.txt file. Placing it in a subdirectory won't work.
How do I block AI bots like GPTBot and ClaudeBot?
Add User-agent: GPTBot with Disallow: / for each AI bot. Our generator has a 'Block AI Bots' preset that adds rules for GPTBot, ClaudeBot, Google-Extended, and CCBot with one click.
What is the difference between Allow and Disallow?
Disallow blocks crawlers from a path. Allow explicitly permits access, overriding a broader Disallow rule. When both match, the most specific (longest path) rule wins. When specificity is equal, the Allow directive takes precedence.
Does robots.txt actually block crawlers?
It's advisory, not mandatory. Major search engines respect it, but malicious bots may ignore it. For hard blocking, use .htaccess rules, authentication, or firewalls. Robots.txt is a polite request, not a security measure.
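As an illustration, a minimal .htaccess sketch (assuming Apache with mod_rewrite enabled; GPTBot is just an example user-agent) that enforces the block at the server level rather than merely requesting it:

```apache
# Return 403 Forbidden to any request whose User-Agent contains "GPTBot".
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} GPTBot [NC]
RewriteRule .* - [F,L]
```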
What is crawl-delay in robots.txt?
Crawl-delay asks crawlers to wait N seconds between requests. Useful for servers with limited resources. Google ignores it (use Search Console instead), but Bing and Yandex respect it.
Should I include a Sitemap directive?
Yes, it's a best practice. It helps crawlers discover your XML sitemap. The Sitemap directive goes outside any User-agent block and you can include multiple sitemaps.
Can I use wildcards in robots.txt?
Yes, Google and Bing support * (matches any characters) and $ (matches end of URL). For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf. Not all crawlers support wildcards.
How do I test if my robots.txt is working?
Use Google Search Console's robots.txt Tester, visit yourdomain.com/robots.txt in a browser, or use our HTTP Headers tool to check the response. Google's URL Inspection tool also shows if pages are blocked.