
Robots.txt Generator

Create, customize, and download robots.txt files with our free visual editor. Choose from presets, add rules for 20+ crawlers, block AI bots like GPTBot and ClaudeBot, and generate your robots.txt file in seconds.


What Is a Robots.txt File?

A robots.txt file is a plain text file placed in your website's root directory that instructs search engine crawlers and bots which pages or sections they can access. It follows the Robots Exclusion Protocol, originally proposed in 1994 and formally standardized as RFC 9309 in 2022. Every major search engine — Google, Bing, Yahoo, Yandex, and DuckDuckGo — reads and respects robots.txt directives before crawling your site.

The file is always located at https://yourdomain.com/robots.txt and is the first file crawlers check when they visit your website. Without a robots.txt file, crawlers assume they can access everything. With one, you can control crawl budgets, protect private areas like admin panels and staging environments, prevent duplicate content from being indexed, and — increasingly important — block AI bots from scraping your content for training data.

Our free robots.txt generator lets you create properly formatted robots.txt files using a visual editor — no need to memorize the syntax. Choose from presets, add custom rules for 20+ crawlers, and download your file ready to upload. You can verify your site's HTTP response headers with our HTTP Header Checker and detect your website's CMS platform using the CMS Detector.

A robots.txt file controls which crawlers can access your website — allowing search engines while blocking AI bots and scrapers

How to Create a Robots.txt File

Our robots.txt generator makes creating a properly formatted file easy in four steps:

1. Choose a Preset or Start from Scratch

Select from four quick presets — Allow All, Block All, Standard (blocks /admin/ and /api/), or Block AI Bots — or start with a blank slate and build your rules from scratch.

2. Add Crawler Rules

Select a user-agent (Googlebot, Bingbot, GPTBot, ClaudeBot, etc.) and set Allow or Disallow directives for specific paths. Add as many rules as you need for fine-grained control.

3. Set Sitemap URL and Crawl Delay

Optionally add your XML sitemap URL so crawlers can discover all your pages. Set a crawl-delay value to limit how frequently bots request pages from your server.

4. Download and Upload to Your Server

Preview the generated output, then copy to clipboard or download the robots.txt file. Upload it to your website's root directory so it's accessible at yourdomain.com/robots.txt.
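The steps above boil down to assembling directive lines into user-agent blocks. A minimal Python sketch of what such a generator produces (the rule list and sitemap URL here are hypothetical examples, not part of the tool's actual implementation):

```python
# Build a robots.txt body from (user-agent, directive, path) rules -- a
# simplified sketch of what the visual generator assembles for you.
def build_robots_txt(rules, sitemap=None, crawl_delay=None):
    # Group directives into one block per user-agent, preserving input order.
    agents = {}
    for agent, directive, path in rules:
        agents.setdefault(agent, []).append(f"{directive}: {path}")

    lines = []
    for agent, directives in agents.items():
        lines.append(f"User-agent: {agent}")
        lines.extend(directives)
        if crawl_delay is not None:
            lines.append(f"Crawl-delay: {crawl_delay}")
        lines.append("")  # blank line separates blocks

    if sitemap:
        # Sitemap belongs outside any User-agent block.
        lines.append(f"Sitemap: {sitemap}")
    return "\n".join(lines).strip() + "\n"

print(build_robots_txt(
    [("*", "Allow", "/"), ("GPTBot", "Disallow", "/")],
    sitemap="https://example.com/sitemap.xml",
))
```

The blank line between blocks is conventional rather than required, but it keeps the file readable as rules accumulate.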

Robots.txt Directives Reference

Every robots.txt file uses a set of directives to communicate with crawlers. Here are the six core directives you need to know:

User-agent

Specifies which crawler the rules apply to. Use * for all bots, or a specific name like Googlebot, Bingbot, or GPTBot. Each rule block starts with a User-agent directive.

User-agent: Googlebot

Disallow

Tells the crawler not to access a specific path or directory. An empty Disallow (Disallow:) means nothing is blocked. This is the most commonly used directive in robots.txt.

Disallow: /admin/

Allow

Explicitly permits access to a path, overriding a broader Disallow rule. Useful for allowing specific files within a blocked directory. Supported by Google and Bing.

Allow: /admin/public/

Crawl-delay

Requests crawlers to wait N seconds between requests. Helps reduce server load. Google ignores this directive (use Search Console instead), but Bing and Yandex respect it.

Crawl-delay: 10

Sitemap

Points crawlers to your XML sitemap for better page discovery. Placed outside any User-agent block. You can include multiple Sitemap directives for separate sitemaps.

Sitemap: https://example.com/sitemap.xml

Host

Historically used by Yandex to specify the preferred domain version (www vs non-www). Yandex deprecated this directive in 2018 in favor of canonical tags and 301 redirects. Rarely needed today.

Host: https://example.com
The six core robots.txt directives: User-agent, Disallow, Allow, Crawl-delay, Sitemap, and Host
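You can check how a given set of directives is interpreted with Python's standard-library urllib.robotparser; the rules below are illustrative:

```python
from urllib.robotparser import RobotFileParser

# parse() accepts an iterable of robots.txt lines, so you can test rules
# without fetching anything over the network.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "Allow: /admin/public/",
])

# Ask whether a given crawler may fetch a given path.
print(rp.can_fetch("Googlebot", "/index.html"))    # True: no rule matches
print(rp.can_fetch("Googlebot", "/admin/secret"))  # False: blocked by Disallow
```

Note that robotparser follows the original first-match specification, so its handling of Allow rules inside a Disallowed directory can differ from Google's longest-match behavior.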

Common Robots.txt Examples

Here are four common robots.txt configurations you can use as starting points. Our generator includes these as one-click presets:

Allow All Crawlers

Allows all bots to crawl your entire site. This is the most permissive configuration.

User-agent: *
Allow: /

Block All Crawlers

Blocks all bots from crawling any page. Useful for staging or development sites.

User-agent: *
Disallow: /

Block AI Bots Only

Allows search engines but blocks AI training crawlers from scraping your content.

User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

WordPress Standard

Common setup for WordPress sites blocking admin, API, and common private paths.

User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-json/
Disallow: /trackback/

Sitemap: https://example.com/sitemap.xml

How to Block AI Bots with Robots.txt

With the rise of AI language models, many website owners want to prevent AI companies from scraping their content for training data. Robots.txt is the primary way to communicate this preference. Major AI companies have created specific user-agent identifiers for their crawlers, and most respect robots.txt directives.

Here are the AI bot user-agents you can block using our robots.txt generator:

GPTBot

OpenAI

Used to crawl pages for GPT model training. Blocking this prevents your content from being used in future GPT models.

ChatGPT-User

OpenAI

Used when ChatGPT fetches pages via Browse mode. Blocking this prevents ChatGPT from reading your content live.

ClaudeBot

Anthropic

Anthropic's web crawler for Claude AI training. Respects robots.txt directives for content exclusion.

Claude-User

Anthropic

Used when Claude fetches pages via user-initiated browsing. Block both ClaudeBot and Claude-User for complete Anthropic coverage.

Google-Extended

Google

Controls whether Google uses your content for AI training (separate from Googlebot). Blocking this stops AI training while keeping search indexing.

CCBot

Common Crawl

Common Crawl's bot that builds open datasets used by many AI companies. Widely used for AI training data.

PerplexityBot

Perplexity

Perplexity AI's web crawler for their AI-powered search engine. Respects robots.txt directives.

Bytespider

ByteDance

ByteDance's crawler used for AI training and content indexing. Associated with TikTok and other ByteDance products.

Use our "Block AI Bots" preset above to add all these rules with one click. Note that blocking AI bots does not affect your search engine rankings — Google-Extended is separate from Googlebot, so blocking AI training does not impact Google Search indexing.
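The "Block AI Bots" preset reduces to one Disallow-all block per AI user-agent. A short sketch (the agent list mirrors the table above):

```python
# Emit a "Block AI Bots" robots.txt: allow everything for regular crawlers,
# then add one Disallow-all block per known AI crawler user-agent.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-User",
    "Google-Extended", "CCBot", "PerplexityBot", "Bytespider",
]

def block_ai_bots():
    blocks = ["User-agent: *\nAllow: /"]
    blocks += [f"User-agent: {bot}\nDisallow: /" for bot in AI_BOTS]
    return "\n\n".join(blocks) + "\n"

print(block_ai_bots())
```

New AI crawlers appear regularly, so treat the list as a starting point to keep updated rather than a complete inventory.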

Robots.txt vs Other Access Control Methods

Robots.txt is just one way to control crawler access. Here's how it compares to other methods:

Method | Type | Scope | Enforcement | Best For
robots.txt | Text file | Site-wide | Advisory | Controlling crawl behavior, blocking sections
.htaccess | Server config | Directory | Mandatory | Hard blocking by IP, user-agent, or pattern
Meta robots | HTML tag | Per page | Advisory | Noindex, nofollow on specific pages
X-Robots-Tag | HTTP header | Per response | Advisory | Noindex for PDFs, images, non-HTML files

For most websites, a combination of robots.txt (for crawler guidance) and meta robots tags (for per-page indexing control) provides the best coverage. Use our HTTP Headers tool to check if your server sends X-Robots-Tag headers, and our DNS Lookup to verify your domain's DNS configuration.

Robots.txt vs .htaccess vs meta robots vs X-Robots-Tag — choose the right access control method for your needs
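Since HTTP header names are case-insensitive, reading X-Robots-Tag from a response takes a little care. A stdlib-only sketch (the sample headers dict is hypothetical; in practice you would populate it from a HEAD request):

```python
# Report the X-Robots-Tag directive from a response's headers. This header
# applies noindex/nofollow rules at the HTTP level, so it works for PDFs
# and images where no <meta> tag is possible.
def x_robots_tag(headers):
    # Normalize header names, since HTTP treats them case-insensitively.
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-robots-tag")  # None if the header is absent

# Hypothetical captured response headers:
resp_headers = {
    "Content-Type": "application/pdf",
    "X-Robots-Tag": "noindex, nofollow",
}
print(x_robots_tag(resp_headers))  # -> noindex, nofollow
```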

Robots.txt Best Practices

Follow these best practices when creating and maintaining your robots.txt file:

Always Include a Sitemap

Add a Sitemap directive pointing to your XML sitemap. This helps crawlers discover all your important pages, even those with few internal links.

Block Sensitive Directories

Block paths like /admin/, /api/, /private/, /staging/, and /tmp/. Robots.txt isn't a security measure, and a blocked URL can still appear in search results if other sites link to it, so pair blocking with noindex or authentication for truly private areas.

Use Specific User-Agent Rules

Instead of blocking everything for all bots, use targeted rules. For example, block AI bots specifically while allowing search engines full access for better SEO.

Test Before Deploying

Use Google Search Console's robots.txt tester to verify your rules work as expected. A single typo can accidentally block your entire site from being indexed.

Don't Block CSS or JavaScript

Googlebot needs access to CSS and JS files to render your pages correctly. Blocking these can hurt your SEO rankings because Google can't see your page as users do.

Keep It Simple and Maintain It

Avoid overly complex rules. Review your robots.txt periodically as your site structure changes. Remove rules for paths that no longer exist.

Related Tools

Complement your robots.txt configuration with these free tools for SEO, security, and website analysis:

HTTP Headers

Check HTTP response headers including security headers, X-Robots-Tag, and caching directives.

CMS Detector

Detect the CMS platform, JavaScript frameworks, and technologies used by any website.

DNS Lookup

Check all DNS records (A, AAAA, CNAME, MX, NS, TXT) for any domain name.

SSL Checker

Verify SSL certificate validity, expiration, and security configuration of any website.

Redirect Checker

Trace HTTP redirect chains and verify 301/302 redirects are configured correctly.

Link Analyzer

Analyze internal and external links on any web page for SEO and broken link detection.

DMARC Generator

Create DMARC records for email authentication with an interactive wizard.

What Is My User Agent

Detect your browser user agent string, OS, device type, and rendering engine.

Frequently Asked Questions About Robots.txt

What is a robots.txt file?

A robots.txt file is a plain text file in your website's root directory that tells crawlers which pages they can and cannot access. It follows the Robots Exclusion Protocol. All major search engines respect robots.txt directives.

How do I create a robots.txt file?

Use our free generator above: choose a preset or add custom rules, set your sitemap URL and crawl-delay, then download the file and upload it to your website's root directory at yourdomain.com/robots.txt.

Where should I place the robots.txt file?

It must be in your website's root directory, accessible at https://yourdomain.com/robots.txt. Each subdomain needs its own robots.txt file. Placing it in a subdirectory won't work.

How do I block AI bots like GPTBot and ClaudeBot?

Add User-agent: GPTBot with Disallow: / for each AI bot. Our generator has a 'Block AI Bots' preset that adds rules for GPTBot, ClaudeBot, Google-Extended, and CCBot with one click.

What is the difference between Allow and Disallow?

Disallow blocks crawlers from a path. Allow explicitly permits access, overriding a broader Disallow rule. When both match, the most specific (longest path) rule wins. When specificity is equal, the Allow directive takes precedence.

Does robots.txt actually block crawlers?

It's advisory, not mandatory. Major search engines respect it, but malicious bots may ignore it. For hard blocking, use .htaccess rules, authentication, or firewalls. Robots.txt is a polite request, not a security measure.
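As an example of a mandatory block, a minimal Apache .htaccess sketch (assumes mod_rewrite is enabled; GPTBot is just the example agent):

```apache
# Return 403 Forbidden to any request whose User-Agent contains "GPTBot".
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} GPTBot [NC]
RewriteRule .* - [F,L]
```

Unlike robots.txt, this is enforced by the server itself, so it works even against bots that ignore crawl directives (as long as they don't spoof their user-agent).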

What is crawl-delay in robots.txt?

Crawl-delay asks crawlers to wait N seconds between requests. Useful for servers with limited resources. Google ignores it (use Search Console instead), but Bing and Yandex respect it.

Should I include a Sitemap directive?

Yes, it's a best practice. It helps crawlers discover your XML sitemap. The Sitemap directive goes outside any User-agent block and you can include multiple sitemaps.

Can I use wildcards in robots.txt?

Yes, Google and Bing support * (matches any characters) and $ (matches end of URL). For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf. Not all crawlers support wildcards.
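Google-style wildcard matching can be reproduced with a small pattern-to-regex translation; a sketch:

```python
import re

# Translate a robots.txt path pattern (with Google-style * and $) into a
# regular expression, then test a URL path against it.
def robots_pattern_matches(pattern, path):
    anchored = pattern.endswith("$")  # trailing $ means "end of URL"
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore * as "match any characters".
    regex = re.escape(pattern).replace(r"\*", ".*")
    if anchored:
        regex += "$"
    # Robots.txt patterns match from the start of the path.
    return re.match(regex, path) is not None

print(robots_pattern_matches("/*.pdf$", "/files/report.pdf"))   # True
print(robots_pattern_matches("/*.pdf$", "/files/report.pdfs"))  # False
```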

How do I test if my robots.txt is working?

Use Google Search Console's robots.txt Tester, visit yourdomain.com/robots.txt in a browser, or use our HTTP Headers tool to check the response. Google's URL Inspection tool also shows if pages are blocked.