AI Shield: How to block AI training crawlers on Big Cartel

Your art, designs, and product images are the heart of your shop. That’s why Big Cartel includes AI Shield - a simple way to protect your creative work from being used to train generative AI models, while keeping your shop visible to real customers.

With a single toggle, AI Shield blocks major AI training crawlers from accessing your site. We handle the technical details behind the scenes, including maintaining and updating the crawler list, so you don’t have to.

This feature was built in direct response to feedback from artists and makers on Big Cartel. In our research, 83.3% of sellers told us they want to prevent their work from being used to train AI systems. AI Shield gives you that control - without sacrificing discoverability, search visibility, or growth.

AI Shield is available on all Big Cartel plans, including our forever-free Gold option, because protecting creative work shouldn’t come at an extra cost.

In the sections below, we’ll explain how AI training differs from AI-powered discovery tools, what AI Shield blocks (and what it doesn’t), and how to enable it for your shop.

What are AI training crawlers?

AI training crawlers are automated bots that scan the web to collect content used to train generative AI models. When companies like OpenAI train models such as GPT-5, or Google trains models like Gemini, they rely on crawlers to gather large volumes of text, images, and other creative work from across the internet - including, potentially, from online shops like yours.

Here’s how it works:

  1. A training crawler automatically visits your shop
  2. It reads and collects content such as product descriptions, photography, designs, and other creative assets
  3. That content is stored as training data
  4. AI companies use this data to teach models how to understand language, recognize images, and generate new content

AI Training Crawlers vs. AI Search Crawlers: What’s the difference?

Not all AI crawlers do the same thing. There are two distinct types, and they affect your shop in very different ways.

AI Training Crawlers (what AI Shield blocks)

Purpose: Collect content to include in datasets used to train generative AI models.

Examples: GPTBot, ClaudeBot, CCBot

What they do: These crawlers systematically scan the web to gather text, images, and other creative work that may be used to train future versions of AI models.

What this means for your content: Your work could become part of a training dataset, with no direct benefit to your shop or visibility.

AI Search Crawlers (what AI Shield does not block)

Purpose: Power AI-assisted search and discovery experiences in tools like Google, Bing, and ChatGPT.

Examples: OAI-SearchBot (ChatGPT Search), Claude-SearchBot

What they do: These crawlers index your shop so your products and pages can appear in AI-powered search results—typically with a link back to your site.

What this means for your content: Your content is indexed for discovery, helping potential customers find your shop, but it isn’t used to train AI models.

Which bots does AI Shield block?

AI Shield blocks ten AI training crawlers that are commonly used to collect web content for training generative AI models.

Major AI Companies

GPTBot (OpenAI)

  • Used to train ChatGPT, GPT-4, GPT-5, and other OpenAI models

ClaudeBot (Anthropic)

  • Used to train Claude AI models
  • Continuous web crawling for training data

Google-Extended (Google)

  • Controls whether Google can use your content to train Gemini (formerly Bard) and Vertex AI models

Applebot-Extended (Apple)

  • Controls whether Apple can use your content to train Apple Intelligence features

Meta-ExternalAgent (Meta)

  • Used to train Meta's AI models (Llama, Meta AI)

CCBot (Common Crawl)

  • Non-profit that creates open datasets of web content
  • These datasets are widely used by AI researchers and companies for training

Bytespider (ByteDance/TikTok)

  • Used for AI model training

AI2Bot (Allen Institute for AI)

  • Used to collect data for training open-source AI models like OLMo and for building datasets like Dolma

cohere-training-data-crawler (Cohere)

  • Specifically designed to collect data for training Cohere's large language models

Kangaroo Bot (Kangaroo LLM)

  • Used by Kangaroo LLM to download training data for AI models tailored to Australian language and culture

Why did we choose these specific bots?

We were intentional about which crawlers to include in AI Shield. Our goal was to give you meaningful protection from AI training without compromising your shop’s visibility or growth.

We block: Pure Training Crawlers

These crawlers are designed specifically to collect content for AI training datasets. Blocking them helps protect your creative work, with no impact on search visibility or discoverability.

Examples include:

  • GPTBot, which is used only to train OpenAI models
  • ClaudeBot, which is used only to train Anthropic’s AI models
  • CCBot, which collects data for large training datasets

Why this works
Because these crawlers are focused solely on training, blocking them gives you control over how your content is used—without trade-offs.

We don’t block: Dual-Purpose Crawlers

Some crawlers support both traditional search and AI-powered features. We intentionally do not block these, because they play a critical role in helping customers find your shop.

Googlebot (not blocked)

  • Powers Google Search, which is essential for SEO
  • Also supports Google’s AI features, such as AI Overviews
  • Blocking it would remove your shop from Google Search entirely

Bingbot (not blocked)

  • Powers Bing Search
  • Also supports Bing Copilot and other AI experiences
  • Blocking it would remove your shop from Bing Search

Why we don’t block them
With these crawlers, search and AI functionality can’t be separated. Blocking them would significantly reduce your shop’s discoverability, which runs counter to helping your business grow.

What AI Shield does not block

AI Shield is designed to protect your content without disrupting the tools that help your shop get found, shared, and measured.

Search engines still work

  • Googlebot – Your shop continues to appear in Google Search
  • Bingbot – Your shop continues to appear in Bing Search
  • DuckDuckBot – Your shop continues to appear in DuckDuckGo

Your SEO and search visibility are not affected.

Social media previews still work

  • LinkedInBot – Link previews on LinkedIn continue to display
  • Twitterbot – Link previews on X (formerly Twitter) continue to display
  • facebookexternalhit – Link previews on Facebook and Instagram continue to display

Sharing your shop on social platforms works as expected.

Analytics and tools still work

  • SemrushBot, AhrefsBot – SEO and site analysis tools continue to function
  • Google Analytics – Traffic and behavior tracking remains active
  • Monitoring services – Uptime and performance monitoring continue uninterrupted

What to Know

This is an Honor System

AI Shield uses robots.txt, an internet standard that's been around since 1994. Here's how it works:

  1. When a bot wants to crawl your site, it first checks your robots.txt file
  2. If the bot sees it's been blocked, it should respect that and not crawl

There's no technical enforcement mechanism, so compliance is voluntary. Most major AI companies comply.
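
For readers who want to see the check in action, here is a minimal Python sketch of what a well-behaved bot does before crawling. It uses the standard library's urllib.robotparser. The sample rules, the shop URL, and the is_allowed helper are assumptions made for illustration; they are not the actual file AI Shield generates.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

# Placeholder URL, not a real shop.
SAMPLE_URL = "https://example-shop.test/product/art-print"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("GPTBot", "Google-Extended", "Googlebot", "OAI-SearchBot"):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, bot, SAMPLE_URL) else "blocked"
        print(f"{bot}: {verdict}")
```

A compliant crawler runs an equivalent check itself. Nothing on the shop's side stops a bot that skips it, which is why this is an honor system.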

It's Not Retroactive

If AI companies already collected or trained on your content before you enabled AI Shield, that data may already be in their models. AI Shield only prevents future crawling.

It May Not Catch Everything

New AI crawlers appear regularly. We keep our blocklist updated, but there may be a delay between when a new AI company launches a crawler and when we add it to AI Shield.

How to Enable AI Shield

  1. Go to Shop Settings.
  2. Find the AI Shield toggle.
  3. Toggle it On.

Once enabled, your robots.txt file will automatically include rules blocking the ten AI training crawlers listed above.
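
To give a rough idea of what such rules look like, the Python sketch below builds one "Disallow" group per crawler named in this article. The build_rules helper and the exact output format are assumptions for illustration. The file Big Cartel actually generates may use different user-agent tokens, ordering, or additional rules.

```python
# Crawler names as listed in this article; the exact tokens in the
# real robots.txt file may differ (assumption).
AI_TRAINING_CRAWLERS = [
    "GPTBot",
    "ClaudeBot",
    "Google-Extended",
    "Applebot-Extended",
    "Meta-ExternalAgent",
    "CCBot",
    "Bytespider",
    "AI2Bot",
    "cohere-training-data-crawler",
    "Kangaroo Bot",
]


def build_rules(user_agents: list[str]) -> str:
    """Return robots.txt text that disallows the whole site for each agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in user_agents]
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_rules(AI_TRAINING_CRAWLERS))
```

By convention, a site's robots.txt file lives at the root of its domain, so you can usually view the live rules by adding /robots.txt to the end of your shop's address.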

Need More Help?

If you have questions about AI Shield or want to report a bot that's not respecting your robots.txt file, please contact our support team.

AI Shield FAQs

Will this hurt my SEO?

No. AI Shield only blocks AI training crawlers. Search engines like Google, Bing, and DuckDuckGo continue crawling your shop normally. Your search rankings are not affected.

Will my products still show up in Google?

Yes. We block Google-Extended (the AI training control), not Googlebot (the search crawler). Your products will continue to appear in Google Search results.

Will social media previews still work?

Yes. Social media bots like LinkedInBot and facebookexternalhit are not affected. When you share your shop on Instagram, Facebook, LinkedIn, etc., the previews will still work.

Does this guarantee protection?

No. AI Shield relies on bots respecting the robots.txt standard. Most major AI companies comply, but some bots may ignore it.

What if I want my content used for AI training?

Some sellers choose to contribute their content to AI training—whether to help advance the technology, gain broader exposure for their creative work, or have their ideas and expertise inform AI responses. If that aligns with your goals, simply keep AI Shield turned off and your shop will remain accessible to all crawlers.

Why not just block all AI bots?

Our approach is to help our sellers maximize sales while still giving them the choice of whether their work is used to train AI models. Simply put, we block AI training crawlers while keeping the search bots that bring traffic to your shop.
