How to Tell If a Website Was Built with AI
A practical guide to detecting AI-generated websites. Learn the telltale signs in code, content, and design that reveal whether a site was built by Claude, GPT, v0, Bolt, or Cursor.
Somewhere between 2023 and 2025, the baseline for “looks professional” shifted. A solo founder with zero front-end experience can now produce a polished SaaS landing page in an afternoon. Agencies are using AI to deliver client work in hours that previously took weeks. The question is no longer whether AI is involved — it almost always is, at least partially — but to what degree, and which tools were used.
That question has real stakes in several contexts:
M&A and due diligence. A startup claiming deep technical differentiation, whose entire codebase traces back to a single Cursor session with GPT-4o, is a different risk profile than one with years of hand-crafted code. Investors and acquirers are starting to ask.
Agency verification. A client paying $50,000 for a “custom-built” website deserves to know if they’re paying for 40 hours of original work or 40 hours of prompt engineering and cleanup. Neither is inherently bad, but the value proposition is different.
Competitive intelligence. Understanding whether a competitor’s site was launched via v0 and shadcn — versus a custom React build — tells you something about their engineering velocity and team composition.
Content authenticity. Publishers, journalism outlets, and professional networks increasingly need to distinguish human-written copy from AI-generated filler optimized purely for search.
This guide covers the signals that actually matter — at the code level, the content level, and the tool/builder level. At the end, we cover how Reconix automates all of it from a single URL lookup.
Code Signals
Tailwind Density and Utility Patterns
Human developers who adopt Tailwind gradually develop preferences: they tend to extract repeating utility sequences into components, use @apply for shared patterns, and leave traces of their reasoning in class ordering. AI-generated Tailwind code has a different fingerprint.
LLMs produce exhaustively verbose utility strings. A human might write class="btn" and define the styles once. An LLM writes class="inline-flex items-center justify-center rounded-md text-sm font-medium ring-offset-background transition-colors focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:pointer-events-none disabled:opacity-50" — every time, inline, from scratch. This pattern is particularly diagnostic because it reflects how the LLM was trained: on shadcn/ui component source, which uses exactly this style.
Another tell: breakpoint completeness. Humans tend to add responsive modifiers only where needed. AI tools often add sm:, md:, and lg: variants uniformly, even when the values are identical, because the training data includes comprehensive responsive patterns.
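The utility-density tell above is easy to measure mechanically. Here is a minimal sketch (names and the ~10-token threshold are illustrative assumptions, not a calibrated detector) that computes the average number of utility tokens per class attribute:

```typescript
// Hypothetical heuristic: average utility-token count per class attribute.
// Hand-written Tailwind tends to extract repeating sequences into components;
// LLM output routinely exceeds ~10 tokens per element, inline, every time.
function averageClassTokens(html: string): number {
  const matches = [...html.matchAll(/class(?:Name)?="([^"]*)"/g)];
  if (matches.length === 0) return 0;
  const total = matches.reduce(
    (sum, m) => sum + m[1].trim().split(/\s+/).filter(Boolean).length,
    0
  );
  return total / matches.length;
}

const humanish = '<button class="btn">Save</button>';
const llmish =
  '<button class="inline-flex items-center justify-center rounded-md text-sm ' +
  'font-medium transition-colors focus-visible:outline-none focus-visible:ring-2 ' +
  'disabled:pointer-events-none disabled:opacity-50">Save</button>';

console.log(averageClassTokens(humanish)); // 1
console.log(averageClassTokens(llmish)); // well above 10
```

A real detector would weight this against a per-framework baseline; the point is that the gap between the two styles is large enough to survive a crude measurement.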
ARIA Attribute Completeness
Real-world codebases have inconsistent accessibility — developers add aria-label to some interactive elements and forget others. AI-generated code applies ARIA attributes with mechanical consistency: every <button> has aria-label, every icon has aria-hidden="true", every form field has aria-describedby. It’s more accessible than average human code, but unnaturally uniform.
When you see role="img" on every decorative SVG, aria-live="polite" on every status region, and aria-invalid wired up on every form input — even in a relatively simple page — you’re likely looking at AI output.
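The "unnaturally uniform" pattern can be framed as a coverage ratio. A quick sketch (assumed names; regex-based parsing is deliberately naive here) checks what fraction of buttons carry an aria-label, where exactly 100% across many elements is the suspicious case:

```typescript
// Hypothetical check: fraction of <button> elements carrying an aria-label.
// Human codebases usually land somewhere in the middle; perfectly uniform
// coverage across many elements is the mechanical pattern described above.
function ariaLabelCoverage(html: string): number {
  const buttons = html.match(/<button\b[^>]*>/g) ?? [];
  if (buttons.length === 0) return 0;
  const labeled = buttons.filter((b) => /\baria-label=/.test(b)).length;
  return labeled / buttons.length;
}

const uniform =
  '<button aria-label="Close menu">x</button>' +
  '<button aria-label="Open settings">x</button>';
const mixed = '<button aria-label="Close">x</button><button>x</button>';

console.log(ariaLabelCoverage(uniform)); // 1
console.log(ariaLabelCoverage(mixed)); // 0.5
```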
Comment Style and Density
Human comments explain why something is done a certain way (“// Edge case: Safari 15 doesn’t support this CSS property”). AI comments describe what the code does (“// Render the user profile section”). The distinction is subtle but consistent.
AI-generated code also has unnaturally uniform comment density — roughly one comment per logical block, always complete sentences, always with a capital letter and period. Human code either has no comments or clusters of them around the tricky parts.
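Comment-density uniformity is also measurable. One possible sketch (an assumption of ours, not a production heuristic) computes the variance of the gaps between comment lines; evenly spaced comments score near zero, clustered human-style comments score higher:

```typescript
// Hypothetical signal: how evenly are comments spaced through a file?
// Human comments cluster around the tricky parts; AI comments arrive at a
// steady drumbeat, so low gap variance suggests generated code.
function commentGapVariance(source: string): number {
  const lines = source.split("\n");
  const commentLines = lines
    .map((line, i) => (/^\s*\/\//.test(line) ? i : -1))
    .filter((i) => i >= 0);
  if (commentLines.length < 3) return Infinity; // too few comments to judge
  const gaps = commentLines.slice(1).map((v, i) => v - commentLines[i]);
  const mean = gaps.reduce((a, b) => a + b, 0) / gaps.length;
  return gaps.reduce((a, g) => a + (g - mean) ** 2, 0) / gaps.length;
}

// One comment per logical block, like clockwork: variance 0.
const even = ["// a", "x", "y", "// b", "x", "y", "// c", "x", "y", "// d"].join("\n");
// A cluster of comments around one tricky spot: much higher variance.
const clustered = ["// a", "// b", "// c", "x", "x", "x", "x", "x", "x", "x", "// d"].join("\n");
```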
TypeScript Patterns
Claude in particular produces highly defensive TypeScript: explicit return types on every function, as const assertions for literal arrays, early null checks before any property access, verbose generic constraints. This is good practice, but the uniformity is diagnostic. A codebase that applies readonly to every array parameter, uses satisfies for every object literal type check, and never has a single // @ts-ignore is almost certainly AI-assisted.
GPT-4-family models tend toward slightly looser TypeScript: more implicit any, more type assertions (as SomeType), less generic precision. The difference reflects their different training data compositions.
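These defensiveness markers can be tallied directly. A sketch under our own assumptions (the marker list and any thresholds are illustrative; `readonly` matching is deliberately coarse):

```typescript
// Hypothetical tally of the TypeScript defensiveness markers discussed above.
// The direction of the signal matters more than any exact threshold: many
// `as const` / `satisfies` uses plus zero @ts-ignore suggests AI assistance.
function defensivenessProfile(source: string) {
  const count = (re: RegExp) => (source.match(re) ?? []).length;
  return {
    asConst: count(/\bas const\b/g),
    satisfies: count(/\bsatisfies\b/g),
    readonly: count(/\breadonly\b/g),
    tsIgnore: count(/@ts-ignore/g), // zero across a large codebase is itself a tell
  };
}

const sample =
  "const x = [1, 2] as const; " +
  "const y = { a: 1 } satisfies Record<string, number>;";
console.log(defensivenessProfile(sample)); // { asConst: 1, satisfies: 1, ... }
```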
Component and File Naming
Claude tends toward long, fully-descriptive names: UserProfileAvatarWithDropdownMenu.tsx, useIntersectionObserverForLazyLoading.ts. GPT tends toward shorter, more conventional names: UserAvatar.tsx, useLazyLoad.ts. Copilot mirrors the codebase’s existing conventions most faithfully because it operates inline.
v0 has a very distinctive pattern: it generates components/ui/ directories containing shadcn-style primitives (button.tsx, card.tsx, dialog.tsx) alongside page-level components that import from them. The presence of this exact directory structure is a strong v0 signal.
Content Signals
Filler Phrases and Hedging Language
AI language models learned from a vast corpus of content that rewards completeness over specificity. This produces characteristic hedging: “In today’s rapidly evolving landscape…”, “It’s worth noting that…”, “This is particularly important because…”, “Let’s explore…”.
These phrases aren’t wrong — they’re just padding. Human writers with something specific to say rarely need them. AI writers use them as discourse connectors when the actual content doesn’t provide enough natural flow.
A quick test: remove every sentence that starts with “It’s important to” or “In conclusion” and see if any meaning is lost. If none is, the content was almost certainly generated.
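That quick test can be automated crudely. The sketch below (the phrase list is illustrative, not exhaustive) strips sentences that open with common hedges and reports how much of the text survives; a low survival ratio means most of the copy was padding:

```typescript
// Rough automation of the "delete the filler and see what's lost" test.
// Illustrative opener list; a real detector would use a much larger one.
const FILLER_OPENERS = [
  "It's important to",
  "It's worth noting that",
  "In conclusion",
  "In today's rapidly evolving",
  "Let's explore",
];

function fillerStripRatio(text: string): number {
  // Naive sentence split on terminal punctuation followed by whitespace.
  const sentences = text.split(/(?<=[.!?])\s+/);
  const kept = sentences.filter(
    (s) => !FILLER_OPENERS.some((p) => s.trimStart().startsWith(p))
  );
  return kept.join(" ").length / Math.max(text.length, 1);
}

const copy =
  "It's important to note that X. The cache key is the user id. In conclusion, Y.";
console.log(fillerStripRatio(copy)); // well under 0.5: mostly padding
```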
Adverb Density
AI-generated text uses adverbs at roughly 2-3x the rate of expert human writing. “Seamlessly integrate”, “effortlessly scale”, “comprehensively covers”, “carefully considers” — these modifier stacks are a strong signal. Professional human writers tend to prefer strong verbs over weak verbs propped up by adverbs.
Em Dash Overuse
Claude in particular — and this is well-documented — has a strong preference for em dashes as a structural device. Content with multiple em dashes per paragraph, used to introduce asides and elaborations, has a high probability of being Claude-generated. GPT models use em dashes less aggressively, preferring comma clauses. This single signal has surprisingly high precision.
Sentence Length Uniformity
Human writers naturally vary sentence length — short punchy statements followed by longer explanatory ones. AI-generated paragraphs tend toward uniform medium-length sentences. A paragraph where every sentence is 15-25 words long, with similar syntactic structure, is a statistical anomaly in human writing.
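The three statistical signals above (adverb density, em dash frequency, sentence-length variance) share one shape: count, normalize, compare to a baseline. A rough illustrative sketch, with the caveat that the "-ly" adverb test has obvious false positives ("only", "early") and real detectors calibrate against genre baselines:

```typescript
// Illustrative versions of the three prose statistics discussed above.
function proseStats(text: string) {
  const words = text.split(/\s+/).filter(Boolean);
  // Crude adverb proxy: words ending in -ly (with optional trailing punctuation).
  const adverbDensity =
    words.filter((w) => /ly[.,;:]?$/.test(w.toLowerCase())).length /
    Math.max(words.length, 1);

  const paragraphs = text.split(/\n{2,}/);
  const emDashesPerParagraph =
    (text.match(/\u2014/g) ?? []).length / Math.max(paragraphs.length, 1);

  // Sentence-length variance: near zero means suspiciously uniform sentences.
  const lengths = text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.split(/\s+/).filter(Boolean).length);
  const mean = lengths.reduce((a, b) => a + b, 0) / Math.max(lengths.length, 1);
  const variance =
    lengths.reduce((a, n) => a + (n - mean) ** 2, 0) /
    Math.max(lengths.length, 1);

  return { adverbDensity, emDashesPerParagraph, sentenceLengthVariance: variance };
}

const uniformProse = "He ran quickly. She spoke softly.";
const dashy = "A \u2014 B \u2014 C.";
```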
List Completeness
AI models have a strong tendency to produce exhaustive lists. Ask an AI for “reasons to use X” and it generates five; ask a human and they give you one or two genuine ones. Marketing copy that enumerates exactly five or seven perfectly parallel bullet points, where each starts with a bolded keyword, is textbook AI formatting.
LLM-Specific Signatures
Different foundation models have measurable stylistic fingerprints:
Claude (Anthropic). Long variable and function names. Extensive inline documentation. Defensive TypeScript with explicit types everywhere. Em dash preference in prose. Methodical structure that addresses edge cases proactively. Code that reads like it was written by someone who had read the entire TypeScript handbook twice.
GPT-4 / GPT-4o (OpenAI). More concise naming conventions. Template-matching behavior — the output often closely resembles the most common example of that pattern in the training data. Less defensive error handling than Claude. In prose, a tendency toward numbered lists and parallel structure. More likely to leave TODO comments.
Gemini (Google). Less distinctive than Claude or GPT in current versions. Tends toward verbose function signatures with many parameters. In content, slightly more formal register than GPT.
Copilot / inline completion tools. Much harder to detect because Copilot adapts to the existing code style. The signal is usually in the completion of repetitive patterns — boilerplate that would be tedious to write but is trivially generated, present in bulk.
AI Builder Markers
v0 (Vercel)
v0 generates React components using shadcn/ui primitives over Radix UI. The diagnostic markers are:
- A `components/ui/` directory containing shadcn-standard files (`button.tsx`, `card.tsx`, `badge.tsx`, `separator.tsx`)
- Import statements like `import { Button } from "@/components/ui/button"` with the `@/` path alias
- `cn()` utility usage everywhere (the shadcn class-merging helper built from `clsx` + `tailwind-merge`)
- The `class-variance-authority` package in `package.json` (CVA is the shadcn variant system)
- Components with `interface Props extends React.HTMLAttributes<HTMLDivElement>` patterns
The presence of 5+ shadcn components with the exact shadcn implementation (not just the same names) is a near-certain v0 signal.
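The marker list above translates directly into a scoring sketch. Everything here is an assumption for illustration (the file names mirror the list, the score is uncalibrated, and real attribution would compare implementations, not just names):

```typescript
// Hypothetical v0 footprint score over a project's file list and a source
// sample. Treat the result as a signal to investigate, not proof.
const SHADCN_UI_FILES = [
  "button.tsx",
  "card.tsx",
  "badge.tsx",
  "separator.tsx",
  "dialog.tsx",
];

function v0Score(files: string[], source: string): number {
  let score = 0;
  // Marker 1: shadcn-standard files under components/ui/.
  const uiFiles = files.filter((f) => f.startsWith("components/ui/"));
  score += SHADCN_UI_FILES.filter((name) =>
    uiFiles.some((f) => f.endsWith("/" + name))
  ).length;
  // Marker 2: the @/ path-alias import style.
  if (/from "@\/components\/ui\//.test(source)) score += 1;
  // Marker 3: cn() class-merging helper calls.
  if (/\bcn\(/.test(source)) score += 1;
  // Marker 4: the CVA variant system in use.
  if (/class-variance-authority/.test(source)) score += 1;
  return score;
}

const files = ["components/ui/button.tsx", "components/ui/card.tsx", "app/page.tsx"];
const src = 'import { Button } from "@/components/ui/button"; cn("p-2");';
console.log(v0Score(files, src)); // 4
```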
Bolt / StackBlitz WebContainer
Bolt generates full-stack applications in browser sandboxes. Its markers:
- `vite.config.ts` with the exact StackBlitz WebContainer default configuration
- Specific npm scripts matching Bolt’s scaffolding templates
- A `project.bolt` or `.bolt/` configuration directory
- Comments referencing “WebContainer” or the specific template names Bolt uses
Cursor and IDE AI Tools
Cursor-generated code is harder to fingerprint than builder tools because it operates at the file level rather than the project scaffold level. The strongest signals are behavioral rather than structural:
- Inconsistent style between files (each file was generated separately)
- Long `// TODO` comments that describe planned but unimplemented features in first-person plural (“we need to handle…”)
- Unusual combinations of dependencies that wouldn’t coexist in a human-designed architecture
- Test files that have 100% coverage of happy paths but zero coverage of error cases (AI testing behavior)
How Reconix Automates This
Manually checking these signals across a full website takes hours. Reconix runs 37 detection heuristics across both code and content dimensions simultaneously, assigns weighted confidence scores, and attributes the output to specific tools when the signals are strong enough.
The AI code detection module looks at Tailwind density, ARIA completeness, TypeScript defensiveness, comment style, naming convention length, and 14 other signals. The content module analyzes filler phrase density, adverb ratios, sentence length variance, em dash frequency, and hedging language patterns. Together they produce a 0-100% confidence score with a verdict from “Almost Certainly Human” to “Almost Certainly AI.”
For tool attribution, Reconix checks for v0’s shadcn footprint, Bolt’s WebContainer config, Cursor’s behavioral inconsistencies, and the stylistic fingerprints of each major foundation model. A lookup of a typical landing page takes under 10 seconds.
What to Do With the Findings
Detection is a starting point, not a verdict. A high AI confidence score has different implications depending on context:
- For due diligence: Follow up with questions about the development process. AI-assisted work isn’t disqualifying, but undisclosed AI authorship of code that was represented as proprietary might be.
- For agency verification: Use the analysis as an opening for a conversation about process, not a gotcha. Many agencies use AI legitimately; the question is whether the result meets your requirements.
- For content: AI-generated content that is accurate, useful, and maintained is not inherently inferior. AI-generated content that is generic, factually thin, and clearly not revised by a human expert is a quality signal worth acting on.
- For competitive analysis: Tool signatures tell you about engineering velocity and team composition. A competitor running entirely on v0 scaffolding and GPT-generated copy has a different scaling profile than one with a custom codebase.
The underlying question isn’t “was AI involved?” — it almost certainly was. The useful question is: “does the output meet the standard being claimed, and is the process transparent?” Detection gives you the evidence to ask that question with specifics.