Most technical SEO audits are surface-level reports from Screaming Frog dressed up as strategy. Someone runs a crawl, exports a spreadsheet with 4,000 rows, highlights the red cells, and calls it an audit.
That’s not an audit. It’s a report.
A real technical SEO audit starts with a question: why isn’t this site performing as well as it should? And it answers that question by looking at signals that crawl tools miss entirely - server logs, real user performance data at page level, equity flow through internal links, and how Googlebot actually behaves on the site vs. how you think it does.
After 18 years of doing this, I can tell you that the highest-impact findings come from exactly those places. Not from the 400-row list of meta descriptions over 160 characters.
This guide covers the complete framework I use. It’s not generic. It contains specifics that will either save you significant time or help you see problems in your own site you’ve been missing.
Why Technical SEO Is the Foundation
Content and links are the two most-cited ranking factors in SEO. Both of them require a functional technical foundation to deliver any value.
Imagine a site with 200 high-quality articles and 500 referring domains. If Googlebot can’t crawl 30% of the site due to misconfigured robots.txt rules, that content doesn’t exist to Google. If canonical tags are implemented incorrectly across a product catalog, the link equity pointing at those pages is being diluted across duplicate URLs. If Core Web Vitals scores are poor enough to trigger Google’s page experience signals, rankings that should be at position 3 sit at position 8.
Technical SEO isn’t glamorous. Clients rarely ask for it because you can’t easily explain what fixing a redirect chain “looks like.” But in every account I’ve taken over that was underperforming, technical debt was contributing to the gap. Always.
Fix the technical foundation first. Then build on it.
The Problem with Crawl-Tool-Only Audits
Screaming Frog, Sitebulb, and similar crawl tools are essential. I use them on every audit. But they have a fundamental limitation: they simulate how a crawler might visit your site, not how Googlebot actually visits it.
Server log analysis tells you what Googlebot actually did. The difference is significant.
From server logs, you can answer questions that no crawl tool can:
- Which pages is Googlebot visiting, and how often?
- Is Googlebot spending its crawl budget on pages that matter?
- Are there pages that receive no Googlebot visits at all despite being in your sitemap?
- Is there a discrepancy between pages you think are important and the pages Google is actually crawling?
On one client site - a financial news platform - server log analysis revealed that Googlebot was spending 40% of its crawl budget on pagination pages (/page/2/, /page/3/) and parameter URLs generated by a filtering system. The high-value editorial content was being crawled infrequently. Fixing this was a more meaningful intervention than anything a crawl tool would have surfaced.
Tools for server log analysis include Screaming Frog Log File Analyser, and Splunk for larger log files. If you’re on a managed host that doesn’t expose server logs, this is a limitation worth raising with your hosting team.
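If you want to sanity-check a log sample before loading it into a dedicated tool, the questions above can be roughed out in a few lines of Python. This is a minimal sketch, assuming logs in the common Combined Log Format; the sample lines and paths are made up for illustration. It matches on the user-agent string only, which can be spoofed, so a real audit should also verify Googlebot by reverse DNS.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# ip - - [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Count requests per path and per status for agents claiming to be Googlebot."""
    paths, statuses = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or "Googlebot" not in match.group("agent"):
            continue
        paths[match.group("path")] += 1
        statuses[match.group("status")] += 1
    return paths, statuses

def never_crawled(sitemap_paths, paths):
    """Sitemap paths with zero Googlebot requests in this log sample."""
    return sorted(p for p in sitemap_paths if paths[p] == 0)

# Made-up log lines for illustration only.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [10/Mar/2025:06:25:01 +0000] "GET /page/2/ HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Mar/2025:06:25:09 +0000] "GET /page/2/ HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Mar/2025:06:26:40 +0000] "GET /old-story HTTP/1.1" 404 312 "-" "{GOOGLEBOT}"',
    '203.0.113.7 - - [10/Mar/2025:06:27:02 +0000] "GET /markets/analysis HTTP/1.1" 200 9001 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

paths, statuses = googlebot_hits(sample)
print(paths.most_common())   # most-requested paths first
print(statuses)              # response codes Googlebot received
print(never_crawled(["/markets/analysis", "/page/2/"], paths))
```

Sorting by hit count is usually enough to spot the pattern described above: pagination and parameter URLs at the top, priority content near the bottom or missing entirely.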
Crawl Budget: What It Is and How to Audit It
Crawl budget is the number of pages Googlebot will crawl on your site within a given time period. For small sites (under a few thousand pages), it’s rarely a constraint. For large ecommerce sites, it’s one of the most consequential factors in whether new content gets indexed promptly.
Google determines crawl budget based on two things: crawl rate limit (how fast Googlebot crawls without overloading your server) and crawl demand (how popular and recently updated your pages are). You influence both.
To audit crawl budget, you need three data sources working together.
First, Google Search Console. Go to Settings > Crawl Stats. This shows you how many pages Googlebot crawled per day over the last 90 days, what response codes it encountered, and what file types it crawled. A site where 20% of crawl requests return 404s is wasting significant crawl budget on dead URLs.
Second, your server logs. Cross-reference which URLs Googlebot is hitting most frequently. If it’s spending budget on low-value URLs (faceted navigation, parameter pages, internal search results), that’s crawl budget that should be redirected to content that matters.
Third, your sitemap. Your XML sitemap is a signal to Google about what you consider important. If it contains 404 pages, redirected URLs, or noindexed pages, you’re sending Google confused signals. Clean sitemaps that contain only canonicalized, indexable, 200-status pages are non-negotiable.
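The sitemap check is easy to script once you have a crawl export. The sketch below assumes a hypothetical `crawl_data` mapping (URL to status code, canonical target, and noindex flag) built from whatever crawler you use; the URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Extract every <loc> value from a standard XML sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]

def sitemap_problems(urls, crawl_data):
    """Flag sitemap entries that are not canonical, indexable, 200-status pages.

    crawl_data is a hypothetical shape: {url: {"status", "canonical", "noindex"}}.
    """
    problems = {}
    for url in urls:
        row = crawl_data.get(url)
        if row is None:
            problems[url] = "not found in crawl"
        elif row["status"] != 200:
            problems[url] = f"status {row['status']}"
        elif row["noindex"]:
            problems[url] = "noindex"
        elif row["canonical"] != url:
            problems[url] = "canonicalized elsewhere"
    return problems

# Placeholder sitemap and crawl export for illustration.
xml_text = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/old-page</loc></url>
  <url><loc>https://example.com/shoes?color=red</loc></url>
</urlset>"""

crawl_data = {
    "https://example.com/": {
        "status": 200, "canonical": "https://example.com/", "noindex": False},
    "https://example.com/old-page": {
        "status": 301, "canonical": None, "noindex": False},
    "https://example.com/shoes?color=red": {
        "status": 200, "canonical": "https://example.com/shoes", "noindex": False},
}

problems = sitemap_problems(sitemap_urls(xml_text), crawl_data)
for url, reason in problems.items():
    print(url, "->", reason)
```

Anything this flags should either be fixed or removed from the sitemap.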
The most common crawl budget killers I find on ecommerce sites: faceted navigation (by a wide margin), session IDs appended to URLs, parameter-based filtering with no canonical handling, and print-version URLs. I cover faceted navigation in more detail below.
Core Web Vitals: Site Average Means Nothing
Google’s Core Web Vitals are three metrics that measure real-user experience: Largest Contentful Paint (LCP), Cumulative Layout Shift (CLS), and Interaction to Next Paint (INP, which replaced FID in March 2024).
Here’s the critical thing most teams get wrong: they check Core Web Vitals at the domain level and report a single score. That’s useless for diagnosis.
Core Web Vitals are evaluated at the page-group level. Google’s page experience signals apply to individual URL patterns, not the site as a whole. A product page template with a slow-loading hero image affects every product page on the site. A homepage that passes CWV doesn’t help you if the 10,000 category pages are failing.
Use Google Search Console’s Core Web Vitals report to see page-group level data. Use PageSpeed Insights for individual URL testing with real-user field data (from Chrome UX Report) alongside lab data.
What each metric means in practice:
LCP (Largest Contentful Paint): The time until the largest visible element loads. Target: under 2.5 seconds. The most common causes of poor LCP are unoptimized hero images (wrong format, missing preload hint, served from a slow CDN), render-blocking resources, and slow server response times (TTFB over 800ms). On Shopify and WooCommerce sites, third-party scripts loaded in the <head> are frequent culprits.
CLS (Cumulative Layout Shift): Visual instability - elements jumping around as the page loads. Target: under 0.1. Common causes include images without explicit width and height attributes, dynamically injected content above the fold, and web fonts causing layout shifts on load. CLS is often introduced by ad networks and cookie consent banners that load after the initial render.
INP (Interaction to Next Paint): Responsiveness to user interactions. Target: under 200ms. This is the newest and trickiest to fix. Long JavaScript tasks that block the main thread are the primary cause. On Shopify, apps that run heavy JavaScript on interaction are frequent offenders.
Fix LCP first. It has the most direct relationship with rankings and the most actionable fixes.
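To see why a site-wide average hides template-level failures, group field measurements by page template before scoring them. A minimal sketch, assuming you have exported real-user samples tagged with a page group (the group names and values here are invented). It scores each group at the 75th percentile, which is the percentile Google uses to assess Core Web Vitals, against the targets listed above.

```python
import math
from collections import defaultdict

# "Good" thresholds from the targets above.
THRESHOLDS = {"lcp_ms": 2500, "cls": 0.1, "inp_ms": 200}

def p75(values):
    """75th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]

def failing_groups(samples):
    """samples: iterable of (page_group, metric, value).

    Returns {page_group: {metric: p75}} for every metric over its threshold.
    """
    by_key = defaultdict(list)
    for group, metric, value in samples:
        by_key[(group, metric)].append(value)

    failures = defaultdict(dict)
    for (group, metric), values in by_key.items():
        score = p75(values)
        if score > THRESHOLDS[metric]:
            failures[group][metric] = score
    return dict(failures)

# Invented field samples: the homepage passes, the product template does not.
samples = (
    [("homepage", "lcp_ms", v) for v in (1800, 2000, 2100, 2400)]
    + [("product", "lcp_ms", v) for v in (2200, 3100, 3400, 4000)]
    + [("product", "cls", v) for v in (0.05, 0.08, 0.09, 0.30)]
)

result = failing_groups(samples)
print(result)
```

Here the homepage passes while the product template fails LCP, which is exactly the situation a single domain-level score would blur.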
Canonical Implementation: Scale-Level Mistakes
Canonical tags tell Google which version of a URL you consider the “master” version. Get them wrong at scale and you’re diluting link equity across dozens or hundreds of duplicate URLs.
The canonical audit checklist:
Self-referencing canonicals. Every indexable page should have a canonical tag pointing to itself. Pages without a canonical tag leave the choice to Google, which may select a URL you don’t want.
Cross-domain canonicals. If you syndicate content, or have content that appears on multiple domains, canonical tags should point to the original source.
Inconsistent canonicals. If page A canonicals to page B, but page B canonicals back to page A, you’ve created a canonical loop. Google will likely ignore both.
Canonicals conflicting with hreflang. If you have multilingual content, the hreflang tags and canonical tags must be consistent. A canonical pointing from the Spanish version of a page to the English version tells Google the Spanish page is a duplicate of the English one - which breaks your international targeting.
The Shopify-specific canonical problem. Shopify generates two valid URLs for every product: /products/product-name and /collections/collection-name/products/product-name. Shopify canonicals the collection URL to the standalone product URL by default. This mostly works, but can create issues when products appear in multiple collections, because the canonical URL must be consistent regardless of the entry path. I cover this in more depth in the complete Shopify SEO guide.
Audit tool: run a Screaming Frog crawl, export the Canonicals report, and filter for any canonical that isn’t a self-referencing URL pointing to a 200-status page. Every exception needs manual review.
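Once the canonical data is exported, the exceptions can be pre-sorted before manual review. This sketch assumes a hypothetical `pages` mapping built from that export (URL to status code and canonical target); the paths are examples, and the category labels are my own shorthand rather than anything a crawl tool emits.

```python
def canonical_issues(pages):
    """pages: {url: {"status": int, "canonical": str or None}} (hypothetical shape).

    Returns {url: issue} for every 200-status page whose canonical
    is not a clean self-reference.
    """
    issues = {}
    for url, row in pages.items():
        if row["status"] != 200:
            continue  # only audit pages that could be indexed
        target = row["canonical"]
        if target is None:
            issues[url] = "missing canonical"
        elif target == url:
            continue  # self-referencing, nothing to review
        elif target not in pages:
            issues[url] = "canonical target not in crawl"
        elif pages[target]["status"] != 200:
            issues[url] = f"canonical target returns {pages[target]['status']}"
        elif pages[target]["canonical"] == url:
            issues[url] = "canonical loop"
        elif pages[target]["canonical"] != target:
            issues[url] = "canonical chain"
        else:
            issues[url] = "non-self canonical, review manually"
    return issues

# Example crawl export for illustration.
pages = {
    "/products/shoe": {"status": 200, "canonical": "/products/shoe"},
    "/collections/sale/products/shoe": {"status": 200, "canonical": "/products/shoe"},
    "/a": {"status": 200, "canonical": "/b"},
    "/b": {"status": 200, "canonical": "/a"},
    "/no-tag": {"status": 200, "canonical": None},
    "/moved": {"status": 200, "canonical": "/gone"},
    "/gone": {"status": 404, "canonical": None},
}

issues = canonical_issues(pages)
for url, issue in issues.items():
    print(url, "->", issue)
```

The Shopify-style collection URL lands in the "review manually" group, which is expected: it is a legitimate non-self canonical, but someone should confirm the target is the one you want.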
Redirect Chains and Redirect Loops
A redirect chain is when URL A redirects to URL B, which redirects to URL C. The destination is correct, but the path wastes crawl budget and leaks link equity at each hop.
PageRank (the underlying link equity signal) passes through 301 redirects, but not at 100%. Google hasn’t publicly confirmed the exact dilution rate, but Ahrefs testing suggests measurable equity loss occurs with each redirect hop. More importantly, if external sites link to the original URL (A), and that URL has a chain, the equity they’re sending takes two hops before it reaches the live page.
The most common cause of redirect chains: URL migrations that weren’t audited. Site goes through a redesign, old-to-new redirects are set up. Then another redesign happens, and instead of updating the old redirects to point directly to the new final destination, another layer is added on top.
How to find them: Screaming Frog will flag redirect chains in its Redirects report. Export it, filter for chains of 2 or more hops, and update each one to redirect directly from the original URL to the final destination.
Redirect loops - where URL A redirects to B and B redirects back to A - cause error responses and need fixing immediately.
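If you manage redirects as a source-to-destination map, collapsing chains is mechanical. A minimal sketch, assuming a hypothetical `redirects` dictionary exported from your redirect rules; it rewrites every source to point at its final destination and reports loops separately.

```python
def resolve(url, redirects):
    """Follow url through the redirect map.

    Returns (final_url, hops, looped). Terminates because each step
    either reaches a URL with no redirect or revisits a seen URL.
    """
    seen = [url]
    while url in redirects:
        url = redirects[url]
        if url in seen:
            return url, len(seen), True
        seen.append(url)
    return url, len(seen) - 1, False

def flatten(redirects):
    """Point every source directly at its final destination; list loop members."""
    flat, loops = {}, []
    for source in redirects:
        final, _hops, looped = resolve(source, redirects)
        if looped:
            loops.append(source)
        else:
            flat[source] = final
    return flat, sorted(loops)

# Example rules: a two-hop chain (/a -> /b -> /c) and a loop (/x <-> /y).
redirects = {"/a": "/b", "/b": "/c", "/x": "/y", "/y": "/x"}

flat, loops = flatten(redirects)
print(flat)
print(loops)
```

After flattening, `/a` redirects straight to `/c`, and the two loop members are left out of the map so they can be fixed by hand.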
Schema Markup: Which Schemas Actually Matter
Schema markup is machine-readable structured data that helps search engines and AI systems understand your content. It doesn’t directly cause rankings to improve, but it enables rich results (star ratings, prices, FAQ accordions, breadcrumbs) that increase click-through rates, and it’s increasingly important for GEO and AI search visibility.
The schema audit covers two questions: what’s implemented, and is it implemented correctly?
Verification tool: Google’s Rich Results Test for individual pages, Schema Markup Validator (schema.org) for syntax checking, and Google Search Console’s Rich Results report for site-wide status.
Schemas to audit by site type:
For all sites: Organization (homepage), BreadcrumbList (all pages), SiteLinks Searchbox (optional but useful for branded search).
For content/blog sites: Article or BlogPosting (with author, datePublished, dateModified, publisher properties populated), FAQPage on FAQ sections.
For ecommerce sites: Product (with name, description, image, sku, offers, aggregateRating properties), Offer (with price, priceCurrency, availability, url), AggregateRating (requires real reviews - do not fabricate).
For local businesses: LocalBusiness with address, telephone, openingHours, geo.
Common schema errors I find on audits: missing priceCurrency on Offer schema (required by Google), AggregateRating with fewer than the minimum review count Google requires for rich results, FAQPage schema where the answers don’t match the visible content on the page, and Product schema missing the offers property entirely (renders the schema useless for Shopping results).
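A quick property check catches the most common of these errors before you reach the Rich Results Test. The sketch below parses a JSON-LD snippet and reports missing properties. The `EXPECTED` lists mirror the ecommerce checklist above, not Google’s official required set, and the product data is invented. `aggregateRating` is left out on purpose because it should only be present when real reviews exist.

```python
import json

# Properties taken from the checklist above (an assumption, not Google's spec).
EXPECTED = {
    "Product": ["name", "description", "image", "sku", "offers"],
    "Offer": ["price", "priceCurrency", "availability", "url"],
}

def missing_properties(node):
    """Return 'Type.property' for each expected property absent from a
    JSON-LD node, including a nested single Offer under 'offers'."""
    node_type = node.get("@type")
    missing = [
        f"{node_type}.{prop}"
        for prop in EXPECTED.get(node_type, [])
        if prop not in node
    ]
    offers = node.get("offers")
    if isinstance(offers, dict):
        missing += missing_properties(offers)
    return missing

# Invented product markup with two deliberate gaps: sku and priceCurrency.
json_ld = """
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Trail Running Shoe",
  "description": "Lightweight trail shoe.",
  "image": "https://example.com/img/trail-shoe.jpg",
  "offers": {
    "@type": "Offer",
    "price": "89.00",
    "availability": "https://schema.org/InStock",
    "url": "https://example.com/products/trail-shoe"
  }
}
"""

gaps = missing_properties(json.loads(json_ld))
print(gaps)
```

This only checks presence. Whether the values match the visible page content still needs a human or a rendered-page comparison.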
Faceted Navigation: The #1 Crawl Budget Killer for Ecommerce
Faceted navigation - the filtering system that lets users browse by color, size, price range, brand, and other attributes - is the most common technical SEO problem on ecommerce sites with large catalogs.
The problem: every filter combination generates a new URL. A category with 200 products, 5 color options, 4 size options, and 3 price ranges can theoretically generate thousands of unique URLs. Most of them contain near-duplicate content with minimal unique value. Google crawls them anyway.
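The arithmetic is worth doing explicitly, because the total depends heavily on whether filters are single-select or multi-select. A small sketch using the facet counts from the example above; both functions count the unfiltered category page as one of the URLs, and neither includes sort orders or pagination, which multiply the totals further.

```python
from math import prod

def single_select_urls(option_counts):
    """Each facet is either unset or set to exactly one value."""
    return prod(n + 1 for n in option_counts)

def multi_select_urls(option_counts):
    """Each individual value can be toggled on or off independently."""
    return prod(2 ** n for n in option_counts)

facets = [5, 4, 3]  # colors, sizes, price ranges

print(single_select_urls(facets))  # 6 * 5 * 4
print(multi_select_urls(facets))   # 2**5 * 2**4 * 2**3
```

Single-select filters give 120 URLs for this one category; multi-select filters give 4,096. Multiply either figure by the number of categories on the site and the scale of the problem becomes clear.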
The result: crawl budget is consumed by low-value filter pages, canonical signals get confused, and the core category pages you actually want to rank can end up devalued.
The solution isn’t simple or universal. It depends on which filter combinations have genuine search demand (brand-level filters often do; color/size combinations almost never do for most categories).
The standard approach:
- Use noindex on filter combinations with no search demand (keeps them crawlable but out of the index)
- Add canonical tags on filter pages pointing to the root category URL
- Block low-value parameter patterns via robots.txt for pages with no search intent
- For filter combinations with genuine search demand (e.g., “running shoes for women” as a category), create properly optimized landing pages rather than relying on filter-generated URLs
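The robots.txt approach is blunt but fast. The canonical approach is more nuanced but can leave Googlebot crawling pages it won’t index, which still consumes budget. In production, a combination of both is usually the right answer, applied to different URL patterns: Googlebot can’t read a noindex or canonical tag on a URL that robots.txt blocks it from fetching.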
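One way to keep the combined approach consistent is to write the policy down as a function and run every parameter URL from your crawl through it. This is a sketch under stated assumptions: the parameter names in `BLOCKED_PARAMS` and `DEMAND_FACETS` are hypothetical, and which facets have search demand has to come from your own keyword research.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical parameter names; replace with the ones your platform generates.
BLOCKED_PARAMS = {"sessionid", "print", "sort"}  # no search intent at all
DEMAND_FACETS = {"brand"}                        # facets with real search demand

def facet_policy(url):
    """Return the intended treatment for a category or filter URL."""
    params = set(parse_qs(urlsplit(url).query, keep_blank_values=True))
    if not params:
        return "index"                   # root category page
    if params & BLOCKED_PARAMS:
        return "robots-disallow"         # keep Googlebot out entirely
    if params <= DEMAND_FACETS:
        return "landing-page-candidate"  # build a proper optimized page
    return "noindex-or-canonical"        # crawlable, kept out of the index

for url in (
    "https://example.com/shoes",
    "https://example.com/shoes?brand=acme",
    "https://example.com/shoes?color=red&size=9",
    "https://example.com/shoes?color=red&sessionid=abc",
):
    print(url, "->", facet_policy(url))
```

Each URL gets exactly one treatment, which avoids blocking a URL in robots.txt while also relying on a tag on that same URL.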
For detailed ecommerce-specific handling, see the Ecommerce SEO guide.
Internal Link Equity Distribution
Internal links pass authority through the site. Pages that receive more internal links from authoritative pages rank better, all else equal. Most sites distribute internal link equity by accident rather than by design.
The audit questions:
Which pages are orphaned? An orphaned page has no internal links pointing to it. Googlebot can only find it via the sitemap or an external link. Orphaned pages don’t accumulate internal authority. A link-following crawl can’t discover them on its own, so give Screaming Frog a second URL source (the XML sitemap, Search Console, or analytics data), then filter for URLs with zero inlinks.
What is the crawl depth of priority pages? Crawl depth is the number of clicks from the homepage to reach a given page. Google uses crawl depth as a proxy for importance - pages closer to the homepage are considered more significant. Priority pages (core service pages, top product categories, highest-revenue articles) should be reachable within 2-3 clicks. If your most important content is buried 6 clicks deep, internal linking is failing you.
Where is equity flowing that it shouldn’t be? Footer links, navigation links, and sidebar links pass equity to wherever they point. If your global footer links to 40 different pages, each of those links carries diluted value. Prioritize internal links in body content - they pass more equity and are more topically relevant.
Is anchor text diverse and descriptive? Anchor text in internal links signals topical relevance. “Click here” and “learn more” are wasted opportunities. Use descriptive anchor text that includes the keyword you want the target page to rank for.
A practical tool: Screaming Frog’s Link report shows inlink counts per page. Sort by inlinks ascending to find underlinked pages. Cross-reference against your priority pages list. Any priority page with fewer than 5-10 contextual inlinks from related content should be a linking target in your next content update.
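The same checks can be run on a raw link export. The sketch below assumes a hypothetical `links` mapping (source page to the pages it links to) plus a `known_urls` list from a second source such as the sitemap; it computes inlink counts, click depth from the homepage by breadth-first search, and orphans.

```python
from collections import Counter, deque

def link_metrics(links, home, known_urls):
    """links: {source: [targets]} from a crawl export (hypothetical shape).

    Returns (inlink counts, click depth from home, orphaned URLs).
    """
    inlinks = Counter(t for targets in links.values() for t in targets)

    # Breadth-first search gives the minimum number of clicks from home.
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)

    orphans = sorted(u for u in known_urls if u != home and inlinks[u] == 0)
    return inlinks, depth, orphans

# Example link graph; /old-landing-page is known only from the sitemap.
links = {
    "/": ["/services", "/blog"],
    "/services": ["/services/audit"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/services/audit"],
}
known_urls = ["/", "/services", "/services/audit", "/blog",
              "/blog/post-1", "/old-landing-page"]

inlinks, depth, orphans = link_metrics(links, "/", known_urls)
print(inlinks["/services/audit"], depth["/services/audit"], orphans)
```

Joining `inlinks` and `depth` against your priority pages list gives the underlinked and buried pages in one pass.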
The Prioritized Action Framework
Every technical SEO audit surfaces more issues than any team can fix immediately. The prioritized action framework turns a list into a plan.
The scoring is straightforward. Rate each issue on two dimensions:
Impact: How much does fixing this improve organic performance? (High / Medium / Low)
Effort: How much time and technical resource does fixing this require? (High / Medium / Low)
Then group into four buckets:
Fix immediately (High Impact / Low Effort): Redirect chains to existing pages, missing canonical tags on key pages, XML sitemap containing redirected or 404 URLs, robots.txt accidentally blocking important sections, missing schema on templated pages.
Schedule for next sprint (High Impact / High Effort): Faceted navigation overhaul, site migration cleanup (fixing multi-hop redirect chains at scale), Core Web Vitals fixes requiring theme-level changes, server log analysis and crawl budget reallocation.
Do when convenient (Low Impact / Low Effort): Meta description optimization, title tag tweaks, image alt text additions for non-priority pages.
Evaluate ROI (Low Impact / High Effort): Major CMS changes that fix minor SEO issues, complex hreflang restructuring for small international traffic, custom schema implementations for page types that don’t generate rich results.
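The bucketing itself is simple enough to keep in a spreadsheet or a short script. A minimal sketch follows. One assumption to flag loudly: the four buckets above don’t say where Medium ratings go, so this version rounds Medium up to High on both axes. Adjust that rule to suit your own judgment.

```python
from collections import defaultdict

BUCKETS = {
    ("High", "Low"): "Fix immediately",
    ("High", "High"): "Schedule for next sprint",
    ("Low", "Low"): "Do when convenient",
    ("Low", "High"): "Evaluate ROI",
}

def bucket(impact, effort):
    """Map an (impact, effort) rating to one of the four buckets.

    Assumption: Medium is treated as High on both axes, because the
    framework above only defines High/Low combinations.
    """
    impact = "Low" if impact == "Low" else "High"
    effort = "Low" if effort == "Low" else "High"
    return BUCKETS[(impact, effort)]

def build_plan(issues):
    """issues: iterable of (name, impact, effort). Returns {bucket: [names]}."""
    plan = defaultdict(list)
    for name, impact, effort in issues:
        plan[bucket(impact, effort)].append(name)
    return dict(plan)

issues = [
    ("Redirect chains to existing pages", "High", "Low"),
    ("Faceted navigation overhaul", "High", "High"),
    ("Meta description optimization", "Low", "Low"),
    ("Complex hreflang restructuring", "Low", "High"),
    ("Core Web Vitals theme changes", "High", "Medium"),
]

plan = build_plan(issues)
for name, items in plan.items():
    print(f"{name}: {items}")
```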
The most valuable thing an audit can deliver is this prioritization. The list of issues is often predictable. Knowing which three things will move the needle most in the next 90 days is what separates a useful audit from a report.
If you want a professional technical SEO audit conducted on your site, the SEO audit service covers all of the above plus a prioritized action plan. If you’re looking for ongoing consulting rather than a one-time audit, see the SEO consulting service.
FAQ
What is included in a technical SEO audit? A thorough technical SEO audit covers crawl budget and server log analysis, Core Web Vitals (LCP, CLS, INP) at page-group level, canonical implementation, redirect chains, XML sitemap accuracy, robots.txt configuration, schema markup validity, faceted navigation handling (for ecommerce sites), and internal link equity distribution. The output should be a prioritized action list, not just a raw list of issues.
How long does a technical SEO audit take? For a site with fewer than 10,000 pages, a thorough audit typically takes 2-4 days of focused work. Large ecommerce sites with 100,000+ URLs, complex faceted navigation, and server log analysis can take 5-10 days. The time cost is front-loaded: once the audit is done, the fixes are discrete tasks that can be distributed across sprints.
How is a technical SEO audit different from an SEO audit? A technical SEO audit focuses specifically on how a site is built and how search engines interact with it: crawlability, indexation, page speed, structured data, canonical configuration, and site architecture. A broader SEO audit would also include content quality analysis, keyword targeting review, and backlink profile assessment. Technical and content audits address different problems; both are usually needed.
Do small sites need a technical SEO audit? Yes, but the scope is different. Small sites (under 500 pages) rarely have crawl budget or faceted navigation problems. But they do have canonical issues, missing schema, slow Core Web Vitals, and internal linking gaps that affect rankings. A technical audit for a small site might take half a day rather than several days, but skipping it entirely means making content and link decisions on a potentially broken foundation.
What tools do I need to run a technical SEO audit? The minimum toolkit: Google Search Console (crawl stats, page indexing report, Core Web Vitals, rich results), Screaming Frog SEO Spider (crawl analysis, redirect chains, canonical audit, internal links), PageSpeed Insights or Lighthouse (Core Web Vitals lab data), and the Rich Results Test (schema validation). For server log analysis, Screaming Frog Log File Analyser handles most use cases. For enterprise-scale sites, tools like Botify or Oncrawl provide more advanced log analysis with SEO-specific reporting.
How often should you run a technical SEO audit? For active sites that publish regularly or make frequent template changes, a lightweight monthly crawl audit plus a full audit every 6 months is a reasonable cadence. Any time a site undergoes a significant change - migration, redesign, CMS change, major URL restructure - a full technical audit before and after the change is non-negotiable.
What is crawl budget and why does it matter? Crawl budget is the number of pages Googlebot will crawl on your site in a given period. For most small sites it’s not a practical concern. For large ecommerce sites with tens of thousands of URLs, Googlebot’s crawl capacity is finite, and if it’s being consumed by low-value parameter pages, pagination, or filter combinations, important new content may not be indexed promptly. Crawl budget matters most on sites with 10,000+ pages, frequent content publication, and complex URL structures generated by filtering systems.
About the Author
Luciano Bonanno is an independent SEO and Growth Consultant with 18 years of experience. Founder of SameAPI and DeLeak.co.