The Moz Q&A Forum

    Salvaging links from WMT “Crawl Errors” list?

    Technical SEO Issues
    • GregB123

      When someone links to your website but makes a typo while doing it, those broken inbound links show up in Google Webmaster Tools, in the Crawl Errors section, as “Not Found”. Often they are easy to salvage by just adding a 301 redirect in the .htaccess file.

      But sometimes the typo is really weird, or the link source looks a little scary, and that's what I need your help with.

      First, let's look at the weird typo problem. If it is something easy, like they just lost the last part of the URL (such as www.mydomain.com/pagenam), then I fix it in .htaccess this way:

      RewriteCond %{HTTP_HOST} ^mydomain\.com$ [OR]

      RewriteCond %{HTTP_HOST} ^www\.mydomain\.com$

      RewriteRule ^pagenam$ http://www.mydomain.com/pagename.html [R=301,L]
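
      As an aside, a one-off redirect like this can also be done with mod_alias instead of mod_rewrite; a minimal sketch, assuming mod_alias is enabled:

      ```apache
      # mod_alias form of the same 301. The pattern is matched against the
      # URL path only, so no host conditions are needed.
      RedirectMatch 301 ^/pagenam$ http://www.mydomain.com/pagename.html
      ```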

      But what about when the last part of the URL is really screwed up? Especially with non-text characters, like these:

      www.mydomain.com/pagename1.htmlsale
      www.mydomain.com/pagename2.htmlhttp://
      www.mydomain.com/pagename3.html"
      www.mydomain.com/pagename4.html/

      How should the .htaccess RewriteRule be written to send these oddballs to the individual pages they were supposed to go to, without the typo?

      Second, is there a quick and easy method or tool to tell us whether a linking domain is good or spammy? I have incoming broken links from sites like these:

      www.webutation.net
      titlesaurus.com
      www.webstatsdomain.com
      www.ericksontribune.com
      www.addondashboard.com
      search.wiki.gov.cn
      www.mixeet.com
      dinasdesignsgraphics.com

      Your help is greatly appreciated. Thanks!

      Greg

      • FedeEinhorn

        Although you can redirect any URL to the one you believe they meant to link to, you may end up with hundreds of rules in your .htaccess.

        I personally wouldn't use this approach. Instead, you can build a really good 404 page that looks at the typed URL and shows a list of possible pages the user was actually trying to reach, while still returning a 404, as the typed URL doesn't actually exist.

        By using the above method you also avoid worrying about those links as you mentioned. No link juice is passed, though, but traffic coming from those links will probably still find the content they were looking for, as your 404 page will list the possible URLs they were trying to reach.
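
        A smart 404 page like the one described only needs a single directive to wire up; a minimal sketch, assuming a handler script at /not-found.php (the script name is hypothetical):

        ```apache
        # Serve a custom handler for every 404. The script can inspect the
        # requested URI and suggest near-matching pages, while Apache still
        # returns the 404 status code for the bad URL.
        ErrorDocument 404 /not-found.php
        ```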

        • GregB123 @FedeEinhorn

          Thanks, Federico. I do have a good custom 404 page set up to help those who click a link with a typo.

          But I still would like to know how to solve the questions asked above...

          • FedeEinhorn @GregB123

            Well, if you still want to go that way, the RewriteCond lines there are not needed (since the .htaccess already lives on your domain). Then a rewrite rule for www.mydomain.com/pagename1.htmlsale would be:

            RewriteRule ^pagename1\.htmlsale$ pagename1.html [R=301,L]

            Plus, everything matching pagename1.html*** (such as pagename1.html123, pagename1.html%22, etc.) can be redirected with a single wildcard rule:

            RewriteRule ^pagename1\.html(.+)$ pagename1.html [R=301,L]
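
            One caveat worth adding here (my note, not part of the original reply): if the wildcard pattern can also match the clean URL itself, the rule will 301 pagename1.html back to pagename1.html in an endless redirect loop. Requiring at least one trailing character avoids that:

            ```apache
            # ".+" requires one or more trailing characters, so only URLs with
            # junk appended are redirected; /pagename1.html itself is untouched.
            RewriteRule ^pagename1\.html(.+)$ pagename1.html [R=301,L]
            ```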

            • GregB123 @FedeEinhorn

              Thank you, Federico. I did not know you could use a wildcard pattern like that to deal with any junk stuck onto the end of .html.

              So when you said "the rewrite conds are not needed," do you mean that instead of creating three lines of code for each 301 redirect, like this...

              RewriteCond %{HTTP_HOST} ^mydomain\.com$ [OR]

              RewriteCond %{HTTP_HOST} ^www\.mydomain\.com$

              RewriteRule ^pagenam$ http://www.mydomain.com/pagename.html [R=301,L]

              ...the first two lines can be removed? So each 301 redirect rule is just one line, like this...

              RewriteRule ^pagenam$ http://www.mydomain.com/pagename.html [R=301,L]

              ...without causing problems whether the visitor comes in on the mydomain.com version or the www.mydomain.com version?

              If so, that will sure help decrease the size of the file. But I thought those first two lines were needed if we are directing everything to the www version.

              Thanks again!

              • FedeEinhorn @GregB123

                Exactly.

                Let's do some cleanup 🙂

                To redirect everything from domain.com to www.domain.com you need this:

                RewriteCond %{HTTP_HOST} !=www.domain.com [NC]
                RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

                That's it for the www and non-www redirection.

                Then, you only need one line per 301 redirect, without the RewriteCond lines you had previously:

                RewriteRule ^pagename1\.html(.+)$ pagename1.html [R=301,L]

                That will in fact redirect any www or non-www request like pagename1.htmlhgjdfh to www.domain.com/pagename1.html. The parenthesized pattern acts as a wildcard for the trailing junk.

                You also don't need to type the full domain in the substitution, as you did in your examples. Since the rule lives on the same domain, just the page name is enough: pagename1.html

                🙂
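
                Putting the whole exchange together, the cleaned-up .htaccess might look like this; a sketch only, using the hypothetical page names from above and assuming mod_rewrite is enabled:

                ```apache
                RewriteEngine On

                # Canonical host: send every non-www request to www.domain.com,
                # preserving the requested path.
                RewriteCond %{HTTP_HOST} !=www.domain.com [NC]
                RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

                # One line per salvaged typo; ".+" requires at least one trailing
                # character, so the clean URL never matches itself (no redirect loop).
                RewriteRule ^pagenam$ pagename.html [R=301,L]
                RewriteRule ^pagename1\.html(.+)$ pagename1.html [R=301,L]
                ```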

                • JaredMumford @GregB123

                  Hi Gregory -

                  Yes, as Federico mentions, you do not have to put the RewriteCond before every RewriteRule; since the .htaccess is in your root, it's implied. You might need those conditions if you are creating multiple redirects for www to non-www, etc.

                  Also, Federico is right: this isn't the best way to deal with these links, but I use a different solution. First I get a flat file of my inbound links using other tools as well as WMT, and then I run them through a test to ensure that each linking page still exists.

                  Then I go through the list and remove the scraper/stats sites like webstatsdomain, alexa, etc., so that the list is more manageable. Then I decide which links are OK to keep (there's no real quick way to decide, and everyone has their own method). The only links that are truly "bad" would be ones that may violate Google's Webmaster Guidelines.

                  Your list should be quite small at this point, unless you had a bunch of links to a page whose URL you subsequently moved or changed. In that case, add the rewrite to .htaccess. For the remaining list, you can simply contact the sites, notify them of the broken link, and ask to have it fixed. That is the best-case scenario (instead of having it go to a 404 or even a 301 redirect). If it's a good link, it's worth the effort.

                  Hope that helps!
