The Moz Q&A Forum

    Competitor 'scraped' entire site - pretty much - what to do?

    Intermediate & Advanced SEO
    • seagreen
      seagreen last edited by

      I just discovered a competitor in the insurance lead generation space has completely copied my client's site's architecture, page names, titles, even the form, tweaking a word or two here or there to prevent 100% 'scraping'.

      We put a lot of time into the site, only to have everything 'stolen'.  What can we do about this?  My client is very upset.  I looked into filing a 'scraper' report through Google but the slight modifications to content technically don't make it a 'scraped' site.

      Please advise on what course of action we can take, if any.

      Thanks,
      Greg

      • RyanKent
        RyanKent last edited by

        Hi Greg.

        Having a site scraped is unfortunately common. It is a frustrating experience which takes time and effort to address. Below are some suggestions:

        • Going forward, you can register a copyright for at least some pages within your site. Even if you do not wish to register every page, having some pages registered gives you very clear legal rights if your entire site is scraped.

        • Add the canonical tag to each page, along with various clues throughout the site to indicate it really belongs to you. Generally speaking, these operations are a bit lazy, which is why they steal from others rather than create their own content. If they do not notice and remove the canonical tag, then you might receive all the SEO credit for the second site, and either way Google is more likely to treat your site as the primary source of the content.

        • You might rename a random image to mysite.com.jpg as a suggestion. There are numerous other means by which you can drop indicators that the content is really yours. This step is helpful because the site which stole your content clearly falls into the no-ethics category. They know what they are doing, have likely used this practice before, and will do so again. As part of the process, they often will deny everything and may even claim you stole the site from them. These clues can assist in proving you are the true owner.

        • you should contact the offending site via registered mail with a "Cease and Desist" notification. Be certain to provide a deadline. I use 10 days as a timeline.

        • If the C&D does not work, contact their web host with a DMCA notice. If the host is reputable, they will honor the DMCA and take down the site. The problem is the host is required to contact the site and share your claim with the site owner. The site owner can respond with a statement saying the content is theirs, and then there is nothing further the host can do UNLESS you have a registered copyright or you have a helpful host who is willing to consider your evidence (i.e. the clues you left) and help you (their non-customer) over their paying customer. Some hosts are good this way.

        • You can always take legal action and sue the website and host in court. Again, the copyright is very important in court as it provides you with a significant advantage. Some sites will actually defend themselves in court with the intention to delay the trial as long as possible and drive up your expenses to literally tens of thousands of dollars so you give up.

        The above process will work in a lot of cases, but not all. When it doesn't work, you have to take other approaches. Sometimes the site is owned, operated and hosted in a foreign country. Sometimes the country does not have enforceable copyright laws. In these cases, and in the others above, you can file the complaint with Google and they have the ability to remove the offending site from their index.

        • seagreen
          seagreen @RyanKent last edited by

          Thanks - I am going to get started on these.  Does the canonical tag work after the fact?

          Thanks,
          Greg

          • RyanKent
            RyanKent @seagreen last edited by

            Does the canonical tag work after the fact?

            The canonical tag only works if the scraping site is dumb enough or lazy enough not to correct it. Fortunately, this applies in many circumstances.
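One way to see whether a copied page kept your canonical tag is to inspect its HTML. The following is only a minimal Python sketch: the class and function names and the example hosts are made up for illustration, and it assumes you have already saved the copied page's HTML as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, or be missing.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def canonical_points_to(html, my_host):
    """Return True if the page declares a canonical URL on my_host.

    my_host is a hypothetical lowercase host name such as "www.example.com".
    """
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonical:
        return False
    return urlparse(finder.canonical).hostname == my_host
```

If this returns True for a page on the copied site, the scraper left your canonical tag in place. If it returns False, the tag was removed or rewritten and you are relying on the other steps.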

            Also, the scraping might have been a one-time thing, but often they will continue to scrape your site for updates and new content. It depends. If they return for new content, then yes it would apply.

            My suggestion would be to copyright your home page immediately. Additionally, add a new page to your site and copyright it. Then you have two pages on your site which are copyrighted, which offers you a lot more protection than you presently have.

            One item I forgot to mention, Google Authorship. Use it.

            http://googlewebmastercentral.blogspot.com/2011/06/authorship-markup-and-web-search.html

            http://www.google.com/support/webmasters/bin/answer.py?answer=1408986

            • ShaMenz
              ShaMenz last edited by

              Hi Greg,

              Awesome information there from Ryan!

              Implementing the authorship markup is important in that it basically "outs" anyone who has already stolen your content by telling Google that they are not the original author. With authorship markup properly implemented, it really doesn't matter how many duplicates there are out there, Google will always see those sites as imposters, since no-one else has the ability to verify their authorship with a link back from your Google profile 🙂

              It is possible to block scrapers from your server (blacklist) using IP address or User Agent if you are able to identify them. Identification is not very difficult if you have access to server logs, as there will be a number of clues in the log data. These include excessive hits, bandwidth used, requests for JavaScript and CSS files, and high numbers of 401 (unauthorized) and 403 (forbidden) HTTP error codes.
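To make the log clues concrete, here is a rough Python sketch that tallies hits, bytes sent, and 401/403 responses per IP address. It assumes the logs are in the common Apache "combined" format. The function names and the thresholds are placeholders chosen for illustration, not recommended values.

```python
import re
from collections import Counter

# Matches the start of an Apache combined-format log line (an assumption
# about your server's log format; adjust if yours differs).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize_by_ip(lines):
    """Count hits, bytes sent, and 401/403 responses for each client IP."""
    hits = Counter()
    bytes_sent = Counter()
    denied = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ip = match.group("ip")
        hits[ip] += 1
        size = match.group("size")
        if size != "-":
            bytes_sent[ip] += int(size)
        if match.group("status") in ("401", "403"):
            denied[ip] += 1
    return hits, bytes_sent, denied


def suspicious_ips(lines, max_hits=1000, max_denied=20):
    """Return IPs whose hit count or denied count exceeds the thresholds.

    The default thresholds are arbitrary illustration values.
    """
    hits, _bytes_sent, denied = summarize_by_ip(lines)
    return sorted(
        ip for ip in hits
        if hits[ip] > max_hits or denied[ip] > max_denied
    )
```

Anything this flags still needs to be checked by hand before blocking, since a legitimate crawler can also generate a lot of hits.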

              Some scrapers are also easily identifiable by User Agent (name). Once the IP address or user agent is known, instructions can be given to the server to block it and if you wish, to serve content which will identify the site as having been scraped.

              If you are not able to specifically identify the bot(s) responsible, it is also possible to use alternatives like whitelisting bots that you know are OK. This needs to be handled carefully, as omissions from the whitelist could mean that you have actually banned bots that you want to crawl the site.

              If using a LAMP setup (Apache server), then the blocking instructions are added to the .htaccess file, with a PHP script handling any custom response. For a Windows server, you use a database or text file with FileSystemObject to redirect them to a dead-end page. Ours is a LAMP shop, so I am much more familiar with the .htaccess method.

              If using .htaccess you have the choice of returning a 403 FORBIDDEN HTTP error, or using the bot-response.php script to serve an image which identifies the site as scraped 🙂

              If using bot-response.php, the gif image should be made large enough to break the layout in the scraped site if they serve the content somewhere else. Usually a very large gif that reads something like: "Content on this page has been scraped from yoursite.com. If you are the webmaster please stop trying to steal our content." will do the job.
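The choice between a 403 and the notice image comes down to a small decision per request. The sketch below shows that decision in Python rather than in .htaccess or PHP, purely to illustrate the logic. The class name, field names, and return labels are all hypothetical, and this is not the bot-response.php script itself.

```python
from dataclasses import dataclass, field


@dataclass
class BotPolicy:
    """Hypothetical blocklist: known scraper IPs and User Agent fragments."""

    blocked_ips: set = field(default_factory=set)
    blocked_agent_fragments: tuple = ()
    serve_notice_image: bool = False  # True: answer with the "scraped" image

    def decide(self, ip, user_agent):
        """Return "allow", "forbidden_403", or "notice_image" for a request."""
        agent = user_agent.lower()
        blocked = ip in self.blocked_ips or any(
            fragment.lower() in agent
            for fragment in self.blocked_agent_fragments
        )
        if not blocked:
            return "allow"
        return "notice_image" if self.serve_notice_image else "forbidden_403"
```

For example, `BotPolicy(blocked_ips={"203.0.113.9"}).decide("203.0.113.9", "Mozilla/5.0")` returns `"forbidden_403"`, while the same request with `serve_notice_image=True` returns `"notice_image"`.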

              There is one VERY BIG note of caution if you are thinking of blocking bots from your server. You really need to be an experienced tech to do this. It is NOT something that should be attempted if you don't understand exactly what you are doing and what precautions need to be taken beforehand. There are two major things to consider:

              1. You can accidentally block the bots that you want to crawl your site. (Major search engines use many different crawlers to do different jobs. They do not always appear as googlebot, slurp etc)
              2. It is possible for people to create fake bots that appear to be legitimate. If you don't identify these you will not solve the scraping problem.

              The authenticity of bots can be verified using Roundtrip DNS Lookups and WhoIs Lookups to check the originating domain and IP address range.

              It is possible to add a disallow statement for "bad bots" to your robots.txt file, but scrapers will generally ignore robots.txt by default, so this method is not recommended.

              Phew! Think that's everything covered.

              Hope it helps,

              Sha

              • RyanKent
                RyanKent @ShaMenz last edited by

                Wow! Amazing information on the bots Sha. I never knew about this approach. My thoughts were just about how bad bots would ignore the robots.txt file and not much else a site owner can do.

                I have to think there are a high number of "bad" bots out there using various names which often change. It also seems likely the IP addresses of these bad bots change frequently. By any chance do you, or anyone else, know of some form of "bad bots" list which is updated?

                It seems like too much work for any normal site owner to compile and maintain a list of this nature.

                I know...this is a stretch but hey, it doesn't hurt to ask, right?!

                • ShaMenz
                  ShaMenz @RyanKent last edited by

                  Hi Ryan,

                  The major problem is that any experienced programmer can easily write their own script to scrape a site. So there could be thousands of "bad bots" out there that have not been seen before.

                  There are a few recurring themes that appear amongst suspicious User Agents that are easy to spot - generally anything that has a name including words like grabber, siphon, leach, downloader, extractor, stripper, sucker or any name with a bad connotation like reaper, vampire, widow etc. Some of these guys just can't help themselves!
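A quick way to scan log data for those names is a case-insensitive keyword match against the User Agent string. This is a minimal Python sketch using only the words listed above. The function name is made up, and a match is only a prompt to look closer, not proof of scraping.

```python
import re

# The words come straight from the list above; extend as needed.
SUSPICIOUS_WORDS = (
    "grabber", "siphon", "leach", "downloader", "extractor",
    "stripper", "sucker", "reaper", "vampire", "widow",
)
PATTERN = re.compile("|".join(map(re.escape, SUSPICIOUS_WORDS)), re.IGNORECASE)


def suspicious_words_in(user_agent):
    """Return the sorted, lowercased suspicious words found in a User Agent."""
    return sorted({m.group(0).lower() for m in PATTERN.finditer(user_agent)})
```

An empty list means none of the words appeared, which says nothing about a bot that simply uses an innocent-looking name.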

                  The most important thing though is to properly identify the ones that are giving you a problem by checking server logs and tracing where they originate from using Roundtrip DNS and WhoIs Lookups.

                  Matt Cutts wrote a post a long time ago on how to verify googlebot and of course the method applies to other search engines as well. The doublecheck is to then use WhoIs to verify that the IP address you have falls within the IP range assigned to Google (or whichever search engine you are checking).
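The round-trip DNS check can be sketched in a few lines of Python: reverse-resolve the IP to a host name, confirm the host name belongs to the search engine's domain, then forward-resolve that name and confirm it maps back to the same IP. The function name and parameters are hypothetical, the domain suffixes are supplied by the caller, and the default forward lookup here only handles IPv4. The WhoIs double-check is not covered.

```python
import socket


def verify_crawler_ip(ip, allowed_suffixes, reverse_lookup=None, forward_lookup=None):
    """Round-trip DNS check for an IP that claims to be a search engine bot.

    allowed_suffixes: e.g. (".googlebot.com", ".google.com"); keep the
    leading dot so look-alike domains do not match.
    reverse_lookup / forward_lookup: optional replacements for the socket
    calls, mainly so the logic can be exercised without network access.
    """
    reverse_lookup = reverse_lookup or socket.gethostbyaddr
    forward_lookup = forward_lookup or socket.gethostbyname_ex

    try:
        hostname = reverse_lookup(ip)[0]  # step 1: IP -> host name
    except OSError:
        return False

    host = hostname.lower().rstrip(".")
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False  # step 2: host name is not in an expected domain

    try:
        addresses = forward_lookup(host)[2]  # step 3: host name -> IPs
    except OSError:
        return False

    return ip in addresses  # step 4: must map back to the original IP
```

A bot that fakes a search engine's User Agent will normally fail at step 2 or step 4, because it cannot control the DNS records for the search engine's domain.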

                  If you are experienced at reading server logs it becomes fairly easy to spot spikes in hits, bandwidth etc which will alert you to bots. Depending which server stats package you are using, some or all of the bots may already be highlighted for you. Some packages do a much better job than others. Some provide only a limited list.

                  If you have access to a programmer who is easy to get along with, the best way to get your head around this is to sit down with them for an hour and walk through the process.

                  Hope that helps,

                  Sha

                  PS - I'm starting to think you sleep less than I do! 🙂

                  • DanielFreedman
                    DanielFreedman last edited by

                    Excellent answers.

                    On top of everything else, how about some out of the box thinking: public shaming.

                    It's a risky strategy, so it needs careful consideration.

                    But it's pretty clear your client is the victim of dirty pool.

                    We're talking truth and justice and virtue here, folks. Forces of darkness vs. forces of light.

                    If I were still a TV news director, and someone on my staff suggested this as a story idea, I'd jump all over it.

                    And the company that copied the site would not emerge looking good.

                    • DanielFreedman
                      DanielFreedman @RyanKent last edited by

                      What does the C & D letter say? What is the threat? All the subsequent steps? Or do you just keep it vague and menacing (eg. "any and all remedies, including legal remedies")

                      • RyanKent
                        RyanKent @DanielFreedman last edited by

                        It is a formal legal notification sent to the company involved. I research the site information, contact information and domain registration information to determine the proper party involved. I also send the C&D via registered mail with proof of delivery. After the document has been delivered, I also send the document to the site's "Contact Us" address. I take every step reasonably possible to ensure the document is received by the right party within the company, and that I can document the date/time of receipt.

                        The letter provides the following:

                        • identifies the company which owns the copyrighted or trademarked material

                        • offers a means to contact the copyright and trademark owner

                        • identifies the copyright / trademark owner has become aware of the infringement

                        • provides proof of ownership such as the copyright number, trademark number, etc.

                        • identifies the location of the infringing content

                        • identifies my client has suffered harm as a result of the infringement. "Harm" can range from direct damages such as decreased sales, decreased website traffic, etc. or potential damage such as confusion in the marketplace.

                        Once the above points are established, the Cease and Desist demand is made. I also provide a follow-up date by which the corrective action needs to be completed. Finally, the specific next steps are covered with the following statement:

                        "This contact represents our goodwill effort to resolve this matter quickly and decisively. If further action is required, please be advised that statute 15 U.S.C. 1117(a) sets out the remedies available to the prevailing party in trademark infringement cases. They are: (1) defendant’s profits, (2) any damages sustained by the plaintiff, (3) the costs of the action, and (4) in exceptional cases, reasonable attorney’s fees."

                        There are a couple additional legal stipulations added as required by US law. The C&D is then signed, dated and delivered.

                        This letter works in a high percentage of cases. When it fails, a slightly modified version is sent to the web host. If that fails, then the next recourse is requesting Google directly to remove the site or content from their index.

                        If all fails, you can sue the offending company. If you do go to court, the fact you went through the above process and did everything possible to avoid court action will clearly benefit your case. I have never gone to that last step and I am not an attorney but perhaps Sarah can comment further?

                        • DanielFreedman
                          DanielFreedman @DanielFreedman last edited by

                          Thanks. Very helpful.

                          • RyanKent
                            RyanKent @DanielFreedman last edited by

                            I love the idea, but there are two concerns I have about this approach. In order for this to work, the company has to be known. Usually known companies don't participate in content scraping.

                            Also, if you do launch a successful public shaming campaign, you could possibly open yourself up to legal damages. I know you are thinking "What? They stole from me!" You are taking action with the express purpose of harming another business. You need to be extremely careful.

                            There have been multiple court cases where a robber successfully sued a home or business owner when they were injured during a robbery. Of course we can agree that sounds insane, but it has really happened, and this situation is much more transparent. The other company can claim you stole the content from them, and then you smeared the company. I can personally attest that civil court cases are not set up so the good guy always wins or for principles to be upheld. Each side makes a legal case, the costs can quickly run into tens of thousands of dollars, and the side with the most money will often win. Be very careful before taking this approach.

                            • DanielFreedman
                              DanielFreedman @RyanKent last edited by

                              I agree you have to be very careful.

                              I am only suggesting this approach might be considered in certain circumstances.

                              Public shaming is an intermediate step somewhere between sending a friendly note, a C&D letter, and suing, provided:

                              • the other company's identity is known
                              • the other company cares about its reputation

                              I am not a lawyer. Nor do I play one on the Internet.

                              The other company might claim "tortious interference" in its business. (That was the claim against CBS in the tobacco case.) But it's a stretch. A truthful story in a mainstream media outlet poses little risk, IMHO.  Any competent attorney could make the case that the purpose of the story was to inform the public. As for libel, you have to prove "actual malice" or "reckless disregard for the truth", an almost impossible standard to meet of proving you were lying and knew you were lying.

                              But who wants to go to court? One company I worked for had copyright infringement issues. Enthusiastic fans were using the name and logo without consent. A friendly email was usually all it took for them to either cease and desist or become official affiliates.

                              But these were basically good people who infringed out of ignorance.

                              It's different if you're dealing with dirtbags.

                              • ShaMenz
                                ShaMenz last edited by

                                I guess the use of bot-response.php and bot-response.gif is the gentle internet version of a public shaming campaign.

                                Sometimes it's a matter of picking your battles, but engineering enough of a win to make your client feel better without launching into an all-out war that could end up costing way more than you're willing to pay. :)

                                Sha

                                bot-response.gif

                                • RyanKent
                                  RyanKent @ShaMenz last edited by

                                  I love the idea if we can figure out a way to get it to work. It would require someone stealing your code, you discovering the theft, putting the steps in place and then the bad site coming back for more.

                                  • DanielFreedman
                                    DanielFreedman @ShaMenz last edited by

                                    Love it!

                                    • ShaMenz
                                      ShaMenz @RyanKent last edited by

                                      Hi Ryan,

                                      In this case Greg already knows the site has been scraped and duplicated. Blocking the scraper and serving the image via the bot-response php script is simply a "gift" to the duplicate site if they return to update their stolen content as they often do.

                                      It is entirely possible to put the solution in place for well known scrapers such as Pagegrabber etc, but there are thousands of them, the people using them can easily change the name when they have been outed and anyone can write their own.

                                      I understand that everyone wants a "list", but even if you Google "user agent blacklist" and find one, there will be problems. Adding thousands of rules to your .htaccess will eventually cause processing issues, the list will constantly be out of date etc.

                                      As I explained at the outset, the key is to be aware of what is happening on your server and respond where necessary. Unfortunately, this is not a "set and forget" issue. In my experience though, bots will likely be visible in your logs long before they have scraped your entire site.

                                      Sha

                                      • RyanKent
                                        RyanKent @RyanKent last edited by

                                        Well darn, so there is no easy way out! I think this is a fantastic opportunity for you. You can create Sha Enterprises and offer an anti-bot copyright protection program which would protect sites.

                                        • Distil
                                          Distil last edited by

                                          Hi All,

                                          To follow up on Ryan's last post "offer an anti-bot copyright protection program", that is exactly what we have created at Distil.  We are the first turnkey cloud solution that safeguards your revenue and reputation by protecting your web content from bots, data mining, and other malicious traffic.

                                          I do not mean to shamelessly advertise but it seems relevant to mention our service.  If anyone is interested in testing the solution please feel free to message me and I will be happy to extend a no obligation 30 day trial.

                                          Rami Founder, CEO 
                                          www.distil.it

                                          • ShaMenz
                                            ShaMenz @RyanKent last edited by

                                            hmmm...I like to pick my battles.

                                            Scumbags are scumbags and will always find a way to win in the short term.

                                            I like to live by two things my grandma taught me a long time ago...

                                            "What goes around comes around" and "revenge is a dish best served cold" 😉

                                            As to there being an easy way out - you're an SEO Ryan! You know the deal.

                                            Sha

                                            © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.