Facebook ignores multiple slashes for business listing - true duplicate page issue?
-
Hi everyone,
I am doing an external link audit for a site that contains a large number of public business profiles. Each profile can list a website for the business, as well as a FB page, Twitter account, etc. Pretty standard.
There are thousands of business profiles on the site, and I noticed that a small group of links to FB pages seem to contain typos. These are not the actual businesses that I found; I have taken common public pages and modified their URLs to give examples of what I am seeing.
https://www.facebook.com//nissanusa/
https://www.facebook.com//pizzahutus/
https://www.facebook.com//WhiteHouse/
https://www.facebook.com//TrekBicycle/
Notice the double slash. You can put as many slashes as you want and it makes no difference, e.g.
https://www.facebook.com///////////////////////////////////////////////////////////TrekBicycle/
FB returns a 200 for all of these pages when it should return a 404. There is no 301 to the correct URL. Facebook also does not canonicalize these variants to the single-slash URL.
You could call this a potential duplicate content issue due to typos. These types of pages would be important for brand related searches for a business. Google may be smart enough to ignore them, or maybe the typo does not happen often enough that it does not really matter. I am just surprised that FB does not 404 or 301 these pages.
When I checked my personal FB page URL and some of my friends' pages, this did not happen. FB shows a 404 if you add extra slashes to personal page URLs. So the duplicate issue seems to affect only business-type FB pages.
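For the audit itself, links like these can be flagged automatically. Here is a minimal Python sketch, assuming the profile links are already available as a list of strings. The function names `collapse_path_slashes` and `find_slash_typos` are made up for illustration, and the script only looks at the URL text; it does not fetch anything or check status codes.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_path_slashes(url: str) -> str:
    """Return the URL with runs of slashes in the path reduced to one slash."""
    parts = urlsplit(url)
    # Only the path is touched; the "//" after the scheme is not part of it.
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(
        (parts.scheme, parts.netloc, clean_path, parts.query, parts.fragment)
    )


def find_slash_typos(urls):
    """Return (original, suggested) pairs for URLs with repeated path slashes."""
    flagged = []
    for url in urls:
        suggested = collapse_path_slashes(url)
        if suggested != url:
            flagged.append((url, suggested))
    return flagged


if __name__ == "__main__":
    links = [
        "https://www.facebook.com//nissanusa/",
        "https://www.facebook.com/pizzahutus/",
    ]
    for original, suggested in find_slash_typos(links):
        print(original, "->", suggested)
```

The suggested URL is only a candidate correction; whether it resolves to the intended page would still need to be checked by hand or with a separate request.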
Curious about what the group thinks or if they have seen similar situations like this one.
Thanks!
-
Hi there.
Well, it's not just Facebook. Many websites behave this way. You can put as many slashes after the domain name as you wish and the page will still load. E.g.:
https://m1machining.com/////m1-80-ar15-lower-receiver-billet-7075.html - works
https://m1machining.com////ar10-ar15-products/m1-80-ar15-lower-receiver-billet-7075.html - works
But https://m1machining.com////ar10-ar15-products////m1-80-ar15-lower-receiver-billet-7075.html - doesn't work
I am not sure whether it's down to server configuration, browser configuration, or a combination of both plus other factors, but it makes sense for the page to load anyway, since the extra slashes are obviously a typo.
-
Hi CleverPhd and Dmitri,
You are both right: not only Facebook but many other servers are configured to ignore the extra slashes in the URL. This can definitely be a problem for crawlers, as they would treat each URL as different (if asked to index). Moreover, there is an infinite number of URLs you can produce this way.
As per the URI standard (http://www.ietf.org/rfc/rfc2396.txt, since superseded by RFC 3986), each slash is significant. If you test the same thing in earlier versions of IE, you will find that they don't reproduce the same effect: the pages would be missing elements, as they fail to find the location of the resources required for the page.
The problem is with how our servers are configured these days to ignore the extra slashes and return the same page as a result.
There is a quick fix on Apache using mod_rewrite (you can add more lines to cover 3 or more slashes), which produces a 301 redirect to the right page:
<code>RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . %1/%2 [R=301,L]</code>
I hope this helps,
Regards,
Vijay
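To see what the rewrite condition above does to a path with three or more slashes, here is a small Python sketch that applies the same regular expression by hand. It is only a simulation under assumptions: it assumes the pattern is tested against the raw path with the extra slashes still in it, and that the client follows each 301. The names `apply_rule_once` and `redirect_chain` are invented for this illustration and are not part of Apache.

```python
import re

# Same pattern as the RewriteCond above.
RULE = re.compile(r"^(.*)//(.*)$")


def apply_rule_once(path):
    """Return the redirect target for one pass of the rule, or None if no match."""
    match = RULE.match(path)
    if match is None:
        return None
    # Mirrors the substitution %1/%2: one "//" is replaced by a single "/".
    return match.group(1) + "/" + match.group(2)


def redirect_chain(path, max_hops=20):
    """Follow simulated redirects until the rule stops matching or max_hops is hit."""
    chain = [path]
    while len(chain) <= max_hops:
        target = apply_rule_once(chain[-1])
        if target is None:
            break
        chain.append(target)
    return chain


if __name__ == "__main__":
    for hop in redirect_chain("///TrekBicycle/"):
        print(hop)
```

In this simulation each pass removes one slash, so a path with a run of n slashes reaches the clean URL after n - 1 hops. That suggests the single rule already converges for longer runs, at the cost of a chain of redirects; a long run of slashes could exceed the number of redirects a client is willing to follow.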