Thanks, will give Screaming Frog a go. I own the tool but rarely seem to use it. Hopefully that will find what is amiss.
Posts made by daedriccarl
-
RE: Massive drop off in Google crawl stats
-
Massive drop off in Google crawl stats
Hi
Could I get a second opinion on the following please. On a client site we seem to have had a massive drop-off in Google crawling in the past few weeks; this is linked with a drop in search impressions and a slight reduction in rankings. There are no warning messages in WMT to say the site is in trouble, and it shouldn't be, however I cannot get to the bottom of what is going on.
In Feb the Kilobytes downloaded per day was between 2200 and about 3800, all good there. However in the past couple of weeks it has peaked at 62 and most days are not even over 3! Something odd has taken place.
For the same period, the Pages crawled per day has gone from 50 - 100 down to under 3.
At the same time the site speed hasn't changed - it is slow and has always been slow (I have advised the client to change this, but you know how it is....) Unfortunately I am unable to give the site URL out, so I understand that may limit any advice people can offer.
I've attached some screenshots from WMT below.
Many thanks for any assistance.
-
RE: Strange error on my website
I like to use both GA and a secondary stats tool to double check the figures.
-
RE: Strange error on my website
Chris,
Thanks for the reply. Moments after posting (and after you replied) I discovered what it was. The site uses a bit of PHP code from an external site for tracking purposes. Something has broken at their end and, instead of my WordPress site displaying an error, it is causing the 500 error and blocking the bots. I will remove the code and look for an alternative solution.
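For anyone who hits the same thing later: the general fix is to wrap the third-party call so a failure at their end degrades gracefully instead of bubbling up as a 500. The site above uses PHP, but here is a minimal sketch of the idea in Python (the tracker URL is a placeholder, not a real endpoint):

```python
from urllib.request import urlopen


def fetch_or_fallback(url, fallback="", timeout=3):
    """Fetch a remote snippet, returning a fallback instead of erroring out."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except Exception:
        # The third-party endpoint is down or unreachable: degrade gracefully
        # rather than letting the failure surface as a 500 to visitors and bots.
        return fallback


# Hypothetical usage: if the tracker is broken, the page still renders.
snippet = fetch_or_fallback("http://tracker.example.invalid/pixel.js", fallback="")
```

The same pattern translates directly to PHP (a try/catch or a guarded `file_get_contents` with a fallback value) so a broken tracking provider never takes the page down with it.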
Thanks all.
Carl
-
Strange error on my website
Hi
Wondering if anyone could help out here please. I tried to crawl my website using Screaming Frog and got a 500 error warning (odd considering it worked yesterday). I tried Google's PageSpeed test and got the following message - PageSpeed Insights received a 500 response from the server. To analyze pages behind firewalls or that require authentication. However, it also crawled the site and gave me a score of 70/100. It showed the site thumbnail as normal. This was updated this morning and it showed the latest one, so it must have crawled.
I can access the site, as can my 'people', and the site is getting hits from real users in Google Analytics, so it must be working.
Am very puzzled how the site can be working and not working at the same time! If possible I would rather not share the URL. The site is a proof of concept and is very messy looking; it is more a test to see what happens if I do x...
Many thanks
Carl
-
RE: PageSpeed Insights DNS Issue
Hi
Thanks for looking at the issue. There should be four working nameservers. I have four set in both WHM and at my domain registrar. I added two more (3 and 4), so maybe they are taking a while to propagate around the web.
Will look at the SOA, thanks. Server and domain setup isn't at the top of my skill set. The domain you mention in this thread is just a testing domain to see what happens with a certain kind of content, so it hasn't been treated too seriously, to be honest.
-
RE: PageSpeed Insights DNS Issue
Thanks, very useful tool. None of my domains passed but I can still access them. Will try moving away from the A-record setup and use nameservers instead.
-
PageSpeed Insights DNS Issue
Hi
Anyone else having problems with Google's PageSpeed tool? I am trying to benchmark a couple of my sites but, according to Google, my sites are not loading. They will work when I run them through the test at one point, but if I try again, say 15 minutes later, they will present the following error message:
An error has occured
DNS error while resolving DOMAIN. Check the spelling of the host, and ensure that the page is accessible from the public Internet. You may refresh to try again.
If the problem persists, please visit the PageSpeed Insights mailing list for support.
This isn't too much of an issue for testing page speed, but I am concerned that if Google is getting this error on the speed test it will also get the error when trying to crawl and index the pages.
I can confirm the sites are up and running. The sites are pointed at the server via A-records and haven't been changed for many weeks, so it cannot be a DNS updating issue. Am at a loss to explain it.
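One way to rule the DNS in or out from your own end is to resolve the hostname repeatedly and see whether lookups ever fail intermittently, which would match the on-again/off-again behaviour above. A quick sketch in Python ('localhost' is just a stand-in; run it against your own domains):

```python
import socket


def check_resolution(hostname, attempts=5):
    """Resolve a hostname several times, recording each success or failure."""
    results = []
    for attempt in range(attempts):
        try:
            ip = socket.gethostbyname(hostname)
            results.append((attempt, ip))
        except socket.gaierror as err:
            # An intermittent failure here would mirror the PageSpeed DNS error.
            results.append((attempt, f"FAILED: {err}"))
    return results


# 'localhost' stands in for your own domain here.
for attempt, outcome in check_resolution("localhost", attempts=3):
    print(attempt, outcome)
```

If every attempt succeeds locally but Google still reports DNS errors, the problem is more likely at the authoritative nameserver end than with the A-records themselves.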
Any advice would be most welcome. Thanks.
-
RE: How best to deal with internal duplicate content
Thanks for reminding me of the robots option. Overlooked that. Will propose it.
-
How best to deal with internal duplicate content
Hi
Having an issue with a client site and internal duplicate content. The client has a custom CMS, and when they post new content it can appear, in full, at two different URLs on the site. Short of getting the client to move CMS, which they won't do, I am trying to find an easy fix that they could do themselves.
Ideally they would add a canonical on one of the versions. The CMS does allow them to view posts in HTML view, but it would be a lot of messing about with posting the page and then going back to the CMS to add the tag, and the CMS is unable to auto-generate it either. The content editors are copywriters, not programmers.
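For reference, the tag itself is a single line in the duplicate page's head, pointing at the version that should rank; the hard part here is getting the CMS to output it, not the markup (the URL below is a placeholder):

```html
<link rel="canonical" href="https://www.example.com/original-article/" />
```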
Would there be a solution using WMT for this? They have the skill level to add a URL in WMT, so I'm thinking a stop-gap solution could be to noindex one of the versions using the option in Webmaster Tools. Ongoing, we will consult developers about modifying the CMS, but budgets are limited, so I'm looking for a cheap and quick solution to tide us over until the new year.
Does anyone know of a way other than WMT to block Google from seeing duplicate content? We can't simply block Google from whole folders, because only a small percentage of the content in a folder would be internally duplicate.
Would be very grateful for any suggestions anyone could offer.
thanks.
-
RE: Embedded Traffic Stats?
Thanks for the advice and the tip on the tracking. Will give it a go.
-
Embedded Traffic Stats?
Hi all
Wondering if someone could give me a pointer here please..... My client is an information resource on internet safety; it is a non-profit website just blogging about internet safety and threats. To cut a long story short, the client has relationships with all the police forces and universities in the UK, which regularly republish its content on their sites with approval (and it has no relations with the many dozens of sites which republish without approval...)
Although the goal of the website is information distribution and not to raise money, the site does have a number of KPIs which it needs to meet to justify its sponsorship by the likes of Facebook, Google, Microsoft etc.
We are looking to make the content the site publishes embeddable so rather than just republishing 'our' content and it looking like the third party sites' own work, we at least get the credit.
The issue we are trying to work out is down to stats. If site B embeds our article and it gets 1,000 views on their site, do those 1,000 people appear in our stats too? I would guess that they do, as the content is being loaded from our site each time one of the 1,000 people visits.
In which case, would these 1,000 hits appear as direct traffic or referral traffic in the stats when people read the content on site B? We have run some tests and are not seeing the test site appearing as a referrer in the stats, so we are a little puzzled.
Many thanks for any advice
-
RE: Moz crawl duplicate pages issues
Hi Adam,
Thanks for the response. I tested the canonical side of things but found that it was stopping the filtered pages being indexed. While we could get the 'Dresses' page indexed, we couldn't get 'Black Dresses', 'X retailer brand Dresses', etc. indexed. We found this a bit of an issue. On the filtering pages the tag always pointed back to the category root.
We are using an SEO plugin for Magento, so maybe I will need to go back to the dev and ask them. I accept that not putting a canonical tag on the filtering could lead to internal duplicate content issues if a product can be found via dresses, red dresses, x brand dresses, x brand red dresses and via price.
Even though the site is still a work in progress, we are already seeing the filtered pages getting indexed and ranking fairly well. So, for example (I don't think we rank for this one), we are ranking for terms such as Black Size 12 Evening Dress. Sure, this term won't get millions of searches, but long tail converts very well. As much as I would love to be no. 1 for Dresses, we are not going to get there for a long, long time - especially given the no. 1 brand for the term is DA 86 and has hundreds of thousands of links and over 2.1m G+ shares.
We are in a tricky position with the website. Normally we could rank for the filtered terms via the product pages easily enough; however, with all the product pages being pulled externally, we need to find an alternative.
-
RE: Moz crawl duplicate pages issues
Might be worth me adding that I'm aware all the product pages are duplicate content from other websites. The shop section of the website is an affiliate store. However, all the product pages are set as noindex to the search engines as a result. The internal links between the category pages and the product pages will be made nofollow in the coming days. If the engines cannot index the individual products, then there is little point wasting bandwidth on them crawling 200,000 products!
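For anyone reading later, the setup described above boils down to two bits of markup (the product URL below is hypothetical, for illustration only):

```html
<!-- In each product page's <head>: keep the page out of the index -->
<meta name="robots" content="noindex" />

<!-- On category-page links to products: stop link equity flowing to them -->
<a href="/index.php/catalog/product/view/id/29416" rel="nofollow">Product</a>
```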
-
Moz crawl duplicate pages issues
Hi
According to the Moz crawl of my website, I have in the region of 800 pages which are considered internal duplicates. I'm a little puzzled by this, even more so as some of the pages it lists as being duplicates of one another are not.
For example, the Moz crawler considers page B to be a duplicate of page A in the URLs below. Not sure on the live link policy, so I've put a space in the URLs to 'unlive' them.
Page A http:// nuchic.co.uk/index.php/jeans/straight-jeans.html?manufacturer=3751
Page B http:// nuchic.co.uk/index.php/catalog/category/view/s/accessories/id/92/?cat=97&manufacturer=3603
One is a filter page for Curvety Jeans and the other a filter page for Charles Clinkard Accessories. The page titles are different and the page content is different, so I've no idea why these would be considered duplicates. Thin, maybe, but not duplicate.
Likewise, pages B and C are considered duplicates of page A in the following:
Page A http:// nuchic.co.uk/index.php/bags.html?dir=desc&manufacturer=4050&order=price
Page B http:// nuchic.co.uk/index.php/catalog/category/view/s/purses/id/98/?manufacturer=4001
Page C http:// nuchic.co.uk/index.php/coats/waistcoats.html?manufacturer=4053
Again, these are product filter pages which the crawler would have found using the site filtering system, but, again, I cannot find what makes pages B and C a duplicate of A.
Page A is a filtered result for Great Plains Bags (filtered from the general bags collection). Page B is the filtered results for Chic Look Purses from the Purses section and Page C is the filtered results for Apricot Waistcoats from the Waistcoat section.
I'm keen to fix the duplicate content errors on the site before it goes properly live at the end of this month (that's why anyone kind enough to check the links will see a few design issues with the site). However, in order to fix the problem I first need to work out what it is, and in this case I can't.
Can anyone else see how these pages could be considered duplicates of each other please? Just checking I've not gone mad!!
Thanks,
Carl
-
RE: How best to fix 301 redirect problems
Thanks. I completely forgot about the robots.txt method of fixing it. Will give it a go. I will add comparison in there too, just to be safe.
-
RE: How best to fix 301 redirect problems
Thanks, will try and work out how to do that. Magento is a pain to make minor changes to.
Hopefully the crawl stats will look a little healthier next week!
-
How best to fix 301 redirect problems
Hi all
Wondering if anyone could help out with this one. The Rogerbot crawler has just performed its weekly error crawl on my site and I appear to have 18,613 temp redirect problems!! Or rather, the same one problem 18,613 times.
My site is a Magento store and the errors it is giving me are due to the wishlist feature on the site. For example, it is trying to crawl links such as index.php/wishlist/index/add/product/29416/form_key/DBDSNAJOfP2YGgfW (which would normally add the item to one's wishlist). However, because Roger isn't logged into the website, all these requests are being sent to the login URL with the page title Please Enable Cookies.
Would the best way to fix this be to enable wishlists for guests? I would rather not do that but cannot think of another way of fixing it.
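An alternative that avoids enabling guest wishlists would be to disallow the wishlist URLs in robots.txt, so well-behaved crawlers like Rogerbot and Googlebot never request them in the first place. A sketch, assuming all such URLs share the /wishlist/ path segment as in the example above:

```
User-agent: *
Disallow: /index.php/wishlist/
Disallow: /wishlist/
```

This only stops compliant bots from crawling the URLs; it doesn't log anyone in or change site behaviour for real users.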
Any other Magento people come across this issue?
Thanks, Carl
-
RE: Changing domain for a magento store
The 'new' site is the old site moved over, so the plan is to 301 and just change the domain name, so domain.com/1 would redirect to newdomain.com/1, etc.
Hopefully that should cover all bases.
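Assuming an Apache server with mod_rewrite (a guess - adjust for the actual stack, and the domain names are placeholders from the example above), a domain-wide 301 that preserves paths looks something like this in the old domain's .htaccess:

```
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?domain\.com$ [NC]
RewriteRule ^(.*)$ http://newdomain.com/$1 [R=301,L]
```

The path-preserving capture ($1) is what keeps domain.com/1 redirecting to newdomain.com/1 rather than dumping everything on the new homepage.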