Welcome to the Q&A Forum

CleverPhD

I see the entry for the page in the XML now. I also just searched the URL in Google and see it there as well. Looks like this was just a timing issue.

CleverPhD

You can submit more than one sitemap in GWT (Google Webmaster Tools), and Google will read both XML sitemaps and RSS feeds. I have Google reading an XML sitemap, and it also found my RSS feeds. I would say: whichever feed you can control, XML or RSS, get it to your liking and add it in GWT for Google to chew on.
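If the XML sitemap is the feed you control, generating it from a list of pages keeps it in sync with the site. Below is a minimal sketch that follows the sitemaps.org protocol. The `build_sitemap` helper and the example.com URLs are hypothetical placeholders, not part of GWT or any tool mentioned above.

```python
from xml.sax.saxutils import escape

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a minimal XML sitemap.

    pages: iterable of (url, lastmod) tuples, lastmod as 'YYYY-MM-DD'.
    URLs are XML-escaped, so '&' in a query string becomes '&amp;'.
    """
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<urlset xmlns="{SITEMAP_NS}">',
    ]
    for url, lastmod in pages:
        lines.append(
            f"  <url><loc>{escape(url)}</loc><lastmod>{lastmod}</lastmod></url>"
        )
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


# Hypothetical usage: write the file you will then submit in GWT.
# with open("sitemap.xml", "w", encoding="utf-8") as f:
#     f.write(build_sitemap([("https://www.example.com/", "2013-07-24")]))
```

Each file generated this way can be submitted in GWT on its own, next to any RSS feed Google already picks up.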

CleverPhD

Thank you Gagan!

CleverPhD

Hello there!

Based on the date listed on the page, I see that you posted this yesterday, 7/24/2013. Depending on how often Google visits your site, it may not spider all your pages every day. It has only been 24 hours, so you may need to give it more time.

One thing that can help speed this up is making sure this page is listed in your sitemap:

http://www.howlatthemoon.com/sitemap.xml

I did not see the page listed there, and that is a common place where Google looks for new additions.
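Checking whether a given page made it into the sitemap is easy to script. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function names are hypothetical, and it assumes the file is a plain `<urlset>` sitemap rather than a sitemap index.

```python
import xml.etree.ElementTree as ET

# Sitemap elements live in this namespace, so lookups must be qualified.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs in a <urlset> sitemap (str or bytes).

    Assumes a plain urlset; a <sitemapindex> lists child sitemaps instead
    and would need each child fetched and checked in turn.
    """
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    }


def is_listed(xml_text, page_url):
    """True if page_url appears verbatim as a <loc> entry."""
    return page_url in sitemap_urls(xml_text)
```

To check a live file, you could pass in the bytes from `urllib.request.urlopen(sitemap_url).read()`. The comparison is exact, so an `http` vs `https` mismatch or a missing trailing slash counts as "not listed".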

Good luck

CleverPhD

One key point on using robots.txt vs. the meta noindex tag: it is not that the noindex meta tag is "superior"; they just work differently.

If you use robots.txt, it will stop the spider from visiting that page, but it will not remove the page from the index. Also, if a page is in robots.txt and that page has a 301 redirect, a canonical, or a meta noindex, Google will not see the page (because of the robots.txt directive) and so will not be able to act on the 301, the canonical, or the meta noindex.

A meta noindex works the other way around: because the spider crawls the page and sees the tag, it tells Google to remove the page from the index. This is key if you want pages removed from the Google index.

The rule of thumb I use is:

If you have a page that is not in the Google index and you want to keep it out of the index, put that URL in robots.txt.
If you have a page that is in the Google index and you want it removed, use the noindex meta tag; do not put it into robots.txt, for the reasons mentioned above. Over time, once the pages are removed (and this may take a while depending on how often the page is crawled), you can add them to robots.txt for good measure.
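The conflict described above (a noindex that Google never sees because robots.txt blocks the page) can be flagged with a small audit. This is a sketch using Python's standard `urllib.robotparser`; the `audit` function, its messages, and the regex-based meta check are hypothetical simplifications. The regex assumes `name` comes before `content`, so a real audit should use a proper HTML parser.

```python
import re
from urllib import robotparser

# Rough match for <meta name="robots" content="...noindex...">.
NOINDEX_RE = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex', re.I
)


def has_meta_noindex(html):
    return bool(NOINDEX_RE.search(html))


def audit(robots_txt, url, html, user_agent="Googlebot"):
    """Classify a URL by how robots.txt and a meta noindex interact."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    blocked = not rp.can_fetch(user_agent, url)
    noindex = has_meta_noindex(html)
    if blocked and noindex:
        return "conflict: robots.txt blocks the crawl, so the noindex is never seen"
    if blocked:
        return "blocked: the spider stays out, but an indexed URL is not removed"
    if noindex:
        return "noindex: the spider can fetch the page and drop it from the index"
    return "crawlable and indexable"
```

Running this over the pages you are trying to get removed would surface any that are still listed in robots.txt too early.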

CleverPhD

Yes, it helps. Some have seen a 150% improvement in traffic:

http://www.quicksprout.com/2012/04/27/want-a-150-boost-in-traffic-then-use-this-idiot-proof-guide-to-google-authorship-markup/

There is a great article on the Moz blog about how you even need to optimize the picture:

http://moz.com/blog/google-author-photos

Don't wait - do it!

CleverPhD

What you want to do is set up the redirect for all pages "except" those where you want to require visitors to use https.

As an example, on a site I work on we have two areas, /cart/ and /account/, used when someone is checking out or when they are logged into their account and want to update payment options, respectively. You would exclude these folders from the https-to-http 301 redirect so that users can use that part of the site in secure mode. For the rest of the site, you want https to 301 to http. The reason you go through all this is that the http and https versions of the site, if spidered, would be considered duplicate content, and you want to prevent that.
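The decision logic behind that redirect is small enough to sketch independently of any web server. The function below is a hypothetical illustration, not server configuration; it assumes the two secure folders from the example and returns the http URL to 301 to, or `None` when the request should be served as-is.

```python
from urllib.parse import urlsplit

# Secure folders from the example above; replace with your own.
SECURE_PREFIXES = ("/cart/", "/account/")


def https_redirect_target(url):
    """Return the http:// URL to 301 to, or None when no redirect applies.

    Only https requests outside the secure folders are redirected, so
    checkout and account pages keep working over https.
    """
    parts = urlsplit(url)
    if parts.scheme != "https":
        return None  # already on http
    if parts.path.startswith(SECURE_PREFIXES):
        return None  # secure section stays on https
    return parts._replace(scheme="http").geturl()
```

In practice the same rule would normally live in the web server's rewrite configuration; the point of the sketch is that the exclusion check happens before the redirect, and the path and query string are carried over unchanged.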

The other part of this is that you (usually) do not want the search engines to spider the shopping cart and user login sections of a site. Nofollow all links that lead to those pages, noindex the pages themselves, and also put those folders in robots.txt; that will keep the bots out of there.

One other thing: make sure that your templates and content within the https sections of the site link out to the non-https URLs. The 301 will help with this, but why link to the wrong URL anyway?

All of that said, if your site deals with highly sensitive information (medical and financial come to mind), you may simply want to run the whole site on https. You would need to bulk up your server resources, as https can slow things down a little, but it can be done.

CleverPhD

You mention that 301s and canonicals are not available; are you absolutely sure? Try asking the host again. Sometimes a different customer service person will give you a different answer. The canonical would be ideal.

I would not delete the old posts. They have links pointing to them and get traffic. A couple of ideas come to mind. You could write a short, original summary of each post and put it on the old blog, followed by something like "For more information, this article has moved to" and a link to the new blog post. That would at least drive referral traffic and would take care of the duplicate content issue. In the absence of a canonical link, a link to the "original" does help give credit, so the link to the new site would also work to credit the post to the new site. At the same time, this will be kind of messy: when you change the content of the posts on the old site, you could potentially hurt the rankings of those pages.

I would test this out. Select 10-20 articles from the old site and see what happens. As a comparison, take another 10-20, cut each blog post off after 300 words, and link to the full article on the new site.
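Both halves of that test can be sketched in a few lines: pick two non-overlapping groups of old posts at random, and build the 300-word excerpt with a link for the second group. The function names, the fixed seed, and the link wording are hypothetical choices for illustration.

```python
import random


def split_test_groups(post_urls, group_size=15, seed=2013):
    """Randomly pick two non-overlapping groups of old posts.

    Group A gets the short original summary; group B gets the 300-word cut.
    A fixed seed makes the split reproducible.
    """
    rng = random.Random(seed)
    picked = rng.sample(list(post_urls), 2 * group_size)
    return picked[:group_size], picked[group_size:]


def truncate_with_link(text, new_url, limit=300):
    """Keep the first `limit` words of a post, then link to the full article."""
    words = text.split()
    excerpt = " ".join(words[:limit])
    if len(words) > limit:
        excerpt += " ..."
    return (
        f"{excerpt}\n\n"
        f'This article has moved. <a href="{new_url}">Read the full post</a>.'
    )
```

Choosing the groups at random, rather than hand-picking, makes it less likely that one group simply happens to contain the stronger posts.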

To be honest, as I reread my suggestion, this will be kind of messy. Go back and push for that canonical. You can then link and copy, and you will be totally clean. All you need is access to the HEAD portion of the pages. The host looks to have a paid "premium" option; maybe that would give you access?
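Once you have HEAD access, each old post needs one `<link rel="canonical">` tag pointing at its new URL. This is a hypothetical sketch of adding that tag to a page template's HTML. It assumes a plain lowercase `<head>` tag and is not a feature of any particular blogging platform.

```python
from html import escape


def add_canonical(html, canonical_url):
    """Insert <link rel="canonical"> right after the opening <head> tag.

    Assumes a plain lowercase <head> tag with no attributes; leaves the page
    unchanged if it already declares a canonical.
    """
    if 'rel="canonical"' in html:
        return html
    tag = f'<link rel="canonical" href="{escape(canonical_url)}">'
    return html.replace("<head>", "<head>\n" + tag, 1)
```

Each old post would point at its own counterpart on the new blog, not at the new blog's home page.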

Good luck!

CleverPhD

Thanks for the post Keri.

Yep, the OCR option would still make the image option for hiding "moo"

http://www.youtube.com/watch?v=fLwYpSCrlHU

CleverPhD

I would add that you also want to nofollow all links to your shopping cart pages and noindex those pages. Ideally, if your cart pages are in a single folder, you can disallow the whole folder in robots.txt and take care of things as a group.

CleverPhD

The answer looks to be here:

http://jensontaylor.blogspot.com/2012/03/blogger-changes-top-level-domain-names.html

"The reason my blog url is ending in .co.uk is because I access my blog from UK. If you access my blog from another country you should see the extension of that country, appended to my blog. So if you access my blog from Australia you should see my blog address as jensontaylor.blogspot.com.au"

"The reason behind the change in the top level domain names is to accommodate Google with the facility to censor blogs based on country of access."

More information here

https://support.google.com/blogger/answer/2402711?hl=en

So it looks like this is impacted by where you access the blog from.
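Because the same blog can show up under several country extensions, reports that list blog URLs may contain variants like .blogspot.co.uk and .blogspot.com.au. A small normalizing helper makes them comparable. This is a hypothetical sketch; it simply assumes everything after `.blogspot.` in the host is the country-specific part and maps it back to `.blogspot.com`.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_blogspot_url(url):
    """Map a country-specific Blogger address back to .blogspot.com.

    e.g. jensontaylor.blogspot.com.au -> jensontaylor.blogspot.com.
    Non-Blogger URLs pass through with only the host lowercased.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if ".blogspot." in host:
        host = host.split(".blogspot.", 1)[0] + ".blogspot.com"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```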

CleverPhD

Here is Google's information on what geotargeting does:

https://support.google.com/webmasters/answer/62399?hl=en

Google would look at the domain extension, but also at where you are hosted, location information on the site (e.g., your address), and so on.

As far as who you target with the settings

"The tool handles geographic data, not language data. If you're targeting users in different locations—for example, if you have a site in French that you want users in France, Canada, and Mali to read—we don't recommend that you use this tool to set France as a geographic target. A good example of where it would be useful is for a restaurant website: if the restaurant is in Canada, it's probably not of interest to folks in France. But if your content is in French and is of interest to people in multiple countries/regions, it's probably better not to restrict it."

So, it depends on which users you want to target. If you truly want to be international, do not set it. I bet that if your site is in English, you are hosted in the US, and your physical address is in the US, Google will show you as a US site.

CleverPhD

One quick suggestion: make sure that in Google Webmaster Tools, under Site Settings, you properly specify your location when you verify the domain. I am betting that you are not based in the middle of the Indian Ocean!

Also, there is a great answer here:

http://moz.com/community/q/do-domain-extensions-such-as-com-or-net-affect-seo-value

CleverPhD

Well, that is how to exclude them from an alert that they set up, but I think they are talking about anyone who might set up an alert that would find the PDFs.

One other idea that I think may help: if you set up the PDFs as images rather than text, it would be harder for Google to "read" them and catalog them for the alert. However, this would have the same net effect as not having the PDFs in the index at all.

Danielle, my other question would be: why do they give a crap about Google Alerts specifically? There have been all kinds of issues with the service, and if someone is really interested in finding out information on the company, there are other ways to monitor a website than Google Alerts. I used to use services that simply monitor a page (say, the news release page) and let me know when it is updated. This was often faster than Google Alerts, and I would find things on a page before others who only used Google Alerts. I think they are being kind of myopic about the whole approach, and blocking for Google Alerts may not help them as much as they think. Way more people simply search on Google than use Alerts.

CleverPhD

Use robots.txt to exclude those files. Note that this keeps them out of the web index in general, so they will not show up in searches either.
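This is a minimal sketch of building those robots.txt rules and checking them before you upload the file. Google's crawler also understands `*` and `$` wildcards (e.g. a single rule covering every URL ending in .pdf), but Python's standard `urllib.robotparser` only does plain prefix matching, so this version lists each PDF path explicitly. The helper names and paths are hypothetical.

```python
from urllib import robotparser


def robots_for_pdfs(pdf_paths, user_agent="*"):
    """Build a robots.txt that disallows each listed PDF path explicitly."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in sorted(pdf_paths)]
    return "\n".join(lines) + "\n"


def is_blocked(robots_txt, url, user_agent="Googlebot"):
    """True if the robots.txt text forbids user_agent from fetching url."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return not rp.can_fetch(user_agent, url)
```

Because matching is by prefix, a rule for `/files/report.pdf` also blocks any URL that merely starts with that path. If all the PDFs live in one folder, a single `Disallow` for the folder is simpler.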

You need to ask your client why they are putting things on the web if they do not want them to be found. If they do not want them found, don't put them up on the web.

CleverPhD

The 410 is supposed to be more definitive:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

404 is "Not Found" vs. 410 is "Gone".

10.4.5 404 Not Found

The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.

10.4.11 410 Gone

The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.

The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.

That said, I had a similar issue on a site with a couple thousand pages and went with the 410. I am not sure it made things disappear any faster than a 404 would have (at least not that I noticed).

I just found a post from John Mueller of Google:

https://productforums.google.com/forum/#!topic/webmasters/qv49s4mTwNM/discussion

"In the meantime, we do treat 410s slightly differently than 404s. In particular, when we see a 404 HTTP result code, we'll want to confirm that before dropping the URL out of our search results. Using a 410 HTTP result code can help to speed that up. In practice, the time difference is just a matter of a few days, so it's not critical to return a 410 HTTP result code for URLs that are permanently removed from your website, returning a 404 is fine for that. "

So, use the 410: with 30k pages, even a difference of a few days may be noticeable.

All of that said, are you sure that, with a site that big, you would not need to 301 some of those pages? If you have a bunch of old news items or blog posts, wouldn't you want to redirect them to the new URLs for those same assets? It seems like you should be able to recover some of them, at least your top traffic pages.
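Putting the options together, each retired URL gets one of three responses: a 301 if the asset moved, a 410 if it was removed on purpose, and a 404 otherwise. This is a hypothetical sketch. The `MOVED` and `REMOVED` tables are made-up example data (the staff page mirrors the RFC's "individuals no longer working at the server's site" case), and a real site would load them from its own records.

```python
from http import HTTPStatus

# Made-up example data: old URLs that moved, and ones retired for good.
MOVED = {"/news/2011/launch": "/blog/product-launch"}
REMOVED = {"/promo/summer-2012", "/staff/jane-doe"}


def status_for(path):
    """Return (status, location) for a request to an old URL."""
    if path in MOVED:
        return HTTPStatus.MOVED_PERMANENTLY, MOVED[path]  # 301, keep the value
    if path in REMOVED:
        return HTTPStatus.GONE, None  # 410, intentionally removed
    return HTTPStatus.NOT_FOUND, None  # 404, unknown
```

Checking the redirect table first means a page that was both retired and later given a new home gets the 301 rather than the 410.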

Cheers

CleverPhD

Appreciate the positive comment EGOL!

CleverPhD

Just some advice: I would not search for "backdoor link" on Urban Dictionary. Nuff said. Yosepr, read the link that SEO 5 Team posted; it explains it all.

Basically, you link to a page on a site that in turn links to a page on the site you are "back door" linking to. It is an indirect way to link to the other site, since you link to a page that links to them (and then vice versa).

CleverPhD

Thank you - please mark my response as Good Answer if it helps.

Cheers!

CleverPhD

You can do that, but it is less specific about what you are actually doing with your server. A 503 with a Retry-After header lets the spiders know exactly what you are doing (no confusion). Thank you for the clever remark below.
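This is a minimal sketch of that maintenance response using Python's standard `http.server`. The one-hour `Retry-After` value is an assumption; set it to your expected downtime. The response itself is built in a separate function so it can be checked without starting a server.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Assumption: maintenance is expected to last about an hour.
RETRY_AFTER_SECONDS = 3600


def maintenance_response(retry_after_seconds=RETRY_AFTER_SECONDS):
    """Return (status, headers, body) for a temporary-maintenance reply."""
    headers = {
        "Retry-After": str(retry_after_seconds),  # delay in seconds
        "Content-Type": "text/plain; charset=utf-8",
    }
    body = b"Down for maintenance, please check back soon.\n"
    return HTTPStatus.SERVICE_UNAVAILABLE, headers, body


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answers every GET with 503 + Retry-After."""

    def do_GET(self):
        status, headers, body = maintenance_response()
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, you could run `HTTPServer(("", 8080), MaintenanceHandler).serve_forever()` (with `HTTPServer` imported from `http.server`). A production site would more often configure the same 503 and header in its web server or load balancer.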


Posts made by CleverPhD
