Welcome to the Q&A Forum

matbennett

You're welcome. Let me know if they come back with any questions.

matbennett

I do this on a few domains and it works well and without a problem.

You should ensure that logged out users (and those whose status you cannot verify) are not redirected, and then only redirect the logged in ones. That way any crawlers (and logged out users) that come along will not see the redirect at all and will just read the URL that they requested.

This is slightly different logic from what you seem to have. It sounds like you might be testing all users and redirecting them to one of two different URLs - neither of which is the one they requested. This would be undesirable and would cause the 302 that you describe.
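As a rough sketch of that logic in Python (the `redirect_target` helper and the members URL are made up for illustration, and `is_logged_in` is `None` when the status can't be verified), it might look something like this:

```python
from typing import Optional


def redirect_target(is_logged_in: Optional[bool], members_url: str) -> Optional[str]:
    """Return the URL to redirect to, or None to serve the requested URL as-is.

    is_logged_in is True, False, or None (status could not be verified).
    members_url is a hypothetical destination for logged in users.
    """
    # Only users we have positively identified as logged in are redirected.
    if is_logged_in is True:
        return members_url
    # Logged out users, crawlers and anyone unverified get no redirect at all.
    return None
```

The point is that there is only one redirect branch. Everyone else falls through and gets the page they asked for.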

matbennett

403s are tricky to diagnose because they, by their very nature, don't tell you much. They're sort of the server equivalent of just shouting "NO!".

You say Moz & Xenu are receiving the 403. I assume that it loads properly from a browser.

I'd start by looking at the .htaccess. Any odd deny statements in there? It could be that an IP range or user agent is blocked. Some people like to block common crawlers (not calling Roger names there). Check the robots.txt whilst you are there, although that shouldn't return a 403 really.
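If the file is long, a quick script can pull out the lines worth reading first. This is only a sketch: the patterns below are my own guess at the usual suspects in Apache config (deny rules and rewrite rules with the F flag) and are certainly not exhaustive.

```python
import re

# Patterns that commonly lead to a 403 in Apache configuration.
# This list is an assumption for illustration and is not exhaustive.
SUSPECT_PATTERNS = [
    re.compile(r"^\s*deny\s+from\b", re.IGNORECASE),
    re.compile(r"^\s*require\s+not\b", re.IGNORECASE),
    re.compile(r"^\s*require\s+all\s+denied\b", re.IGNORECASE),
    # RewriteRule whose trailing flag list contains F (forbidden).
    re.compile(r"^\s*rewriterule\b.*\[[^\]]*\bF\b[^\]]*\]\s*$", re.IGNORECASE),
]


def find_suspect_lines(htaccess_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that look like blocking rules."""
    hits = []
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip commented-out rules
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

Anything it flags still needs reading in context, since a deny rule is often paired with conditions on the lines above it.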

matbennett

Depending on how it is done, then yes. The specifics are a bit dependent on where the client is based and where the trademark of the brand is registered. However, everywhere has some form of trademark protection and something like a passing off law.

matbennett

It might be helpful to think more in terms of what a link actually offers rather than what format it takes. If there is a website that people use to find organisations like yours then it makes sense to be there whether that website is in directory format, blog format or any other format. If a website has no apparent value in itself move on.

The problems with directories really came about from two issues:

1. Directories that only existed in order to link to websites.
There are still thousands of sites with names like seolinksbymat.com that quite clearly are only ever used to get links. Why should being listed there demonstrate any authority/validation for a site?

2. People went a bit mad for it
People love shortcuts. When presented with the opportunity to either work hard to get valid links or just submit them to directories, lots of people did the latter. Before long services appeared that submitted your site to thousands of them, which in turn caused more directories to be created. Big old mess and Google got bored of it.

Avoid those two things and actually there is nothing wrong with a link from a website that happens to be in directory format. If those links are just a small part of a bigger picture and come from sites that have inherent value then you shouldn't be scared of them.

The SEO industry loves to see things as black and white. It's easy, it's lazy and it is wrong. Everything is bad when it is done badly. Directories are very easy to do badly, but you should never be worried about getting a link from something like Yellow Pages.

matbennett

Wow - there are some big questions there. I think that the best way to answer some of it will be to point you at some resources that might help out.

Before I do I will say that tools are definitely not a boring niche. Seriously, I have work projects where I would have given a limb for something as interesting as tools to work with. The thing is tools make stuff and fix stuff and that is interesting. They also have a techie appeal which I always think makes things easier.

OK - specifics (and I'm going to be a pain and really just answer the first bit)

SEO

1. Best way to research and choose the keywords for PPC?

Firstly, SEO and PPC are different things. It's worth remembering that, as it is going to be difficult searching for answers if you get them mixed up. PPC is when you pay on a per click basis. SEO is generally seen as dealing with traffic from organic sources (i.e. Google's natural results).

Here is a link to the moz.com guide to keyword research: http://moz.com/beginners-guide-to-seo/keyword-research
Here is a link to one from Viperchill which I think is well worth a read too: http://www.viperchill.com/keyword-research/

In truth, if you are handling that campaign by yourself you are going to need to bone up quickly on a wide range of topics - not just the stuff you have identified, but the stuff you don't yet know that you need to know!

One of the best self-paced ways of doing this is probably DistilledU: http://www.distilled.net/u/ . It's an online course that will walk you through a lot of the basic stuff. Every topic it covers can be found elsewhere of course, however having it in one place from a trusted source and having progress tracked is really helpful.

Take a look

matbennett

If you ensure that those pages return a 404 (Not found) response then Google will remove them from its index in time.

If you want to speed the process up then you can also do this from webmaster tools. There is a guide to removing URLs here: https://support.google.com/webmasters/answer/1663419

It has been said that using the URL removal tools on a lot (read hundreds) of individual URLs might be seen as negative. However if you just have a handful then that would be the way to do it.

matbennett

Guest blogging as an activity isn't frowned on. Guest blogging as a way to manipulate link authority (or in fact anything as a way to manipulate link authority) is. In that video Cutts says this in the first sentence: "Guest blogging for links".

Like most things in SEO it is how you do it. If I write some quality pieces for well respected blogs and these include followed links to my sites (as do many other sites and pages of varying types) that is very unlikely to ever cause a problem. That is very different from a strategy that involves large numbers of guest posts on sites of questionable value, where guest posts make up a high proportion of my links and I have little else of more weight to back up my link profile.

I really think the "black hat / white hat" metaphor doesn't help people with SEO. In truth there is a LOT of grey - it's almost all just shades of grey. Most methods lose their shiny whiteness if you try to scale them up - particularly if you dumb them down at the same time. Most "frowned on" methods can still be used very effectively if you are selective and intelligent in your link building and keep focused on quality over quantity.

Don't believe me? Start another question here on moz.com and ask "If the Huffington Post (#1 blog in the world according to Technorati) asked you to write an article about something you are passionate about and said you could have a followed link in it, would you use that link?". I don't know many professional SEOs that would say no.

matbennett

The 404s in webmaster tools relate to crawl errors. As such they will only appear if internally linked. It also limits the report to the top 1000 pages with errors only.

matbennett

Weirdly enough, I've just been answering the same point in another question: http://moz.com/community/q/can-we-retrieve-all-404-pages-of-my-site

The link above has a few more options, but this bit is the most directly relevant to your question:

Analytics : As long as your error pages trigger the google analytics tracking code you can get the data from here as well. Most helpful when the page either triggers a custom variable, or uses a virtual url ( 404/requestedurl.html for instance). Isolate the pages and look at where the traffic came from.

matbennett

I wouldn't try to manually remove that number of URLs. Mass individual removals can cause their own problems.

If the pages are 404ing correctly, then they will be removed. However, it is a slow process. For the number you are looking at it will most likely take months. Google has to recrawl all of the URLs before it even knows that they are returning a 404 status. It will then likely wait a while and do it again before removing them. That's a painful truth and there really is not much you can do about it.

It might (and this is very arguable) be worth ensuring that there is a crawl path to the 404 content. So maybe a link from a high authority page to a "recently removed content" list that contains links to a selection of the removed URLs, and then keep replacing that list. This will help that content get recrawled more quickly, but it will also mean that you are linking to 404 pages, which might send poor quality signals. Something to weigh up.

What would work more quickly is to mass remove in particular directories (if you are lucky enough that some of your content fits that pattern). If you have a lot of urls in mysite.com/olddirectory and there is definitely nothing you want to keep in that directory then you can lose big swathes of URLs in one hit - see here: https://support.google.com/webmasters/answer/1663427?hl=en

Unfortunately that is only good for directories, not wildcards. However it's very helpful when it is an option.

So, how to find those URLs? (Your original question!!).

Unfortunately there is no way to get them all back from google. Even if you did a search for site:www.mysite.com and saved all of the results it will not return the number of results that you are looking for.

I tend to do this by looking for patterns and removing those to find more patterns. I'll try to explain:

1. Search for site:www.mysite.com
2. Scroll down the list until you start seeing a pattern (e.g. mysite.com/olddynamicpage-111.php , mysite.com/olddynamicpage-112.php , mysite.com/olddynamicpage-185.php etc).
3. Note that pattern (return later to check that they all return a 404).
4. Now search again with that pattern removed: site:www.mysite.com -inurl:olddynamicpage
5. Return to step 2
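If you end up with a lot of patterns, typing the query by hand gets tedious. A tiny helper like this (the `site_query` name is just something I've made up here) builds the search string from the patterns noted so far:

```python
def site_query(domain: str, excluded_patterns: list[str]) -> str:
    """Build a site: search with each known pattern excluded via -inurl:."""
    parts = [f"site:{domain}"]
    parts += [f"-inurl:{pattern}" for pattern in excluded_patterns]
    return " ".join(parts)


# Each pass through the steps adds one more pattern to exclude.
patterns: list[str] = []
first_query = site_query("www.mysite.com", patterns)

patterns.append("olddynamicpage")
second_query = site_query("www.mysite.com", patterns)
```

Here `first_query` is the plain site: search and `second_query` is the same search with `-inurl:olddynamicpage` added on the end.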

Do this (a lot) and you start understanding the patterns that have been picked up. There are usually a few that account for a large number of the incorrectly indexed URLs. In the recent case I worked on they were almost all related to "faceted search gone wrong".

Once you know the patterns you can check that the correct headers are being returned so that they start dropping out of the index. If any are directory patterns then you can remove them in big hits through GWMT.
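Checking the headers can be scripted. This is a minimal sketch using only Python's standard library; the function names are mine, and it assumes the server answers HEAD requests the same way it answers GET, which is not always the case.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url using a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is what we want here.
        return error.code


def not_yet_404(urls: Iterable[str], fetch: Callable[[str], int] = http_status) -> list[str]:
    """Return the URLs that do not answer with a 404 status."""
    return [url for url in urls if fetch(url) != 404]
```

Note that `urlopen` follows redirects, so a removed URL that redirects to a live page will show up as a 200 here. That is usually exactly the sort of thing you want flagged.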

It's painful. It's slow, but it does work.

matbennett

I think that the changes over the last 18 months to 2 years make things more interesting for agencies. There is just a bit more challenge in explaining the work to clients.

Technical SEO seems to have become more important again. It seems to now be pretty easy to hobble your own website through bad on-site practices and likewise to get an advantage through good ones (like schema). Now that we have (mostly) more useful webmaster tools at our disposal the emphasis seems to be on the site to either get this stuff right or suffer the consequences. That is definitely an area where the skills that an agency can offer can really bring value.

Link "building" might be out of favour, but links are as important as ever. That means you need someone with the skills and knowledge to identify opportunities, conceive content, produce that content and then do the work needed to get that content noticed.

Wrap all that up with a strategic view that many companies don't have the digital skills to pull off and I think that there is more scope than ever for good agencies.

matbennett

OK - that is a bit of a different problem (and a rather familiar one). So the aim is to figure out what the 330 "phantom" pages are and then how to remove them?

Let me know if I have that right. If I have then I'll give you some tips based on me doing the same with a few million URLs recently. I'll check first though, as it might get long!

matbennett

As you say, on-site crawlers such as Xenu & Screaming Frog will only tell you when you are linking to 404 pages, not where people are linking to your 404 pages.

There are a few ways you can get to this data:

Your server logs : All 404 errors will be recorded on your server. If someone links to a non-existent page and that link is ever followed by a single user or a crawler like Googlebot, that will be recorded in your server log files. You can access those directly (or pull 404s out of them on a regular, automatic basis). Alternatively most hosting comes with some form of log analysis built in (awstats being one of the most common). That will show you the 404 errors.

That isn't quite what you asked, as it doesn't mean that they have all been indexed, however that will be an exhaustive list that you can then check against.

Check that backlinks resolve : Download all of your backlinks (OSE, webmaster tools, ahrefs, majestic), look at the target and see what header is returned. We use a custom-built tool called linkwatchman to do this on an automatic, regular basis. However, as an occasional check you can download into Excel and use the excellent SEO Tools for Excel to do this for free. ( http://nielsbosma.se/projects/seotools/ <- best seo tool around)

Analytics : As long as your error pages trigger the google analytics tracking code you can get the data from here as well. Most helpful when the page either triggers a custom variable, or uses a virtual url ( 404/requestedurl.html for instance). Isolate the pages and look at where the traffic came from.
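For the server log option, pulling the 404s out can be a few lines of Python. This sketch assumes your logs are in the common or combined log format that Apache and nginx typically write; other formats will need a different pattern, and the `count_404s` name is just for illustration.

```python
import re
from collections import Counter

# Matches the request, status and (optional) referrer fields of a log line
# in common/combined log format. Other formats need a different pattern.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)")?'
)


def count_404s(log_lines):
    """Count 404 responses per (requested path, referrer) pair."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            referrer = match.group("referrer") or "-"
            counts[(match.group("path"), referrer)] += 1
    return counts
```

Grouping by referrer as well as path means that, where the referrer was logged, you can see which external page sent the visitor to the missing URL.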

matbennett

That was the other one that I was trying to think of! Thanks.

matbennett

Linkrisk.com looks pretty good. I was pretty impressed with the new tools that cognitive seo have added for this as well.

I should say that I have not used either of the above "in anger". I still prefer to download the link data (WMT + Majestic + ahrefs ideally), stick it into Excel, add some insights using SEO Tools for Excel and go through manually.

However for a quick look I think linkrisk is good. However - it isn't cheap.

matbennett

Your A records are served up by DNS servers around the web. There are no conditional rules in the system that can say "for France resolve www.mydomain to ip#1 and for Spain resolve the same to ip#2".

You can have es.mydomain pointing to 1 ip and fr.mydomain to another of course. I don't think that is what you mean though.

matbennett

If those pages are only accessible to logged in users then you could block them in robots.txt . If you nofollow the links pointing to them as well then it will stop the urls getting indexed by Google. I'd assume that moz would then honour the same and stop returning that error.
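You can sanity check a robots.txt rule before it goes live with the parser in Python's standard library. The `/members/` path and the example.com URLs below are hypothetical stand-ins for wherever your logged in pages live:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: /members/ stands in for the logged-in-only area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# can_fetch returns False when the rules block the given user agent.
members_ok = parser.can_fetch("rogerbot", "https://www.example.com/members/profile")
about_ok = parser.can_fetch("rogerbot", "https://www.example.com/about")
```

With those rules `members_ok` comes back `False` and `about_ok` comes back `True`. This only tells you how Python's parser reads the file, so treat it as a rough check rather than a guarantee of how any particular crawler behaves.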

matbennett

As Chris says, this shouldn't be a problem at all. We've done this on a few stores and not seen any measurable impact in either direction from it.

The only thing to be aware of is the possibility of introducing canonicalisation issues. http://example.com and https://example.com are different URLs. If both are accessible and return a valid header then both can be indexed. It is always worth ensuring that you either have a redirect, a rel=canonical or robots.txt addressing that issue if you have https in place.
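For the rel=canonical route, the tag just needs to point at one version of the URL regardless of which one was requested. A small sketch, assuming you have picked https as the canonical scheme (that choice and the helper names are mine):

```python
import html
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str, scheme: str = "https") -> str:
    """Return url with its scheme forced to the chosen canonical scheme."""
    parts = urlsplit(url)
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


def canonical_link_tag(url: str) -> str:
    """Build the rel=canonical tag for a page, whichever scheme it was requested on."""
    href = html.escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'
```

Both the http and https versions of a page would then emit the same tag, so only one of them should end up indexed.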

matbennett

I can't decide about Yahoo. I've tried to spot patterns in ranking changes when we've added or expired a listing, but have never been able to see anything conclusive. It's also an absolutely textbook paid link (it's not like it sends traffic).

You say it yourself: "Forget about Yahoo search, it pushes authority and trust to all SE's" - therefore we're doing it just to manipulate rankings. I can't believe Matt Cutts (or anyone else in search quality) has never asked the question "Should we allow Yahoo listings to pass authority - they're just paid links?". It does have manual review in its favour, but otherwise it is hard to justify.

Might be interesting to run an experiment where someone tries to alter a site's position using only well known, manually reviewed directories.


Posts made by matbennett
