Thanks Keri,
Happy to help where I can.
Actually, I think a little taste of TAGFEE could lead to addiction!
Have a great weekend
Hi BaseKit,
What is the reason for so many 404s?
Did you move your site or rebuild your site structure? Do you have a lot of pages that are removed after a short time?
The answers to these questions will help determine the best approach for your situation.
Sha
Hi CP,
If you wish to use robots.txt to block crawlers, then your two robots.txt files should be as follows:
For the http protocol (http://vistastores.com/robots.txt):
User-agent: *
Allow: /
For the https protocol (https://vistastores.com/robots.txt):
User-agent: *
Disallow: /
Personally, I prefer to use the noindex meta tag for page blocking because it is a more reliable way of ensuring that the pages are not indexed.
(Never try to use both at once: if robots.txt blocks a page, crawlers never fetch it and so never see the noindex tag.)
This link explains the difference between the two:
[Google Webmaster Tools Help.](http://www.google.com/support/webmasters/bin/answer.py?answer=35302 "Robots blocking crawlers")
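If you want to sanity-check the two files before uploading them, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The `/some-page` path and the `is_crawlable` helper are just illustrative names, and this only shows how a parser that follows the robots.txt rules reads the files; it says nothing about what is actually indexed.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt bodies suggested above.
HTTP_ROBOTS = """\
User-agent: *
Allow: /
"""

HTTPS_ROBOTS = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "/some-page" is a made-up path used only for illustration.
    print(is_crawlable(HTTP_ROBOTS, "http://vistastores.com/some-page"))
    print(is_crawlable(HTTPS_ROBOTS, "https://vistastores.com/some-page"))
```

The first check should come back allowed and the second blocked, which is the split you are after.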
Hope that helps,
Sha
You can use a robots.txt file to request that search engines remove your site and prevent robots from crawling it in the future. (It's important to note that if a robot discovers your site by other means - for example, by following a link to your URL from another site - your content may still appear in our index and our search results. To entirely prevent a page from being added to the Google index even if other sites link to it, use a [noindex meta tag](http://www.google.com/support/webmasters/bin/answer.py?answer=61050).)
Hi,
The quickest, easiest and most accurate way is to crawl your site with the Screaming Frog SEO Spider tool.
This will give you the information you are looking for and a whole lot more besides. Depending on the size of your site, you could have it in seconds.
If you have your site loaded as a campaign in Moz then you would be able to see the information current to the time of the last crawl.
Hope that helps,
Sha
Hi Don,
There are some things you can do to optimize your product feeds.
There is some specific information about optimizing feeds for Google in this Q&A thread: "Does anyone have any tips for optimizing your Google product feeds?"
CPC Strategy are quite generous on their blog with information to help site owners who wish to manage their own feeds. There is some information about Bing feed optimization in this Tip of the Day post.
Hope that helps,
Sha
Hi Jaycie,
Google's view of the issue is that you should have a robots.txt file in order to eliminate the risk of your web host dealing with requests in an unexpected way and returning something strange.
Matt Cutts talked about robots.txt in this Webmaster Help Video last month.
Hope that helps,
Sha
Hi Ade,
Since learning this one thing will be something that you are likely to use over and over, I figure it is much better if you see how it actually works. So, we wrote a little resource to show you how to do a basic 301 redirect as well as one that goes back one level to your category page.
If you take a look at this simple 301 Redirect course for managing 404 errors, you can see three working pages and also download the code.
Let me know if you have any questions.
Hope that helps,
Sha
Hi Elchanan,
Well Eyepaq is batting a thousand today!
Eyepaq is quite correct. The only way to "transfer" the bulk of the link equity is to redirect the domain which would inevitably result in a transfer of the manual action as well. In fact, it is worse than that. In recent times a number of domains have been dealt a manual action by association without a redirect even being in place. These manual actions have been applied because the Webspam team believes that the sites are related and are part of a larger scale manipulative effort, or indeed an effort to get out from under a penalty. Matt Cutts talked about this at SMX West, stating that people should not be able to "move down the road" to avoid a manual action.
If you genuinely needed to build a new site before the manual action and have not put your life savings and years of effort into building a brand, there could be a business case for starting again with a brand new domain, BUT remember you will be starting with nothing - even less than you will have if you successfully clean up the existing domain. This has to be a careful business decision and any new site would need to be completely unique and without any connection to the penalized site. Personally, starting over would be my last resort unless the site was fundamentally broken (and the domain name was a poor choice) to start with.
There are generally five broad reasons why a reconsideration request may fail:
Insufficient data - maybe links in the backlink profile have not been surfaced in the data gathering stage. Incomplete data is common and is best remedied by using as many data sources as possible and in some cases by pulling multiple samples over a period of days or weeks.
(Remember that the links returned by the Webspam team when a reconsideration request fails are "examples". They are intended to point you toward other links in the backlink profile which follow the same patterns or use the same unnatural linking tactics.)
Mistakes in Analysis - If links have been misclassified as natural and are kept, the reconsideration request will fail. Sometimes this happens because people rely solely on algorithmic analysis tools to determine which links to keep or remove and results are not 100% accurate. I would always argue that a real human should be the primary tool when doing analysis because I believe there is no room for mistakes in a job that your livelihood depends upon!
Sometimes human analysis can go wrong too - most often because people forget that this is about "unnatural" linking. That means links that were created rather than earned.
Another mistake that people make at this point is to try to just remove the worst of the unnatural links to preserve some of the benefits that were gained from unnatural linking. Omitting unnatural links from the cleanup effort because you think they are not so bad is a big mistake for two reasons:

Incomplete or ineffective Disavow submissions - As mentioned above, it is always best to disavow at the domain level to ensure that any links you are unaware of do not remain in play and sabotage your efforts. The only exceptions to this rule are unnatural links on high value domains you might reasonably love to have "natural" links from. In these extremely rare cases you should disavow the specific URLs to ensure that any natural links are preserved and any natural links you might earn from that domain in the future will be accorded their rightful value. Also - a red light went on for me when I saw "Google ignored some domains in the two disavow files we submitted". This causes me to wonder whether you have uploaded two completely separate files to the Disavow Links Tool? If so, then this could be the problem. The Disavow Links Tool submission is an overwrite, not an update. This means that you need to combine any existing disavow list with the new list before uploading. If you don't do this then you are effectively re-avowing all of the domains or links that were in the existing file.
If you need to update an existing disavow file with a new list, you can use this free tool to make it easy. Once you have created a free account you can upload your existing list, then upload any new list in the future to create an updated disavow file. When you upload a new list the tool will combine the data, remove duplicates and add date notations so that you can keep track of when domains were added. The tool also ensures that your new disavow file is within the 2MB file size limit and generates it in the correct text format, ready for submission.
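If you would rather combine the lists by hand, the idea is simple enough to sketch in a few lines of Python. This is only a rough illustration of the "combine, then upload" step, not how the tool mentioned above works: `merge_disavow` is a made-up helper, it drops comment lines rather than adding date notations, and it does not check the file size limit.

```python
def merge_disavow(existing: str, new: str) -> str:
    """Combine two disavow lists into one, skipping blanks, comments and duplicates."""
    seen = set()
    merged = []
    for line in (existing + "\n" + new).splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # blank line or comment
        # Domain names are case-insensitive; URL paths may not be, so only
        # "domain:" entries are compared case-insensitively.
        key = entry.lower() if entry.lower().startswith("domain:") else entry
        if key not in seen:
            seen.add(key)
            merged.append(entry)
    return "\n".join(merged) + "\n"


if __name__ == "__main__":
    # Placeholder entries for illustration only.
    old_list = "# first submission\ndomain:spam.example\nhttp://bad.example/page.html\n"
    new_list = "domain:SPAM.example\ndomain:other.example\n"
    print(merge_disavow(old_list, new_list), end="")
```

The combined output is what you would upload, so that nothing from the earlier submission is accidentally re-avowed.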
Insufficient effort in the cleanup - Sometimes this is actually just that there is insufficient evidence provided that the work has been done. Most common mistakes here:
Not making a case for reconsideration - Site owners need to demonstrate that they understand where they went wrong and will not repeat the same mistakes. In addition to this they need to convince the Webspam team that they have made a "good faith effort" to remove the links. Also, if there are links that are known to be natural, but may look suspicious, address them. Give a reasonable explanation as to why links have been retained (as long as there IS a reasonable explanation). You can use this checklist to make sure you have covered the most important things in your reconsideration request.
This Slide Deck provides an overview that might be helpful.
Any or all of these things can be playing a part in a failed reconsideration effort. It is not uncommon for it to take multiple attempts to have a penalty revoked, but the more of these potential problems we can eliminate by following best practice from start to finish, the more predictable the results.
Best of luck with resolving the manual action and getting things back on track.
Hope that helps,
Sha
Hi John,
Since your campaign is set up using the root domain, all subdomains will automatically be included. The only possible way that you may be able to remove the subdomain from the crawl would be to use robots.txt to specifically block mozbot from crawling it. However, if you do this, then it will not be possible to access the information for the subdomain separately.
So, what you are really asking is whether the subdomain can be split into a separate campaign without you having to start from scratch. I suspect the answer is no (certain things, like keywords may not be relevant anyway).
The only way to get a definitive answer on whether it can be done would be to email the SEOmoz help team direct - help [at] seomoz.org.
Hope that helps,
Sha
Hi Diane,
Ryan's response is spot on and his suggestions are excellent.
If you can provide the URL(s), then we can take a look and see exactly what is going on with the referring page(s).
If you don't want to share the information publicly in the Q&A, you can private message each of us through your SEOmoz profile page.
If you ever think that someone has access to edit pages on your site without your permission, the first thing to do is to check with your service provider whether there are any active ftp accounts that you are unaware of. I have seen situations before where people have managed to get a "back door" set up and then it is as simple as logging in and changing pages without your knowledge.
Given that this involves a legal dispute, if we can do a proper diagnosis and trace the source of the errors (or if there happens to be a back door in place), then you would be able to:
Hope that helps,
Sha
Hi Ade,
So sorry I wasn't around to follow this up for you. I have been away for the day and had wireless connection issues, so could not check Q&A until now.
Oops! Yes. Joomla does have its own error handling which does make a big difference, but it should be simple to fix once you understand what happens when you put the .htaccess file in place.
When a request is received by the server, the .htaccess file is read from top to bottom, checking each rule in the file for a match. Once a match is found, the specific action assigned to that rule is executed. This means that no rules thereafter are read.
So, if you ensure that your code appears at the beginning of the .htaccess file, then whenever the conditions described by the rule are matched, the redirect will occur. However, if no other rule in the .htaccess is matched, then Joomla error handling will come into play should any other error be present.
This of course means that any specific rule you wish to add in the future should also appear before the Joomla code. As long as you always make sure the Joomla code is the last to be read, everything should work just as you intended.
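If it helps to picture why the order matters, the top-to-bottom, first-match behaviour described above can be modelled in a few lines of Python. This is a toy model only: the patterns and targets are invented, and Apache's real behaviour depends on which directives and flags you use.

```python
import re
from typing import Optional

# Hypothetical rules, listed in the order they would appear in the file.
RULES = [
    (r"^/old-product/.*$", "/category/"),  # your own redirect, placed first
    (r"^/.*$", "/index.php"),              # stand-in for the CMS catch-all
]


def first_match(path: str, rules=RULES) -> Optional[str]:
    """Return the target of the first rule whose pattern matches path."""
    for pattern, target in rules:
        if re.match(pattern, path):
            return target  # stop here: later rules are never consulted
    return None


if __name__ == "__main__":
    print(first_match("/old-product/widget"))
    # With the order reversed, the catch-all wins and the redirect never fires.
    print(first_match("/old-product/widget", rules=list(reversed(RULES))))
```

Swapping the order of the two rules is the equivalent of putting your code after the Joomla code.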
Hope this helps,
Sha
hmmm...all good comments, but have you thought about the possibility that you may have a problem with keyword stuffing?
I count 58 instances of "Tampa" in the home page code, 172 instances of "DJ" and 18 instances of "Tampa DJ"
What is perhaps more important from the search engines' point of view is that the reason I noticed the overuse of these terms is that the text flow and usability have been quite severely impacted. The text doesn't flow naturally when you read it, and by the time I got to the end of the footer menu I really didn't want to ever see the word Tampa again 8(
It is a very easy thing to get so caught up in finding ways to use your keyword terms that you lose track of how many times you have already done it. There are a couple of tools you could use to help sort this problem:
If you have a campaign running for the site in the SEOmoz Pro App, make sure you have your chosen keywords loaded and then use the On-page feature to identify and correct problems. (If you don't have paid Pro you can always check out the SEOmoz 30 Day Free Trial)
OOPS! Scratch that next bit - Ryan just pointed out that the highlight feature is now in my beloved MozBar...YES!! One less bar I need to keep installed 8). (I used to keep the SEOBook Toolbar for Firefox installed just so I could use its highlighter.) Use the keyword highlighting feature in the MozBar (check Ryan's post below for how to find it). Basically, you type your chosen term into the toolbar, click the highlighting pen and UH OH!! now you can see that there might be a few too many instances of your term on the page... This is actually a tool I find really handy and easy to use - it was the only thing I wished was in the MozBar (HINT, HINT Mozzers!!)
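If you just want rough numbers like the ones I quoted, a quick count is easy to script. Here is a minimal Python sketch, assuming you have the page source (or the visible text) saved as a string; `count_terms` is a made-up helper, and it counts whole-word matches only, so "DJs" is not counted as "DJ".

```python
import re


def count_terms(text: str, terms) -> dict:
    """Count case-insensitive whole-word occurrences of each term in text."""
    counts = {}
    for term in terms:
        # Lookarounds stop a term matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


if __name__ == "__main__":
    # Made-up sample text for illustration only.
    sample = "Tampa DJ services. Our Tampa DJs love Tampa."
    print(count_terms(sample, ["Tampa", "DJ", "Tampa DJ"]))
```

A count on its own does not tell you whether the copy reads naturally, so treat it as a prompt to reread the page rather than a target to hit.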
Hope this helps,
Sha
Hi Shradda,
I agree with Ryan that the use of a meta noindex tag is the preferable way to block the pages, but obviously there may be difficulties with applying the tag, depending upon how your pages are generated and whether you are able to alter the code or not.
You can also use ?option=com_fireboard etc to create 301 redirects back to a higher order category page or search.
You should be able to use a single line of code to 301 all pages within each directory.
Using 301 redirects will also send a signal to search engines to de-index those pages.
Very clever 404 page too! Had to watch him go all the way across the page and back just so I knew I wasn't missing anything! 
Sha
Wow! Thanks Ryan.
I'm sure it won't surprise you to know that I'm always reading eagerly when I see you respond to a question as well. 
Hi,
You are not the only one experiencing issues at the moment.
Best to report your problem direct to support [at] seomoz.org with details of what you are seeing. The Rankings system is in Beta at present so far as I know, so you just need to let the Moz staff know ASAP.
Sha
Hi callassist,
You are correct in assuming that the SEOmoz crawler does not distinguish pages that are noindexed.
There is a simple explanation of the reason for this in a response to this thread by Daniel Deceuster.
As you will see there, a feature request is already in play for a feature to allow users to "turn off" the pages they wish to ignore in their reports (and this would be an example of where that will be very useful). If you follow the link in the previous thread, you will be able to keep updated on progress with the feature request.
Hope that helps,
Sha
Hey Ryan,
Just using Screaming Frog SEO Spider. It has proven very reliable and quickly identifies errors, server status for every page and much more.
The caveat is that I always check any Status code errors in the browser as there are quite often situations like this where the server is returning a Status error when the page renders fine in the browser.
You just have to be careful to ensure that if you want to scan the root domain you use the non-www URL as usual.
Hope it's useful,
Sha
Hi Ryan,
The major problem is that any experienced programmer can easily write their own script to scrape a site. So there could be thousands of "bad bots" out there that have not been seen before.
There are a few recurring themes that appear amongst suspicious User Agents that are easy to spot - generally anything that has a name including words like grabber, siphon, leech, downloader, extractor, stripper, sucker or any name with a bad connotation like reaper, vampire, widow etc. Some of these guys just can't help themselves!
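As a first pass over your logs, that kind of name check is easy to script. Here is a minimal Python sketch; the word list is simply the one above (with both spellings of leech), `suspicious_words_in` is a made-up helper, and a hit only means "look closer", since a User Agent string proves nothing on its own.

```python
# Words taken from the list above; extend it to suit your own logs.
SUSPICIOUS_WORDS = (
    "grabber", "siphon", "leech", "leach", "downloader", "extractor",
    "stripper", "sucker", "reaper", "vampire", "widow",
)


def suspicious_words_in(user_agent: str) -> list:
    """Return the suspicious words found in a User Agent string."""
    ua = user_agent.lower()
    return [word for word in SUSPICIOUS_WORDS if word in ua]


if __name__ == "__main__":
    # Example User Agent strings for illustration.
    print(suspicious_words_in("SiteSucker/2.3"))
    print(suspicious_words_in(
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    ))
```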
The most important thing though is to properly identify the ones that are giving you a problem by checking server logs and tracing where they originate from using Roundtrip DNS and WhoIs Lookups.
Matt Cutts wrote a post a long time ago on how to verify googlebot and of course the method applies to other search engines as well. The doublecheck is to then use WhoIs to verify that the IP address you have falls within the IP range assigned to Google (or whichever search engine you are checking).
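The roundtrip check can be sketched in Python with the standard `socket` module: a reverse lookup on the IP, a check that the hostname belongs to the search engine, then a forward lookup to confirm it resolves back to the same IP. `verify_crawler` is a made-up helper, and the default suffixes are the hostnames Google documents for googlebot; swap in the right ones for other engines. The lookup functions are passed in as parameters only so the logic can be tried without touching the network.

```python
import socket


def verify_crawler(ip, allowed_suffixes=(".googlebot.com", ".google.com"),
                   reverse_lookup=socket.gethostbyaddr,
                   forward_lookup=socket.gethostbyname_ex):
    """Roundtrip DNS check: IP -> hostname -> IP must lead back to the same IP."""
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no reverse DNS record
    if not hostname.lower().rstrip(".").endswith(allowed_suffixes):
        return False  # hostname does not belong to the search engine
    try:
        forward_ips = forward_lookup(hostname)[2]
    except OSError:
        return False  # hostname does not resolve
    return ip in forward_ips


if __name__ == "__main__":
    # Replace with an IP address taken from your own server logs.
    print(verify_crawler("66.249.66.1"))
```

Checking the suffix with a leading dot matters, because a scraper can set up reverse DNS on a name like googlebot.com.something-else.example to fool a looser match.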
If you are experienced at reading server logs it becomes fairly easy to spot spikes in hits, bandwidth etc which will alert you to bots. Depending which server stats package you are using, some or all of the bots may already be highlighted for you. Some packages do a much better job than others. Some provide only a limited list.
If you have access to a programmer who is easy to get along with, the best way to get your head around this is to sit down with them for an hour and walk through the process.
Hope that helps,
Sha
PS - I'm starting to think you sleep less than I do! 
Just to clarify what you are doing here, a couple of questions:
These things will make a difference as to how you can approach the issue.
Sha