Welcome to the Q&A Forum

Travis_Bailey

I got distracted from this thread. I see the iframe pages. I have a hunch, but I'm not ready to render an opinion.

It's hilarious that they actually styled one of the tables as 'linkfarm'. SMH

Travis_Bailey

I've learned one thing especially and that is: Don't try to learn Italian from a tire website. XD

I didn't find an instance of an iframe. There is a reference to iframe in the CSS, but no style is in place for an iframe. They do use a lot of jQuery, however.

Fun thing I learned today: noscript can be crawled and rendered. Just check the cache. The only thing that's actually cached in the corpo are the contents of the noscript tag. Weird, but apparently possible.

But if there's one thing I do know, at least at this moment, it's that a lot of vendita gomme aren't held to the highest standard. Also, this site's conversion rate will continue sucking eggs - as long as they require someone to create an account to purchase.

Otherwise the site just loads fast as hell, even in the US, and it's keyword stuffed to the Nth from the src up. In sum, I need to learn Italian and sell tires.

Travis_Bailey

Would you be comfortable with PMing the competitor URL via Moz? I'm not interested in taking a client. I'm interested in what's happening. Moz is my witness.

Have you found evidence/considered the possibility that they're redirecting domains to the target domain? It's basically like running on quicksand, but it can be successful for a while. Just like any light switch tactic.

Travis_Bailey

Not a problem. Though one thing did kind of strike me as odd, and that's the white #fff maps link. Then again, I don't really know what color the background(s) will be. If the backgrounds are white, I would err toward the side of caution and use a contrasting style. Just sayin'.

Besides, there's markup for map links which you can incorporate. More on that here. You can play around with the markup until you have something that satisfies your needs and validates, via the structured data testing tool.

A note on the testing tool:

It's so user friendly, it's not readily apparent that you can just click on the first window and paste your src.

Once you get comfortable with that, there are a lot of other ways you can use the markup.

Travis_Bailey

It's a good idea to be as specific as possible. A lot of people tend to over-think the whole deal. Every major search engine, of consequence in the Western hemisphere, has endorsed this form of structured data. If Schema provides a particular type, by all means use it.

The first snippet definitely checks out. Your competitors aren't likely using it. The search engines have admitted they're a little slow on the uptake, and you put in the time. You have some pretty good contact info/NAP boilerplate right there.

All the thumbs. First snippet.

Travis_Bailey

The 302 response code isn't an error, per se. It's a redirect code that means 'Found', and it's treated as a temporary redirect. It generally doesn't pass 'link juice' the way a 301 does.

Some of the common uses I've seen include using 302 codes for external links, when they can simply be set to nofollow. The nofollow link attribute is handy when you feel it's necessary to link to an external source, but don't want any of the baggage that comes with passing PR.

In the worst case, someone moved pages 'temporarily' when the pages were actually moved permanently. Which may mean pages with incoming links at the old URL aren't passing 'juice' or 'PR' like they should to the new URL.
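If it helps, here is a minimal sketch of how one might triage redirect codes from a crawl export. The `crawl_rows` data and the `triage` helper are hypothetical and made up for illustration; only the status codes and their standard names come from HTTP itself.

```python
from http import HTTPStatus

# Hypothetical crawl export: (URL, status code) pairs. Not real data.
crawl_rows = [
    ("/old-page", 301),
    ("/moved-temporarily", 302),
    ("/contact", 200),
]

# Redirect codes that signal a temporary move.
TEMPORARY = {302, 303, 307}


def triage(rows):
    """Return (url, code, phrase) for every temporary redirect worth a second look."""
    flagged = []
    for url, code in rows:
        if code in TEMPORARY:
            flagged.append((url, code, HTTPStatus(code).phrase))
    return flagged


for url, code, phrase in triage(crawl_rows):
    print(f"{url}: {code} {phrase} (temporary; confirm the move isn't permanent)")
```

Anything this flags is only a prompt to ask whether the move was really temporary. If it wasn't, a 301 is the usual fix.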

Here's the Moz Guide to Redirect Codes for further reading.

I don't know anything about your site, so there may be a good reason for the 302 responses. But the good reasons are quite limited.

Travis_Bailey

Hi Nathan,

I second Richard's statement. It's a very good idea to build out individual landing pages with contact information. It's even better to include Schema markup, at least for the NAP. And lucky you! There's markup that relates directly to a furniture store as a local business.

To get started with the markup, you'll want to look at the code examples at the bottom of the local business page. That should give you a solid idea how you should structure the markup. Then you can see if your markup checks prior to pushing live via the Google Structured Data Testing Tool.
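As a rough sketch of what the NAP markup could look like, the snippet below builds a schema.org `FurnitureStore` object and prints it as JSON-LD. Using JSON-LD rather than inline microdata is my assumption, the `furniture_store_jsonld` helper is hypothetical, and every business detail is a placeholder.

```python
import json


def furniture_store_jsonld(name, street, city, region, postal_code, phone):
    """Build a schema.org FurnitureStore object carrying name, address and phone."""
    return {
        "@context": "https://schema.org",
        "@type": "FurnitureStore",
        "name": name,
        "telephone": phone,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
            "postalCode": postal_code,
        },
    }


# Placeholder values only; swap in the real details for each location page.
markup = furniture_store_jsonld(
    "Example Furniture", "123 Example St", "Springfield", "TX", "00000", "+1-555-555-0100"
)

print('<script type="application/ld+json">')
print(json.dumps(markup, indent=2))
print("</script>")
```

One block per landing page, each with that location's own details, is the idea. Run the output through the testing tool before pushing live.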

Here's another reference via Moz in regard to on-site local considerations.

Travis_Bailey

You're welcome, Guillermo. There are already a ton of things with white label, so I'm not really sure if it's something you'll want to support forever. I mean, you could be doing things that make money - rather than incurring maintenance costs with in-house solutions.

Though it would be a fun little exercise one could do for grins.

And as it would so happen, Moz Analytics does offer localization. I guess I missed that one... odd.

Travis_Bailey

If you need something smaller, cheaper and white label with localization - you can take a look at AWR Cloud or Authority Labs. Both provide API access at some price point, and both are a lot cheaper than SERPs.com. While I haven't used an Authority Labs offering beyond a trial, I used AWR Cloud extensively years ago. It does what it says on the tin.

YMMV, and Moz Analytics has localization on the roadmap - at least last I knew. I've pretty much given up on rank tracking as it can lead one 'down the garden path'. But that's enough preaching out of me. ; ) Best of luck.

Travis_Bailey

As Ryan touched on, it's always been a usability/accessibility concern. Since sites can get a little boost from other UI/UX concerns, it would stand to reason that alt text is still something to consider. As far as I know, it's never been the holy grail - but a nod to usability and/or accessibility can't hurt.

Alt text should be somewhat terse, though descriptive of the content. Otherwise, how can one justify the bandwidth alone?

Travis_Bailey

Hi Evelien,

Does organic traffic appear to be attributable to any particular country or countries? This may sound strange, but I wonder if a competitor pulled out of the market. It appears just about every competitor you have in google.be got a pretty nice organic increase around that time, which has continued. Kruidvat seems to have the lion's share now.

Last I knew, Luxembourg was something of a tax shelter. With the recent changes in VAT, I wonder if a significant competitor or competitors found it difficult to continue operations. But that's just a 'shot in the dark' on my part.

Travis_Bailey

I would generally dispense with the concern over metrics, considering the source. It sounds like a great citation source, regardless. Plus it may do what links were intended to do in the first place: Drive Traffic

OSE, Ahrefs, Majestic and the like are just keyhole views into what's really going on. Important keyhole views, but still limited insights into the big picture.

I would argue that if one focuses less on granular metrics, and puts more attention into traffic and general relevancy, one would be happier with the results and have more time for generating similar results.

Travis_Bailey

Good to hear you may be getting closer to the root of the problem. Apologies that it took so long to get back to you here. I had 'things'.

I followed the steps and you should be able to determine the outcome. Spoiler Alert: No block, this time.

It's a whole other can of worms, but should you need more human testing on the cheap, you may find Mechanical Turk attractive. One could probably get a couple hundred participants for under a couple hundred dollars, with a task comparable to the one above.

Just a thought...

Travis_Bailey

In your position, I would want to know more about what I'm getting into as well. Before I have a contract, I would like to know what they've been doing over the last three years. That's a lot of time in which previous actions could potentially help or hinder your efforts.

Did they disavow?
What did they (or a contractor) disavow, if anything?
If they 'performed a disavow', where is the file? (There's a possibility it wasn't properly formatted, or it may not have been submitted.)
Have they sent out link removal requests?
If so, what were the results?
Did they continue building low quality links after the fact? (History is a factor.)
If so, for how long?
Have they tried a reconsideration request after what you would deem a sufficient disavow/removal effort? (Though it may walk and quack like an algo/filter penalty, it could be manual.)
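On the 'properly formatted' question, a quick sanity check of a disavow file could look something like the sketch below. It assumes the commonly documented format: one full URL or one `domain:` entry per line, with `#` marking comments. The `check_disavow_lines` helper and the sample entries are hypothetical, and the rules are a conservative reading rather than Google's exact validation.

```python
from urllib.parse import urlparse


def check_disavow_lines(lines):
    """Return (line_number, text) pairs for lines that don't look like valid entries."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are fine
        if line.startswith("domain:"):
            domain = line[len("domain:"):]
            # A domain entry should be a bare hostname: no scheme, path or spaces.
            if domain and "/" not in domain and " " not in domain and "." in domain:
                continue
        else:
            parsed = urlparse(line)
            if parsed.scheme in ("http", "https") and parsed.netloc:
                continue
        problems.append((number, line))
    return problems


# Hypothetical entries for illustration only.
sample = [
    "# Links we asked to have removed",
    "domain:spammy-directory.example",
    "http://link-farm.example/page.html",
    "spammy-directory.example",     # missing the domain: prefix
    "domain: http://bad.example/",  # scheme and path inside a domain entry
]

for number, line in check_disavow_lines(sample):
    print(f"line {number} looks off: {line}")
```

A clean pass here doesn't prove the file was ever submitted, so that question still stands on its own.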

The above would be a few of my primary concerns before I started looking at anchor text ratios. If you've already covered those bases, good on you. Just let it be known, to everyone's general disinterest, that I said as much.

You may find that a lot of the heavy lifting is already done, but the execution was flawed at some critical point. Which may free resources toward building a better internet and generally making your client giddy. Easy peasy, right?

I agree with Ryan's second paragraph. Definitely under-promise and attempt to over-deliver. I haven't seen many sites that didn't have at least a chance at recovery, if money were no object. However, there are sites where it would be wise to start over from an economic perspective. (Time/Opportunity Cost+Actual Money)

It's that nearly three year long penalty that would give me pause, prior to jumping in. Again with the ratios, if there's been a disavow and you don't have the file, you're not looking at anything remotely accurate - until you go through the same process. Still, no one ever has the entire picture. It's various shades of confidence in what you can gather about the situation.

There. I made it two paragraphs without emoting. I can go play video games now.

Travis_Bailey

I can't really argue with log files, in most instances. Unfortunately, I didn't export crawl data. I used to irrationally hoard that stuff, until I woke up one day and realized one of my drives was crammed full of spreadsheets I will never use again.

There may be some 'crawlability' issues, beyond the aggressive blocking practices. Though I managed to crawl 400+ URIs before timeouts, after I throttled the crawl rate back the next day. Screaming Frog is very impressive, but Googlebot it ain't, even though it performs roughly the same function. Given enough RAM, it won't balk at orders of magnitude more than the 400 or so URIs. (I've seen... things... ) And with default settings, Screaming Frog can easily handle tens of thousands of URIs before it hits its default RAM allocation limit.

It's more than likely worth your while to purchase an annual license at ~$150. That way, you get all the bells and whistles - though there is a stripped-down free version. There are other crawlers out there, but this one is the bee's knees. Plus you can run all kinds of theoretical crawl scenarios.

But moving along to the actual blocking, barring the crawler, I could foresee a number of legit use scenarios that would be comparable to my previous sessions. Planning night out > Pal sends link to site via whatever > Distracted by IM > Lose session in a sea of tabs > Search Google > Find Site > Phone call > Not Again... > Remember domain name > Blocked

Anyway, I just wanted to be sure that my IP isn't whitelisted, just unblocked. I could mess around all night trying to replicate it, without the crawling, just to find I 'could do no wrong'. XD

Otherwise it looks like this thread has become a contention of heuristics. I'm not trying to gang up on you here, but I would err on the side of plenty. Apt competition is difficult to overcome in obscurity. : )

Travis_Bailey

I'll PM my public IP through Moz. I don't really have any issue with that. Oddly enough, I'm still blocked though.

I thought an okay, though slightly annoying, middle ground would be to give me a chance to prove that I'm not a bot. It seems cases like mine may be few and far between, but it happened.

It turns out that our lovely friends at The Googles just released a new version of reCAPTCHA. It's a one-click-prove-you're-not-a-bot-buddy-okay-i-will-friend-who-you-calling-friend-buddy bot check. (One click - and a user can prove they aren't a bot - without super annoying squiggle interpretation and entry.)

I don't speak fluent developer, but there are PHP code snippets hosted on this GitHub repo. From the documentation, it looks like you can fire the widget when you need to. So if it works like I think it could work, you can have a little breathing room to figure out the possible session problem.

I've also rethought the whole carpenter/mason career path. After much searches on the Yahoos, I think they may require me to go outside. That just isn't going to work.

Travis_Bailey

Rest assured that I don't scrape/hammer so hard that it would knock your site down for a period. I often throttle it back to 1 thread and two URIs per second. If I forget to configure it, the default is 5 threads at two URIs per second. So yeah, maybe a bit of the Moz effect.

Chrome Incognito Settings:

Just the typical/vanilla/default incognito settings. It should accept cookies, but they generally wouldn't persist after the session ends.

I didn't receive a message regarding cookies prior to the block notification.

On a side note, I don't allow plugins/extensions while using incognito.

Fun w/ Screaming Frog:

It's hard to say if the instance 8.5 hours later was my instance of Screaming Frog. The IP address would probably tell you the traffic came out of San Antonio, if it was mine. I didn't record the IP at the time, but I remember that much about it. Otherwise it's back in the pool.

Normally Screaming Frog would display notifications, but in this instance the connection just timed out for requested URLs. It didn't appear to be a connectivity issue on my end, so... yeah...

Fun w/ Scraping and/or Spoofing:

Screaming Frog will normally crawl CSS and JS links in source code. I found it a little odd that it didn't this time.

I also ran the domain through the Google Page Speed tool for giggles, since it would be traffic from Googlebot. It failed to fetch the resources necessary to run the test. Cached versions of pages seemed to render fine, though, with the exception of broken images in some cases. I think that may have something to do with the lazy load script in indexinit.js, but I didn't do much more than read the code comments there.

In regard to the settings for the crawler, I had it set to allow cookies. The user agent was googlebot, but it wouldn't have come from the typical IPs. Basically just trying to get around the user agent and cookie problem with an IP that hadn't been blocked. You know, quick - dirty - and likely stupid.

Fun w/ Meta Robots Directives:

A few of the pages that had noindex directives appeared to lack genuine content, in line with the purpose of the site. So I left that avenue alone and figured it was intentional. The noarchive directive should prevent a cache link. I was just wondering if one or more somehow made it into the mix, for added zest. Apparently not.
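For anyone who wants to check pages for these directives in bulk, here is a minimal sketch using Python's standard `html.parser`. The `MetaRobotsParser` class, the `robots_directives` helper and the sample page are hypothetical. It only reads `<meta name="robots">` tags, so bot-specific meta tags and `X-Robots-Tag` headers are out of scope.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma separated, e.g. "noindex, noarchive".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of meta robots directives present in an HTML string."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source for illustration.
page = '<html><head><meta name="robots" content="noindex, noarchive"></head><body></body></html>'
found = robots_directives(page)
print("noarchive present:", "noarchive" in found)
```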

While I'm running off in an almost totally unrelated direction, I thought this was interesting. Apparently Bingbot can be cheeky at times.

Fun w/ The OP:

It looks like Ryan had your answer, and now you have an entirely new potential problem which is interesting. I think I'm just going to take up masonry and carpentry. Feel free to come along if you're interested.

Travis_Bailey

No worries, I'm not frustrated at all.

I usually take my first couple passes at a site in Chrome Incognito. I had sent a request via Screaming Frog. I didn't spoof the user agent, or set it to allow cookies. So that may have been 'suspicious' enough from one IP in a short amount of time. You can easily find the Screaming Frog user agent in your logs.
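Something like the sketch below would pull those requests out of an access log. It assumes the server writes Combined Log Format, where the user agent is the last quoted field, and that the crawler identifies itself with a string containing "Screaming Frog SEO Spider". The `requests_from` helper and the sample log lines are hypothetical.

```python
import re

# In Combined Log Format the user agent is the last quoted field on the line.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')


def requests_from(log_lines, needle):
    """Return the log lines whose user agent contains `needle` (case-insensitive)."""
    needle = needle.lower()
    hits = []
    for line in log_lines:
        match = USER_AGENT_FIELD.search(line)
        if match and needle in match.group(1).lower():
            hits.append(line)
    return hits


# Made-up log lines using a documentation-only IP range.
sample_log = [
    '203.0.113.7 - - [01/Jan/2000:00:00:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "Screaming Frog SEO Spider/x.y"',
    '203.0.113.7 - - [01/Jan/2000:00:00:09 +0000] "GET /about HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (Windows NT 6.1) Chrome/39.0"',
]

hits = requests_from(sample_log, "Screaming Frog")
print(len(hits), "request(s) from the crawler")
```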

Every once in a while I'll manage to be incorrect about something I should have known. The robots.txt file isn't necessarily improperly configured. It's just not how I would have handled it. The googlebot, at least, would ignore the directive since there isn't any path specified. A bad bot doesn't necessarily obey robots.txt directives, so I would only disallow all user agents from the few files and directories I don't want crawled by legit bots. I would then block any bad bots at the server level.

But for some reason I had it in my head that robots.txt worked something like a filter, where the scary wildcard and slash trump previous instructions. So, I was wrong about that - and now I finally deserve my ice cream. How I went this long without knowing otherwise is beyond me. At least a couple productive things came out of it... which is why I'm here.
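To see the group-matching behavior for yourself, here is a small sketch using Python's standard `urllib.robotparser` against a trimmed copy of the robots.txt entries quoted in this thread. Python's parser is not Googlebot, so treat it as an illustration of the matching rule rather than proof of what any particular crawler does. `ExampleBot` and `/some-page` are made-up names.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the quoted file: two named groups plus the wildcard group.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path="/some-page"):
    """True if this parser thinks `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)


# A bot with its own group uses that group; everyone else falls through to '*'.
for agent in ("googlebot", "bingbot", "ExampleBot"):
    print(agent, "allowed" if allowed(agent) else "blocked")
```

The named bots get their own empty Disallow, which permits everything, while any bot without a named group lands on the wildcard and is shut out.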

So while I'm totally screwing up, I figured I would ask when the page was first published/submitted to search engines. So, when did that happen?

Since I'm a glutton for punishment, I also grabbed another IP and proceeded to spoof googlebot. Even though my crawler managed to scrape meta data from 60+ pages before the IP was blocked, it never managed to crawl the CSS or JavaScript. That's a little odd to me.

I also noticed some noindex meta tags, which isn't terrible, but could a noarchive directive have made it into the head of one or more pages? Just thought about that after the fact. Anyway, I think it's time to go back to sleep.

Travis_Bailey

For starters, the robots.txt file is blocking all search engine bots. Secondly, I was just taking a look at the live site and I received a message that stated something like: "This IP has been blocked for today due to activity similar to bots." I had only visited two or three pages and the cached home page.

Suffice to say, you need to remove the User-agent: * Disallow: / directive from robots.txt and find a better way to handle potentially malicious bots. Otherwise, you're going to have a bad time.

My guess is the robots.txt file was pushed from dev to production and no one edited it. As for the IP blocking script, I'm Paul and that's between y'all. But either fix or remove it. You don't necessarily want blank/useless robots.txt directives either. Only block those files and directories you need to block.

Best of luck.

Here are your current robots.txt entries:

User-agent: googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: rogerbot
Disallow:

User-agent: sitelock
Disallow:

User-agent: Yahoo!
Disallow:

User-agent: msnbot
Disallow:

User-agent: Facebook
Disallow:

User-agent: hubspot
Disallow:

User-agent: metatagrobot
Disallow:

User-agent: *
Disallow: /

Travis_Bailey

Google+ is a social medium. The largest recorded changes came from Google+ at that time. A few months later, Matt Cutts announced that Google may penalize Google+ for passing PR and gathering too much PR. So the most impressive parts of the KISS Metrics study were correct.

I hate to be a 'Cuttlet' or speak in terms of Page Rank, but there we have it. A social media platform could influence Page Rank at one time. It still may do so to a limited degree.

There's no doubt that a social post can affect organic rankings, via links earned outside of typical social media.

People just have to care, to some degree - for some reason.

Knowing this is why I can afford the fine Chunky Soup for lunch. XD


Posts made by Travis_Bailey
