Posts made by Tom3_15
-
RE: 404's being re-indexed
Just to follow up - I have now actually 410'd the pages and the 410's are still being re-indexed.
-
RE: 404's being re-indexed
I'll check this one out as well, thanks! I used a header response extension called Web Developer, which reveals the presence of X-Robots-Tag headers.
-
RE: 404's being re-indexed
Thank you for the quick response,
The pages are truly removed, however, because there were so many of these types of pages that leaked into the index, I added a redirect to keep users on our site - no intentions of being "shady", I just didn't want hundreds of 404's getting clicked and causing a very high bounce rate.
For the x-robots header, could you offer some insight into why my directive isn't working? I believe it's a regex issue on the wp-content. I have tried to troubleshoot to no avail.
<FilesMatch "(wp-content)">
Header set X-Robots-Tag: "noindex, nofollow"
</FilesMatch>
I appreciate the help!
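One possible reason the directive never fires, offered as an assumption rather than a confirmed diagnosis: Apache's documentation describes `<Files>` and `<FilesMatch>` as being tested against the file name (the last path component), not the full URL path, so a regex for a directory name such as wp-content would have nothing to match. The Python sketch below only imitates that difference; the function names and the sample path are hypothetical and it is not Apache's own matching code.

```python
import posixpath
import re

# The regex from the directive above.
PATTERN = re.compile(r"(wp-content)")


def filesmatch_would_apply(request_path):
    """Imitate a <FilesMatch> test: only the last path component is checked."""
    filename = posixpath.basename(request_path)
    return PATTERN.search(filename) is not None


def full_path_would_apply(request_path):
    """Imitate a test against the whole URL path instead."""
    return PATTERN.search(request_path) is not None


# Hypothetical sample path, for illustration only.
sample = "/wp-content/uploads/some-plugin/index.php"
print(filesmatch_would_apply(sample), full_path_would_apply(sample))
```

If that assumption holds, an alternative worth testing would be to set the header from an .htaccess file placed inside the wp-content directory itself, without any `<FilesMatch>` wrapper.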
-
RE: 404's being re-indexed
Thank you! I am in the process of doing so; however, with a 410 I cannot leave my JS redirect in place after the page loads, which creates some UX issues. Do you have any suggestions to remedy this?
Additionally, after the 410, the non-X-Robots noindex is now being stripped, so the URL only resolves to a 410 with no noindex or redirect. I am still working on a noindex header; as the 410 is server-side, I assume this would be the only way, correct?
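To illustrate the idea of a 410 that still carries a noindex header, here is a minimal Python sketch using a throwaway local server. The handler, the helper names, and the sample path are all hypothetical; it only shows that a status code and an X-Robots-Tag header can travel in the same response, not how any particular host is configured. On Apache, the mod_headers documentation describes `Header always set` as the form that also applies to non-2xx responses, which may be relevant here.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class GoneHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with 410 plus a noindex header."""

    def do_GET(self):
        self.send_response(410)
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_status_and_robots(host, port, path):
    """Return (status code, X-Robots-Tag value or None) for one GET request."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("X-Robots-Tag")
    finally:
        conn.close()


def demo():
    """Start the local server, request one removed page, and report what came back."""
    server = HTTPServer(("127.0.0.1", 0), GoneHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch_status_and_robots("127.0.0.1", port, "/wp-content/example/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The same `fetch_status_and_robots` helper could be pointed at a real host to check what a removed URL actually returns.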
-
RE: 404's being re-indexed
Yes, all pages have a noindex. I have also tried to noindex them using htaccess, as an extra layer of security, but my directive seems to be incorrect. I believe it is an issue with the regex; I am attempting to match anything containing wp-content.
<FilesMatch "(wp-content)">
Header set X-Robots-Tag: "noindex, nofollow"
</FilesMatch>
-
404's being re-indexed
Hi All,
We are experiencing issues with pages that have been 404'd still being indexed. Originally, these were /wp-content/ index pages that had been included in Google's index. Once I realized this, I added a directive to our htaccess to 404 all of these pages, as there were hundreds. I tried to let Google crawl and remove these pages naturally, but after a few months I used the URL removal tool to remove them manually.
However, Google seems to be continually re-indexing these pages, even after they have been manually submitted for removal in Search Console. Do you have suggestions? They all return a 404.
Thanks
-
International SEO And Duplicate Content Within The Same Language
Hello,
Currently, we have a .com English website serving an international clientele. As such, we do not currently target any countries in Google Search Console. However, the UK is an important market for us and we are seeing very low traffic from it (our traffic is almost entirely US). We would like to increase visibility in the UK, but currently for English speakers only. My question is this - would geo-targeting a subfolder have a positive impact on visibility/rankings or would it create a duplicate content issue if both pieces of content are in English? My plan was:
1. Create a geo-targeted subfolder (website.com/uk/) that copies our website (we currently cannot create new unique content)
2. Go into GSC and geo-target the folder to the UK
3. Add the following to the /uk/ page to try to negate duplicate issues. Additionally, I can add a rel=canonical tag if suggested, I just worry as an already international site this will create competition between pages
However, as we are currently only targeting a location and not the language at this very specific point, would adding a ccTLD be advised instead? The threat of duplicate content worries me less here as this is a topic Matt Cutts has addressed and said is not an issue.
I prefer the subfolder method over ccTLDs because it allows for more scalability; in the future I would like to target other countries and languages.
Ultimately right now, the goal is to increase UK traffic. Outside of UK backlinks, would any of the above URL geo-targeting help drive traffic?
Thanks
-
If we should add a .eu or remain .com solely
Hello,
Our company is international and we are looking to gain more traffic specifically from Europe. While I am aware that translating content into local languages, targeting local keywords, and gaining more European links will improve rankings, I am curious if it is worthwhile to have a company.eu domain in addition to our company.com domain.
Assuming the website's content and domain will be exactly the same, with the TLD (.eu vs .com) being the only change - will this benefit us or will it hurt us by creating duplicate content - even if we create a separate GSC property for it with localized targeting and hreflang tags? Also - if we have multiple languages on our .eu website, can different paths have differing hreflangs?
For example: company.eu/blog/german-content with a German hreflang and company.eu/blog/Italian-content with an Italian hreflang.
I should note - we do not currently have an hreflang attribute set on our website as content has always been correctly served to US-based English speaking users - we do have the United States targeted in Google Search Console though.
It would be ideal to target countries by subfolder instead, if it is just as useful. Otherwise, we would essentially be maintaining two sites.
Thanks!
-
RE: Dynamically Inserting Noindex With Javascript
It seemed to work. Hopefully the noindex is respected, thank you!
-
RE: Dynamically Inserting Noindex With Javascript
It looks like it is active. Thanks, John! Can you no-index an entire directory in GSC? I thought it was only per URL.
-
Dynamically Inserting Noindex With Javascript
Hello,
I have a broken plugin creating hundreds of WP-Content directory pages that are being indexed by Google. I cannot access the source code of these pages to add a noindex to them. The page URLs all have the plugin name within them. In order to resolve the issue, I wrote a solution in JavaScript to dynamically add a noindex tag to any URL containing the plugin name. Would this noindex be respected by Google, and is there a way to immediately check that it is respected?
Currently, I cannot delete the plugin due to issues with its PHP.
If you would like to view the code: https://codepen.io/trodrick/pen/Gwwaej?editors=0010
Thanks!
-
RE: Google is indexing bad URLS
I do agree, I may have to pass this off to someone with more backend experience than myself. In terms of plugins, are you aware of any that will allow you to add noindex tags to an entire folder?
Thanks!
-
RE: Google is indexing bad URLS
Thank you for all your help. I added a directive to 410 the pages in my htaccess like so: Redirect 410 /revslider*/. However, it does not seem to work.
Currently, I am using Options All -Indexes to 404 the URLs. I remain worried, though: even if Google would not revisit a 410, could it still initially index it? This seems to be the case with my 404 pages - Google is actively indexing the new 404 pages that the broken plugin is producing.
As I cannot seem to locate the directory in cPanel, adding a noindex to them has been tough. I will look for a plugin that can dynamically add it based on folder structure because the URLs are still actively being created.
The ongoing creation of the URLs is the ultimate source of the issue. I expected that deleting the plugin would have resolved it, but that does not seem to be the case.
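A possible explanation for why `Redirect 410 /revslider*/` has no effect, stated as an assumption: mod_alias documents `Redirect` as matching a literal URL-path prefix, with no wildcard support, while `RedirectMatch` takes a regular expression. The Python sketch below imitates the two matching styles against the URL structure from the original post; the function names are hypothetical and the prefix check is a simplification of what Apache does.

```python
import re

# URL structure quoted in the original post.
REQUEST_PATH = "/wp-content/uploads/revslider/templates/this-part-changes/"


def redirect_prefix_matches(prefix, path):
    """Imitate `Redirect`: a literal path prefix, where '*' has no special meaning."""
    return path.startswith(prefix)


def redirectmatch_matches(pattern, path):
    """Imitate `RedirectMatch`: a regular expression searched against the URL path."""
    return re.search(pattern, path) is not None


# The attempted directive: wrong starting point, and '*' is taken literally.
print(redirect_prefix_matches("/revslider*/", REQUEST_PATH))

# A regex anchored on the real directory path.
print(redirectmatch_matches(r"^/wp-content/uploads/revslider/", REQUEST_PATH))
```

Under that assumption, a directive along the lines of `RedirectMatch 410 ^/wp-content/uploads/revslider/` might be closer to the intent, though it is untested here.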
-
RE: Google is indexing bad URLS
Thank you for your response! I will certainly use the regex in my robots.txt and try to change my htaccess directive to 410 the pages.
However, the issue is that a defunct plugin is randomly creating hundreds of these URLs without my knowledge, and I cannot seem to access them. As this is the case, I can't add a noindex tag to them.
This is why I manually de-indexed each page using the GSC removal tool and then blocked them in my robots.txt. My hope was that after doing so, Google would no longer be able to find the bad URLs.
Despite this, Google is still actively crawling and indexing new URLs following this path, even though they are blocked by my robots.txt (validated). I am unsure how these URLs even continue to be created, as I deleted the plugin.
I had the idea to write a program in JavaScript that would take the status code and insert a noindex tag if the response returned a 404, but I don't believe this would even be recognized by Google, as it would be inserted dynamically. Ultimately, I would like to find a way to get the plugin to stop creating these URLs, so that I can simply de-index them manually again.
Thanks,
-
Google is indexing bad URLS
Hi All,
The site I am working on is built on WordPress. The plugin Revolution Slider was installed. Although it was no longer in use, it remained on the site for some time. This plugin began creating hundreds of URLs containing nothing but code on the page. I noticed these URLs were being indexed by Google. The URLs follow the structure: www.mysite.com/wp-content/uploads/revslider/templates/this-part-changes/
I have done the following to prevent these URLs from being created & indexed:
1. Added a directive in my htaccess to 404 all of these URLs
2. Blocked /wp-content/uploads/revslider/ in my robots.txt
3. Manually de-indexed each URL using the GSC tool
4. Deleted the plugin
However, new URLs still appear in Google's index, despite being blocked by robots.txt and resolving to a 404. Can anyone suggest any next steps?
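One interaction between steps 1 and 2 may be worth checking. A crawler that obeys a robots.txt Disallow does not fetch the blocked URL, so it never sees the 404 status or any noindex on it, and the URL can stay indexed based on links alone. The Python sketch below uses the standard library's `urllib.robotparser` to confirm that a rule like the one in step 2 blocks fetching of the revslider paths. The robots.txt text and the `can_crawl` helper are assumptions for illustration, not the site's actual file.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content, modeled on step 2 above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-content/uploads/revslider/
"""


def can_crawl(path, user_agent="Googlebot"):
    """Return True if the assumed robots.txt lets this user agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, "https://www.mysite.com" + path)


# The blocked directory is never fetched, so its 404 is never observed.
print(can_crawl("/wp-content/uploads/revslider/templates/this-part-changes/"))

# Other paths remain crawlable.
print(can_crawl("/about/"))
```

If this is what is happening, temporarily lifting the robots.txt block so the 404 or 410 can be crawled is one option to evaluate.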
Thanks!
-
Question Regarding Website Architecture
Hello All,
Our website currently has a general solutions subdirectory, which then links to each specific solution, following the path /solutions/ => /solutions/solution1/. As our solutions can be quite complex, we are adding another subdirectory to target individuals by profession. I would like to link from our profession pages to the varying solutions that help.
As both subdirectories will be top-level pages in the main menu, would linking from our professions to solutions be poor architecture? In this case the path would look like: /professions/ => /professions/profession1/ => /solutions/solution1/.
Thanks!
-
RE: How to allow bots to crawl all but WP-content
Thank you for the help, Gaston!
-
RE: How to allow bots to crawl all but WP-content
Can I do so with:
Allow: *.jpg
Allow: *.png
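To reason about how Allow rules like these might interact with a Disallow on the uploads folder, here is a small Python sketch of wildcard rule matching. It assumes Google-style semantics (`*` matches any run of characters, a trailing `$` anchors the end, the longest matching rule wins, and Allow wins ties); the rule list, paths, and function names are hypothetical, and the patterns are written with a leading slash rather than in the bare `*.jpg` form above.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path, rules):
    """Apply (allow, pattern) rules: longest matching pattern wins, Allow wins ties."""
    best = None  # (pattern length, allow flag)
    for allow, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), allow)
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# Hypothetical rule set: block uploads, but keep images crawlable.
RULES = [
    (False, "/wp-content/uploads/"),
    (True, "/wp-content/uploads/*.jpg$"),
    (True, "/wp-content/uploads/*.png$"),
]

print(is_allowed("/wp-content/uploads/2019/01/photo.jpg", RULES))
print(is_allowed("/wp-content/uploads/plugin-output/notes.txt", RULES))
```

Real crawlers may differ in the details, so the result is best confirmed with a robots.txt testing tool.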
-
RE: How to allow bots to crawl all but WP-content
Thanks, Gaston. I should have been more clear about what I am looking to do. I currently am having an indexation issue. Somehow, pages are being automatically generated by WordPress.
These pages are often .txt files of information or code from plugins, all beginning with /wp-content/uploads/ in their URL. I have been manually removing them from the index and would like to now have them be uncrawlable.
Best
-
RE: How to allow bots to crawl all but WP-content
Gaston,
Thanks for the fast reply! My images folder does follow that format, which is what worries me, as we are blocking the wp-content folder.
Thanks!