Questions
-
Index.php canonical/dup issues
Have you checked the backlinks? The only logical reason I can think of for the index.php versions of the URLs to outperform the friendly versions is that more sites have linked to them.

I would make every effort to convince the client to use friendly URLs. Users clearly prefer them, and technologies change. Even if the site uses .php today, in a couple of years it may be a dead technology and they will have to redirect their entire site. It's not a logical business move.

With the above noted, if you wish to redirect all pages except the home page to the index.php form of the URL, it is doable with the proper regular expression. The issues I foresee have already been shared:

- URLs are harder for users to read and are therefore less friendly.
- URLs are longer and therefore more difficult to share naturally (in tweets, for example) without a URL shortening service.
- URLs include "php", so when the site's technology changes the URLs will need to be redirected.
- Users may be confused by the inconsistent URL formats of the home page and the rest of the site.
- Long URLs are cut off.

You mentioned using other languages. If a page's title contains foreign characters, those characters are percent-encoded in the URL, typically from their Unicode (UTF-8) bytes. It is the same mechanism that turns a space into "%20": a single character is replaced by several. With foreign-language URLs the length can often exceed maximums, which is an issue. Keeping index.php adds an extra 9 characters to every URL.

This decision approaches the SEO equivalent of a patient going against their doctor's advice. If it were my client, I would want a very firm acknowledgment that this decision was against my advice and industry best practices.
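To make the length point concrete, here is a minimal Python sketch of how percent-encoding inflates a URL segment. The sample slugs are made up for illustration, and it assumes the titles are encoded as UTF-8, which is what Python's `urllib.parse.quote` does by default.

```python
from urllib.parse import quote


def encoded_length(slug):
    """Return (raw character count, percent-encoded character count) for one URL path segment."""
    # safe="" so that every reserved character is encoded as well
    return len(slug), len(quote(slug, safe=""))


# Hypothetical page titles used as URL slugs (assumption, not from the original question).
for slug in ["hello-world", "café", "東京"]:
    raw, encoded = encoded_length(slug)
    print(f"{slug!r}: {raw} characters -> {encoded} characters as {quote(slug, safe='')}")
```

Each non-ASCII character becomes three encoded characters per UTF-8 byte, so a short foreign-language title can triple or more in length before index.php is added on top.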
Intermediate & Advanced SEO | | RyanKent0 -
Title tag solution for a med sized site
How about having your developers script something that scrapes the h1, h2, and h3 of all 18,000 articles and stores them in a database? Finding dupes would then be a piece of cake, even for a less experienced developer. You could easily export all your duplicates to CSV and then manually rename them based on their content.

Dev time: about 1 day max. (I've developed a lot of software myself, and IMHO a good developer should get this up and running within 4 hours.)

If you don't have too many duplicate tags, correcting the ones in question shouldn't take too long either. Once you have done your chores, you could reimport the corrected title tags into the database. In the meantime, your developer could write a script that sets each page's title tag according to the title tag stored in your database.

Hope that helps. If you have further questions on this, just go ahead.

I had a similar problem with 25k+ pages for a major health insurer, and we figured out that the best way to prevent problems was to do most of the work manually rather than with a script. That helped us a lot to stay within the budget and the given timeframe.
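The duplicate-finding step could look roughly like the Python sketch below. It is only an illustration under assumptions: the scraped data is represented as a list of (url, title) pairs rather than a real database, and the function names and sample pages are hypothetical.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_titles(pages):
    """Group page URLs by normalized title and keep only titles used by more than one URL."""
    by_title = defaultdict(list)
    for url, title in pages:
        key = " ".join(title.lower().split())  # ignore case and extra whitespace
        by_title[key].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


def duplicates_to_csv(duplicates):
    """Serialize the duplicates as CSV text, one row per (title, url) pair."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["title", "url"])
    for title, urls in sorted(duplicates.items()):
        for url in urls:
            writer.writerow([title, url])
    return buffer.getvalue()


# Hypothetical sample data standing in for the scraped titles.
pages = [
    ("/a", "Health Plans"),
    ("/b", "health  plans"),
    ("/c", "Contact Us"),
]
dupes = find_duplicate_titles(pages)
print(duplicates_to_csv(dupes))
```

The CSV output is what you would hand to whoever rewrites the titles manually; the corrected file can then be reimported.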
Intermediate & Advanced SEO | | akaigotchi0