Yoast SEO is a very capable tool. It makes much of the SEO process accessible to WordPress administrators, but I discovered just the other day that it is a little lacking when it comes to blocking search engines from pages.
Take the following example:
You offer a product or service for a fee, and you want to manage links to user manuals, technical support information, and so on via WordPress, but you do not want the outside world accessing those resources. You take care not to have any URLs on the site that link to the pages, they are not included in navigation items, and the URLs are obfuscated. To an end user who does not already know the URLs, these pages do not exist and are inaccessible. Now you want to block Search Engines as well.
Now, granted, in an ideal situation these resources would be behind a paywall of some kind, but you also do not want to manage user authentication credentials, so your primary concern is keeping these pages off Search Engines. Yes, “bad” search engines will do whatever they want, but let’s only consider the good guys for now.
If you are using Yoast SEO, and you are using the XML Sitemap feature, you will find that these pages will be indexed fairly quickly, and your research will tell you to place the relevant Resource IDs (Post or Page) in the exclusion list to remove them from the XML Sitemap.
What you may not find out is that excluding these pages does not actually tell the Search Engine that they should not be indexed. It just doesn’t send the Search Engine to them via the Sitemap.
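To make the distinction concrete, here is a minimal Python sketch that checks whether a URL is listed in a sitemap. The sitemap contents and the URLs are hypothetical, made up for illustration; on a real site you would fetch the relevant sitemap file first. Note what the check can and cannot tell you: a URL missing from the sitemap is simply not advertised there, which says nothing about whether a crawler may index it.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap contents, for illustration only.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing/</loc></url>
</urlset>
""".strip()


def urls_in_sitemap(sitemap_xml):
    """Return the set of <loc> URLs found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def is_listed(sitemap_xml, url):
    """True if the URL is advertised in the sitemap.

    False only means "not advertised here"; it does NOT mean "noindex".
    """
    return url in urls_in_sitemap(sitemap_xml)


if __name__ == "__main__":
    # Hypothetical obfuscated manual page that was excluded from the sitemap.
    hidden = "https://example.com/manuals-x7f3/"
    print(is_listed(SAMPLE_SITEMAP, hidden))
```

A page that passes this check can still end up in search results if a crawler reaches it some other way, for example through a link shared elsewhere.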
No, to indicate noindex you must go to the pages individually and set their robots flags (noindex, nofollow) accordingly. You can do this by opening the Post/Page in question and going to the Advanced tab of the Yoast settings for that resource. To make matters slightly more confusing, there are three different selection methods for the various robots settings.
Why am I posting about this? Well, this had us confused and scratching our heads for some time.
As I said, Yoast is great, but it is a bit circuitous that you remove resources from the XML sitemap in one location and block them in another. If the developers of Yoast are reading this: it would be much more intuitive to have all of the SEO tools specific to a resource on that resource itself, and perhaps a list of resources and their assigned attributes inside the Yoast Plugin Settings, with links to the resources to facilitate editing of these settings. I guess, as a developer, I could contribute to the cause, but that is beside the point.
To summarize: if you want to block a resource from Search Engines while using Yoast SEO, be sure to add it to the Excluded Posts list, as well as set its noindex property (and any other properties you wish, such as noarchive) on the resource itself.
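If you want to confirm that the per-resource setting actually took effect, one option is to look at the HTML the page serves. The Python sketch below assumes the flags are emitted as a standard robots meta tag in the page head; the sample page is hypothetical, and the helper names are my own, not part of Yoast or WordPress.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives declared in the page's meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page source, for illustration only.
SAMPLE_PAGE = """
<html><head>
  <title>User manual</title>
  <meta name="robots" content="noindex, nofollow, noarchive" />
</head><body>...</body></html>
"""

if __name__ == "__main__":
    found = robots_directives(SAMPLE_PAGE)
    print("noindex" in found)  # True
```

If noindex is missing from the result for a page you meant to hide, the setting on that resource has not been applied, regardless of what the sitemap exclusion list says.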