Block your test site from searchbots

Letting searchbots crawl through your site is how you get your content to show up in search engine results. However, there are times when that can work against you. If your Dev or Staging site has a lot of test content on it, the searchbots will lump that in with your important content (and you probably don't want visitors to see your pages of Lorem Ipsum text). If you've synced your database from the production site to your staging site, the searchbots may flag that content as duplicate, which can hurt your SEO.

The best thing to do with your Dev and Staging sites is to block them from searchbots altogether. It only takes a few lines in your .htaccess file and a small robots.txt file.

Custom robots.txt file

In reality, this file can be named almost anything, but we are going to go with robots_noindex.txt (we can't use robots.txt, as that is the main file, which should still allow crawling). Create this file in your document root (i.e. the same directory where Drupal places its default robots.txt file) and add the following lines to it:

User-agent: *

# Directories
Disallow: /

Basically, this blocks searchbots from everything on your site (which is the goal here).
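
If you only wanted to keep searchbots out of particular test areas rather than the whole site, you could list individual paths instead of blocking the root. The paths below are purely hypothetical examples to show the syntax; substitute your own test directories:

User-agent: *

# Directories (hypothetical examples -- adjust to your own test paths)
Disallow: /lorem-ipsum/
Disallow: /test-content/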

Modify the .htaccess file

We will use a rewrite in the .htaccess file to serve the robots_noindex.txt file instead of the default robots.txt file when we are on the dev or staging sites. Open your .htaccess file (found in the Drupal root directory) and add the following lines within the rewrite section (after the RewriteEngine on line):

 # We don't want search bots indexing the dev and staging sites, so we'll
 # use a different robots.txt file (robots_noindex.txt) which will block
 # them on those domains.  The RewriteCond checks the hostname: if the
 # request is coming in on a dev or staging subdomain, we serve our
 # alternate robots.txt file instead of the default one.
 RewriteCond %{HTTP_HOST} ^(dev|staging)\.(.+)$ [NC]
 RewriteRule ^robots\.txt$ robots_noindex.txt [L]

This checks whether the hostname starts with dev or staging. If it does, requests for robots.txt are rewritten to robots_noindex.txt, so the custom file is served in those environments. Note that if you are using something other than dev or staging for your subdomains, you will need to edit the pattern accordingly.
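
Since the comment above mentions Acquia's AH_SITE_ENVIRONMENT variable: if your site runs on Acquia, you could key the condition off that variable instead of the hostname, so the rule works no matter what your dev and staging domains are called. This is only a sketch, and it assumes the variable is exposed to mod_rewrite in your environment:

 # Serve the blocking robots file whenever we are NOT on the prod environment.
 # (Assumes Acquia's AH_SITE_ENVIRONMENT variable is available to mod_rewrite.)
 RewriteCond %{ENV:AH_SITE_ENVIRONMENT} !^prod$
 RewriteRule ^robots\.txt$ robots_noindex.txt [L]

Either way, you can confirm the rewrite is working by requesting /robots.txt on your dev or staging domain and checking that the Disallow: / line comes back, while the production site still serves Drupal's default robots.txt.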