Sunsetting Drupal Sites

Eventually, the time comes to say goodbye to a Drupal site, but doing so is not always as easy as it may seem.  Below are tips and tricks for shutting down a site gracefully without losing your sanity along the way.

Creating an Archive (The Easy Way)

If your stakeholder(s) are agreeable, the easiest way to shut down a site is to make an archival copy of the filesystem and database, and then take the site offline.  With OIT Web Hosting, you can make both backups easily from your Plesk Control Panel, and then copy the resulting archive files to an internal departmental server.  After that, simply put in a request to OIT to have the web hosting account deleted.

The Realistic Way

Reality doesn't always work out so easily, and all too often your stakeholder(s) will acknowledge that a site isn't going to be updated any longer, but they'll still want a readable copy of the site kept around somewhere.  While you could keep the Drupal site itself running indefinitely, there are major roadblocks to doing this:

  • Any Drupal site that isn't monitored regularly is a vector for hacking and other security nightmares.  If the content isn't being updated, then it's unlikely that anyone in your unit will check the site even monthly to see whether it's still intact.
  • Drupal has to be kept up to date, and you likely won't have the time to do that for sites where content is not being updated regularly.
  • Eventually Drupal has to be upgraded to the next major version.  If you don't have time for updates, you certainly won't have time to do major upgrades.

For all these reasons (and more), it's best to get Drupal out of the equation as soon as a site goes "dormant".  There are two parts to this: preparing the site for the afterlife, then exporting it to a static archive that can replace the site in its current web hosting account.

Preparing the Site for the Afterlife

Nothing automated (that is, anything that depends on Drupal processing a request, such as search or login) will work in a static copy of a website, so you need to find every such feature and remove it from the site before exporting.  This includes:

  • Removing the Search link from the menu bar
  • Removing all exposed filters from Views
  • Removing all visible 'Login' links

If you're thinking you can just do all that to the exported HTML files, think again: even a moderately large site will generate many dozens of HTML files (sometimes multiple files for the same page, when that page is referenced through different aliases).  This makes it nearly impossible to go through those pages by hand and remove the pieces that no longer work.  Save yourself the headache and remove all of the automated components before you do your static export.
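
As a sanity check after a trial export, a quick scan can show whether any automated pieces slipped through.  The sketch below is only an illustration: the function name find_dynamic_leftovers is hypothetical, and the search patterns (form tags, Drupal's /user/login path, and a CAS login path) are assumptions that you should adjust to your own site.  If it lists any files, fix the problem in Drupal and export again rather than editing the HTML by hand.

```shell
#!/bin/sh
# Hypothetical helper: list exported HTML files that still contain markup
# which only works with a live Drupal site behind it.
# The patterns are assumptions; extend them to match your own site.
find_dynamic_leftovers() {
  export_dir="$1"
  # -l prints only the names of matching files, -i ignores case,
  # -E enables the "a|b|c" alternation syntax.
  find "$export_dir" -type f -name '*.html' \
    -exec grep -liE '<form|/user/login|/caslogin' {} +
}
```

For example, "find_dynamic_leftovers ./my-export" prints one path per file that still contains a form or login link, where ./my-export stands in for whatever directory holds your exported files.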

Generating the Static Export

There are two methods currently recommended, depending on your skill set and abilities:

SiteSucker (Runs on MacOS)

SiteSucker has been reported to work well for exporting Drupal 9 sites into static HTML files.  The basic version is available for free (unclear how it is licensed), while the 'pro' version must be purchased.

wget (Runs on MacOS or Linux)

wget is a command line utility for retrieving web pages.  When given the right command parameters, it can act like a web crawler, recursively pulling down an entire website and writing it out to a collection of static HTML files on your local computer.  Here is the formulation that works well with OIT Web Hosting Plesk servers:

/usr/bin/wget -e robots=off -rdkE --restrict-file-names=windows <site-URL-here>

The --restrict-file-names=windows option is needed because OIT's Plesk servers do not handle the question mark ('?') symbol in filenames.  When wget writes files with Windows-compliant filenames, no question marks are used, so the files work under Plesk as well.
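
For readers unfamiliar with wget's short flags, the sketch below spells out the same command using wget's long option names, with a comment on what each one does.  The mirror_site function name and the WGET override variable are illustrative assumptions, not part of OIT's instructions.

```shell
#!/bin/sh
# Long-option spelling of the command above:
#   --execute robots=off           (-e robots=off) do not honor robots.txt
#   --recursive                    (-r) follow links and crawl the whole site
#   --debug                        (-d) print debugging output while crawling
#   --convert-links                (-k) rewrite links so they work in the local copy
#   --adjust-extension             (-E) add .html to saved pages that lack it
#   --restrict-file-names=windows  avoid '?' and similar characters in filenames
mirror_site() {
  site_url="$1"
  # WGET is a hypothetical override; set WGET=echo to preview the command
  # without downloading anything.
  "${WGET:-wget}" \
    --execute robots=off \
    --recursive \
    --debug \
    --convert-links \
    --adjust-extension \
    --restrict-file-names=windows \
    "$site_url"
}
```

Call it as "mirror_site <site-URL-here>".  By default wget writes the crawled pages into a directory named after the site's hostname, inside the directory you run it from.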

Note: MacOS does not come with 'wget', but you can either compile it from source or install it with the Homebrew package manager ('brew install wget').

Drupal-generated Static Export

Drupal sites can use the Tome module (not available for Drupal 7) to generate a static version of the site from within Drupal itself.  Follow the configuration instructions that the Tome project provides for existing websites.

Tome configuration

Add the following setting to your Drupal website's settings.php to exclude CAS-related pages from the export:

$settings['tome_static_path_exclude'] = ['/cas', '/caslogin', '/casproxycallback', '/casservice'];