Importing News and Events via Feeds Extensible Parsers

Feeds Extensible Parsers is the module normally used to import news and event data from the Mercury server as XML or JSON. (This module supersedes an older module called "Feeds XPath Parser", which is no longer supported and should not be used.)

Editor's Note: Some units on campus use this method, but we cannot recommend it: the Feeds Extensible Parsers module has been in beta since April 2015 and is not covered by the Drupal organization's security advisory policy. We strongly recommend using the built-in functionality of the Mercury Reader module whenever possible, and using Feeds Extensible Parsers only as a last resort.

Setting Up Feeds Extensible Parsers

  1. Install CTools and Job Scheduler (dependencies of "Feeds").
  2. Install the Feeds module.
  3. Install the Feeds Extensible Parsers module.
  4. Go to Feed importers on the Structure administrative menu and create a new importer to map Mercury content to your local news/event content types.

Before you start building importers, first set up your content types with fields that match the basic structure of Mercury news/events: for example, fields for a summary, summary sentence, related links, and boilerplate. For Mercury fields that can have multiple values (such as related links or keywords), configure your local node’s field to allow multiple values as well.

For images associated with a Mercury item, you can either map and store the node ID of the image (which then requires using hg_reader functions to build the display of the images) or download the image directly to your site.

Feeds Settings

Basic Settings

If you prefer to create a content type to use as an importer, you can assign one under “Attach to content type”; otherwise choose “Use standalone form.” The standalone form works for most purposes. All standalone importers can be found at yoursite.com/import.

Periodic import - If you want imports to run automatically, choose an interval. In most cases it’s best to import automatically only twice a day and to run a manual import when you want something to show up sooner.

All other settings can be left as is.

Fetcher

  • Fetcher - Select HTTP Fetcher.
  • HTTP Fetcher Settings - Leave as is (don’t check either option)
  • Parser - Select XML XPath parser

Processor

  • Processor - Select the Node processor option. (Skip the XML parser settings for now; we’ll return to them after mapping.)
  • Node Processor Settings - If you plan to add additional fields to your news/event nodes that are not populated from Mercury you’ll want to set it to update existing nodes. If you plan to only include Mercury content in your local nodes you can opt for Replace existing nodes, but most users stick with the Update existing option.

    For Text format, it’s best to select an input format that allows most block-level HTML tags, or at least the same options that you see in the body field of Mercury items. Then select the content type that you want to map Mercury content to, the default author of your local nodes (typically a site administrator), and an Expire setting if desired (usually this is left at Never).

Mapping

  • Node Processor Mapping - Now you’ll want to add the mapping for the fields of your local news/event content type that you plan to populate with Mercury content. For each field select the “xpathparser” option, then a field from the target drop-down, and select the Add button. Note that you’ll need to select a field to serve as a unique target. For this choose the “GUID” option for the target, and you’ll later map this to the item’s Node ID in Mercury.

XML Parser Settings

Now that you’ve identified the fields to be mapped you need to provide an XPath value for each field.

Enter “node” for the Context field.

For each field that you’ve added you’ll want to provide the appropriate value based on the XML tags in Mercury’s feeds. So for instance with your title field you’d enter “title”, for the body field “body,” etc.
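To make the relationship between the Context and the per-field values concrete, here is a minimal PHP sketch using SimpleXML rather than Feeds itself. The sample XML is hypothetical: it assumes items arrive as "node" elements under a wrapper element, and the titles, bodies, and IDs are invented. It only illustrates that field queries such as “title” and “body” are evaluated relative to each context element.

```php
<?php
// Hypothetical sample only: the real Mercury feed has more elements, and
// the wrapper element and all values here are invented for illustration.
$xml = <<<XML
<nodes>
  <node id="1001">
    <title>First story</title>
    <body>Body of the first story.</body>
  </node>
  <node id="1002">
    <title>Second story</title>
    <body>Body of the second story.</body>
  </node>
</nodes>
XML;

$feed = simplexml_load_string($xml);

// The Context query ("node") picks out one element per imported item.
// SimpleXML evaluates it relative to the root element of the document.
$items = array();
foreach ($feed->xpath('node') as $node) {
  // Each field query is evaluated relative to the current context element.
  $title = $node->xpath('title');
  $body = $node->xpath('body');
  $items[] = array(
    'title' => (string) $title[0],
    'body' => (string) $body[0],
  );
}

print_r($items);
```

Each entry in $items corresponds to one node that the importer would create or update, with one value per mapped field.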

For “field_image”, use the “image_full_path” value. This returns the full path to the image file on Mercury, and because it maps to an image field in the local content type, Feeds will pull a copy of the image down to your server and store it with your local node. However, you cannot map image titles or descriptions from Mercury to the additional properties of an image field (such as the alt text or description). To display the full details of a related image, it is better to map the image’s “nid” value in Mercury instead and use hg_reader functions to build the display of the image in your local content type, where you can add the title and description as intended by the original contributor. If you go this route, provide the XPath value as: hg_media/item/nid

The same could be done if you were to set up a field in your local node to store video files uploaded to Mercury. Note that these days most users are uploading YouTube IDs instead of the actual video file, but you would still want to build the display of the YouTube video via its Mercury “nid” value so that you could display the title and description as intended by the original contributor.

For related files that are uploaded with a Mercury item you can map them as actual files and download them to your server (use the “full_path” value), but you will lose the option of including the title provided for the file.

You may instead choose to set up related files as a “link” field type in your local node so that you can include the provided title of the file. The Feeds module allows mapping for both the title and link values of link fields (link fields are provided by the Link module: drupal.org/project/link).

Alternatively, you could set up separate media content types in your local site and map all media from your news and event feeds, but it’s preferred that you leverage the hg_reader functions as much as possible to display Mercury content.

Note the “@id” entered for the “guid” field. This refers to the node ID that’s published in the XML feed as an attribute of the node element in the XML tree (i.e., <node id="...">). It is stored as a unique value for your local news/event node and helps prevent duplicates from being imported to your site. (See W3Schools for more on XPath syntax: http://www.w3schools.com/xpath/)

However, note that it’s only unique to the importer, so if you have two importers set up for two different Mercury feeds and an item is added to both of those feeds it will still be duplicated on your site.
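The per-importer scope of the GUID can be illustrated with a small PHP sketch. This is not how Feeds stores its data; the $seen array, the sketch_import() function, the importer names, and the sample XML are all hypothetical stand-ins. It only shows that “@id” reads the attribute on the context element, and that tracking GUIDs separately per importer lets the same Mercury item be imported twice.

```php
<?php
// Hypothetical feed fragments; the IDs and titles are invented.
$feed_a = '<nodes><node id="1001"><title>Shared story</title></node></nodes>';
$feed_b = '<nodes><node id="1001"><title>Shared story</title></node>'
  . '<node id="1002"><title>Other story</title></node></nodes>';

/**
 * Returns the GUIDs an importer would treat as new, and records them.
 *
 * $seen is a stand-in for the importer's record of imported items. It is
 * keyed by importer name first, so each importer tracks GUIDs separately.
 */
function sketch_import($importer, $xml, array &$seen) {
  $new = array();
  foreach (simplexml_load_string($xml)->xpath('node') as $node) {
    // "@id" relative to the context element reads its id attribute.
    $ids = $node->xpath('@id');
    $guid = (string) $ids[0];
    if (!isset($seen[$importer][$guid])) {
      $seen[$importer][$guid] = TRUE;
      $new[] = $guid;
    }
  }
  return $new;
}

$seen = array();
$first = sketch_import('news_a', $feed_a, $seen);  // 1001 is new.
$again = sketch_import('news_a', $feed_a, $seen);  // Nothing new.
$other = sketch_import('news_b', $feed_b, $seen);  // 1001 is new again here.
```

Running the same importer twice adds nothing the second time, while the second importer treats item 1001 as new, which is the duplication described above.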

You may also want to map the “changed date” (XPath value = changed) to your local nodes to help Feeds identify when an item has been updated in Mercury.

Once your importer is set up, go to /import on your site to view all importers, initiate a pull from Mercury, or delete items previously pulled in from Mercury.

Fields

Each XPath value entered in a mapping is built from a few common pieces:

  • String literals are written in quotes, as in "Text". Best used for static values for a field.
  • @attribute selects an attribute on a tag (such as href, class, id).
  • value/ selects a tag named value; further names after the slash select tags nested inside it (such as hg_media/item/nid).
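As a rough illustration of these pieces, the following PHP sketch evaluates one expression of each kind with DOMXPath. The sample item is hypothetical: the element names follow the ones mentioned in this guide, but the values are invented, and wrapping each expression in string() is only a convenience for reading the result back as text.

```php
<?php
// Hypothetical item; element names follow this guide, values are invented.
$doc = new DOMDocument();
$doc->loadXML('<node id="1001"><title>Sample</title>'
  . '<hg_media><item><nid>2002</nid></item></hg_media></node>');
$xpath = new DOMXPath($doc);
$context = $doc->documentElement;

// A quoted string literal always yields the same static value.
$literal = $xpath->evaluate('string("News")', $context);
// @attribute reads an attribute of the context element.
$id = $xpath->evaluate('string(@id)', $context);
// A bare tag name selects a child element.
$title = $xpath->evaluate('string(title)', $context);
// A slash-separated path walks down through nested elements.
$nid = $xpath->evaluate('string(hg_media/item/nid)', $context);

print "$literal | $id | $title | $nid\n";
```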

Pictures / Images

Full Import

Note: If the website in question is using OIT Web Hosting, you may have to contact OIT Web Hosting support to enable curl_init()

To import photos along with their metadata, follow the directions below:

  1. Install the Mercury Reader (hg_reader) module.
  2. If you open up hg_reader.api.php, you can read all about the Mercury API. The function in question is this:
/**
 * This function fetches files (including images) from Mercury.
 *
 * @param string $type
 *   Either "image" or "file"
 * @param int $id
 *   Either a Mercury image node ID or a Mercury file ID (note that this
 *   corresponds to node/files/item/fid within a node's XML).
 * @param string $option
 *   For images, the name of the Mercury ImageCache preset desired.  For files, this option will be automatically set to "other."
 */
function hg_reader_get_file($type, $id, $option = 'original')

So in your own theme template file, if you insert:

print hg_reader_get_file('image', 232661, '200xX_scale');

You'll get a 200-pixel wide image of the chair of the physics department. Furthermore, hg_reader will cache the image as befits the friendly, fluffy little creature it is.
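If your local nodes store several Mercury image node IDs (for example, values mapped from hg_media/item/nid), a small wrapper can render them all. The sketch below is a hypothetical helper, not part of hg_reader: the name mytheme_mercury_images() is invented, and it assumes hg_reader_get_file() returns printable markup as a string, as the print example above suggests. It returns an empty string when hg_reader is not available.

```php
<?php
/**
 * Hypothetical helper (not part of hg_reader): renders one Mercury image per
 * stored node ID, or returns an empty string when hg_reader is unavailable.
 */
function mytheme_mercury_images(array $nids, $preset = '200xX_scale') {
  if (!function_exists('hg_reader_get_file')) {
    return '';
  }
  $output = '';
  foreach ($nids as $nid) {
    // Assumption: hg_reader_get_file() returns markup as a string.
    $output .= hg_reader_get_file('image', (int) $nid, $preset);
  }
  return $output;
}

print mytheme_mercury_images(array(232661));
```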

XPath Import

Another option is to simply import the image URL by building it with an XPath expression against the XML.

  • concat("http://hg.gatech.edu/",hg_media/item/image_path)

Please note that this implementation may have firewall issues when viewing images from off-campus.
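To see what the concat() expression above produces, here is a minimal PHP sketch using DOMXPath. The sample item and its image path are invented for illustration; only the expression itself comes from this guide.

```php
<?php
$doc = new DOMDocument();
// Hypothetical item; the image path value is invented.
$doc->loadXML('<node id="1001"><hg_media><item>'
  . '<image_path>sites/default/files/sample.jpg</image_path>'
  . '</item></hg_media></node>');
$xpath = new DOMXPath($doc);

// concat() joins the literal host prefix with the text of the first
// matching image_path element, relative to the context element.
$url = $xpath->evaluate(
  'concat("http://hg.gatech.edu/", hg_media/item/image_path)',
  $doc->documentElement
);

print $url . "\n";
// Prints: http://hg.gatech.edu/sites/default/files/sample.jpg
```

Note that concat() joins its arguments verbatim, so if the path in the real feed begins with a slash, the result would contain a double slash after the host name.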

For more information