PHP

Exporting Drupal Content to Microsoft Word

I was recently given an interesting task at work. I was asked to export all blog posts for a given author from a Drupal site into a Microsoft Word document. At first, I wasn't sure how I was going to accomplish this, so I turned to Google and found a few PHP classes that purported to do exactly what I needed. However, a few false starts later, I was unable to get any of them to work. That's when I came across LiveDocX. LiveDocX is a template-based SaaS solution that allows developers to create documents from data across disparate data sources.
It allows developers to create word processing documents by combining user-defined Microsoft Word templates with data from disparate data sources, such as XML files and databases. It is typically used to create professional, print-ready word processing documents in DOCX, DOC, RTF and PDF. LiveDocx is a Web Service that can be easily integrated into any web application without installing or configuring any software on your server. Currently, the following programming languages are supported:
* ASP.NET
* PHP
As LiveDocx is strictly based on open standards, it is simple to add support for more programming languages. As long as SOAP (Simple Object Access Protocol) is available on the client-side system, LiveDocx runs on all operating systems and in all programming languages.
This looked to be the best solution for what I was attempting to do, and best of all, it was free. All I had to do was sign up for an account, and then I was free to begin coding my solution. I knew that I wanted the solution to be dynamic; I didn't want to hard-code the author into my code. Instead, I wanted to be able to export any author's blog posts. So, step 1 was to create a form that would allow site administrators to find the author they are looking for. The form consists of a textfield with autocomplete functionality, and a select box with export options. For the purpose of this example, the only option is to export to MS Word (doc). However, LiveDocX also supports docx, rtf, and pdf.
<?php

function MYMODULE_blog_export_form() {
  $form = array();
  $form['export'] = array(
    '#type' => 'fieldset',
    '#title' => t('Blog Export Options'),
    '#collapsed' => false,
    '#collapsible' => false,
  );
  $form['export']['method'] = array(
    '#type' => 'select',
    '#title' => t('Export output type'),
    '#options' => array(
      'msword' => t('Microsoft Word'),
    ),
  );
  $form['export']['author'] = array(
    '#type' => 'textfield',
    '#title' => t('Author name'),
    '#autocomplete_path' => 'admin/autocomplete/bloggers',
    '#description' => t('Enter the name of the user.'),
  );
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Submit'),
  );
  return $form;
}

?>
You'll notice that the textfield has an #autocomplete_path property, which is the menu path that Drupal requests to fetch suggestions as the user types. The callback behind that path is defined like this:
<?php

/**
 * Menu callback function to provide autocomplete functionality for
 * searching for users by username
 *
 * @param string $search_string
 */
function MYMODULE_autocomplete_bloggers($search_string) {
  // Look up the role IDs for the "blogger" role once per request.
  static $blogger_roles = array();
  if (empty($blogger_roles)) {
    $result = db_query("SELECT r.rid FROM {role} r WHERE
      r.name = '%s'", 'blogger');
    while ($role = db_fetch_object($result)) {
      $blogger_roles[] = (int) $role->rid;
    }
  }

  $matches = array();
  // Only query for users if the role exists; an empty IN () list
  // would be a SQL error.
  if (!empty($blogger_roles)) {
    $result = db_query("SELECT u.uid, u.name FROM {users} u
      LEFT JOIN {users_roles} ur ON ur.uid = u.uid
      LEFT JOIN {role} r ON r.rid = ur.rid
      WHERE u.name LIKE '%s%%' AND
      r.rid IN (" . join(',', $blogger_roles) . ")
      LIMIT 50", $search_string);
    while ($row = db_fetch_object($result)) {
      $matches[$row->name] = $row->name;
    }
  }

  // Return the matches as JSON and stop page rendering.
  print drupal_to_js($matches);
  exit();
}
?>
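For the #autocomplete_path to resolve, the path has to be registered with Drupal's menu system. A minimal sketch of what that registration might look like in Drupal 6 follows. The permission string, the menu titles, and the assumption that the export form lives at admin/content/blogs/export are illustrative guesses, not taken from the original module; the constant fallback only exists so the snippet runs outside Drupal.

```php
<?php
// Stand-in so the sketch runs outside Drupal; inside Drupal the menu
// system already defines the real constant. The value here is a placeholder.
if (!defined('MENU_CALLBACK')) {
  define('MENU_CALLBACK', 4);
}

/**
 * Implementation of hook_menu() (Drupal 6 style).
 */
function MYMODULE_menu() {
  $items = array();

  // Autocomplete endpoint referenced by the form's #autocomplete_path.
  // Drupal passes the trailing path component (the typed text) to the
  // callback as $search_string.
  $items['admin/autocomplete/bloggers'] = array(
    'title' => 'Blogger autocomplete',
    'page callback' => 'MYMODULE_autocomplete_bloggers',
    // Assumed permission; use whatever fits your site.
    'access arguments' => array('access administration pages'),
    'type' => MENU_CALLBACK,
  );

  // Assumed location of the export form itself.
  $items['admin/content/blogs/export'] = array(
    'title' => 'Blog export',
    'page callback' => 'drupal_get_form',
    'page arguments' => array('MYMODULE_blog_export_form'),
    'access arguments' => array('access administration pages'),
  );

  return $items;
}
```

Remember to clear the menu cache after adding or changing menu items, otherwise Drupal will not pick up the new paths.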
Now that the autocomplete functionality was hooked up, it was time to define the form's submit handler. The submit handler makes use of Drupal's Batch API to define a batch process and execute it.
<?php

function MYMODULE_blog_export_form_submit($form, $form_state) {
  $uid = db_result(db_query("SELECT uid FROM {users} WHERE name = '%s'", $form_state['values']['author']));

  if ($uid) {
    $start_func = 'MYMODULE_blog_export_' . $form_state['values']['method'];
    $finished_func = 'MYMODULE_blog_export_' . $form_state['values']['method'] . '_batch_process_finished';

    // Add a batch set with simple operations taking an argument.
    $batch = array(
      'title' => t('Blog Export'), // Not displayed.
      'operations' => array(
        array($start_func, array($uid)),
      ),
      'finished' => $finished_func,
    );
    batch_set($batch);
    batch_process('admin/content/blogs/export');
  }
  else {
    drupal_set_message('An error occurred while trying to process this action.');
  }
}

?>
The above code defines a batch process with a start function, and an end function. For clarity, the start function is responsible for finding all of the nodes for the specified author. The end function is responsible for sending those results to LiveDocX.
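One detail of the submit handler worth isolating is that both callback names are derived from the submitted method value. If more export types are added later, it may be safer to check the method against a known list before turning it into function names. The helper below is hypothetical (it is not part of the module) and only sketches that idea in plain PHP:

```php
<?php
/**
 * Build the batch callback names for an export method.
 *
 * Hypothetical helper: returns FALSE for any method that is not in the
 * list of allowed export types.
 */
function MYMODULE_blog_export_callbacks($method, array $allowed = array('msword')) {
  if (!in_array($method, $allowed, TRUE)) {
    return FALSE;
  }
  return array(
    'operation' => 'MYMODULE_blog_export_' . $method,
    'finished' => 'MYMODULE_blog_export_' . $method . '_batch_process_finished',
  );
}

// The only method offered by the form in this example.
$callbacks = MYMODULE_blog_export_callbacks('msword');
```

The returned names line up with the functions used in this post, so adding a new export type would mean adding it to the allowed list and defining the two matching functions.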
<?php

function MYMODULE_blog_export_msword($uid, &$context) {
  // Number of nodes to process per batch pass.
  $limit = 5;
  $context['finished'] = 0;

  // First pass: initialize the sandbox.
  if (!isset($context['sandbox']['progress'])) {
    $max_nodes = db_result(db_query("SELECT COUNT(n.nid) FROM {node} n WHERE n.uid = %d AND n.type = 'blog'", $uid));
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['current_node'] = 0;
    $context['sandbox']['max'] = $max_nodes;
    $context['sandbox']['results']['author'] = user_load(array('uid' => $uid));
    $context['sandbox']['results']['block_values'] = array();
  }

  // Fetch the next chunk of blog nodes after the last one processed.
  $nodes = array();
  $result = db_query_range("SELECT n.nid, n.type FROM {node} n WHERE n.nid > %d AND n.uid = %d AND n.type = 'blog' ORDER BY nid ASC", $context['sandbox']['current_node'], $uid, 0, $limit);
  while ($row = db_fetch_object($result)) {
    $nodes[$row->nid] = $row;
  }

  $context['message'] = t('Processing blog posts authored by user %uid', array('%uid' => $uid));

  if (count($nodes) == 0) {
    // Nothing left: pass the collected data to the finished callback through
    // the cache (Drupal 6 argument order is cid, data, table) and stop.
    cache_set('famed:blog_export_results', serialize($context['sandbox']['results']), 'cache');
    $context['finished'] = 1;
    return;
  }

  foreach ($nodes as $row) {
    // Process the node
    $node = node_load($row->nid);
    if ($node) {
      $content = node_view($node, FALSE, TRUE, FALSE);
      if ($content) {
        $context['sandbox']['results']['block_values'][] = array(
          'post_title' => $node->title,
          'content' => strip_tags($node->body),
          // In date(), "i" is minutes ("m" would repeat the month).
          'created' => date('Y-m-d H:i:s', $node->created),
          'updated' => date('Y-m-d H:i:s', $node->changed),
          'pub_status' => ($node->status == 1) ? 'Published' : 'Unpublished',
          'tags' => $node->nodewords['keywords'],
        );
      }
    }

    // Update our progress information. Use the row's nid so the cursor
    // still advances if node_load() fails.
    $context['results'][] = t('Processed node %node', array('%node' => $row->nid));
    $context['sandbox']['progress']++;
    $context['sandbox']['current_node'] = $row->nid;
  }

  // Inform the batch engine that we are not finished,
  // and provide an estimation of the completion level we reached.
  if ($context['sandbox']['progress'] != $context['sandbox']['max']) {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
}

?>
This code fetches all of the nodes from the database, and stores information about each node in the $context['sandbox']['results']['block_values'] array. When there are no more nodes to process, this array gets serialized and stored in Drupal's cache, so that the data can be used in the batch finished function. The finished function is responsible for sending all of the data to the LiveDocX web service. It's important that you create your document template before trying to send data to the web service, as LiveDocX works like a mail merge. Your template will consist of named MailMerge fields, and merge blocks for repeating data.
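The progress bookkeeping in the batch operation is easier to follow without the database in the way. Here is a simplified, plain-PHP sketch of the same pattern: remember the last ID seen in the sandbox, process a small chunk of larger IDs on each pass, and report a completion ratio until everything is done. The function name and the sample IDs are made up for illustration, and unlike the real operation this version declares itself finished as soon as the count is reached instead of waiting for an empty pass.

```php
<?php
/**
 * Simplified stand-in for a batch operation: processes up to $limit IDs
 * greater than the last one seen and reports progress through $context.
 */
function example_batch_operation(array $all_ids, $limit, array &$context) {
  // First pass: initialize the sandbox.
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['current_node'] = 0;
    $context['sandbox']['max'] = count($all_ids);
  }

  // Equivalent of "WHERE nid > current_node ORDER BY nid ASC" with a limit.
  $pending = array();
  foreach ($all_ids as $nid) {
    if ($nid > $context['sandbox']['current_node']) {
      $pending[] = $nid;
    }
  }
  sort($pending);
  $chunk = array_slice($pending, 0, $limit);

  foreach ($chunk as $nid) {
    $context['results'][] = $nid;
    $context['sandbox']['progress']++;
    $context['sandbox']['current_node'] = $nid;
  }

  // A value of 1 means done; anything lower asks to be called again.
  if ($context['sandbox']['max'] == 0 || $context['sandbox']['progress'] >= $context['sandbox']['max']) {
    $context['finished'] = 1;
  }
  else {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
}

// Drive the operation the way a batch engine would, one pass at a time.
$context = array('sandbox' => array(), 'results' => array(), 'finished' => 0);
$passes = 0;
do {
  example_batch_operation(array(3, 7, 8, 12, 15, 21, 30), 5, $context);
  $passes++;
} while ($context['finished'] < 1);
```

With seven IDs and a limit of five, the loop needs two passes: the first handles five IDs and reports 5/7, the second handles the remaining two and reports 1.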
<?php

/**
 * Batch finished handler.
 */
function MYMODULE_blog_export_msword_batch_process_finished($success, $results, $operations) {

  // Load the data from Drupal's cache
  $cache = cache_get('famed:blog_export_results', 'cache');

  // Unserialize the cache data (guarding against a cache miss)
  $blog_data = ($cache && !empty($cache->data)) ? unserialize($cache->data) : FALSE;
  cache_clear_all('famed:blog_export_results', 'cache');

  if ($blog_data) {
    // Turn up error reporting
    error_reporting(E_ALL | E_STRICT);

    // Turn off WSDL caching
    ini_set('soap.wsdl_cache_enabled', 0);

    // Define credentials for LD
    $credentials = array(
      'username' => 'my_user_name',
      'password' => 'my_password',
    );

    // SOAP WSDL endpoint
    $endpoint = 'https://api.livedocx.com/1.2/mailmerge.asmx?WSDL';

    // Define timezone
    date_default_timezone_set('Europe/Berlin');

    // Create a new instance of the SoapClient object
    $soap = new SoapClient($endpoint);
    $soap->LogIn(
      array(
        'username' => $credentials['username'],
        'password' => $credentials['password'],
      )
    );

    // Upload template
    $path_to_template = './' . drupal_get_path('module', 'MYMODULE') . '/template.doc';
    $data = file_get_contents($path_to_template);
    if (empty($data)) {
      drupal_set_message('Failed to read the template', 'error');
      // Drupal 6 signature: type, message, variables, severity.
      watchdog('famed', 'Failed to read the template', array(), WATCHDOG_ERROR);
      return;
    }

    $soap->SetLocalTemplate(array(
      'template' => base64_encode($data),
      'format'   => 'doc',
    ));

    $fieldValues = array(
      'author' => $blog_data['author']->name,
      'email' => $blog_data['author']->mail,
      'title' => 'Blog Posts by ' . $blog_data['author']->name,
    );

    /**
     * In the template, these field values are used on the title page of the document,
     * and in the header/footer of the document.
     */
    $soap->SetFieldValues(array(
      'fieldValues' => assocArrayToArrayOfArrayOfString($fieldValues),
    ));

    // Block values is the repeating data, in this case, the contents of each blog post
    $soap->SetBlockFieldValues(array(
      'blockName' => 'blogpost',
      'blockFieldValues' => multiAssocArrayToArrayOfArrayOfString($blog_data['block_values']),
    ));

    // Build the document
    $soap->CreateDocument();

    // Get document as DOC
    $result = $soap->RetrieveDocument(array(
      'format' => 'doc',
    ));

    // Fetch the document
    $data = $result->RetrieveDocumentResult;
    $filename = './sites/default/files/blog.doc';
    if (file_exists($filename)) {
      unlink($filename);
    }

    // Write the document to the filesystem
    file_put_contents($filename, base64_decode($data));

    // Force the browser to download the document
    if (file_exists($filename)) {
      header('Content-Type: application/octet-stream');
      header('Content-Disposition: attachment; filename=blog.doc');
      header('Content-Length: ' . filesize($filename));
      readfile($filename);
      exit;
    }
    else {
      drupal_set_message('Failed to download the file', 'error');
    }
  }
  else {
    // An error occurred.
    // $operations contains the operations that remained unprocessed.
    $error_operation = reset($operations);
    $message = t('An error occurred while processing %error_operation with arguments: @arguments', array('%error_operation' => $error_operation[0], '@arguments' => print_r($error_operation[1], TRUE)));
    drupal_set_message($message, 'error');
  }
}
?>
The data structures that are sent to LiveDocx can be tricky to get right in PHP, so some additional functions are needed to massage the data passed to the SetFieldValues() and SetBlockFieldValues() methods:
<?php

/**
 * Convert a PHP assoc array to a SOAP array of array of string
 *
 * @param array $assoc
 * @return array
 */
function assocArrayToArrayOfArrayOfString ($assoc) {
  $arrayKeys   = array_keys($assoc);
  $arrayValues = array_values($assoc);
  return array ($arrayKeys, $arrayValues);
}

/**
 * Convert a PHP multi-depth assoc array to a SOAP array of array of array of string
 *
 * @param array $multi
 * @return array
 */
function multiAssocArrayToArrayOfArrayOfString ($multi) {
  $arrayKeys   = array_keys($multi[0]);
  $arrayValues = array();

  foreach ($multi as $v) {
    $arrayValues[] = array_values($v);
  }

  $_arrayKeys = array();
  $_arrayKeys[0] = $arrayKeys;

  return array_merge($_arrayKeys, $arrayValues);
}
?>
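To make the resulting shapes concrete, here is a small standalone sketch that runs both conversions on made-up sample data. The helper bodies are condensed copies of the two functions above, repeated only so the snippet runs on its own; the author name and post titles are invented.

```php
<?php
// Condensed copies of the two helpers, so this sketch is self-contained.
function assocArrayToArrayOfArrayOfString($assoc) {
  return array(array_keys($assoc), array_values($assoc));
}

function multiAssocArrayToArrayOfArrayOfString($multi) {
  // First row holds the field names, every following row holds values.
  $rows = array(array_keys($multi[0]));
  foreach ($multi as $v) {
    $rows[] = array_values($v);
  }
  return $rows;
}

// Invented sample data, shaped like the values built for the document.
$fields = assocArrayToArrayOfArrayOfString(array(
  'author' => 'jane',
  'title' => 'Blog Posts by jane',
));
// $fields is now:
// array(
//   array('author', 'title'),
//   array('jane', 'Blog Posts by jane'),
// )

$blocks = multiAssocArrayToArrayOfArrayOfString(array(
  array('post_title' => 'First post', 'pub_status' => 'Published'),
  array('post_title' => 'Second post', 'pub_status' => 'Unpublished'),
));
// $blocks is now:
// array(
//   array('post_title', 'pub_status'),
//   array('First post', 'Published'),
//   array('Second post', 'Unpublished'),
// )
```

In other words, simple fields become one row of names and one row of values, while block data becomes a header row of names followed by one row per repeated item. Note that the block helper reads its field names from the first item, so it expects at least one item and the same keys in every item.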
The trickiest part for me was getting the template correct in order for the LiveDocX service to work properly. I didn't know how to create merge blocks; as it turns out, it's as simple as inserting bookmarks into your template that follow a specific naming convention, using the prefixes blockstart_ and blockend_. It's also important to know that LiveDocX is currently limited to having merge blocks defined in table cells. Future enhancements of the service will support having merge blocks defined anywhere. I am excited for this to happen, as it will make this service a lot more flexible. The full API for LiveDocX can be found here.

New Drupal Module Released

Back in October, I released my first module for Drupal, the open-source content management system. These days, I seem to be developing exclusively for Drupal, and with a robust API and thriving community, I can only say how much fun it is to work with.

However, like any platform, there are pieces missing. Fortunately, Drupal is one of those platforms that is very easily extended through modules. I came across one of those missing pieces while working on a project.

Drupal and Taxonomy Weights

I recently worked on a project in Drupal that called for a large number of taxonomy terms. I needed to put the terms in a specific order, but unfortunately, I had more terms than Drupal's weight field supports, which is a range from -10 to +10.

I did a quick search on Drupal, and was horrified to see how many people are hacking core to add a greater range. This is pretty easy to do without hacking core.

Flex and Drupal Paths

At CommonPlaces, each developer has his or her own sandbox to code in. Each sandbox can run n instances of a Drupal application, which all run out of subdirectories from the developer's web root.

Hack-proof Your Drupal App - the Video

I had the pleasure of presenting at DrupalCon in Szeged, Hungary, and the topic of my presentation was Drupal security from the perspective of the application. I am pleased to be able to share the video of my presentation.

Drupal, DrupalCon, CommonPlaces, Szeged, security, hacking, filters, output

DrupalCon Experiences in Szeged, Hungary

I have been attending DrupalCon this week, hosted in the beautiful Hungarian town of Szeged.

I was fortunate in that my company, CommonPlaces, was generous enough to become a silver sponsor for the conference.

Drupal and Sane Flash Remoting

On my latest project, I was faced with a challenge: build Flash widgets that displayed dynamic data and could be embedded on any web page. Phase two of the widgets called for user interaction with the widget, as opposed to simply displaying content.

Drupal: Cross-domain Widgets

Drupal is incredibly flexible, but in current versions, lacks the ability to export content easily in the form of widgets. However, the Services module gives you that flexibility in a very easy to use manner.

Services allows you to expose pieces of your Drupal site, such as user, node, and views methods.

PHP Debugging Goodness

I have found PHP nirvana in a box.

I was trying to debug the latest dev version of the userpoints module for Drupal, and was getting nowhere. The process of debugging PHP is tedious to begin with, but the practice of putting print statements into your code in places you think are likely the problem is a nightmare, and a huge black hole for productivity.

DrupalCon Boston 2008: Day1

I have the distinct pleasure of attending DrupalCon 2008, which is being held in Boston, MA this year.

Drupal: Incorrect Pager Results

I've been working on a Drupal module that generates a search form and presents the results below the form.

Drupal: Releasing Custom Modules

I recently built my first module for Drupal, which exposes data from the Userpoints module to Views. There was some talk with the CEO of my company about releasing the module to the community as a contributed module, and some hedging about whether to release it or not.

Releasing modules shouldn't even be a topic of discussion within a company. The work I did on this module was built off the many, many, many hours others have spent on Userpoints and Views. In addition, thousands of developers contributed to, and continue to contribute to Drupal.

Drupal: Loading custom userprofile data

I have found that the core Drupal profile module provides very limited customization possibilities. However, the Usernode and Nodeprofile modules help out immensely with this.

If you want access to custom user profile fields, such as CCK fields, you simply load the profile:


<?php
$usernode = nodeprofile_load('uprofile', $node->uid);
?>

Drupal, PHP, Usernode, Nodeprofile, user+profile, Image

Drupal: Calling Views In Code

Drupal is a fairly flexible system as far as CMS applications go, and is even more flexible as a development platform. The Views module gives developers ways to dynamically build queries of data, and display that data in many different ways. Sometimes, however, you want to display views in ways not supported through the admin interface. Fortunately, there are other ways to get the job done.

For example, I have a block in my sidebar where I want to display a list of the latest blog post, latest forum topic, and latest user poll. Each of these lists can be created as separate views.

Pagination Helper for CakePHP

In one of my CMS projects, I ran across a case where the user created a post of very long content that scrolled endlessly down the page. In an effort to make the content more easily readable, I created a Pagination helper that breaks that content into discrete blocks of content with "next" and "prev" links.


About Erich

Erich is a web developer and a native New Englander who is passionate about life, the universe, and everything.

He is currently a senior Drupal developer at Harvard University, working on the IQSS OpenScholar project. Prior to joining the team at Harvard, he was the engineering manager at CommonPlaces e-Solutions, in Hampstead, NH, contributing as the lead engineer on Greenopolis.com and Twolia.com.

Erich is active in the Drupal community, having contributed modules and patches. He presented at DrupalCon in Szeged, Hungary, and co-presented at DrupalCon 2009 in Washington, DC.

Erich lives in New Hampshire with his wife, two sons, and two weimaraners.  When not writing code, Erich enjoys landscaping and woodworking.
