Introduction

In 2001, after several prototypes and iterations, the Memorial Sloan-Kettering Cancer Center website (www.mskcc.org) launched on my "homegrown" content management system (CMS) called the "Inettool." Over the following decade, while maintaining and customizing the Inettool, I came to realize that I was digging a "CMS hole" in which my code and MSKCC's data were gradually being buried and trapped inside this custom-built system. This experience led me to conclude that "custom built software requires everything to be custom built." Using an open-source CMS like Drupal prevents one from being trapped in a custom-built CMS, because open-source code provides pre-existing and tested functionality that can be customized.

The simplest explanation for why I chose Drupal is "given enough eyeballs, all bugs are shallow." Drupal has a community of engaged participants looking at and contributing code. For me, Drupal's contributed code and open discussions are its biggest strengths; I have not hit any brick walls or black boxes while using Drupal to build the MSKCC.org website.

About MSKCC.org

As one of the world's premier cancer centers, Memorial Sloan-Kettering Cancer Center is committed to exceptional patient care, leading-edge research, and superb educational programs.

Memorial Sloan-Kettering Cancer Center's broad mission requires that the www.mskcc.org design and information architecture serve a wide variety of audiences, each of which is interested in very different content. For example, each aspect of MSKCC's mission (treatment, research, and education) has a dedicated landing page. A newly diagnosed cancer patient is generally interested in seeing only information about their specific cancer type, while a postdoctoral student might want to view a researcher within a given research program or department. The primary goal for the redesign of www.mskcc.org was to better address the needs of the website's users while cleanly conveying MSKCC's mission.

About the "Switch"

As mentioned in the introduction, the previous website was built on a custom-built CMS called the "Inettool." The driving force behind the custom-built CMS was the institution's desire to have a specialized, customizable website.

I will let you in on a little secret and misconception: "most websites are never really that special, it is the institution and/or business behind the website that is special." In the case of the www.mskcc.org website, the CMS is just a tool used to convey their mission and message, which I personally summarize as "quality, compassionate care."

I successfully convinced MSKCC to switch to Drupal by explaining the ongoing challenges of maintaining their custom-built CMS. One final but key selling point for switching to Drupal was the availability of enterprise support from Acquia.

So MSKCC agreed to adopt Drupal and "the switch" got underway.

The "switch" can be broken down into several steps/decisions, which include...

  • Migration
  • System Architecture
  • Content Management
  • Information Architecture
  • Site Features (aka modules)
  • Templates (aka themes and panels)

First, Some Stats

Below are some general stats to help describe the scope of the migration and the general system architecture requirements.

Site Stats (Yearly)

  • Visits: 5,567,343 visits; 3,364,927 unique visitors
  • Page Views: 20,805,773 pageviews

Drupal Stats

  • Modules: 206 modules enabled; 142 contrib modules; 64 custom modules
  • Users: 55 active users; 17 roles
  • Nodes: 33 content-types; 297 fields; 11297 nodes
  • Books: 139 books; 2003 book pages
  • Menus: 108 primary links; 26 secondary links
  • Taxonomy: 14 vocabularies; 1972 terms
  • Views: 112 views

Migration

Conceptualizing a migration of data from the Inettool to Drupal was the first step of the switch process. There were many questions to be asked and answered about how the migration would be accomplished. What data would be moved? How much data could be cleanly migrated, and how much would require additional post-migration cleanup?

How to migrate an existing website to Drupal can be a pretty easy question to answer, since Drupal has several contributed modules to import and export data. Honestly, I made a 'newbie' mistake, which may still have been a good decision: I wrote a custom migration module from scratch. I saw this as an opportunity to learn PHP and the inner workings of Drupal's API and database structure, knowing that this code would be thrown away after the final migration. Anyone new to Drupal should be willing to throw away code; it is just part of the learning process.

Besides learning Drupal, I had three goals for my migration script, which were:

  • Automated nightly builds so that everyone could review the migrated data as changes were being made
  • Single page imports that would be used to debug minor migration issues
  • Finally, to cleanly migrate 90% of the existing 10,000+ pages, thus requiring little post migration cleanup
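The three goals above can be illustrated with a minimal sketch. The original migration module was written in PHP against Drupal's API and is not shown in this article; the Python below only illustrates the flow, and every name in it (`migrate_page`, `run_migration`, the `<font>` cleanup rule) is a hypothetical stand-in.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationReport:
    """Tracks which pages migrated cleanly and which need manual cleanup."""
    clean: list = field(default_factory=list)
    needs_cleanup: list = field(default_factory=list)

    @property
    def clean_rate(self) -> float:
        total = len(self.clean) + len(self.needs_cleanup)
        return len(self.clean) / total if total else 0.0


def migrate_page(page: dict) -> tuple:
    """Convert one legacy page into a node-like dict and list any problems."""
    problems = []
    title = (page.get("title") or "").strip()
    body = page.get("body") or ""
    if not title:
        problems.append("missing title")
    if "<font" in body.lower():
        problems.append("legacy <font> markup")  # hypothetical cleanup rule
    node = {"legacy_id": page["id"], "title": title, "body": body}
    return node, problems


def run_migration(pages: list, only_id=None) -> MigrationReport:
    """Migrate every page (nightly build) or a single page (debugging)."""
    report = MigrationReport()
    for page in pages:
        if only_id is not None and page["id"] != only_id:
            continue
        _node, problems = migrate_page(page)
        (report.needs_cleanup if problems else report.clean).append(page["id"])
    return report
```

Running `run_migration(pages)` on a schedule corresponds to the nightly build, `run_migration(pages, only_id=...)` to the single-page import, and `clean_rate` gives a number to compare against the 90% target.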

Apart from one or two issues that had to be fixed after the final migration, the data migration was successful. It required about 3 weeks of post-migration cleanup, although admittedly there was also a lot of pre-migration cleanup. Most importantly, when it was finally time to migrate the website, everyone on the project was comfortable and ready to move to Drupal.

System Architecture

Since the project began in 2009, the new website uses Drupal 6. Though the website has no patient health information (PHI), MSKCC reasonably required that the web servers be hosted internally. The key performance recommendation I made, especially for MSKCC's initial launch on Drupal, was to have no authenticated traffic on the website. By keeping all external users anonymous, every page on the website can be cached by a reverse proxy and the website can handle a fairly large load.
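To show why anonymous-only traffic is so cache-friendly, here is a minimal sketch of the decision a reverse proxy makes for each request. This is illustrative Python, not the site's Varnish configuration. It assumes that Drupal session cookie names begin with `SESS`, and the function names are hypothetical.

```python
from http.cookies import SimpleCookie


def has_session_cookie(cookie_header: str) -> bool:
    """Return True if the Cookie header carries a Drupal-style session cookie."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    return any(name.startswith("SESS") for name in cookies)


def is_cacheable(method: str, cookie_header: str = "") -> bool:
    """Only GET/HEAD requests without a session cookie can be served from cache."""
    return method.upper() in ("GET", "HEAD") and not has_session_cookie(cookie_header)
```

If no external visitor ever logs in, no visitor ever carries a session cookie, so every page request can be answered by the proxy without touching PHP or MySQL.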

No one at MSKCC, including myself, had ever launched a large Drupal or LAMP stack website, so Acquia was brought in to do a general Drupal site audit and make server recommendations. The final solution was an F5 load balancer in front of 2 Varnish reverse proxy/web servers, 1 memcache server, and 2 MySQL DB servers in a master/slave configuration.

In the end, the server architecture for this website is pretty much the standard setup for a high-performance Drupal website. The website is very responsive and has come nowhere near reaching its max load.

Custom server requirements were added to the 'Site status' report using hook_requirements(). These custom requirements check for properly configured firewall rules, internal webservice access, and additional PHP add-ons, like Oracle's OCI8 Database drivers.
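The idea behind these custom requirements is simple: each check reports a title, a value, and a severity, and the status report lists them all. The sketch below mirrors that shape in Python as an analogy only. It is not Drupal's PHP hook, and `check_module` and `collect_requirements` are hypothetical names. Checking for an installed Python module stands in for checking a PHP add-on such as OCI8.

```python
import importlib.util

OK, WARNING, ERROR = "ok", "warning", "error"


def check_module(name: str) -> dict:
    """Report whether an optional add-on (here, a Python module) is installed."""
    found = importlib.util.find_spec(name) is not None
    return {
        "title": f"Add-on: {name}",
        "value": "Installed" if found else "Missing",
        "severity": OK if found else ERROR,
    }


def collect_requirements(checks: list) -> list:
    """Run every check and return rows for a status report, worst first."""
    order = {ERROR: 0, WARNING: 1, OK: 2}
    rows = [check() for check in checks]
    return sorted(rows, key=lambda row: order[row["severity"]])
```

Putting environment checks in one report means a misconfigured server shows up in the admin interface instead of as a mysterious failure later.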

Content Management

The website, www.mskcc.org, is primarily a content- and information-driven site, which is why it was important to focus on the website's content types and navigation system before implementing site features (aka modules). The website has 33 content types, which may seem like a lot, but the broad mission of MSKCC (treatment, research, and education) requires some additional content type specificity. For example, doctors, researchers, and staff members each require their own content type, with custom fields and unique node access rules and controls.

Below are some notable content-types:

HTML fragment
An HTML fragment is a small piece of HTML code that is used as global content within the website's blocks, main menus, and/or super footers. HTML fragments are primarily used by web developers to build editable pieces of specialized but customizable content.

View
The view content type provides content administrators with an easy mechanism for building listings of data (aka Views) on the website. The view content type includes several CCK fields that are passed as arguments to a selected view.

Teaser
The teaser content type is a simple call-out, which consists of title, image, description, and a link that redirects to a complete web page. The teaser content-type is used to create a specialized call-out for a page whose default teaser is not acceptable.

View the complete list of content-types used on the website.

Information Architecture

Menus
Out of the box, Drupal supports a primary and a secondary menu. These menus are used in the main navigation bars at the top of the website. The primary and secondary menus handle the first 3 tiers of the www.mskcc.org website, and then a combination of taxonomy, books, and views manages the lower levels of the website's information architecture.

Taxonomy
Drupal's taxonomy system was used to manage MSK's hierarchical medical specialities and even simple event categorization. I built a custom taxonomy helper module to generate hierarchical and alphabetical taxonomy term displays for finding a doctor by speciality or department.

View the complete list of taxonomy vocabularies used on the website.
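As a rough illustration of the two term displays mentioned above (hierarchical and alphabetical), here is a minimal Python sketch. It is not the custom helper module's code. The `(tid, name, parent_tid)` tuple layout and the function names are assumptions made for the example, with parent `0` marking a top-level term.

```python
from collections import defaultdict


def render_hierarchy(terms: list) -> list:
    """Return term names indented by depth; terms are (tid, name, parent_tid)."""
    children = defaultdict(list)
    for tid, name, parent in terms:
        children[parent].append((name, tid))

    lines = []

    def walk(parent, depth):
        # Siblings are listed alphabetically, children directly under a parent.
        for name, tid in sorted(children[parent]):
            lines.append("  " * depth + name)
            walk(tid, depth + 1)

    walk(0, 0)  # parent 0 marks a top-level term
    return lines


def group_alphabetically(terms: list) -> dict:
    """Group term names under their first letter for an A-Z index."""
    index = defaultdict(list)
    for _tid, name, _parent in terms:
        index[name[0].upper()].append(name)
    return {letter: sorted(names) for letter, names in sorted(index.items())}
```

The same list of terms feeds both displays, so editors maintain a single vocabulary and visitors can browse it either way.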

Books
Besides having a lot of unique content, the website has many unique sections maintained by different users. The book module, included in Drupal core, was the best means to break down the website's very rich information architecture. A custom 'Book helper' module was created to allow administrators to customize a book's navigation using some additional menu features, including disabling menu items and customizing a menu item's title.

Views
I use Views religiously, for anything that is "a list of things." As long as the Views module remains this helpful for generating SQL queries and/or displaying their results, I am going to use it. An MSK views module was created to handle all views-related customization, including altering queries, exposed filters, and additional template preprocessing.

Contributed Modules

The website uses 100+ contributed modules, which were selected based on their usage stats balanced against their usefulness on the MSKCC website.

In fact, some key challenges were solved by using some of the less-popular contrib modules and features. Some examples are:

Third Party Wrappers
MSKCC has several applications that are built using ASP.NET. To maintain a consistent look and feel, the main website's template must be shared with these applications. The Third Party Wrappers module solved this challenge by allowing the website's template to be wrapped around a "third party application" by creating header and footer snippets that developers can include in their application.
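The header/footer idea can be sketched in a few lines. This is not the Third Party Wrappers module's implementation, which is not described here. It is an illustrative Python sketch that assumes the rendered site template contains a placeholder marker where page content goes. The marker string and both function names are hypothetical.

```python
def split_template(page_html: str, marker: str = "<!-- CONTENT -->") -> tuple:
    """Split a rendered page into (header, footer) around a content marker."""
    header, found, footer = page_html.partition(marker)
    if not found:
        raise ValueError(f"marker {marker!r} not found in template")
    return header, footer


def wrap(app_html: str, header: str, footer: str) -> str:
    """What a third-party application does with the two snippets."""
    return header + app_html + footer
```

Because the snippets are generated from the live template, a change to the site's navigation or footer reaches the external applications without anyone copying markup by hand.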

Node order
By default, Drupal's taxonomy system orders a term's nodes by their posted date. Doctors needed to be weighted by job title when listed within their departments and specialities. The Node Order module provided MSKCC with this functionality and was very easy to set up.

Print PDF
The Print module is a very popular Drupal module that allows users to print pages and even books as PDF documents. MSKCC is using this feature to generate a PDF of an entire cancer overview, like Lung Cancer, for patients and caregivers. This feature allows those who would rather read offline to print out all the information about a cancer instead of reading it on a computer screen or having to print each page individually.

See the complete list of contributed modules used on the MSKCC website.

Custom Modules

There are about 50 small, custom modules that were created for the MSKCC.org website. These custom modules contain mostly glue code, small enhancements to existing modules, and tweaks to improve user experience. The decision to use small custom modules was inspired by the Unix philosophy: "Write programs that do one thing and do it well."

A few noteworthy custom modules and code snippets are:

The MSK toolbar module is a good example of 'glue code' used to pull together the Print module, Service links, and a custom Subscribe to Feeds module.

The MSK glossary module allows users to look up cancer-related terms within the NCI glossary. The popups are generated using the Beauty Tips module.

The MSK disaster recovery admin settings form is used to track and disable certain aspects of the website in the event that www.mskcc.org has to be moved to the MSKCC disaster recovery data center.

See the complete list of custom modules used on the website.

Sandbox Modules

While planning and implementing MSKCC's custom modules, I tried to make sure that any re-usable functionality was abstracted out into generic modules that could be shared with the Drupal community. Meanwhile, the great Git migration occurred, which changed and improved how the Drupal project and its contributed modules were being developed. One of the coolest changes was the addition of developer sandboxes. Sandboxes are basically open-source Drupal projects that are not fully-fledged projects, but they give developers a way to share their code. This is exactly what I intended to do.

During Randy Fay's DrupalCon presentation "Git on Drupal.org: It's Easier Than You Think!", I asked whether developers should "just sandbox all their code" while working on a Drupal website. The answer I got was "yes," so I decided to build and share my sandbox. I restructured my 'sites/all/modules' directory to reflect this by adding a 'sandbox' directory next to my 'contrib', 'custom', and 'dev' directories. I would describe this new 'sandbox' directory as holding code that sits somewhere between completely custom code and code that can one day be contributed back to the Drupal community.

See the complete list of my sandbox modules.

Templates

The website uses the Zen base theme and follows the style guidelines set forth by this clean and well-done theme. I also applied the concept of the Drupal 7 Stark theme by copying and streamlining a lot of CSS files from core and contrib. One easy optimization was to remove any admin-related classes from the main theme, since they would never be used because the website uses a separate admin theme. The site admin theme is a backport of the Drupal 7 Seven admin theme.

Similar to the solution of using hook_requirements() to add MSKCC-specific requirements to the 'Site status' report, I created a custom MSK comps report to track the development progress of the website's design templates, along with links to working examples for each template.

Panels

Panels are used to lay out the content on MSKCC's landing pages. Using Panel nodes allowed a node's CCK fields to be easily displayed in panel panes.

Panels, like everything in Drupal, is extensible, so I created some custom panel layouts, styles, and content types to help build MSKCC's 20+ landing pages.

Layouts
MSKCC's custom panel layouts are simply copies of the default layouts included with the Panels module, with a select menu added to set the default panel pane column widths so that they line up with a 960 grid.
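To make "line up with a 960 grid" concrete, here is a small arithmetic sketch. It assumes the common 12-column variant of the 960 grid (60px columns with 20px gutters). The article does not say which variant the site uses, so the constants and function names below are assumptions for illustration.

```python
COLUMNS = 12        # assumption: the 12-column variant of the 960 grid
COLUMN_WIDTH = 60   # px
GUTTER = 20         # px (a 10px margin on each side of a column)


def span_width(span: int) -> int:
    """Pixel width of a pane that spans `span` grid columns."""
    if not 1 <= span <= COLUMNS:
        raise ValueError(f"span must be between 1 and {COLUMNS}")
    return span * COLUMN_WIDTH + (span - 1) * GUTTER


def row_fits(spans: list) -> bool:
    """True if the panes in one row exactly fill the grid."""
    return sum(spans) == COLUMNS
```

Under these assumptions, a two-column landing page with an 8-column main pane and a 4-column sidebar gives widths of 620px and 300px, and a select menu only needs to offer span values that keep each row summing to 12.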

Styles
A custom MSK stylizer was built to allow editors to select pre-defined custom classes for each panel pane. The Skinr module provides very similar functionality for an entire theme but the website only required this functionality for panel panes so it was easier to just implement a custom stylizer (which was copied from the default stylizer included with ctools).

Content-types
Finally, custom (Panel) content-types were required to build the landing page slideshows and video players, but whenever feasible I used Views content panes to build custom content-types.

Finally, Some Lessons Learned...

Follow Drupal's best practices

One of the key factors behind Drupal's healthy community of code contributors is the project's well-defined and enforced best practices. Before switching to Drupal, the only best practice I followed was trying to write clean code. Following Drupal best practices was the easiest way to improve my programming skills and the overall quality of the website's code.

Below are the five Drupal best practices that I worked to adhere to for the MSKCC website:

  1. Code standards
    Drupal's code standards are very well documented. The Coder module is extremely helpful in correcting any bad habits and mistakes.
  2. Version control
    Use version control. 'Nuff said.
  3. API documentation
    Generally, developers hate writing documentation. To encourage myself and all future developers on the website to write decent API documentation, we set up a secure api.mskcc.org website using the API module. Seeing one's lack of documentation, or just one's grammatical mistakes, on a website can be a great motivator.
  4. Issue tracking
    Getting the project team, including myself, to switch from using email to track issues to an actual issue tracking system took considerable effort but everyone is now happily using Unfuddle to manage issues.
  5. Unit testing
    SimpleTest is now part of Drupal 7. Unit testing is the only best practice that I admittedly fell short of implementing, and it is something I hope to implement during the upgrade to D7.

Namespace everything.

Originally, I started out namespacing just my modules with msk_* and soon realized it helps to namespace every custom object, including Views, Panels, Rules, and even CSS classes. I namespaced all my views with 'msk_', then included the type of view, and finally a unique name for the view. For example, the clinical trials view is named 'msk_directory_trials' and the view used for the news feed content pane is named 'msk_content_pane_news_feed'.
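A naming convention like this is easy to check mechanically. The sketch below parses a view's machine name into its type and unique name. It is illustrative Python rather than anything used on the site, and the `VIEW_TYPES` list contains only the two types that appear in the examples above. The full list of types is not given here.

```python
import re

VIEW_TYPES = ("content_pane", "directory")  # hypothetical, taken from the two examples
VIEW_NAME = re.compile(
    r"^msk_(?P<type>%s)_(?P<name>[a-z0-9_]+)$" % "|".join(VIEW_TYPES)
)


def parse_view_name(machine_name: str):
    """Return (type, name) for a correctly namespaced view, else None."""
    match = VIEW_NAME.match(machine_name)
    return (match.group("type"), match.group("name")) if match else None
```

For instance, `parse_view_name("msk_directory_trials")` yields the type `directory` and the name `trials`, while a name without the `msk_` prefix is rejected.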

Export everything.

The project does not use the Features module but does export everything into code, including Views, Panels, Rules, and ImageCache. The website uses the Strongarm module to export almost all of the website's configuration settings (aka variables) into code. I created a Strongarm dump module which allows every system configuration page to be easily exported. When the site is updated to D7, it will use the Features module.

Document everything.

I personally use Google Docs to document and share everything. I also keep an organized list of any useful modules and/or Drupal related blog posts. There is no 100% perfect resource for Drupal, so it is worth tracking discussions about tricks, hacks, and APIs for modules like Views and Panels.

For developer documentation, I made sure to include the recommended README.txt files and API comments with every module. I also set up a series of text README files covering coding standards, installation guides, changes and issues with modules, etc., which are stored in SVN and available via a secure help section within the MSKCC website.

Conclusions

Drupal works... maybe this is too simple a statement for a complex website project using close to 200 modules, but in the end, Drupal accomplished what it was designed to do: build a website. Drupal allowed MSKCC to focus on their website's mission and not the technology behind it. In the end, MSKCC's goals were met because the website looks great and the information is easy to find.

About the Team

The launch of the new MSKCC.org was a joint effort of four different groups/organizations that were responsible for design, content, web development, and infrastructure. Magnani, Caruso and Dutton (MCD) designed the new site and re-worked the information architecture. The Big Blue House (my company) was responsible for all Drupal development. MSKCC's Department of Information Systems configured and administers the enterprise LAMP server stack. Finally, MSKCC's Department of Public Affairs manages the website day-to-day and is responsible for the high-quality content and beautiful photography, as well as ongoing strategy and optimization.