Sunday, October 24, 2010

Fix CDATA Bug on WordPress 3.0.1's WXR Export

Today I migrated one of my clients' websites from WordPress 3.0.1 to Drupal CMS 6.19.

I used WordPress's export function to create a WXR backup/dump file and imported it into Drupal using the WordPress Import module.

Unfortunately, the WXR export file was unreadable because it was not well-formed XML.

Some investigation revealed that the WordPress export doesn't escape CDATA content properly: content containing "]]>" is left as-is, which ends the CDATA section prematurely.

]]>

should be escaped as

]]]]><![CDATA[>
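To show why this replacement works, here is a minimal sketch that wraps the same content both ways and feeds each result to PHP's DOM parser. The helper names (wrap_cdata_naive, wrap_cdata_safe, parse_item) and the sample content are made up for illustration; none of them are WordPress functions. The sketch assumes the DOM extension is available.

```php
<?php
// Hypothetical helpers for illustration only; none of these exist in WordPress.

// Naive wrapping: a "]]>" inside $str ends the CDATA section early.
function wrap_cdata_naive( $str ) {
    return '<![CDATA[' . $str . ']]>';
}

// Safe wrapping: split each "]]>" across two adjacent CDATA sections,
// so "]]" closes the first section and ">" opens the second.
function wrap_cdata_safe( $str ) {
    $str = str_replace( ']]>', ']]]]><![CDATA[>', $str );
    return '<![CDATA[' . $str . ']]>';
}

// Return the text content of the root element if $xml is well-formed,
// or false if the parser rejects it.
function parse_item( $xml ) {
    libxml_use_internal_errors( true ); // collect parse errors instead of emitting warnings
    $doc = new DOMDocument();
    $ok  = $doc->loadXML( $xml );
    libxml_clear_errors();
    return $ok ? $doc->documentElement->textContent : false;
}

// Sample content containing "]]>" plus characters that are illegal outside CDATA.
$content = 'var ok = a[b[0]]>c && d < e;';

$broken = parse_item( '<item>' . wrap_cdata_naive( $content ) . '</item>' );
$fixed  = parse_item( '<item>' . wrap_cdata_safe( $content ) . '</item>' );

var_dump( $broken );             // parser rejects the naive version
var_dump( $fixed === $content ); // split version reads back as the original text
```

With the naive wrapping, everything after the first "]]>" is parsed as ordinary markup, so the stray "&" and "<" make the document ill-formed. With the split wrapping, the parser reads the two adjacent CDATA sections back as one continuous text value.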

I've reported this bug as Ticket #15203 on WordPress Trac.

I've also provided a patch, shown below:

diff --git a/wp-admin/includes/export.php b/wp-admin/includes/export.php
index a9e8f22..cd3a9d4 100644
--- a/wp-admin/includes/export.php
+++ b/wp-admin/includes/export.php
@@ -138,6 +138,7 @@ function export_wp( $args = array() ) {
                        $str = utf8_encode( $str );

                // $str = ent2ncr(esc_html($str));
+               $str = str_replace(']]>', ']]]]><![CDATA[>', $str);
                $str = "<![CDATA[$str" . ( ( substr( $str, -1 ) == ']' ) ? ' ' :

                return $str;


This fixed the export functionality. However, I think the import functionality is still affected, as it doesn't use a proper XML parser (see WordPress Trac Ticket #7400).

With the WXR export fixed, I was able to import the WordPress content into Drupal successfully.

P.S. WordPress Import module v6.x-2.1 for Drupal fails under PHP 5.3. The solution is to use the latest development version of the module, v6.x-2.x-dev.
