Importing an existing site


pelican-import is a command-line tool for converting articles from other software to reStructuredText or Markdown. The supported import formats are:

  • WordPress XML export
  • Dotclear export
  • Posterous API
  • Tumblr API
  • RSS/Atom feed

The conversion from HTML to reStructuredText or Markdown relies on Pandoc. For Dotclear, if the source posts are written with Markdown syntax, they will not be converted (as Pelican also supports Markdown).


pelican-import has some dependencies not required by the rest of Pelican:

  • BeautifulSoup4 and lxml, for WordPress and Dotclear import. Both can be installed like any other Python package (pip install BeautifulSoup4 lxml).
  • Feedparser, for feed import (pip install feedparser).
  • Pandoc; see the Pandoc site for installation instructions for your operating system.
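
Before running an import, it can help to confirm which of these optional dependencies are actually available. The following is a minimal sketch and not part of Pelican: the function name check_import_dependencies is hypothetical, and it assumes the packages are importable as bs4, lxml, and feedparser and that the Pandoc executable is named pandoc and is on the PATH.

```python
import importlib.util
import shutil


def check_import_dependencies():
    """Report which optional pelican-import dependencies can be found.

    Hypothetical helper for illustration; it is not part of Pelican.
    """
    # Display name -> assumed import name of the Python package.
    modules = {
        "BeautifulSoup4": "bs4",      # WordPress and Dotclear import
        "lxml": "lxml",               # WordPress and Dotclear import
        "feedparser": "feedparser",   # feed import
    }
    status = {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }
    # Pandoc is an external program, so look for it on the PATH instead.
    status["Pandoc"] = shutil.which("pandoc") is not None
    return status


if __name__ == "__main__":
    for name, found in check_import_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Only the dependencies for the chosen import format need to be present; for example, a feed import should not need BeautifulSoup4 or lxml.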


pelican-import [-h] [--wpfile] [--dotclear] [--posterous] [--tumblr] [--feed] [-o OUTPUT]
               [-m MARKUP] [--dir-cat] [--dir-page] [--strip-raw] [--disable-slugs]
               [-e EMAIL] [-p PASSWORD] [-b BLOGNAME]
               input|api_token|api_key

Positional arguments

input The input file to read
api_token (Posterous only) api_token can be obtained from
api_key (Tumblr only) api_key can be obtained from

Optional arguments

-h, --help Show this help message and exit
--wpfile WordPress XML export (default: False)
--dotclear Dotclear export (default: False)
--posterous Posterous API (default: False)
--tumblr Tumblr API (default: False)
--feed Feed to parse (default: False)
-o OUTPUT, --output OUTPUT
 Output path (default: output)
-m MARKUP, --markup MARKUP
 Output markup format (supports rst & markdown) (default: rst)
--dir-cat Put files in directories named after their categories (default: False)
--dir-page Put files recognised as pages in the “pages/” subdirectory (WordPress import only) (default: False)
--filter-author Import only posts from the specified author.
--strip-raw Strip raw HTML code that can’t be converted to markup, such as Flash embeds or iframes (WordPress import only) (default: False)
--disable-slugs Disable storing slugs from imported posts within output. Without stored slugs, your Pelican URLs may not be consistent with those of your original posts. (default: False)
-e EMAIL, --email=EMAIL
 Email used to authenticate Posterous API
-p PASSWORD, --password=PASSWORD
 Password used to authenticate Posterous API
-b BLOGNAME, --blogname=BLOGNAME
 Blog name used in Tumblr API
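
To illustrate how --dir-cat and --dir-page affect where imported files end up, here is a rough sketch. It is not Pelican's implementation: the function planned_output_path and its parameters are hypothetical, and it assumes that files are named after the post slug, that the extension is .rst or .md depending on the markup, that only the first category is used, and that --dir-page takes precedence over --dir-cat for pages.

```python
from pathlib import PurePosixPath


def planned_output_path(output, slug, markup="rst", categories=(),
                        kind="article", dir_cat=False, dir_page=False):
    """Guess where an imported post would be written (illustrative only)."""
    # Assumption: Markdown output uses .md, everything else uses .rst.
    extension = ".md" if markup == "markdown" else ".rst"
    base = PurePosixPath(output)
    filename = slug + extension
    if dir_page and kind == "page":
        # --dir-page: pages go into the "pages/" subdirectory.
        return base / "pages" / filename
    if dir_cat and categories:
        # --dir-cat: assumed to use a directory named after the first category.
        return base / categories[0] / filename
    return base / filename


if __name__ == "__main__":
    print(planned_output_path("output", "hello-world",
                              categories=["news"], dir_cat=True))
    print(planned_output_path("output", "about", kind="page", dir_page=True))
```

The real layout is decided by pelican-import itself, so treat the paths printed here as an approximation to check against an actual import.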


For WordPress:

$ pelican-import --wpfile -o ~/output ~/posts.xml

For Dotclear:

$ pelican-import --dotclear -o ~/output ~/backup.txt

For Posterous:

$ pelican-import --posterous -o ~/output --email=<email_address> --password=<password> <api_token>

For Tumblr:

$ pelican-import --tumblr -o ~/output --blogname=<blogname> <api_key>
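
Invocations like the ones above can also be assembled from a script, for instance when importing several sites in a row. The following is a minimal sketch under stated assumptions: build_import_command and SOURCES are hypothetical names, not part of Pelican, and the helper only builds the argument list from the options documented in this section without running anything.

```python
import shlex

# Source switches documented in the usage synopsis.
SOURCES = ("wpfile", "dotclear", "posterous", "tumblr", "feed")


def build_import_command(source, positional, output="output", markup="rst",
                         email=None, password=None, blogname=None,
                         extra_flags=()):
    """Assemble a pelican-import argument list (hypothetical helper)."""
    if source not in SOURCES:
        raise ValueError(f"unsupported source: {source!r}")
    command = ["pelican-import", f"--{source}", "-o", output, "-m", markup]
    # Boolean switches such as --dir-cat or --strip-raw are passed through as-is.
    command.extend(extra_flags)
    if email is not None:
        command.append(f"--email={email}")
    if password is not None:
        command.append(f"--password={password}")
    if blogname is not None:
        command.append(f"--blogname={blogname}")
    # The positional argument is the input file, api_token, or api_key.
    command.append(positional)
    return command


if __name__ == "__main__":
    command = build_import_command("wpfile", "posts.xml",
                                   extra_flags=["--dir-cat"])
    print(shlex.join(command))
    # To actually run the import: subprocess.run(command, check=True)
```

When the list is passed to subprocess.run without a shell, a leading ~ in a path is not expanded, so paths such as ~/output should be expanded first (for example with os.path.expanduser).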


To test the module, one can use sample files: