Migrating content from Preposterous to Hugo was easier than I expected. You might think that since I created Preposterous I'd expect it to be easy, but it's been years since I looked at the code, and since I never intended to make the content exportable, I thought it would be more complicated.
As it turns out, I did myself a huge favor by storing the content in JSON files and rendering the HTML client-side when each post was loaded!
I also made things easier by creating a JSON index for each blog, which was good because I didn’t capture things like the post date in the individual post files (duh!).
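Working backward from the fields the migration script reads, each index entry carries the post's metadata alongside a URL pointing at its HTML file. The shape below is my reconstruction, with placeholder values, not a verbatim dump:

```json
{
  "posts": [
    {
      "post": {
        "date": "2011-03-03",
        "url": "http://example.com/posts/hello-world.html",
        "title": "Hello World",
        "slug": "hello-world",
        "author": "Author Name"
      }
    }
  ]
}
```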
After poking around a bit I was able to write a little Python script that did most of the migrating automatically:
```python
import json
import codecs

from bs4 import BeautifulSoup
import html2text

# open the posts file
with open('./input/posts.json') as posts_file:
    # convert post index JSON to a Python dictionary
    posts_data = json.load(posts_file)

    # loop through all posts
    for post in posts_data['posts']:
        selected_post = post['post']

        # DEBUG: limit to one post for image testing
        #if selected_post['slug'] == 'scaled-image-test':

        # translate to Hugo metadata
        # Preposterous post properties:
        # * date
        # * url
        # * title
        # * slug
        # * author
        output = '---\n'
        output = output + 'title: ' + selected_post['title'] + '\n'
        output = output + 'date: ' + selected_post['date'] + '\n'
        output = output + 'author: ' + selected_post['author'] + '\n'
        output = output + 'draft: false\n'
        output = output + 'tags:\n'
        output = output + '  - preposterous\n'
        output = output + '---\n'

        # load the referenced HTML file
        post_html_filename = './input/' + selected_post['url'].split('/')[-1]

        # parse HTML into a Python object
        with open(post_html_filename) as post_html_data:
            post_html = BeautifulSoup(post_html_data, "lxml")

            # identify images and re-point them at the migrated asset path
            image_tags = post_html.find_all('img')
            for image_tag in image_tags:
                image_tag['src'] = '/preposterous/' + image_tag['src']

            # TODO: handle other assets (video, audio, etc.)

            # reformat as Markdown
            html = post_html.find('div', class_='content').prettify()
            markdown = html2text.html2text(html)
            output = output + markdown

            print('---output-----------------------------------------')
            print(output)
            print('--------------------------------------------------')

            # write the result to a properly-named file
            file = codecs.open('./output/' + selected_post['slug'] + '.md', 'w', encoding='utf8')
            file.write(output)
            file.close()
```
There were a few posts whose titles didn't format properly, but with a little manual massaging I was able to get Hugo to accept all of the content the script generated. One problem I haven't solved is that the dates on some posts are misinterpreted by Hugo as year 0001. I'm not sure why this is, and I'd like to fix it, but after looking at the output for an hour I gave up. Maybe fresh eyes in the future will figure it out?
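One guess: Hugo is written in Go, and when it can't parse a front-matter date it can fall back to Go's zero time, which renders as year 0001. If that's the cause, normalizing the date strings to ISO 8601 before writing the front matter might help. A minimal sketch, assuming a few hypothetical input formats (the `formats` list would need to match whatever Preposterous actually stored):

```python
from datetime import datetime

# Candidate input formats to try, in order. These are assumptions
# for illustration, not the formats Preposterous actually used.
formats = ['%B %d, %Y', '%Y-%m-%d %H:%M', '%m/%d/%Y']

def normalize(raw):
    """Re-emit a date string as ISO 8601, which Hugo parses reliably."""
    for fmt in formats:
        try:
            return datetime.strptime(raw, fmt).strftime('%Y-%m-%dT%H:%M:%S')
        except ValueError:
            continue
    # leave unrecognized dates untouched for manual fixing
    return raw

print(normalize('March 3, 2011'))
```

Dropping a call like this around `selected_post['date']` in the script, and then grepping the output for any dates that passed through unchanged, would at least narrow down which posts carry the problem strings.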