From Preposterous to Hugo

Migrating content from Preposterous to Hugo was easier than I expected. You might think that, since I created Preposterous, I’d have expected it to be easy, but it’s been years since I looked at the code, and since I never intended to make the content exportable, I thought it would be more complicated.

As it turns out, I had done myself a huge favor by storing the content in JSON files and rendering the HTML client-side when each post was loaded!

I also made things easier by creating a JSON index for each blog, which was good because I didn’t capture things like the post date in the individual post files (duh!).
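
For reference, the index looks roughly like this (a reconstruction based on the fields the script below reads; all values here are made up):

{
  "posts": [
    {
      "post": {
        "date": "2012-03-14T09:00:00Z",
        "url": "posts/hello-world.html",
        "title": "Hello World",
        "slug": "hello-world",
        "author": "Your Name"
      }
    }
  ]
}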

After poking around a bit I was able to write a little Python script that did most of the migration automatically:

import json
from bs4 import BeautifulSoup
import html2text

# open the posts file
with open('./input/posts.json') as posts_file:
    
    # convert post index json to python dictionary
    posts_data = json.load(posts_file)
    
    # loop through all posts
    for post in posts_data['posts']:
        selected_post = post['post']
        
        # DEBUG: limit to one post for image testing
        #if selected_post['slug'] == 'scaled-image-test':

        # translate to Hugo metadata
        # preposterous post properties:
        # * date
        # * url
        # * title
        # * slug
        # * author

        output = '---\n'
        output += 'title: ' + selected_post['title'] + '\n'
        output += 'date: ' + selected_post['date'] + '\n'
        output += 'author: ' + selected_post['author'] + '\n'
        output += 'draft: false\n'
        output += 'tags:\n'
        output += '  - preposterous\n'
        output += '---\n'

        # locate the referenced HTML file
        post_html_filename = './input/' + selected_post['url'].split('/')[-1]

        # parse HTML into python object
        with open(post_html_filename, encoding='utf8') as post_html_data:
            post_html = BeautifulSoup(post_html_data, "lxml")

            # rewrite image src paths to point at the migrated asset directory
            image_tags = post_html.find_all('img')
            for image_tag in image_tags:
                image_tag['src'] = '/preposterous/' + image_tag['src']

            # TODO: handle other assets (video, audio, etc.)

            # reformat as markdown
            html = post_html.find('div', class_='content').prettify()
            markdown = html2text.html2text(html)

            output += markdown

            print('---output-----------------------------------------')
            print(output)
            print('--------------------------------------------------')

            # write the result to a properly-named file
            with open('./output/' + selected_post['slug'] + '.md', 'w', encoding='utf8') as output_file:
                output_file.write(output)
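
Each post comes out as a Markdown file with front matter along these lines (values illustrative, matching the fields the script emits):

---
title: Hello World
date: 2012-03-14T09:00:00Z
author: Your Name
draft: false
tags:
  - preposterous
---

…with the html2text conversion of the post body appended below it.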

There were a few posts whose titles didn’t format properly, but with a little manual massaging I was able to get Hugo to accept all of the content the script generated. One problem I haven’t solved is that the dates on some posts are misinterpreted by Hugo as year 0001. I’m not sure why this is, and I’d like to fix it, but after looking at the output for an hour I gave up. Maybe fresh eyes in the future will figure it out?
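
For whoever’s fresh eyes find this: year 0001 is Go’s zero time, so my best guess is that Hugo is failing to parse those date strings entirely rather than misreading them. Here’s a minimal sketch of a normalization step that could run before the front matter is built, assuming python-dateutil can parse whatever format Preposterous stored (a hypothetical helper, not part of the script above):

from dateutil import parser

def normalize_date(raw_date):
    # parse the loosely-formatted source date and re-emit it as
    # RFC 3339, which Hugo parses reliably
    return parser.parse(raw_date).isoformat()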