I’m currently working on (and close to finishing) a pretty big site. It’s a rebuild of a site built in a proprietary, Django-based CMS, with the aim of having the new site look almost exactly like the old one, except for the addition of a store.
I’ve had to learn a lot on this project. Bits which I thought would be scary and time-consuming (dealing with someone else’s front-end code) weren’t always that bad, thanks to the original designers and developers being pretty damn awesome. Bits I thought might be easier – importing and handling the data – less so. I thought I’d share some of what I’ve learned, which might help in two situations: firstly, if you’re building a site and thinking about the day it might need to be radically changed; and secondly, if you’re moving a site from another (proprietary) system to WordPress.
The most important things I’ve learned – or at least had reinforced – are about how you gather and store data for content.
As an example, the site I’m building stores dimensions for items. In the original CMS – and I’m not judging this, I know how these sorts of decisions are made – dimensions were stored as a single string, free text, with no formatting.
In some ways, this is great: the site’s editors have complete freedom to put the weird dimensions of their oddly shaped product into the site. You could put a note about the weight of the object if you wanted. Brilliant!
On the other hand, it’s entirely possible to then face a situation where the dimensions suddenly have more of a role to play: calculating (or at least informing) shipping. It’s possible, for example, that an object with a volume over a certain amount would cost a lot more to ship. Had the dimensions been constrained at the outset – separate fields for height, width and depth, with units set – this wouldn’t be an issue. As it is, we have to do a conversion job on all the dimensions: checking for units (some use “in”, some say 12″, some are in cm, some have no units at all), converting imperial to metric, and so on. It’s not a massive job, but it’s extra overhead.
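To give a flavour of that conversion job, here’s a minimal sketch in Python. The field names and the exact formats it handles are my assumptions for illustration, not the site’s real data – real free-text input will always have more surprises than this.

```python
import re

INCHES_TO_CM = 2.54

def parse_dimensions(text):
    """Return (height, width, depth) in cm, or None if the string doesn't parse."""
    # Treat 12", 12in as inches; bare numbers and cm as centimetres.
    parts = re.findall(r'([\d.]+)\s*(cm|in|"|″)?', text.lower())
    values = []
    for number, unit in parts:
        value = float(number)
        if unit in ('in', '"', '″'):
            value *= INCHES_TO_CM
        values.append(round(value, 2))
    return tuple(values[:3]) if len(values) >= 3 else None

print(parse_dimensions('12" x 8" x 4"'))    # all inches, converted to cm
print(parse_dimensions('30 x 20 x 10 cm'))  # assumed already metric
print(parse_dimensions('one size only'))    # unparseable: None
```

In practice you’d also want to log anything that comes back as `None` so a human can fix those entries by hand – which is exactly the overhead that structured fields would have avoided.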
Let’s consider the example of a small online store. Because it’s small – say, 20 items – the owner doesn’t use unique ID codes for products (SKUs). Not a problem. But over time, slowly but surely, the number of products creeps up… and then you end up in a situation with 2,000 products, all needing codes added to them manually.
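That retrofit can at least be scripted. Here’s a rough sketch: the ID scheme (category prefix plus a zero-padded counter) is an invention for illustration, not anything the store actually used, and real code would also need to guard against colliding with SKUs that already exist.

```python
def backfill_skus(products):
    """Assign a SKU like 'BO-0001' to any product missing one."""
    counters = {}
    for product in products:
        if product.get('sku'):
            continue  # already has one; real code would also check for collisions
        prefix = product['category'][:2].upper()
        counters[prefix] = counters.get(prefix, 0) + 1
        product['sku'] = f'{prefix}-{counters[prefix]:04d}'
    return products

items = [
    {'name': 'Field Guide', 'category': 'books', 'sku': None},
    {'name': 'Tote Bag', 'category': 'bags', 'sku': 'BA-0100'},
    {'name': 'Atlas', 'category': 'books', 'sku': None},
]
for item in backfill_skus(items):
    print(item['name'], item['sku'])
```

The point isn’t the scheme itself – it’s that assigning IDs at item number 20 is trivial, and at item number 2,000 it’s a project.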
My advice to clients is always to gather as much data about “stuff” (products, events, whatever) as you can, and to break it down as much as is practically possible. Addresses: do them line by line, so you can format them however you want for whatever front-end processing you need. Dimensions: set units, and break them into width, height and depth. That kind of thing.
How you actually store the data is unimportant – as arrays, serialized or not, as single values, whatever – but staying true to variable types when you store it gives you absolute freedom when, in a couple of years’ time, your business has grown hugely and you need to migrate all that content to a new system.
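As a small illustration of that point (with made-up field names): once values are structured and typed, they survive serialization intact, and downstream code can do arithmetic without ever re-parsing strings.

```python
import json

# Structured, typed values rather than one free-text blob.
dimensions = {'height_cm': 30.0, 'width_cm': 20.0, 'depth_cm': 10.0}

# Serialized or not doesn't matter much -- the types survive the round trip...
restored = json.loads(json.dumps(dimensions))
assert isinstance(restored['height_cm'], float)

# ...so a future migration (or a shipping calculation) is a straightforward
# transform instead of a parsing job.
volume_litres = (restored['height_cm'] * restored['width_cm']
                 * restored['depth_cm']) / 1000
print(volume_litres)
```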
This is not going to be popular with everyone, however. Editors get frustrated at the number of form fields they need to fill in, and more so at the constraints when there’s that one special case. But you don’t want that one special case to cost you hundreds or thousands of pounds in a year’s time: developers need to be aware that special cases exist, and allow for null or “special” values (falling back to something like “see above” when a value is missing or marked as special) rather than doing away with the constraint altogether. You can’t have every field as a free text box.
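One way to sketch that fallback, with illustrative names of my own: keep the constrained field, allow it to be empty for the special case, and fall back to a display value rather than turning the whole field into free text.

```python
def display_dimensions(product):
    """Format structured dimensions, falling back when the special case applies."""
    dims = product.get('dimensions')  # None marks the "special case" product
    if dims is None:
        return 'See description'
    return '{height_cm} × {width_cm} × {depth_cm} cm'.format(**dims)

print(display_dimensions({'dimensions': {'height_cm': 30, 'width_cm': 20, 'depth_cm': 10}}))
print(display_dimensions({'dimensions': None}))
```

The constraint stays intact for the 99% of products that fit it, and the one odd product gets handled explicitly instead of poisoning the whole data model.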
The original system, as I said, was Django-based, using PostgreSQL as its database.
At first I tried to use an automated tool to convert Postgres to MySQL, and learned one important thing about doing this: don’t. It’s going to fail; even if it works, it’s going to mess up character encoding, and it’s just a huge pain. Instead, rest assured that connecting to, and querying, a Postgres DB is actually pretty simple: it’s just standard SQL. If you’re importing into WordPress, don’t worry too much about not being able to use $wpdb, because – and I’ll get to this later – you’ll be doing it on a local, isolated box, and as long as you remove all of this import code before moving to live or staging you’ll be fine.
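As a sketch of that direct-query approach: the DSN, table and column names here are invented, the driver is psycopg2 (a common third-party choice, not the only one), and in practice the WordPress side would be PHP feeding `wp_insert_post()` – but the shape of the transform is the same in any language.

```python
def fetch_products(dsn):
    """Pull rows straight out of the old Postgres database -- it's standard SQL."""
    import psycopg2  # third-party driver: pip install psycopg2-binary

    conn = psycopg2.connect(dsn)  # e.g. 'dbname=oldsite user=postgres'
    with conn, conn.cursor() as cur:
        cur.execute('SELECT id, title, body FROM products ORDER BY id')
        return cur.fetchall()

def row_to_post(row):
    """Reshape an old row into the kind of array you'd hand to wp_insert_post()."""
    old_id, title, body = row
    return {
        'post_title': title,
        'post_content': body,
        'post_status': 'publish',
        # Keep the old ID as meta so you can cross-reference after the import.
        'meta_input': {'_old_id': old_id},
    }
```

Keeping the old ID around as post meta is the one thing I’d call essential: it lets you re-run the import safely and map old URLs to new ones later.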
Postgres is not something I’d ever used before, and it turns out there are several challenges here. Firstly, if you’re used to WAMP, XAMPP or another Windows-based Apache/MySQL/PHP stack, Postgres is almost certainly not installed by default. There are lots of tutorials on the web for enabling it; it takes a little while, but isn’t too bad.
Secondly, phpPgAdmin sucks if you’re used to phpMyAdmin. The export options are limited, and when an import fails because memory is too low (or the import is too big), the errors are far from helpful. I spent a long time working out the best way to import the data onto my local dev system, trying pretty much every variation of a data dump from the live server, before realising that WAMP can use command-line tools (yes, I’m an idiot). Run cmd, cd to the bin directory of your Postgres installation, and you can run psql just fine.
This matters because all of these web-based interfaces are terrible at handling big exports and imports. phpPgAdmin would start working, chug away for an hour or so, and then something would interrupt the process and it would fail part way through, either silently or with terrible error messages, and I’d have to start again. At one point I was copying and pasting individual tables from the dump to try to import them one by one: with some tables having 20,000 rows, many of them containing long strings of HTML, my PC’s copy buffer couldn’t cope, let alone the importer. Command-line tools? Imported perfectly first time. The browser-based tools have a major failing: they rely on the browser being able to handle everything the importer outputs, and for a big dump that alone can chew through an insane amount of memory – on top of what’s being used to actually process the data. Worse still, the data going in is just as huge as the data coming out. It’s not going to end well.
Next up was getting all the data into WordPress. I may share some of my code for this as a Gist or something, but here are my top tips:
Right – that’s not even the half of what I’ve learned on this project, but hopefully it’s a start if you’re looking at importing a big site from a proprietary system into WordPress!