Josh (the blog)

Hey there. I’m Josh, a Canberra-based maker of Internets. I don’t update this very often.


@joahua

Tagging an old post back catalogue with WordPress

I just finished adding tags to each of the 1200+ posts on this blog. Needless to say, I enlisted help.

Calais Archive Tagger, a free WordPress plugin, did most of the heavy lifting for me. It connects to a web service called OpenCalais, run by Thomson Reuters (so nothing dodgy is going on with your data; they’re a pretty big publishing conglomerate!). The biggest problem is that OpenCalais is geared towards establishing commonalities between different data sets, so it paid a disproportionate amount of attention to proper nouns. When product names were incomplete (for example, my old Pentax SP500 camera, which I often just referred to as “SP500”), it would match tags to other products that had a more complete title. Which would be excellent if that were, in fact, what I was talking about.

I ended up sifting through the roughly 2500 tags it created, deleting about 400 of them and consolidating others.
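For anyone attempting a similar cleanup, here’s a minimal sketch of how near-duplicate tags could be spotted before merging them. It isn’t what the plugin does, and it isn’t how the tags above were cleaned up. It assumes the tag names have already been exported to a plain Python list, and `normalise` and `group_variants` are hypothetical helper names made up for illustration.

```python
import re
from collections import defaultdict


def normalise(tag):
    """Lower-case a tag and collapse runs of punctuation/whitespace to one space."""
    return re.sub(r"[^a-z0-9]+", " ", tag.lower()).strip()


def group_variants(tags):
    """Return {normalised form: sorted original spellings} for tags with 2+ spellings."""
    groups = defaultdict(set)
    for tag in tags:
        groups[normalise(tag)].add(tag)
    # Only groups with more than one spelling are candidates for consolidation.
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    # Hypothetical sample data, not the real tag list from this blog.
    sample = ["WordPress", "wordpress", "Pentax SP500", "pentax sp500", "Sydney"]
    for key, variants in group_variants(sample).items():
        print(key, "->", variants)
```

This only catches variants that differ in case or punctuation. It won’t help with the “SP500” problem, where a tag is simply attached to the wrong thing; that still needs a human looking at it.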

I’m nowhere near HAPPY with the tags as representative of the content of each post, but, from the ones I’ve scanned, it’s most definitely better than nothing at all. Clearly, categories have become less relevant as stronger search capabilities have emerged over the past couple of years, so tags are a great way of enhancing searchable content. It’s not just about relating similar information; it’s about creating a mesh or network of content. This has SEO benefits, but can also function as a barometer of the type and nature of content being discussed. For the record, I don’t think it’s a fantastic barometer for this blog just yet!

One other unexpected thing it did was expose some spam that had found its way into a handful of posts through old WordPress vulnerabilities (I presume pre-2.8 era). There were only three, with probably negligible PageRank effects for anyone.