Tagging the old post back catalogue with WordPress

I just finished adding tags to each of the 1200+ posts on this blog. Needless to say, I enlisted help.

Calais Archive Tagger, a free WordPress plugin, did most of the heavy lifting for me. It connects to a web service called OpenCalais, run by Thomson Reuters (so nothing dodgy is going on with your data; they’re a pretty big publishing conglomerate!). The biggest problem is that OpenCalais is geared towards establishing commonalities between different data sets, so it paid a disproportionate amount of attention to proper nouns. When product names were incomplete (for example, my old Pentax SP500 camera, which I often just referred to as “SP500”), it would match tags to other products that had a more complete title. Which would be excellent if that were, in fact, what I was talking about.

I ended up sifting through the 2500 or more tags it created, deleting about 400 of them and consolidating others.

I’m nowhere near HAPPY with the tags as representative of the content of each post, but, from the ones I’ve scanned, they’re most definitely better than nothing at all. Categories have become less relevant as stronger search capabilities have emerged over the past couple of years, so tags are a great way of enhancing searchable content: it’s not just about relating similar information, it’s about creating a mesh or network of content. This has SEO benefits, but can also function as a barometer of the type and nature of content being discussed. For the record, I don’t think it’s a fantastic barometer for this blog just yet!

One other unexpected thing it did was expose some spam that had found its way into a handful of posts through old WordPress vulnerabilities (I presume pre-2.8 era). There were only three spam links, with probably negligible PageRank effects for anyone.

Not an outage

Google.cn search queries for May 19th at 2:27pm took a bit of a hit, as follows:

Three minutes of national mourning for earthquake victims. It was taken seriously, and was moving in a way that is a little difficult to imagine an analogue for in Australia. I’m being tongue-in-cheek about the cessation of Googling, but I mean it in all seriousness as a broader comment on national displays of stuff. Perhaps that’s unfair, I know, as Australia hasn’t really had any disaster of this magnitude in recent times.

Everyone was outside as traffic stopped to remember and share in the grief of millions. Some things are more important than search.

[Google post via]