People versus search engines

It seems that search engines are an immutable fact of early-twenty-first-century existence. We can’t escape them in any immediate sense, and cannot believe they could ever disappear. (I recall one instance on the Whirlpool forums where a user thought their ISP’s international link must be down because they couldn’t access Google. This was one of the very few times Google had actually dropped off the face of the planet, for about twenty minutes. That Google itself might be at fault was simply outside the realm of possibility.)

Yet, increasingly, our surfing habits are defined by a bizarre social concept that certainly seems to be shaping acquisitions and web-two-point-oh-bubblism: websites serve users by connecting them with one another, not on the basis of their knowing what they want, but in an a priori manner whereby degrees of separation (MySpace) or user-supplied already-knowns (LiveJournal, Xanga, etc.) define connectedness and displayed content.

Search is no longer the macro-scale, inter-site killer app, but an intra-site facility applied to a microcosm, often based on “transparent” technology that has, on the basis of known knowns (in the words of a certain Rumsfeld), already done some of the hard work for users (I should say people, but don’t out of habit: it is an industry hazard) without actually asking them anything. This is where location- and organisation-based matching (cf. MySpace, Facebook, etc.) comes in.

But none of this data is intel­li­gently search­able by generic engines.

None of this data is available for indexing by search engines, because it doesn’t abide by any defined semantics. (MySpace is the worst case: horribly marked up, it is technically a doing-everything-wrong-with-the-web entity.) There is not, for example, any widespread use of microformats (hCard, etc.) for defining contact details in any common sense. Yet these things are searchable within a given website.
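To make the microformats point concrete, here is a rough sketch in Python of what a generic indexer could do if profile pages did use hCard class names. The sample markup, the `HCardParser` class and the `extract_hcards` helper are my own illustrations, not anything a real site or search engine ships, and only a handful of hCard properties are handled.

```python
from html.parser import HTMLParser

# A small subset of hCard property class names; the full microformat defines more.
HCARD_PROPERTIES = {"fn", "org", "locality", "tel"}


class HCardParser(HTMLParser):
    """Collects hCard properties from elements nested inside class="vcard".

    Assumes well-nested markup with no void elements (<br>, <img>) inside
    the vcard, which is enough for an illustration.
    """

    def __init__(self):
        super().__init__()
        self.cards = []     # one dict per vcard found
        self._depth = 0     # nesting depth inside the current vcard (0 = outside)
        self._field = None  # property whose text is currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth == 0:
            if "vcard" in classes:
                self.cards.append({})
                self._depth = 1
            return
        self._depth += 1
        for name in classes:
            if name in HCARD_PROPERTIES:
                self._field = name

    def handle_endtag(self, tag):
        if self._depth > 0:
            self._depth -= 1
            self._field = None

    def handle_data(self, data):
        if self._depth > 0 and self._field and data.strip():
            self.cards[-1][self._field] = data.strip()


def extract_hcards(html):
    """Return a list of dicts, one per hCard found in the markup."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


# Hypothetical profile markup: the first block uses hCard, the second does not.
SAMPLE = """
<div class="vcard">
  <span class="fn">Jane Citizen</span>
  <span class="org">Example Pty Ltd</span>
  <span class="adr"><span class="locality">Sydney</span></span>
</div>
<div class="profile"><b>Jane Citizen</b> works at Example Pty Ltd</div>
"""

if __name__ == "__main__":
    print(extract_hcards(SAMPLE))
```

The second `div` carries the same information for a human reader, but nothing in it tells a machine which words are a name and which are an employer, so the sketch finds only one card.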

And, what’s more, these things are searchable with great precision within (social networking) sites. This is because of very well-defined internal semantics (not the “semantic web”, but internal data structures) and an enforced obedience to these structures that was never a part of pre-SocNet sites.
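A toy comparison, sketched in Python, of why enforced internal structure gives that precision. The `Profile` fields, the sample data and both search functions are invented for illustration; I have no idea what any real site’s schema looks like.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical schema: three enforced fields plus one freeform blob.
    name: str
    city: str
    age: int
    about: str


PROFILES = [
    Profile("alex", "Sydney", 24, "Moved here from Melbourne, love live music"),
    Profile("sam", "Melbourne", 31, "Photographer, often in Sydney for work"),
    Profile("kim", "Sydney", 42, "Gardening and old films"),
]


def keyword_search(profiles, term):
    """Roughly what an outside engine sees: one undifferentiated blob of page text."""
    term = term.lower()
    return [
        p.name
        for p in profiles
        if term in f"{p.name} {p.city} {p.age} {p.about}".lower()
    ]


def field_query(profiles, city, min_age, max_age):
    """What the site can do internally: match on typed, enforced fields."""
    return [
        p.name
        for p in profiles
        if p.city == city and min_age <= p.age <= max_age
    ]


if __name__ == "__main__":
    print(keyword_search(PROFILES, "sydney"))      # matches anyone who mentions Sydney
    print(field_query(PROFILES, "Sydney", 18, 30))  # matches only on the city field
```

The keyword search returns all three profiles, including the Melbourne photographer who merely mentions Sydney; the field query returns only the one person who actually lives there and is in the age range.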

SocNet platforms are radically different from web 1.0 systems in that they are (ironically) vastly more constricting. As “web 1.0” I would cite GeoCities and free web hosting services, portals, and all-things-to-all-people content networks. Now we’ve got blogs (precisely defined websites), MySpace (chiefly SocNet profiles with bits on the fringes common to the users, and now with enough impetus to appear unstoppable), Flickr (web hosting, both free and fee-for-service that people actually pay for, precisely defined as photo hosting), and, strangely, a portal (Yahoo!) still on top of the Alexa 500 rankings: a portal that owns both Flickr and GeoCities, but has changed the model of the latter to place greater emphasis on fee-for-service hosting. But I digress into strategy. The point is rather the way social data is stored.

Flickr is metadata-rich. It uses a well-defined system based on EXIF, intrinsic semantics (title, description, and tags that get used properly) and extrinsic semantics (groups, pools, etc.). Facebook, by contrast, doesn’t bother to make such things clear. (I want Facebook to flop, by the way, because it annoys me, so don’t expect nice things to be said about it. It’s a poor closed-system imitator, albeit with a stupidly effective advertising model that everyone else should be wishing they had come up with first, but haven’t seen in order to copy, because it’s a closed system (or used to be), exclusive in scope. Which makes it very effective SocNet/Web 2.0, by my own definition, so I don’t really have a basis for complaint.)

Flickr profiles, unlike those on ‘pure’ SocNet sites (MySpace, Facebook), permit anonymity but allow disclosure of as much as is desired; at any rate, profiles are not the purpose of the site. MySpace’s and Facebook’s raison d’être is profiles. (Well, that and being a cash-cow marketing tool for the *R**IA’s of the world.) Accordingly, their profiles have very definite semantics even whilst the rest of the site may not (I speak of MySpace more, here). MySpace gives core “Details” profile info individual fields, whilst allowing diverse “Interests & Personality” information in freeform textareas that are designed to entice users into participation (and, possibly, to aid fuzzier searches, but mostly I think it’s just compelling content, as there is no immediately obvious way to search that data).

“Inter­ests & Per­son­al­ity”, along with blog con­tent, seems to be the only freeform con­tributed mate­r­ial avail­able on the site. Want music or a video with your pro­file? You’ve got to browse to the band’s site, load the player (no go in Opera with Flash at the minute, it seems), and then select “Add” on the track. They (yeah, it’s kinda big-brotherish) know exactly what song you chose, what band it’s from, what genre, etc. — that is to say, unam­bigu­ously and cer­tainly beyond a probably-common song title. This isn’t an upload-yourself-and-we’ll-manage-rights kind of thing. The offi­cial­ity gives that inter­nal data struc­ture that much more depth: but, again, the point is that the data is inter­nal and not open.

This, it seems, is the defining quality of SocNet. That’s what makes the ideas of open federation advocated by Google Talk earlier this year so bizarre for the rest of us. We don’t particularly care, because closed systems allow innovation (we can define new data for ourselves to work with) and/or extensibility that isn’t possible on an open platform (if, say, not all federated partners agree to a spec extension: consider Google Talk’s own Jabber base and the proprietary VoIP on top of that). Openness is in Google’s interests, because its core business (search) is so dependent on things being open. But real people want services that work, not services that push them to another site. I’ve never trusted sites that bounce me off to Google for their own site search, even if it’s one of those crappy co-branded things. It doesn’t make sense. Why would you make someone inspect your website from an inferior perspective when all the information is stored in a database, and more semantically meaningful search is possible only from the inside?

Google won’t deal with your internal search needs. It’s not designed to. It does a great job of dealing with publicly indexed materials, quite apart from SocNet services. SocNet sites thrive on and are empowered by strong intrinsic semantics that make clever profile-based (or UGC-based) search possible, which builds loyalty and so on in a way foreign to informational websites. SocNet is experiential and (surprise, surprise) social: it doesn’t have to be about anything.

Content was deposed as king sometime in the middle of the first decade of the twenty-first century, and with that regime change his deputy, Search, was also shuffled to a somewhat less prominent position. Somewhere out of sight, Search’s identical twin, Query, is the real power behind the throne: it uses unindexed data and makes clever links to bring people closer together in a way that traditional search engines never even envisaged.

Web design in schools

Still… teach­ing… WYSIWYG design prin­ci­ples! My brother is on another com­puter here design­ing some web­page using a word proces­sor in HTML mode, and I’m furtively glanc­ing, wait­ing for the crush­ing moment when he dis­cov­ers that his pretty fonts aren’t going to dis­play like that in a real browser.

Accessibility issues aside, people don’t seem to understand that typography on the web doesn’t work like print: the fonts you pick in the editor aren’t guaranteed to be what the reader’s browser displays.

I’d like to go and rant to the teacher who set the project — not because they use redun­dant and dep­re­cated design prac­tices, but sim­ply in response to their role in per­pet­u­at­ing these. Edu­ca­tors have a greater bur­den of respon­si­bil­ity here, being a cat­a­lyst for the prac­tices of tomor­row. Admit­tedly, edu­ca­tion is not the only cat­a­lyst (I think most peo­ple my age who under­stand the notion of the seman­tic web can attest to this!), but that should not dimin­ish its poten­tial role in this.

I argue that, in their role as edu­ca­tors, they have failed — their influ­ence is a wholly neg­a­tive one in this aspect for sev­eral reasons.

Web design in this outmoded form, regardless of the WYSIWYG application used to enact it, is not effective in developing an individual’s design skills.

Note that I don’t speak of web design generally. I think, done properly, it provides an excellent grounding in design in a more flexible frame of mind (thinking in terms of fluid layouts, for example, as opposed to static print layouts). My criticism applies only to the primary use of applications such as FrontPage or Dreamweaver as sole design tools, and more so to word processing and DTP software that can export HTML as a secondary function. Notably, graphic design tools are exempt from this criticism (Photoshop, Illustrator, Fireworks… and to a lesser extent Flash, lesser because it is designed not primarily for producing graphical elements but for implementing them in an interactive and engaging framework): these have value in the development of design skills, even if these skills are not directly applicable in an electronic context.

The notion of markup is for­eign, even whilst the user recog­nises the pur­pose of an appli­ca­tion as being to cre­ate doc­u­ments in a markup lan­guage.

Clearly, such education ignores the core tenet of the technology on which it is based. Given the general pedantry prevalent in computing-related courses (I do not comment on the depth of education, only the nature of that which is given), one would imagine that the fundamental elements, particularly in a “simple”, uncompiled language, would be addressed. Apparently not. Perhaps it was too relevant for consideration?

WYSIWYG cre­ation rejects the notion of sep­a­ra­tion of markup (con­tent), pre­sen­ta­tion and behaviour.

The risks are three-fold.

Firstly, that pro­duc­tion of qual­ity con­tent should be hin­dered by the bun­dled nature of the medium — that is, peo­ple will focus on pre­sen­ta­tion at the expense of con­tent. The seman­tic web frees content-creators from this — their pur­pose is sim­ply that, with lay­out being dic­tated at the pre­sen­ta­tional layer. For a broader exam­ple of this, see gen­eral crit­i­cisms of Pow­er­Point as being a time-wasting and hol­low pre­sen­ta­tion form.

Sec­ondly, that the con­tent should be bound to pre­sen­ta­tion, and its longevity would be com­pro­mised by this link. This is a well-documented risk in rela­tion to the seman­tic web, and one of the core rea­sons com­monly given in sup­port of this. Ample evi­dence sup­port­ing this exists, so I won’t elab­o­rate further.

A third risk is the gen­eral acces­si­bil­ity of infor­ma­tion — also well doc­u­mented. The cre­ation of qual­ity con­tent is still pos­si­ble, but if this con­tent is acces­si­ble to no-one due to usabil­ity bar­ri­ers, it is redundant.
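For the curious, the bundling of content and presentation is easy enough to spot mechanically. Below is a crude Python sketch that counts presentational tags and attributes in a page; the lists of what counts as “presentational” and the two sample snippets are my own assumptions, chosen to resemble typical word-processor output rather than taken from any particular tool.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed, non-exhaustive lists of markup that carries presentation rather than meaning.
PRESENTATIONAL_TAGS = {"font", "center"}
PRESENTATIONAL_ATTRS = {"style", "align", "bgcolor", "color", "face"}


class PresentationAudit(HTMLParser):
    """Counts presentational tags and attributes found in the markup."""

    def __init__(self):
        super().__init__()
        self.found = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag in PRESENTATIONAL_TAGS:
            self.found[f"<{tag}>"] += 1
        for name, _value in attrs:
            if name in PRESENTATIONAL_ATTRS:
                self.found[f"{name}="] += 1


def audit(html):
    """Return a dict mapping each presentational construct to its count."""
    parser = PresentationAudit()
    parser.feed(html)
    parser.close()
    return dict(parser.found)


# Hypothetical samples: one in the WYSIWYG-export style, one with presentation left to CSS.
WYSIWYG = '<p align="center"><font face="Comic Sans MS" color="#ff0000">Welcome</font></p>'
SEMANTIC = '<h1 class="title">Welcome</h1>'

if __name__ == "__main__":
    print(audit(WYSIWYG))
    print(audit(SEMANTIC))
```

The first snippet trips four counters for a single word of content; the second trips none, because everything about how the heading looks lives in a stylesheet somewhere else.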

Pro­mo­tion of WYSIWYG devel­op­ment meth­ods is counter-productive in all areas — con­tent pro­duc­tion, gen­eral qual­ity of design, and cre­ation of an accessibility/usability cul­ture — and should cease imme­di­ately in all edu­ca­tional spheres presently sup­port­ing this practice.

*steps off soap box*

CeBIT Australia 2005

Attended this one this afternoon. It was rather impressive, with over 600 exhibitors. I was surprised by the prevalence of open-source businesses there… open source and VoIP were probably the two emergent technologies this year. There were also the usual business CRM/“knowledge” drones, but they generally kept to themselves, so that was okay.

Aside from that, var­i­ous con­tent man­age­ment sys­tems were out in force — includ­ing one or two that appar­ently haven’t caught onto the seman­tic web yet. Most notably, one was demo­ing their CMS on a mas­sive plasma screen with bla­tantly obvi­ous char­ac­ter encod­ing errors every­where (you know, char­ac­ters dis­play­ing as black dia­monds with ques­tion marks). I quizzed one of them about it and he basi­cally said that it was some­thing to do with their not demo­ing it on a live site. Bull.
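I can’t know what was actually wrong with their setup, but the black-diamond symptom is usually what you get when text saved in one encoding is read as another. A minimal Python sketch of one common cause, Latin-1 bytes being decoded as UTF-8 (the sample string and function names are mine):

```python
def garble(text):
    """Encode as Latin-1, then wrongly decode as UTF-8.

    Invalid byte sequences become U+FFFD, the replacement character that
    browsers typically draw as a black diamond with a question mark.
    """
    latin1_bytes = text.encode("latin-1")
    return latin1_bytes.decode("utf-8", errors="replace")


def round_trip(text):
    """Decode with the same encoding that was used to encode: nothing is lost."""
    return text.encode("latin-1").decode("latin-1")


if __name__ == "__main__":
    original = "naïve café"
    print(garble(original))      # accented letters replaced by U+FFFD
    print(round_trip(original))  # unchanged
```

Whether the page is live or not has nothing to do with it: the fix is to declare the encoding the content was actually saved in.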

If you can’t get that sort of stuff right at a trade show, when you’re try­ing to sell prod­ucts, what are the chances of actu­ally being able to deliver?

Another provider, Netcat.biz, seemed to have the right idea in terms of semantics, at least in their presentation at CeBIT, but a quick check of their own website reveals the lack of a DOCTYPE, despite their use of CSS for presentation and a not-too-horrible (or relatively easy to patch up) markup situation.
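The check itself is trivial. Without a DOCTYPE, browsers fall back to quirks mode, so it matters more than it looks. A rough Python sketch, assuming the page source is already in a string (the `has_doctype` name and the regular expression are mine, and the pattern only tolerates an XML declaration, comments and whitespace before the declaration):

```python
import re

# Optional XML declaration, then any number of comments, then <!DOCTYPE html ...
DOCTYPE_RE = re.compile(
    r"\s*(?:<\?xml.*?\?>\s*)?(?:<!--.*?-->\s*)*<!DOCTYPE\s+html\b",
    re.IGNORECASE | re.DOTALL,
)


def has_doctype(html):
    """Return True if the document starts with an HTML DOCTYPE declaration."""
    # Strip a leading byte-order mark, if any, before matching.
    return DOCTYPE_RE.match(html.lstrip("\ufeff")) is not None


if __name__ == "__main__":
    strict = (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html>'
    )
    print(has_doctype(strict))          # True
    print(has_doctype("<html><head>"))  # False
```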

There’s still clearly a market for truly accessible content management, although I doubt many business customers would actually know the difference. Unfortunately, that’s the reality of it, and possibly why neither of these two companies (there were other CMS exhibitors, but those two stood out as most ‘impressive’, regardless of the quality of their solutions) has bothered to develop such a product.

Sigh.

Whilst I’m on a bit of a rant, the exhi­bi­tion had a bla­tantly sex­ist cul­ture hap­pen­ing. ATI and Sap­phire were prob­a­bly the worst offend­ers, employ­ing lycra body­suits to attract atten­tion, but they were by no means the only ones. Short skirts were the norm for many female sales­peo­ple at the event — one has to won­der when the IT indus­try is going to grow up.

In all, however, the event was impressive: signage and event displays were wonderfully over-the-top; exhibitors, for the most part, knew what they were talking about; and free coffee abounded!