Josh (the blog)

I’ve delivered simple, clear and easy-to-use services for 20 years, for startups, scaleups and government. I write about the nerdy bits here.



People versus search engines

It seems that search engines are an immutable fact of early twenty-first-century existence. We can’t escape them in any immediate sense, and cannot believe they could ever disappear. (I recall one instance on the Whirlpool forums where a user thought their ISP’s international link must be down because they couldn’t access Google. This was one of the very few times Google had actually dropped off the face of the planet, for about twenty minutes. That Google itself might be down was simply outside the realm of possibility.)

Yet, increasingly, our surfing habits are defined by a bizarre social concept that certainly seems to be shaping acquisitions and web-two-point-oh-bubblism: websites serve users by connecting them with one another, not on the basis of them knowing what they want, but in an a priori manner whereby degrees of separation (MySpace) or user-supplied already-knowns (LiveJournal, Xanga, etc.) define connectedness and displayed content.

Search is no longer the macro, inter-site killer app, but an intra-site facility applied to a microcosm. It is often based on “transparent” technology that has, on the basis of known knowns (in the words of a certain Rumsfeld), already done some of the hard work for users without actually asking them anything. (I should say people, but don’t out of habit: it is an industry hazard.) This is where location- and organisation-based matching (cf. MySpace, Facebook, etc.) comes in.

But none of this data is intelligently searchable by generic engines.

None of this data is available for indexing by search engines, because it doesn’t abide by any defined semantics. (This is especially true of MySpace, horribly marked-up, technically doing-everything-wrong-with-the-web entity that it is.) There is not, for example, any widespread use of microformats (hCard, etc.) for defining contact details in a common way. Yet these things are searchable within a given website.
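To make the microformats point concrete, here is a rough sketch of what an outside indexer could do if a profile page did mark up contact details with hCard class names. The parser, the sample markup and the handful of properties it looks for are my own illustration (only a small subset of hCard, and nothing any real search engine is confirmed to run); it uses only Python's standard library.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Collect text from elements whose class names are hCard properties."""

    # A small, illustrative subset of hCard property class names.
    PROPERTIES = {"fn", "email", "tel", "locality"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = None  # property currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.PROPERTIES:
                self._current = name

    def handle_data(self, data):
        # Store the first non-blank text found inside a property element.
        if self._current and data.strip():
            self.card[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def parse_hcard(html):
    """Return a dict of the hCard properties found in an HTML fragment."""
    parser = HCardParser()
    parser.feed(html)
    return parser.card


# Hypothetical profile markup, invented for this example.
sample = """
<div class="vcard">
  <span class="fn">Jane Citizen</span>
  <a class="email" href="mailto:jane@example.com">jane@example.com</a>
  <span class="locality">Sydney</span>
</div>
"""

card = parse_hcard(sample)
print(card)
```

Without agreed class names like these, the same page is just a blob of text to a crawler, which is the gap the paragraph above describes.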

And, what’s more, these things are searchable with great precision within (social networking) sites. This is because of a very well defined internal semantic (not the “semantic web”, but internal data structures) and an enforced obedience to these structures that was never a part of pre-SocNet sites.

SocNet platforms are radically different from web 1.0 systems in that they are (ironically) vastly more constricting. As “web 1.0” I would cite Geocities and free web hosting services, portals, and all-things-to-all-people content networks. Now, we’ve got blogs (precisely defined websites), MySpace (chiefly SocNet profiles with bits on the fringes common to the users, and now with enough impetus to appear unstoppable), Flickr (free web hosting, plus fee-for-service hosting that people actually pay for, precisely defined as photo hosting), and, strangely, a portal (Yahoo!) still on top of the Alexa 500 rankings. That portal owns both Flickr and Geocities, but has changed the model of the latter to place greater emphasis on fee-for-service hosting. But I digress into strategy: the point is not that, but the way social data is stored.

Flickr is meta-data rich. It uses a well defined system based on EXIF, intrinsic semantics (title, description, tags) and extrinsic semantics (groups, pools, etc.). Its tags get used properly, unlike on Facebook, which doesn’t bother to make such things clear. (I want Facebook to flop, by the way, because it annoys me, so don’t expect nice things to be said about it. It’s a poor closed-system imitator, albeit with a stupidly effective advertising model everyone else should be wishing they had come up with first, but haven’t seen in order to copy because it’s a closed system, or used to be, exclusive in scope. Which makes it very effective SocNet/Web 2.0, by my own definition, so I don’t really have a basis for complaint.)

Flickr profiles, unlike those on ‘pure’ SocNet sites (MySpace, Facebook), permit anonymity but allow disclosure of as much as is desired: at any rate, profiles are not the purpose of the site. MySpace/Facebook’s raison d’être is profiles. (Well, that and being the cash-cow marketing tool of the RIAs of the world.) Accordingly, their profiles have very definite semantics even whilst the rest of the site may not (I speak of MySpace more, here). MySpace gives core “Details” profile info individual fields, whilst allowing diverse “Interests & Personality” information in freeform textareas that are designed to entice users into participation (and, possibly, to aid fuzzier searches, but mostly I think it’s just compelling content, as there is no immediately obvious way to search that data).
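The split between individual “Details” fields and freeform text can be sketched in a few lines. Everything here is assumed for illustration: the `Profile` class, its field names and the sample people are hypothetical, not MySpace's actual schema. The point is only that a query against a dedicated field is precise, while a keyword match over all the text (roughly what a generic engine has to work with) is not.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    city: str       # structured "Details"-style field (hypothetical)
    school: str     # structured "Details"-style field (hypothetical)
    interests: str  # freeform "Interests & Personality"-style text


profiles = [
    Profile("alice", "Sydney", "UNSW", "I love surfing and bands from Perth"),
    Profile("bob", "Perth", "UWA", "Mostly cricket. Visited Sydney once."),
    Profile("carol", "Sydney", "UTS", "Photography, cricket"),
]


def query_by_city(items, city):
    """Exact match on a structured field: 'Sydney' can only mean the city field."""
    return [p.name for p in items if p.city == city]


def keyword_search(items, word):
    """Naive text match over everything on the profile, fields and freeform alike."""
    word = word.lower()
    return [
        p.name
        for p in items
        if word in f"{p.city} {p.school} {p.interests}".lower()
    ]


print(query_by_city(profiles, "Sydney"))   # ['alice', 'carol']
print(keyword_search(profiles, "sydney"))  # ['alice', 'bob', 'carol']
```

Bob turns up in the keyword search only because he once visited Sydney, which is exactly the kind of noise the internal field-based query avoids.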

“Interests & Personality”, along with blog content, seems to be the only freeform contributed material available on the site. Want music or a video with your profile? You’ve got to browse to the band’s site, load the player (no go in Opera with Flash at the minute, it seems), and then select “Add” on the track. They (yeah, it’s kinda big-brotherish) know exactly what song you chose, what band it’s from, what genre, etc. — that is to say, unambiguously and certainly beyond a probably-common song title. This isn’t an upload-yourself-and-we’ll-manage-rights kind of thing. The officiality gives that internal data structure that much more depth: but, again, the point is that the data is internal and not open.
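As a sketch of why picking a track through “Add” is less ambiguous than typing a song title, imagine the site storing a reference to a catalogue record rather than a string. The table names, columns and sample rows below are invented for the example (I have no knowledge of how MySpace actually stores this); it runs against an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tracks (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        band  TEXT NOT NULL,
        genre TEXT NOT NULL
    );
    CREATE TABLE profile_tracks (
        username TEXT NOT NULL,
        track_id INTEGER NOT NULL REFERENCES tracks(id)
    );
    -- Two different songs that happen to share a common title.
    INSERT INTO tracks (id, title, band, genre) VALUES
        (1, 'Home', 'Band A', 'folk'),
        (2, 'Home', 'Band B', 'metal');
    -- The profile stores the catalogue id, not the title.
    INSERT INTO profile_tracks (username, track_id) VALUES ('alice', 2);
    """
)


def profile_track(username):
    """Resolve a profile's track to its full catalogue record."""
    return conn.execute(
        """
        SELECT t.title, t.band, t.genre
        FROM profile_tracks AS p
        JOIN tracks AS t ON t.id = p.track_id
        WHERE p.username = ?
        """,
        (username,),
    ).fetchone()


def tracks_titled(title):
    """Count catalogue entries sharing a title, to show the ambiguity."""
    return conn.execute(
        "SELECT COUNT(*) FROM tracks WHERE title = ?", (title,)
    ).fetchone()[0]


print(tracks_titled("Home"))   # 2
print(profile_track("alice"))  # ('Home', 'Band B', 'metal')
```

The title alone matches two songs; the stored id resolves to exactly one band and genre. None of that is visible to anyone outside the site.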

This, it seems, is the defining quality of SocNet. That’s what makes the ideas of open federation advocated by Google Talk earlier this year so bizarre for the rest of us. We don’t particularly care, because closed systems mean innovation (because we can define new data for ourselves to work with) and/or extensibility that isn’t possible in an open platform (if, for example, not all federated partners agree to a spec extension — take, for example, Google Talk’s own Jabber base and proprietary VoIP on top of that). Openness is in Google’s interests, because it’s so dependent on things being open for its core business (search). But real people want services that work, not services that push them to another site. I’ve never trusted sites that bounce me off to Google for their site’s search, even if it’s one of those crappy co-branded things. It doesn’t make sense. Why would you make someone inspect your website from an inferior perspective when all the information is stored in a database, with the possibility of more semantically meaningful search open internally only?

Google won’t deal with your internal search needs. It’s not designed to. It does a great job of dealing with publicly indexed materials completely aside from SocNet services. SocNet sites thrive on and are empowered by strong intrinsic semantics that make clever profile-based (or UGC-based) search possible, which builds loyalty etcetera in a way foreign to informational websites. SocNet is experiential and (surprise surprise) social — it doesn’t have to be about anything.

Content was deposed as king sometime in the middle of the first decade of the twenty-first century, and with that regime change his deputy, Search, was also shuffled to a somewhat less prominent position. Somewhere out of sight, Search’s identical twin, Query, is the real power behind the throne: it uses unindexed data and makes clever links to bring people closer together in a way that traditional search engines had never even envisaged.