Josh (the blog)

I’ve delivered simple, clear and easy-to-use services for 20 years, for startups, scaleups and government. I write about the nerdy bits here.


@joahua

Ansearch style indexing fixed

I’ve been advised that Ansearch’s style indexing bug has been ironed out, and will wash out as the next round of indexing comes into service. Chances are this hasn’t really affected anyone, as it’s unlikely a substantial number of the top 500,000 sites according to Ansearch’s statistics would have firstly picked up on it, and secondly bothered to do something meaningful about it.

But hey, it’s still nice to know, right?

Why’s (Poignant) Guide to Ruby

So I was looking for some light reading on the Ruby language, just because it (or the Rails framework that builds on it) seems cool, and because the weeks before the biggest exams I’ve ever sat are always a great time to acquire new and completely irrelevant skills. (Be assured, or disappointed, to know that I’m saving actually learning a language until afterwards!)

In my searching, I stumbled across a Creative Commons licensed work, entitled Why’s (Poignant) Guide to Ruby. It’s a work of moderate genius and… well… poignancy. Tis most poignant indeed. On a scale of one to poignant, it’s… towards poignant.

If you’re looking to learn something about Ruby and don’t mind occasional, rather amusing, diversions, I’d say (at twelve pages into the PDF version) it’s well worth a read.

Selling an audience short?

Or, What Josh Said About Ansearch That Was Irrelevant to Most Users.

Dean Jones responded to my Ansearch Answers post with the following:

All in all I feel [the post is] a fair representation of the so called facts, but I stand by my recent email… namely that simply reviewing us on technical issues that most people either

  1. wouldn’t have discovered, or;
  2. would not likely care about,

is selling your audience short.

I’m inclined to disagree, and just wanted to quickly post to say that. I like to think I understand the ‘audience’ here fairly well. They’re either people with (web-)geek tendencies, and are hence interested in any analysis and criticism I can deliver on the technical aspects of products, etc., or (and this category is completely unrelated to the former) students and humanities-focussed people reading various content I’ve published here — ranging from stage plots to a short story to an essay on the nature and effects of the digital divide.

Most guests in the latter category are just that: guests. They generally discover this content via a search engine, read what they want, and leave. Over 80% of my visitors stick around for one minute or less, presumably because they find what they need quickly, or discover that the content isn’t what they were looking for.

The “regular” audience/participants, however, are not that. I don’t think you’re all geeks, but this blog leans towards that style of content, and you match that accordingly. You don’t come here looking for product recommendations (the one exception to that being someone who viewed my post on Asterisk/VoIP, and asked me what my experiences with it had been some time later: to which I replied, we haven’t bothered, as we moved into a house with a Commander system preinstalled!). You come here, I think, for the quality of writing, for rants, for occasionally insightful (I hope) comment on various facets of things I deem interesting.

This is a blog. This is not a newspaper, though it is possible that search engines, ironically, are changing the clout of this medium to something similar. The distinction between newspaper and blog becomes blurred with posts like the one that inspired this, because of the form it was written in. It is important, however, to remember the audience.

People don’t come here to shop for search engines. We might be interested in how they work, what they do, and what the potential benefits and failings of each one are, but ultimately it doesn’t affect anyone’s choice in the real world. Similarly, investors are unlikely to come here, scoping out Ansearch’s offering before buying into parent company Optum. And, if they did, my concluding remarks were positive: I genuinely believe the story balanced out in their favour more than anything else. If I overplayed the significance of a small flaw that could potentially be abused, my apologies. I don’t, however, regret including it in there at all, because I think it’s something my audience is interested in.

As you stated in an earlier email… “I’m not 100% sure as to how one should go about reviewing a search engine.” Here’s a tip. like Google, Yahoo, MSN… we are a business. For us to stay in business we need to generate revenue.

To do this we need to get more people to our SE, to get them to come back more often, and to, through their usage (CPM, CPC etc…) generate revenue.

To achieve this we need to provide a search service that the user finds useful. Given our rapid growth over the past months in UV’s and revenue, I would say we are doing OK.

Unfortunately for Ansearch and anyone else who wants to use this as an advertising space, we don’t particularly care if you’re making money. It’s good to hear they’ve grown: if their evolving product is anything to go by, they deserve it. But metrics such as revenue and Unique Visitors mean little to this audience, even if it’s what investors want to find out all about.

I think this is a fair assessment of this site’s ‘audience’ (the important ‘audience’, for me, being the minority who don’t arrive through search engines but subscribe by RSS and come back regularly), though, as always, your role is not restricted to that. You are participants. In light of this, I’d invite comment and discussion on this post as to your role as you understand it. It’s possible I’ve got this all wrong… but I doubt it.

Ansearch answers

All had been quiet on the Ansearch front as I awaited a response from Ansearch CEO Dean Jones, promised a hair under two weeks ago when I alluded to an earlier analysis/criticism I’d written when talking about the state of play with Australian search engines, specifically referring to the then-newcomer Ansearch.

Dean picked up my post via Technorati, a blog search engine that uses RPC update services to track what people are talking about in real-time. I was suitably impressed by this diligence and apparent desire to hear what the market has to say about their product: could this be the same company whose birth was so marred by a spate of cyber-squatting, in what Dean Jones was reported to have described as a fit of “youthful exuberance”?

Apparently so. Ansearch’s beginnings, though marred by dubious practices, received praise from various quarters of the mainstream press, or at least from those quarters not controlled by News Corp, whose domains had come under threat. However, the Internet community responded quietly, and those voices that were heard were mostly of disdain at Ansearch’s domain practices.

Strangely enough, my original post wasn’t about any of that. I hadn’t heard of Ansearch until I read an article on them in the SMH — an article which reads a little too much like a rehashed press release for my liking: the telltale sign is in the closing sentence “Ansearch is the search engine division of Optum Ltd.” — if it were filed in the Business section of their paper, I’d understand, but it wasn’t.

I wandered over to their site, played around for a bit, and decided their offering was mediocre. In hindsight, it probably didn’t help that I wasn’t shopping for anything in particular (according to a ZDNet article, “In the short term [Ansearch] is focusing very heavily on the commercial end of the market.”), but at that point in time, I also don’t think they’d tuned their listings particularly well, as a search for DashLite turned up my WordPress hack over commercial listings for the actual Dashlite brand whose name I inadvertently used.

I say “at that point in time”, because it appears to have substantially improved since, as per Jones’ claim: “Much has changed since your first article on us some 6 months ago.”

Much improved, it seems, on several fronts. Their core offering has shaped up nicely, and some facets of my initial complaints regarding accessibility have been met. Their ancillary product offerings seem to have developed nicely: Ansearch CEO Jones claims “Each of [our properties] goes through up to 7 stages ranging from an initial, simple SERP/Directory style page through to a more involved service, mini portal, search tool, etcetera.” He went on to say that these ancillary properties (such as http://www.picsearch.com.au/, http://www.videosearch.com.au/, http://www.thefreedictionary.com.au/ and http://www.messengers.com.au/ amongst several others) are currently being actively separated from the core Ansearch site (he described it as “quarantining”), and the exact direction of a number of these projects would become clear over the coming months, with the appointment of a full time manager of these online properties.

I’m a tad concerned about his description of their strategy with regard to these — he said this would become clear over the months to come, and I’m hanging off two words here: distributed portal. Whilst I can see this as being of value to users (especially for generic, non-brand-specific/legally dubious domains such as jokes.com.au and the ones listed above), it doesn’t seem to fit Ansearch’s core strength as I perceive it: as a commercial portal, and not as another Google. “We are not aiming to be another Google… we don’t have their budget and, to be frank, there are enough people trying to clone them: why build another?”

In fact, Jones suggested that Ansearch’s strengths lie in that it is not the ubiquitous search behemoth, and that its index is “something unique… something faster… [and] against the so called “arms race” of search (my SE has more links than yours etc…)”. I’d agree this is indeed a strength, and also a reason for them not to try and be a portal. Australia already has Yahoo! and NineMSN for domestic portals, and I’m struggling to see what Ansearch will do to differentiate themselves in this: but I’m happy to be surprised!

Ansearch apparently holds an index of only 500,000 websites considered by its metrics to be “most popular”. I argued that this was potentially a bad thing as relevant content might lie outside this realm: for example, this website performs well when people search for reviews of the HP 2610 or information about Apache on Ubuntu Linux or ACT files from MP3 players that record audio, but isn’t included in Ansearch’s core index.

Which is perfectly valid for a commercially-focussed site; I just think they could be missing out a little bit. They can leverage my content for their advertising impressions and potential clickthroughs, because they have more valuable content showing up in their listing alongside advertised products. If someone reads my HP 2610 review after having found it in Ansearch, and decides they’d like to buy it and remembers having seen a “Buy HP printers!” ad on Ansearch, they’ll most likely click “back”. It’s abstract, behavioural stuff, but valuable nonetheless.

Whether it’s valuable enough for them to bother is another matter. “We spider our own content… something that over time will be done daily,” says Jones. “Having only 500,000 websites will allow us to index sites more often, and as is the case with the ‘site info’ pages, provide far more info on these pages.” Which is a value-add, and worth preserving. If that’s all resources permit, I think they’re doing the right thing as is. Jones openly admits Ansearch’s index of popularity “has a commercial flavour to it” — and rightly so. Given their much-touted gender and age demographic based search feature, this makes sense.

Their index of popularity seems to be fairly slow-moving. “Monthly we add around 20,000 sites… and take out 20,000.” I’d guess this would be the lowest 20,000 that gets shuffled, and this seems to make sense. One has to wonder whether all the higher-ranking pages can have substantially fresh content month after month, but presumably they do — it’s one of the things the SEO experts have always cried from rooftops.

It was interesting to hear Jones speaking about these people, too: amusing, even! Web developers the world over often join in speculation as to what exactly makes search engines tick, such that we can boost our clients’ (or employers’) websites’ performance. It seems the reverse is also true: search engines all over the world similarly speculate as to what those horrible developers are doing to screw with their indexes day in and day out!

I don’t say this in jest, and I believe they’re right to complain: “The larger SE’s are having a very tough time coming up with clever ways to index content to counter SEO… only to have SEO’rs quickly find ways around it. Cat and mouse…” I think “counter SEO” was a poor choice of words, given that relevant content should hopefully still be rewarded, but his point stands.

Just as interesting is Ansearch’s strategy to avoid falling prey to dodgy SEO tactics:

By only indexing the root page, we remove almost all SEO trickery. This works in 2 ways. Firstly, people rarely put spam on their home page — that is, doorway pages, link farms, etc. usually reside away from the main index… and, secondly, it deletes multiple results from the same website. It also stops the site owner/webmaster from saying they are relevant to 100 or 1000 keywords or phrases.
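To make those two effects concrete, here is a rough sketch of the idea in Python. It is my own illustration, not Ansearch’s code: the function names and the example URLs are made up, and the only assumption is that every crawled URL gets collapsed to its site’s root page before it goes into the index.

```python
from urllib.parse import urlsplit


def root_url(url):
    """Collapse any URL to the root page of its host (hypothetical helper)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"


def root_only_index(crawled_urls):
    """Keep one entry per site, its root page, in first-seen order."""
    seen = set()
    roots = []
    for url in crawled_urls:
        root = root_url(url)
        if root not in seen:
            seen.add(root)
            roots.append(root)
    return roots


# Made-up crawl: doorway pages and link farms collapse into the home page.
crawl = [
    "http://example.com.au/doorway/page1.html",
    "http://example.com.au/",
    "http://EXAMPLE.com.au/links/farm.html",
    "http://other.example/",
]
print(root_only_index(crawl))
```

Doorway pages and link farms never get an entry of their own, and each site can appear in the results only once, which is the behaviour Jones describes.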

Kids, we just found a new argument against clients who love their splash pages!

Content rich front pages aren’t, however, an absolute solution (at least, not in Ansearch’s index). According to Jones, Ansearch’s policy of “ranking sites in true usage popularity, both on and offsite” is “SEO proof… or at the very least, extremely resistant.” I’d agree it’s a powerful metric, but my reservations above still stand.

One caveat of Ansearch’s algorithm that appears potentially exploitable is its failure to exclude content in the

If I dig a very deep hole, where I go to stop?

Wow. This Google Maps stupidity has to reach its peak soon! Found this one via CNet News.com: If I dig a very deep hole, where I go to stop? [sic]

Sydney-ites end up in the Atlantic somewhere. New Zealanders land just on the far western edge of Europe! (More specifically, the spot opposite Christchurch falls roughly on the northern border of Spain.)
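The arithmetic behind the toy is simple enough to sketch: the point on the opposite side of the globe has the latitude flipped and the longitude shifted by 180 degrees. The snippet below is my own minimal version in Python, not the site’s code, and the city coordinates are my approximations rather than anything taken from the map.

```python
def antipode(lat, lon):
    """Return the point diametrically opposite (lat, lon), in degrees."""
    anti_lat = -lat
    # Shift by 180 degrees while staying inside the -180..180 range.
    anti_lon = lon - 180 if lon > 0 else lon + 180
    return round(anti_lat, 2), round(anti_lon, 2)


# Approximate coordinates (assumed): south and west are negative.
sydney = (-33.87, 151.21)
christchurch = (-43.53, 172.64)

print(antipode(*sydney))        # mid-Atlantic, around 34 N, 29 W
print(antipode(*christchurch))  # north-western Spain, around 44 N, 7 W
```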

Proudly brought to you with 36 days remaining until the HSC!