
The “Yahoo can’t target traffic on their main pages” meme is wrong

Don Dodge frequently states that Yahoo can’t make money from their traffic because only search gives you a way to effectively target advertising. Michael at TechCrunch recently got sucked into this idea too.
It’s a nice theory - just a pity it’s totally wrong. Yahoo (and other portals) don’t make much money from their traffic because they are obsessed with copying Google’s search-based advertising model. What they should have done was build an advertising system based around personalization - one that shows ads to people based on the links they have previously clicked and the emails they have previously read. That would play to their strengths rather than forcing them to try to catch Google in search traffic.

Personalized advertising systems aren’t new. Amazon Omakase is probably the most widely deployed, but Amazon doesn’t really have the traffic base to make this super-effective, and Omakase is only used to advertise Amazon products. Personalization guru Greg Linden built a similar system of his own for Findory, also based on Amazon advertising inventory.
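To make the idea concrete, here is a minimal sketch of history-based ad selection. It assumes each ad is tagged with keywords and that a user’s clicked links and read emails have already been reduced to short text snippets; the `Ad` type, the tagging and the scoring rule are all hypothetical, and this is not how Omakase or Findory actually work. Each ad is scored by how often its keywords appear in the user’s history, and the best matches are shown.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Ad:
    # Hypothetical ad record: an identifier plus the keywords it is tagged with.
    ad_id: str
    keywords: set[str]


def build_profile(history: list[str]) -> Counter:
    """Count lower-cased words across a user's history (link titles, email subjects)."""
    profile = Counter()
    for entry in history:
        profile.update(word.lower() for word in entry.split())
    return profile


def rank_ads(ads: list[Ad], profile: Counter, top_n: int = 3) -> list[str]:
    """Score ads by keyword overlap with the profile; return the best-scoring IDs."""
    scored = [(sum(profile[k] for k in ad.keywords), ad.ad_id) for ad in ads]
    # Highest score first; ties broken alphabetically so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Never show an ad with no connection at all to the user's history.
    return [ad_id for score, ad_id in scored[:top_n] if score > 0]


history = ["Cheap flights to Tokyo", "Tokyo hotel deals", "Football scores"]
ads = [Ad("travel", {"tokyo", "flights"}), Ad("sport", {"football"}), Ad("cars", {"sedan"})]
print(rank_ads(ads, build_profile(history)))  # ['travel', 'sport']
```

A real system would weight recent activity more heavily and use better topic extraction than splitting on spaces, but the point stands: none of this needs a search query.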

Taxonomy directed folksonomies

A short pointer to a post on my work blog: Taxonomy directed folksonomies.

Google takes on the "Find a Hotel" problem

If search is only 5% solved then searching for a hotel is most definitely in the 95% of unsolved problems. Below is a screenshot of the Google Search results for New York Hotel.

The extent of the problem is clear from the fact that the first organic search result isn’t even in New York.

Microsoft’s Live.com does slightly better:

The “Top local listings for hotel near New York, NY” link takes us to the Live Local map view, which at least presents us with some relevant information.

Google has a similar feature, but it requires a click on the Maps navigation link.

The map view is useful, but isn’t really enough. What if we want to find out more about the hotels, for instance?

That’s where Google Base comes in. Garret Rodgers from Googling Google discovered a new version of Google Base, which he called the new version of Froogle. I think the real point of it is to improve search performance on structured (and possibly semi-structured) data. Look at the search results for new york hotel:

The new interface lets you view the results on a map, while at the same time refining your search by review, review date, review author, etc. Now this isn’t quite at the “filter by hotels with good views and WiFi” stage yet, but it appears to be headed that way.
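Faceted refinement like this is straightforward once the data is structured. A minimal sketch, assuming each hotel is a dictionary of attributes (the field names and sample records are invented for illustration and are not Google Base’s actual schema): count the values available for each facet, then narrow the result set by whatever facets the user selects.

```python
from collections import Counter

# Hypothetical structured listings; the attribute names are illustrative only.
HOTELS = [
    {"name": "Hotel A", "city": "New York", "rating": 4, "wifi": True},
    {"name": "Hotel B", "city": "New York", "rating": 3, "wifi": False},
    {"name": "Hotel C", "city": "New York", "rating": 4, "wifi": False},
    {"name": "Hotel D", "city": "Newark", "rating": 5, "wifi": True},
]


def facet_counts(records, field):
    """How many records have each value of `field` (rendered as refinement links)."""
    return Counter(r[field] for r in records)


def refine(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]


new_york = refine(HOTELS, city="New York")
print(facet_counts(new_york, "rating"))                      # Counter({4: 2, 3: 1})
print([h["name"] for h in refine(new_york, rating=4, wifi=True)])  # ['Hotel A']
```

The structure is what makes “filter by hotels with good views and WiFi” possible: with unstructured pages, a search engine has to guess at those attributes instead.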

Google Custom Search - more features than you think

Google Custom Search has received a lot of attention since its release. One thing I discovered while developing an implementation at work was exactly how customisable it is. It isn’t clear from the simple examples, but some of the more advanced examples and the documentation show how it is possible to do facet-based navigation of search results, refinement based on labels and more.

The other interesting development is its integration with Google’s AJAX search APIs. This means that it is possible to deeply customize the display of search results. It also means that advertisements won’t be displayed (which may or may not be a good thing, depending on the application).

Comscore’s measurements are bunk

Comscore is a media measurement site similar to Hitwise or the Alexa site rankings. They do a decent job, but people are beginning to take their data far more seriously than the accuracy of their methodology warrants.

Essentially, they (like Alexa) rely on users installing software which monitors which sites they visit.

That methodology means their samples are not (and cannot be) representative of internet usage.

For instance, most workplaces, schools and colleges have policies and security settings prohibiting the installation of this kind of software. That already knocks out a very large proportion of internet users from the sample. (See, for example, why Cornell regards Marketscore - aka ComScore - as spyware.)
Then there’s the fact that experienced internet users are unlikely to install this kind of software.
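The representativeness problem is easy to see with a little arithmetic. A minimal sketch, using made-up numbers purely for illustration (they are not Comscore’s or anyone’s real figures): if one group of users is far less likely to install the monitoring software, the panel over-weights the groups that do install it, and any site those groups favour looks bigger than it really is.

```python
# Hypothetical segments: (share of all internet users, chance a member installs
# the panel software, share of that segment who visit the site being measured).
SEGMENTS = {
    "home": (0.40, 0.020, 0.50),
    "work_or_school": (0.60, 0.001, 0.10),  # installs mostly blocked by policy
}


def true_reach(segments):
    """Fraction of the whole internet population that visits the site."""
    return sum(share * visit for share, _, visit in segments.values())


def panel_reach(segments):
    """Fraction of *panelists* who visit the site, i.e. what the panel reports."""
    panelists = sum(share * install for share, install, _ in segments.values())
    visitors = sum(share * install * visit for share, install, visit in segments.values())
    return visitors / panelists


print(f"true reach:  {true_reach(SEGMENTS):.0%}")   # true reach:  26%
print(f"panel reach: {panel_reach(SEGMENTS):.0%}")  # panel reach: 47%
```

With these invented inputs the panel reports roughly 47% reach for a site that only 26% of users actually visit, and no amount of extra panelists from the same recruitment channel fixes that.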

Shared computers are a big problem, too: Comscore records the characteristics of the user who registered and installed the software, so everything done on that computer is attributed to that one person. That means the usage patterns of families with shared computers cannot be reported reliably, as plainly ridiculous reports like “More than Half of MySpace Visitors are Now Age 35 or Older” show. Dana Boyd sums up the problems with that data very well in her post “MySpace is *NOT* gray”.
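A minimal sketch of how that attribution skews age data, assuming (as above) that every visit from a machine is credited to whoever registered the software. The household and the visit counts are invented for illustration.

```python
from collections import Counter

# Hypothetical household: one parent registered the panel software, but the
# whole family uses the same computer. Each tuple is (actual user's age, visits).
registered_age = 42
actual_visits = [(42, 2), (15, 30), (13, 25)]  # parent and two teenagers


def reported_age_profile(visits, registrant_age):
    """What the panel records: every visit is credited to the registrant."""
    return Counter({registrant_age: sum(n for _, n in visits)})


def actual_age_profile(visits):
    """Who actually made the visits."""
    profile = Counter()
    for age, n in visits:
        profile[age] += n
    return profile


def share_aged(profile, min_age):
    """Fraction of visits made by people aged `min_age` or older."""
    total = sum(profile.values())
    return sum(n for age, n in profile.items() if age >= min_age) / total


reported = reported_age_profile(actual_visits, registered_age)
actual = actual_age_profile(actual_visits)
print(f"reported 35+: {share_aged(reported, 35):.1%}")  # reported 35+: 100.0%
print(f"actual 35+:   {share_aged(actual, 35):.1%}")    # actual 35+:   3.5%
```

Multiply that across millions of family computers and a teenage audience can easily look middle-aged.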

In fairness, there isn’t really a good way of measuring this kind of thing. Hitwise gets fairly accurate traffic data by buying traffic logs from ISPs, but that can’t show the characteristics of individual users. Even so, I tend to trust the Hitwise data more than other sources for general traffic trends, and I regard Comscore & Alexa data as “possibly indicative but not definitive, and always requiring further analysis”.

UPDATE: The marketwatch toolbar privacy statement requires all users to be at least 18 years old. Given this, it isn’t clear to me how they know that 11.9% of MySpace.com visitors are aged 11-17…

The Google/YouTube deal: What a bargain

When I first heard the rumours of the Google/YouTube deal I didn’t like it. Now that it has gone through, I’m having second thoughts: they might have got a bargain.

Google were prepared to pay $900 million for 3 years of search advertising on MySpace, so paying $1.6 billion to own outright a site that gets (very roughly) half the traffic MySpace gets seems a pretty good deal to me.
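The back-of-envelope arithmetic behind that, assuming (crudely) that advertising value scales linearly with traffic and taking “half the traffic” at face value. Both assumptions are rough, and the MySpace deal bought advertising rights rather than the site itself.

```python
MYSPACE_DEAL_USD = 900e6    # three years of search advertising on MySpace
MYSPACE_DEAL_YEARS = 3
YOUTUBE_PRICE_USD = 1.6e9
TRAFFIC_RATIO = 0.5         # assumption: YouTube gets roughly half MySpace's traffic

myspace_per_year = MYSPACE_DEAL_USD / MYSPACE_DEAL_YEARS         # $300M a year
youtube_equivalent_per_year = myspace_per_year * TRAFFIC_RATIO   # $150M a year
payback_years = YOUTUBE_PRICE_USD / youtube_equivalent_per_year

print(f"{payback_years:.1f} years of MySpace-equivalent ad spend")  # 10.7 years ...
```

On these assumptions the purchase price equals roughly ten and a half years of what Google was willing to pay MySpace, scaled for traffic, except that with YouTube Google keeps the site at the end.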

Even ignoring traditional page-based advertising, there’s huge potential for Google to embed advertisements in YouTube videos.

Obviously, the copyright problems are the big negative, but provided Google can work around them, YouTube will be a money-making machine.

Actually using Amazon EC2

Demitrious Kelly has an absolutely fascinating blog about operations in a modern web environment.

The whole thing is worth reading, but his comments on the use of Amazon EC2 are particularly compelling.

Reading his post on using MySQL on EC2 shows what an interesting service EC2 is. The workarounds needed to cope with the fact that EC2 has no storage that persists between reboots are interesting in themselves, but in some ways they contribute a lot to the scalability and high availability of systems built on it: if your database servers need rebuilding every time they reboot, then you need to make those rebuilds repeatable and automatic, which is exactly what you need to do to scale your system anyway.
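A minimal sketch of that “rebuild on every boot” discipline. It assumes the latest database dump lives somewhere durable outside the instance; a local directory stands in for that store here (on EC2 it would typically be S3), and the directory layout, file naming and function names are hypothetical. The key property is idempotence: the boot script only restores into an empty data directory, so running it twice does no harm.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def latest_snapshot(backup_dir: Path) -> Optional[Path]:
    """Newest snapshot, assuming file names sort by date (e.g. 2006-10-09.sql)."""
    snapshots = sorted(backup_dir.glob("*.sql"))
    return snapshots[-1] if snapshots else None


def rebuild_on_boot(backup_dir: Path, data_dir: Path) -> bool:
    """Stage the newest snapshot into an empty data directory; True if restored.

    Idempotent: if the data directory already has content (say the script is
    re-run), nothing happens. That property is what makes it safe to automate.
    A real script would then load the dump into MySQL; this sketch only copies it.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    if any(data_dir.iterdir()):
        return False
    snapshot = latest_snapshot(backup_dir)
    if snapshot is None:
        return False
    shutil.copy2(snapshot, data_dir / "restore.sql")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        backups, data = Path(tmp, "backups"), Path(tmp, "data")
        backups.mkdir()
        (backups / "2006-10-08.sql").write_text("-- older dump")
        (backups / "2006-10-09.sql").write_text("-- newest dump")
        print(rebuild_on_boot(backups, data))  # True: fresh instance, restored
        print(rebuild_on_boot(backups, data))  # False: already restored
```

The same script that recovers a rebooted database server can bring up an extra one when load grows, which is the scalability benefit described above.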

As I’ve noted previously, there are other options for high-availability data storage services, though.

Enterprise 2.0: Data Syndication

While Web 2.0 technologies have become very widespread in consumer-facing websites, many enterprise developers have struggled to find anything relevant to their requirements (apart from some AJAX-based UI improvements).

Recently, Bob Lee pointed out how useful Firefox live bookmarks are for keeping track of online documents.

I think this is the future for enterprise 2.0 - feeds get chopped up into standalone data packets which are filtered, sorted, re-aggregated and subscribed to. Tools like ROME.Mano are already beginning to provide support for this kind of functionality.
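A minimal sketch of the chop, filter, sort and re-aggregate idea using only the Python standard library. The sample feeds, the keyword filter and the output title are made up, and this does not show ROME.Mano’s own API. It splits RSS 2.0 feeds into individual items, keeps the matching ones, sorts them newest first, and emits a new feed that others can subscribe to.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def items(feed_xml: str):
    """Chop an RSS 2.0 document into standalone <item> elements."""
    return ET.fromstring(feed_xml).findall("./channel/item")


def reaggregate(feeds, keyword, title="Filtered feed"):
    """Filter items from several feeds by title keyword, newest first, as new RSS.

    Assumes every item carries an RFC 822 <pubDate>, as RSS 2.0 recommends.
    """
    picked = [
        item for feed in feeds for item in items(feed)
        if keyword.lower() in (item.findtext("title") or "").lower()
    ]
    picked.sort(key=lambda i: parsedate_to_datetime(i.findtext("pubDate")), reverse=True)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    channel.extend(picked)
    return ET.tostring(rss, encoding="unicode")


FEED_A = """<rss version="2.0"><channel><title>A</title>
<item><title>EC2 notes</title><pubDate>Mon, 09 Oct 2006 10:00:00 +0000</pubDate></item>
<item><title>Lunch menu</title><pubDate>Mon, 09 Oct 2006 12:00:00 +0000</pubDate></item>
</channel></rss>"""
FEED_B = """<rss version="2.0"><channel><title>B</title>
<item><title>More EC2 tips</title><pubDate>Tue, 10 Oct 2006 09:00:00 +0000</pubDate></item>
</channel></rss>"""

print(reaggregate([FEED_A, FEED_B], "ec2", title="EC2 reading"))
```

Because the output is itself an ordinary feed, it can be filtered and re-aggregated again further down the chain.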

Online Application Composition: Mashups++

One of the reasons I started this blog was to explore some more speculative ideas. Online Application Composition (OAC) is one of these ideas.

To some extent this idea is inspired by Chris Anderson’s post on embedding Google Spreadsheet in a webpage. The real trigger, though, was Google’s release of the GData API for Google Base. That got me thinking about the quantity of data which is likely to be stored in Google Base.

Once that data is there, though, wouldn’t it be great if it were usable by “citizen programmers” - those people who build towering applications out of stacks of Excel spreadsheets? Sure, there are problems with applications built this way, but wouldn’t it be nice to have the option to do something like that on the web?

Current “Mashup” applications built using APIs provided by Google, Amazon, Yahoo etc. do provide some of this functionality, but I’d still argue that:

a) The programming skills needed to build a mashup are far beyond what is required to build a spreadsheet, and

b) Mashups don’t typically feature the deep data integration that gives Excel its value.
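To make “deep data integration” concrete: in a spreadsheet you can look up a value from one sheet and compute a new column from it without writing a program. A minimal sketch of the same operation over two web data sources, assuming both have already been fetched as lists of records (the sources, field names and exchange rate are hypothetical).

```python
# Hypothetical records, as if pulled from two different web APIs.
listings = [
    {"item": "camera", "price": 250.0, "currency": "USD"},
    {"item": "lens", "price": 180.0, "currency": "EUR"},
]
rates_to_usd = {"USD": 1.0, "EUR": 1.25}  # illustrative rate, not real data


def add_column(rows, name, formula):
    """Spreadsheet-style derived column: apply `formula` to every row."""
    return [{**row, name: formula(row)} for row in rows]


# The lambda does what a VLOOKUP plus a multiplication would do in Excel.
with_usd = add_column(listings, "price_usd",
                      lambda r: r["price"] * rates_to_usd[r["currency"]])
print([r["price_usd"] for r in with_usd])  # [250.0, 225.0]
```

Three lines of logic here, but still code; a citizen programmer would expect to do it by typing a formula into a cell.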

I’m not claiming that going from mashups to deeper, easier integration tools is a paradigm shift, but I do think that if mashups are the typical Web 2.0 application, then perhaps the ability for non-programmers to compose online applications themselves could be dubbed Web 2.1.

There are a few applications already emerging in this space.

Dabble DB

Avi Bryant was kind enough to leave me a comment recently to make sure I knew about his tool in this space: DabbleDB.

If you haven’t seen the DabbleDB 7-minute demo video, it’s worth watching, if only to make you wonder why traditional applications aren’t as easy to use.

While DabbleDB enables “citizen programming” it has only just begun to add the features to enable deep API integration with other sites.

There is a screencast showing their work in this area. They do note, however, that:

There’s lots more work to do, of course. For example, these imports are one-time only, not recurring subscriptions as they should be. We need better standard ways of dealing with rich types like locations and date ranges. And although there are some good examples of structured data embedded in RSS, the majority of web apps still provide vanilla feeds.

DabbleDB is currently using Microsoft’s Simple List Extensions to transport extra data in RSS feeds. GData also provides some of the structure that richer APIs require, and I expect that the DabbleDB team would be well aware of that option.
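A rough sketch of how a consumer might read the extra structure a Simple List Extensions feed carries. It is based on my understanding of the spec: a `cf:listinfo` element in the channel declares sortable fields via `cf:sort` elements with `ns`, `element` and `data-type` attributes, in the `http://www.microsoft.com/schemas/rss/core/2005` namespace. Check those names against the spec before relying on them; the hotel namespace and sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET

CF = "http://www.microsoft.com/schemas/rss/core/2005"  # Simple List Extensions namespace


def sortable_fields(channel):
    """(namespace, element, data-type) for each cf:sort declared in cf:listinfo."""
    return [
        (s.get("ns", ""), s.get("element"), s.get("data-type", "text"))
        for s in channel.findall(f"{{{CF}}}listinfo/{{{CF}}}sort")
    ]


def extract(feed_xml):
    """One dict per item holding the declared extra fields, typed by data-type."""
    channel = ET.fromstring(feed_xml).find("channel")
    fields = sortable_fields(channel)
    rows = []
    for item in channel.findall("item"):
        row = {"title": item.findtext("title")}
        for ns, element, data_type in fields:
            tag = f"{{{ns}}}{element}" if ns else element
            text = item.findtext(tag)
            # Only "number" is converted here; dates and text are left as strings.
            row[element] = float(text) if data_type == "number" and text else text
        rows.append(row)
    return rows


FEED = """<rss version="2.0" xmlns:cf="http://www.microsoft.com/schemas/rss/core/2005"
     xmlns:h="http://example.com/hotels">
<channel><title>Hotels</title>
<cf:treatAs>list</cf:treatAs>
<cf:listinfo>
  <cf:sort ns="http://example.com/hotels" element="price" label="Price" data-type="number"/>
</cf:listinfo>
<item><title>Hotel A</title><h:price>199</h:price></item>
<item><title>Hotel B</title><h:price>149</h:price></item>
</channel></rss>"""

print(extract(FEED))  # [{'title': 'Hotel A', 'price': 199.0}, {'title': 'Hotel B', 'price': 149.0}]
```

A plain feed reader simply ignores the extra elements, which is what makes this approach backwards compatible with vanilla RSS.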

Dapper

Dapper is the newest of the tools I’m looking at today. I first became aware of it via a rave review on TechCrunch. Dapper is a tool which allows you to create an API (called a “Dapp”) for any website, and then combine it with other Dapps to build applications.

The Magg movie aggregator shows the kind of applications that can be built.

This is a very interesting tool: the ability to create APIs easily is a wonderful feature. It doesn’t yet have the ability to do “spreadsheet-like” features like adding columns of data, though.

However, it is still under heavy development as the latest Dapper blog post shows:

We’re happy to announce a new feature: Dapp linking. This feature lets you link two or more Dapps together. The output from the first Dapp is used as input to the second Dapp (and so on). For instance, if you have a Dapp which takes a zip code and returns movies playing in that area, and another Dapp which takes a movie title and returns reviews, you can link the two together. The end result is a new Dapp which, in this example, takes a zip code and returns a list of movies playing in the area and reviews of each movie.
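Dapp linking is essentially function composition: each output of one data source becomes an input to the next. A minimal sketch, with two hypothetical stand-in “Dapps” backed by hard-coded data (Dapper’s real interface is not shown, and the zip code, titles and reviews are made up).

```python
from typing import Callable

# Hypothetical stand-ins for two Dapps; real ones would scrape or call a website.
SHOWTIMES = {"10001": ["Movie One", "Movie Two"]}
REVIEWS = {"Movie One": ["Great fun"], "Movie Two": ["Too long", "Nice score"]}


def movies_near(zip_code: str) -> list[str]:
    return SHOWTIMES.get(zip_code, [])


def reviews_for(title: str) -> list[str]:
    return REVIEWS.get(title, [])


def link(first: Callable[[str], list[str]], second: Callable[[str], list[str]]):
    """Build a new 'Dapp' that feeds each output of `first` into `second`."""
    def linked(value: str) -> dict[str, list[str]]:
        return {out: second(out) for out in first(value)}
    return linked


movies_with_reviews = link(movies_near, reviews_for)
print(movies_with_reviews("10001"))
```

The linked result is itself just another function of a zip code, so it could be linked again, which is what makes this kind of composition attractive for non-programmers.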

JotSpot

JotSpot has always marketed itself as a Wiki, but it includes tools to do a whole lot more:

While other wikis only support plain old text, JotSpot allows you to create rich web-based spreadsheets, calendars, documents and photo galleries. It’s as easy as using a word processor — you don’t need to know HTML.

Joe Kraus (JotSpot’s founder) said (way back in March 2005):

JotSpot is a company that is building a platform to make it easy and affordable to build long-tail software applications. To take those Excel spreadsheets and turn them into real web-based applications where you don’t have versionitis, where updates find you instead of you looking for them and where you can integrate data in your hard drive with data from the web, email and other applications.

As far as I’m aware JotSpot doesn’t yet feature integration with other online APIs. As such it is a good “citizen programmer” tool, but can’t quite build the next generation of integrated applications.

Spam in social news sites

Greg and some other commentators have posted a few thoughts on how spam on Digg is becoming a problem.

It should be pointed out that Netscape.com has an even bigger problem:

  1. It takes fewer votes to get to the front page on Netscape
  2. There are even more users of Netscape than Digg
  3. Netscape users are often less experienced than Digg users and therefore more likely to click on advertisements
  4. Netscape is a PR9 site (vs PR7 for Digg). That means a bigger potential boost in search engine ranking if you are on the front page of Netscape compared to Digg.