July 22, 2010

JavaScript Native Interface & History Repeating

I’ve only recently started to look at the Google Web Toolkit (GWT).  I haven’t gotten far enough into implementing and using it to make a firm decision, but I do like the philosophical concepts (which I will get into later).  Yeah, I’m probably late to the game on GWT, but I was early to the game on a bunch of other things, so it balances out, I think.

One thing that I was (pleasantly) surprised by is “JSNI”, the JavaScript Native Interface – the way that GWT allows the user to “drop in” JavaScript in situations where the GWT widgets just won’t do.

Here’s the first blog post I found about JSNI.  I love the first comment.

JavaScript is king in the browser and GWT is for cowards.

Hee hee.  Go back 20 years or so, and you’d see the exact same argument, only with different names:

Assembly is king in graphics, and C is for cowards

Pretty much the same situation – a certain group of people have made their living from being experts at something cryptic and difficult.  Along comes something (in the older case, DirectX) that attempts to simplify that difficult thing, and those experts begin flinging poo at it.

This was back in ’07, of course.  I wonder if those people still hate GWT and the leaky abstractions it represents.

July 21, 2010

Using Hadoop for Data Mining

I wrote a whitepaper on Hadoop, and how you can use it to perform Business Intelligence on data that’s too expensive to analyze using existing solutions, either because the data is too messy, too voluminous, or both.

There are other uses for Hadoop, but I think this is one of the most strategic.

Let me know your thoughts!

What is Hadoop

My recent post on Hadoop may leave people wondering “WTH is Hadoop?”.

Well first, if anything in the world can be called “Cloud Computing”, Hadoop can.

Hadoop is an open source software system that creates two things:

1) A highly scalable, fault-tolerant distributed file system (loosely based on the Google File System)

2) A highly scalable implementation of Google’s MapReduce algorithm

It’s free, and it has been in use at Facebook and Yahoo for several years now.

Your next question may be “What is MapReduce?”

MapReduce is an algorithm that splits a large amount of data into smaller chunks, processes each chunk independently (the “map” step), and then groups the intermediate results by key so they can be sorted and aggregated in various ways (the “reduce” step).  It’s one of the cornerstones of Google’s massive software infrastructure – a system that lets Google process all the data that comes in about who is linking to whom, which tags and text are being used, etc.
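To make that a bit more concrete, here is a minimal single-machine sketch of the idea in Python, using word counting as the example.  This is not Hadoop’s API: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real cluster would run the map and reduce steps in parallel across many machines rather than in one loop.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: aggregate every value seen for one key (here, by summing)."""
    return key, sum(values)


def word_count(chunks):
    """Run map over each chunk, group by word, then reduce each group."""
    pairs = []
    for chunk in chunks:  # on a cluster, each chunk would go to a different worker
        pairs.extend(map_phase(chunk))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling word_count on a list of text chunks returns a dictionary mapping each word to its total count.  The point is that each chunk can be mapped without knowing anything about the others, which is what lets the work spread across lots of cheap machines.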

Essentially, Hadoop is a cloud-based data analysis tool – something that can scale very cost-effectively, and can chew through terabytes and/or petabytes of data using off-the-shelf computers with off-the-shelf operating systems and hardware.

Latest Strategic Hadoop News

Momentum continues to grow for Hadoop – its ability to cost-effectively perform large-scale data mining and data cleansing is a considerable draw.