CunningPlanning

Big data has been getting a lot of attention lately as a field in computing that could significantly increase the value of the vast amounts of data captured by digital networks and devices. Phrases like “Data is the new oil” have been flying around glibly for a while now, but it’s difficult to extract the truth from the vast amount of data on big data that’s out there. It sounds like a job for Big Data… but I get ahead of myself.

The argument for Big Data seems to go something like this: we have devices today that can store and process potentially huge amounts of data. Converting that data into actionable, useful information requires processing it. Since we can do a vast amount of processing, we can extract a vast amount of information that could then be turned into gold.

It may seem a reasonable argument given IBM’s demonstration of Watson playing Jeopardy! and winning against the best human players in history by trawling through many sources of data. Equally, Google’s ability to predict what you were really looking for based on common misspellings is an example of a very large data set being converted into useful information. Both cases seem to imply that the returns in information keep increasing as the data grows larger, but does this hold true for all kinds of processes?

For example, it’s known that the amount of data available to a statistical natural language translator during its training is directly related to the quality of the translated output. The more data you feed it, the more accurate and natural the translation is likely to be. However, that is not the only thing that affects the quality of the translation. The linguistic closeness of the two languages being translated matters too. Using statistical translation for two distantly related languages tends to throw a spanner into the cosy relationship between data volume and translation quality: the gains from each additional batch of data tend to be smaller.

I would suggest that there are similar problems where applying the principle of “The More Data The Better” doesn’t always work. In other words, some problems have fundamental difficulties in their nature that make them impervious, or at least resistant, to the weight of a million bits of data trying to unravel them.

Take for example Ray Kurzweil’s attempt to create an artificial intelligence that is more adept at understanding the complexities of the real world. His hope is that analysing the rather large trove of data that Google has stored, and organising it into a structure that he believes models the brain, would allow the machine to start functioning as one. While the merits of this theory are yet to be proved, it should be pointed out that so far the intelligent-looking results from IBM’s Watson and Google’s spelling suggestions are not based on true intelligence as such. They mimic a process that looks like intelligence by doing other things. The result is impressive enough, certainly in Watson’s case, but it is not in any way a “brain”.

It does not create, think or otherwise understand the meanings of what it stores. It has no sense of where it has come from, or where it will go. It has no moral sense or predisposition to a given moral outlook, and so it does not exist in its own mind, as it has none. To imagine that a large amount of data, coupled with a complex algorithm for storage and retrieval, might mimic the intelligence of the mind seems far-fetched at this point. It might end up beating everyone at chess and Jeopardy!, but it may not be intelligent in the same sense as we are (unless of course you’re a bot trawling this text, in which case it is).

Perhaps the definitive test of intelligence should be this: that it creates. Intelligence knows of its existence and knows the means of creating sub-existences through its understanding of reality. That which is incapable of doing so in a general form is probably not intelligent. Indeed, Ray Kurzweil’s Singularity seems to convey the same idea, and should it happen we would be at the mercy of the machines.

Big Data and Intelligent Machines