I don’t know if my reading of this Orwellian* piece is in sync with what Rameez intended, but he thought it was fine for me to post it here. See what you think:
“Big Data or Pig Data” (A fable on huge amounts of data and why we don’t need models) by Rameez Rahman, computer scientist: posted at Realm of the SCENSCI
There was a pig who wanted to be a scientist. He was not interested in models. When asked how he planned on making sense of the world, the pig would say in a deep mysterious voice, “I don’t do models: the world is my model” and then with a twinkle in his eyes, look at his interlocutor smugly.
By his phrase, “I don’t do models, the world is my model”, he meant that the world’s data was enough for him, the pig scientist. The more data he had, the pig declared, the more accurately he would be able to predict what might happen in the world.
Around that time, some dogs opened a pub called “Doogle”, which was visited by all the animals in the jungle. The wine was delicious and the traffic at the pub was unprecedented. The dogs became rich and famous; they also obtained a lot of data from the visiting animals. They bought even more pubs and collected even more data about their customers.
Now, they wanted to analyze this data to attract even more customers to Doogle. The pig saw this as a big opportunity and gathered other like-minded pigs. The drove of pigs helped Doogle apply pigstatistical methods (ham-correlation formulation, etc.) to predict various things, including: which kinds of animals were attracted to which kinds of beverages; the drinking patterns of different animals; the kinds of tables liked by different classes of animals; arrival times; the number of glasses Doogle would need in the near future, etc., etc., etc. To an astonishing degree, the pigs made quite accurate predictions using their pigstatistics.
The services of our pigs were acquired by other entities including FaceSlap, Barker, and Snorter, among others. Our heroic pigs helped their clients outshine the competition. In fact, the pigs’ method of collecting huge amounts of data and then applying pigstatistics to it came to be known as “Pig Data” in their honor.
In the meantime, somewhere in the jungle, the group of owl scientists who had through history been making models and theories and performing experiments based on them, were now being told that it was all meaningless; that their approach was worthless. The owls didn’t pay any attention, even though everyone else was euphoric. However, if the truth be told, some owls did lose heart and became so demoralized that they gradually transformed into pigs! And immersed themselves deep in the world of pig-data.
From time to time, Doogle, FaceSlap and others would make some modifications, such as changing the color of the wine-glass and seeing how quickly people reached for the glass based on the color. Upon analysis of the customers’ reactions, the pigs could then determine which color resulted in the fastest response-time. So this was the era of pig-data. The pigs had won the battle. Pig data was everywhere.
But the fact is that our hero-pig, whom we met at the beginning of our fable, was still not happy. He felt that things were only getting started. He wanted to replace the owls completely. What’s more, he wanted to predict EVERYTHING. He wanted psychohistory, as the ‘good doctor’ of old had dreamt. Yes sir, predicting everything was his goal!
He decided to start his quest by studying falling bodies. As was his norm, he collected data about all instances of all objects falling down all over the place. He now had huge amounts of data, and he applied pigstatistics to it. He discovered that more things fell in the morning and during day-time, when animals were awake, and fewer things fell during the night, when animals were sleeping!
He shared his findings in front of the whole jungle, looking directly at the owls, who were also present. The chief owl, called Owlileonewtein, countered that while such information could be useful, it did not explain much. Why did bodies fall? At what rate did they fall? What were the relevant factors, etc?
On hearing this, the pig positively beamed with joy because he had come prepared.
He announced proudly that he had found a correlation between the weight of the body and the speed of falling. His stats told him that while heavy things fell at a great speed, light things such as animal hair, bird feathers, etc., fell much more slowly. “So therefore”, he thundered, “I have discovered the law of falling bodies. Heavy goes fast; light goes slow.” All the animals clapped in joy. The law of falling bodies had been discovered!
Upon hearing all this, Owlileonewtein, the chief owl, said forcefully, “But this is not correct. If we ignore friction and air resistance, I can tell you that all bodies, regardless of their heaviness, fall at the same rate. Indeed, consider a frictionless plane…”
But as soon as he said this, the pig snorted, “Frictionless plane? My dear animals, has anyone ever heard of such an oxymoron?” All animals laughed.
Owlileonewtein protested: “No, based on my model, we can do suitable experiments to test it…”.
On hearing this, the pig suddenly got very serious and menacing. He lifted his paw and pointed it at Owlileonewtein, “You sir, are a relic of the past. Your way of doing things is over. Haven’t you heard what my fellow pig scientist, Peter Norpig, head of pig intelligence at Doogle, has said, ‘All models are wrong, and we can learn models from data.’ So enough of your models and enough of your model-based experiments. We need neither! All we need is pig-data!” And with this, the pig in his furious excitement stood up on his hind-legs, and shouted, stretching the word ‘pig’ with the full force of his pig personality:
“Piiiiiiiiiiiiiiiiiiiiig!” And the animals responded: “DATA!”
“Piiiiiiiiiiiiiiiiiiiig” — “DATA”! “Heavy goes fast; light goes slow!”
Having demonstrated his power to the owls, as a last act of annihilation, he picked up a stone from the ground and tore away a strand of hair from his tail. Holding one object in each fore-leg, he dropped them at the same time. The stone reached the ground much earlier than the strand. With this, he dusted one fore-leg against the other, and then turned around to show his backside to the owls. He shouted triumphantly, one last time:
“Heavy goes fast; light goes slow!”, “Heavy goes fast; light goes slow!”
“Piiiiiiiiiiiiiiiiiiiig” — “DATA!”
That’s great. (And I think Tukey, as well as Newton, would approve.)
I believe in model-based analysis. The real utility of statistics is in the context of model-based analyses. To tie that thought in with Pig Data, one thing that’s always bothered me about (for example) neural networks is that they lack the ability to extrapolate. (Someone will hopefully correct me if I’m wrong about that.) They’re useful for interpolating – particularly when outputs are nonlinear functions of input variables – but not useful for making predictions outside the range of the data they were trained on.
Undoubtedly the fact that I work in the physical sciences colors my view, but physical observables have physical origins and I believe you need models to establish the connections between cause and effect. Write down an equation. Make a prediction. Conduct an experiment. How does observation compare with prediction? Use the discrepancy to develop a deeper understanding of the phenomena being observed. Revise your equation and update your prediction. Repeat as necessary.
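A minimal sketch of the interpolation-versus-extrapolation point, in Python. It is not a neural network: a nearest-neighbor lookup stands in for a purely data-driven predictor, and the free-fall law d = (1/2) g t^2 (no air resistance) stands in for the model. The data are synthetic and every name below (train_t, nearest_neighbor_predict, fit_g, model_predict) is an illustrative assumption, not something taken from the fable.

```python
# Synthetic free-fall data: distance fallen after t seconds, no air resistance.
G = 9.81  # m/s^2, assumed value used only to generate the synthetic data


def true_distance(t):
    """Distance fallen from rest after t seconds under constant gravity."""
    return 0.5 * G * t * t


# Observations cover only the first two seconds of the fall.
train_t = [i * 0.1 for i in range(21)]          # 0.0, 0.1, ..., 2.0 s
train_d = [true_distance(t) for t in train_t]   # noise-free, for simplicity


def nearest_neighbor_predict(t):
    """Data-only predictor: return the distance observed at the closest time."""
    idx = min(range(len(train_t)), key=lambda i: abs(train_t[i] - t))
    return train_d[idx]


def fit_g():
    """Model-based: least-squares estimate of g in d = 0.5 * g * t^2."""
    x = [0.5 * t * t for t in train_t]
    num = sum(xi * di for xi, di in zip(x, train_d))
    den = sum(xi * xi for xi in x)
    return num / den


def model_predict(t, g):
    """Predict the distance fallen using the fitted physical model."""
    return 0.5 * g * t * t


g_hat = fit_g()
for t in (1.03, 5.0):  # 1.03 s is inside the observed range, 5.0 s is outside
    print(
        f"t={t:.2f} s  truth={true_distance(t):8.2f}  "
        f"data-only={nearest_neighbor_predict(t):8.2f}  "
        f"model={model_predict(t, g_hat):8.2f}"
    )
```

Inside the observed range the lookup stays close to the truth. At t = 5 s it can only return the last value it has seen, while the fitted model keeps following the quadratic law. Whether a particular neural network behaves like the lookup here depends on its architecture and training, so this is an illustration of the concern rather than evidence for it.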
Thanks Chris G. To avoid biasing reactions, I’m withholding “the moral of the story according to me” until at least 6 people comment.
I’ll try to sum up the story in one sentence: Theory without data is useless, while data without theory is dangerous.
btw @Chris, I hadn’t heard about the poor extrapolation of ANNs. Can you please suggest a reference that I can read? Thanks.