This post by the author was published on Cricketcountry on 24.12.2012. However, this seems as good a time as any to revisit the topic.
Rewind to World Cup 2011. Sachin Tendulkar races to a scorching 111 off 101 balls against Dale Steyn, Morne Morkel and the other Proteas. His glittering innings is studded with three sixes, one of them a blistering hook off an express Steyn bouncer. India pile up 296, and look well established in the driver’s seat. However, Faf du Plessis and Johan Botha gallop along a lightning-quick home stretch before Robin Peterson mauls more than the 13 required runs from the first four deliveries of the final over bowled by Ashish Nehra.
What followed demonstrates one of the most vitriolic vagaries of Indian cricket. The man who, in another three weeks, would be worshipped by millions as the all-conquering “Captain Cool” became, for the time being, the target of the most abusive avalanche of criticism in the media and the social network circus.
But venting wrath on Mahendra Singh Dhoni alone did not quite placate the fuming fans. Their roving eyes soon discovered a spotless piece of brilliance invitingly laid out, almost asking to be stained by poison-tipped brushes splattering murky graffiti, much of it tinted with the dirty green of envy.
A few days earlier, Sachin Tendulkar had teed off with a 115-ball 120 at the Chinnaswamy Stadium against England, and India had raced to 338. A steady 158 by Andrew Strauss, aided by some ordinary bowling at the death, had ensured a thrilling tie. For the Indian fans, who had been quite sure of victory when England had required 52 off the last five overs, it was as bad as a defeat.
Can a tantalising correlation of this kind, repeated in such quick sequence, be resisted? Whispers started drifting around the cricketing grapevine, and facts were rearranged into make-believe analysis on social, electronic, digital and print media. Chinese whispers bubbled through the network of fibre optic cables, posted, shared and tweeted around millions of PCs and mobile phones. A moronic myth was born: India loses whenever Sachin Tendulkar scores a hundred.
Of Tendulkar’s 48 ODI hundreds, a few of the 13 in which India ended up second best were variously revisited and recalled.
Did they not show that Tendulkar is a fighter against odds, the last man standing when lesser players had given up long back? What of the 33 centuries in wins, and one apiece in a tie and an abandoned match?
Well, that is far too much factual accuracy for the lip-smacking attractions of such a phenomenal urban legend. It is so much easier to indulge in pseudo-statistical witch hunts. Repetitions of the rumour ensued, powered by the might of forward and re-tweet buttons, and soon the myth assumed proportions of universal truth.
The Tendulkar admirers protested for a while, but remote examples were flashed as incontestable exhibits. Single-handed battles against defeat were held up as proof of the supposed negative correlation.
His 143 against Australia in 1998 at Sharjah, his 141 against Pakistan in 2004 at Rawalpindi, his 175 in 2009 against Australia at Hyderabad… Some of the maestro’s best innings were placed under the glaring fire of fanaticism, to scrutinise and detect flaws – to burn holes into the magnificent credentials if required.
Even in statistically savvy forums, armchair analysts floated a question, and I quote: “Sachin scoring over 65 increases the chance of India losing the match. Any data on this? In the World Cup it was true 90% of the times.”
It was doubly strange.
In the 2011 World Cup, Tendulkar scored 120 against England in a tie, 111 against South Africa in a loss and 85 against Pakistan in a win. So, three 65-plus innings had been played yielding three different results, leaving one to wonder about the 90% figure. But criticism and data never go hand in hand, especially when Tendulkar enters the scene – as has been indicated in this article. Facts can be sucked into the vortex of whirling rumours in the digital age.
Such a provocative, counterintuitive assertion, backed not by proof but by one-off examples, is always attractive. It is this same fascination for curious correlations masquerading as truth that draws mankind to search for patterns in tea leaves, life lines on the palm, alignment of planets during birth and positioning of the bed with respect to the door.
Interestingly, there is a scientific name for such a pseudoscientific phenomenon: pareidolia, the perception of a significant pattern where there is none.
There was a second reason that puzzled me. How could such an asinine assertion stick around without the robes of ridiculous reason being slashed with logical arguments till the bare nakedness of ignorance was exposed in the public eye?
The problem essentially boiled down to the following hypothesis: Tendulkar playing a major innings increases India's chance of defeat. And in so minutely recorded a sport as cricket, the data is available for everyone to investigate and interpret. A reasonably trained statistician can easily determine if there is any truth in the statement.
Binary Logistic Regression
So, let us see how the claim fares in the face of unbiased logical analysis.
There are two variables here: Tendulkar’s score and the result of the match. We are interested in knowing how the first affects the second.
Many a time in life we encounter problems of inferring how one variable depends on others. Think of a simple situation where you are not quite sure how the salary is structured in a company. However, you do know that three of your colleagues earn basic pays of 15,000, 17,500 and 20,000, and that the corresponding take-home salaries they carry back are 28,887, 34,012 and 39,005 respectively. (You can fill in whichever currency symbol suits your dreams.)
It should be more or less reasonable to conclude from the available information that at the end of the month, the money deposited in the bank is approximately [(2 x the basic pay) – 1000]. Having derived this equation, you know that if your basic pay is 25,000, you can expect around 49,000 as your take home salary.
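The rule of thumb above can be checked with a few lines of Python. What follows is only a minimal sketch: it fits an ordinary least-squares line through the three salary figures quoted earlier, and the function name `fit_line` is purely illustrative. The fitted coefficients differ slightly from the rounded "2 x basic pay – 1000" rule, as one would expect from eyeballed numbers.

```python
# Least-squares line through the three salary examples quoted in the text.
basic_pay = [15000, 17500, 20000]
take_home = [28887, 34012, 39005]


def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(basic_pay, take_home)
predicted = slope * 25000 + intercept  # expected take-home for a basic pay of 25,000

print(f"take-home = {slope:.4f} x basic pay + ({intercept:.0f})")
print(f"predicted take-home for 25,000: {predicted:.0f}")
```

The fitted line comes out at roughly 2.02 times the basic pay minus 1,445, which predicts a take-home of about 49,145 for a basic pay of 25,000, in line with the "around 49,000" estimate.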
The statistical technique most often used to find out the relation of the resultant with the influencing variable(s) is known as regression analysis. (In this example, take-home pay is the resultant variable and basic-pay is the solitary influencing variable.)
However, the problem we are looking at is slightly different. Here we do have a numerical influencing (x) variable representing Tendulkar’s score, but the resultant (y) variable can take only two values, win and not-win. In other words, the y variable is binary – where a win may be denoted as ‘success’ and not-win as ‘failure’. We use the term not-win to absorb the ties and the no-results, queering the pitch further against the maestro, loading the results against him by counting anything that is not a win as failure.
In such situations, the statistical technique used to look at the data and decipher how a numerical x influences a binary y is called Binary Logistic Regression. This particular method takes all the available past data into consideration, and predicts the chance (probability) of the y variable being a success given the numerical value of x.
In other words, the Binary Logistic Regression equation, fitted on the 442 data points denoting Tendulkar’s scores in each match along with the match results, provides us with the estimated probability of India winning the match given that he scores x runs.
The result of this analysis is shown in the graph provided at the top of the article. Incidentally, the graph has been generated by feeding the available data into a popular statistical package called Minitab, thus keeping human bias out of the computation.
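For readers who want to see the mechanics, here is a minimal sketch of how such a model can be fitted. It is not the Minitab analysis used for this article, and the data in it are a small hypothetical toy sample, not Tendulkar's actual record. The model assumed is the standard one, p(x) = 1 / (1 + e^-(b0 + b1 x)), and the coefficients b0 and b1 are estimated by Newton-Raphson steps on the log-likelihood. The names `fit_logistic`, `toy_scores` and `toy_wins` are illustrative.

```python
import math

# HYPOTHETICAL toy data for illustration only (not Tendulkar's real scores).
toy_scores = [0, 5, 12, 20, 35, 41, 50, 62, 77, 90, 100, 120, 140]
toy_wins = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # 1 = win, 0 = not-win


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iterations=25):
    """Estimate (b0, b1) of p(x) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            # Gradient of the log-likelihood.
            g0 += y - p
            g1 += (y - p) * x
            # Entries of the (negated) Hessian, X^T W X.
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system in closed form and take the Newton step.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logistic(toy_scores, toy_wins)


def predict_win_probability(score):
    return sigmoid(b0 + b1 * score)


for score in (0, 50, 100):
    print(f"score {score:>3}: estimated win probability {predict_win_probability(score):.2f}")
```

With a single predictor the fitted curve can only rise or fall throughout, and the sign of b1 decides which: a negative b1 would mean higher scores go with fewer wins, a positive b1 the opposite.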
The following table summarises the results – which can also be verified from the graph:
Tendulkar's score | Probability of Indian win |
0 or cheap dismissal | 40% |
40 | 50% |
50 | 53% |
75 | 60% |
100 | 68% |
150 | 75% |
175 | 80% |
If indeed, as claimed by the clamouring critics, a Tendulkar century increases chances of a loss, thereby decreasing the likelihood of a win, then the graph should have sloped downwards as it neared 100 on the horizontal axis, thus indicating a lower probability than at smaller scores.
However, as established by the graph and reconfirmed by the table, the available data shows that reality is diametrically different from the laughable allegations. The graph proceeds upwards as Tendulkar's score increases, without caring for the manufactured intuitions of naysayers. The probability of a win becomes higher and higher with each added run. At 100, the graph shows a healthy 68%, which tells us that according to the historical data encompassing 442 innings, if Tendulkar scores a century, there is a 68% probability of India going on to win the match.
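The rising shape of the curve can be reproduced with a little arithmetic. The sketch below is only a rough sanity check, not the fitted model: the coefficients Minitab produced are not reported in this article, so the two values used here are illustrative assumptions, chosen so that a logistic curve passes exactly through two rows of the table (40% at a score of 0 and 50% at a score of 40). The remaining rows were presumably read off the graph, so only approximate agreement should be expected.

```python
import math

# ILLUSTRATIVE coefficients only: backed out from two table rows
# (40% at score 0, 50% at score 40). These are NOT the Minitab estimates.
b0 = math.log(0.40 / 0.60)  # log-odds of a win at a score of 0
b1 = -b0 / 40               # slope that brings the log-odds to 0 at a score of 40


def win_probability(score):
    """Logistic curve p(x) = 1 / (1 + e^-(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * score)))


# Probabilities quoted in the table of the article.
table = {0: 0.40, 40: 0.50, 50: 0.53, 75: 0.60, 100: 0.68, 150: 0.75, 175: 0.80}

for score, quoted in table.items():
    print(f"{score:>3} runs: curve {win_probability(score):.0%}, table {quoted:.0%}")
```

Under these assumptions the curve stays within a few percentage points of every row in the table, and because b1 is positive it rises at every score, including around the hundred mark.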
As it traverses an upward path, unchecked by the raucous rambling of mathematically challenged armchair analysts, the rising line seems to pierce the heart of the fake fable near the hundred run mark and go right through it, continuing to rise even as the score becomes higher and higher.
Does the myth blow up embarrassingly in the faces of the critics?
Well, the American educator Laurence J. Peter was spot on when he memorably observed, 'Against logic there is no armour like ignorance'. Hence, it may be too much to expect enlightenment or conversions into rationality from this statistical exercise. A myth born out of the dangerous fumes of little knowledge, fed on addictive lures of spurious patterns and mindless malice, may explode when faced with facts, but will continue to thrive in the unlimited delusional space available for the multitude who would rather keep their eyes wide shut.
This analysis is for those adherents who still retain the ability to listen to the ripples of reason beyond the deafening din of detractors.