How a group of horse gamblers taught us about a critical blind spot in decision making – and what it means for baseball’s future

In 1974, Paul Slovic – Professor of Psychology at the University of Oregon – put a group of professional horse gamblers to the test.

Slovic, a longtime collaborator of Nobel laureate Daniel Kahneman, designed a series of horse races. The gamblers – men and women who made a living off their winnings – were asked to do two things: Predict who they thought would win, and estimate how confident they were in their prediction.

In the first trial, Slovic gave each gambler five pieces of information. Which five was up to the gambler’s personal preference – age of the horse, jockey, years of experience, previous victories, etc. Ten horses, on average, competed in the races, so a random guess would be right about 10 percent of the time. If the gamblers did possess an advantage, Slovic would expect their accuracy to eclipse 10 percent. In round one, it did. The gamblers were 17 percent accurate in predicting the winning horse. Their confidence level was 19 percent – right about where it should have been.

In the subsequent trials, Slovic increased the amount of information he gave each gambler. He ran trials where he gave 10, 20, and 40 pieces of information about the horse and jockey – all personalized to the gambler. What he discovered was groundbreaking. Despite the increased amount of information provided to each gambler, the accuracy of their picks flatlined around 17 percent. What did change, however, was their confidence. When presented with 35 additional pieces of information, their confidence nearly doubled to 34 percent. More information did nothing to improve their prediction accuracy, but it did make them a lot more confident. And if you’re more confident in your picks, you’re much more likely to put higher amounts of money behind them, which made the gamblers more susceptible to bigger losses.
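To make the gap concrete, here is a minimal sketch that lines up the numbers quoted above. It only uses the figures from the 5-item and 40-item trials as reported here, and it assumes a flat 10 percent chance baseline from a ten-horse field; the function and variable names are made up for illustration.

```python
# Figures as quoted above; the 10 percent baseline assumes ten horses per race.
CHANCE_BASELINE = 100 / 10  # percent

trials = {
    5: {"accuracy": 17, "confidence": 19},
    40: {"accuracy": 17, "confidence": 34},
}


def overconfidence_gap(confidence, accuracy):
    """Stated confidence minus realized accuracy, in percentage points."""
    return confidence - accuracy


gaps = {}
for n_items, result in trials.items():
    gaps[n_items] = overconfidence_gap(result["confidence"], result["accuracy"])
    edge = result["accuracy"] - CHANCE_BASELINE
    print(f"{n_items} items: edge over chance {edge:.0f} pts, "
          f"overconfidence {gaps[n_items]} pts")
```

The edge over chance stays at seven points in both trials. The only thing the extra information moves is the overconfidence gap, from 2 points to 17.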

The problem wasn’t the information Slovic had provided. He had given them everything they could have possibly needed to improve their predictions. The problem was that the gamblers had already made up their minds well before being presented with additional information. They didn’t use the additional info to change or modify their initial predictions. They merely cherry-picked the information that confirmed their initial guess. It was a textbook example of confirmation bias.

More wasn’t more helpful. It was more dangerous. The gamblers became blinded by confirmation bias when they had the opportunity to pick and choose which information they believed to be true. And as with many things in life, the more we think we know, the more trouble we get ourselves into. Making predictions in an unpredictable world is not so simple.

But it doesn’t stop us from trying.

…

In 2017, Sendhil Mullainathan – professor of Behavioral Science at the University of Chicago – headed a research study that uncovered a glaring flaw in our judicial system. In the study, he and four colleagues gathered records on 554,689 defendants in New York City from 2008 to 2013. Of those defendants, roughly 400,000 were released by human judges.

Mullainathan’s team decided to test the accuracy of these decisions using machine learning. They built out an artificial intelligence system, fed it the same information, and asked the system to make its own list of people to release. Crimes while awaiting trial were examined to determine efficacy of decisions.

When they compared the two lists, it wasn’t even close.

The people the machine learning system chose to release were 25 percent less likely to commit a crime while awaiting trial than the people the judges released. In other words, for every four crimes committed by defendants the judges let go, the machine’s list would have avoided roughly one. And it gets even better. One percent of the defendants were flagged by the machine learning system as “high risk” – meaning it judged them more than 50 percent likely to commit a crime prior to trial. The human judges released 48.5 percent of these defendants.

To put this into perspective, human judges released over 2,600 high risk defendants the computer never would have even considered releasing. Mullainathan’s computer didn’t just beat the human judges. It annihilated them.
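The “over 2,600” figure can be sanity-checked with quick arithmetic. This is a rough sketch rather than the study’s own calculation: it assumes the one percent “high risk” flag applies to the full pool of 554,689 defendants, and the variable names are illustrative.

```python
# Back-of-the-envelope check using only the figures quoted in this article.
# Assumption: the 1 percent "high risk" flag applies to all 554,689 defendants.
total_defendants = 554_689
high_risk_share = 0.01        # flagged as "high risk" by the model
judge_release_rate = 0.485    # share of flagged defendants the judges released

high_risk = total_defendants * high_risk_share
released_high_risk = high_risk * judge_release_rate

print(f"High-risk defendants: about {high_risk:,.0f}")
print(f"Released by judges:   about {released_high_risk:,.0f}")
```

Under that assumption, the result lands a little under 2,700 released high-risk defendants, which is consistent with “over 2,600.”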

In Mullainathan’s study, the judges had access to three sources of information when making bail decisions: The record of the defendant (age, residence, work, previous offenses and bail decisions), the testimony of the district attorney and the defendant’s lawyer, and the evidence the judges gathered through their own senses. Every defendant faces the judge eye to eye, often with family, spouses, and children in the room.

The computer, on the other hand, only had access to two things: The defendant’s age and rap sheet. That’s it. The computer didn’t see if the defendant displayed remorse or if their spouse was in tears. It only saw what they had done. And it was pretty damn good at using it to figure out what they would do next.

As it turns out, the additional information the judges had access to did not help increase the acuity of their decision making. It clouded their judgement. More – as seen with Slovic’s gamblers – was not more useful. It was more dangerous. Which means if we want to improve the predictions we make about the future – which are highly volatile by nature – we need to learn how to lean on people who don’t have the same context as us.

Sometimes less – as this former NBA General Manager discovered – can be more.

…

At this point, Daryl Morey had seen enough. Over the past 10 years, Morey had spent hours interviewing NBA Draft prospects as General Manager of the Houston Rockets. He noticed a theme: Of all the players he interviewed, the ones who seemed to possess the most charm were the ones who stood six inches above everyone else. Extremely tall people seemed to steal the show in draft rooms.

He talked about this in Michael Lewis’s book The Undoing Project. Morey said, “There’s a lot of charming bigs. I don’t know if it’s like the fat kid on the playground or what. But they all have a story.”

In these stories, Morey was enchanted with rags-to-riches narratives about how these players faced insurmountable odds, fought through obstacles, and ultimately scratched and clawed to have an opportunity to play professional basketball. Initially, it worked. He and his team often found themselves picturing these players as successful NBA talent. They fell in love with the interviews, until Morey decided to step back and look at them a little differently. You didn’t need charisma to grab a rebound. But his team was becoming suckers for players who seemed to possess it.

Draft interviews were supposed to give teams valuable information about potential professional prospects. In reality, they were magic shows. The stories each player shared weren’t prophecies. They were a prospect’s attempt at luring in a big check by creating a memorable first impression.

And this – as Morey learned – was the key to getting confirmation bias on your side.

…

Consider the following situation:

You’re the Scouting Director of an MLB organization. You just sat down and had an interview with a potential first round MLB draft prospect. The young man greeted you with a firm handshake, made consistent eye contact, and had well thought out responses. He answered every question you asked with ease, thanking his coaches and teammates for the success he has had to this point in his career.

He told a story about how his parents gave every penny they had to help support his dreams in baseball. His biggest goal in life was to pay back their efforts and make them proud by achieving his dream of playing professional baseball. At the end, he concluded with another firm handshake and thanked you for your time. But by that point, your mind had already been made up.

You decide to follow up the interview by going out and watching him play in person. During a big moment in his high school playoff tournament, the young man is pitching with a one run lead and two outs in the sixth inning. On a 1-1 pitch, he paints a breaking ball on the outside corner. He doesn’t get the call. Frustrated, he throws his arms up and barks at the umpire. On the next pitch, he proceeds to yank a fastball into the dirt. His body language still reflects his frustration from the blown call.

On 3-1, he grooves a fastball down the middle that gets belted into the left center gap. His center fielder makes a play on it, but the ball just tips off the edge of his glove. It dribbles towards the warning track, scoring the go ahead run.

He proceeds to get the next out, only to jaw at the umpire all the way back to the dugout. Once in the dugout, he airs out the center fielder for not catching what could have been the third out. His head coach tells him to cool down, only for the young man to get even more frustrated. He throws his glove and takes a seat at the end of the bench, where he stays for the rest of the game.

As a scout, you now are forced to make a difficult – and surprisingly common – decision: Do you trust your initial impression of him and write this incident off as a lapse in judgment, or do you go against it and rethink your initial impression?

If you’re like Slovic’s gamblers, you overlook it. You trust the initial interview. You remember how he answered your questions and how he deflected praise, showed gratitude, and described how important it was to be a great teammate. Considering his competitive nature, you write off the incident as a one time thing. After all, he is a kid. It’s understandable for the emotions and the stakes of the game to get the best of anyone – let alone a young man who can’t even have a beer legally. It happens to big leaguers. And that was a bullshit call, too.

Daryl Morey, on the other hand, would be skeptical. The interview is done in a controlled setting. The prospect can share – or not share – anything he decides to. He knows how to present himself, answer specific questions, and leave you with the impression that he is worth taking a chance on. And he knows – like most of us do – how important a strong first impression is. But what’s a strong impression worth when the walk doesn’t match the talk?

The hard part isn’t putting together predictions. We’re really good at that – with or without information. The problem is recalibrating outdated ones. We’d rather go down with a sinking ship (sunk cost fallacy) than change our stance on something we once believed to be true. We’ll slam tables screaming to take a chance on a kid because of how he made us feel in his draft interview. Everything that doesn’t support our stance gets rationalized away (e.g. he had a bad day, it was an emotional game, the umpire really did suck). And this is how we dig our graves.

If we want to improve the predictions we make in an unpredictable world, we cannot let our reasoning stem from our rationalizations. It must come from somewhere else. As Mullainathan and Morey discovered, a good place to start is by taking human judgement out of the equation.

Baseball’s “new school” revolution of data and analytics is here to stay, but it goes much further than identifying vertical fastballs and high spin breaking balls. Uncovering untapped potential helps the ones who are already there, but the ones who have yet to make it represent a much larger population. And we miss much more than we hit – not because of an absence of information, but because of how that information is filtered.

Which means we need a system that’s unafraid to call bullshit when our filtering system only shows us what we want to see.

…

Computers – as seen in Mullainathan’s study – couldn’t care less about what happened and why. Numbers don’t get caught up in the feelings and emotions of human beings. They simply tell us what happened. People – as seen with the judges – try to come up with explanations that lead us to a decision that feels right. But just because it feels right doesn’t mean it is right.

If we want to avoid our tendency to explain away what we don’t wish to be true, we need a system designed to call us out when we’re wrong. We can’t explain away what can be fact-checked – which is where data comes into play.

Morey – popularized as basketball’s version of Billy Beane – ascended to General Manager (GM) at just 33 years old because of his background in advanced analytics. While with the Boston Celtics, he built out and tested an objective model for player evaluation. He carried this over to Houston, where he used it as the organization’s guide for selecting and evaluating amateur talent.

Some years – like 2007 – it worked really well. But players like Joey Dorsey – a 2008 draft selection – were a great reminder that it was still fallible. The one thing it was not, however, was stagnant. If Morey missed on a player, he didn’t blame the data. The numbers only told him what the numbers could see. Better predictions, thus, relied on better data.

But the key to Morey’s system wasn’t the fact that he valued what could be quantified. It was his paranoia. Trailblazers cannot use tradition as their saving grace. If Morey couldn’t figure out a better way to evaluate players, he wouldn’t just be out of a job. He would be laughed out of the league. He couldn’t afford to fall for a sob story from a 19-year-old who had just spent the past year on full scholarship at the University of North Carolina at Chapel Hill. He had to get it right, so he stopped listening to the people who never admitted they were wrong. That is what led him to revolutionize how NBA teams evaluated the draft.

Baseball – as seen with Billy Beane’s “Moneyball” Oakland Athletics – is no stranger to data. But the influence of data goes much further than advanced statistics. What we can quantify gives us the ability to fact-check what we aim to rationalize. The greatest test of an evaluator is not the predictions he missed on, but his ability to recalibrate a prediction when faced with newer and better information. Data gives us the opportunity to have these conversations.

Seth Partnow, former Director of Basketball Research for the Milwaukee Bucks, said it best: “Analytics at its heart is about asking the right questions.” If we can ask better questions, we have the ability to make better predictions.

But only if we are seeking to ask questions in the first place.

…

We can never and will never completely eliminate confirmation bias within talent evaluation in baseball – or in any sport. But I think we can do a better job at hedging our risk. This starts by understanding and recognizing when we are under its spell. First impressions are not prophecies. They are merely the first bit of information we’ve collected about a prospect. Future information must be collected with an open mind. If it contradicts our initial findings, that’s perfect. We’re not looking for agreement. We’re looking for asymmetries that illuminate hidden red flags.

For every person in the draft interview room, we need several people who never set foot inside. Their job is just as important as the people who are inside asking questions. They don’t have an emotional marriage to the prospect. They simply see the prospect for what they are – or aren’t. And they’ll be quick to call you out when you’re trying to see more than what exists.

The same principle exists for evaluating games in person. Some people need to be there, but others should never set foot in a stadium. TrackMan data doesn’t get caught up with how a pitch “looked” or “felt.” It just tells us what the pitch did. It doesn’t mean the pitch has to have off-the-charts metrics, but we cannot gauge a good breaking ball off the eye test alone. We need people who can fact-check our claims. Our eyes will want us to believe one thing, but the data will keep us honest in the end.

Baseball is still young in its application and understanding of how data influences performance. While we have much to learn, I think our greatest opportunity for growth involves how we utilize it to mitigate the fallibility of human decision making. If people are involved, it’s not a matter of if – but when: We will jump to conclusions prematurely, explain away behaviors that don’t match our expectations, and fail to modify our predictions when faced with newer and better information. This doesn’t make us bad people. It only makes us human.

The best going forward will learn how to minimize these interactions through a diverse team, an objective system for evaluation, and a healthy sense of paranoia. The ones who are afraid of being left behind will always be ahead, and it’s not because they’re constantly getting it right. It’s because they’re learning how to be a little less wrong with each iteration.

And I think that is a pretty good place to start for all of us.
