It’s been a while since I did anything a little more basic stats-y, so here’s something I was thinking of the other day: what are my chances at winning anything in ESPN’s Streak for the Cash competition?

The competition is pretty straightforward: ESPN lists a variety of either/or picks in the world of sports for a given day. Sometimes it’s a straight winner pick (an example for today: who wins between Texas A&M and Texas Tech?), sometimes there’s a spread involved (Ohio State wins by double digits versus Minnesota win or single-digit loss), and sometimes it’s a little more interesting (Florida’s margin of victory versus Alabama’s number of made three-pointers). You can make a pick for any event that’s listed, but you can only make one pick at a time and once a pick has started (usually the start of a game, or occasionally there’s something about the second half only) your pick is locked and you can’t make your next pick until that outcome is decided.

I have to admit I haven’t looked at all my options, but there seem to be two main ways to win. You could have the most correct picks in a month or you could have the longest streak of correct picks in a month. So, what are my chances of winning?

There are a couple things we need to know. I need to know how many picks I can make in a month, and I need to know my accuracy. Since ESPN is doing this for fun, it isn’t terribly difficult to find ‘lines’ that are a bit off. For example, from what I can tell Texas A&M is about a four-point favorite tonight. That’s much better than a 50/50 winning proposition (well, ‘much better’ is a comparative term). So I pick at around 60% on my Streak choices. However, since I’m choosy, I only make one or two picks a day on average; I think in January I made 40 picks. So right off the bat, I can eliminate myself from winning by having the most correct picks: the leader right now (only halfway through the month) has 47 wins, or more correct choices than my total choices.

But let’s say I want to know my chances anyway. R has a function called rbinom that will be useful. It generates binomial outcomes (1s or 0s, like wins and losses) based on some number of observations and some chance of a win happening. The first argument is how many results to return, the second is how many trials go into each result, and the third is the probability of a win on each trial. For example, if you put in rbinom(1,1,.5) it simulates a single coin flip. If you put in rbinom(10,1,.5) it simulates 10 coin flips. You could actually do that two different ways; you could also do rbinom(1,10,.5). In that case it spits out a single number: the count of ‘heads’ or wins that occurred in those 10 flips. If I wanted to look at five different occasions of flipping a coin 10 times, I would use rbinom(5,10,.5). So let’s say I want to know how many wins I can expect in a month. Let’s assume I make 40 picks a month with an accuracy of .6 (60%), and simulate 1000 months. That would be rbinom(1000,40,.6) (which you might want to save to a variable, like x=rbinom(1000,40,.6), so you don’t fill up your screen and so we can play with the output). I get a bunch of numbers, like 13, 20, 20, 23, 20, 21, and so on. Each one is the total number of wins I would get in 40 picks at 60% accuracy in one simulated month.
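Here is a minimal sketch of those calls collected in one place. The seed is an arbitrary choice of mine so the draws are repeatable; your numbers will differ from the ones quoted above.

```r
set.seed(1)  # arbitrary seed, only so the random draws are repeatable

rbinom(1, 1, 0.5)    # one coin flip: a single 0 or 1
rbinom(10, 1, 0.5)   # ten separate flips: ten 0s and 1s
rbinom(1, 10, 0.5)   # one number: how many heads in ten flips
rbinom(5, 10, 0.5)   # five numbers: heads in each of five sets of ten flips

# 1000 simulated months of 40 picks at 60% accuracy
x <- rbinom(1000, 40, 0.6)
head(x)     # first few monthly win totals
mean(x)     # should land near 40 * 0.6 = 24
```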

So how would I do? If I look for my best outcome, I would check max(x) and get 33. So in these 1000 pretend months of making picks, my best month was 33 wins. Obviously it’s possible to get more; if I try 10,000 ‘months’ I get a max of 35. More observations means more of an opportunity to get extreme values; if I ran enough months I should eventually get a perfect run. But 33 is my top performance out of 1000 tries, and it wouldn’t even make the leaderboard for halfway through the month. I simply don’t pick enough; I’m trading picks for accuracy. Let’s say I was willing to pick as often as possible even if it dropped me to chance, or 50/50. Maybe I could crank out three picks per weekday and four on the weekends for about 95 picks a month. max(rbinom(1000,95,.5)) gives me 63. That still doesn’t make the leaderboard for January; the winner made 102 correct picks, which is more than the 95 picks I would even attempt. I don’t even know how that happens. So, I’m giving up on having the most correct picks; my chance of winning that way is essentially zero.
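A short sketch of that comparison follows. It also uses pbinom, R's binomial distribution function, to get exact tail probabilities instead of relying on the luck of a simulation. The variable names and the seed are my own choices, and the simulated maxima will vary from run to run.

```r
set.seed(2)  # arbitrary seed for repeatability

months_choosy <- rbinom(1000, 40, 0.6)  # 40 picks a month at 60%
months_volume <- rbinom(1000, 95, 0.5)  # 95 picks a month at 50%

max(months_choosy)  # best simulated month when being choosy
max(months_volume)  # best simulated month when picking everything

# Exact chance of a month at least as good as some target k:
# P(X >= k) = pbinom(k - 1, n, p, lower.tail = FALSE)
p_33_choosy <- pbinom(32, 40, 0.6, lower.tail = FALSE)  # 33+ wins in 40 picks
p_63_volume <- pbinom(62, 95, 0.5, lower.tail = FALSE)  # 63+ wins in 95 picks
p_47_choosy <- pbinom(46, 40, 0.6, lower.tail = FALSE)  # 47+ wins in 40 picks: impossible

c(p_33_choosy, p_63_volume, p_47_choosy)
```

Both of the first two probabilities come out well under 1%, which fits with totals like 33 and 63 only showing up as the best of 1000 simulated months.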

How about the Streak? The current streak leader is at 20, although there are two active streaks at 19. January’s winner actually got up to 27, with the next two entries sitting at 22. What are my chances of getting a streak around 20? We can’t use what we just did, because that just counts total wins, not wins in a row. We could fall back on the old binomial distribution; I need 22 wins in a row with a 60% chance of getting a win. So I could say my chances are .6^22 ≈ .000013, or about .0013%. However, this isn’t quite right; that’s the probability that I get 22 correct picks out of 22. I want to know my chances of getting a streak of 22 out of, say, 40 picks. A streak could start at any point; I could get the first one wrong then rattle off 22 in a row, or go 3-2 in my first five (losing the fifth) then go 22-0, and so on.

It turns out this doesn’t have an easy solution. I could simulate it again using rbinom; I could make a huge series of picks and then look for streaks. For example, rbinom(10000,1,.6) would give me 10,000 ones and zeros. I could then write a little program to sort through all that and count up streaks of 1s, then look at those streaks. But that’s a little complicated, and also loses the fact that my streak starts over at the end of a month even if I’m on the way up. Instead, I’ll use a neat little tool from a gambling forum.
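For what it's worth, the "little program" does not have to be long. The sketch below is one way to do it, not necessarily the best one: it simulates each month separately (so the streak resets at month's end) and uses R's rle function to find runs of 1s. The helper name longest_streak, the seed, and the 10,000-month sample size are all my own choices.

```r
# Longest run of 1s (wins) in a vector of 0/1 picks
longest_streak <- function(picks) {
  runs <- rle(picks)                          # run-length encoding of the sequence
  wins <- runs$lengths[runs$values == 1]      # keep only the lengths of winning runs
  if (length(wins) == 0) 0 else max(wins)     # a month with no wins has streak 0
}

set.seed(3)  # arbitrary seed for repeatability
n_months <- 10000

# One month = 40 picks at 60%; record the best streak in each month
best <- replicate(n_months, longest_streak(rbinom(40, 1, 0.6)))

table(best)        # how often each "best streak of the month" occurred
mean(best >= 10)   # share of months with a streak of at least 10
```

Streaks as long as 22 are rare enough that a simulation of this size will see few of them, if any, so pinning down their probability this way would take far more simulated months.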

Aside: note that their calculator is geared towards streaks of losses. Why? Because you want to know your chances of going broke. If I pick with 53% accuracy and bet a tenth of my money every time, what’s the probability I go broke? All you need to go broke is a streak of 10 bad choices; with a 47% loss probability and 500 picks, there’s almost a 13% chance of going broke. Thus, don’t bet 10% of your money at a time if you plan on betting long term.

Ok, so back to the Streak. The calculator is built around losing streaks, but the math doesn’t care which outcome we call a loss, so I’ll put my 60% win probability in as the probability of a ‘loss’. I can get off about 40 picks in a month, and I want a streak of 22. The calculator says I have a .011% chance of that coming through, which isn’t great but is about 8 times larger than what I estimated using the binomial distribution; those opportunities for different places to start the streak add up. Unfortunately, it isn’t easy to increase that number. If I managed to crank up to 50 picks a month while staying at 60%, the probability only moves to .016%. If I stay at 40 but get my accuracy up to 65%, I make a fairly large jump to .056%, which is still a small number. So I don’t have a great chance of hitting such a streak, but at least there’s a chance. For total wins I can barely imagine making as many picks as last month’s winner had correct picks.
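I don't know how the forum's calculator works internally, but the same numbers can be checked with a small exact calculation. The sketch below is my own: streak_prob is a hypothetical helper that steps through the picks one at a time, tracking the probability of currently sitting on a run of 0, 1, ..., k-1 wins without having hit the streak yet. It assumes every pick is independent with the same win probability.

```r
# P(at least one run of k successes in n independent trials with success prob p)
streak_prob <- function(n, k, p) {
  # state[j + 1] = probability of being on a current run of j successes
  # (j = 0..k-1) with no completed run of length k so far
  state <- c(1, rep(0, k - 1))
  hit <- 0                                    # probability the streak has already happened
  for (i in seq_len(n)) {
    new_state <- numeric(k)
    new_state[1] <- sum(state) * (1 - p)      # a miss resets the run to 0
    if (k > 1) {
      new_state[2:k] <- state[1:(k - 1)] * p  # a win extends the run by one
    }
    hit <- hit + state[k] * p                 # run of k-1 plus one more win completes it
    state <- new_state
  }
  hit
}

100 * streak_prob(22, 22, 0.60)   # 22 for 22: same as 100 * 0.6^22, about .0013 (percent)
100 * streak_prob(40, 22, 0.60)   # roughly .011
100 * streak_prob(50, 22, 0.60)   # roughly .016
100 * streak_prob(40, 22, 0.65)   # roughly .056

# The going-broke aside: 10 losses in a row somewhere in 500 picks at 47% per loss
100 * streak_prob(500, 10, 0.47)  # a bit under 13
```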

So this is a handy little tool. And since it’s based on the binomial distribution, it can apply to lots of things. What’s the probability that a team with a 70% win percentage will have a 4 game losing streak? In basketball, with 82 games, it’s a fairly high 36.8%. In baseball, with 162 games, it’s 60.2%. So you may or may not see a team like the Miami Heat drop four in a row; it’ll happen about every third season. But you should expect most baseball teams to lose four in a row at some point, particularly because no teams finish the season at .700. The same idea would apply to free throws, throwing a strike in bowling, and so on. The one caveat is whether each instance really is independent and really occurs at the probability you put in. For example, the Heat are unlikely to actually be 70/30 favorites in each game they play due to differences in the opponents, being at home or away, and so on. Free throws are probably cleaner, but the player will have different levels of fatigue or concentration before each shot. So it’s important to remember that this is just an approximation. But it’s a useful approximation that you can use in a number of ways.
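To get a feel for how much that caveat can matter, here is a rough simulation sketch. The "mixed" schedule is entirely hypothetical (I made it up: the team alternates between 80% and 60% win chances, which still averages 70%), and the helper names, seed, and number of simulated seasons are my own choices too.

```r
# Longest run of a given value (0 = loss, 1 = win) in a 0/1 sequence
longest_run_of <- function(x, value) {
  runs <- rle(x)
  hits <- runs$lengths[runs$values == value]
  if (length(hits) == 0) 0 else max(hits)
}

# Share of simulated seasons containing a losing streak of at least 4 games;
# win_probs holds one win probability per game on the schedule
losing_streak_rate <- function(win_probs, n_seasons = 20000) {
  mean(replicate(
    n_seasons,
    longest_run_of(rbinom(length(win_probs), 1, win_probs), 0) >= 4
  ))
}

set.seed(4)  # arbitrary seed for repeatability

flat  <- rep(0.7, 82)                        # 70% in every one of 82 games
mixed <- rep(c(0.8, 0.6), length.out = 82)   # HYPOTHETICAL: alternating 80% / 60%

p_flat  <- losing_streak_rate(flat)    # should come out close to the 36.8% quoted above
p_mixed <- losing_streak_rate(mixed)   # same average win rate, different streak chances

c(flat = p_flat, mixed = p_mixed)
```

The two rates differ even though both schedules average 70%, and the direction and size of the gap depend on how the easy and hard games are laid out, so treat the flat-probability number as a ballpark figure.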