As Krajicek Surges, What About the Rest of the Americans?




Writer’s Note: I have added a Tableau Public version of the data used here that is very interactive. Follow this link

With Austin Krajicek reaching a career-high #121 in the world, I decided this was a good time to start looking into the current top-level Americans and where we stand.

Let me start by saying I am just stunned by the fact that Austin has surpassed Ryan Harrison in the rankings. I am not sure how long this will last, but Lefty has really outperformed ‘my’ expectations. Congratulations to him. I always thought his left-handed, slicing game was unique. When he is serving well, he is a tough out.

But as we head into the clay court season, American fans can start to expect the inevitable mantra of the media — Where Are The Americans? This will evolve into stories about failures in player development throughout late April and May until the declaration of the Next Great Hope, when a young American advances to the semis of the Roland Garros Juniors.

I hope to provide a little insight into where these current Americans stand. I know most people prefer to see something visually, so I have included several charts and graphs.

Americans in Top-200

This will be at least a two part series. Today I will focus on what we have now. In a later piece I will give some ‘comps’ on other players worldwide.

Right now we have 14 Americans in the top-200 of the ATP rankings. I admit the top-200 cutoff is somewhat arbitrary, but if you are near the top-200, you are in the running to get into Grand Slam main draws.

Development of Americans 4-2015

From the graphic above (click it to enlarge), you can see I have charted each American by ranking at his exact age (year-month). Obviously there is more than one ranking period in each month, so I have averaged them to form a single entry for that ‘age’.
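The averaging step described above can be sketched roughly as follows. This is only an illustration: the birth date and rankings are made up, and bucketing by whole months of age is my assumption about what "exact age (year-month)" means.

```python
from collections import defaultdict
from datetime import date


def age_in_months(birth: date, ranking_date: date) -> int:
    """Whole months elapsed between the birth date and a ranking date."""
    months = (ranking_date.year - birth.year) * 12 + (ranking_date.month - birth.month)
    if ranking_date.day < birth.day:
        months -= 1  # the current month of age is not complete yet
    return months


def average_rank_by_age(birth, rankings):
    """Average every ranking that falls in the same month of age."""
    buckets = defaultdict(list)
    for ranking_date, rank in rankings:
        buckets[age_in_months(birth, ranking_date)].append(rank)
    return {age: sum(ranks) / len(ranks) for age, ranks in sorted(buckets.items())}


# Hypothetical player and rankings, for illustration only.
birth = date(1990, 6, 15)
rankings = [
    (date(2010, 7, 5), 310),
    (date(2010, 7, 12), 300),
    (date(2010, 7, 19), 296),
    (date(2010, 8, 16), 280),
]

by_age = average_rank_by_age(birth, rankings)
for age, avg in by_age.items():
    print(f"{age // 12}y {age % 12}m: average rank {avg:.1f}")
```

Plotting each player's averaged series against age in months would give one line per player, with gaps wherever a player has no ranking.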

In the United States, our players develop differently. Some start playing ITFs early, some don’t. Some play periodically and play college tennis and some go full-time pro. You can see this in the spaghetti portion of the graph on the left side. (You can also see gaps where guys get injured or take breaks for college).

To get a closer look, let’s break this out into a couple of groups. I will call them the Young Guns and the Old Timers.

THE YOUNG GUNS

Young Guns ages 17-23

Led by Donald Young, the Young Guns all appear to have reached some level of success early. Of these, only Novikov attended college.

Look closely at how they’ve ‘matured’. All of them reached some early success, but you really can see those who climbed the rankings earlier, like Young and Harrison. Donaldson is right there with them. A little bit later you can see Querrey, Kudla and Sock. Novikov and Fratangelo seem to be running at about the same pace. They also both seem to make these dramatic leaps every once in a while.

Have Querrey and Young ‘peaked’? This graph only shows up to their 23rd birthday. Each has had a few ups and downs since, and both are currently sitting just inside the top-50. Young is about 25 and a half, while Querrey is a little past 27. Both have a good seven-to-ten years on tour left in them, if they can stay healthy. But how high can they climb? Querrey got as high as #17 back in 2011. Young is near his career-high of 38, which he reached in 2012.

Does Harrison have another run in him? He’s a curious sort, having seemingly been around forever. My memory is probably warped since I have known him since he was about 13 years old. This is a critical time for him and he needs to take advantage of it. His career-high is only 43, but I think he can get into the top-30. I like that he and Grant Doyle have rekindled their partnership.

To me though, the most interesting one is Donaldson. Having taken the fast track, he has reached the top-200 by the age of 18 years and four months. Only Donald Young has done this faster in this group. Harrison achieved it at the exact same age. Where Donaldson goes next should be fun to watch over the next couple of years.

THE OLD-TIMERS

The Old-Timers

I cut this one off since they were later-bloomers. Other than Smyczek, they attended college for at least a period.

Rajeev Ram is the outlier here. He actually reached the top-200 earlier than the others, but never really stuck and bounced around, despite not being out of the top-300 for almost 10 years. He’s the definition of a grinder, only dipping into the top-100 but never really going away.

Isner’s ascendance was almost instant after graduating. Think about this: he didn’t even earn a single point until AFTER he was 20 years old. Heck, I was on the court taking photos when he lost in the second round of the NCAAs his freshman year to Travis Helgeson. Then in one crazy summer he jumped about 700 places. Isner is an easy top-50 player. He’s lived in the top-20 for a long time and had a drink in the top-10. With his serve, he is always a threat, but how much longer can he do it?

Smyczek’s career arc mirrors Ram’s quite a bit; however, he has been dancing around that top-100 spot for almost a year now. He’s had quite the streak this year already, but started struggling this month. Getting prepared for the American summer will be key for him.

Buchanan, Krajicek and Johnson. What can you say about these three? They each are having success right now after some time in college. Stevie J. is a beast. I’ve watched him in person and once saw him trash a young Soren Hess-Oleson, 6-0, 6-1, from court side.

Buchanan is right there, but still has yet to make the top-100. What is his upside? Can he maintain his current position? Like Smyczek, this summer in the U.S. will be critical to his long-term development.

And then we are back to Krajicek. I am impressed with what he has done this year. He’s taken a less-dramatic path in the rankings than Smyczek, but is reaching the same targets over time. I am anxious to see where he takes it.

NEXT: U.S. versus World — How these players stack up?

Acknowledgement: For years I have been tracking this data myself to some extent. I am grateful, however, to Jeff Sackmann and the tennis repository he has accumulated at GitHub.

 




Shaka v. Barnes




With the announcement of Shaka Smart as Texas’ new men’s basketball coach coming as I write this, I thought I would put a few keystrokes down on how the two coaches’ teams have fared since 2009, when Shaka took over at VCU.

Since I wrote about luck (Pythagorean Luck) earlier in the week, let’s start there. I also want to point out that all of these numbers use regular season data.

Pythagorean Luck calculated using both actual points and adjusted efficiencies.

As you can see by the graph, Smart’s teams seem to always outperform Barnes’. The gap is stark: Barnes’ teams performed above expected values only in 2014, while Shaka’s were consistently above expected. This is very comparable to Duke over the same time period.

Next, let’s look at some typical statistics that are very popular today: Tempo, Adjusted Offensive Efficiency and Adjusted Defensive Efficiency.

I’ll start with a few definitions. These are taken pretty much directly from Ken Pomeroy’s website.

Adjusted Tempo – An estimate of the tempo (possessions per 40 minutes) a team would have against the team that wants to play at an average D-I tempo.

Adjusted Offensive Efficiency - An estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average D-I defense.

Adjusted Defensive Efficiency - An estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average D-I offense.

Stats comparison between VCU and Texas since 2009

When we talk about adjusted tempo, VCU has played faster than Texas, especially over the last three years. This is probably due to Smart’s ‘Havoc’ defense. I am actually a little freaked out by the huge difference this season. That seems a little out of whack. Was Texas really just that slow?

When you look at Adjusted Offensive and Defensive Efficiency, the two teams look pretty similar, with Texas a little better for the most part. The similarity across these stats boils down to both being defense-first teams.

Both teams’ offensive efficiencies are above average (about 95.5). The average defensive efficiency during this time is a little greater (95.9). (Why they aren’t the same is a discussion for another day.)

One reason for Texas’ superiority in adjusted defense may be the way the statistic is calculated, giving weight to the opponent’s perceived strength versus the average. It is also interesting to note that Texas has much better numbers here in 2011 than VCU’s Final Four team that year. That would be the year Texas squeaked past Oakland before being upset by Arizona.

Who knows what the future brings? I expect Texas to be more exciting to watch, as they try to wreak havoc on their opponents. The Big 12 season is long and treacherous and never easy. The first West Virginia v Texas game should be exciting, as both teams will probably press for 40 minutes.

No matter what I have here, in the end, the only numbers that will matter will come in March.

Had Texas’ Luck Run Out? (a study in Pythagorean Luck)

This article is by no means intended to give a full picture of Rick Barnes’ tenure at Texas, but rather point out why he may have needed to go. I am a Rick Barnes fan and appreciate everything he did for Longhorn basketball, but I also recognize that, in the immortal words of Dr. Seuss’ ‘Marvin K. Mooney’, it was time for Rick to, “Go, Go, Go.”

The concept of Pythagorean Luck is derived from the difference between a team’s Pythagorean Winning Percentage (invented for baseball by the legendary Bill James) and their actual winning percentage. In layman’s terms, this is a difference between their expected winning percentage (based on actual offensive and defensive production) and their actual winning percentage.

Pythagorean Luck may also indicate whether a team is under- or over-performing. Teams tend to “regress to the mean,” or average out, over time. Some years you have good luck and some years you have bad luck.

To find Pythagorean Luck you must first calculate Pythagorean Winning Percentage. This can be done many ways and I have decided to show two of the ways in this example. (All of the mathematical calculations are at the end, so be sure to read the whole article if you are interested in that.)

TEXAS’ LACK OF LUCK

Since 2002, Texas has been a predominantly ‘unlucky’ team. Using Luck, as determined by actual points scored, Texas had been ‘lucky’ (positive luck or performing above expectations) in only four of 14 seasons (2002, 2003, 2008 and 2014). Even in years where Texas seemed to have extremely good seasons, such as 2006 and 2007, they underperformed, based on scoring.

Pythagorean Luck since 2002

(NOTE: this evaluation is based solely on pre-tournament data)

Using Offensive and Defensive Efficiency, Texas hasn’t fared any better. Again, four of 14 seasons show positive luck for the Longhorns. This time 2008 and 2014 still rate positively, along with 2004 and 2006.

If we look specifically at results since 2009, only in the 2014 season did the Longhorns seem to perform above what would be expected, based on scoring and defense.

I am sure this is where we could have a more detailed discussion on the lackluster offensive performances of Barnes’ teams at Texas, but the point is this – Texas still should have won more games, even with the offense they were producing.

No team should continually be on the losing side of luck, let alone so far on the other side. As a comparison, Kansas is evenly split with seven years on both sides of the luck spectrum.

TEXAS’ LUCK SINCE 2002

YEAR   W   L   PTS  DEFPTS        OE       DE  WIN PCT  PTS PYTHAG  EFF PYTHAG  PTS LUCK  EFF LUCK
2002  19  11  2360    2232  113.6185  99.4373   0.6333      0.6147      0.7032    0.0186   -0.0699
2003  22   6  2208    1923  118.7559  96.4505   0.7857      0.7609      0.7935    0.0248   -0.0078
2004  23   7  2314    1992  110.4368   93.177   0.7667      0.7782      0.7502   -0.0115    0.0165
2005  19  10  2243    2001  110.8698  96.7025   0.6552      0.7224      0.7078   -0.0672   -0.0526
2006  27   6  2512    1983  117.1721  94.3678   0.8182      0.8788      0.8023   -0.0606    0.0159
2007  24   9  2713    2374  118.9737  99.5146   0.7273      0.7537      0.7605   -0.0264   -0.0333
2008  27   6  2470    2142  118.6783  95.1518   0.8182      0.7674      0.8068    0.0508    0.0113
2009  22  11  2387    2160  109.2162  93.5709   0.6667      0.6979      0.7311   -0.0312   -0.0645
2010  24   9  2681    2300   111.595  94.5267   0.7273      0.7831      0.7454   -0.0559   -0.0181
2011  27   7  2547    2087  114.2741  89.9466   0.7941      0.8414      0.8248   -0.0473   -0.0306
2012  20  13  2411    2205  111.3908  96.3472   0.6061      0.6788      0.7188   -0.0727   -0.1128
2013  16  16  2084    2073  101.4622  95.2782   0.5000      0.5111      0.6003   -0.0111   -0.1003
2014  23  10  2446    2311  109.4989  96.3269   0.6970      0.6167      0.6962    0.0803    0.0008
2015  20  13  2242    1993    110.29   93.308   0.6061      0.7283      0.7468   -0.1223   -0.1408

This team has been underperforming for years and this year it was in record style.

It boils down to coaching. Luck happens. Luck changes. Poor situational coaching and poor player execution at critical times have haunted this program for a number of years, and it was not getting better.

Rick Barnes’ luck had simply run out.

METHODS

As I stated above, to calculate Luck, you need to first calculate Pythagorean Winning Percentage. The first way uses actual points scored and allowed, similar to the way it is calculated in baseball:

Pythagorean Winning Percentage = (points scored ^ x)/(points scored ^ x + points allowed ^ x), where x is an exponent anywhere between 1 and 18. This formula is credited to Bill James, who applied it to baseball. Houston Rockets GM Daryl Morey is credited with first applying it to basketball.

Here’s a second way to calculate PWP, using Adjusted Offensive and Defensive Efficiencies:

Pythagorean Winning Percentage = (Adjusted Offensive Efficiency ^ x)/(Adjusted Offensive Efficiency ^ x + Adjusted Defensive Efficiency ^ x).

I have elected to use these two popular methods. I have also decided to use only from 2002 to the present. The reason for this was primarily a lack of consistent data for Offensive and Defensive Efficiency. Ken Pomeroy provides this data back to 2002 on his website, so I am electing to use it.

I used approximately 8.4 and 6.5, respectively, for the two equations. I arrived at this by completing a least-squares (and least square-root) analysis using all regular season games between 2002 and 2015, minimizing the error between actual and expected values.
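As a rough illustration of that kind of fit, here is a minimal sketch that grid-searches the exponent minimizing squared error between actual and expected winning percentage. It uses only the 14 Texas rows from the table above, not the full set of regular season games, so the exponent it finds should not be expected to match 8.4. The function names and the 0.1 step are my own choices.

```python
def pythagorean_pct(scored, allowed, x):
    """Pythagorean winning percentage for a given exponent x."""
    return scored**x / (scored**x + allowed**x)


def sum_squared_error(rows, x):
    """Squared gap between actual and expected winning percentage, summed over rows."""
    return sum(
        (w / (w + l) - pythagorean_pct(pts, opp, x)) ** 2
        for w, l, pts, opp in rows
    )


def fit_exponent(rows, lo=1.0, hi=18.0, step=0.1):
    """Grid search over [lo, hi] for the exponent with the smallest squared error."""
    n_steps = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n_steps + 1)]
    return min(candidates, key=lambda x: sum_squared_error(rows, x))


# (W, L, PTS, DEFPTS) for Texas, 2002-2015, copied from the table above.
texas_rows = [
    (19, 11, 2360, 2232), (22, 6, 2208, 1923), (23, 7, 2314, 1992),
    (19, 10, 2243, 2001), (27, 6, 2512, 1983), (24, 9, 2713, 2374),
    (27, 6, 2470, 2142), (22, 11, 2387, 2160), (24, 9, 2681, 2300),
    (27, 7, 2547, 2087), (20, 13, 2411, 2205), (16, 16, 2084, 2073),
    (23, 10, 2446, 2311), (20, 13, 2242, 1993),
]

best_x = fit_exponent(texas_rows)
print(f"best exponent on this small sample: {best_x:.1f}")
```

A grid search is used here only because it needs nothing beyond the standard library; a proper optimizer over every team-season would be closer to what is described above.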

To then calculate Pythagorean Luck, you must calculate the difference between these values and the team’s actual winning percentage. Sometimes this is calculated as a straight difference and sometimes as a deviation, using something like the Correlated Gaussian Method, popularized by ESPN and former Denver Nuggets statistician, Dean Oliver.

For my purposes, I simply used the difference (subtraction).
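To make the two formulas and the subtraction concrete, here is a small sketch applied to the 2002 Texas row from the table. Because 8.4 and 6.5 are only approximate exponents, the results should land close to the table values (0.6147 and 0.7032) rather than match them exactly. The function names are mine.

```python
def pythagorean_pct(offense, defense, exponent):
    """Expected winning percentage from points (or efficiencies) for and against."""
    return offense**exponent / (offense**exponent + defense**exponent)


def pythagorean_luck(wins, losses, expected_pct):
    """Luck as a straight difference: actual winning percentage minus expected."""
    return wins / (wins + losses) - expected_pct


# Texas 2002, from the table: 19-11, 2360 points scored, 2232 allowed,
# adjusted offensive efficiency 113.6185, adjusted defensive efficiency 99.4373.
wins, losses = 19, 11
pts_pythag = pythagorean_pct(2360, 2232, 8.4)          # points-based, exponent ~8.4
eff_pythag = pythagorean_pct(113.6185, 99.4373, 6.5)   # efficiency-based, exponent ~6.5

pts_luck = pythagorean_luck(wins, losses, pts_pythag)
eff_luck = pythagorean_luck(wins, losses, eff_pythag)

print(f"PTS PYTHAG {pts_pythag:.4f}  PTS LUCK {pts_luck:+.4f}")
print(f"EFF PYTHAG {eff_pythag:.4f}  EFF LUCK {eff_luck:+.4f}")
```

Positive luck means the team won more often than its scoring suggests; negative luck means it won less often.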

A Little Data For Picking Your March Madness Pool

I have been combing through lots and lots of data, as I prepare my own entry to the Kaggle Machine Learning March Mania Contest again this year. I won’t go into how I am managing my entry right now, as the competition is obviously still open, but I thought I would share some of the insights I have accumulated along the way.

First off, you need to have a strategy. You can be the guy with the chalk bracket or the batshit-crazy-upset-dude, but we all know somewhere in the middle is probably where you need to go… just enough chalk, just enough upsets.

To get a good feel for how the tournament has played out over the past 30 years, I have put together a few graphics. The first one shows the winning percentage for each seed against every other seed since 1985. (The winning percentage is for the seed listed down the left side.)

Seed v. Seed Winning Percentage

It’s kind of crazy. If you look at it, #1 seeds are only 40% versus #11 seeds since 1985. WTH? This obviously needs context, so here’s the same chart showing how many times each seed has played in that time frame.

Seed v. Seed Matchup Counts

Now we can calculate that #11 seeds have actually won 3-of-5 times against #1 seeds. Great, but what does this mean?
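The two charts can be built from a plain list of results. Here is a minimal sketch, assuming each game is stored as a (winning seed, losing seed) pair; the five games in the example simply encode the 3-of-5 record mentioned above and are not real game data.

```python
from collections import defaultdict


def seed_vs_seed(games):
    """Return (win_pct, games_played), both keyed by (seed, opponent_seed)."""
    wins = defaultdict(int)
    played = defaultdict(int)
    for winner_seed, loser_seed in games:
        wins[(winner_seed, loser_seed)] += 1
        played[(winner_seed, loser_seed)] += 1
        played[(loser_seed, winner_seed)] += 1
    win_pct = {pair: wins[pair] / count for pair, count in played.items()}
    return win_pct, dict(played)


# Five #1 v. #11 games in which the 11-seed won three.
games = [(11, 1), (11, 1), (11, 1), (1, 11), (1, 11)]
win_pct, games_played = seed_vs_seed(games)

print(f"#1 vs #11: {win_pct[(1, 11)]:.0%} over {games_played[(1, 11)]} games")
print(f"#11 vs #1: {win_pct[(11, 1)]:.0%} over {games_played[(11, 1)]} games")
```

Keeping the game counts next to the percentages is what gives a number like 40% its context: it rests on only five games.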

Hopefully this can help you solidify your strategy once the draw comes out. Maybe you like a certain 11-seed. How far should you maybe consider riding them? It should also be a guide to help you LIMIT your upsets from being just too wacky.

Another thing to consider is just how volatile the tournament will be. I have analyzed each year individually since 1985 and here are a few of my thoughts.

In the past 30 years, 19 of those seasons have been below “average” when it comes to upsets. I have defined these as the Chalk years. They tend to have fewer upsets and fewer large-scale upsets. The list includes: 1987, 1988, 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 2000, 2003, 2004, 2005, 2007, 2008, 2009 and 2012.

Since ’85, there have been an average 17.7 “Upsets” (by seed) and 7.9 “Big Upsets” per season. I define Big Upsets as those where the seed differential was greater than 4 (at least a 6 over a 1). I also used Mean Upset  — the sum of all upset differentials over the number of tournament games.
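Those three measures can be sketched in a few lines. I am assuming here that any win by the higher-numbered seed counts as an upset (including a 9 over an 8), that a Big Upset needs a differential greater than 4, and that Mean Upset divides by all games, as defined above. The sample games are made up.

```python
def upset_summary(games):
    """Summarize upsets from a list of (winning seed, losing seed) pairs."""
    # An upset is any game won by the higher-numbered (worse) seed.
    differentials = [w - l for w, l in games if w > l]
    return {
        "upsets": len(differentials),
        "big_upsets": sum(1 for d in differentials if d > 4),
        # Sum of upset differentials over the number of tournament games.
        "mean_upset": sum(differentials) / len(games) if games else 0.0,
    }


# Made-up sample: three upsets (12 over 5, 9 over 8, 11 over 3) in six games.
sample_games = [(1, 16), (12, 5), (9, 8), (2, 15), (11, 3), (4, 13)]
summary = upset_summary(sample_games)
print(summary)
```

Running this once per tournament year and comparing each year's numbers to the long-run averages is one way to split seasons into chalk years and upset years.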

QUIRKY STAT MOMENT: Two years with the most upsets since 1985? 1999 (23) and 2014 (22). Guess who won both years? UCONN. Strange, huh?

When figuring out the “upset” and “chalk” years, the upset years stood out. Those would be 1985, 1986, 1990, 1999, 2001, 2002, 2006, 2010, 2011, 2013 and 2014. As you can see, four of the last five seasons fall into this category. Why? That’s a story for another day.

I am sure there are plenty of ways to argue the way I divided up the years, but the concept is solid: there are upset years and there are chalk years… and we seem to be in a time of upsets.

Remember, even though a season is defined as chalk, there are still plenty of upsets. In 2012, the most recent chalk year, two 15-seeds, Lehigh and Norfolk State, both won games over 2-seeds. Also, 10, 11, 12 and 13 seeds all had first round wins. However, the tournament was dominated by the favored (lower-numbered) seeds throughout.

I hope some of this helps. Remember: get a little crazy, but not too crazy… and it also helps to be really lucky.

100k Simulations of All Texas Private Six-Man Brackets

It was just time to get down to it. I had been delaying the inevitable, running 100,000 simulations of each and every private school six-man state bracket. For details on how I did this, please read the earlier posts I have written about the public school brackets and other Monte Carlo simulations I have written. This was very similar….

First build the starting bracket using this week’s ratings from my website (www.sixmanfootball.com). Then calculate the probability of each first round game and simulate the result. After each round I update the ratings (not exactly like my formula, but a close enough estimate) and continue. Do this 100,000 times and see what happens.
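The loop described above might look something like this sketch. Several pieces are assumptions rather than the method used on the site: the logistic formula that turns a rating gap into a win probability, the `scale` and `bump` parameters, the simple after-game rating update, and the three hypothetical teams. Byes are represented as `None` slots in the bracket.

```python
import random


def win_probability(rating_a, rating_b, scale=15.0):
    """ASSUMED logistic link between a rating gap and a single-game win probability."""
    return 1.0 / (1.0 + 10 ** (-(rating_a - rating_b) / scale))


def play_bracket(ratings, bracket, rng, bump=2.0):
    """Play one bracket. Returns ({team: round index it lost in}, champion)."""
    ratings = dict(ratings)  # copy, so rating updates stay inside this simulation
    alive = list(bracket)
    exits = {}
    round_no = 0
    while len(alive) > 1:
        winners = []
        for a, b in zip(alive[0::2], alive[1::2]):
            if a is None or b is None:  # a bye: the real team advances
                winners.append(b if a is None else a)
                continue
            p_a = win_probability(ratings[a], ratings[b])
            a_wins = rng.random() < p_a
            winner, loser = (a, b) if a_wins else (b, a)
            p_winner = p_a if a_wins else 1.0 - p_a
            # ASSUMED update: an unexpected win lifts the winner's rating more.
            ratings[winner] += bump * (1.0 - p_winner)
            exits[loser] = round_no
            winners.append(winner)
        alive = winners
        round_no += 1
    return exits, alive[0]


def simulate(ratings, bracket, n_sims=100_000, seed=0):
    """Count, per team, how often it lost in each round or won the title (last column)."""
    rng = random.Random(seed)
    n_rounds = len(bracket).bit_length() - 1  # bracket length must be a power of two
    table = {team: [0] * (n_rounds + 1) for team in bracket if team is not None}
    for _ in range(n_sims):
        exits, champion = play_bracket(ratings, bracket, rng)
        for team, rnd in exits.items():
            table[team][rnd] += 1
        table[champion][n_rounds] += 1
    return table


# Hypothetical three-team bracket in which Alpha has a first-round bye.
ratings = {"Alpha": 60.0, "Bravo": 50.0, "Charlie": 40.0}
bracket = ["Alpha", None, "Bravo", "Charlie"]
N_SIMS = 20_000
table = simulate(ratings, bracket, n_sims=N_SIMS)
for team, row in table.items():
    print(team, row)
```

Each row sums to the number of simulations, since a team either loses in exactly one round or wins the title, and a team with a bye shows a zero in the first column.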

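Well, here’s what happened. Each column shows how many times a team lost in that round; the final column is how many times it won the title.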

TAPPS D1          
TEAM FIRST QUARTERS SEMIS FINAL CHAMPION
Boerne Geneva 0 16679 33496 12466 37359
Midland Trinity 0 13728 45784 12329 28159
Baytown Christian 0 38588 25904 24432 11076
Watauga Harvest 25725 17419 26487 20469 9900
Rockwall Heritage 35535 37235 13252 10377 3601
Sugar Land Logos Prep 24633 61253 8646 2274 3194
Houston Emery-Weiner 74275 9537 10169 4741 1278
Pasadena First Baptist 64465 24177 6168 3917 1273
Abilene Christian 45374 38952 10399 4169 1106
Round Rock Christian 49480 43205 5492 864 959
Katy Faith West 50520 43067 4867 711 835
Austin Hill Country 54626 34092 7621 2879 782
Waco Vanguard 75367 22068 1715 372 478
TAPPS D2          
TEAM FIRST QUARTERS SEMIS FINAL CHAMPION
Waco Live Oak 0 9711 16354 17471 56464
Austin Veritas 0 20885 34691 29735 14689
Orange Community Christian 0 46910 27849 18231 7010
Dallas Tyler Street 36085 17714 34388 5019 6794
SA Castle Hills 22194 38296 19682 14086 5742
Cedar Park Summit 25660 66166 3416 2406 2352
Denton Calvary 0 69007 26428 2318 2247
Kerrville Our Lady of the Hills 63915 13279 18663 1980 2163
Bulverde Bracken Christian 36088 49169 9136 4452 1155
Dallas Lakehill 77806 14794 4581 2204 615
Conroe Covenant Christian 63912 29946 4061 1649 432
Lubbock Christ The King 74340 24123 751 449 337
TAPPS D3          
TEAM FIRST QUARTERS SEMIS FINAL CHAMPION
Longview Trinity 0 25590 27022 16279 31109
Fredericksburg Heritage 0 29268 21774 25772 23186
WF Notre Dame 0 36128 35968 11996 15908
Fort Worth Covenant Classical 11814 54054 22060 5912 6160
Granbury North Central Texas Academy 43834 15878 24039 10298 5951
Seguin Lifegate 15010 58039 11590 10003 5358
Richardson Canyon Creek Christian 41133 42480 8406 3951 4030
San Marcos Hill Country Christian 56166 14491 19126 6837 3380
Alvin Living Stones 0 69631 22245 5739 2385
WF Wichita Christian 58867 31930 5010 2152 2041
Brenham Christian 84990 12693 1226 805 286
Selma River City Believers 88186 9818 1534 256 206
TCAF D1          
TEAM SEMIS FINAL CHAMPION
Fort Worth Nazarene 25645 32238 42117
Wylie Preparatory 26385 35665 37950
Dallas Inspired Vision 74355 15287 10358
Waco Methodist Childrens Home 73615 16810 9575
TCAF D2          
TEAM SEMIS FINAL CHAMPION
Azle Christian 20861 16452 62687
Granbury Cornerstone 32698 49507 17795
Weatherford Christian 79139 7600 13261
Arlington St. Paul Prep 67302 26441 6257
TCAL D1          
TEAM QUARTERS SEMIS FINAL CHAMPION  
Bryan Allen Academy 1340 2173 2839 93648
SA The Atonement 47104 23154 28492 1250
Tyler King’s Academy 48830 49877 289 1004
Greenville Phoenix 44302 28918 25840 940
EP Faith 52896 22695 23532 877
Bryan Christian Homeschool (BVCHEA) 51170 47717 255 858
Houston Mount Carmel 98660 233 266 841
Clear Lake Christian 55698 25233 18487 582
TCAL D2          
TEAM QUARTERS SEMIS FINAL CHAMPION
Stephenville Faith 2150 4906 24891 68053
Sugar Land HCYA Fort Bend 4188 21572 49549 24691
Corpus Christi Annapolis 12883 64578 18001 4538
Killeen Memorial 37562 58609 2464 1365
SA Sunnybrook 62438 35860 1165 537
Corpus Christi Abundant Life 97850 625 1092 433
Corpus Christi WINGS 87117 11393 1290 200
Lockhart Lighthouse Christian 95812 2457 1548 183
TAIAO D1          
TEAM QUARTERS SEMIS FINAL CHAMPION
Tyler HEAT 9493 28388 28029 34090
SA FEAST Homeschool 23317 29535 20494 26654
Capital City Christian Home School 34148 35468 15424 14960
Temple Centex Homeschool 39095 38257 12965 9683
Fort Worth THESA 65852 22050 7102 4996
Crosby Victory and Praise 60905 27484 7228 4383
Bryan Aggieland Home School (BCAL) 76683 12947 6135 4235
Plano CHANT 90507 5871 2623 999
TAIAO D2          
TEAM QUARTERS SEMIS FINAL CHAMPION
Austin NYOS 0 18228 29436 52336
Bastrop Tribe Consolidated 0 29389 40156 30455
Waco Parkview 19490 54627 17568 8315
San Marcos Homeschool 21741 62559 8584 7116
Weatherford Home School 78259 19213 1626 902
Victoria Home School 80510 15984 2630 876

Obviously for TCAF, I am moving straight into this week since the first round was played last weekend.

Another thing to notice is that teams like Austin NYOS do not lose in the first round. Why? They got a bye.

The biggest shocker at first glance – the fact that Bryan Allen Academy is such a huge favorite. I expected it to be high, but 93.6% to win it all is a little obscene.

So I hope everyone enjoys this… and remember, no wagering.

East and Throckmorton likely to rule UIL D2 Six-Man Playoffs

After 100,000 simulations, the Throckmorton Greyhounds appear to have a 29.8% chance to win the UIL D2 Six-Man State Championship. The biggest challenge, it appears, will be the East bracket, which won a dominating 80.1% of the time in the simulation.

Yesterday I wrote about how the Crowell Wildcats are a somewhat dominant 33.1% to repeat as the D1 UIL State Six-Man Champions. If you would like to read more details on the methods, I have several posted below.

Basic note: The table represents how many times each team LOST in that round or became the champion (final column).

TEAM BI-DISTRICT AREA QUARTERS SEMIS FINALS CHAMPION
Throckmorton 2277 17879 26568 18537 4942 29797
Guthrie 7473 13276 45288 15376 3977 14610
Calvert 7171 26565 28201 20608 3668 13787
Richland Springs 16212 9781 33384 23044 3859 13720
Groom 14392 22605 26298 11473 18726 6506
Follett 20907 9637 30748 13218 19457 6033
Jonesboro 21822 51329 15095 7807 1096 2851
Motley County 36812 49576 7304 3538 821 1949
Buena Vista 24382 31007 18346 15296 9149 1820
Balmorhea 35900 17918 21475 14798 8314 1595
Blanket 26142 37455 17133 12373 5848 1049
Southland 16387 56498 15533 5156 5424 1002
Chillicothe 14573 70298 12063 1898 356 812
Oglesby 83788 4289 8126 2777 309 711
Lueders-Avoca 63188 31122 3323 1401 289 677
Mt. Calm 26113 62182 8582 2343 252 528
Sands 64100 13218 12862 6674 2706 440
McLean 79093 4955 10141 2791 2601 419
Blackwell 30030 45928 14678 6701 2320 343
Mullin 78178 17599 2828 1014 112 269
Sierra Blanca 75618 14193 5708 3167 1158 156
Whitharral 41417 49108 6784 1462 1082 147
Jayton 92527 2983 3809 455 90 136
Lefors 85608 7191 4864 1239 973 125
Trinidad 92829 4507 1933 566 61 104
Loraine 73858 17345 5047 2663 984 103
High Island 73887 23748 1851 406 26 82
Rising Star 69970 22936 4751 1763 507 73
Kress 58583 36300 3781 770 511 55
Lazbuddie 83613 13706 1851 456 333 41
Harrold 97723 1423 667 125 28 34
Forestburg 85427 13443 978 105 21 26

It is interesting to note that while Richland Springs and Calvert have higher ratings at the current time, Guthrie actually has the second-highest chance to win the tournament (14610 to 13720 and 13787, for RS and Calvert, respectively). This is due to the fact that Guthrie has it easier in the first two rounds.

Out West, Groom and Follett (6506 and 6033 wins) have a combined probability that’s less than any of the top-4 from the East. On the bright side, each reaches the finals about 25% of the time, more often than Guthrie, Calvert or Richland Springs, mostly due to the fact that Throckmorton is not in their half of the draw.

It certainly looks like the West is more competitive in the sense that the teams are more even and quite a few more have solid opportunities to reach the semis and finals.

Coming Next: All of the private school draws.

 

Crowell Favorite to Win Six-Man Title with 33.1% Win Probability

I have created several Monte Carlo simulations over the past year to try and determine probabilities for various sporting events. This week I decided to tackle the Texas Six-Man state tournament. (I will publish more bracket evaluations as the week goes on)

For the past 21 seasons, I have been producing rankings for six-man football. For those of you who do not know the history, I would fax my rankings to newspapers across the state and several would actually publish them. I eventually put together a newsletter, The Huntress Report, where I would add scores, game stories, stats and schedules to the rankings and mail (or fax) to subscribers. Eventually I moved to a website, where I would update the information a week behind, so that my subscribers would be getting the freshest information first. That all was scrapped in 1999 when I decided to go 100% to the website (www.sixmanfootball.com).

METHODOLOGY

You can read some of my earlier posts (see below here at sixmanguru.com) where I discuss Monte Carlo simulations if you are interested. In this case I played the UIL Division I tournament 100,000 times using probabilities calculated from the ratings on my website. To account for upsets and a more Bayesian methodology, I modified the teams’ ratings after each round to (roughly) mimic how my rating system would update them. I also recorded the round in which each team lost; the results are below.

Crowell, the defending DI state champions, wins the title again a whopping 33.1% of the time and reached the finals over 41% of the time.

TEAM BI-DISTRICT AREA QUARTERS SEMIS FINALS CHAMP
Crowell 8686 12684 25264 11843 8426 33097
Ira 16846 9048 41178 9920 6573 16435
May 1911 8450 19585 28915 26882 14257
Blum 9396 2817 28089 27327 22565 9806
Borden County 16195 24064 21597 25036 4836 8272
Happy 9925 21253 34686 24447 4075 5614
Abbott 26692 4293 40184 16474 9330 3027
Water Valley 22717 63399 8220 2512 1390 1762
Valley 21460 50424 14276 10825 1381 1634
Gordon 30614 18265 35123 9414 5227 1357
Knox City 83154 4192 9627 1484 664 879
Grady 42027 41247 11329 4300 508 589
Highland 91314 3270 3476 935 478 527
Aquilla 73308 2813 17360 4297 1795 427
Sterling City 44824 47109 6529 782 373 383
Zephyr 17303 55678 21972 3477 1296 274
Anton 83805 8484 4666 2544 254 247
Newcastle 69386 11555 15094 2678 1046 241
Garden City 55176 39651 4328 411 224 210
Ropes 57973 32379 6938 2288 218 204
Marfa 77283 20647 1378 374 151 167
Nazareth 78540 17028 2724 1409 143 156
Milford 90604 972 5262 2335 697 130
Santa Anna 48258 46737 2667 1705 539 94
Rochelle 51742 44020 2323 1416 429 70
Spur 90075 5121 3784 890 64 66
Leverett’s Chapel 35599 59216 4515 531 121 18
Eden 82697 14502 2461 249 77 14
Chester 44796 52912 1717 462 99 14
Tioga 98089 793 775 280 50 13
Campbell 55204 43299 1154 281 51 11
Savoy 64401 33678 1719 159 38 5

The good news is every team has a chance to win it all, even Savoy. The bad news: it appears they have only about a 5 in 100,000 chance. I did run this a few times and they did get as high as 12 in one of the iterations. Tioga, a team that loses 98.1% of the time in the first round, actually has a better chance than Savoy with 13 wins.

Another thing that stands out would be the fact that Ira, despite winning the title a theoretical 16.4% of the time, also seems to lose in the first round (16.8%) much more often than teams like Crowell (8.7%) or May (an amazing 1.9%). This goes to show that despite the 45-point expected spread on the Ira-Knox City game, it is still a much more difficult match-up for the Bulldogs than Highland or Tioga will be for Crowell and May, respectively.

Also interesting to note is that the East wins a dominant 70.2% of the time.

The most common final is a rematch of last year’s, May v. Crowell, with Blum v Crowell coming in next. The good news for May is they reach the final 41.1% of the time, which is a very good season. Blum is expected to reach the final about 32.4% of the time.
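Since the table records the round in which each team lost, the chance of reaching a given round is the sum of that column and every column to its right. Here is a small sketch using three rows copied from the table above; the function name is mine.

```python
ROUNDS = ["BI-DISTRICT", "AREA", "QUARTERS", "SEMIS", "FINALS", "CHAMP"]

# Rows copied from the table: times each team lost in a round, then titles won.
counts = {
    "Crowell": [8686, 12684, 25264, 11843, 8426, 33097],
    "May": [1911, 8450, 19585, 28915, 26882, 14257],
    "Blum": [9396, 2817, 28089, 27327, 22565, 9806],
}


def reach_probability(row, round_name):
    """Share of simulations in which the team was still alive for round_name."""
    idx = ROUNDS.index(round_name)
    return sum(row[idx:]) / sum(row)


for team, row in counts.items():
    final_pct = 100 * reach_probability(row, "FINALS")
    title_pct = 100 * reach_probability(row, "CHAMP")
    print(f"{team}: reaches final {final_pct:.1f}%, wins title {title_pct:.1f}%")
```

Reaching the final means either losing it or winning it, which is how May's 26,882 finals losses and 14,257 titles add up to 41.1%.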

Wednesday I will release my UIL DII simulation results (they are already done, but it is my anniversary and we are going out for dinner). I will release the private school results either late Wednesday or early Thursday.

Quick Post on MLB Probabilities (100k Monte Carlo Simulations)

I just did a quick run of 100,000 playoff simulations and wanted to share the quick results. I will try to get some finer detail or maybe look into a few changes, but here are the raw World Series champion results.

Detroit — 4950
Baltimore — 18592
LA Angels — 31876
Kansas City — 9058
Washington — 19768
San Francisco — 4246
St. Louis — 1662
LA Dodgers — 9848

So the Angels win it all 31.9% of the time, with Washington and Baltimore in a tight race for second.

Oakland, Pittsburgh slight favorites in Wild Card probabilities

With the MLB Playoffs beginning this evening, I figured it was time to test my rankings and pull out the old probability calculator. I created the MLB Ratings based on a simple least squares NLP Optimization that I have discussed before.

Oakland at Kansas City

The Royals are in the playoffs for the first time in ages and they get to host a game. Unfortunately, they didn’t seem to have a home field advantage during the regular season, so I am not sure how much this helps (although in reality we can assume it does, at least a little). The numbers say the A’s are the better team by almost 0.7 of a run (per game, for the season). I show them as a 63.5% favorite.

San Francisco at Pittsburgh

These teams appear to be very evenly matched. On a neutral field, the Giants look to be a 0.15 run favorite. However, this game is not on a neutral field and Pittsburgh has one of the few home field advantages in the playoffs (if we assume the regular season is any indication). This swing makes the Pirates about a 0.215 run favorite tomorrow night, giving them about a 54.3% chance of winning.

Detroit v. Baltimore

Neither team appears to have a home field advantage, so looking at it straight-up, we find that Baltimore looks to be about a 0.4 run favorite (or 57.9%) per game. In a five-game series, the results look like this:

([0.0747, 0.1297, 0.1501], 0.3545, [0.194, 0.2451, 0.2064], 0.6455)

Overall, Baltimore is 64.6% to win the series. The most likely outcome is a Baltimore 3-1 win (24.5%).

Los Angeles v. St. Louis

With neither team holding a home field advantage, the Dodgers look to be about 0.445 runs (or 58.8%) better than the Cards. The five-game series probabilities are:

([0.2033, 0.2512, 0.207], 0.6615, [0.07, 0.1234, 0.1451], 0.3385)

Los Angeles looks about 66.2% to win the series overall. Again, the highest likelihood for an outcome is a 3-1 Dodger win (25.1%).

I will update the probabilities and try to run a Monte Carlo simulation with the data later in the week after we see who wins the Wild Card games.

Generic Sports Series Probability Calculator

With the baseball playoffs upon us, I have decided to start building a simulator to determine series outcomes once they start. I decided to make this as generic as possible. This simulator is not specific to baseball or even to a particular series length.

The first pieces to think about are ones I addressed in my previous post: home field advantage, ratings and the probability that a team wins a single game against a specific opponent.

I will come back to this later in the month, as we get closer to the playoffs and I tie this all together.

Let’s assume for today that we know the probability that Team A will defeat Team B in a single game. Let’s also assume, for simplicity, that this single-game probability remains the same throughout the series, regardless of any possible home field advantage.

Since we are dealing with a single probability and no perceived home field advantage, all we need for inputs are: p(Team A wins a single game), the current series record of the two teams and the number of wins needed to take the series (e.g., 1 for a one-game series, 3 for a five-game series and 4 for a seven-game series).

All of my code is listed here on github, https://gist.github.com/sixmanguru

INPUTS
Like I said, let’s keep this simple. Probabilities, current series record, length of series.

seriesProb(.54,0,0,4)

The function call asks for the series probabilities, given that Team A holds a 54% chance to win a single game, the series is just beginning (0-0) and it takes four wins to take the series (a seven-game series).

That’s all.

OUTPUT
Here’s the abbreviated output (rounded to four digits).

([0.085, 0.1565, 0.1799, 0.1655], 0.5869, [0.0448, 0.0967, 0.1306, 0.141], 0.4131)

The first list contains the probabilities that Team A wins the series EXACTLY 4-0, 4-1, 4-2 or 4-3. The number trailing is the total probability Team A wins the series.

The second list contains the probabilities Team A loses the series EXACTLY 0-4, 1-4, 2-4, 3-4, with the total probability they lose the series following.
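A minimal sketch of how numbers like these can be computed in closed form is below. It is not the code from the gist. The name series_prob_sketch and its argument names are made up for illustration, and it assumes every game is independent with the same single-game probability. The idea is that a team clinches on its final win, so the other team's wins have to be spread over the games before it.

```python
from math import comb


def series_prob_sketch(p, wins_a, wins_b, wins_needed):
    """Hypothetical re-creation of the output format shown above.

    p           -- probability Team A wins any single game (assumed constant)
    wins_a/b    -- current series record
    wins_needed -- wins required to take the series
    """
    # A series that is already decided is returned as a bare pair of totals.
    if wins_a >= wins_needed or wins_b >= wins_needed:
        return (float(wins_a >= wins_needed), float(wins_b >= wins_needed))

    q = 1 - p
    need_a = wins_needed - wins_a  # wins Team A still needs
    need_b = wins_needed - wins_b  # wins Team B still needs

    def exact(p_win, p_lose, need_win, loser_now):
        # Entry j is the chance the winner clinches with the loser finishing on j wins.
        out = []
        for j in range(wins_needed):
            k = j - loser_now  # extra wins the loser must still collect
            if k < 0:
                out.append(0.0)  # the loser already has more than j wins
            else:
                # The loser's k wins fall among the need_win - 1 + k games before the clincher.
                out.append(comb(need_win - 1 + k, k) * p_win**need_win * p_lose**k)
        return out

    a_exact = exact(p, q, need_a, wins_b)
    b_exact = exact(q, p, need_b, wins_a)
    return (a_exact, sum(a_exact), b_exact, sum(b_exact))


a_exact, a_total, b_exact, b_total = series_prob_sketch(0.54, 0, 0, 4)
print([round(x, 4) for x in a_exact], round(a_total, 4))
print([round(x, 4) for x in b_exact], round(b_total, 4))
```

The sketch returns unrounded values and leaves the rounding to the print step, so the two totals always add up to 1 before display.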

ALTERNATE EXAMPLES
Let’s assume the only thing you change is the fact that Team A now leads the series 3-0.

seriesProb(.54,3,0,4)

([0.54, 0.2484, 0.1143, 0.0526], 0.9553, [0, 0, 0, 0.0448], 0.0448)

As you can see above, there is now no chance for Team B to win the series 4-0, 4-1 or 4-2, and they have only a 4.5% chance to win the series at all. This can be verified by hand: Team B must win four straight games, and 0.46^4 is approximately 0.0448.
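The same figure can also be checked by brute force. The sketch below is a small Monte Carlo run, not the code from the gist. The function name, the trial count and the seed are all arbitrary choices made for illustration. It plays out the rest of the series many times from a 3-0 lead and counts how often Team B comes all the way back.

```python
import random


def simulate_comeback_rate(p_a=0.54, wins_a=3, wins_b=0, wins_needed=4,
                           trials=200_000, seed=2014):
    """Estimate how often Team B wins the series from the given record."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    b_series_wins = 0
    for _ in range(trials):
        a, b = wins_a, wins_b
        # Play single games until one side reaches the required number of wins.
        while a < wins_needed and b < wins_needed:
            if rng.random() < p_a:
                a += 1
            else:
                b += 1
        if b == wins_needed:
            b_series_wins += 1
    return b_series_wins / trials


estimate = simulate_comeback_rate()
print(f"simulated: {estimate:.4f}   exact: {0.46 ** 4:.4f}")
```

With a couple hundred thousand trials the estimate should land within a fraction of a percentage point of the exact value, which is a handy sanity check on any closed-form calculator.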

Now let’s assume that it is a one game series.

seriesProb(.54,0,0,1)

([0.54], 0.54, [0.46], 0.46)

As you can see, it is one game, so the original probabilities are returned.

Finally, as a test, we say Team A trails the series 3-4 in a seven-game series.

seriesProb(.54,3,4,4)

It quickly returns (0,1). It is impossible for Team A to win and certain that Team B will win.

LIMITATIONS
The two biggest limitations still to resolve (assuming you accept the theory that you can actually assign a single-game probability at all) are the possibility of a home field advantage and how it would play out based on the series’ format (i.e., 2-3-2 vs. 2-2-1-1-1 and such).
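One possible way to handle both limitations is to walk through the series game by game and let the single-game probability depend on the venue. The sketch below is only an illustration of that idea, not something in the gist. The function name, the "H"/"A" venue strings and the 58%/50% home/away probabilities are all made-up assumptions.

```python
def series_with_venues(p_home, p_away, venues, wins_needed):
    """Chance Team A wins the series when its win probability depends on venue.

    venues -- one character per scheduled game from Team A's side,
              "H" for a home game and "A" for an away game.
    Returns (total probability A wins, chance A clinches in each game).
    """
    live = {(0, 0): 1.0}  # probability of each still-undecided series record
    a_total = 0.0
    a_clinch_by_game = []
    for venue in venues:
        p = p_home if venue == "H" else p_away
        nxt = {}
        clinch = 0.0
        for (a, b), prob in live.items():
            # Team A wins this game.
            if a + 1 == wins_needed:
                clinch += prob * p
            else:
                nxt[(a + 1, b)] = nxt.get((a + 1, b), 0.0) + prob * p
            # Team B wins this game; a B clinch simply leaves the live states.
            if b + 1 < wins_needed:
                nxt[(a, b + 1)] = nxt.get((a, b + 1), 0.0) + prob * (1 - p)
        a_clinch_by_game.append(clinch)
        a_total += clinch
        live = nxt
    return a_total, a_clinch_by_game


# Hypothetical numbers: Team A wins 58% at home and 50% on the road.
total_232, by_game_232 = series_with_venues(0.58, 0.50, "HHAAAHH", 4)
total_22111, by_game_22111 = series_with_venues(0.58, 0.50, "HHAAHAH", 4)
print(round(total_232, 4), [round(x, 4) for x in by_game_232])
print(round(total_22111, 4), [round(x, 4) for x in by_game_22111])
```

Under these assumptions the two formats give Team A the same overall chance, because both hand it four home games and the winner is whoever would take at least four of the seven. What changes is when the series tends to end, for example the chance of a 4-1 finish, since game five moves from the road to home.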

Lastly, I would like to thank Jeff Sackmann, the author of Tennis Abstract and several other endeavors. His original python code for simulating a tennis match was the foundation for this project. His Python code for tennis Markov Chains can be found here, http://summerofjeff.wordpress.com/2011/01/13/python-code-for-tennis-markov/