Perhaps egocentrically, I marked my own team with a big red dot on the chart. The horizontal red lines represent the scores achieved by publicly shared “beat the benchmark” code from the forums. The gaps in the plot show where many people submitted those benchmarks unedited and then failed to improve on them, i.e., ties.
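To make the idea of ties concrete, here is a minimal sketch of how runs of identical scores could be located in a leaderboard. It assumes the leaderboard is simply a list of public scores sorted best-first, so rank `i + 1` holds `scores[i]`. The function name `find_ties` and the scores are hypothetical, made-up illustrations, not real Kaggle data or any Kaggle API.

```python
from itertools import groupby


def find_ties(scores, min_size=2):
    """Return (score, first_rank, last_rank) for each run of identical scores.

    Assumes `scores` is sorted best-first, so rank i + 1 holds scores[i].
    Runs shorter than `min_size` are not reported as ties.
    """
    ties = []
    rank = 1
    # groupby collapses consecutive equal values into one group
    for score, group in groupby(scores):
        size = len(list(group))
        if size >= min_size:
            ties.append((score, rank, rank + size - 1))
        rank += size
    return ties


# Toy leaderboard with made-up numbers (not real competition results)
toy_scores = [0.991, 0.987, 0.980, 0.975, 0.975, 0.975, 0.962, 0.950, 0.950]
print(find_ties(toy_scores))  # [(0.975, 4, 6), (0.95, 8, 9)]
```

Exact equality works here because leaderboard scores are displayed rounded, so teams that submitted the same benchmark code show identical values.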
The “hockey stick” formations between these ties are pretty striking, especially among the top 75 competitors. Incremental improvements over the benchmark code show up as the asymptotic low end of each curve, but once competitors break out of that zone, an amazingly linear pattern heads upward through the rankings.
I would have expected another asymptote right at the top, as people pushed against the limit of what can be learned from the data. But as of right now, the curve is still linear right up to the best teams, which suggests they are still extracting more information as they approach a perfect score.
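One rough way to check whether the top of the curve is still linear or starting to flatten is to fit a straight line to score versus rank and look at R². The sketch below is an assumption on my part, not how the chart above was analyzed: `linear_fit` is a hypothetical helper doing ordinary least squares, and both toy curves are made-up numbers, one climbing linearly and one bunching up under a ceiling near rank 1.

```python
def linear_fit(ranks, scores):
    """Ordinary least-squares fit of score = slope * rank + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(ranks)
    mean_x = sum(ranks) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in ranks)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ranks, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(ranks, scores))
    ss_tot = sum((y - mean_y) ** 2 for y in scores)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


# Toy "top of the leaderboard" with made-up numbers; rank 1 is best.
ranks = list(range(1, 31))
linear_top = [0.99 - 0.001 * r for r in ranks]             # keeps climbing linearly
flattening_top = [0.99 - 0.0001 * r ** 2 for r in ranks]   # bunches up near a ceiling

for name, scores in [("linear", linear_top), ("flattening", flattening_top)]:
    slope, _, r2 = linear_fit(ranks, scores)
    print(f"{name}: slope={slope:.5f}, R^2={r2:.3f}")
```

A high R² is only weak evidence of linearity; plotting the residuals against rank would show curvature near the top more directly.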
It wouldn’t be a post about Kaggle without a prediction, right? As the competition deadline approaches, I suspect this pattern will break, at least in the top hockey stick, where there’s almost no room left to improve. I anticipate a new asymptote forming as scores press up against perfect classification. Hopefully my team’s big red dot heads upward too!
I’ll update you in a couple of months.