Of course I will keep ploughing ahead in future comps next year with the same raw, animal-spirit-like energy (thanks Jeff). I'd have to say that competing in a few competitions has really made a big difference to my studies. My Kaggle experiences have allowed me to look at the new techniques we are taught daily and imagine how I would put them to use, giving immediate context to new concepts. I would go so far as to say that giving a big Kaggle competition a go should be a mandatory prerequisite for anyone planning to do an analytics degree.
Looking ahead, I’m extremely fortunate to begin my new internship next year with AutoGrid, a hot new startup in smart grid analytics. Next semester also brings a number of subjects that I’m very excited about: text mining, data visualization, and distributed computing, to name just a few. I’ll also be watching Kaggle for the next competition that catches my eye and hope to bump my rank into the top 1000 before I graduate!
I now have a week off before school gets going again, which means I'll actually have a bit of spare time to write for once. Since first starting out on Kaggle with the excellent beginners' competition, Titanic: Machine Learning from Disaster, I've wanted to write a guide on how to get started in R on this dataset. I feel that it would be a good complement to the existing Python and Excel guides on the official site. So, over the next few weeks, I'm planning to write a series of posts on this topic, covering the basics, decision trees, random forests and feature engineering. I hope that you'll enjoy them!
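As a small taste of what the series will cover, here's a minimal sketch of fitting a decision tree to the Titanic data in R with the rpart package. It assumes you've downloaded the competition's train.csv into your working directory; the column names used below are the ones Kaggle provides.

```r
# Minimal sketch, assuming train.csv from the Kaggle Titanic competition
# sits in the current working directory.
library(rpart)

train <- read.csv("train.csv")

# Fit a simple classification tree predicting survival from a few
# obvious features: passenger class, sex, age and fare paid.
fit <- rpart(Survived ~ Pclass + Sex + Age + Fare,
             data = train,
             method = "class")

# Inspect and plot the fitted tree.
printcp(fit)
plot(fit)
text(fit)
```

The posts themselves will go much deeper than this, of course, but even a tree this simple picks up on the "women and children first" pattern that makes the Titanic dataset such a nice introduction.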