I suppose this is the appropriate time of year to look back and reflect on where I’ve been, and project forward to see where I’m heading. It’s been an incredible 6 months in the MSAN program at USF. I can hardly imagine a more hectic, or more fulfilling time. Since starting this journey I’ve had amazing experiences, met fantastic people and learnt a massive amount in a relatively short amount of time.
My statistical skills have grown in leaps and bounds. As an ex-engineer, uncertainty and confidence intervals came as a bit of a transition, but regression and time series analysis have kicked my butt, and changed the way I think. I am now equipped with a statistical armoury to be reckoned with, and put these techniques to use daily.
I can also say that my programming skills are heading in the right direction. Since getting started in Python and R mere months ago, I’ve become comfortable and confident in getting any task that I set myself done efficiently. I’ve implemented my own versions of naive Bayes, KNN, stepwise regression and bagging, become pretty handy with SQL, Hive and Mongo, and had a lot of fun messing around with public APIs (and been banned from LinkedIn for about a month for massively exceeding my quota, see here).
I’ve also taken on the world’s best in Kaggle and am extremely proud to have pinned a couple of these bad boys to my profile:
Of course, I will keep ploughing ahead in competitions next year with the same raw, animal-spirit-like energy (thanks Jeff). I’d have to say that competing in a few competitions has really made a big difference in school. My Kaggle experiences have allowed me to look at the new techniques we are taught daily and imagine how I would put them to use, giving immediate context to new concepts. I would go so far as to say that giving a big Kaggle competition a go should be a mandatory prerequisite for anyone planning to do an analytics degree.
Looking ahead, I’m extremely fortunate to begin my new internship next year with AutoGrid, a hot new startup in smart grid analytics. Next semester also brings a number of subjects that I’m very excited about: text mining, data visualization, and distributed computing, to name just a few. I’ll also be watching Kaggle for the next competition that catches my eye and hope to bump my rank into the top 1000 before I graduate!
I now have a week off before school gets going again, which means I’ll actually have a bit of spare time to write for once. Ever since first starting out in Kaggle on the excellent beginners’ competition, Titanic: Machine Learning from Disaster, I’ve wanted to write a guide on how to get started in R on this dataset. I feel that it would be a good complement to the existing Python and Excel guides on the official site. So, over the next few weeks, I’m planning to write a series of posts on this topic, covering the basics, decision trees, random forests and feature engineering. I hope that you’ll enjoy them!