Let’s assume that this was a 50% down-payment for two tickets, so $300 could be used for his fare. A quick Wikipedia search tells us he was born on September 13, 1857, which would make him 54 years old when the ship left port on April 10, 1912. He was also trying to get from England to the US, so let’s assume he was sailing from Southampton, though I have been unable to find the exact port he was planning to embark from.
Since we never used the ticket number or cabin number for our predictions, we can just leave these as NA values. So let’s build a special Hershey dataframe and combine it with the combi dataframe we built in the tutorial (before we transformed it to build the engineered variables):
> Hershey <- data.frame(Pclass=1, Sex='male', Age=54, SibSp=1, Parch=0,
                        Fare=300, Embarked='S', PassengerId=NA, Survived=NA,
                        Name='Hershey, Mr. Milton S.', Ticket=NA, Cabin=NA)
> combi <- rbind(train, test, Hershey)

Note that Sex must be 'male' in lowercase, to match the values already present in the data.
Okay. So now we run through the rest of the tutorial and make our engineered variables, but this time, when we split it back up into the train and test sets, we also break out the Hershey dataset:
> Hershey <- combi[1310,]
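The row index falls out of the row counts: the training set occupies rows 1 through 891 and the test set rows 892 through 1309, so the single Hershey row we appended sits at row 1310. A minimal self-contained sketch of the append-and-split pattern, using toy stand-ins rather than the real Kaggle data:

```r
# Toy stand-ins for the real train/test dataframes (assumed columns only).
train <- data.frame(PassengerId=1:2, Survived=c(0, 1), Sex=c('male', 'female'))
test  <- data.frame(PassengerId=3,   Survived=NA,      Sex='male')

# A new passenger row with the same columns; unknown fields stay NA.
Hershey <- data.frame(PassengerId=NA, Survived=NA, Sex='male')

# Stack all three, engineer variables on the combined frame, then split
# back out by row position: the appended row is always the last one.
combi   <- rbind(train, test, Hershey)
Hershey <- combi[nrow(combi), ]
```

With the real data, `nrow(combi)` is 1310, which is why the tutorial indexes `combi[1310,]` directly.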
We then train our model as before, and finally make our prediction on whether he was a lucky guy or not:
> predict(fit, Hershey, OOB=TRUE, type = "response")
0
Oh dear! Imagine a world without kisses!
So, sadly, our model tells us that Hershey would have perished in the Titanic disaster. Perhaps you would like to dig into whether some of the other famous people who were almost aboard would have escaped or not?