HW10

This homework will reuse the candy dataset, now with the goal of predicting whether a candy contains chocolate.

Recall: The data was obtained from https://www.kaggle.com/fivethirtyeight/the-ultimate-halloween-candy-power-ranking. The original article and analysis are available at https://fivethirtyeight.com/features/the-ultimate-halloween-candy-power-ranking/.

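library(tidyverse)  # provides read_csv(), mutate(), and %>%
candy <- read_csv('http://math.montana.edu/ahoegh/teaching/stat446/candy-data.csv')
# Store the 0/1 chocolate indicator as a factor as well
candy <- candy %>% mutate(chocolate_factor = as.factor(chocolate))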
Context

What’s the best (or at least the most popular) Halloween candy? That was the question this dataset was collected to answer. Data was collected by creating a website where participants were shown two fun-sized candies and asked to click on the one they would prefer to receive. In total, more than 269 thousand votes were collected from 8,371 different IP addresses.

Content

candy-data.csv includes attributes for each candy along with its ranking. For binary variables, 1 means yes, 0 means no. The data contains the following fields:

Acknowledgements:

This dataset is Copyright (c) 2014 ESPN Internet Ventures and distributed under an MIT license. Check out the analysis and write-up here: The Ultimate Halloween Candy Power Ranking. Thanks to Walt Hickey for making the data available.

1. (3 points)

Use the glm() function to fit a regression model to understand if other factors can be used to model whether a candy item contains chocolate. Make sure to include winpercent but you are free to consider other variables too. Interpret the output.
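As a starting point, the fit described above might be sketched as follows. This is a minimal sketch, not a model answer: the helper name fit_chocolate_glm and its predictors argument are hypothetical, and only winpercent is included by default because it is the one predictor the question requires.

```r
# Minimal sketch: logistic regression for the 0/1 chocolate indicator.
# fit_chocolate_glm() is a hypothetical helper, not part of the assignment.
fit_chocolate_glm <- function(data, predictors = "winpercent") {
  # Build the formula chocolate ~ <predictors> from the column names
  model_formula <- reformulate(predictors, response = "chocolate")
  glm(model_formula, data = data, family = binomial(link = "logit"))
}

# Usage with the data frame loaded at the top of this homework:
# fit <- fit_chocolate_glm(candy)
# summary(fit)     # coefficients are on the log-odds scale
# exp(coef(fit))   # multiplicative change in the odds per one-unit increase
```

Exponentiating a coefficient gives an odds ratio, which is usually easier to interpret than the raw log-odds value.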

2. (8 points)

Using Stan, implement Bayesian logistic regression.

a. (2 points)

Write out the model, including priors. Note you did this in the last quiz.
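One possible way to write such a model is shown below. The notation and the independent normal priors with a common variance are assumptions made for illustration; the priors from the quiz may differ and should be substituted where they do.

```latex
\begin{align*}
y_i \mid p_i &\sim \text{Bernoulli}(p_i), \quad i = 1, \dots, n, \\
\log\frac{p_i}{1 - p_i} &= \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}, \\
\beta_j &\sim \text{Normal}(0, \sigma_\beta^2), \quad j = 0, 1, \dots, k.
\end{align*}
```

Here y_i indicates whether candy i contains chocolate, x_{i1}, ..., x_{ik} are its predictors (winpercent among them), and \sigma_\beta is a fixed prior standard deviation chosen by the analyst.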

b. (3 points)

Implement Bayesian logistic regression using Stan. This Stan link will be helpful.
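A rough sketch of how this could look with the rstan package is given below. Several things are assumptions: the use of rstan, the helper make_stan_data, the normal priors with a user-supplied prior_sd, and the `array[N] int` declaration, which requires a reasonably recent Stan version (older versions use `int y[N]` instead).

```r
# Stan program for logistic regression, stored as an R string.
# The normal(0, prior_sd) priors are an assumption; use the priors from part a.
stan_code <- "
data {
  int<lower=0> N;                      // number of candies
  int<lower=0> K;                      // number of predictors
  matrix[N, K] X;                      // predictor matrix (no intercept column)
  array[N] int<lower=0, upper=1> y;    // 1 if the candy contains chocolate
  real<lower=0> prior_sd;              // prior standard deviation
}
parameters {
  real alpha;                          // intercept
  vector[K] beta;                      // slopes
}
model {
  alpha ~ normal(0, prior_sd);
  beta ~ normal(0, prior_sd);
  y ~ bernoulli_logit(alpha + X * beta);
}
"

# Hypothetical helper that assembles the data list the program expects.
make_stan_data <- function(data, predictors = "winpercent", prior_sd = 5) {
  X <- as.matrix(data[, predictors, drop = FALSE])
  list(N = nrow(X), K = ncol(X), X = X,
       y = as.integer(data$chocolate), prior_sd = prior_sd)
}

# Usage (requires the rstan package and the candy data frame):
# fit <- rstan::stan(model_code = stan_code, data = make_stan_data(candy),
#                    chains = 4, iter = 2000, seed = 446)
# print(fit, pars = c("alpha", "beta"))
```

Centering or standardizing winpercent before building X may help the sampler, at the cost of changing the scale on which the coefficients are interpreted.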

c. (3 points)

Interpret your output and explain the results without statistical jargon.

3. (9 points)

Building on Quiz 9, write your own Metropolis-Hastings sampler to estimate the parameters in the model specified in Q2a. Print out trace plots for a few of your variables. Note that your answers should be very similar to those found with Stan.
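One way such a sampler could be structured is sketched below as a random-walk Metropolis algorithm, a special case of Metropolis-Hastings with symmetric proposals. The function names, the independent normal priors, and the tuning values (step_sd, prior_sd, n_iter) are all assumptions for illustration rather than the sampler from Quiz 9.

```r
# Log posterior for logistic regression with independent N(0, prior_sd^2)
# priors on every coefficient (the prior is an assumption; match it to Q2a).
log_posterior <- function(beta, X, y, prior_sd = 5) {
  eta <- as.vector(X %*% beta)
  # Bernoulli log likelihood; plogis(..., log.p = TRUE) is numerically stable
  loglik <- sum(y * plogis(eta, log.p = TRUE) +
                (1 - y) * plogis(-eta, log.p = TRUE))
  logprior <- sum(dnorm(beta, mean = 0, sd = prior_sd, log = TRUE))
  loglik + logprior
}

# Random-walk Metropolis: the normal proposal is symmetric, so the acceptance
# ratio reduces to the ratio of posterior densities.
mh_logistic <- function(X, y, n_iter = 5000, step_sd = 0.3, prior_sd = 5) {
  p <- ncol(X)
  draws <- matrix(NA_real_, nrow = n_iter, ncol = p,
                  dimnames = list(NULL, colnames(X)))
  beta <- rep(0, p)
  current_lp <- log_posterior(beta, X, y, prior_sd)
  n_accept <- 0
  for (i in seq_len(n_iter)) {
    proposal <- beta + rnorm(p, mean = 0, sd = step_sd)
    proposal_lp <- log_posterior(proposal, X, y, prior_sd)
    # Accept with probability min(1, posterior ratio), on the log scale
    if (log(runif(1)) < proposal_lp - current_lp) {
      beta <- proposal
      current_lp <- proposal_lp
      n_accept <- n_accept + 1
    }
    draws[i, ] <- beta
  }
  list(draws = draws, accept_rate = n_accept / n_iter)
}

# Usage with the candy data (standardizing winpercent helps mixing):
# X <- cbind(intercept = 1, winpercent = as.vector(scale(candy$winpercent)))
# fit <- mh_logistic(X, candy$chocolate)
# plot(fit$draws[, "winpercent"], type = "l")   # trace plot
```

If the predictors are standardized here, the Stan fit needs the same scaling for the two sets of estimates to be comparable. The step size may also need tuning so that the acceptance rate is neither close to 0 nor close to 1.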