Pretend you are working for a fictional company, and you have been asked to explore a dataset with regard to a certain policy/action/decision. You can take some liberties with this scenario, but make sure there is a concrete plan of action that you recommend at the end. Write a report that describes the problem of interest, uses descriptive statistics to learn more about the data, runs a prediction analysis, and recommends a plan of action based on the results of your analyses.
Data
You have been given a dataset from another wave of the Pulse of the Nation survey, this time collected in August 2018. This is available in the 201808-CAH_PulseOfTheNation.csv file. This survey contains some different questions compared to the dataset we have worked with in this class so far. The variable names and the question associated with each one are provided in the PulseOfTheNation-201808 Codebook.txt file. Many of the variable names make it easy to see what the question was, but it might be harder to determine for others, so make sure you read through the codebook carefully! The list of variable names and questions is also provided at the end of this document.
Project Guidelines
In your report, you must include:
- A clear description of the question of interest and the variables you are studying.
- Descriptive statistics and visualizations that help the reader understand the data better.
- At least one bootstrap confidence interval.
- A prediction model using linear regression or classification.
- A clear conclusion based on your analysis, including a clear plan of action.
The bootstrap confidence interval can be part of the prediction model if you want (for example, a bootstrap confidence interval for the coefficient). Make sure you have a reason for looking at the variables that you did, and motivate why you chose those variables in the introduction. Write your report in a separate Jupyter Notebook, formatting it so that the code is included with the text. Make sure to include a title and section headings by using ‘#’ symbols in Markdown formatting.
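As an illustration of a bootstrap confidence interval for a regression coefficient, here is a minimal sketch using NumPy. The data are simulated, and the names `x`, `y`, and `fit_slope` are hypothetical stand-ins rather than variables from the survey; the approach shown (resampling rows and taking percentiles of the refitted slopes) is one common option, not a required method.

```python
import numpy as np

# Simulated data standing in for a real predictor/outcome pair (assumption:
# these are NOT survey variables, just an illustration).
rng = np.random.default_rng(1)
n = 300
x = rng.uniform(18, 80, size=n)
y = 2.0 + 0.05 * x + rng.normal(0, 1, size=n)

def fit_slope(x, y):
    """Return the least-squares slope of y on x."""
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope

# Resample rows (x and y together) with replacement and refit each time.
boot_slopes = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)
    boot_slopes.append(fit_slope(x[idx], y[idx]))

# The middle 95% of the bootstrap slopes gives a percentile interval.
lower, upper = np.percentile(boot_slopes, [2.5, 97.5])
print(f"Estimated slope: {fit_slope(x, y):.4f}")
print(f"95% bootstrap CI for the slope: ({lower:.4f}, {upper:.4f})")
```

If the interval excludes 0, that is evidence of a linear association between the predictor and the outcome in the population the sample represents.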
Problem Description
You should start the project with a short description of the problem of interest. This must be something that you can propose an action on. Examples of good formulations of the problem are:
- Is smoking associated with a higher rate of low birthweight babies?
- What are the main factors associated with higher grades in a class?
In each of these cases, a conclusion and associated plan of action could be aimed at answering the following questions:
- Should we ban smoking during pregnancy?
- What should we do to improve grades?
Descriptive Statistics
You must describe the key variables in your project. Your graphs must be related to the problem of interest and the key outcome(s). Make sure you include numerical and/or graphical summaries of each of your variables. Generally, it might be helpful to use this section to explore the relationship between your outcome variable and your other variables.
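The following sketch shows the kind of numerical summaries this section asks for, assuming pandas is available. The small data frame is made up, and the column names `age`, `group`, and `outcome` are hypothetical; replace them with the variables you chose from the codebook.

```python
import numpy as np
import pandas as pd

# Made-up data with hypothetical column names (not the survey's variables).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),            # a numerical variable
    "group": rng.choice(["A", "B", "C"], size=n),   # a categorical variable
})
df["outcome"] = (rng.random(n) < 0.5).astype(int)   # a 0/1 outcome

# Numerical variable: count, mean, standard deviation, and quartiles.
numeric_summary = df["age"].describe()

# Categorical variable: how many respondents fall in each category.
group_counts = df["group"].value_counts()

# Relationship with the outcome: proportion of outcome == 1 in each group.
outcome_by_group = df.groupby("group")["outcome"].mean()

print(numeric_summary)
print(group_counts)
print(outcome_by_group)
```

In a notebook with matplotlib installed, `outcome_by_group.plot(kind="bar")` would turn the last summary into a bar chart, and `df["age"].plot(kind="hist")` would show the distribution of the numerical variable.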
Bootstrap Confidence Interval
You can choose to do a separate confidence interval or to do a confidence interval within the prediction section. Either way, make sure you include a description of the process that you used, as well as your interpretation of the confidence interval.
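For a separate confidence interval, the process can be sketched as follows: resample the data with replacement, recompute the statistic on each resample, and take the middle 95% of the resampled statistics. This is a minimal percentile-bootstrap sketch using NumPy; the function `bootstrap_ci` and the simulated `responses` array are hypothetical, not part of the provided materials.

```python
import numpy as np

def bootstrap_ci(values, statistic=np.mean, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for `statistic` of `values`."""
    values = np.asarray(values)
    rng = np.random.default_rng(seed)
    n = len(values)
    # Each resample has the same size as the original sample and is drawn
    # with replacement; the statistic is recomputed on every resample.
    stats = np.array([
        statistic(rng.choice(values, size=n, replace=True))
        for _ in range(n_resamples)
    ])
    alpha = 1 - level
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

# Made-up 0/1 responses standing in for a yes/no survey question.
rng = np.random.default_rng(42)
responses = rng.binomial(1, 0.6, size=400)

lower, upper = bootstrap_ci(responses)
print(f"Sample proportion: {responses.mean():.3f}")
print(f"95% bootstrap CI: ({lower:.3f}, {upper:.3f})")
```

An interpretation would then read along the lines of: "We are 95% confident that the population proportion lies between the lower and upper endpoints," with the actual numbers filled in from your own analysis.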