In addition to these priorities, increased funding and energy have been devoted to
addressing this problem through data, both to detect behaviors and to generate actionable
insights. The Centers for Disease Control and Prevention has established an initiative to
award funds to states that are specifically using data-driven insights to combat this
epidemic.
bit.ly/JAMAOpioids
bit.ly/CDCDrugOverdose
bit.ly/NIDAOverdoseCrisis
For this datathon, we invite you to explore the data sources we have provided to
understand the underlying predictors of this crisis. At its core, your final deliverable
should identify which key factors influence drug-related deaths. Your analysis
should include a thorough examination of the sources provided and use a sound
statistical approach to this problem that accounts for various external factors. This can
involve predictive models (e.g., machine learning, regression-based techniques) to predict
drug deaths based on a defined feature set, or descriptive modeling to quantify
relationships you may find in the data.
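As a minimal sketch of the descriptive end of that range, the example below fits a one-feature ordinary least squares line and reports R^2. The feature name (unemployment rate), the outcome name (deaths per 100,000), and all values are made up for illustration and are not taken from the provided datasets. A real analysis would likely need several features and some way of accounting for external factors.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2: share of variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r_squared


# Toy values for illustration only (hypothetical, not real county data).
unemployment_pct = [4.0, 5.0, 6.0, 7.0, 8.0]
deaths_per_100k = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept, r_squared = fit_simple_ols(unemployment_pct, deaths_per_100k)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.2f}")
```

The slope is the quantity of interest in a descriptive model: it estimates how much the outcome changes per unit change in the feature, under the assumption of a linear relationship.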
Beyond this core question, further research questions that build on your preliminary
insights are open to the creativity of your team. Two examples of such driving questions
are given below:
1. Within a specific time period, can you find and predict spikes in overdose deaths in
counties across the US?
2. Can you identify the impact of state-wide policies on drug overdose deaths
between 2010 and 2017? Which were most effective/least effective, and what
kinds of policies would be most effective for the future?
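One possible starting point for the first example question is a rolling z-score rule: flag a year as a spike when a county's death count sits well above the mean of the preceding years. The sketch below is only an illustration of that rule. The function name, the window length, the threshold, the floor on the standard deviation, and the counts are all assumptions, not part of the provided data or a required method.

```python
from statistics import mean, stdev


def find_spikes(counts_by_year, window=3, z_threshold=2.0, min_spread=1.0):
    """Return the years whose count is a spike relative to the prior `window` years.

    A year is flagged when (count - baseline mean) / baseline std exceeds
    `z_threshold`. `min_spread` is a floor on the std so that a nearly flat
    history does not turn tiny changes into huge z-scores.
    Assumes the years present are consecutive.
    """
    years = sorted(counts_by_year)
    spikes = []
    for i in range(window, len(years)):
        history = [counts_by_year[y] for y in years[i - window:i]]
        baseline = mean(history)
        spread = max(stdev(history), min_spread)
        z_score = (counts_by_year[years[i]] - baseline) / spread
        if z_score > z_threshold:
            spikes.append(years[i])
    return spikes


# Toy counts for a single hypothetical county (not real data).
toy_counts = {2010: 10, 2011: 11, 2012: 10, 2013: 11, 2014: 25, 2015: 12}
print(find_spikes(toy_counts))
```

Note that a spike inflates the baseline for the years that follow it, so a rule like this tends to under-flag consecutive high years. Teams may want a more robust baseline, such as a rolling median.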
Answering the core question is the primary focus of the Blueprint Datathon. However,
including supplementary analysis or proposing actionable policies based on your findings
will enhance your team’s submission. This part should show an understanding of your
analytical methods and how you used them to generate your suggestions.
Deliverables:
The competition will be judged in two parts. First, each team will be required to
submit a 2-page write-up and a link to a GitHub repository containing all relevant code
by 10 AM Sunday, 3/11. This write-up should summarize your analysis methods and final
conclusions. Judges will choose the top 5 teams based on these submissions. These
teams will be invited to give a 5-minute presentation during our closing ceremony,
followed by 5 minutes of questions from the judges. The presentation should expand on
the 2-page write-up.
For both deliverables, teams should:
● Describe the methods used
● Interpret results, concentrating on what you learned through the Datathon
● Emphasize challenges in carrying out the analysis
● Illustrate the originality and novelty of your approach
● Reference any external sources you used to help you complete the task
The Data:
Data has been obtained from two main sources. Your primary data source will be the
County Health Rankings dataset. This dataset is a compilation of several other public
datasets from 2010-2017 grouped at the county level, and contains granular information
for many relevant features. Before starting, we highly recommend going through the
dataset and identifying interesting categories and gaps in the data.
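A quick way to locate gaps is to compute the share of missing values per column before doing any modeling. The sketch below uses only the standard library and an inline CSV. The column names (fips, year, drug_deaths, pct_unemployed), the values, and the tokens treated as missing are hypothetical stand-ins, so check them against the actual files.

```python
import csv
import io


def missing_share_by_column(rows, missing_tokens=("", "NA", "NaN")):
    """Return {column: fraction of rows whose value is missing}."""
    rows = list(rows)
    if not rows:
        return {}
    shares = {}
    for column in rows[0]:
        n_missing = sum(
            1 for row in rows
            if (row.get(column) or "").strip() in missing_tokens
        )
        shares[column] = n_missing / len(rows)
    return shares


# Inline sample standing in for a downloaded file (hypothetical columns and values).
sample_csv = """fips,year,drug_deaths,pct_unemployed
01001,2015,12,5.2
01003,2015,,5.5
01005,2015,NA,
01007,2015,8,6.1
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
shares = missing_share_by_column(rows)
for column, share in shares.items():
    print(f"{column}: {share:.0%} missing")
```

Columns with a high missing share may need to be dropped, imputed, or restricted to the years and counties where they are reported.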
The second dataset provided is from census.gov. Like the first dataset, it
contains data organized by county, but it spans a different set of years and contains
additional features, including housing, income and poverty, social programs, and employment
statistics.
County Health Rankings:
bit.ly/CountyHealthRanking
Census.gov:
bit.ly/CensusGovData
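Because both datasets are organized by county but cover different years, combining them usually means joining on a county identifier plus year and keeping only the overlap. The sketch below shows an inner join in plain Python. It assumes each row carries a county FIPS code and a year under the keys "fips" and "year". Those key names, the other column names, and the values are hypothetical and may not match the actual files.

```python
def inner_join(left_rows, right_rows, keys=("fips", "year")):
    """Join two lists of dict rows on `keys`, keeping only matching rows.

    Assumes each key combination appears at most once in `right_rows`.
    On column name collisions, the value from the left row wins.
    """
    index = {tuple(row[k] for k in keys): row for row in right_rows}
    joined = []
    for row in left_rows:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            merged = dict(match)
            merged.update(row)
            joined.append(merged)
    return joined


# Toy rows (hypothetical columns and values, not real data).
health_rows = [
    {"fips": "01001", "year": 2014, "drug_deaths": 10},
    {"fips": "01001", "year": 2017, "drug_deaths": 14},
]
census_rows = [
    {"fips": "01001", "year": 2009, "median_income": 48000},
    {"fips": "01001", "year": 2014, "median_income": 50000},
]

combined = inner_join(health_rows, census_rows)
print(combined)
```

County FIPS codes are five digits and can begin with a zero, so it is safer to read them as strings than as integers. An inner join silently drops non-overlapping years, so it is worth reporting how many rows survive the join.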
Additionally, although not a required part of your analysis, using policy datasets can
strengthen your team’s submission. We have included two of these sources below.
Policy Datasets
2015 - 2018:
bit.ly/Policy2015-2018
2009 - 2014:
bit.ly/Policy2009-2014
Judging:
Judging will broadly be based upon the following:
● Soundness of the approach taken (including, but not limited to, statistical
significance, auROC)
● Potential scientific, societal, and policy impacts of the results
● Originality and novelty of the approach
● Quality of the description of the data and tools used, especially reproducibility
● Quality of the 2-page write-up and execution of presentation
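For teams that frame the problem as binary classification (for example, predicting whether a county exceeds some death-rate threshold, which is one possible framing rather than a requirement), auROC can be computed directly from its definition: the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counted as half. The sketch below is a simple reference implementation with made-up labels and scores.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    `labels` holds 0/1 values; `scores` holds the model's predicted scores.
    Runs in O(positives * negatives) time, which is fine for small checks.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy labels and scores for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. Computing the metric on held-out data, rather than the data used for fitting, gives a more honest picture of soundness.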