UNIVERSITY PARK, Pa. – Suicide is a leading cause of death in the United States, but the models used to predict suicide rates weigh all risk factors equally and rely on data for large geographic areas, which limits the accuracy of their predictions, according to Penn State researchers. The researchers have now developed a machine learning-based model that uses their new Suicide Vulnerability Index, which weights risk factors by their importance, to identify at-risk communities at the U.S. county level.
The approach was recently published in npj Mental Health Research, a Nature Portfolio journal.
“Our goal was to develop a new suicide vulnerability index for U.S. counties using a machine learning-based suicide prediction model,” said co-author Soundar Kumara, the Allen E. Pearce and Allen M. Pearce Professor of Industrial Engineering at Penn State, who is also affiliated with the College of Information Sciences and Technology. “By identifying the counties at higher risk for increased suicide rates, the model can help establish targeted intervention programs.”
The researchers analyzed county-level data from 2010 to 2019 for 3,140 U.S. counties, the smallest geographic unit available in the Centers for Disease Control and Prevention database. They identified 17 features used to predict suicide rates, which fall into three categories: demographics, socioeconomic factors and health. The researchers suspected that some of these 17 features would influence suicide rates more than others, and they wanted to determine which factors mattered and by how much.
To identify the impact of each factor, the researchers used SHapley Additive exPlanations (SHAP), a game theory-based approach that explains how each variable contributes to the model’s prediction.
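As a minimal sketch of the game-theoretic idea behind SHAP, the Python below computes exact Shapley values for a tiny toy model by enumerating every coalition of features and averaging each feature's marginal contribution (the prediction with the feature minus the prediction without it). Treating an "absent" feature as taking a fixed baseline value is an assumption of this sketch, and the names `shapley_values` and `toy_model` are hypothetical. Exact enumeration is only feasible for a handful of features, which is why SHAP libraries rely on approximations or model-specific algorithms for real models.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input `x` by enumerating all feature coalitions.

    Assumption: a feature 'absent' from a coalition is set to its baseline value.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition take their real values; the rest use the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1, never containing feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution: prediction with feature i minus without it.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: one additive term plus an interaction between features 1 and 2.
    return 2 * z[0] + z[1] * z[2]


x = [1, 2, 3]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
print([round(v, 6) for v in phi])  # [2.0, 3.0, 3.0]
```

The three values sum to 8, the gap between the toy model's prediction at `x` and at the baseline, and the interaction term is split evenly between the two features that share it. This additivity is the property the "Additive" in SHAP refers to.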
“SHAP values examine the impact of each feature by comparing the prediction results with and without that feature,” said study co-author Kristin Sznajder, assistant professor of public health sciences at the Penn State College of Medicine, who is also affiliated with Penn State's Huck Institutes of the Life Sciences and the Population Research Institute. “Using the SHAP values, we identified the importance of all 17 features in the prediction model's training set. By identifying and isolating the five key features from our analysis, we developed the Suicide Vulnerability Index. Previously, such indexes were created by including all variables without considering their effects on the output.”
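The quote describes ranking all 17 features by SHAP importance and keeping the top five. A common way to turn per-county SHAP values into a single global importance score is the mean absolute SHAP value per feature; the sketch below assumes that measure (the article does not say which one the study used). The function names, feature names and numbers are hypothetical and are not data from the study.

```python
def rank_features(shap_matrix, names):
    """Rank features by mean absolute SHAP value across all rows (e.g., counties)."""
    n_rows = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_rows
        for j, name in enumerate(names)
    }
    # Largest average impact first.
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


def top_k_features(shap_matrix, names, k=5):
    """Keep only the k most important feature names."""
    return [name for name, _ in rank_features(shap_matrix, names)[:k]]


# Illustrative SHAP values: 3 rows (counties) x 4 features, not study data.
names = ["feature_a", "feature_b", "feature_c", "feature_d"]
shap_matrix = [
    [0.5, -0.1, 0.0, -0.3],
    [-0.7, 0.2, 0.1, 0.4],
    [0.6, -0.3, 0.0, -0.2],
]
print(top_k_features(shap_matrix, names, k=2))  # ['feature_a', 'feature_d']
```

Taking the absolute value matters: a feature that pushes predictions strongly up in some counties and strongly down in others would otherwise look unimportant because its signed contributions cancel out.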
The top five county features driving the suicide predictions were population, percentage of African American residents, percentage of white residents, median age and percentage of female residents. Higher population, a higher percentage of white residents and a higher median age were associated with higher suicide rates, while higher percentages of African American and female residents were associated with lower suicide rates.
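The paragraph above reports the direction in which each feature moves suicide predictions. One common way to read that direction from SHAP output, sketched here under the assumption that it resembles the study's approach (the article does not say), is to check the sign of the correlation between a feature's values and its SHAP values across counties: a positive correlation means higher values push the prediction up. The helper names and toy numbers are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in xs))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # no variation, so no direction can be inferred
    return cov / (sd_x * sd_y)


def effect_direction(feature_values, shap_values):
    """Label whether higher feature values raise or lower the model's prediction."""
    r = pearson(feature_values, shap_values)
    if r > 0:
        return "increases prediction"
    if r < 0:
        return "decreases prediction"
    return "no clear direction"


# Toy values for four hypothetical counties (not study data).
feature_a = [10, 20, 30, 40]
shap_a = [-0.5, -0.1, 0.2, 0.6]
feature_b = [10, 20, 30, 40]
shap_b = [0.4, 0.1, -0.2, -0.3]

print(effect_direction(feature_a, shap_a))  # increases prediction
print(effect_direction(feature_b, shap_b))  # decreases prediction
```

A correlation only summarizes a linear trend; SHAP dependence plots show the full, possibly nonlinear, relationship between a feature's value and its contribution.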