The factors determining the preference for COVID-19 Vaccine

Published: 22 February 2021| Version 3 | DOI: 10.17632/kbzskd37zy.3
Pritish Mondal,


A large section of the community remains skeptical about accepting a COVID-19 vaccine. To understand the situation on the ground, we conducted a nationwide online survey in the USA. The working hypothesis was that sociodemographic factors such as age, race, education level, family income, and gender would influence the decision to accept the vaccine, and that a prediction model built on those factors would accurately predict vaccine acceptance. The survey was anonymous, and we did not collect any identifiers. The questionnaire was built in REDCap and was divided into three sections. The survey period ran from May 2020 to January 2021. The primary outcome was COVID-19 vaccine acceptance vs. hesitance.
The study participants were categorized by several sociodemographic factors. The fifty US states were regrouped into nine regions based on the Census Bureau's recommendation. We also asked whether the participant was a healthcare worker (HCW) (yes vs. no) and whether they were satisfied with their healthcare access (1-5 Likert scale). We divided the participants into pre- vs. post-vaccine-launch groups, with the pre-launch group running through November 1st, since right after that date Pfizer announced a successful vaccine trial and a new COVID wave hit the USA. We also included questions to evaluate COVID-related knowledge and the sources of COVID-related information. A new variable was computed to estimate perceived threat based on old age, HCW status, the chance of having a severe COVID infection, and the state of COVID in the participant's area. Several metrics of knowledge and perceived threat were consolidated into two continuous variables using factor analyses. Prediction models were built using binary backward logistic regression and neural networks. Nine variables (age, race, education level, family income, gender, US region, HCW status, healthcare access, and vaccine-launch period) were considered as potential predictors of vaccine acceptance.
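As an illustration of how two of the derived predictors could be computed, the sketch below maps state abbreviations to the nine US Census Bureau divisions and flags a response as pre- or post-launch. It assumes that the "nine regions" correspond to the Census Bureau's nine divisions, that states are stored as two-letter postal codes, and that responses dated through 1 November 2020 count as pre-launch; none of these details are confirmed by the description, and the names `CENSUS_DIVISIONS`, `STATE_TO_DIVISION`, and `launch_period` are hypothetical.

```python
from datetime import date

# The nine US Census Bureau divisions (50 states, DC omitted).
# Assumption: these are the "nine regions" used for regrouping.
CENSUS_DIVISIONS = {
    "New England": ["CT", "ME", "MA", "NH", "RI", "VT"],
    "Middle Atlantic": ["NJ", "NY", "PA"],
    "East North Central": ["IL", "IN", "MI", "OH", "WI"],
    "West North Central": ["IA", "KS", "MN", "MO", "NE", "ND", "SD"],
    "South Atlantic": ["DE", "FL", "GA", "MD", "NC", "SC", "VA", "WV"],
    "East South Central": ["AL", "KY", "MS", "TN"],
    "West South Central": ["AR", "LA", "OK", "TX"],
    "Mountain": ["AZ", "CO", "ID", "MT", "NV", "NM", "UT", "WY"],
    "Pacific": ["AK", "CA", "HI", "OR", "WA"],
}

# Invert the table so each state abbreviation points to its division.
STATE_TO_DIVISION = {
    state: division
    for division, states in CENSUS_DIVISIONS.items()
    for state in states
}

# Assumed cutoff: responses dated through 1 November 2020 are "pre".
LAUNCH_CUTOFF = date(2020, 11, 1)


def launch_period(response_date):
    """Return 'pre' for dates up to and including the cutoff, else 'post'."""
    return "pre" if response_date <= LAUNCH_CUTOFF else "post"
```

For example, `STATE_TO_DIVISION["PA"]` gives `"Middle Atlantic"`, and `launch_period(date(2020, 11, 2))` gives `"post"`.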
US-based participants who completed all three sections of the survey and answered the key question on vaccine acceptance were considered eligible. Among 2978 eligible respondents, 81.1% indicated their willingness to accept the vaccine. Based on chi-square tests, all the predictors except the vaccine-launch period were associated with vaccine acceptance. Both models identified age, education, and race as the top predictors, while the vaccine-launch period and gender were the least significant. Middle-aged participants, those with lower education, and Black participants were the most skeptical. Fear of side effects was the predominant reason for vaccine refusal, affecting 84.2% of vaccine-hesitant vs. 51.0% of vaccine-compliant participants. Compared with the vaccine-hesitant group, the vaccine-compliant group had more COVID-related knowledge, a higher perceived threat, and more closely followed the CDC or other official health websites. There was an association between influenza and COVID-19 vaccine acceptance (χ2 = 316.6 (df = 4, N = 2976), p < 0.001).
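The chi-square tests of association reported above follow the standard Pearson formula, which the sketch below computes for an arbitrary contingency table. The counts in `toy_table` are invented purely for illustration and are not taken from the dataset; the 5x2 shape is chosen only because it yields df = 4, and the description does not state how the actual table was laid out. The function name `pearson_chi_square` is hypothetical.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees of freedom, N) for a contingency table.

    `table` is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, n


# Invented counts: five response levels (rows) by accept/hesitant (columns).
toy_table = [
    [30, 10],
    [25, 15],
    [20, 20],
    [15, 25],
    [10, 30],
]

statistic, df, n = pearson_chi_square(toy_table)
```

The p-value is then the upper-tail probability of a chi-square distribution with `df` degrees of freedom, which a statistics package would normally supply.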


Steps to reproduce

The basic structure of the Excel document is unchanged from the CSV file originally downloaded from REDCap. The following variables were re-coded or added to facilitate the study analyses.
A. Recoding: All re-coded variables are marked in green.
1) Child, hosp_rate, death_rate, incubation, corona, animal, screening, climate, sunlight, nasal_spray, hot_beverages, baby_wipes: we reversed the scoring relative to the codebook. Response 0 in the codebook is scored 1, and response 1 in the codebook is scored 0.
2) vaccine_pcv_flu: responses 1-3 in the codebook were incorrect (scored 0), and response 4 was correct (scored 1).
3) mask_gloves_2: we swapped 0 and 1, so that 1 in our datasheet now represents a correct response.
4) social_distance_feet, isolation: responses 1, 3, and 4 in the codebook were incorrect (scored 0), and response 2 was correct (scored 1).
5) corona live: responses 1 and 3 in the codebook were incorrect (scored 0), and response 2 was correct (scored 1).
6) spread: responses 3 and 4 in the codebook were incorrect (scored 0), and responses 1 and 2 were correct (scored 1).
7) chloroquin: responses 1 and 2 in the codebook were incorrect (scored 0), response 3 was partially correct (scored 0.5), and responses 4 and 5 were correct (scored 1).
8) hospital, vent, administration: responses 1 to 5 in the codebook were in reverse order; they are now scored from 5 to 1, respectively.
B. Addition: A few other variables were computed and are highlighted in yellow. For example, we used factor analyses to consolidate all metrics of perceived threat and of knowledge into a single score each on a 0-10 scale.
C. Reorganized categorization: We regrouped the racial distribution. All races other than White, African-American, Asian, and Hispanic were categorized as "others". The educational categories were regrouped as well: middle school and high school were combined as "<high school".
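The recoding rules in part A can be expressed as lookup tables, as in the minimal sketch below. It is meant for a raw REDCap export that still uses the codebook values; the published spreadsheet already contains the recoded scores, so the rules should not be applied to it a second time. The column spellings (for example `child` and `corona_live`) and the names `RECODE_RULES` and `recode_row` are assumptions for illustration, not identifiers confirmed by the dataset.

```python
# Lookup tables: codebook response -> score used in the datasheet.
FLIP_BINARY = {0: 1, 1: 0}
REVERSE_LIKERT = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}

RECODE_RULES = {
    "vaccine_pcv_flu": {1: 0, 2: 0, 3: 0, 4: 1},
    "social_distance_feet": {1: 0, 2: 1, 3: 0, 4: 0},
    "isolation": {1: 0, 2: 1, 3: 0, 4: 0},
    "corona_live": {1: 0, 2: 1, 3: 0},  # assumed column spelling
    "spread": {1: 1, 2: 1, 3: 0, 4: 0},
    "chloroquin": {1: 0, 2: 0, 3: 0.5, 4: 1, 5: 1},
}

# Items whose 0/1 scoring is flipped (rules 1 and 3).
for name in ["child", "hosp_rate", "death_rate", "incubation", "corona",
             "animal", "screening", "climate", "sunlight", "nasal_spray",
             "hot_beverages", "baby_wipes", "mask_gloves_2"]:
    RECODE_RULES[name] = FLIP_BINARY

# Items whose 1-5 scale is reversed (rule 8).
for name in ["hospital", "vent", "administration"]:
    RECODE_RULES[name] = REVERSE_LIKERT


def recode_row(row):
    """Return a copy of `row` (a dict) with the recoding rules applied.

    Missing values (None) and columns without a rule are left untouched.
    An unexpected response code raises KeyError so that it is not
    silently passed through.
    """
    recoded = dict(row)
    for name, mapping in RECODE_RULES.items():
        value = row.get(name)
        if value is not None:
            recoded[name] = mapping[value]
    return recoded
```

For example, `recode_row({"child": 0, "chloroquin": 3, "hospital": 1})` returns `{"child": 1, "chloroquin": 0.5, "hospital": 5}`.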


Penn State College of Medicine Department of Pediatrics


Public Health, Vaccine, Preventive Medicine, Sociodemographics, COVID-19