One-hot encoding was used to convert PMH data into categorical variables. Instead of grouping past medical history elements into categories (e.g., history of cancer or history of diabetes), specific diagnosis codes were included to assess differential risk that may arise from different severities or manifestations of medical conditions. Each ROS and PE finding was represented by two binary variables: one indicating that the finding was documented as a “pertinent positive” (e.g., positive review of systems for headache), the other that it was documented as a “pertinent negative” (e.g., negative review of systems for nausea). Absent or missing documentation was therefore represented as a null value for both the pertinent positive and pertinent negative variables. Numerical values (e.g., age, pain rating) were coded as continuous variables. Additional details of feature encoding are given in Item 4 in S1 File.
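The encoding scheme described above can be illustrated with a minimal sketch. This is not the study's implementation. The function name `encode_visit`, the feature naming, the diagnosis codes, and the finding names are all hypothetical. The sketch also assumes that the "null value" for undocumented findings is a 0 in both indicator variables, which the text does not specify.

```python
def encode_visit(pmh_codes, findings, numeric, pmh_vocab, finding_vocab):
    """Encode one visit as a flat feature dict (illustrative only).

    pmh_codes: set of diagnosis codes in the past medical history
    findings: dict mapping an ROS/PE finding name to "positive" or "negative"
    numeric: dict of continuous values such as age or pain rating
    pmh_vocab, finding_vocab: all codes and findings seen in the data set
    """
    features = {}
    # One indicator per specific diagnosis code, with no grouping into categories.
    for code in pmh_vocab:
        features[f"pmh_{code}"] = 1 if code in pmh_codes else 0
    # Two indicators per finding. An undocumented finding leaves both at 0
    # (assumed representation of the "null value").
    for name in finding_vocab:
        status = findings.get(name)
        features[f"{name}_pos"] = 1 if status == "positive" else 0
        features[f"{name}_neg"] = 1 if status == "negative" else 0
    # Continuous values are passed through as floats.
    for key, value in numeric.items():
        features[key] = float(value)
    return features


# Hypothetical visit: one PMH code, two documented ROS findings, and one
# PE finding that was never documented.
example = encode_visit(
    pmh_codes={"E11.9"},
    findings={"ros_headache": "positive", "ros_nausea": "negative"},
    numeric={"age": 54, "pain_rating": 7},
    pmh_vocab=["E11.9", "C50.9"],
    finding_vocab=["ros_headache", "ros_nausea", "pe_abdominal_tenderness"],
)
```

Keeping separate positive and negative indicators lets a model distinguish a finding that was checked and found absent from one that was never documented, which a single binary variable would conflate.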