Data Management and Visualization: Making Data Management Decisions
For the week 3 assignment of the "Data Management and Visualization" course, I have to apply some data management techniques, which I describe in detail below.
First, I will code out missing data for some of the selected variables. After that, I will create a new variable with values based on the response codes of another variable. Lastly, I will create a secondary variable by multiplying a couple of variables (in fact, the very same variables that were used in the lecture examples on the NESARC dataset). Before we start, let's recall the original question of the project: "Did ex-cigarette smokers smoke, on average, fewer cigarettes before quitting than current smokers do?", where "fewer" was understood in two senses: "fewer as a count" and "fewer as in briefer periods of smoking". Now, let us begin!
1. Coding out missing data.
As you can see, before I started coding out the missing data, I made a copy of the dataframe on which to make the changes. All of the changes were applied successfully!
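For readers who cannot see the screenshot, here is a minimal sketch of this step with pandas. The toy dataframe and the "unknown" codes (9 for "S3AQ3B1", 99 for "S3AQ3C1") are my assumptions for illustration, so please check them against the NESARC codebook before reusing them.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the NESARC data (hypothetical values).
data = pd.DataFrame({
    "S3AQ3B1": [1, 4, 9, 6],    # usual smoking frequency (response codes)
    "S3AQ3C1": [20, 99, 5, 1],  # usual quantity when smoked
})

# Work on a copy so the original dataframe stays untouched.
sub = data.copy()

# Assumed "unknown" codes: 9 for S3AQ3B1 and 99 for S3AQ3C1.
sub["S3AQ3B1"] = sub["S3AQ3B1"].replace(9, np.nan)
sub["S3AQ3C1"] = sub["S3AQ3C1"].replace(99, np.nan)

# Frequency table including missing values, to check the recode.
print(sub["S3AQ3B1"].value_counts(dropna=False))
```

Making the copy first means the raw response codes are still available in the original dataframe if a recode turns out to be wrong.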
2. Creating a new variable with values based on response codes of a variable
As evident from the picture, I used a dictionary to assign the values of the new variable "USFREQMO" based on the existing values of the variable "S3AQ3B1". Again, no issues occurred while making the changes!
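A minimal sketch of such a dictionary-based recode follows. The days-per-month values in the mapping are an assumption for illustration (for example, code 1 taken as "every day", so about 30 days a month); they are not necessarily the exact values from my screenshot.

```python
import numpy as np
import pandas as pd

# Toy column of frequency codes, with one missing value (hypothetical data).
freq = pd.DataFrame({"S3AQ3B1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, np.nan]})

# Assumed mapping from response code to approximate smoking days per month.
days_per_month = {1: 30, 2: 22, 3: 14, 4: 5, 5: 2.5, 6: 1}

# Series.map looks up each code in the dictionary; codes that are missing
# (or not in the dictionary) become NaN in the new variable.
freq["USFREQMO"] = freq["S3AQ3B1"].map(days_per_month)

print(freq)
```

Because unmapped codes turn into NaN, it helps to code out the "unknown" responses first, so they are not silently mixed with real values.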
3. Creating a secondary variable by combining values from a couple of other variables
As evident from the picture, I multiplied the values of "USFREQMO" (usual frequency of smoking cigarettes per month) and "S3AQ3C1" (usual quantity when smoked cigarettes) to create a variable "NUMCIGMO_EST", which indicates the estimated number of cigarettes smoked per month. After concluding these steps, I made a couple of other data management decisions.
Firstly, I subset the data to only a few chosen variables, as shown in the picture below.
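The multiplication itself is a single line in pandas. Here is a small sketch on made-up values, just to show how the new variable behaves when one of the inputs is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical respondents: smoking days per month and cigarettes per smoking day.
smokers = pd.DataFrame({
    "USFREQMO": [30, 2.5, np.nan],
    "S3AQ3C1": [20, 4, 10],
})

# Element-wise product; a missing value in either column gives NaN in the result.
smokers["NUMCIGMO_EST"] = smokers["USFREQMO"] * smokers["S3AQ3C1"]

print(smokers)
```

For example, someone who smokes on 30 days a month and usually has 20 cigarettes on those days gets an estimate of 600 cigarettes per month.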
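A sketch of how such a subset can be taken with pandas is given here. The column list is hypothetical and only for illustration; the variables I actually kept are the ones in the picture.

```python
import pandas as pd

# Toy dataframe with more columns than needed (hypothetical values).
full = pd.DataFrame({
    "IDNUM": [1, 2],
    "AGE": [34, 51],
    "S3AQ3B1": [1, 6],
    "S3AQ3C1": [20, 1],
})

# Hypothetical list of variables to keep.
keep = ["IDNUM", "S3AQ3B1", "S3AQ3C1"]

# Select only those columns; .copy() avoids modifying the full dataframe later.
subset = full[keep].copy()

print(subset.columns.tolist())
```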