Changing The Underlying Value of a Factor Variable in an R Dataframe

When you import a table into R, it automagically converts some variables to factors. Factors are essentially categorical variables. While you can easily recode and change factors, this does not mean that you are changing the underlying values! This can produce unexpected results, since some functions and plots in R will reflect the changes you made to the factors and others will not. To fix the underlying data and not just the factor levels, you need to convert the factor to a character variable (or numeric in some cases). Once you do that, you can fix the underlying data. Lastly, once the underlying data is fixed, you need to convert the variable back to a factor. This process is not as easy as it seems, though; this website describes the issue. I will highlight the relevant parts:

When are factor variables a big pain?
Factor variables are a pain when you're cleaning your data because they're hard to update. My approach has always been to convert the variable to character with as.character(), then handle the variable as a character vector, and then convert it to factor (using factor() or as.factor()) at the end.

What's one secret to converting factors to character vectors in a data frame?

Here's an interesting fact. Remember how you can refer to columns of a data frame either in matrix style or in list style? When you use the matrix-style notation S-Plus will often factorize your character variables automatically. That's not true for list-style notation, so list-style is often what you want. Here's an example:

#
# Create a simple data frame.
#
> d <- data.frame (a = c("a", "b", "c", "d"))
#
# The data.frame() function automatically converts characters to factors.
#
> is.factor (d[,1])
[1] T
#
# This should convert it back, but it doesn't.
#
> d[,1] <- as.character(d[,1])
> is.factor (d[,1])
[1] T
#
# This looks the same, but using list notation on the left makes all the difference.
#
> d$a <- as.character(d[,1])
> is.factor (d[,1])
[1] F
#
# This would have worked: the I() function ("I" standing for "identity") says
# "leave this just as I give it to you; don't convert it".
#
> d[,1] <- I(as.character(d[,1]))

Once you have converted the factor to a character variable, I would use the recode function from the car package. This is one example of using the recode function:

newgroup = recode(group,"'f' = 'F'")

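If the car package is not installed, roughly the same recode can be sketched in base R. This is a stand-in for the recode() call above, not how car implements it, and the contents of `group` are made-up sample data used only for illustration.

```r
# Hypothetical sample data standing in for the article's `group` variable,
# already converted to a character vector.
group <- c("f", "F", "M", "f", "M")

# Base R stand-in for recode(group, "'f' = 'F'"):
# overwrite every "f" with "F" and leave all other values untouched.
newgroup <- group
newgroup[newgroup == "f"] <- "F"

print(newgroup)
```

Working on a copy (`newgroup`) keeps the original vector around, which makes it easy to compare the two with table(group, newgroup) before overwriting anything.
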
Lastly, you will need to convert the variable back to a factor. Just use the same strategy that was used to convert it to a character variable in the first place, this time with factor() or as.factor().
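
Putting the three steps together, the round trip might look something like the sketch below. The data frame `d`, its `group` column, and the values in it are invented for illustration. `stringsAsFactors = TRUE` is set explicitly because R 4.0.0 and later no longer convert strings to factors by default. The first few lines show one way to see the problem described at the top: editing values on the factor itself leaves the old level behind.

```r
# Made-up example data; stringsAsFactors = TRUE forces a factor column
# regardless of the default in the installed R version.
d <- data.frame(group = c("f", "F", "M", "f"), stringsAsFactors = TRUE)

# Editing the factor directly changes the values but keeps the stale level:
# "f" is no longer used, yet it is still one of the three levels.
f <- d$group
f[f == "f"] <- "F"
print(nlevels(f))

# Step 1: convert the factor to a character vector (list-style notation).
d$group <- as.character(d$group)

# Step 2: fix the underlying data while it is plain text.
d$group[d$group == "f"] <- "F"

# Step 3: convert back to a factor. The levels are rebuilt from the
# cleaned values, so the old "f" level no longer exists.
d$group <- factor(d$group)

print(levels(d$group))
print(table(d$group))
```

Comparing nlevels(f) with nlevels(d$group) afterwards shows the difference between relabeling in place and rebuilding the factor from cleaned data.
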
Copyright © 2010 Distant Traveler - All Rights Reserved