jeudi 27 juillet 2017

Gathering/ melting multiple column sets, integer values for the variable column?

first of all, I am sorry, I know that my question overlaps with couple of other questions here, but I am really new to R and I could not find an answer that fills the task completely and that I could actually understand.

my data set is something like that:

ID … Exam_t_minus_3 Exam_t_minus_2 Exam_t_minus_1 Grade_2012 Grade_2013 Grade_2014
1       Math        Physics         Chemestry         98         70         76
2       English     French          Russian           77         85         59
3       English     Chemistry       Biology           65         77         69

and I want it to become something like that:

ID  …   i(as t_minus_i_)    Exam         Grade
1       3                   Math          98
1       2                   Physics       70
1       1                   Chemistry     76
2       3                   English       77
2       2                   French        85
2       1                   Russian       59

I have gone through swirl() course and thought I could do it with tidyr, but I could not figure out how could I gather more than one set of columns. I ended up with a result like:

ID  …   ................    Exam         Grade
1       .                   Math          98
1       .                   Math          70
1       .                   Math          76
1       .                   Physics       98
1       .                   Physics       70
1       .                   Physics       76

I looked into this: Gather multiple sets of columns and some others, but most of them lost me on half way.

I tried the answer for this question as well (I changed the earning columns to end with T_minus_* beforehand): Gather multiple columns with tidyr

i.e I tried

library(data.table)
res = melt(setDT(sample_df), 
measure.vars = patterns("^Exam", "^Grade"), 
variable.name = "i")
res[, i := factor(i, labels = c("3","2", "1"))]

It does the most of the thing I need, however, I need to the values in my i columns to be integers not categorical variables, because I need them for the future calculations.

I tried to do res$i <- as.numeric(res$i), but that changed the order, i.e "3" was evaluated to 1, "1" to 3. I tried to just leave it out, but that gives me 1,2,3 as well for the i column values.

As I changed the earnings columns to be Earnings_T_minus_* could I perhaps just get those * values in the i column somehow?

library(data.table)
res = melt(setDT(sample_df), 
measure.vars = patterns("^Exam_T_minus_*", "^Grade_T_minus_*"), 
variable.name = "i")

Sorry, the question became a bit long and perhaps confusing, but hope someone can lead me in right direction.

Aucun commentaire:

Enregistrer un commentaire