comportamento anomalo con select in dplyr

Sto riscontrando un comportamento strano con la funzione select di dplyr. Non sta lasciando cadere la variabile dal frame di dati.comportamento anomalo con select in dplyr

Ecco i dati originari:

orig <- structure(list(park = structure(c(4L, 4L, 4L, 4L, 4L), .Label = c("miss", 
"piro", "sacn", "slbe"), class = "factor"), year = c(2006L, 2009L, 
2006L, 2008L, 2009L), agent = structure(c(5L, 5L, 5L, 7L, 5L), .Label = c("agriculture", 
"beaver", "development", "flooding", "forest_pathogen", "harvest_00_20", 
"harvest_30_60", "harvest_70_90", "none"), class = "factor"), 
    ha = c(4.32, 1.17, 3.51, 2.07, 9.18), loc_01 = structure(c(9L, 
    5L, 9L, 5L, 5L), .Label = c("miss", "non_miss", "non_piro", 
    "non_sacn", "non_slbe", "none", "piro", "sacn", "slbe"), class = "factor"), 
    loc_02 = structure(c(5L, 1L, 5L, 1L, 1L), .Label = c("none", 
    "piro_core", "piro_ibz", "slbe_mainland", "slbe_southmanitou" 
    ), class = "factor"), loc_03 = structure(c(1L, 1L, 1L, 1L, 
    1L), .Label = "none", class = "factor"), cross_valid = c(1L, 
    1L, 1L, 1L, 1L)), .Names = c("park", "year", "agent", "ha", 
"loc_01", "loc_02", "loc_03", "cross_valid"), row.names = c(NA, 
5L), class = "data.frame")

Assomiglia:

> orig 
    park year   agent ha loc_01   loc_02 loc_03 cross_valid 
1 slbe 2006 forest_pathogen 4.32  slbe slbe_southmanitou none   1 
2 slbe 2009 forest_pathogen 1.17 non_slbe    none none   1 
3 slbe 2006 forest_pathogen 3.51  slbe slbe_southmanitou none   1 
4 slbe 2008 harvest_30_60 2.07 non_slbe    none none   1 
5 slbe 2009 forest_pathogen 9.18 non_slbe    none none   1 
> str(orig) 
'data.frame': 5 obs. of 8 variables: 
$ park  : Factor w/ 4 levels "miss","piro",..: 4 4 4 4 4 
$ year  : int 2006 2009 2006 2008 2009 
$ agent  : Factor w/ 9 levels "agriculture",..: 5 5 5 7 5 
$ ha   : num 4.32 1.17 3.51 2.07 9.18 
$ loc_01  : Factor w/ 9 levels "miss","non_miss",..: 9 5 9 5 5 
$ loc_02  : Factor w/ 5 levels "none","piro_core",..: 5 1 5 1 1 
$ loc_03  : Factor w/ 1 level "none": 1 1 1 1 1 
$ cross_valid: int 1 1 1 1 1

Poi faccio un breve riassunto ...

library (dplyr) 
    summ <- orig %>% 
    + group_by(park,cross_valid,agent) %>% 
    + summarise(ha_dist=sum(ha)) 
    summ 
    Source: local data frame [2 x 4] 
    Groups: park, cross_valid 

     park cross_valid   agent ha_dist 
    1 slbe   1 forest_pathogen 18.18 
    2 slbe   1 harvest_30_60 2.07 
    str(summ) 
    Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 4 variables: 
    $ park  : Factor w/ 4 levels "miss","piro",..: 4 4 
    $ cross_valid: int 1 1 
    $ agent  : Factor w/ 9 levels "agriculture",..: 5 7 
    $ ha_dist : num 18.18 2.07 
    - attr(*, "vars")=List of 2 
     ..$ : symbol park 
     ..$ : symbol cross_valid 
    - attr(*, "drop")= logi TRUE

Allora provo a cadere 'cross_valid '...

sel <- select (summ,-cross_valid) 
summ 
Source: local data frame [2 x 4] 
Groups: park, cross_valid 

    park cross_valid   agent ha_dist 
1 slbe   1 forest_pathogen 18.18 
2 slbe   1 harvest_30_60 2.07 
str(summ) 
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of 4 variables: 
$ park  : Factor w/ 4 levels "miss","piro",..: 4 4 
$ cross_valid: int 1 1 
$ agent  : Factor w/ 9 levels "agriculture",..: 5 7 
$ ha_dist : num 18.18 2.07 
- attr(*, "vars")=List of 2 
    ..$ : symbol park 
    ..$ : symbol cross_valid 
- attr(*, "drop")= logi TRUE 
- attr(*, "indices")=List of 1 
    ..$ : int 0 1 
- attr(*, "group_sizes")= int 2 
- attr(*, "biggest_group_size")= int 2 
- attr(*, "labels")='data.frame': 1 obs. of 2 variables: 
    ..$ park  : Factor w/ 4 levels "miss","piro",..: 4 
    ..$ cross_valid: int 1 
    ..- attr(*, "vars")=List of 2 
    .. ..$ : symbol park 
    .. ..$ : symbol cross_valid

E non scenderà summ$cross_valid

Se uso di base R a cadere cross_valid, funziona ...

base.sel <- summ[-2] 
base.sel 
Source: local data frame [2 x 3] 
Groups: 

    park   agent ha_dist 
1 slbe forest_pathogen 18.18 
2 slbe harvest_30_60 2.07

posso cadere orig$cross_valid utilizzando select ...

drop.orig <- select (orig,-cross_valid) 
drop.orig 
    park year   agent ha loc_01   loc_02 loc_03 
1 slbe 2006 forest_pathogen 4.32  slbe slbe_southmanitou none 
2 slbe 2009 forest_pathogen 1.17 non_slbe    none none 
3 slbe 2006 forest_pathogen 3.51  slbe slbe_southmanitou none 
4 slbe 2008 harvest_30_60 2.07 non_slbe    none none 
5 slbe 2009 forest_pathogen 9.18 non_slbe    none none

Dato che posso rilasciare la variabile con la base R, non è un grosso problema, ma ho pensato che potrebbe esserci qualche problema con dplyr. Probabilmente è qualcosa con la struttura della variabile, ma non so cosa sarebbe.

Grazie ..

-cherrytree

fonte

2014-09-08 cherrytree

provare ungroup()

summ%>% 
ungroup() %>% 
select(-cross_valid) 
# park   agent ha_dist 
#1 slbe forest_pathogen 18.18 
#2 slbe harvest_30_60 2.07 



groups(summ) 
#[[1]] 
#park 

#[[2]] 
#cross_valid

fonte

2014-09-08 16:08:04 akrun

Sì, non è possibile '-select' una variabile di raggruppamento. – Ajar

Grazie a @akrun. Non sapevo che non si potesse rimuovere una variabile di raggruppamento ... molto interessante e buona da conoscere. – cherrytree

@cherrytree Nessun problema. Sono contento che ci abbia aiutato – akrun

comportamento anomalo con select in dplyr

risposta

Problemi correlati