Combine result from top_n with an "Other" category in dplyr

I have a data frame dat1:

Country Count 
1  AUS  1 
2  NZ  2 
3  NZ  1 
4  USA  3 
5  AUS  1 
6  IND  2 
7  AUS  4 
8  USA  2 
9  JPN  5 
10  CN  2

First, I want to sum "Count" by "Country". Then the top 3 total counts per country should be combined with an additional row "Others", which is the sum of the countries that are not in the top 3.

The expected result would therefore be:

Country Count 
1  AUS  6 
2  JPN  5 
3  USA  5 
4  Others 7

I tried the following code, but could not figure out how to add the "Others" row.

dat1 %>% 
    group_by(Country) %>% 
    summarise(Count = sum(Count)) %>% 
    arrange(desc(Count)) %>% 
    top_n(3)

This code currently gives:

Country Count 
1  AUS  6 
2  JPN  5 
3  USA  5

Any help would be much appreciated.

dat1 <- structure(list(Country = structure(c(1L, 5L, 5L, 6L, 1L, 3L, 
    1L, 6L, 4L, 2L), .Label = c("AUS", "CN", "IND", "JPN", "NZ", 
    "USA"), class = "factor"), Count = c(1L, 2L, 1L, 3L, 1L, 2L, 
    4L, 2L, 5L, 2L)), .Names = c("Country", "Count"), class = "data.frame",  row.names = c("1", 
    "2", "3", "4", "5", "6", "7", "8", "9", "10"))

2016-01-31 abhy3

Related Q&A: [Creating an "other" field](http://stackoverflow.com/questions/23730067/creating-an-other-field). – Henrik

Instead of top_n, this looks like a good case for the convenience function tally. It uses summarise, sum and arrange under the hood.

Then use factor to create an "Other" category. Use the levels argument to set "Other" as the last level. "Other" will then be placed last in the table (and in any subsequent plot of the result).

If Country is a factor in the original data, wrap Country[1:3] in as.character so that the level names, not the integer codes, are combined with "Other".

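group_by(dat1, Country) %>% 
    # sum Count per country, sorted in decreasing order (the result column is n)
    tally(Count, sort = TRUE) %>% 
    # keep the first three names and relabel the rest as "Other" (last level)
    group_by(Country = factor(c(as.character(Country[1:3]), rep("Other", n() - 3)), 
          levels = c(as.character(Country[1:3]), "Other"))) %>% 
    tally(n) 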

# Country  n 
# (fctr) (int) 
#1  AUS  6 
#2  JPN  5 
#3  USA  5 
#4 Other  7
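
To see why the levels argument matters here, a minimal base R sketch may help (the vector x is made up for illustration). Without explicit levels, factor() sorts the levels alphabetically, so "Other" would land in the middle instead of at the end:

```r
# Illustrative labels only; not taken from dat1
x <- c("USA", "Other", "AUS", "JPN", "Other")

# Default: levels are sorted, so "Other" sits between "JPN" and "USA"
f_default <- factor(x)

# Explicit levels: "Other" is forced to be the last level
f_ordered <- factor(x, levels = c("AUS", "JPN", "USA", "Other"))

levels(f_default)
levels(f_ordered)

# Tables (and plots) follow the level order, so "Other" comes last here
table(f_ordered)
```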

2016-01-31 14:02:02 Henrik

We could do it in two steps: first create a sorted data.frame, then rbind the top three rows with a summary of the remaining rows:

d <- dat1 %>% 
    group_by(Country) %>% 
    summarise(Count = sum(Count)) %>% 
    arrange(desc(Count)) 

# top three rows, plus one row summing everything from row 4 onwards
rbind(top_n(d, 3), 
      slice(d, 4:n()) %>% summarise(Country = "other", Count = sum(Count)) 
     )

Output:

Country Count 
    (fctr) (int) 
1  AUS  6 
2  JPN  5 
3  USA  5 
4 other  7
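
Note that top_n keeps ties, so it can return more than three rows when the third and fourth totals are equal, and those extra rows would then also be counted in the "other" sum. A possible variant, sketched on the assumption that exactly three rows are wanted, selects rows by position and combines them with bind_rows. Here dat1 is rebuilt with a character Country column so that the example is self-contained:

```r
library(dplyr)

# Same data as in the question, with Country as character (an assumption
# made here so that bind_rows needs no factor handling)
dat1 <- data.frame(
  Country = c("AUS", "NZ", "NZ", "USA", "AUS", "IND", "AUS", "USA", "JPN", "CN"),
  Count   = c(1L, 2L, 1L, 3L, 1L, 2L, 4L, 2L, 5L, 2L),
  stringsAsFactors = FALSE
)

# Total per country, largest first
totals <- dat1 %>%
  group_by(Country) %>%
  summarise(Count = sum(Count)) %>%
  arrange(desc(Count))

# Exactly the first three rows, regardless of ties
top3 <- head(totals, 3)

# Everything after row 3 collapsed into a single "Others" row
others <- totals %>%
  slice(-(1:3)) %>%
  summarise(Country = "Others", Count = sum(Count))

result <- bind_rows(top3, others)
result
```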

2016-01-31 12:46:45 scoa

Here is an option using data.table. We convert the data.frame to a data.table (setDT(dat1)), get the sum of Count grouped by Country, order by Count, and then rbind the first three observations with a list containing 'Others' and the sum of Count for the remaining observations.

library(data.table) 
setDT(dat1)[, list(Count=sum(Count)), Country][order(-Count), 
    rbind(.SD[1:3], list(Country='Others', Count=sum(.SD[[2]][4:.N]))) ] 
# Country Count 
#1:  AUS  6 
#2:  USA  5 
#3:  JPN  5 
#4: Others  7

Or using base R:

d1 <- aggregate(.~Country, dat1, FUN=sum) 
i1 <- order(-d1$Count) 
rbind(d1[i1,][1:3,], data.frame(Country='Others', 
    Count=sum(d1$Count[i1][4:nrow(d1)])))

2016-01-31 14:01:46 akrun

You can also use xtabs() and manipulate the result. This is a base R answer.

s <- sort(xtabs(Count ~ ., dat1), decreasing = TRUE) 
setNames(
    as.data.frame(as.table(c(head(s, 3), Others = sum(tail(s, -3))))), 
    names(dat1) 
) 
# Country Count 
# 1  AUS  6 
# 2  JPN  5 
# 3  USA  5 
# 4 Others  7

2016-01-31 16:02:11

A function some may find useful:

top_cases = function(v, top, other = 'other'){ 
    cv = class(v) 
    v = as.character(v) 
    v[factor(v, levels = top) %>% is.na()] = other 
    if(cv == 'factor') v = factor(v, levels = c(top, other)) 
    v 
}

> table(state.region) 
state.region 
    Northeast         South North Central          West 
            9            16            12            13 
> top_cases(state.region, c('South','West'), 'North') %>% table() 
. 
South  West North 
   16    13    21 

iris %>% mutate(Species = top_cases(Species, c('setosa','versicolor')))
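
top_cases only relabels values; it does not choose the top categories itself. As a rough base R sketch of how the same relabel-then-aggregate idea could be applied to the question's data (dat1 is rebuilt here with a character Country column, and the names totals, top and lumped are illustrative, not part of the answer above):

```r
# Same data as in the question, with Country as character
dat1 <- data.frame(
  Country = c("AUS", "NZ", "NZ", "USA", "AUS", "IND", "AUS", "USA", "JPN", "CN"),
  Count   = c(1L, 2L, 1L, 3L, 1L, 2L, 4L, 2L, 5L, 2L),
  stringsAsFactors = FALSE
)

# Total Count per country, then the names of the three largest totals
totals <- tapply(dat1$Count, dat1$Country, sum)
top <- names(sort(totals, decreasing = TRUE))[1:3]

# Relabel every country outside the top three, keeping "Others" as last level
lumped <- ifelse(dat1$Country %in% top, dat1$Country, "Others")
lumped <- factor(lumped, levels = c(top, "Others"))

# Aggregate on the relabelled column; rows follow the factor level order
result <- aggregate(Count ~ Country,
                    data = data.frame(Country = lumped, Count = dat1$Count),
                    FUN = sum)
result
```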

2017-02-08 14:33:17 geotheory

For anyone who wants to lump the categories that fall below a certain percentage into an 'other' category, here is some code.

Here, any value below 5% goes into the "other" category; the "other" category is summed and labelled with the number of categories aggregated into it.

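# sub is assumed to hold a label column followed by a value column of shares summing to 1
othernum <- nrow(sub[sub$value < .05, ])   # number of categories below 5%
sub <- subset(sub, value >= .05)           # keep categories at or above 5%
toplot <- rbind(sub, c(paste("Other (", othernum, " types)", sep = ""), 1 - sum(sub$value)))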
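
A self-contained sketch of the same idea with made-up shares (the shares data frame and its type and value columns are assumptions, not the original sub). It builds the "Other" row as a data frame, so value stays numeric rather than passing through a character vector:

```r
# Hypothetical shares that sum to 1; two of them fall below the threshold
shares <- data.frame(
  type  = c("a", "b", "c", "d", "e"),
  value = c(0.50, 0.30, 0.13, 0.04, 0.03),
  stringsAsFactors = FALSE
)
threshold <- 0.05

# Flag the categories to be lumped together
small <- shares$value < threshold

# One row carrying the summed share and the number of lumped categories
other_row <- data.frame(
  type  = paste0("Other (", sum(small), " types)"),
  value = sum(shares$value[small]),
  stringsAsFactors = FALSE
)

# Large categories first, the combined row last
to_plot <- rbind(shares[!small, ], other_row)
to_plot
```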

2018-02-13 19:54:57
