Add a new row to a dataframe at a specific row index, not appended?

149

The following code combines a vector with a dataframe:

newrow = c(1:4) 
existingDF = rbind(existingDF,newrow)

However, this code always inserts the new row at the end of the dataframe.

How can I insert the row at a specified point within the dataframe? For example, say the dataframe has 20 rows: how can I insert the new row between rows 10 and 11?


2012-07-19 luciano

Use a convenient index and sort? – Roland

+21

'existingDF = rbind(existingDF[1:10,], newrow, existingDF[-(1:10),])' – Pop

With a simple loop, and a condition if needed, rows can be copied from one dataframe to another. For example: 'newdataframe[nrow(newdataframe)+1,] <- existingdataframe[i,]' – kirancodify

144

Here's a solution that avoids the (often slow) rbind call:

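existingDF <- as.data.frame(matrix(seq(20), nrow = 5, ncol = 4))
r <- 3
newrow <- seq(4)
insertRow <- function(existingDF, newrow, r) {
    # Shift rows r..nrow down by one position, growing the data frame by one row
    existingDF[seq(r + 1, nrow(existingDF) + 1), ] <- existingDF[seq(r, nrow(existingDF)), ]
    # Overwrite row r with the new row
    existingDF[r, ] <- newrow
    existingDF
}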

> insertRow(existingDF, newrow, r)
  V1 V2 V3 V4
1  1  6 11 16
2  2  7 12 17
3  1  2  3  4
4  3  8 13 18
5  4  9 14 19
6  5 10 15 20
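
One caveat with insertRow as written: when r equals nrow(existingDF) + 1 (that is, when the new row should go at the very end), seq(r + 1, nrow(existingDF) + 1) counts downward instead of being empty, so the shift step no longer does what is intended. The sketch below is one possible guarded variant, not part of the original answer; the name insertRowSafe is hypothetical and chosen only for illustration.

```r
# Hypothetical guarded variant of insertRow (illustrative name, not from the answer).
insertRowSafe <- function(existingDF, newrow, r) {
    n <- nrow(existingDF)
    # Only positions 1..n+1 make sense as insertion points
    stopifnot(r >= 1, r <= n + 1)
    if (r <= n) {
        # Same shift as insertRow: move rows r..n down by one
        existingDF[seq(r + 1, n + 1), ] <- existingDF[seq(r, n), ]
    }
    # Write the new row; when r == n + 1 this simply extends the data frame
    existingDF[r, ] <- newrow
    existingDF
}

df <- as.data.frame(matrix(seq(20), nrow = 5, ncol = 4))
middle <- insertRowSafe(df, seq(4), 3)  # insert at position 3
atEnd  <- insertRowSafe(df, seq(4), 6)  # insert after the last row
```

For r between 1 and nrow(existingDF) this should behave like the original insertRow; the only difference is that the shift is skipped when there is nothing to shift.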

If speed is less important than clarity, then @Simon's solution works well:

existingDF <- rbind(existingDF[1:r,],newrow,existingDF[-(1:r),]) 
> existingDF
   V1 V2 V3 V4
1   1  6 11 16
2   2  7 12 17
3   3  8 13 18
4   1  2  3  4
41  4  9 14 19
5   5 10 15 20

(Note that r is indexed differently here: the new row is placed after row r rather than at position r.)
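
To make the rbind approach place the new row at position r, matching insertRow, the split point can be moved up by one. The following is a minimal sketch under that assumption; the helper variables before and after are illustrative names. It uses setdiff rather than a negative index because existingDF[-(1:0), ] would not select what one might expect when r is 1.

```r
df <- as.data.frame(matrix(seq(20), nrow = 5, ncol = 4))
newrow <- seq(4)
r <- 3

# Rows that stay above the new row, and rows that end up below it
before <- seq_len(r - 1)
after  <- setdiff(seq_len(nrow(df)), before)

out <- rbind(df[before, ], newrow, df[after, ])
# Reset the row names, which rbind otherwise de-duplicates (e.g. "41")
rownames(out) <- NULL
```

Here the new row ends up as row 3 of out, and the former row 3 moves down to row 4.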

And finally, benchmarks:

library(microbenchmark) 
microbenchmark(
    rbind(existingDF[1:r,],newrow,existingDF[-(1:r),]), 
    insertRow(existingDF,newrow,r) 
) 

Unit: microseconds
                                                    expr     min       lq   median       uq       max
1                       insertRow(existingDF, newrow, r) 660.131 678.3675 695.5515 725.2775   928.299
2 rbind(existingDF[1:r, ], newrow, existingDF[-(1:r), ]) 801.161 831.7730 854.6320 881.6560 10641.417

[Figure: benchmark]

As @MatthewDowle always points out to me, benchmarks need to be examined for scaling as the size of the problem increases. So here we go:

library(microbenchmark)
library(ggplot2)   # ggplot, aes, geom_line, scale_*_log10
library(reshape2)  # melt

# insertRow2 is defined in Roland's answer below
benchmarkInsertionSolutions <- function(nrow=5,ncol=4) {
    existingDF <- as.data.frame(matrix(seq(nrow*ncol),nrow=nrow,ncol=ncol))
    r <- 3 # Row to insert into
    newrow <- seq(ncol)
    m <- microbenchmark(
        rbind(existingDF[1:r,],newrow,existingDF[-(1:r),]),
        insertRow(existingDF,newrow,r),
        insertRow2(existingDF,newrow,r)
    )
    # Now return the median times
    mediansBy <- by(m$time,m$expr, FUN=median)
    res <- as.numeric(mediansBy)
    names(res) <- names(mediansBy)
    res
}
nrows <- 5*10^(0:5)
benchmarks <- sapply(nrows,benchmarkInsertionSolutions)
colnames(benchmarks) <- as.character(nrows)
ggplot(melt(benchmarks), aes(x=Var2,y=value,colour=Var1)) + geom_line() + scale_x_log10() + scale_y_log10()

@Roland's solution scales quite well, even with the call to rbind (median times in nanoseconds; columns are the number of rows):

                                                              5       50     500    5000    50000     5e+05
insertRow2(existingDF, newrow, r)                      549861.5 579579.0  789452 2512926 46994560 414790214
insertRow(existingDF, newrow, r)                       895401.0 905318.5 1168201 2603926 39765358 392904851
rbind(existingDF[1:r, ], newrow, existingDF[-(1:r), ]) 787218.0 814979.0 1263886 5591880 63351247 829650894

Plotted on a linear scale:

[Figure: benchmark results, linear scale]

And on a log-log scale:

[Figure: benchmark results, log-log scale]


2012-07-19 13:56:01

Inserting a row at the end gives strange behaviour! – Maarten

@Maarten With which function? –

I guess it's the same strange behaviour I'm describing here: http://stackoverflow.com/questions/19927806/efficiently-add-numeric-columns-and-rows-with-na-and-not-knowing-colnames – PatrickT

insertRow2 <- function(existingDF, newrow, r) { 
    existingDF <- rbind(existingDF,newrow) 
    existingDF <- existingDF[order(c(1:(nrow(existingDF)-1),r-0.5)),] 
    row.names(existingDF) <- 1:nrow(existingDF) 
    return(existingDF) 
} 

insertRow2(existingDF,newrow,r) 

  V1 V2 V3 V4
1  1  6 11 16
2  2  7 12 17
3  1  2  3  4
4  3  8 13 18
5  4  9 14 19
6  5 10 15 20
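
The reordering step inside insertRow2 can be unpacked on its own. The sketch below, with illustrative variable names n and key, shows the sort key it builds: the original rows get keys 1..n, the appended row gets r - 0.5, and order() therefore places the appended row (index n + 1) just before the old row r.

```r
n <- 5  # number of rows before the insertion
r <- 3  # target position of the new row

# Keys for the n original rows, followed by the key of the appended row
key <- c(seq_len(n), r - 0.5)   # 1 2 3 4 5 2.5
perm <- order(key)              # 1 2 6 3 4 5
```

Indexing the row-bound data frame with perm gives the rows in their final order, which is what existingDF[order(...), ] does in the function above.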

microbenchmark(
    rbind(existingDF[1:r,],newrow,existingDF[-(1:r),]),
    insertRow(existingDF,newrow,r),
    insertRow2(existingDF,newrow,r)
)
Unit: microseconds
                                                    expr     min       lq   median       uq      max
1                       insertRow(existingDF, newrow, r) 513.157 525.6730 531.8715 544.4575 1409.553
2                      insertRow2(existingDF, newrow, r) 430.664 443.9010 450.0570 461.3415  499.988
3 rbind(existingDF[1:r, ], newrow, existingDF[-(1:r), ]) 606.822 625.2485 633.3710 653.1500 1489.216


2012-07-20 21:17:47 Roland

This is an interesting solution. I still can't figure out why it is so much faster than the simultaneous call to 'rbind', but I'm intrigued. –

-4

For example, if you want to append the rows of variable 2 to variable 1 of a data frame named "edges", just do it this way:

allEdges <- data.frame(c(edges$V1,edges$V2))


2015-04-28 00:21:01 user3670684

You should try the dplyr package:

library(dplyr)
a <- data.frame(A = c(1, 2, 3, 4),
                B = c(11, 12, 13, 14))

system.time({
    for (i in 50:1000) {
        b <- data.frame(A = i, B = i * i)
        a <- bind_rows(a, b)
    }
})

Output:

   user  system elapsed
   0.25    0.00    0.25

In contrast, using the rbind function:

a <- data.frame(A = c(1, 2, 3, 4),
                B = c(11, 12, 13, 14))

system.time({
    for (i in 50:1000) {
        b <- data.frame(A = i, B = i * i)
        a <- rbind(a, b)
    }
})

Output:

   user  system elapsed
   0.49    0.00    0.49

So there is some performance gain: in this run bind_rows took about half the time of rbind.


2015-12-12 13:48:58
