2015-07-11 11 views
6

Ho due dataset grandi, df1 e df2. Il primo set di dati, df1, contiene le colonne 'ID' e 'actual.data'.Verifica se la data è compresa tra due date in R

df1 <- data.frame(ID=c(1,1,1,2,3,4,4), actual.date=c('10/01/1997','2/01/1998','5/01/2002','7/01/1999','9/01/2005','5/01/2006','2/03/2003')); 
dcis <- grep('date$',names(df1)); 
df1[dcis] <- lapply(df1[dcis],as.Date,'%m/%d/%Y'); 
df1; 

    ID actual.date 
1 1 1997-10-01 
2 1 1998-02-01 
3 1 2002-05-01 
4 2 1999-07-01 
5 3 2005-09-01 
6 4 2006-05-01 
7 4 2003-02-03 

Il secondo set di dati, DF2, contiene due campi data, 'prima, la data' e 'after.date', che rappresentano una data di inizio e di fine, rispettivamente:

df2 <- data.frame(ID=c(1,1,1,2,3,4,4,4), before.date=c('10/1/1996','1/1/1998','1/1/2000','1/1/2001','1/1/2001','1/1/2001','10/1/2004','10/3/2004'), after.date=c('12/1/1996','9/30/2003','12/31/2004','3/31/2006','9/30/2006','9/30/2005','12/30/2004','11/28/2004')); 
dcis <- grep('date$',names(df2)); 
df2[dcis] <- lapply(df2[dcis],as.Date,'%m/%d/%Y'); 
df2; 

    ID before.date after.date 
1 1 1996-10-01 1996-12-01 
2 1 1998-01-01 2003-09-30 
3 1 2000-01-01 2004-12-31 
4 2 2001-01-01 2006-03-31 
5 3 2001-01-01 2006-09-30 
6 4 2001-01-01 2005-09-30 
7 4 2004-10-01 2004-12-30 
8 4 2004-10-03 2004-11-28 

Il mio obiettivo è per creare una nuova colonna alla fine di df1 chiamata 'match' che indica se il 'actual.date' per ogni riga in df1 è tra 'before.date' e 'after.date' di una qualsiasi delle osservazioni dello stesso ID in df2. Se è tra, voglio dare alla colonna 'match' un valore di 1, altrimenti 0 (include istanze senza corrispondenza 'ID').

Questa è l'uscita spero in:

ID before.date match 
1 1 1997-10-01 0 
2 1 1998-02-01 1 
3 1 2002-05-01 1 
4 2 1999-07-01 0 
5 3 2005-09-01 1 
7 4 2006-05-01 0 
8 4 2003-02-03 1 

Penso che questo può essere fatto con un ciclo for(), ma non sono troppo informato circa R.

Dati di esempio:

DF1 

structure(list(cikcode = c("20", "20", "20", "20", "20", "20", 
"20", "20", "20", "20", "20", "20", "20", "20", "20", "20", "20", 
"20", "20", "1750"), auditorkey = c("4", "4", "5", "5", "5", 
"5", "6", "6", "6", "6", "6", "6", "6", "6", "6", "6", "6", "6", 
"6", "4"), yearendeddate = structure(c(4L, 4L, 2L, 2L, 3L, 3L, 
5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 1L), .Label = c("2000-05-31", 
"2000-12-30", "2001-12-29", "2002-12-28", "2004-01-03", "2005-01-01", 
"2005-12-31", "2006-12-30", "2007-12-29", "2009-01-03", "2010-01-02" 
), class = "factor"), source = structure(c(1L, 3L, 1L, 3L, 2L, 
3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L, 2L), .Label = c("10-K", 
"10-K405", "DEF 14A"), class = "factor"), sourcedate = structure(c(6L, 
7L, 2L, 3L, 4L, 5L, 8L, 9L, 10L, 20L, 11L, 12L, 13L, 14L, 15L, 
16L, 17L, 18L, 19L, 1L), .Label = c("2000-08-24", "2001-03-26", 
"2001-03-28", "2002-03-20", "2002-03-25", "2003-03-27", "2003-03-31", 
"2004-04-01", "2004-04-06", "2005-03-31", "2006-03-23", "2006-03-28", 
"2007-03-09", "2007-03-27", "2008-03-12", "2008-04-04", "2009-03-13", 
"2009-04-06", "2010-03-15", "2005-04-04"), class = "factor"), 
    financialsdate = structure(c(4L, 4L, 2L, 2L, 3L, 3L, 5L, 
    5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 1L), .Label = c("2000-05-31", 
    "2000-12-30", "2001-12-29", "2002-12-28", "2004-01-03", "2005-01-01", 
    "2005-12-31", "2006-12-30", "2007-12-29", "2009-01-03", "2010-01-02" 
    ), class = "factor"), auditopinionkey = c("3538", "NA", "66900", 
    "NA", "78252", "NA", "39225", "NA", "84748", "NA", "102979", 
    "NA", "120889", "NA", "148621", "NA", "171604", "NA", "192814", 
    "156138"), auditorstatecode = structure(c(2L, NA, 2L, NA, 
    2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, 
    1L), .Label = c("IL", "PA"), class = "factor"), auditorstatename = structure(c(2L, 
    NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 2L, NA, 
    2L, NA, 2L, 1L), .Label = c("ILLINOIS", "PENNSYLVANIA"), class = "factor"), 
    goingconcern = structure(c(1L, NA, 1L, NA, 1L, NA, 1L, NA, 
    1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L, 1L), .Label = "No", class = "factor"), 
    goingconcernissuekeylist = structure(c(1L, NA, 1L, NA, 1L, 
    NA, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L, 1L 
    ), .Label = "", class = "factor"), goingconcernissuephraselist = structure(c(1L, 
    NA, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 
    1L, NA, 1L, 1L), .Label = "", class = "factor"), isadditionalopinion = structure(c(1L, 
    NA, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 1L, NA, 
    1L, NA, 1L, 1L), .Label = "No", class = "factor"), restatement = c("NA", 
    "0", "NA", "0", "NA", "0", "NA", "0", "NA", "0", "NA", "0", 
    "NA", "0", "NA", "0", "NA", "0", "0", "NA"), yearended = c("NA", 
    "2002", "NA", "2000", "NA", "2001", "NA", "2003", "NA", "2004", 
    "NA", "2005", "NA", "2006", "NA", "2007", "NA", "2008", "2009", 
    "NA"), assets = c("50,459,000", "50,459,000", "54,421,000", 
    "54,421,000", "47,644,000", "47,644,000", "83,081,000", "83,081,000", 
    "93,016,000", "93,016,000", "89,110,000", "89,110,000", "140,996,000", 
    "140,996,000", "184,118,000", "184,118,000", "199,444,000", 
    "199,444,000", "204,236,000", "740,998,000"), auditfees = c("123,700", 
    "123,700", "200,000", "200,000", "185,000", "185,000", "137,100", 
    "137,100", "225,000", "225,000", "244,000", "244,000", "574,000", 
    "574,000", "669,000", "669,000", "680,000", "680,000", "643,000", 
    "NA"), auditor = c("KPMG LLP", "KPMG LLP", "Arthur Andersen LLP", 
    "Arthur Andersen LLP", "Arthur Andersen LLP", "Arthur Andersen LLP", 
    "Grant Thornton LLP", "Grant Thornton LLP", "Grant Thornton LLP", 
    "Grant Thornton LLP", "Grant Thornton LLP", "Grant Thornton LLP", 
    "Grant Thornton LLP", "Grant Thornton LLP", "Grant Thornton LLP", 
    "Grant Thornton LLP", "Grant Thornton LLP", "Grant Thornton LLP", 
    "Grant Thornton LLP", "KPMG LLP"), earnings = c("3,284,000", 
    "3,284,000", "5,838,000", "5,838,000", "1,048,000", "1,048,000", 
    "NA", "", "NA", "", "NA", "", "NA", "", "21,321,000", "21,321,000", 
    "25,773,000", "25,773,000", "21,555,000", "35,163,000"), 
    naicscode = c("334513", "334513", "334513", "334513", "334513", 
    "334513", "334513", "334513", "334513", "334513", "334513", 
    "334513", "334513", "334513", "334513", "334513", "334513", 
    "334513", "334513", "334613"), revenue = c("68,231,000", 
    "68,231,000", "84,912,000", "84,912,000", "71,819,000", "71,819,000", 
    "94,676,000", "94,676,000", "112,494,000", "112,494,000", 
    "118,940,000", "118,940,000", "148,127,008", "148,127,008", 
    "201,677,000", "201,677,000", "243,018,000", "243,018,000", 
    "190,774,000", "1,024,333,000"), siccode = c("3823", "3823", 
    "3823", "3823", "3823", "3823", "3823", "3823", "3823", "3823", 
    "3823", "3823", "3823", "3823", "3823", "3823", "3823", "3823", 
    "3823", "3720"), statecode = c("NJ", "NJ", "NJ", "NJ", "NJ", 
    "NJ", "NJ", "NJ", "NJ", "NJ", "NJ", "NJ", "NJ", "NJ", "NJ", 
    "NJ", "NJ", "NJ", "NJ", "IL"), statename = c("NEW JERSEY", 
    "NEW JERSEY", "NEW JERSEY", "NEW JERSEY", "NEW JERSEY", "NEW JERSEY", 
    "NEW JERSEY", "NEW JERSEY", "NEW JERSEY", "NEW JERSEY", "NEW JERSEY", 
    "NEW JERSEY", "NEW JERSEY", "NEW JERSEY", "NEW JERSEY", "NEW JERSEY", 
    "NEW JERSEY", "NEW JERSEY", "NEW JERSEY", "ILLINOIS"), ticker = c("NA", 
    "", "NA", "", "NA", "", "NA", "", "NA", "", "NA", "", "NA", 
    "", "NA", "", "NA", "", "", "AIR")), .Names = c("cikcode", 
"auditorkey", "yearendeddate", "source", "sourcedate", "financialsdate", 
"auditopinionkey", "auditorstatecode", "auditorstatename", "goingconcern", 
"goingconcernissuekeylist", "goingconcernissuephraselist", "isadditionalopinion", 
"restatement", "yearended", "assets", "auditfees", "auditor", 
"earnings", "naicscode", "revenue", "siccode", "statecode", "statename", 
"ticker"), row.names = c(NA, 20L), class = "data.frame") 


DF2 

structure(list(cikcode = c(320193L, 72971L, 72971L, 200406L, 
40545L, 40545L, 1114448L, 19617L, 19617L, 1067983L, 70858L, 313807L, 
1578845L, 1113172L, 64803L, 1135644L, 731766L, 14272L, 14272L, 
66740L), auditoratdisclosuredate = c("KPMG LLP", "KPMG LLP", 
"KPMG LLP", "PricewaterhouseCoopers LLP", "KPMG LLP", "KPMG LLP", 
"PricewaterhouseCoopers LLP", "PricewaterhouseCoopers LLP", "PricewaterhouseCoopers LLP", 
"Deloitte & Touche LLP", "PricewaterhouseCoopers LLP", "Ernst & Young LLP", 
"PricewaterhouseCoopers LLP", "PricewaterhouseCoopers LLP", "KPMG LLP", 
"KPMG LLP", "Deloitte & Touche LLP", "PricewaterhouseCoopers LLP", 
"PricewaterhouseCoopers LLP", "PricewaterhouseCoopers LLP"), 
    auditoratdisclosuredatekey = c("4", "4", "4", "1", "4", "4", 
    "1", "1", "1", "3", "1", "2", "1", "1", "4", "4", "3", "1", 
    "1", "1"), auditorduringrestatedperiod = c("|KPMG LLP|", 
    "|KPMG LLP|", "|KPMG LLP|", "|PricewaterhouseCoopers LLP|", 
    "|KPMG LLP|", "|KPMG LLP|", "|PricewaterhouseCoopers LLP|PricewaterhouseCoopers LLP (PricewaterhouseCoopers AG, SWITZERLAND)|", 
    "|PricewaterhouseCoopers LLP|", "|PricewaterhouseCoopers LLP|", 
    "|Deloitte & Touche LLP|", "|PricewaterhouseCoopers LLP|PricewaterhouseCoopers LLP (Price Waterhouse & Co SRL, ARGENTINA)|", 
    "|Ernst & Young LLP|Ernst & Young LLP (Ernst & Young LLP, UNITED KINGDOM)|", 
    "|PricewaterhouseCoopers LLP|", "|Deloitte & Touche LLP (Deloitte Touche Tohmatsu Auditores Independentes, BRAZIL)|Deloitte & Touche LLP (KPMG Auditores Independentes, BRAZIL)|KPMG LLP (Deloitte Touche Tohmatsu Auditores Independentes, BRAZIL)|KPMG LLP (KPMG Auditores Independentes, BRAZIL)|", 
    "|KPMG LLP|", "|KPMG LLP (KPMG Deutsche Treuhand-Gesellschaft AG WPG, GERMANY)|", 
    "|Arthur Andersen LLP|Deloitte & Touche LLP|", "|PricewaterhouseCoopers LLP|", 
    "|PricewaterhouseCoopers LLP|", "|PricewaterhouseCoopers LLP|" 
    ), auditorduringrestatedperiodkeys = c("|4|", "|4|", "|4|", 
    "|1|", "|4|", "|4|", "|1|", "|1|", "|1|", "|3|", "|1|", "|2|", 
    "|1|", "|3|4|", "|4|", "|4|", "|5|3|", "|1|", "|1|", "|1|" 
    ), auditoropinionperiodendduringrestatedperiod = c("|KPMG LLP|", 
    "|KPMG LLP|", "|KPMG LLP|", "", "|KPMG LLP|", "|KPMG LLP|", 
    "|PricewaterhouseCoopers LLP[PricewaterhouseCoopers AG]|", 
    "|PricewaterhouseCoopers LLP|", "", "", "|PricewaterhouseCoopers LLP|", 
    "|Ernst & Young LLP[Ernst & Young LLP]|", "|PricewaterhouseCoopers LLP|", 
    "|KPMG LLP[KPMG Auditores Independentes]|", "", "|KPMG LLP[KPMG Deutsche Treuhand-Gesellschaft AG WPG]|", 
    "|Arthur Andersen LLP|Deloitte & Touche LLP|", "|PricewaterhouseCoopers LLP|", 
    "|PricewaterhouseCoopers LLP|", "|PricewaterhouseCoopers LLP|" 
    ), auditoropinionperiodendduringrestatedperiodkeys = c("|4|", 
    "|4|", "|4|", "", "|4|", "|4|", "|1|", "|1|", "", "", "|1|", 
    "|2|", "|1|", "|4|", "", "|4|", "|5|3|", "|1|", "|1|", "|1|" 
    ), restatementkey = c("10196", "12617", "42632", "44280", 
    "857", "22310", "7995", "10187", "38972", "5379", "6357", 
    "9495", "49566", "32005", "45575", "15890", "13410", "695", 
    "3752", "36923"), restatedperiodbegin = c("1996-10-01", "1998-01-01", 
    "2005-01-01", "2012-01-01", "2001-01-01", "2001-01-01", "2003-01-01", 
    "2003-01-01", "2012-01-01", "2001-07-01", "2001-01-01", "2000-01-01", 
    "2014-01-01", "2007-01-01", "1997-01-01", "2000-10-01", "1994-01-01", 
    "2001-01-01", "1999-01-01", "2010-01-01"), restatedperiodended = c("2006-04-01", 
    "2003-09-30", "2007-09-30", "2012-07-01", "2006-03-31", "2006-09-30", 
    "2004-12-31", "2006-03-31", "2012-03-31", "2001-09-30", "2005-09-30", 
    "2004-12-31", "2014-12-31", "2008-12-31", "1998-12-31", "2006-09-30", 
    "2006-09-30", "2003-09-30", "2002-06-30", "2011-03-31"), 
    disclosure = c("Press Release", "10-Q/A", "10-K", "10-Q", 
    "8-K", "8-K", "20-F", "8-K", "8-K", "10-Q/A", "8-K", "20-F/A", 
    "10-Q", "20-F", "8-K", "20-F", "8-K", "8-K", "Press Release", 
    "10-Q"), disclosuredate = c("2006-06-29", "2004-01-16", "2008-02-29", 
    "2012-11-09", "2005-05-06", "2007-01-19", "2006-01-30", "2006-08-03", 
    "2012-07-13", "2002-03-15", "2006-02-22", "2006-06-13", "2015-05-11", 
    "2010-04-21", "1999-11-15", "2006-12-11", "2006-04-07", "2004-01-29", 
    "2002-10-22", "2011-08-04"), cumulativechangeinnetincome = c(NaN, 
    NaN, 0, 0, 3.04344437838208e-315, NaN, NaN, 0, NaN, 0, 2.66795448754273e-315, 
    0, 0, 0, 0, NaN, NaN, NaN, NaN, 0), accountingrulegaapfasbapplicationfailures = c("|Deferred, stock-based and/or executive comp issues|Deferred, stock-based options backdating only (subcategory)|", 
    "|Lease, SFAS 5, legal, contingency and commitment issues|Lease, leasehold and FAS 13 (98) only (subcategory)|", 
    "|Cash flow statement (SFAS 95) classification errors|Lease, SFAS 5, legal, contingency and commitment issues|", 
    "|Cash flow statement (SFAS 95) classification errors|", 
    "|Financial derivatives/hedging (FAS 133) acct issues|", 
    "|Financial derivatives/hedging (FAS 133) acct issues|Foreign, related party, affiliated, or subsidiary issues|", 
    "|PPE intangible or fixed asset (value/diminution) issues|Acquisitions, mergers, disposals, re-org acct issues|Consolidation issues incl Fin 46 variable interest & off-B/S|Deferred, stock-based and/or executive comp issues|Tax expense/benefit/deferral/other (FAS 109) issues|GAAP - Changes in Acct Principles FASB/EITF or Foreign GAAP|Acquisitions, mergers, only (subcategory) acct issues|PPE issues - Intangible assets, goodwill only (subcategory)|Pension and other post-retirement benefit issues|", 
    "|Accounts/loans receivable, investments & cash issues|Cash flow statement (SFAS 95) classification errors|", 
    "|Revenue recognition issues|", "|Consolidation issues incl Fin 46 variable interest & off-B/S|Intercompany, investment in subs./affiliate issues|", 
    "|Financial derivatives/hedging (FAS 133) acct issues|", 
    "|Revenue recognition issues|Financial derivatives/hedging (FAS 133) acct issues|Inventory, vendor and/or cost of sales issues|GAAP - Changes in Acct Principles FASB/EITF or Foreign GAAP|", 
    "|Debt, quasi-debt, warrants & equity (BCF) security issues|Foreign, related party, affiliated, or subsidiary issues|Cash flow statement (SFAS 95) classification errors|Foreign, subsidiary only issues (subcategory)|", 
    "", "|Acquisitions, mergers, disposals, re-org acct issues|Acquisitions, mergers, only (subcategory) acct issues|", 
    "|Tax expense/benefit/deferral/other (FAS 109) issues|", 
    "|Deferred, stock-based and/or executive comp issues|Deferred, stock-based options backdating only (subcategory)|", 
    "|Acquisitions, mergers, disposals, re-org acct issues|Consolidation issues incl Fin 46 variable interest & off-B/S|Accounts/loans receivable, investments & cash issues|Tax expense/benefit/deferral/other (FAS 109) issues|Fin Statement, footnote & segment disclosure issues|Pension and other post-retirement benefit issues|", 
    "|Revenue recognition issues|Accounts/loans receivable, investments & cash issues|Inventory, vendor and/or cost of sales issues|", 
    "|Cash flow statement (SFAS 95) classification errors|"), 
    financialfraudirregularitiesandmisrepresentations = c("", 
    "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
    "", "", "", ""), errorsaccountingandclericalapplications = c("", 
    "", "", "", "", "", "", "", "", "", "", "", "", "|EPS, ratio and classification of income statement issues|", 
    "", "", "", "", "", ""), othersignificantissues = c("", "", 
    "", "", "", "", "", "", "|Z - Material Weakness - Section 404 or 302 issues identified|", 
    "", "", "", "", "", "", "", "", "", "|Y - Loan covenant violations/issues|", 
    ""), secinvestigation = c("", "", "Y", "", "Y", "Y", "", 
    "", "", "", "", "", "", "", "", "", "", "", "", "")), .Names = c("cikcode", 
"auditoratdisclosuredate", "auditoratdisclosuredatekey", "auditorduringrestatedperiod", 
"auditorduringrestatedperiodkeys", "auditoropinionperiodendduringrestatedperiod", 
"auditoropinionperiodendduringrestatedperiodkeys", "restatementkey", 
"restatedperiodbegin", "restatedperiodended", "disclosure", "disclosuredate", 
"cumulativechangeinnetincome", "accountingrulegaapfasbapplicationfailures", 
"financialfraudirregularitiesandmisrepresentations", "errorsaccountingandclericalapplications", 
"othersignificantissues", "secinvestigation"), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20"), class = "data.frame") 
+1

Si può usare 'foverlaps' da' data.table' – akrun

+0

ottengo alcuni errori da copiare/incollare i dput cioè 'Errore nella struttura (Lista (cikcode = c ("20", "20"," 20 "," 20 "," 20 "," 20 ",: oggetto 'auditorkey' non trovato 'Mi chiedo perché alcuni non sono quotati alla fine? Per esempio' auditorkey',' auditopinionkey', ... – akrun

+0

I Ho provato a inserire le virgolette e poi a eseguirlo nuovamente. Ora l'errore è "Errore nella struttura (c (1L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 1L, 3L, 1L, 3L, 1L,: oggetto 'K405' non trovato 'Riesci a eseguire nuovamente il dput per vedere se è tutto a posto – akrun

risposta

8

È possibile utilizzare foverlaps da data.table. Converti sia 'data.frame in' data.table 'con colonne' inizio/fine '. Impostare la colonna chiave come i nomi delle colonne di ciascun set di dati. Utilizzare foverlaps per ottenere l'indice numerico che può essere convertito in "corrispondenza" binaria in base ai valori NA in esso contenuti.

library(data.table)#v1.9.5+ 
dt1 <- data.table(ID=df1$ID, start=df1$actual.date, end=df1$actual.date) 
setkeyv(dt1, colnames(dt1)) 
dt2 <- as.data.table(df2) 
setnames(dt2, 2:3, c('start', 'end')) 
setkeyv(dt2, colnames(dt2)) 
indx <- foverlaps(dt1, dt2, type='within', which=TRUE, mult='first') 
dt1[, match:= +(!is.na(indx))][,end:=NULL] 
setnames(dt1, 1:2, colnames(df1)) 
dt1 
# ID actual.date match 
#1: 1 1997-10-01  0 
#2: 1 1998-02-01  1 
#3: 1 2002-05-01  1 
#4: 2 1999-07-01  0 
#5: 3 2005-09-01  1 
#6: 4 2003-02-03  1 
#7: 4 2006-05-01  0 
+0

questa soluzione incorpora un join esterno dei dati? – michael

+0

qualcosa è andato storto. Ho ricevuto i messaggi di errore: 1) Errore nei foverlaps (dt1, dt2, type = "within", che = TRUE, mult = "first"): length (by.x)! = Length (by.y). Le colonne specificate in by.x devono corrispondere alle colonne specificate in by.y e devono avere le stesse lunghezze. 2) Errore nel setnames (dt1, 1: 2, colnames (merged_combined_files12)): 'vecchio' è lunghezza 2 ma 'nuovo' è lunghezza 25. Ho guardato i miei dati e ci sono solo tre colonne che mostrano: 'ID', 'start' e 'match'. E per "match" c'è un valore 1 per ogni singola riga – michael

+0

@michael Se guardi sia i dataset 'dt1' che' dt2', i nomi delle colonne sono gli stessi. Sto usando la versione di 'devel' di' data.table'. Hai ricevuto l'errore con i dati di esempio (che ha funzionato bene per me)? – akrun

2

Ecco una soluzione con dplyr

library(dplyr) 
dat <- inner_join(df1, df2, by = "ID") 
dat %>% rowwise() %>% 
     mutate(match = ifelse(between(actual.date, before.date, after.date), 1, 0)) %>% 
     select(-c(before.date, after.date)) %>% 
     arrange(actual.date, desc(match)) %>% 
     distinct(actual.date) 

l'uscita è leggermente diverso perché ordinare l'actual.date, forse questo è un problema, io cancello la mia soluzione, se del caso.

Source: local data frame [7 x 3] 

    ID actual.date match 
1 1 1997-10-01  0 
2 1 1998-02-01  1 
3 2 1999-07-01  0 
4 1 2002-05-01  1 
5 4 2003-02-03  1 
6 3 2005-09-01  1 
7 4 2006-05-01  0 
+0

questa soluzione incorpora un join esterno dei dati? – michael

+0

@michael si unisce ai dati, ho usato la funzione 'inner_join'. Lo so ... quando ho sviluppato la soluzione mi è sembrato un modo semplice. – SabDeM

+0

Non correlato, ma hai idea del motivo per cui i dati nella mia colonna vengono cancellati quando provi a convertire i fattori di conversione e i caratteri in date? per esempio, provo df2 $ restatedperiodbegin <- as.Date (df2 $ restatedperiodbegin, "% m /% d /%") – michael

Problemi correlati