How to scrape a table with rvest and xpath?

Using the following documentation, I have been trying to scrape a series of tables from marketwatch.com.

Here is the one represented by the code below:

The link and XPath are already included in the code:

url <- "http://www.marketwatch.com/investing/stock/IRS/profile" 
valuation <- url %>% 
    html() %>% 
    html_nodes(xpath='//*[@id="maincontent"]/div[2]/div[1]') %>% 
    html_table() 
valuation <- valuation[[1]]

I get the following error:

Warning message: 
'html' is deprecated. 
Use 'read_html' instead. 
See help("Deprecated")

Thanks in advance.


2016-02-29 Alex Bădoi

Remove 'html()' and replace it with 'read_html()'. – cory

That is not an error, it is a warning. Your code will still run with this warning. – SymbolixAU

Thanks. Fixed. –
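
To illustrate the difference between a warning and an error noted in the comments, here is a minimal base R sketch. The function old_reader() is hypothetical and only imitates a deprecated function; it is not part of rvest.

```r
# Hypothetical function that emits a deprecation-style warning,
# then carries on and returns a value.
old_reader <- function(x) {
  warning("'old_reader' is deprecated.")
  toupper(x)
}

# The handler reports the warning and resumes execution,
# so the function still produces its result.
result <- withCallingHandlers(
  old_reader("still runs"),
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(result)
```

An error raised with stop() would instead abort the call, and result would never be assigned.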

That website doesn't use an HTML table, so html_table() can't find anything. It actually uses div classes column and data lastcolumn.

So you can do something like

url <- "http://www.marketwatch.com/investing/stock/IRS/profile" 
valuation_col <- url %>% 
    read_html() %>% 
    html_nodes(xpath='//*[@class="column"]') 

valuation_data <- url %>% 
    read_html() %>% 
    html_nodes(xpath='//*[@class="data lastcolumn"]')

Or even

url %>% 
    read_html() %>% 
    html_nodes(xpath='//*[@class="section"]')

To get you most of the way there.
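
For the rest of the way, one possible approach is to pull the text out of the matched nodes and pair labels with values in a data frame. This is only a sketch: the inline markup and the numbers below are made up to imitate the layout described above, and the real page may be structured differently.

```r
library(rvest)

# Hypothetical markup imitating the div-based layout; not the real page.
page <- read_html('
<div class="section">
  <p class="column">P/E Current</p>
  <p class="data lastcolumn">12.5</p>
</div>
<div class="section">
  <p class="column">Price to Sales Ratio</p>
  <p class="data lastcolumn">1.8</p>
</div>')

# Same XPath expressions as above, followed by html_text() to get the strings.
labels <- page %>%
    html_nodes(xpath = '//*[@class="column"]') %>%
    html_text(trim = TRUE)

values <- page %>%
    html_nodes(xpath = '//*[@class="data lastcolumn"]') %>%
    html_text(trim = TRUE)

# Pairing by position only makes sense if both vectors line up.
stopifnot(length(labels) == length(values))

valuation <- data.frame(label = labels,
                        value = as.numeric(values),
                        stringsAsFactors = FALSE)
print(valuation)
```

Reading the page once and reusing the parsed document for both queries also avoids downloading it twice.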

Please also read their terms of use - in particular 3.4.


2016-03-01 00:30:14 SymbolixAU
