How to get the HTML source of a website with PhantomJS

Below is a PhantomJS example that gets an element by its DOM id from an external web page:

var page = require('webpage').create();
console.log('The default user agent is ' + page.settings.userAgent);
// Override the user agent before opening the page
page.settings.userAgent = 'SpecialAgent';
page.open('http://www.httpuseragent.org', function(status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        // Runs inside the page context and returns the element's text
        var ua = page.evaluate(function() {
            return document.getElementById('myagent').textContent;
        });
        console.log(ua);
    }
    phantom.exit();
});

I want to get the entire HTML source of a web page. How do I do that?

2013-11-24 MOB

If you want the HTML source, use something like [the HTTP module](http://nodejs.org/docs/v0.5.2/api/http.html#http.request) rather than running the page through a browser (which will execute JS and manipulate the DOM with it). – Quentin

Can you show me an example? – MOB
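A minimal sketch of what the comment above suggests, assuming Node.js rather than PhantomJS: it requests the page with Node's built-in `http` module and collects the response body as served, without running any page JavaScript. The helper name `fetchRawHtml` is hypothetical, and the sketch handles plain `http://` URLs only and does not follow redirects.

```javascript
// Sketch for Node.js (not PhantomJS): fetch the raw, unprocessed HTML.
var http = require('http');

// fetchRawHtml is a hypothetical helper name, not part of any library.
// It calls callback(err) on failure or callback(null, body) on success.
function fetchRawHtml(url, callback) {
    http.get(url, function (res) {
        var body = '';
        res.setEncoding('utf8');
        // The body arrives in chunks; append each one as it comes in
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () { callback(null, body); });
    }).on('error', function (err) {
        callback(err);
    });
}

// Example usage (performs a real network request):
// fetchRawHtml('http://www.httpuseragent.org', function (err, html) {
//     if (err) { console.error('Request failed: ' + err.message); return; }
//     console.log(html);
// });
```

Because no browser is involved, the string passed to the callback is whatever the server sent, which may differ from what PhantomJS reports after scripts have modified the DOM.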

All you have to do is use page.content:

var page = require('webpage').create();
page.onError = function(msg, trace) {
    // Swallow JavaScript errors raised by the page so they are not
    // printed to the console alongside the page source
    return;
};
page.open('http://www.httpuseragent.org', function() {
    console.log(page.content); // page source, serialized from the current DOM
    phantom.exit();
});

2013-11-24 12:11:02 Hessam

This gives you the HTML from the DOM (which has been processed by the browser with JavaScript and has undergone some degree of syntax correction), as opposed to the raw, unprocessed HTML served by the server.

That is the point of using a tool like PhantomJS. If you want the raw data, use a lower-level tool such as curl or wget.
