How do you get all the rows from a particular table using BeautifulSoup?

I am learning Python and BeautifulSoup to scrape data from the web and read an HTML table. I can read the page into Open Office, and it says that the table is Table #11.

It seems that BeautifulSoup is the preferred choice, but can anyone tell me how to grab a particular table and all of its rows? I have looked at the module documentation, but I can't get my head around it. Many of the examples I have found online appear to do more than I need.

2010-01-06 Btibert3

This should be fairly straightforward if you have a chunk of HTML to parse with BeautifulSoup. The general idea is to navigate to your table and its rows using the findChildren method, then read the text value inside each cell with the string property.

>>> from BeautifulSoup import BeautifulSoup 
>>> 
>>> html = """ 
... <html> 
... <body> 
...  <table> 
...   <th><td>column 1</td><td>column 2</td></th> 
...   <tr><td>value 1</td><td>value 2</td></tr> 
...  </table> 
... </body> 
... </html> 
... """ 
>>> 
>>> soup = BeautifulSoup(html) 
>>> tables = soup.findChildren('table') 
>>> 
>>> # This will get the first (and only) table. Your page may have more. 
>>> my_table = tables[0] 
>>> 
>>> # You can find children with multiple tags by passing a list of strings 
>>> rows = my_table.findChildren(['th', 'tr']) 
>>> 
>>> for row in rows: 
...  cells = row.findChildren('td') 
...  for cell in cells: 
...   value = cell.string 
...   print "The value in this cell is %s" % value 
... 
The value in this cell is column 1 
The value in this cell is column 2 
The value in this cell is value 1 
The value in this cell is value 2 
>>>
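
The session above uses the BeautifulSoup 3 package and the Python 2 print statement. For Python 3, a minimal sketch of the same idea might look like the following, assuming the `beautifulsoup4` package (imported as `bs4`) is installed. The sample HTML and the helper name `table_rows` are made up for illustration, and `find_all` is the bs4 spelling of the search that the answer performs with findChildren.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page with two tables; a real page may have many more.
HTML = """
<html><body>
 <table>
  <tr><td>other 1</td><td>other 2</td></tr>
 </table>
 <table>
  <tr><th>column 1</th><th>column 2</th></tr>
  <tr><td>value 1</td><td>value 2</td></tr>
 </table>
</body></html>
"""


def table_rows(html, table_index):
    """Return the cell text of every row in the table at table_index (0-based)."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")   # every <table> in document order
    table = tables[table_index]       # raises IndexError if there is no such table
    rows = []
    for row in table.find_all("tr"):
        # Collect header cells (<th>) and data cells (<td>) alike.
        cells = row.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


# Pick the second table on the sample page (index 1).
print(table_rows(HTML, 1))
```

If the "Table #11" reported by Open Office is a 1-based count in document order (an assumption worth checking against the actual page), the matching call would be `table_rows(page_html, 10)`.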

2010-01-06 02:03:25

That did the trick! The code worked, and I should be able to modify it as needed. Thank you very much. One last question: I can follow the code except for the part where you search the table for the th and tr children. Is that simply searching my table and returning both the table header and the table rows? If I only wanted the table rows, could I just search for tr alone? Many thanks again! – Btibert3

Yes, '.findChildren(['th', 'tr'])' searches for elements whose tag type is 'th' or 'tr'. If you only want the 'tr' elements, use '.findChildren('tr')' (note: not a list, just the string).
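
To make the list-versus-string distinction concrete, here is a small sketch, again assuming Python 3 with the `beautifulsoup4` package and using bs4's `find_all` for the same kind of search. The sample table is hypothetical and places the header cells inside a row.

```python
from bs4 import BeautifulSoup

# Hypothetical table: one header row followed by one data row.
HTML = """
<table>
 <tr><th>column 1</th><th>column 2</th></tr>
 <tr><td>value 1</td><td>value 2</td></tr>
</table>
"""

table = BeautifulSoup(HTML, "html.parser").find("table")

# A list of names matches elements with ANY of those tag names.
th_or_tr = [tag.name for tag in table.find_all(["th", "tr"])]

# A single string matches only that tag name.
only_tr = [tag.name for tag in table.find_all("tr")]

print(th_or_tr)
print(only_tr)
```

The first search returns the rows and the header cells mixed together in document order, while the second returns only the two rows.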

It is also worth noting that [PyQuery](https://pythonhosted.org/pyquery/api.html) is a really nice alternative to BeautifulSoup.
