2012-08-05
# -*- coding: utf-8 -*- 
# Python3 
import urllib 
import urllib.request as url_req 
opener = url_req.build_opener() 
url='http://zh.wikipedia.org/wiki/'+"毛泽东" 
opener.open(url).read() 
# opener.open(url.encode("utf-8")).read() 
# # doesn't work either 

How do I handle a Unicode string in a URL in Python 3? When I run the code above, it complains:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-12: ordinal not in range(128)

But I can't use .encode() either, because then it complains:

Traceback (most recent call last):
  File "t.py", line 8, in <module>
    opener.open(url.encode("utf-8")).read()
  File "/usr/local/Cellar/python3/3.2.2/lib/python3.2/urllib/request.py", line 360, in open
    req.timeout = timeout
AttributeError: 'bytes' object has no attribute 'timeout'

Does anyone know how to deal with this?

URL parameters must be properly quoted, using urllib.parse.quote() in Python 3 (it was urllib.quote() in Python 2).
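
For query-string parameters specifically, a minimal sketch of the quoting this comment refers to might look like the following. It uses urllib.parse.urlencode() from the standard library; the index.php endpoint and the "search" parameter name are only assumptions for illustration and are not part of the original question.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter name, chosen only for illustration.
base = "http://zh.wikipedia.org/w/index.php"

# urlencode() percent-encodes each key and value as UTF-8 by default.
query = urlencode({"search": "毛泽东"})
url = base + "?" + query

print(url)
```

The resulting string contains only ASCII characters, so it can be passed to urlopen() or opener.open() without triggering the UnicodeEncodeError.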

Answers

You could use urllib.parse.quote() to encode the path section of the URL.

#!/usr/bin/env python3 
from urllib.parse import quote 
from urllib.request import urlopen 

url = 'http://zh.wikipedia.org/wiki/' + quote("毛泽东") 
content = urlopen(url).read() 
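
If you already have the full URL as one string rather than a base plus a title, the same idea can be applied by splitting it first. This is only a sketch under a few assumptions: quote_url is a hypothetical helper name, the host is assumed to be plain ASCII (no IDNA handling), and the input is assumed not to be percent-encoded already, since quote() would otherwise encode the existing "%" signs a second time.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url(url):
    """Return url with its path and query percent-encoded as UTF-8.

    Hypothetical helper for illustration; it leaves the scheme, host
    and fragment untouched.
    """
    parts = urlsplit(url)
    # quote() keeps "/" by default, so the path separators survive.
    path = quote(parts.path)
    # Keep "=" and "&" so the key=value&key=value structure survives.
    query = quote(parts.query, safe="=&")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(quote_url("http://zh.wikipedia.org/wiki/毛泽东"))
# http://zh.wikipedia.org/wiki/%E6%AF%9B%E6%B3%BD%E4%B8%9C
```

The returned string is pure ASCII and can be handed to urlopen() as in the snippet above.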

The fantastic requests library does this for you, out of the box:

>>> url='http://zh.wikipedia.org/wiki/'+"毛泽东" 
>>> import requests 
>>> r = requests.get(url) 
>>> len(r.content) 
818747