Sunday, November 22, 2009

HTMLParser malformed start tag errors

While writing a script today, I ran into this error:

Traceback (most recent call last):
  File "/home/cacaegg/programs/script/spider.py", line 22, in <module>
    par.feed(htmlSource)
  File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
    k = self.parse_starttag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
    endpos = self.check_for_whole_start_tag(i)
  File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
    self.error("malformed start tag")
  File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParseError: malformed start tag, at line 46, column 3275

To work around it, I followed http://bugs.python.org/issue736428.
It mainly comes down to two things:
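1. Override the parser class's error() method so that it prints the message instead of raising HTMLParseError

    def error(self, message):
        # Python 2 print statement: report the problem and keep parsing
        print message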


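2. Add a return after line 301 of HTMLParser.py, inside check_for_whole_start_tag(). Once error() no longer raises, execution would otherwise fall through to the AssertionError at the end of that method.

    self.updatepos(i, j)
    self.error("malformed start tag")
    return j  # ADDED THIS LINE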
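
As a minimal sketch of step 1 in context, here is a parser subclass whose error() records the message instead of raising. The class name TolerantLinkParser, the links and errors attributes, and the sample input are all made up for illustration and are not taken from spider.py. The import fallback is only there so the sketch also runs on Python 3, where the module is html.parser and, in recent versions, the parser no longer calls error() at all. On Python 2.6 the edit from step 2 is still needed for malformed input.

```python
try:
    from HTMLParser import HTMLParser      # Python 2, as in the traceback above
except ImportError:
    from html.parser import HTMLParser     # Python 3 name of the same module


class TolerantLinkParser(HTMLParser):
    """Hypothetical parser: collects href values and logs parse errors."""

    def __init__(self):
        # Explicit base-class call; HTMLParser is an old-style class on Python 2.
        HTMLParser.__init__(self)
        self.links = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the start tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def error(self, message):
        # Step 1: remember the message and position instead of raising.
        self.errors.append((message, self.getpos()))


parser = TolerantLinkParser()
parser.feed('<html><body><a href="http://example.com/">ok</a></body></html>')
parser.close()
print(parser.links)
```

Collecting the messages in a list rather than printing them makes it easier to check afterwards which pages were malformed, but a plain print works just as well for a quick script.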


That will do as a temporary fix for now!
