Forum:Using pywikipedia bot to delete whole lines from an article

HI, i'm using the Python based wikipedia bot (as controlled by the code at replace.py), and I'm trying to figure out a regular expression (RE) that will let me strip out a clunky HTML box and replace it with a template. Adding the template to a whole series of pages in a category is straightforward enough, but the problem is getting rid of the HTML table.

Since every line of the table begins with a sharp bracket (<) and indeed ends with one (>), my initial thought is that somehow I've got to make the RE hook on to that somehow. Anybody got an idea of the exact syntax that would translate the following English:
 * Take everything from the < to the line break and delete it

or
 * Delete everything including and between

or
 * Delete everything from the point before (^) the first < you encounter to the point after ($) the last > you encounter

I can get pywikipedia to FIND the characters, but I don't know how to get it to find those characters and everything between them.

Thanks for your help.  Czech Out  ☎ | ✍  16:08, March 30, 2010 (UTC)

-regex "^(<.*\r\n){2,}" "\n" seems to work for me. If you provide a link, it'd be more likely to give you something more tailored to exactly what you're dealing with. Joey (talk) 00:10, March 31, 2010 (UTC)