Forum:Using pywikipedia bot to delete whole lines from an article

Hi, I'm using the Python-based Wikipedia bot (pywikipedia, driven by replace.py), and I'm trying to work out a regular expression (RE) that will let me strip out a clunky HTML box and replace it with a template. Adding the template to a whole series of pages in a category is straightforward enough; the problem is getting rid of the HTML table.

Since every line of the table begins with an angle bracket (<) and indeed ends with one (>), my initial thought is that I've got to make the RE hook on to that somehow. Has anybody got an idea of the exact syntax that would translate the following English:
 * Take everything from the < to the line break and delete it

or
 * Delete everything including and between

or
 * Delete everything from the point before (^) the first < you encounter to the point after ($) the last > you encounter

I can get pywikipedia to FIND the characters, but I don't know how to get it to find those characters and everything between them.

Thanks for your help.  Czech Out  ☎ | ✍  16:08, March 30, 2010 (UTC)

-regex "^(<.*\r\n){2,}" "\n" seems to work for me. If you provide a link, I'd be more likely to be able to give you something tailored to exactly what you're dealing with. Joey (talk) 00:10, March 31, 2010 (UTC)
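For anyone who wants to try the pattern outside the bot first, here is a minimal sketch using Python's standard `re` module. The sample page text is made up, `re.MULTILINE` is an assumption so that `^` can match at the start of any line rather than only at the start of the page, and `\r?\n` stands in for `\r\n` so the sketch also works on text with plain `\n` line endings.

```python
import re

# Hypothetical page text: prose followed by an HTML table near the bottom.
page = (
    "Some article text.\n"
    "<table>\n"
    "<tr><td>2009</td><td>2011</td></tr>\n"
    "</table>\n"
    "Last line.\n"
)

# Two or more consecutive lines that each start with "<".
pattern = re.compile(r"^(<.*\r?\n){2,}", re.MULTILINE)

# Replace the whole run of tag lines with a single line break.
cleaned = pattern.sub("\n", page)
print(cleaned)
```

Whether the bot needs an extra option to treat `^` this way depends on how replace.py compiles the pattern, so it is worth testing on one page before running it across a category.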


 * Thanks for your reply. Let's see, tons of links possible here, but w:c:tardis:2010 is as good a place as any.  Trying to get rid of the navigational table at the bottom of the page.


 * As I'm new to the world of regex, could you possibly explain the syntax above in English? I kinda understand what the characters are trying to say, but I don't understand what they mean together.  Kinda like understanding the words in a sentence but not how they fit together to describe an event that happened in the conditional past tense.  I don't understand the {2,} at all, though, nor what the  is trying to call since it's redlinked.
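A piece-by-piece reading of the pattern may help here. The sketch below spells it out with Python's `re.VERBOSE` flag, which allows comments inside the expression. The two sample strings are invented, and `\r?\n` is a small loosening of the `\r\n` in the original so that plain `\n` line breaks also match.

```python
import re

pattern = re.compile(
    r"""
    ^          # start of a line (needs re.MULTILINE)
    (          # begin group: one whole line...
      <        #   that starts with a literal "<"
      .*       #   followed by anything up to the end of the line
      \r?\n    #   and then the line break itself
    ){2,}      # end group; the group must repeat 2 or more times
    """,
    re.MULTILINE | re.VERBOSE,
)

# A single tag line on its own is left alone, because {2,} demands two in a row.
one_tag_line = "text\n<br>\nmore text\n"
# Two tag lines in a row are matched as one block.
two_tag_lines = "text\n<tr>\n</tr>\nmore text\n"

print(pattern.search(one_tag_line))
print(repr(pattern.search(two_tag_lines).group(0)))
```

So `{2,}` reads as "at least two, no upper limit", which keeps the pattern from eating an isolated tag line elsewhere in the article.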


 * Oh wait, I'm an idiot. That's the template I'd be replacing the HTML table with isn't it? Ahhh, bit of a snag there then since I want to delete from the bottom of the page, and put my replacement on the very top line, and very first position, of the article.  So I guess I want to simply delete, then go back on a second pass and add. (I know that's not what I said originally, but the template has evolved into being something that better sits in the position of a traditional infobox.) You can see an example of where I want the template on w:c:tardis:21st century.    Czech Out   ☎ | ✍
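The delete-then-add idea can be sketched as two small functions applied in order. This is plain Python working on a string, not pywikipedia code: the function names, the sample page, and the template name `{{ExampleTemplate}}` are all hypothetical placeholders.

```python
import re

# Runs of two or more lines that each start with "<" (assumes re.MULTILINE).
TABLE_LINES = re.compile(r"^(<.*\r?\n){2,}", re.MULTILINE)


def remove_html_block(text):
    """Pass 1: delete every run of two or more lines starting with '<'."""
    return TABLE_LINES.sub("", text)


def prepend_template(text, template):
    """Pass 2: put the template on the first line, unless it is already there."""
    if text.startswith(template):
        return text
    return template + "\n" + text


# Hypothetical article with the navigation table at the bottom.
page = "Article text.\n<table>\n<tr><td>nav</td></tr>\n</table>\n"

result = prepend_template(remove_html_block(page), "{{ExampleTemplate}}")
print(result)
```

Keeping the two passes separate means the deletion can be checked on its own before anything is added, and the second pass is safe to run again because it skips pages that already start with the template.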