Forum:Listing HTML usage with DPL

I'd like to list deprecated HTML tags to make it easier to update them, as per User_blog:Ohmyn0/Making_your_wiki_HTML5_compliant.

The DPL help pages deal with usage of templates, links and images, but I can't find any examples of using DPL to perform a text or HTML search.

Does anyone know how to do this? -452 20:12, November 26, 2012 (UTC)
 * Well, I'm gonna say that DPL is not how I'd go here — but I will give you my best shot at doing it with DPL.


 * I'd strongly recommend a bot in this case because DPL is primarily category based. As far as I know (and of course I could be wrong), you can't have a single DPL instance that will produce a list of every page of your wiki, unless every page on your wiki is in a single category. With a bot, you've got much more flexibility in the kind of runs you can make: you can basically do your wiki per namespace, per contributor, and so on. But with DPL, I'm pretty sure it has to be per category.


 * But, if you're happy to knit together a lot of different runs, you could do something like this.


 * This should produce a list of everything in the category "cat name" which contains " ". I may have gotten the syntax slightly wrong on the regex, because DPL uses a slightly different regex flavor than Python, but that's about it. I've always found the regex implementation in DPL to be a bit ... odd, but with enough fiddling I can usually get there.


 * In any case, like I said, I think DPL is really inefficient for this task, unless your wiki is small and your category tree is but a sapling. 00:35: Tue 27 Nov 2012
 * Yeah, the way I would go about it would just be to export the pages, do a find/replace and import the pages. What I'm really interested in is the DPL to do the search as I have some other applications in mind, but figured a simple example would be more likely to get solved.
 * I've been playing around with includematch, but I wasn't sure whether my regex was off (it was working in my text editor) or whether includematch just couldn't do what I wanted (every example I can find was for matching template parameters).
 * The code you posted looks like it should work, but it's returning all contents of the category - includematch isn't affecting the output.
 * At the moment, I'm thinking that includematch can only be used to match template parameters, and that's why I can't get it working.
 * Btw, you can use to list the first 500 pages on the wiki, use offset=500 to see the next 500. -452  01:25, November 27, 2012 (UTC)
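
For the export-and-search route mentioned above, here is a rough Python sketch rather than a DPL answer. It assumes the pages were saved as XML via Special:Export, and it scans each page's wikitext for opening tags from a short list. The tag list and the function names (`find_deprecated_tags`, `local_name`) are made up for illustration, so adjust them to whatever your wiki actually uses.

```python
import re
import xml.etree.ElementTree as ET

# Assumed list of tags to look for (all obsolete in HTML5); edit as needed.
DEPRECATED_TAGS = ("font", "center", "big", "strike", "tt")

# Matches opening tags only, e.g. <font color="red"> or <CENTER>.
TAG_PATTERN = re.compile(
    r"<\s*(" + "|".join(DEPRECATED_TAGS) + r")\b[^>]*>",
    re.IGNORECASE,
)


def local_name(tag):
    """Strip the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def find_deprecated_tags(export_xml):
    """Return {page title: sorted list of deprecated tag names found}.

    export_xml is the XML text of an export file. If a page holds
    several revisions, only the last <text> element is checked.
    """
    root = ET.fromstring(export_xml)
    results = {}
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title, text = None, ""
        for node in page.iter():
            name = local_name(node.tag)
            if name == "title":
                title = node.text
            elif name == "text":
                text = node.text or ""
        found = {m.group(1).lower() for m in TAG_PATTERN.finditer(text)}
        if found:
            results[title] = sorted(found)
    return results
```

Calling `find_deprecated_tags(open("export.xml", encoding="utf-8").read())` (the file name is just an example) gives a dictionary of page titles and the tags found on each, which can then drive the find/replace step before re-importing.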