FANDOM


  • Outside of all the tools in Special:SpecialPages, is there a way to see which pages are linking to an article multiple times? For example, [[Luke Skywalker]] might be referenced in the lead paragraph on [a hypothetical article], again in the body of the article, again in an image caption, and yet again in a bulleted list further down.

    Special:MostLinkedPages comes close and links through to Special:WhatLinksHere, but WhatLinksHere doesn't tell you which of those pages link to an article more than once in their body.

    Does such a tool exist? Thanks in advance :)

    • No tool that I know of.

      You could post something on the Script Suggestions board on the Dev wiki.

    • Thanks, I didn't know about that board. I'll check it out.

    • AWB can do it, though not very well. Look for the red "Multiple wiki-links" warning. You can even delink the duplicates, but it's slow and glitchy.
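A rough stand-in for that kind of check outside AWB: the Python sketch below counts `[[Target]]` and `[[Target|label]]` links per target in a page's wikitext and reports every target linked more than once. It is not AWB's own code. The normalization (underscores read as spaces, first letter case-insensitive) assumes the wiki uses MediaWiki's default title capitalization, and section links (`#`), templates and nested brackets are not handled.

```python
import re
from collections import Counter

# Matches [[Target]] or [[Target|label]]; the target stops at "|" or a bracket.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

def normalize(target):
    """Read underscores as spaces, collapse whitespace, and uppercase the first
    letter (assumes the wiki's default first-letter capitalization)."""
    t = " ".join(target.replace("_", " ").split())
    return t[:1].upper() + t[1:]

def duplicate_links(wikitext):
    """Return {target: count} for every target linked more than once."""
    counts = Counter(normalize(m.group(1)) for m in WIKILINK.finditer(wikitext))
    return {target: n for target, n in counts.items() if n > 1}

page = "[[Luke Skywalker]] met [[Leia Organa]], then [[luke_Skywalker|Luke]] left."
print(duplicate_links(page))  # {'Luke Skywalker': 2}
```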

    • Oh, interesting. Thanks for the screenshot. I've got a lot to go through, so it will be slow. But it's something. Thank you!

    • I took some functions from AWB and wrote a preprocessing module (Tools > Make module, language VB; paste this between the "Summary" and "Return" lines):

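      ' Piped duplicates: [[Target|Label]] ... [[Target|Label]] -> first link kept, last identical copy unlinked
      Dim removepiped As New Regex("\[\[([^\]\|]+)\|([^\]]*)\]\](.*[.\n]*)\[\[\1\|\2\]\]", RegexOptions.Singleline Or RegexOptions.Compiled)
      ' Unpiped duplicates: [[Target]] ... [[Target]] -> first link kept, last identical copy unlinked
      Dim removeunpiped As New Regex("\[\[([^\]]+)\]\](.*[.\n]*)\[\[\1\]\]", RegexOptions.Singleline Or RegexOptions.Compiled)

      ' Each Replace call unlinks only one copy per match, so run several passes
      For i As Integer = 1 To 10
          ArticleText = removepiped.Replace(ArticleText, "[[$1|$2]]$3$2")
          ArticleText = removeunpiped.Replace(ArticleText, "[[$1]]$2$1")
      Next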

      It will remove some duplicate links. I don't know why it doesn't catch all of them, but it's something. Someone familiar with VB or C# could probably extend this code to do the full processing.
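A likely reason not everything is caught: the greedy `(.*...)` part runs to the end of the page and backtracks to the *last* identical link, and `Replace` resumes scanning after that match, so one pass unlinks one copy and skips any other duplicated links inside the span it just covered. Only textually identical links match, too (`[[Foo]]` and `[[foo]]` count as different). The Python sketch reproduces this with the unpiped pattern (Python spells .NET's `Singleline` as `re.DOTALL` and the replacement `$1` as `\1`, and the redundant `[.\n]*` tail is dropped because `.*` already spans newlines under `DOTALL`). The `unlink_until_stable` helper, which repeats until nothing changes instead of running a fixed 10 passes, is a hypothetical alternative, not part of AWB.

```python
import re

# The module's unpiped pattern in Python syntax (DOTALL = .NET Singleline).
UNPIPED = re.compile(r"\[\[([^\]]+)\]\](.*)\[\[\1\]\]", re.DOTALL)

def unlink_pass(text):
    """One Replace() call: keep the first link, unlink the last identical one.
    Returns (new_text, number_of_replacements)."""
    return UNPIPED.subn(r"[[\1]]\2\1", text)

def unlink_until_stable(text, max_passes=100):
    """Hypothetical helper: repeat passes until a pass changes nothing."""
    passes = 0
    while passes < max_passes:
        text, n = unlink_pass(text)
        if n == 0:
            break
        passes += 1
    return text, passes

print(unlink_pass("[[a]] [[b]] [[a]] [[b]]"))        # ('[[a]] [[b]] a [[b]]', 1)
print(unlink_until_stable("[[a]] x [[a]] y [[a]]"))  # ('[[a]] x a y a', 2)
```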

    • Thanks, that's very helpful. I've been running it for a bit and figured out I can automatically skip a page if it doesn't find at least two links to the same article, using [[foo]].*[[foo]] as a regex.

      So, this is pretty much exactly what I was looking for. Thanks again!

    • Hm, maybe that regex didn't work after all.

      Is there a regex I can use that will skip the page if it doesn't find multiple occurrences of [[foo]]?
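Why `[[foo]].*[[foo]]` misbehaves: unescaped, `[[foo]` is a character class (one of `[`, `f`, `o`), so `[[foo]]` means "one of those characters followed by `]`". The pattern therefore also matches pages with no link to foo at all (for example two links ending in "o"), and those pages are not skipped. The brackets need escaping, as in `\[\[foo\]\]`. A small Python check of both versions follows; the sample pages are made up, and Python additionally emits a "possible nested set" warning for the unescaped form, which the sketch silences.

```python
import re
import warnings

with warnings.catch_warnings():
    # Python warns (FutureWarning) about "[[" at the start of a character class.
    warnings.simplefilter("ignore", FutureWarning)
    naive = re.compile(r"[[foo]].*[[foo]]", re.DOTALL)

# Escaped: the literal text [[foo]] twice, anything in between.
escaped = re.compile(r"\[\[foo\]\].*\[\[foo\]\]", re.DOTALL)

no_foo_link = "[[Leo]] met [[Cleo]]."        # contains "o]" twice, but no link to foo
two_foo_links = "[[foo]] and later [[foo]]."

print(bool(naive.search(no_foo_link)), bool(escaped.search(no_foo_link)))      # True False
print(bool(naive.search(two_foo_links)), bool(escaped.search(two_foo_links)))  # True True
```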

    • The regex apparently works. It comes from the AWB source code (I only added the Singleline option, so it processes the whole page).

      Sure, a module can count the occurrences and set Skip = true, but that needs further research. I'm not familiar with the AWB program itself, only with its API, which I've used from other software.
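A module could do the counting along these lines. The Python sketch below is modelled loosely on the module's `ArticleText`, `Summary` and `Skip` values; the function name, the `target` parameter and the summary wording are hypothetical. It counts `[[Target]]` and `[[Target|label]]` links, lets the first letter match in either case (assuming MediaWiki's default capitalization), and asks to skip the page when there are fewer than two.

```python
import re

def process_article(article_text, target="Luke Skywalker"):
    """Hypothetical module-style check: returns (article_text, summary, skip)."""
    # First letter in either case, the rest of the title exactly as given.
    first, rest = re.escape(target[:1]), re.escape(target[1:])
    link = re.compile(r"\[\[\s*(?i:" + first + r")" + rest + r"\s*(?:\|[^\[\]]*)?\]\]")
    count = len(link.findall(article_text))
    if count < 2:
        return article_text, "", True  # fewer than two links: skip this page
    return article_text, f"{count} links to [[{target}]]", False

print(process_article("[[Luke Skywalker]] then [[luke Skywalker|Luke]]"))
```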

    • Thanks, friend. I appreciate it. I thought I found it while googling around and got lucky with [[foo]].*[[foo]], but I guess that didn't do it.

    • If you want to hunt down a specific link (foo), you need two rules:

      1. Remove unpiped duplicates ([[foo|bar]]...[[foo]] -> [[foo|bar]]...foo): \[\[(?<s1>(foo)\|?(.*?))\]\](.*?)\[\[(\1)\]\]
        1. Replacement: [[${s1}]]$3$1
      2. Remove piped duplicates ([[foo|bar]]...[[foo|nobar]] -> [[foo|bar]]...nobar): \[\[(?<s1>(foo)\|?(.*?))\]\](.*?)\[\[(\1)\|?(.*?)\]\]
        1. Replacement: [[${s1}]]$3$5

      Turn the "singleline" option on. If you change "foo" to ".*?", all duplicates will be affected.
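Porting these rules needs care with one .NET detail: named groups are numbered after all unnamed groups. So in the rules above `\1` is `(foo)`, `$3` is the text between the two links, `$5` (rule 2) is the second link's label, and `${s1}` is the whole first link body. Python numbers groups strictly left to right, so the sketch below renumbers them. Run rule 1 before rule 2: on an unpiped second link, rule 2's empty label would drop the link text. The sketch also shows a side effect: because `\|?` is optional, `(foo)\|?(.*?)` accepts `[[foobar]]` as a link to foo, so rule 1 can unlink a real `[[foo]]` that follows it. The `STRICT1` variant, using `(foo)(\|.*?)?` like the Database Scanner pattern later in the thread, avoids that; it is a suggestion, not tested in AWB.

```python
import re

FLAGS = re.DOTALL  # AWB's "singleline" option

# Rule 1 (unpiped second link). Python group numbers: s1=1, (foo)=2, label=3, between=4.
RULE1 = re.compile(r"\[\[(?P<s1>(foo)\|?(.*?))\]\](.*?)\[\[(\2)\]\]", FLAGS)
RULE1_REPL = r"[[\g<s1>]]\4\2"

# Rule 2 (piped second link). The .NET $5 (second label) is group 6 in Python.
RULE2 = re.compile(r"\[\[(?P<s1>(foo)\|?(.*?))\]\](.*?)\[\[(\2)\|?(.*?)\]\]", FLAGS)
RULE2_REPL = r"[[\g<s1>]]\4\6"

def apply_rules(text):
    """Apply rule 1, then rule 2, in the order listed in the post."""
    text = RULE1.sub(RULE1_REPL, text)
    return RULE2.sub(RULE2_REPL, text)

print(apply_rules("[[foo|bar]] one [[foo]] two [[foo|nobar]]"))
# [[foo|bar]] one foo two nobar

# Side effect: "(foo)\|?(.*?)" also accepts [[foobar]] as the first link.
print(RULE1.sub(RULE1_REPL, "[[foobar]] x [[foo]]"))  # [[foobar]] x foo

# Suggested tightening: anything after "foo" must start with "|".
STRICT1 = re.compile(r"\[\[(?P<s1>(foo)(\|.*?)?)\]\](.*?)\[\[(\2)\]\]", FLAGS)
print(STRICT1.sub(RULE1_REPL, "[[foobar]] x [[foo]]"))  # [[foobar]] x [[foo]]
```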

    • Thanks, I'll give that a try.

    • Using AWB's Database Scanner, is it possible to find pages with multiple occurrences of [[Luke]] but ignore pages that have [[Luke]] only once?

    • Text: Contains; turn on regex, singleline and ignore comments.

      Pattern: \[\[(?<s1>(Luke)(\|.*?)?)\]\](.*?)(\[\[\1\]\]|\[\[\1\|.*?\]\])

      It works on the test page, but I haven't tested it on the whole database.
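To reuse the Database Scanner pattern for other titles, the title should be escaped: characters such as `(`, `)` or `.` are regex metacharacters, so a made-up title like "A (b)" would otherwise be read as a capture group. The hypothetical `duplicate_link_pattern` helper below builds the same check in Python with `re.escape`. It leaves out the named group `s1`, which a plain "contains" search doesn't need, so `\1` still refers to the title. Like the original, it is case-sensitive.

```python
import re

def duplicate_link_pattern(title):
    """Hypothetical helper: match a page that links to `title` at least twice."""
    # .NET original: \[\[(?<s1>(Luke)(\|.*?)?)\]\](.*?)(\[\[\1\]\]|\[\[\1\|.*?\]\])
    # Here group 1 is the escaped title, so \1 still means "the same target".
    t = re.escape(title)
    return re.compile(
        r"\[\[(" + t + r")(\|.*?)?\]\](.*?)(\[\[\1\]\]|\[\[\1\|.*?\]\])",
        re.DOTALL,  # AWB's "singleline" option
    )

luke = duplicate_link_pattern("Luke")
print(bool(luke.search("[[Luke]] ... [[Luke|Skywalker]]")))  # True
print(bool(luke.search("[[Luke]] ... [[Luke Skywalker]]")))  # False
```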

    • That worked brilliantly, thank you so much!

    • Sorry to ping you here, but I have a similar regex question.

      Starting with an existing list in AWB, if I want to replace Han Solo with [[Han Solo]] but only the first time it appears on the page (ignoring the rest), could I do that with regex?

      I tried some variations of what you worked out earlier this month for me (thanks again for that), but I couldn't get it to work right.

    • find: ^.*?(han solo); replace: [[\1]]; option: singleline

    • Hm, that replaces all the text in the article up to the first match.

    • How about finding ^(.*)?(han solo) and replacing it with \1[[\2]]? (It sounds to me like it will replace the last occurrence but if the previous one replaced the first occurrence incorrectly it might be worth a try)
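Running the two suggestions so far on a made-up page shows the difference. In `^.*?(han solo)` the text before the name is matched but not captured, so replacing the whole match throws that text away; in `^(.*)?(han solo)` the greedy `(.*)` runs as far as it can, so it links the last occurrence instead of the first. In .NET a `\1` in the replacement is not a backreference at all, so the sketch uses Python, where `\1` is one, to show what the patterns themselves do. The sample uses the exact casing "Han Solo" to sidestep the question of AWB's case-sensitivity setting.

```python
import re

page = "Intro. Han Solo flew. Later, Han Solo left."

# Suggestion 1: the prefix is matched but not captured, so it is replaced away.
s1 = re.sub(r"^.*?(Han Solo)", r"[[\1]]", page, flags=re.DOTALL)
print(s1)  # [[Han Solo]] flew. Later, Han Solo left.

# Suggestion 2: the greedy (.*) reaches the last occurrence.
s2 = re.sub(r"^(.*)?(Han Solo)", r"\1[[\2]]", page, flags=re.DOTALL)
print(s2)  # Intro. Han Solo flew. Later, [[Han Solo]] left.
```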

    • It was a long night before I posted that "solution"... Substitutions in the Microsoft (.NET) regex must be written $1, not \1.

      Here is a working one (option: singleline):

      find
      (^(.*?)han solo)
      
      replace
      $2[[han solo]]
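Why the working rule behaves: `^` anchors the match at the start of the page (no multiline option), the lazy `(.*?)` stops at the first "han solo", and `$2` puts the skipped-over text back, so there is exactly one match and only the first occurrence is linked. A Python check follows (Python writes `$2` as `\2`). It turns on `re.IGNORECASE` on the assumption that AWB's find was case-insensitive, since otherwise "han solo" would not match "Han Solo". With that setting the literal `[[han solo]]` links to the title "Han solo", which on a wiki with default capitalization is a different page from "Han Solo" unless a redirect exists. The variant that captures the name and reuses it as `[[\2]]` keeps the original casing; it is a suggestion, not something tested in AWB.

```python
import re

page = "Intro. Han Solo flew. Later, Han Solo left."
flags = re.DOTALL | re.IGNORECASE  # singleline; case-insensitivity is an assumption

# The thread's rule in Python syntax: find (^(.*?)han solo), replace $2[[han solo]].
working = re.sub(r"(^(.*?)han solo)", r"\2[[han solo]]", page, flags=flags)
print(working)  # Intro. [[han solo]] flew. Later, Han Solo left.

# Variant: capture the name as well, so its original casing survives.
variant = re.sub(r"^(.*?)(han solo)", r"\1[[\2]]", page, flags=flags)
print(variant)  # Intro. [[Han Solo]] flew. Later, Han Solo left.
```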
      
    • It looks like that works — you're a genius, thank you!

Community content is available under CC-BY-SA unless otherwise noted.