Hmm, the code on the dev wiki has the same problems as the forum article's version, which isn't surprising, since they share the same original author. It won't necessarily find all the duplicate files when there are many of them: it relies exclusively on gaifrom, ignoring the possibility of a dfcontinue continuation, and it doesn't set dflimit.
I'd suggest only limited edits to that source:
Edit 1
url = "/api.php?action=query&generator=allimages&prop=duplicatefiles&gailimit=500&format=json";
should be
url = "/api.php?action=query&generator=allimages&prop=duplicatefiles&gailimit=500&dflimit=500&format=json";
Edit 2
if (gf) {
url += "&gaifrom=" + gf;
}
should be
if (gf) {
    if (gf.indexOf('|') > -1) {
        url += '&dfcontinue=' + encodeURIComponent(gf);
        gf = gf.split('|')[0];
    }
    url += '&gaifrom=' + encodeURIComponent(gf);
}
Edit 3
if (data["query-continue"]) findDupImages(encodeURIComponent(data["query-continue"].allimages.gaifrom).replace(/'/g, "%27"));
should be
if (data['query-continue']) {
    if (data['query-continue'].duplicatefiles) {
        findDupImages(data['query-continue'].duplicatefiles.dfcontinue);
    } else {
        findDupImages(data['query-continue'].allimages.gaifrom);
    }
}
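To make Edits 1 and 2 testable in isolation, the URL-building logic can be factored into a pure helper. The function name and structure below are an illustrative sketch, not taken from the script itself:

```javascript
// Build the API query URL for one page of results.
// `gf` is the continuation value from the previous response:
// either a plain gaifrom title, or a dfcontinue value containing '|'.
function buildDupQueryUrl(gf) {
    var url = "/api.php?action=query&generator=allimages" +
              "&prop=duplicatefiles&gailimit=500&dflimit=500&format=json";
    if (gf) {
        if (gf.indexOf('|') > -1) {
            // A dfcontinue value is pipe-delimited; pass it through whole,
            // and reuse its first field as the gaifrom starting point.
            url += '&dfcontinue=' + encodeURIComponent(gf);
            gf = gf.split('|')[0];
        }
        url += '&gaifrom=' + encodeURIComponent(gf);
    }
    return url;
}
```

The callback would then only need to pick whichever continuation value the response contains and pass it back in.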
Moving the encodeURIComponent call into the URL-building code makes it easier to test for '|' first. I'm also unconvinced the apostrophe needs the extra .replace(/'/g, "%27"), given that encodeURIComponent deliberately leaves ' unescaped.
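For reference, encodeURIComponent leaves the apostrophe (along with - _ . ! ~ * ( )) unescaped, while the pipe does get percent-encoded:

```javascript
// The apostrophe survives encoding untouched, so the manual
// .replace(/'/g, "%27") only matters if the API rejects raw apostrophes.
console.log(encodeURIComponent("O'Brien.jpg"));  // O'Brien.jpg
// The pipe, by contrast, is percent-encoded:
console.log(encodeURIComponent("Foo|Bar"));      // Foo%7CBar
```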
I noticed Bobogoobo also removed the rate limit, so the documentation mentioning 500 files every 2 seconds should be updated to match.
HTH