Hmm, the code on the dev wiki has the same problems as the forum article's version, which isn't surprising, since they share the same original author. It won't necessarily find all the duplicate files when there are many of them: it relies exclusively on gaifrom, ignoring the possibility of a dfcontinue continuation, and it doesn't set dflimit.
I'd suggest only limited edits to that source:
Edit 1
url = "/api.php?action=query&generator=allimages&prop=duplicatefiles&gailimit=500&format=json";
should be
url = "/api.php?action=query&generator=allimages&prop=duplicatefiles&gailimit=500&dflimit=500&format=json";
Edit 2
if (gf) {
url += "&gaifrom=" + gf;
}
should be
if (gf) {
    if (gf.indexOf('|') > -1) {
        url += '&dfcontinue=' + encodeURIComponent(gf);
        gf = gf.split('|')[0];
    }
    url += '&gaifrom=' + encodeURIComponent(gf);
}
Edit 3
if (data["query-continue"]) findDupImages(encodeURIComponent(data["query-continue"].allimages.gaifrom).replace(/'/g, "%27"));
should be
if (data['query-continue']) {
    if (data['query-continue'].duplicatefiles) {
        findDupImages(data['query-continue'].duplicatefiles.dfcontinue);
    } else {
        findDupImages(data['query-continue'].allimages.gaifrom);
    }
}
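To make Edits 1 and 2 testable in isolation, the URL-building logic can be factored into a pure helper. The function name and structure below are an illustrative sketch, not taken from the script itself:

```javascript
// Build the API query URL for one page of results.
// `gf` is the continuation value from the previous response:
// either a plain gaifrom title, or a dfcontinue value containing '|'.
function buildDupQueryUrl(gf) {
    var url = "/api.php?action=query&generator=allimages" +
              "&prop=duplicatefiles&gailimit=500&dflimit=500&format=json";
    if (gf) {
        if (gf.indexOf('|') > -1) {
            // A dfcontinue value is pipe-delimited; pass it through whole,
            // and reuse its first field as the gaifrom starting point.
            url += '&dfcontinue=' + encodeURIComponent(gf);
            gf = gf.split('|')[0];
        }
        url += '&gaifrom=' + encodeURIComponent(gf);
    }
    return url;
}
```

The callback would then only need to pick whichever continuation value the response contains and pass it back in.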
Moving the encodeURIComponent call into the URL-building code makes it easier to test for '|' first. I'm also unconvinced the apostrophe needs the extra .replace(/'/g, "%27"), given that encodeURIComponent deliberately leaves ' unescaped.
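For reference, encodeURIComponent leaves the apostrophe (along with - _ . ! ~ * ( )) unescaped, while the pipe does get percent-encoded:

```javascript
// The apostrophe survives encoding untouched, so the manual
// .replace(/'/g, "%27") only matters if the API rejects raw apostrophes.
console.log(encodeURIComponent("O'Brien.jpg"));  // O'Brien.jpg
// The pipe, by contrast, is percent-encoded:
console.log(encodeURIComponent("Foo|Bar"));      // Foo%7CBar
```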
I noticed Bobogoobo also removed the rate limit, so the documentation mentioning 500 files every 2 seconds should be updated to match.
HTH