Forum:Search for duplicate files

Hello, I'm just wondering if there's a way to list all of the duplicate files on a wiki. I know there's Special:FileDuplicateSearch, but that only looks at one file at a time. Is there a special page (or an extension, or anything) that will churn out a list of all of the duplicates? Thanks, Cook Me Plox
 * The API: /api.php?action=query&generator=allimages&prop=duplicatefiles -- 02:50, July 1, 2010 (UTC)
 * Sorry, but I'm not really sure what to do with that. I got this, but I don't know how to use it. Cook Me Plox 06:56, July 1, 2010 (UTC)

You take the URL and see the query-continue element at the top? You keep adding that value to the URL (as gaifrom) to get to the next page. Remember that you have to URL-encode certain characters (google it for a table if you're doing it by hand), and spaces turn into underscores. So the first one would become the same URL with &gaifrom= and that value on the end, and so on and so on. If there are any dupes, you will see them. --Uberfuzzy 07:27, July 1, 2010 (UTC)
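A minimal sketch of that step, assuming the API path from this thread (/api.php) and that the value shown in query-continue is passed back unchanged as gaifrom. The helper name continuationUrl is hypothetical. It turns spaces into underscores, percent-encodes anything unsafe (such as "&"), and adds no quotes around the value:

```javascript
// Hypothetical helper: build the "next page" URL for the allimages listing.
const BASE_URL =
  "/api.php?action=query&generator=allimages&prop=duplicatefiles&gailimit=500";

function continuationUrl(gaifrom) {
  // Spaces become underscores; other unsafe characters are percent-encoded.
  // The value goes in bare, without surrounding quotes.
  const value = encodeURIComponent(gaifrom.replace(/ /g, "_"));
  return BASE_URL + "&gaifrom=" + value;
}

console.log(continuationUrl("(Swamp) Snake hide.png"));
// → /api.php?action=query&generator=allimages&prop=duplicatefiles&gailimit=500&gaifrom=(Swamp)_Snake_hide.png
```

Parentheses are left as-is by encodeURIComponent, while a name like "Tom & Jerry.png" becomes Tom_%26_Jerry.png, so the "&" is not mistaken for the start of a new parameter.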
 * Thanks for that, but I'm still not seeing where the dupes are. Is it in the pageid? Sorry, I'm just not seeing it. Cook Me Plox 20:13, July 1, 2010 (UTC)
 * A better URL with an example is probably http://runescape.wikia.com/api.php?action=query&generator=allimages&prop=duplicatefiles&gailimit=500 where, at least for me, a duplicate shows up in the first result. Notably, some of the first few file names are odd and might merit further attention. -- 20:26, July 1, 2010 (UTC)
 * Okay, I see the first duplicate file among all the other entries. But how do I get to the next page? When I add, for instance, gaifrom="(Swamp) Snake hide.png", it doesn't start with that one. I'm rather confused. Sorry I'm not grasping this :/ Cook Me Plox 20:49, July 1, 2010 (UTC)
 * The continuation URL for the previous one is this. You should probably contact Wikia about the first few entries there: some images have a double colon between the namespace and the file name (File::...), along with other similar weirdness. Also see this URL, which shows more duplicate images for each image. -- 21:06, July 1, 2010 (UTC)
 * EDIT: The first few duplicate files (especially those without extensions) appear to be "uploaded videos", which are just links to other sites. I would still call this a bug and report it, but it doesn't seem to be exclusive to your site; it occurs on WoWWiki too. -- 21:11, July 1, 2010 (UTC)
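As for where the dupes are: they are not in the pageid, which is just the page's ID. With &format=json on the URL, each file under query.pages that has duplicates carries a duplicatefiles array, and files without duplicates simply lack it. A minimal sketch that pulls those groups out of an already-parsed response (the function name and the sample object are made up, trimmed to the fields used):

```javascript
// List each file that has duplicates, given a parsed JSON API response.
function duplicateGroups(response) {
  const pages = (response.query && response.query.pages) || {};
  const groups = [];
  for (const id of Object.keys(pages)) {
    const page = pages[id];
    if (!page.duplicatefiles) continue; // this file has no duplicates
    groups.push({
      title: page.title,
      // Duplicate names come back with underscores instead of spaces.
      duplicates: page.duplicatefiles.map((d) => "File:" + d.name.replace(/_/g, " ")),
    });
  }
  return groups;
}

// Made-up sample response, trimmed to the fields used above.
const sample = {
  query: {
    pages: {
      "101": { pageid: 101, ns: 6, title: "File:Snake hide.png",
               duplicatefiles: [{ name: "Snake_hide_2.png" }] },
      "102": { pageid: 102, ns: 6, title: "File:Swamp.png" },
    },
  },
};
console.log(duplicateGroups(sample));
```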

Is there any way to automate this process to only output duplicated files? Duskey ( talk ) 19:03, August 25, 2010 (UTC)
 * Not really, but you could use a regular expression to filter out the files that have no duplicates. -- 19:06, August 25, 2010 (UTC)
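A rough sketch of that regular-expression approach on the raw XML output (&format=xml). It assumes a file with duplicates looks like <page ... title="File:A.png"><duplicatefiles><df name="B.png" ... /></duplicatefiles></page> and a file without duplicates is a self-closing <page ... />; that layout and the sample below are assumptions to check against your wiki's actual output. Names inside attributes are XML-escaped (for example &amp;), and this sketch does not unescape them:

```javascript
// Keep only <page> elements that contain a <duplicatefiles> list.
// Assumed layout: <page ... title="..."><duplicatefiles><df name="..." /></duplicatefiles></page>
function duplicatesFromXml(xml) {
  const pageRe = /<page\b[^>]*\btitle="([^"]*)"[^>]*>\s*<duplicatefiles>([\s\S]*?)<\/duplicatefiles>/g;
  const nameRe = /<df\b[^>]*\bname="([^"]*)"/g;
  const result = {};
  for (const m of xml.matchAll(pageRe)) {
    // m[1] is the file title, m[2] the body of its <duplicatefiles> list.
    result[m[1]] = [...m[2].matchAll(nameRe)].map((d) => d[1]);
  }
  return result;
}

// Made-up sample in the assumed layout; the second file has no duplicates.
const sampleXml =
  '<api><query><pages>' +
  '<page pageid="101" ns="6" title="File:Snake hide.png">' +
  '<duplicatefiles>' +
  '<df name="Snake_hide_2.png" user="Example" />' +
  '<df name="Snake_hide_3.png" user="Example" />' +
  '</duplicatefiles></page>' +
  '<page pageid="102" ns="6" title="File:Swamp.png" />' +
  '</pages></query></api>';
console.log(duplicatesFromXml(sampleXml));
```

Parsing the JSON output is sturdier than regexes over XML; this only shows the filtering idea suggested above.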


 * It might be simplest to use a Google search for the duplicate file notice on your file pages. -- ◄mendel► 20:37, August 25, 2010 (UTC)
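A small sketch of that idea: a hypothetical helper that builds a Google query restricted to the wiki with site:. The quoted phrase should be whatever duplicate notice your file pages actually display; "duplicate" below is only a placeholder, not the real notice text:

```javascript
// Hypothetical helper: Google search URL for an exact phrase on one site.
function googleSiteSearch(host, phrase) {
  const q = "site:" + host + ' "' + phrase + '"';
  return "https://www.google.com/search?q=" + encodeURIComponent(q);
}

console.log(googleSiteSearch("runescape.wikia.com", "duplicate"));
// → https://www.google.com/search?q=site%3Arunescape.wikia.com%20%22duplicate%22
```

Search results depend on what Google has indexed, so this can miss recently uploaded duplicates.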

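I have written some JavaScript to list these for you (via AJAX). First, put this code in your Special:Mypage/monaco.js (or Special:Mypage/global.js):

    // Titles already listed as another file's duplicate, so each group appears once.
    var seenDupes = {};
    function findDupImages(gaifrom) {
      var output = "";
      var url = "/api.php?action=query&generator=allimages&prop=duplicatefiles&gailimit=500&format=json";
      if (gaifrom) url += "&gaifrom=" + encodeURIComponent(gaifrom);
      $.getJSON(url, function (data) {
        if (!data.query) return;
        var pages = data.query.pages;
        for (var pageID in pages) {
          var page = pages[pageID];
          // Skip files with no duplicates, files already listed, and the odd "File::" entries.
          if (!page.duplicatefiles || seenDupes[page.title] || page.title.indexOf("File::") !== -1) continue;
          output += "<h3>" + page.title + "</h3>\n<ul>\n";
          for (var x = 0; x < page.duplicatefiles.length; x++) {
            var name = page.duplicatefiles[x].name; // uses underscores, not spaces
            var title = "File:" + name.replace(/_/g, " ");
            // Assumes the wiki's article path is /wiki/.
            output += '<li><a href="/wiki/File:' + encodeURIComponent(name) + '">' + title + "</a></li>\n";
            seenDupes[title] = true;
          }
          output += "</ul>\n";
        }
        $("#mw-dupimages").append(output);
        // Fetch the next batch of 500 after 5 seconds, until there is no continuation.
        if (data["query-continue"]) {
          var next = data["query-continue"].allimages.gaifrom;
          setTimeout(function () { findDupImages(next); }, 5000);
        }
      });
    }
    $(function () {
      if ($("#mw-dupimages").length) findDupImages();
    });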

Then create a page with this content:

    <div id="mw-dupimages"></div>

Then you can browse to that page and it will create a list of duplicate images for you (every 5 seconds it will add more until it exhausts the list). Please let me know if you have any questions. -- 21:45, August 26, 2010 (UTC)
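The part of the script that keeps a list of files already shown, so a pair is not reported twice (once as A with duplicate B, and again as B with duplicate A), can be pulled out of the jQuery and AJAX code and checked on its own. A minimal sketch of that logic as a pure function; the name listNewGroups and the use of a Set are illustrative, not part of the script:

```javascript
// Sketch of the de-duplication step, outside the browser.
// `seen` is a Set of titles already listed as some file's duplicate;
// reuse the same Set across batches so later batches skip them too.
function listNewGroups(pages, seen) {
  const groups = [];
  for (const page of Object.values(pages)) {
    // Same skips as the script: no duplicates, already listed, or "File::" oddities.
    if (!page.duplicatefiles || seen.has(page.title) || page.title.includes("File::")) {
      continue;
    }
    const dups = page.duplicatefiles.map((d) => "File:" + d.name.replace(/_/g, " "));
    dups.forEach((t) => seen.add(t));
    groups.push({ title: page.title, duplicates: dups });
  }
  return groups;
}

// Made-up batch: A.png and B.png are copies of each other, C.png is unique.
const seen = new Set();
const batch = {
  "1": { title: "File:A.png", duplicatefiles: [{ name: "B.png" }] },
  "2": { title: "File:B.png", duplicatefiles: [{ name: "A.png" }] },
  "3": { title: "File:C.png" },
};
console.log(listNewGroups(batch, seen).length); // → 1
```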


 * Just tested it out and it seems to work, thanks pcj. Duskey ( talk ) 13:42, August 27, 2010 (UTC)