Forum:How to .. quickly permanently re-sort a very large table according to one column?

The page in question is on the Call of Duty wiki. The page and section in question are here: http://callofduty.wikia.com/wiki/Callsigns#Titles

What is the most efficient way one may go about PERMANENTLY modifying the table to sort the contents by number instead of alphabetically by name? Instead of manually rearranging the code in the correct order, is there an automatic process that may speed it up? It would make future edits to this page much easier and faster if this can be done.

Thanks!

--Yamamoto114 07:13, November 28, 2009 (UTC)


 * Are you familiar with either AWK/GAWK or perl? Either one would make very short work of this.
 * First strip the newlines (\n)
 * replace |- with \n
 * use | as your field separator to read field values into an associative array indexed by your sort key
 * in an END block use a foreach loop to print the array contents in alphabetical order using wikitable markup
 * -- najevi 13:03, November 28, 2009 (UTC)


 * Unfortunately no, I'm not familiar with any programming languages. =/


 * --Yamamoto114 23:47, November 29, 2009 (UTC)


 * In that case I suggest you read mw:Help:Sorting and use that method which works within the wiki markup coding experience you will already have. 60.226.124.163 05:00, November 30, 2009 (UTC)


 * Here you go - w:c:callofduty:Call_of_Duty_Wiki:Sandbox
 * -- najevi 15:38, November 30, 2009 (UTC)

One method (using: tr, sed, gawk)
You know I tried the method described at the above mediawiki link and whilst it works fine for small tables if the desired sort key happens to be the first column. However, it does not handle the following:
 * scale to support the 287 entry table Yamamot114 was dealing with (probably some pre-parser buffer limit was reached)
 * lend itself to sorting numerically by the first column (the classic 1, 02 and 003 digit numeral problem)
 * lend itself to sorting by any arbitrary column

For future reference the scripts/tools that I used for this are: sed, tr and gawk. These can be run in a windows command shell if you install the corresponding GNUwin32 tools.

BEGIN { RS="@"	# could not use '|-' because it also matches '|' used as FS	FS="|" maxkey=0 maxNF=0 } {			key=$2	# the field you wish to sort by - handles as a string by default key+=0	# converts the string to a number - this is necessary to sort numerically if (key>maxkey) maxkey=key if (NF>maxNF) maxNF=NF list[key]=NF+1 j=1	# note that field $1 will be any format/style used in the new row "|-" line while (j<=NF) { t[key,j]=$j j++ } } END {	# use asorti if alphabetical sort is required for (i=1; i<=maxkey; i++) if(list[i]>0) { # printf "%s\n",list[i]	#uncomment (& comment out other 2 lines) to see if NF is same for all records printf "|-%s\n",t[i,1] for ( j=2; j<=maxNF; j++) printf "|%s\n",t[i,j] } }
 * 1) awk script used to change the default order of wiki markup code for a table
 * 2) designed for numerical sort based on the value in one column
 * 3) for alphabetical sort  - disable the conversion of "key" from string to number
 * 4) STEPS
 * 5) 1. copy only the markup for the wikitable rows to a file: raw.txt
 * 6) 2. strip newlines: tr -d '\n' < raw.txt >stripped.txt
 * 7) 3. replace the new row '|-' with any unused character e.g. '@' : sed 's/|-/@/g' stripped.txt > smod.txt
 * 8) 4. gawk -f smod.txt > tabdos.txt
 * 9) 5. strip the MS-DOS CR character:  tr -d '\r'  tabunix.txt
 * 10) 6. copy the newly sorted wiki markp into the original wiki article replacimg the old
 * 1) 6. copy the newly sorted wiki markp into the original wiki article replacimg the old
 * As noted in the comments some additional/alternate code is required if an alphabetical sort is required. Google the man page for gawk and search for "asorti"
 * Also it would be wise to flag an error if number of fields (NF) is different for any record of the table.

There must be a dozen ways to achieve the same result so if you know of a more elegant method or are willing and able to develop a generic tool then please contribute here. ... mmm I wonder if this is something a Gadget could be written to handle. I imagine it comes up often because last year I needed to do almost the same thing for 350 skills for an MMORPG wiki.

-- najevi 05:29, December 1, 2009 (UTC)