FANDOM


  • In some cases, people will be tagging pages on a wiki with a category for months, even YEARS before someone actually creates an official page for the category to organize it under a tree structure.

    Although this information is buried somewhere within the edit history of articles, I'm wondering if anyone can think of an easy, automated way to put together a timeline of when categories were added to pages, including additions that precede the creation date of the category's page.

    In some cases the person who makes the page for the category is given credit for thinking up the category when that isn't actually true. They may simply have noticed it was already tagged on dozens of pages and then created the category page to organize it under the structure.

    So I'd like to know if there's a way to find who actually first started tagging pages with a particular category. Is there some way to search the source code diffs of user histories or page histories to find this out?

    In some cases, it's a lot of work manually going through every edit diff for a page (especially when edit summaries are uninformative, as many editors neglect them) to figure out who added a category and when they did it.

    Is software somehow able to dig through that and recognize category additions and return some kind of report on the times/users who added them, either for the wiki as a whole or narrowed down to particular articles?
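For a single page, one possible approach is to walk its revisions oldest first and record each revision where the category link appears or disappears. Below is a minimal Python sketch, assuming the revisions have already been collected (from api.php or a dump) into a list of dicts with `timestamp`, `user` and `content` keys. The function names `category_pattern` and `category_timeline` are hypothetical, and categories applied through templates would not be detected this way.

```python
import re


def category_pattern(category):
    """Build a regex for [[Category:Name]] or [[Category:Name|sortkey]].

    Spaces and underscores in the name are treated as interchangeable.
    Matching is case-insensitive throughout, which is looser than
    MediaWiki itself (an approximation for this sketch).
    """
    parts = [re.escape(p) for p in re.split(r"[ _]+", category.strip())]
    name = "[ _]+".join(parts)
    return re.compile(r"\[\[\s*Category\s*:\s*" + name + r"\s*[|\]]", re.IGNORECASE)


def category_timeline(revisions, category):
    """Return (timestamp, user, 'added' | 'removed') events for one page.

    `revisions` must be ordered oldest first.
    """
    pattern = category_pattern(category)
    events = []
    present = False  # before the first revision the page is in no category
    for rev in revisions:
        now_present = bool(pattern.search(rev["content"]))
        if now_present != present:
            action = "added" if now_present else "removed"
            events.append((rev["timestamp"], rev["user"], action))
            present = now_present
    return events
```

Because the page starts out as "not in the category", a category that is already present in the very first revision is reported as added by that revision's author.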

    • Google actually might be your best bet, unless the changes were very recent.

    • I thought Google only crawled current versions of articles, not old ones. Part of the problem is that once a category is removed from an article, you don't know to look in its history to see when the article was added to it.

      There isn't any way to generate an "articles that used to be in category A" type of result.

    • Oh I misunderstood what you wanted. I guess I don't understand the need to find out which pages were a part of a category, but aren't anymore. Why do you need to know this?

    • If there is a later dispute about who to credit for its addition.

    • The only reliable way I see is a sorted database dump: the full one, with edit history. Sort the revisions by revision ID, then search for category:cat; the first match will be in the oldest revision.

      The problem is that a full dump might be big, so big that JS will not sort it in a sane amount of time. For the same reason, you will not be able to use Notepad or other standard text-processing tools: they will not open hundreds of megabytes.

      Although, it might be simpler if you search only one page.
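One way around the size problem is to stream the dump instead of opening or sorting it. Below is a rough Python sketch, assuming a standard MediaWiki XML export with full history (`<page>`, `<revision>`, `<id>`, `<timestamp>`, `<contributor>`, `<text>` elements). The function name `first_category_use` is made up for illustration, and this has not been run against a real Fandom dump.

```python
import re
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{http://...}revision' -> 'revision'."""
    return tag.rsplit("}", 1)[-1]


def first_category_use(dump_file, category):
    """Stream a full-history dump and return the lowest-numbered revision
    whose wikitext contains [[Category:<category>]], or None."""
    parts = [re.escape(p) for p in re.split(r"[ _]+", category.strip())]
    pattern = re.compile(
        r"\[\[\s*Category\s*:\s*" + "[ _]+".join(parts) + r"\s*[|\]]",
        re.IGNORECASE,
    )
    best = None
    title = None
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        tag = local(elem.tag)
        if tag == "title":
            title = elem.text  # title of the page whose revisions follow
        elif tag == "revision":
            fields = {local(child.tag): child for child in elem}
            text_elem = fields.get("text")
            text = (text_elem.text or "") if text_elem is not None else ""
            if pattern.search(text):
                rev_id = int(fields["id"].text)
                if best is None or rev_id < best["rev_id"]:
                    user = None
                    contributor = fields.get("contributor")
                    if contributor is not None:
                        for child in contributor:
                            if local(child.tag) in ("username", "ip"):
                                user = child.text
                    best = {
                        "rev_id": rev_id,
                        "title": title,
                        "user": user,
                        "timestamp": fields["timestamp"].text,
                    }
            elem.clear()  # free this revision's text before moving on
        elif tag == "page":
            elem.clear()
    return best
```

Keeping only the lowest revision ID seen so far means the dump never has to be sorted, and only one revision's text is held in memory at a time. Like the manual search, this only sees pages that are still in the dump.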

    • According to the description, you should be able to use api.php. It is located at "https://domain.fandom.com/api.php", where "domain" is the domain for your wiki (e.g. "community" for Community Central). I have not verified the accuracy yet, but assuming it works, you will need to do two queries. The first finds the page that was added to the category first; the second finds out who added it.

      Query String for Finding First Page Added

      action=query&list=categorymembers&cmtitle=Category:Name&cmsort=timestamp&cmdir=asc&cmlimit=1&cmprop=ids|title|timestamp
      

      Replace "Name" with the name of the category.

      Query String for Finding Who Added the Category

      action=query&pageids=PageID&prop=revisions&rvprop=timestamp|user|content&rvstart=Timestamp
      

      Replace "PageID" with the page ID as specified by the result from the first query and replace "Timestamp" with the timestamp as provided by the first query.
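If you would rather not paste the query strings by hand, they can be assembled in a script. This is only a sketch: it mirrors the two query strings above and adds `format=json` so the output is machine-readable. The function names are hypothetical, and the default URL is just the Community Central address mentioned earlier.

```python
from urllib.parse import urlencode

# Community Central, as in the post above; swap in your own wiki's domain.
API_URL = "https://community.fandom.com/api.php"


def first_member_url(category, api_url=API_URL):
    """URL for the first query: earliest-added current member of a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmsort": "timestamp",
        "cmdir": "asc",
        "cmlimit": 1,
        "cmprop": "ids|title|timestamp",
        "format": "json",  # added here so a script can parse the reply
    }
    return f"{api_url}?{urlencode(params)}"


def adding_revision_url(page_id, timestamp, api_url=API_URL):
    """URL for the second query: revisions of the page starting at `timestamp`."""
    params = {
        "action": "query",
        "pageids": page_id,
        "prop": "revisions",
        "rvprop": "timestamp|user|content",
        "rvstart": timestamp,
        "format": "json",  # added here so a script can parse the reply
    }
    return f"{api_url}?{urlencode(params)}"
```

`urlencode` percent-encodes characters such as `|` and `:`, so the printed URL looks a little different from the hand-written query string but carries the same parameters.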

    • Okay, I did some testing. It appears that my solution only accounts for current members. So, if the first member has since been removed, using api.php in the manner I suggested will not work correctly. However, if you want to use it, below is a revised version of the queries that eliminates unnecessary things. You need to run them in order, as the second one requires information from the first.

      cmtitle=Category:CategoryName&action=query&list=categorymembers&cmsort=timestamp&cmlimit=1&cmprop=ids|timestamp
      

      Replace "CategoryName" with the name of the category.

      pageids=PageID&rvstart=Timestamp&action=query&prop=revisions&rvlimit=1&rvprop=user
      

      Replace "PageID" with the "pageid" value from the first query. Replace "Timestamp" with the "timestamp" value from the first query. The user's name will be the value of "user" from the second query.
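The hand-off between the two queries can also be scripted. The sketch below assumes the queries are run with `format=json` appended (not part of the query strings above) and that the replies use the API's default JSON layout, where `pages` is keyed by page ID. The function names `first_member` and `adding_user` are made up for illustration.

```python
import json


def first_member(response_text):
    """Pull (pageid, timestamp) out of the first query's JSON reply.

    Returns None when the category has no current members.
    """
    data = json.loads(response_text)
    members = data.get("query", {}).get("categorymembers", [])
    if not members:
        return None
    return members[0]["pageid"], members[0]["timestamp"]


def adding_user(response_text):
    """Pull (page title, user) out of the second query's JSON reply.

    Returns None when no revision came back.
    """
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        revisions = page.get("revisions", [])
        if revisions:
            return page.get("title"), revisions[0]["user"]
    return None
```

The two values returned by `first_member` are what replace "PageID" and "Timestamp" in the second query. A `None` result from `first_member` means there is nothing to feed into the second query, which is what happens for a category with no current members.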

      Example using Category:Help

      [https://community.fandom.com/api.php?cmtitle=Category:Help&action=query&list=categorymembers&cmsort=timestamp&cmlimit=1&cmprop=ids|timestamp first query]
      
      [https://community.fandom.com/api.php?pageids=2829&rvstart=2010-03-06T11:10:16Z&action=query&prop=revisions&rvlimit=1&rvprop=user second query]
      


      User: Robin Patterson


      Edit:

      While Fngplg's method would allow for searching removed members, both of our methods are unable to identify deleted pages. So, if the first member was deleted, neither method would catch that.

    • Do members even get removed from fandom projects though? I thought edits were listed forever, even if banned.

      I'm reviewing the instructions now, thanks :)

      Do you know if this would still work if a category is removed from a page though?

      I'm wondering if we could somehow test this here using a test category.... maybe at the sandbox?

      I'm kind of confused by the timestamp though. Why would it be in the year 2010 when the Help category was created in 2005 by Angela? Surely people were adding pages to the category between 2005-2009 prior to 2010...

      One thing I also noticed was that the title="Community Central:Babel" output from your 2nd command gives the title of the page that corresponds to the pageids= number, which can help with searching a user's history to find the diff associated with the output.

      That said... 03-06 means March 6th and when I check March 2010 I don't see any edit on March 6th to Community Central: Babel.

      Is there maybe some kind of weird error going on with rvstart?

      22 May 2005 looks like when this actually got added...

    • "Member" means category member. So page.

    • Ah okay, so this would mean that if a category was first added to a page which is no longer up, it would cause an error and return nothing at all, rather than just finding the earliest date-added among existing pages?

      That could explain why I'm not getting a result trying this for some categories.

    • I am a bit confused now. Are we talking about projects or categories? The two are different. My answer was geared at categories and, as Tupka217 already clarified, "member" means page. From what I can gather, that is where most of the confusion has come from.

      Adding pages to categories is not considered an edit to the category; it is an edit to the page. One reason is that you can use a category before creating its page. Using api.php in the manner I suggested retrieves a list of current members. If a page was removed from a category, it will not show in the query results. In this case, the method I presented will not give you accurate results. Instead, it will give you the earliest among those still in the category rather than the absolute earliest. I imagine (but have not confirmed) that re-adding a page is effectively the same as not re-adding it as far as the results of the query are concerned.

      Adding a page to a category is part of that page's edit history and, as far as I know, is not tracked anywhere else. That is why Fngplg suggested searching the database dump. By doing that, you can inspect every single revision and determine which was first to use the category. The downside of that is that your database could be large and would require offline data processing (i.e. you can't do it in your web browser). What I have proposed can be done quickly in browser but is limited to finding the earliest among current members. Both methods are incapable of accounting for deleted pages.

      Does that make more sense?

      Which categories returned nothing?

    • What is "rvstart"?

    • rvstart - From which revision timestamp to start enumeration (enum)
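To make that concrete: as I understand the API documentation, revisions are listed newest first by default (`rvdir=older`), so with `rvlimit=1` the query returns the most recent revision at or before `rvstart`, which need not fall on the `rvstart` date itself. The snippet below is only a local simulation of that rule with invented timestamps, not a call to api.php; `enumerate_revisions` is a hypothetical helper.

```python
def enumerate_revisions(timestamps, rvstart=None, rvdir="older", rvlimit=1):
    """Simulate which revision timestamps prop=revisions would list.

    Timestamps are ISO 8601 UTC strings of identical format, so plain
    string comparison orders them chronologically.
    """
    if rvdir == "older":
        # Newest first, starting at rvstart and walking back in time.
        ordered = sorted(timestamps, reverse=True)
        if rvstart is not None:
            ordered = [t for t in ordered if t <= rvstart]
    else:
        # rvdir="newer": oldest first, starting at rvstart and walking forward.
        ordered = sorted(timestamps)
        if rvstart is not None:
            ordered = [t for t in ordered if t >= rvstart]
    return ordered[:rvlimit]


# Invented revision times for a single page.
history = [
    "2005-05-22T10:00:00Z",
    "2008-01-01T00:00:00Z",
    "2012-07-04T00:00:00Z",
]
print(enumerate_revisions(history, rvstart="2010-03-06T11:10:16Z"))
```

In this made-up history the page has no edit on 6 March 2010, yet the query still returns a revision: the 2008 one, because it is the latest edit that is not newer than `rvstart`.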

    • Andrew, I tried this code for this category

      The category was present in the first version of an article so I'm trying to figure out if JJB was the first to introduce the category in Feb 2013, or if there was some earlier edit which introduced it.

      Do you think it's possible that if the category was "added" to a page in its very first version that this might be what is preventing it from returning in the code? Like maybe this command only returns when a category is added in subsequent edits but something about being present from the very first version prevents the data from returning?

    • As I said in my earlier replies, using api.php is limited to searching current members. That category currently has no members which is why it isn't finding anything.

    • Ah okay. I noticed someone added it to a page recently so I tried the code out again and it got a result this time:

      cm pageid="3820" timestamp="2019-12-05T19:50:25Z"

      Which I used to construct 2nd code:

      title="Toy Chica"
      user="AdrianC385"

      So this is useful to at least see when the earliest CURRENT member of a category was added, but I guess there's still no way of finding the earliest EVER member, in respect to acknowledging pages which were previously in a category but were later removed from it.

    • As I mentioned, Fngplg's solution will help get you closer to the ideal case. It would allow you to find pages that are no longer members. However, it is likely to require more effort with regard to searching. Either way, the result will still not be guaranteed to be the absolute earliest, because neither method accounts for deleted pages. There is a way you can account for them, but it would be in addition to one of the two other methods.

      In short, the more accurate you want the result to be, the more effort has to go into the process. You can get the absolute earliest, but that would require a lot of searching. You can put in minimal effort, but that would only tell you about current members.

    • I guess I'm wondering what besides notepad I could use to search a full-history dump. Plus I'm not sure how to sort them by revision ID, that's not how it's organized by default?

    • Tycio#19 wrote: "what besides notepad I could use to search a full-history dump"

      I use Far (File and Archive Manager): it has a sane viewer that doesn't load the whole file into RAM.

      Tycio#19 wrote: "that's not how it's organized by default"

      Nope. Dumps are organized by page ID. Page 1 can have revisions 1, 3 and 5, and page 2 can have revisions 2 and 4; if the category was added by revision 2 (first) and then by revision 5 (second), then in the dump revision 5 (page 1) will appear earlier than revision 2 (page 2).
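The ordering problem in that example can be shown with a few lines of Python. The tuples below just restate the example (two pages, revisions 1 to 5) and are not real dump data; the helper name `first_match` is made up.

```python
# (page, revision ID, wikitext contains the category?) in dump order,
# i.e. grouped by page as in the example above.
dump_order = [
    ("page1", 1, False),
    ("page1", 3, False),
    ("page1", 5, True),   # category added to page1 here (second use)
    ("page2", 2, True),   # category added to page2 here (first use)
    ("page2", 4, True),
]


def first_match(revisions):
    """Return the first revision in the given order that has the category."""
    return next((rev for rev in revisions if rev[2]), None)


# Reading the dump top to bottom stops at the wrong revision.
in_dump_order = first_match(dump_order)

# Sorting by revision ID first gives the true earliest use.
by_rev_id = first_match(sorted(dump_order, key=lambda rev: rev[1]))

# Same answer without sorting: keep the smallest matching revision ID.
smallest_id = min((rev for rev in dump_order if rev[2]), key=lambda rev: rev[1])

print(in_dump_order, by_rev_id, smallest_id)
```

The last variant is worth noting for big dumps, since tracking the smallest matching revision ID needs only a single pass and no sort.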
    • Depending on the size of the dump, you can open it in a browser and use the browser's "find" feature to locate the category in the revisions.

Community content is available under CC-BY-SA unless otherwise noted.