User blog:TimmyQuivy/Examining How Fandom Used Generative AI in 2023

There was no bigger topic in technology in 2023 than Generative Artificial Intelligence (Generative AI, or GenAI for short). Generative AI burst onto the scene around this time last year with OpenAI's introduction of ChatGPT, a simple chat interface that lets you ask complex questions of its large language model and get back a detailed, and often creative, response. Since then, we've seen it used, for better or worse, in a variety of ways across the internet.

We're not going to explain Generative AI at length in today's blog. If you're just hearing about Generative AI, chances are your name is Steve Rogers and you have a lot more to catch up on than just this topic. Here's a solid explainer on the basics of GenAI: https://www.techrepublic.com/article/what-is-generative-ai/.

What we're instead going to talk about is how we're using Generative AI at Fandom - how we've already used it, what's being actively developed, and where we see this continuing in the future.

Our Philosophy Regarding Generative AI
Understandably, Generative AI was a topic we wanted to discuss with our core creators at Community Connect 2023. Our CTO, Adil Ajmal, gave an hour-long presentation in which he walked through our high-level ideas for GenAI.

At the heart of his presentation was a core commitment to the community - enablement, not replacement.

By that we meant that any GenAI tools we were developing and researching would either help us better understand and highlight user-generated content (UGC) or serve as tools that help you create that content more efficiently. Conversely, we were not looking at any tools that would write full-fledged articles or otherwise take over the core community building and writing that wiki editors do.

We continue to believe that this is the right philosophy for Fandom. First and foremost, the authenticity and subject knowledge of editors like you make wiki content qualitatively better. Part of the appeal of a Fandom article lies in the subtle ways readers pick up on the genuine, invested knowledge that editors bring to their edits.

Second, this is in line with the expectations and desires of our editors. We've been monitoring editor sentiment toward AI throughout the year. Through surveys, research groups, Community Connect, and other conversations with the community, we've seen openness to, or at least cautious interest in, the idea of using some level of AI on Fandom. That openness comes with some core conditions, of course, such as the training data for a model being ethically sourced, which has been a major concern in image generation.

Thus far, our developments have been conservative, in line with the commitments that Adil talked about at Connect. So let's look at precisely how we've done that this year.

Generative AI in Image Review
Moderating images is a much trickier proposition than moderating text. If I type an inappropriate word into a wiki, you can see it and revert it, because there's only one pattern a word can take (a specific sequence of letters). We also have tools that catch that word and can prevent it from being saved, like local AbuseFilter rules and the global Phalanx content filters that staff add.
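To make the contrast concrete: because a blocked word is a fixed sequence of letters, even a simple pattern match can find it. The sketch below is a generic regular-expression check, not AbuseFilter or Phalanx rule syntax; the blocklist entries and the `should_block` function are hypothetical.

```python
import re

# Hypothetical blocklist; real AbuseFilter/Phalanx rules use their own syntax.
BLOCKED_WORDS = ["badword", "slur"]

# \b anchors each entry at word boundaries, so "badword" matches
# but a longer word that merely contains it does not.
BLOCK_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, BLOCKED_WORDS)) + r")\b",
    re.IGNORECASE,
)


def should_block(edit_text: str) -> bool:
    """Return True if the edit text contains any blocked word."""
    return BLOCK_PATTERN.search(edit_text) is not None


print(should_block("This page has a BadWord in it"))
print(should_block("A perfectly normal sentence"))
```

There is no equivalent fixed pattern for the pixels of an image, which is why images call for a different approach.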

But a picture can express a concept in infinite permutations - only by actually looking at it can you understand what it shows. This is why, as a moderation measure, we have for years manually reviewed every image uploaded to Fandom. Just as there can be vandalous text edits, inappropriate images (porn, gore, nudity) can be added to a wiki, and it is important to us to maintain a network that is safe and appropriate for all ages and backgrounds.

Now, thanks to advances in technology, AI has practically caught up to this need. Using models trained on thousands of images that people have manually labeled, AI can now say, "I know that Image A is an image of X, and Image B very closely resembles Image A, so I can say with Y% certainty that Image B is also an image of X." The more you reinforce that model, explicitly telling it "you're right" or "you're wrong", the more accurate it becomes.

Earlier this year, Fandom partnered with a great company called CoActive to begin replacing manual review with AI review, a process we completed in September. CoActive programmatically reviews every image uploaded to Fandom and gives us a score from 0 to 100 for each of several concepts (e.g., separate checks for how likely an image is to contain porn, gore, or hate speech). If an image scores very high for any one of those concepts, it is auto-deleted. If it scores very low for ALL of the concepts (i.e., CoActive believes it contains none of those bad things), it is approved.

Right now, CoActive helps Fandom auto-approve or auto-reject about 90% of our uploads. The other 10% (images where some scores are moderately high) still get manually reviewed, and we use those decisions to teach the model "this was actually bad" or "this was actually good". Over time, this should bring us closer and closer to 100% automated review.
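The triage described above can be sketched as a simple routing rule over per-concept scores. This is a minimal sketch, not CoActive's API: the threshold values (90 and 10), the `triage` function, and the `record_human_decision` helper are hypothetical, since the actual cutoffs and feedback mechanism aren't public.

```python
from enum import Enum


class Decision(Enum):
    AUTO_REJECT = "auto-reject"
    AUTO_APPROVE = "auto-approve"
    MANUAL_REVIEW = "manual-review"


# Hypothetical cutoffs on the 0-100 scale; the real values are not public.
REJECT_THRESHOLD = 90
APPROVE_THRESHOLD = 10

# Human verdicts on borderline images, kept as future training labels.
training_labels: list[tuple[str, bool]] = []


def triage(scores: dict[str, int]) -> Decision:
    """Route an uploaded image based on its per-concept scores (0-100)."""
    # A single concept scoring very high is enough to auto-delete.
    if any(score >= REJECT_THRESHOLD for score in scores.values()):
        return Decision.AUTO_REJECT
    # Auto-approve only when every concept scores very low.
    if all(score <= APPROVE_THRESHOLD for score in scores.values()):
        return Decision.AUTO_APPROVE
    # Everything in between goes to a human reviewer.
    return Decision.MANUAL_REVIEW


def record_human_decision(image_id: str, is_bad: bool) -> None:
    """Store a reviewer's verdict so the model can later be retrained."""
    training_labels.append((image_id, is_bad))


print(triage({"porn": 2, "gore": 95, "hate_speech": 1}).value)
print(triage({"porn": 40, "gore": 3, "hate_speech": 0}).value)
```

Note the asymmetry: rejection needs only one high score, while approval requires every score to be low, so uncertain images default to a human.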

I want to emphasize again, since this isn't something we've talked much about before, that for about ten years we have manually reviewed every image uploaded to the platform to ensure it complies with our Terms of Use and to reduce exposure to harmful images. By switching to automated review, we can not only continue that work uninterrupted but also reduce the number of harmful images our own Community staff team has to look at.

So why switch from manual to automated? We felt AI review would be better than manual review because:


 * It can detect and remove an inappropriate image within milliseconds of upload, so no one has to see it. Auto-removal is a major change, so we're ramping the functionality up slowly to ensure maximum accuracy.
 * Admins, staff, and other moderators no longer have to spend the extra time (or endure the mental fatigue) of cleaning up after a particularly nasty troll.
 * It frees up countless hours of staff time for less mundane tasks.
 * There are things humans still struggle to notice, like a single hateful word buried in an image full of text. Once AI learns to detect something, it keeps getting better at detecting it.

So if you take away anything from this section, I hope it's that we are using this AI moderation solution to keep your wikis safe from bad images while reducing the impact on our most vulnerable users and on our moderators.

Generative AI in Understanding Page Content
While technology handles words easily, understanding their meaning is an area where it has been "catching up" for years. If I said "Baby Yoda's actual name is 'Din Grogu'", you as the reader would quickly understand the meaning. (Unless you are, of course, Steve Rogers and are still, for some reason, reading this blog. Start Star Wars with Episode IV. Yes, I know that doesn't make a lot of sense either.)

Until relatively recently, technology lagged behind that basic understanding. For years, Google and other search engines served results based mostly on whether a page matched the words you typed (e.g., "Baby Yoda"). More recently, search engines have begun to grasp meaning, so if you google "What is Baby Yoda's real name?", Google now knows to show you a top result where that question is explicitly answered.

One area where this understanding still lags, though, is tables. Nearly every wiki puts the core information about an article's subject in a table, which search engines still mostly can't interpret because it isn't written in sentence form. They literally ingest the text as "Name Din Grogu" rather than "The name of this character is Din Grogu". What Google can easily do is understand table data that is presented in a specific, structured format it can read.

This is where Generative AI comes in. We ingest infobox content, use AI to understand what each field means, and then generate a JSON file that Google can crawl and understand. Identifying what each field in a table means sounds easy, but an example shows why AI is so useful here.

(Fun fact: if you've been on the platform for a while and remember the rollout of Portable Infoboxes many years ago, you may recall that it was an early foray into structured data. This work is one of the ways we're finally paying that off after all this time.)

Wookieepedia tells me Din Grogu was born in 41 BBY, and "Born" is the label they use. But your wiki may use (and these are just examples off the top of my head) Date of birth, birthdate, birthday, Year of birth, Birthyear, Birth year, Birth day, … You get the picture. Large language models recognize that these are all the same concept, so when we send Google the information it needs, all of those labels are mapped onto the single shared vocabulary Google uses (Schema.org concepts). Because Google now knows that our page contains this information, which it might miss if it looked only at the long-form article content, your pages rank higher in search results. Higher rankings lead to more visitors reading and enjoying your content.
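Here is a minimal sketch of the output side of that process, assuming a fixed lookup table stands in for the language model's judgment about what each label means. The alias list, the choice of the Schema.org `Person` type, and the function names are illustrative assumptions, not a description of Fandom's actual JSON format.

```python
import json
from typing import Optional

# Stand-in for the LLM step: many wiki-specific labels map to one
# Schema.org property. A model would generalize beyond this fixed list.
FIELD_ALIASES = {
    "birthDate": {"born", "date of birth", "birthdate", "birthday",
                  "year of birth", "birthyear", "birth year", "birth day"},
    "name": {"name", "full name"},
    "gender": {"gender", "sex"},
}


def normalize_label(label: str) -> Optional[str]:
    """Return the Schema.org property for an infobox label, if known."""
    key = label.strip().lower()
    for prop, aliases in FIELD_ALIASES.items():
        if key in aliases:
            return prop
    return None


def infobox_to_jsonld(infobox: dict[str, str]) -> str:
    """Build a JSON-LD snippet from the infobox fields we can map."""
    data = {"@context": "https://schema.org", "@type": "Person"}
    for label, value in infobox.items():
        prop = normalize_label(label)
        if prop is not None:  # unmapped fields are skipped, not guessed
            data[prop] = value
    return json.dumps(data, indent=2)


print(infobox_to_jsonld({"Name": "Din Grogu", "Born": "41 BBY", "Species": "Unknown"}))
```

A real pipeline would also have to decide how to represent in-universe dates like 41 BBY, since Schema.org's `birthDate` expects a calendar date.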

AI generation of these JSON files has occurred at various points throughout the year. We have seen meaningful boosts in visits to wikis that have these files compared with wikis that have not yet gone through this process. And because this is an invisible backend change, it hasn't affected anyone's editing process, and no extra steps are required of our editors.

Generative AI in Quick Answers
We first discussed Quick Answers on this blog in September, when Brandon published a full post outlining the tool and the work done up to that point. I highly recommend reading that post in full, but here's a quick recap to catch you up:

Quick Answers is a product that reformats long-form information from wiki pages into a Q&A (questions & answers) format that appears as a module on the wiki page it's based on. These modules are directly relevant to, and answer questions about, the subject of the page.

The advantage of this approach is that the Q&A format is easy for Google to index, and it's a structure Google finds informative and likely to rank well.

Over the summer, we began experimenting with Generative AI models, through a GenAI partner, that used information from Fandom pages to draft an initial set of answers to questions pulled from related Google searches. The idea was to use your own expert content to pre-seed the questions and answers, saving communities the time and effort needed to get this project going. The questions and answers that went live during the experiment were reviewed by Fandom staff for accuracy before being posted.

Unfortunately, when we scaled to the next level (about 6,500 character pages), the accuracy of the AI answers decreased significantly. We quickly pulled the modules back and decided to rethink our approach.

This was a valuable lesson for us, since testing and learning are ultimately part of feature development. We tried something, we learned from it (both what went right and what went wrong), we adapted, and we tried something else. We're also continuing to fine-tune the model to improve it further, looking for the right balance between generation efficiency and accuracy.

Now that we have the model in place, we are finishing the previously announced tool that will let community moderators review and curate the answers. Twelve wikis are beta testing that new curation dashboard, and their reaction has been generally positive. When we get back from the holiday break, we'll walk you through this new Quick Answers experience in more detail. But in short: we are giving admins a powerful new tool that helps readers find answers efficiently while also improving SEO, and it should launch fully early in the new year.

AI lowers the effort communities need to get this feature up and running on their wikis, but at the end of the day, communities have the final say over what goes up on their wiki via the dashboard. We'll talk more about the dashboard when its beta test concludes next month. We're also adding a seven-day "vetting" period so the manual review is less hurried. AI is here to aid, not replace.
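One way to picture that curation gate is as a filter in front of publishing: AI drafts answers, and only the ones a community moderator has approved are emitted. This sketch assumes the published Q&A is marked up as a Schema.org `FAQPage`, which is an illustrative assumption rather than a description of Fandom's implementation; the `DraftAnswer` type is hypothetical, and the seven-day vetting period is not modeled.

```python
import json
from dataclasses import dataclass


@dataclass
class DraftAnswer:
    """A hypothetical AI-drafted Q&A pair awaiting moderator review."""
    question: str
    answer: str
    approved: bool = False  # set only by a community moderator


def faq_jsonld(drafts: list[DraftAnswer]) -> str:
    """Emit Schema.org FAQPage JSON-LD containing only approved answers."""
    entities = [
        {
            "@type": "Question",
            "name": d.question,
            "acceptedAnswer": {"@type": "Answer", "text": d.answer},
        }
        for d in drafts
        if d.approved  # unreviewed or rejected drafts never go live
    ]
    return json.dumps(
        {"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": entities},
        indent=2,
    )


drafts = [
    DraftAnswer("What is Baby Yoda's real name?", "Din Grogu", approved=True),
    DraftAnswer("When was Grogu born?", "41 BBY"),  # still pending review
]
print(faq_jsonld(drafts))
```

The design choice worth noting is that approval defaults to `False`, so a draft nobody has looked at can never reach readers by accident.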

Next Steps
We've walked you through the three biggest ways Generative AI was used on Fandom this year. Two of these tools, ImageReview and JSON table structures, simply help us understand the content you create by hand. The third, Quick Answers, helps you get a beneficial new content display up and running without placing undue burden on your wiki editors. "Enablement, not replacement" will remain our philosophy in 2024.

While each product development team at Fandom is still working on its plans for 2024, we can already see the broad strokes of how Generative AI could be used on Fandom.

First, we want to keep using GenAI to help us structure the backend of a page, as we did with the JSON files for infoboxes. The more we turn tables and other graphical data into machine-readable information, the more we can reuse that information in novel and useful ways. Some ways of displaying information are difficult today (if you've ever tried to code a multi-branch family tree on a wiki, you know), but they become much more achievable with the right tools once we know what data goes where.
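As a rough illustration of why structured data helps: once each character's infobox reliably exposes its parents, a family tree becomes a simple graph walk instead of hand-coded layout. The `parents_of` data and both functions are hypothetical, and this sketch only derives the tree and prints it as indented text.

```python
from collections import defaultdict

# Hypothetical structured data, as if extracted from character infoboxes.
parents_of = {
    "Luke": ["Anakin", "Padmé"],
    "Leia": ["Anakin", "Padmé"],
    "Ben": ["Han", "Leia"],
}


def build_children(parents_of: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert child -> parents into parent -> children."""
    children: defaultdict[str, list[str]] = defaultdict(list)
    for child, parents in parents_of.items():
        for parent in parents:
            children[parent].append(child)
    return dict(children)


def render(person: str, children: dict[str, list[str]], depth: int = 0) -> list[str]:
    """Return a person's descendants as indented lines (assumes no cycles)."""
    lines = ["  " * depth + person]
    for child in children.get(person, []):
        lines.extend(render(child, children, depth + 1))
    return lines


print("\n".join(render("Anakin", build_children(parents_of))))
```

Because the tree is derived from the data, adding one new infobox updates every tree that includes that character, with no layout code to edit.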

Second, we see an opportunity for AI to improve admin tools. Much like we have a probability score for whether an image should be removed from our network, we have also experimented with presenting similar scores for text edits. Imagine being an admin with a Recent Changes filter that surfaces the edits needing review based on a risk score. This is a tangible example of the kind of admin-facing tool we see as achievable and scalable in the near future.
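A risk-score filter like the one imagined above could be as simple as thresholding and sorting. This is a sketch under assumptions: the `Edit` record, the 0-100 risk scale (borrowed from the image scores), and the default cutoff of 70 are hypothetical, and how a score would actually be computed is out of scope.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """A hypothetical Recent Changes entry with a model-assigned risk score."""
    page: str
    user: str
    risk: int  # 0-100, higher means more likely to need review


def needs_review(edits: list[Edit], threshold: int = 70) -> list[Edit]:
    """Return edits at or above the threshold, riskiest first."""
    flagged = [e for e in edits if e.risk >= threshold]
    return sorted(flagged, key=lambda e: e.risk, reverse=True)


recent = [
    Edit("Grogu", "ExampleUser1", 12),
    Edit("Din Djarin", "ExampleUser2", 88),
    Edit("Mandalore", "ExampleUser3", 73),
]
for edit in needs_review(recent):
    print(edit.risk, edit.page, edit.user)
```

Low-risk edits stay in the normal Recent Changes view; the filter only changes what an admin sees first.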

We're not sure everything that "hit it big" in 2023 is here to stay (I'm rooting for you, Taylor and Travis!), but Generative AI certainly is, largely because of the real-world opportunities it has already unlocked. Our challenge as a company is to use it wisely, as a complement to the tremendous community that already exists and edits here. We hope you see our deliberate approach as a benefit to your communities, and we hope you'll continue to join us in testing these new and exciting ideas in the years to come.