Forum:Site performance issues and improvements

Hi all,

We wanted to give you a brief update regarding some of the work Wikia is doing to increase site stability and performance. As mentioned here, there are a number of initiatives underway.

Here are some issues we're working on:
 * ISP Replacement: The Internet connection in our San Jose, CA data center was unstable and caused site downtime. Failing over to our new data center in Iowa will not be reliable until that data center is 100% complete. In response to the San Jose outages, we added a new connection through Internap and relegated the problematic ISP to a backup role.
 * Database Servers: As our traffic has grown, our database servers have occasionally been overwhelmed. While the traffic is great, MediaWiki doesn't handle the situation gracefully (for example, by switching to a less-busy database server), so the result is downtime or slowness. These database issues have also caused Wikia Stats and database dumps to fall out of date. To resolve these issues, we are upgrading our existing servers (adding memory, among other things) and installing a new Virident database server.
 * Cache Servers: Recently we have had a few situations where our cache servers ran out of memory. While this does not result in downtime, it can occasionally cause blank pages to show up rather than the desired content, and sometimes changes to a wiki don't show up immediately. This is especially apparent when updating site CSS or JavaScript. To prevent this, we are increasing the capacity of our Varnish servers, and we have created a patch for Varnish that significantly improves performance.
 * Creating Redundancies: We are putting the finishing touches on our new Iowa data center and expect it to be fully online within the next three weeks. We are already sending traffic (about 200 Mb/s of 500 Mb/s total) through Iowa to decrease page load times, but these final steps will create complete redundancy in our infrastructure and eliminate the remaining single points of failure that can cause downtime.
 * Page Load Times: In addition to the above improvements, much of our engineering team will be spending the next few months optimizing both the MediaWiki and extension back-end code, as well as the JavaScript that is executed in the browser on each page load. The combined effect of these optimizations should be a noticeable reduction in page load times.
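
For anyone curious what "using a less-busy database server" could look like in practice, here is a minimal sketch in Python. It is only an illustration of the general idea, not MediaWiki's or Wikia's actual code; the `DbServer` type, the server names, and the query counts are all made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DbServer:
    """Hypothetical snapshot of one database server's state."""
    name: str            # made-up host label
    active_queries: int  # queries currently running
    max_queries: int     # assumed capacity; must be greater than zero

    @property
    def load(self) -> float:
        # Fraction of capacity in use (1.0 or more means saturated).
        return self.active_queries / self.max_queries


def pick_least_busy(servers: Sequence[DbServer]) -> Optional[DbServer]:
    """Return the server with the lowest load, or None if all are saturated."""
    available = [s for s in servers if s.load < 1.0]
    if not available:
        return None  # caller can show a "try again shortly" page instead of hanging
    return min(available, key=lambda s: s.load)


servers = [
    DbServer("db1", active_queries=90, max_queries=100),
    DbServer("db2", active_queries=20, max_queries=100),
    DbServer("db3", active_queries=100, max_queries=100),
]
chosen = pick_least_busy(servers)
print(chosen.name if chosen else "all servers busy")
```

The point of the sketch is simply that a request gets routed away from an overwhelmed server rather than waiting on it, which is the graceful behavior described in the Database Servers item above.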

Some of the projects above may result in brief site issues. Please bear with us while we create a little dust. As always, don't hesitate to let us know if you think something is amiss, but also know that we're likely to already be working on a fix. --KyleH (talk) 16:51, 14 April 2009 (UTC)