Help talk:Bots

Where to get a Bot?
It'd be nice if the page also had some links on where/how to get decent, working bot code. --Light Daxter 19:25, October 2, 2009 (UTC)


 * There are two main bots for use on wikis - AutoWikiBrowser and pywikipediabot. AWB is probably the easiest to work with. Beware compatibility issues though. Kirkburn (talk) 15:15, October 5, 2009 (UTC)

Read-only bots
Does a bot that only downloads many pages at once need a bot flag? --WGH, 22:28, October 30, 2010 (UTC)
 * No, but the bot flag makes this a bit more efficient (e.g. a normal user can get 500 pages per request via the API; with the 'bot' flag, it can get 5000). 23:23, October 30, 2010 (UTC)

Limited editing rate?
The page says that a bot should use a limited editing rate, which is perfectly fine. However, the word "limited" is very subjective. What would the limit be, in terms of edits per minute or requests per minute? Right now I'm working with 60 requests per minute (1 per second), but I was wondering whether I could speed up the process, or whether that is still too heavy.

Basically, the bot fetches all the pages from a category and its sub-categories, builds a list, and updates the list if the data has changed. The bot runs once a day at midnight (EST) (I assume the traffic is lower at that time). See the bot page.

Thanks

Hunter789 19:02, June 13, 2011 (UTC)


 * As a side note, I have not asked to flag the bot as a bot yet since it is still in development. Hunter789 19:05, June 13, 2011 (UTC)

Answer by wikia staff
This was answered by uberfuzzy on the community forums. There are no exact rules, just general helpful guidelines.

As you said on that talk page, your edit rate of 1/sec is a good baseline to start with. If it's running at late-night/off-peak times, you could probably let it run faster without too many problems. It also depends on what you are doing. Provided you're not hammering us non-stop with 100s of requests a second for solid blocks of minutes, you probably won't even show up on Ops' logs/radar (though the chain reaction of internal MediaWiki actions from updating categories/redlinks/etc. might, but only if you are doing something very wrong, or very right ;P). I don't think you grasp the amount of edits/actions/pageviews Wikia does every second already; a few more from a bot on one wiki won't matter in the large scale of things for that brief period of time.

For most bots I run, I generally don't put a throttle in, and just let them go at the pace at which the server can handle the actions. Generally, most edit actions I've noticed take about 1 sec from open connection, send, response, close. When I do that, I usually put some sort of loop logic in, so that every X things, it sleeps for Y seconds (usually something like 100/1 or 1000/2), just to give a little pause/break for my connection and the server. (I also tend to use that window to save some sort of continue data, so that if it does crash, I can resume from a known point, not from the top of the list.) Remember that doing an edit is fairly "cheap", provided there aren't dozens of templates that have to update, but moving/deleting pages may be a little more "expensive" and take longer to process, so when doing these, you may want to look at adding delays/throttles, just to give things a chance to update.
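The "every X things, sleep for Y seconds, and save continue data" loop described above could be sketched roughly as follows. This is only an illustration, not uberfuzzy's actual code: the function name `run_batches`, the `action` callable, and the checkpoint file name and JSON layout are all assumptions made for the example.

```python
import json
import time
from pathlib import Path


def run_batches(items, action, batch_size=100, pause=1.0,
                checkpoint=Path("bot_checkpoint.json"), sleep=time.sleep):
    """Apply action to each item; every batch_size items, save progress and pause.

    All names here are hypothetical. `action` stands in for whatever the bot
    does per page (edit, move, delete, ...).
    """
    start = 0
    if checkpoint.exists():
        # Resume from the last saved position instead of the top of the list.
        start = json.loads(checkpoint.read_text())["next_index"]
    for index in range(start, len(items)):
        action(items[index])
        done = index + 1
        if done % batch_size == 0:
            # Use the pause window to record how far we got.
            checkpoint.write_text(json.dumps({"next_index": done}))
            sleep(pause)
    # Finished cleanly: remove the checkpoint so the next run starts fresh.
    if checkpoint.exists():
        checkpoint.unlink()
    return len(items) - start
```

Because progress is only saved once per batch, a crash means up to `batch_size - 1` items are repeated on the next run, so `action` should be safe to repeat. For moves and deletes, a smaller `batch_size` or a longer `pause` would match the advice above.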

Something to look at in the API docs is the "maxlag" param. You can actually set up your bot to detect when our servers are reporting that they are loaded up, and to slow down for a while. This rarely happens on Wikia (and usually not that often on Wikipedia anymore either), because our databases are now spread across multiple clusters and have lots of caches in front of the actual servers. You should also try to detect when your saves/actions are failing, and if you fail a whole bunch, or get a bunch of "can't resolve host" type errors, stop, because it usually means the servers are having a fit and are not available. I'd say about 80% of the code you are going to write to make a good bot will be error handling. There is only one "correct" condition, but there are dozens of ways it can fail.
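A minimal sketch of that advice, assuming the bot talks to the MediaWiki API in JSON format: add `maxlag` to each request, wait when the API answers with the `maxlag` error code, and give up after repeated connection failures. The `send` callable, the helper names, and the default numbers are hypothetical choices for illustration; `send` is whatever function actually performs the HTTP request and returns the decoded JSON.

```python
import time

MAXLAG_SECONDS = 5  # assumed value; tune to taste


def with_maxlag(params, maxlag=MAXLAG_SECONDS):
    """Return a copy of the request parameters with the maxlag parameter added."""
    return {**params, "maxlag": maxlag}


def is_lagged(response):
    """True if a decoded API response carries the 'maxlag' error code."""
    return response.get("error", {}).get("code") == "maxlag"


def call_with_retries(send, params, max_failures=5, base_wait=5.0, sleep=time.sleep):
    """Call send(params), backing off on lag and stopping after repeated failures."""
    failures = 0
    while True:
        try:
            response = send(with_maxlag(params))
        except OSError:
            # In the standard library, DNS failures ("can't resolve host"),
            # refused connections and timeouts are all OSError subclasses.
            failures += 1
            if failures >= max_failures:
                raise  # the servers are probably unavailable: stop the bot
            sleep(base_wait * failures)  # wait a little longer each time
            continue
        if is_lagged(response):
            # Servers report load: slow down, then try the same request again.
            # Note this waits for as long as the lag is reported.
            sleep(base_wait)
            continue
        return response
```

Lag waits are deliberately not counted as failures here, since a lagged server is still answering; a stricter bot could cap those too.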

Something else to remember is that bandwidth is cheap and large, but connections to the server are finite. Once you're flagged as a bot, your account will be able to grab more data in a single request (500s turn into 5000s, as an example), so try to do more per request when you are getting data.
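As a rough illustration of "do more per request", the sketch below asks the API for as many category members as the account is allowed (`cmlimit=max`) and follows continuation until the list is exhausted. It assumes the newer `continue` block in API responses (older MediaWiki versions returned `query-continue` instead), and `fetch` is a hypothetical callable that sends the parameters and returns the decoded JSON.

```python
def category_members(fetch, category, limit="max"):
    """Yield the titles of all pages in a category, following continuation.

    `fetch` is a stand-in for the bot's own request function (an assumption,
    not a real library call). "max" lets the server pick the highest limit
    the account is allowed, so a flagged bot needs fewer round trips.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": limit,
        "format": "json",
    }
    while True:
        data = fetch(params)
        for page in data["query"]["categorymembers"]:
            yield page["title"]
        if "continue" not in data:
            break  # no more results
        # Merge the continuation values into the next request.
        params = {**params, **data["continue"]}
```

A call such as `list(category_members(fetch, "Category:Example"))` would then return every title with as few connections as the account's limit allows.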

--uberfuzzy, lover of bots and api's 08:51, June 15, 2011 (UTC)

Hunter789 14:28, June 16, 2011 (UTC)