Thursday 17 May 2012

Server maintenance part 2 - technical fix


My last post whined about server shutdowns.  My game time is interrupted enough by real life, so I don't want the server to have its own interruptions.

A 'Come back in a few minutes' message at least leaves anticipation.  You are still in the real world and can do real world things.  A 'Good bye' in the middle of a gaming session throws you out of your immersive experience and back into the real world, often ending your day's play on a sour note.



On a conceptual level - Eve shutdowns do more than just annoy us.  They really are the equivalent of a maintenance service for your car.  It's annoying to be without your car, and it costs time and money to do.  CCP (or any other game company) would not have server shutdowns if there were an easy way around them.



I am simplifying concepts when I can.  There will be specifics that I get wrong; but it is the big picture that I am looking for.  I am showing I 'get' the problems; establishing my credentials to show I also 'get' solutions.  Simplifications include but are not limited to:
  • Program includes procedure, function, and system calls.
  • Computer resources includes memory, disk space, semaphores.
  • Server includes physical machines, virtual machines, and services.

The problem


Programs use computer resources (mostly memory but others too).  They grab resources when they want to do something, and are meant to release them when finished.  However not every program is as well written as you would like - and those resources are not always released when a program finishes.  Then there are the programs that crash part way through, never even having the opportunity to clean up.  Leaked resources like these are very easy to clean up as part of a restart, and very hard to clean up at any other time.
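
To make the leak idea concrete, here is a minimal sketch in Python.  It is a toy, not anything CCP runs: `ResourcePool`, `leaky_task` and `careful_task` are names I made up for illustration.  One task crashes without releasing what it grabbed; the other uses a `try`/`finally` wrapper so the release still happens.  The only way to get the leaked slot back in this toy is to build a fresh pool, which is the 'restart'.

```python
from contextlib import contextmanager


class ResourcePool:
    """Toy pool (hypothetical) that counts how many slots are held."""

    def __init__(self, size):
        self.size = size
        self.in_use = 0

    def acquire(self):
        if self.in_use >= self.size:
            raise RuntimeError("pool exhausted")
        self.in_use += 1

    def release(self):
        self.in_use -= 1

    @contextmanager
    def borrowed(self):
        # Release the slot even if the borrowing code raises.
        self.acquire()
        try:
            yield
        finally:
            self.release()


def leaky_task(pool):
    pool.acquire()
    raise ValueError("crashed part way through")  # release is never reached


def careful_task(pool):
    with pool.borrowed():
        raise ValueError("crashed part way through")  # slot is still released


def run_quietly(task, pool):
    """Run a task and swallow its crash, as a long-lived server might."""
    try:
        task(pool)
    except ValueError:
        pass


pool = ResourcePool(size=2)
run_quietly(leaky_task, pool)
print(pool.in_use)  # 1: the crashed task never gave its slot back
run_quietly(careful_task, pool)
print(pool.in_use)  # still 1: the careful task cleaned up after itself
pool = ResourcePool(size=2)  # the "restart": everything starts clean
print(pool.in_use)  # 0
```

Note that even the careful version only helps when the program gets a chance to run its cleanup.  A process that is killed outright never reaches the `finally`.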

We all want to get our data quickly.  Your data might be your skill list, your inventory list, a market, or even how much damage your weapons do.  These are stored in databases (or structured lists of items).  Now imagine you have a notepad listing all your inventory.  At first you add the data in date received order, and cross out anything you sell or destroy.  After a while it gets painful to find your widget x, so you copy it to an alphabetical list and leave blank space for new inventory.  Then even that gets painful.  Now imagine you are managing this list not just for your inventory but for every player and every mail, inventory, skill and market in the game.  This is the database problem.  Indexes speed the search but insertions and deletions cause trouble.  It is easiest to clean up an index when no-one is using the system.
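
The notepad analogy can be sketched in a few lines of Python.  This is a simplification under my own assumptions, not how a real database engine stores its indexes: `NotepadIndex` is a hypothetical class that keeps an alphabetical list, crosses entries out instead of erasing them, and needs an occasional `compact()` to throw the crossed-out lines away.

```python
import bisect


class NotepadIndex:
    """Alphabetical list with crossed-out entries (a toy index)."""

    def __init__(self):
        self.keys = []  # sorted item names, including crossed-out ones
        self.live = []  # parallel flags: False means crossed out

    def _first_live(self, name):
        """Return the position of a live entry for name, or None."""
        pos = bisect.bisect_left(self.keys, name)
        while pos < len(self.keys) and self.keys[pos] == name:
            if self.live[pos]:
                return pos
            pos += 1
        return None

    def add(self, name):
        pos = bisect.bisect_left(self.keys, name)
        self.keys.insert(pos, name)  # everything after pos shifts along
        self.live.insert(pos, True)

    def remove(self, name):
        pos = self._first_live(name)
        if pos is None:
            return False
        self.live[pos] = False  # cross out, do not erase
        return True

    def contains(self, name):
        return self._first_live(name) is not None

    def wasted_slots(self):
        return self.live.count(False)

    def compact(self):
        """Rewrite the list without crossed-out lines (the clean up)."""
        kept = [k for k, alive in zip(self.keys, self.live) if alive]
        self.keys = kept
        self.live = [True] * len(kept)


index = NotepadIndex()
for name in ["widget x", "ammo", "shield booster", "ammo"]:
    index.add(name)
index.remove("ammo")
index.remove("shield booster")
print(index.wasted_slots())  # 2 crossed-out lines still take up space
index.compact()
print(index.keys)  # ['ammo', 'widget x']
```

The `compact()` step rewrites the whole list, which is why it is so much easier to do when nobody else is reading or writing at the same time.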


Frequently, the server software changes.  It is much easier to install new software when no-one else is using it.  Sometimes the client software changes.  Having servers to cope with two (sometimes incompatible) client versions is a pain.  Server shutdowns allow a clean break between old and new, making things simpler.

A solution


There are ways to do all of the above without shutting a server down regularly.  Nuclear power plants and life support equipment all do so.  Generally this involves having a complete copy of your system, and switching your software over so quickly that the end user doesn't see anything more than a blip.  Of course you do want your game fees to double, don't you?  No?  Then live with shutdowns.

But you can have staggered shutdowns, where the servers do not all shut down at the same time.  With the strategy below - you will always be able to do something on one of your characters, even if the total disruption time is longer.  It is about ensuring you can do most activities in most locations (almost) all of the time.  (The current default situation is that you can do all activities in all locations most of the time.)

There will be servers to process:
  • User login
  • In game mail - both sending and receiving
  • In game chat
  • Markets (we already know these are divided by region)
  • Inventory (probably has live characters active, with offline characters stored away).
  • System movement and combat (we already know these are system based, and can be dynamically allocated according to system load).
Where a game client update is required - then everyone is booted out at normal maintenance times.  (There are ways to work around this requirement but let's get the easier to fix issues done first.)

Restart each region individually.   Some regions go down early and come back up early, most go down normally and come back up normally.  Some regions go down late and come back up late.
  • You can not move into a region that is down (or possibly is about to go down).  
  • If you are active in a region that is down - then you get disconnected from that character but not from the character selection screen.
  • You can only work a market for a region that is up.
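
The three rules above can be sketched as a small schedule check.  Everything specific here is an assumption for illustration: the region names, the restart times, the 30 minute windows and the 10 minute warning period are all made up, and times are plain minutes after midnight that do not wrap past midnight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestartWindow:
    start: int     # minutes after midnight, server time
    duration: int  # minutes the region stays down

    def is_down(self, minute):
        return self.start <= minute < self.start + self.duration


# Hypothetical schedule: one early region, one normal, one late.
SCHEDULE = {
    "Region A": RestartWindow(start=10 * 60 + 30, duration=30),
    "Region B": RestartWindow(start=11 * 60, duration=30),
    "Region C": RestartWindow(start=11 * 60 + 30, duration=30),
}

WARNING_MINUTES = 10  # assumed lock-out before a region goes down


def can_enter(region, minute):
    """You cannot move into a region that is down or about to go down."""
    window = SCHEDULE[region]
    about_to_go_down = window.start - WARNING_MINUTES <= minute < window.start
    return not window.is_down(minute) and not about_to_go_down


def can_trade(region, minute):
    """You can only work a market for a region that is up."""
    return not SCHEDULE[region].is_down(minute)


def playable_regions(minute):
    """Regions a disconnected player could be pointed at instead."""
    return sorted(r for r in SCHEDULE if can_enter(r, minute))


print(playable_regions(11 * 60 + 5))   # ['Region A', 'Region C']
print(playable_regions(11 * 60 + 25))  # ['Region A']
```

At five past eleven in this made-up schedule, Region B is down but A and C are both open.  Twenty minutes later Region C is inside its warning period, so you can still trade there but not move in.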



There would still be a maximum single session login time.  Currently this is a little less than 24 hours; and could remain.  There will be (at least) 2 inventory servers.  When you log in, and when you change regions, your current inventory/stats/standings etc (for most purposes 'you') are allocated to an inventory server.  Offline character data maintenance can be done whenever.
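
One possible way to pick the inventory server is sketched below.  The selection rule (choose the server whose next restart is furthest away) is my assumption, not something I know about the real implementation, and `InventoryServer`, `allocate` and the restart times are hypothetical.

```python
from dataclasses import dataclass, field

MINUTES_PER_DAY = 24 * 60


@dataclass
class InventoryServer:
    name: str
    restart_minute: int  # daily restart, minutes after midnight
    characters: set = field(default_factory=set)


def minutes_until_restart(server, now):
    """Minutes from now until the server's next daily restart."""
    return (server.restart_minute - now) % MINUTES_PER_DAY


def allocate(character, servers, now):
    """Called at login and on every region change.

    Assumed rule: move the character to the server that will stay
    up the longest from this moment.
    """
    for server in servers:
        server.characters.discard(character)
    target = max(servers, key=lambda s: minutes_until_restart(s, now))
    target.characters.add(character)
    return target


servers = [
    InventoryServer("inv-1", restart_minute=11 * 60),  # restarts at 11:00
    InventoryServer("inv-2", restart_minute=0),        # restarts at 00:00
]
print(allocate("DoToo Foo", servers, now=10 * 60).name)       # inv-2
print(allocate("DoToo Foo", servers, now=23 * 60 + 20).name)  # inv-1
```

Because allocation is repeated on each region change, a long session drifts away from whichever server is next in line for its restart.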


While the mail servers are down, you can not send a mail and you won't get new mail.  However it may be possible to allow you to see existing mail (depending on the mail implementation).

Split the chat software up into multiple parts: local, player to player, and general.  Local chat would restart during regional server restarts.  Player to player and general chat channels would restart on different timers.

When the login server is down - you obviously can't log in, but there would be no other impact.  If there was one server you would make high availability (i.e. extra redundancy) this would be it.


As far as I know, one or more system movement and combat servers are allocated to systems.  These could be restarted when the relevant region is restarted.


The player experience

The ideal player experience is one where a player does not have to deal with server shutdowns.  The next best thing is one where you can do what you want 'somewhere' even if you can't do it everywhere whenever you want.

With the outline above, apart from when the actual login authentication servers are down, users will be able to log on to the game and play somewhere (unless they have all 3 alts in the same game timezone).  When a shutdown is about to start, instead of being told that you are about to be booted, you are told that this region is about to be rebooted and where you can play instead.


I would recommend that at least one newbie system has a different restart time than the others; newbie retention is the hardest part of the game, and you would much prefer that system to be as available as possible.  Newbies would be 'suggested' by the game to start in timezones that fit their preferences (either implicit by when they log in or explicit as part of account creation).

When newbies and veterans of differing experience learn what parts of the universe are available at different times, they will gravitate towards the areas that are up when they are, minimizing their gaming disruption.

Game availability - while not entirely zero-sum is still a winners and losers game.  Given all other things remain equal; for me to have access to a patch of the universe when I am awake probably means that others do not have access to that same patch when they are.

Fortunately Eve lends itself to player determination very well.  What timezones get access to what parts of the universe is a topic for another post.

3 comments:

  1. Using several servers for different tasks (mail, chat, regions) is problematic in a competitive game, where it would allow meta-gaming.

    Some examples:
    when local chat is down, you don't see ships entering the system. Perfect time for surprise attack.

    Your enemy lives in region X and has a roaming party or mining fleet out of that region? Hit them when their home region is down so they can't call for reinforcements.

    Character data is unavailable? Time for scamming using false identity.

    The current system means that all is operational or no play is available.

  2. First of all, multiple 'servers' (or at least services) are already used. Ever logged in immediately after a shutdown and seen that the AH is down? I have.

    I split chat up in the above exercise to cover some of those concerns, in part to always offer 2 out of 3 chat options. However I am not worried about chat servers needing long reboots. IRC servers (functionally equivalent to an Eve chat channel) have uptimes of years with no problems.

    If players are desperate for reliable communications there are plenty of out of game options.

    In game problems, including worries about being ganked, are covered in part 3 of this series of posts.

    In terms of impersonation; you still must be authenticated to get in. No login server means no entry. Once authenticated by the server, that authentication sticks with your character for the duration of your session.

  3. Just to let you know that CCP have made a commitment to eradicating their need for Downtimes. It wasn't too long ago that daily DT was an hour long, now it is down to 10 minutes usually.

    I agree that if you are in an aussie-type tz it can't be pleasant, but be aware that it won't last forever and one day we'll be DT-free

