Running Flash apps 24/7

[UPDATE: I presented this alongside Matt Pollitt at the London Flash Platform User Group. Thanks to everyone who came down to the talk. Sorry I couldn't stay longer, but I had a train to catch. If you have any questions or want to say something, just leave a comment and I'll get in touch. Video is now available, so go check it out if you missed it.]

At ustwo we are developing a very interesting little internal project that needs to run 24/7. Developers should always be careful about memory leaks and hogging resources, but with an AIR/Flash app running non-stop every detail counts.

To top it off, the app must handle images, SWFs (AS2, AS3) and videos without going through developers. That means marketing can contact one of the designers, ask him or her to produce something and put it on display without developer approval. Oh, and all assets are on the network, not on the machine. Scary.

Preliminary tests show we are doing OK: we've been running the app non-stop for 4 days without a crash or a reboot. The player takes and releases memory when it needs to, transitions and animations look as smooth as on the first day, etc.

Below are some of our thoughts.

FAIL, FAIL, FAIL

Shit will happen, so be ready to fail. Assets won't load, a video will have the wrong codec, an image will be 80 MB, the network will be down...

That's Google's approach to servers. Instead of having a big fat multimillion dollar server, they have a farm of 100 mini-servers and they *know* one or two *will* crash every day. They've built their systems to cope with failure and fail gracefully.

Also embed some default content for when everything else fails.

RELEASE, RELEASE, RELEASE

You must release resources. Explicitly remove every single listener you add, use Flash Player 10's Loader.unloadAndStop() and, if you are in AIR, don't hold references to File objects (more about this in another post).

If you are a good dev, the Flash player will behave most of the time...

MONITOR, MONITOR, MONITOR

...but sometimes it won't. Or something you didn't even think was possible will happen in-your-face. You can't prevent all errors, but you can react when they happen.

At the beginning we trusted our app to be its own watchdog, but then we got a nasty crash and, obviously, the watchdog went down with it. So we've created a little haXe/Neko app called "the helper" that is launched automatically every minute and checks the state of our app.

At the moment the helper is launched automatically by launchd (a fancy cron for OS X), but that has some very interesting side effects (again, more on that in another post).

The helper checks if our app is running using something like:

[code lang="bash"]ps -x -o comm[/code]

It then parses the result for a relevant match. If the app is not running, it starts it again using:

[code lang="bash"]open /path/to/our/app[/code]
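Put together, the check-and-restart logic looks roughly like the shell sketch below. This is not the helper itself (that one is written in haXe/Neko), just the same idea in a few lines. APP_NAME, APP_PATH, LAUNCH_CMD and the function names are placeholders made up for the example.

```shell
#!/bin/sh
# Sketch only: every name below is a placeholder, not part of the real helper.
APP_NAME="${APP_NAME:-OurApp}"            # executable name to look for (assumption)
APP_PATH="${APP_PATH:-/path/to/our/app}"  # what to launch when it is missing
LAUNCH_CMD="${LAUNCH_CMD:-open}"          # "open" is the OS X launcher

# Reads a process list (one command per line) from stdin and succeeds
# when the last path component of any line is exactly $1.
# ps may print either a bare name or a full path, so only the last
# component is compared.
is_running() {
    awk -v name="$1" '
        { n = split($0, parts, "/"); if (parts[n] == name) found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Relaunches the app when it is not in the process list.
check_and_restart() {
    if ps -x -o comm | is_running "$APP_NAME"; then
        return 0
    fi
    "$LAUNCH_CMD" "$APP_PATH"
}
```

Calling check_and_restart once per run is enough, since the scheduler takes care of repeating it every minute. Matching on the exact executable name avoids false positives from other processes that merely contain the same string.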

Since we are using AIR 2 beta, we could implement a more refined communication system using the new Native Process API, but for the time being checking if the app is running "the hard way" is enough.

UPDATE, UPDATE, UPDATE

Find a good mechanism to auto-update your app. Most 24/7 apps are unattended apps and they might well be in a different room / building / city / country. But even if you are just a few steps away, you still want to be able to remotely deploy a new version.

We decided to implement our own update system based on Git commits instead of relying on AIR's update framework.
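As a rough illustration of the general idea (not our actual update system), a deployed checkout can compare its current commit with the one on the remote and fast-forward when they differ. The function name, the return codes and the assumption that the deployed copy is a plain Git clone with an upstream branch are all made up for this example.

```shell
#!/bin/sh
# Hypothetical sketch: fast-forward a deployed checkout when its upstream
# branch has new commits.
# Returns 0 if an update was applied, 1 if already up to date, 2 on error.
update_checkout() {
    repo_dir="$1"

    # Download new commits without touching the working copy.
    git -C "$repo_dir" fetch --quiet || return 2

    local_rev=$(git -C "$repo_dir" rev-parse HEAD) || return 2
    remote_rev=$(git -C "$repo_dir" rev-parse '@{upstream}') || return 2

    if [ "$local_rev" = "$remote_rev" ]; then
        return 1
    fi

    # --ff-only refuses to merge if the checkout has diverged from upstream.
    git -C "$repo_dir" merge --quiet --ff-only "$remote_rev" || return 2
    return 0
}
```

A caller would restart the app only when the function returns 0. The sketch assumes nobody commits directly on the deployed machine. If someone does, the function no longer reports the state correctly.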

LOG, LOG, LOG

Log as much as you can. You'll be happy you did when you need to find out why your app crashed. Believe me, in this case more is more, even if you face huge log files. If your log format is reasonably consistent you can always build automatic parsing tools for your logs or just run a simple grep to find error traces.

BTW, we settled on a log file per day to avoid massive log files.
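The one-file-per-day convention plus a consistent line format is what makes the "simple grep" work. Here is a minimal shell sketch of that convention. The directory, the file naming scheme, the line format and the function names are assumptions for the example, not what our app actually writes.

```shell
#!/bin/sh
# Hypothetical log location and naming scheme: logs/app-YYYY-MM-DD.log
LOG_DIR="${LOG_DIR:-./logs}"

# Appends one line to today's log file.
# $1: level (e.g. INFO, ERROR); remaining arguments: the message.
log_line() {
    level="$1"
    shift
    mkdir -p "$LOG_DIR"
    printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$*" \
        >> "$LOG_DIR/app-$(date '+%Y-%m-%d').log"
}

# Prints the ERROR lines logged on a given day.
# $1: day as YYYY-MM-DD.
errors_for_day() {
    grep -F '[ERROR]' "$LOG_DIR/app-$1.log"
}
```

For example, log_line ERROR "asset failed to load" followed by errors_for_day "$(date '+%Y-%m-%d')" prints just that line, however much INFO noise surrounds it.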

WARN, WARN, WARN

As I said before, most of these apps run unattended, so you want to be notified as soon as possible when there's an error. You can:

* Use a private Twitter account that people subscribe to.
* Send an email. You can do that from Flash using, for example, SMTP Mailer and a local SMTP server.

Whatever you do, don't rely on one method only, especially Twitter, in case it is down when the app needs it.

REMOTE, REMOTE, REMOTE

Enable VNC, SSH or both. If you can't, be prepared to walk / drive / take the bus a lot.

--

I'll leave the details of running your apps using launchd and the peculiar way in which AIR holds "hard" references to files for other posts.
