It's become clear to both me and Danny recently that usage patterns vary significantly from one Foundry GM to another. This short article explains a little about how Foundry hosting works and why a minority of users (< 5%) have problems with Worldmill hosting while the rest have none.
Programs need resources: disk space, network capacity, memory and CPU power. To keep things short, I'm going to focus on memory, but the others are also important. When Foundry is running, it needs memory. All hosting environments have limited memory, and you'll see posts in various places on the web (mainly reddit) indicating that it's entirely possible to run a Foundry instance in 100MB of memory. This is true, as long as you don't have large maps, video, audio, very many maps (or actors), lots of chat, and so on. The more of these things you have, the more memory your hosting environment must provide to accommodate them.
Additionally, when Foundry starts up, there is a memory spike that temporarily pushes consumption far above "steady state" (the phrase used for stable resource consumption). To give you some idea, a steady state of 400MB may yield a start-up spike of ~4GB before settling down.
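To make the steady state versus spike distinction concrete, here is a minimal Python sketch. It assumes you have a series of memory readings (in MB) taken at regular intervals from start-up onwards. The function name `summarize_memory`, the `settle_fraction` parameter and the sample readings are all made up for illustration (the readings simply mirror the 400MB / ~4GB example above); none of this is taken from Foundry or Worldmill tooling.

```python
from statistics import median


def summarize_memory(samples_mb, settle_fraction=0.5):
    """Return (steady_state_mb, peak_mb, spike_ratio) for a list of readings.

    Assumption: the last `settle_fraction` of the readings were taken after
    the start-up spike had settled, so their median approximates steady state.
    """
    if not samples_mb:
        raise ValueError("need at least one memory reading")
    tail_start = int(len(samples_mb) * (1 - settle_fraction))
    settled = samples_mb[tail_start:]
    steady = median(settled)
    peak = max(samples_mb)
    return steady, peak, peak / steady


# Hypothetical readings: a start-up spike followed by a settled period.
samples = [900, 2600, 4000, 3100, 1200, 420, 400, 410, 400, 390]
steady, peak, ratio = summarize_memory(samples)
print(f"steady state ~{steady}MB, peak {peak}MB, spike ratio {ratio:.1f}x")
```

The median is used rather than the mean so that one stray reading in the settled period doesn't skew the steady-state estimate.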
What does this mean, and why should Foundry users who rely on hosting providers care? Well, the economics of hosting are fairly challenging. If Worldmill were to try to provide 4GB of permanently reserved memory for every customer, the minimum hosting fee would be about $15/month. That's just to break even.
We provide a very generous steady state allowance for users. I don't want to document it publicly, because it may change, but it is a high figure. We allow memory to spike to about 2x this figure. However, many things can cause a mill to exceed that limit. When this happens, our cluster controller automatically protects the other mills by shutting down the offending mill. Clearly this is a very sad experience for the users of the mill that is shut down, since it happens without warning and sometimes leaves the mill unable to restart. It's also sad for me and Danny, since we don't want disappointed customers and it's a lot of work for us to resolve these issues.
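The policy described above can be sketched roughly as follows. This is an illustration only and not our actual cluster controller: the `Mill` class, the `enforce_limits` function and the 1000MB allowance are all hypothetical (the real allowance is deliberately not published), and only the "about 2x" multiplier comes from the text.

```python
from dataclasses import dataclass

STEADY_STATE_ALLOWANCE_MB = 1000  # hypothetical figure, NOT the real allowance
SPIKE_MULTIPLIER = 2.0            # "about 2x" the steady state allowance


@dataclass
class Mill:
    name: str
    memory_mb: float
    running: bool = True


def enforce_limits(mills, allowance_mb=STEADY_STATE_ALLOWANCE_MB,
                   multiplier=SPIKE_MULTIPLIER):
    """Stop every running mill whose memory exceeds the spike limit.

    Returns the names of the mills that were stopped.
    """
    limit_mb = allowance_mb * multiplier
    stopped = []
    for mill in mills:
        if mill.running and mill.memory_mb > limit_mb:
            mill.running = False  # stand-in for a real forced shutdown
            stopped.append(mill.name)
    return stopped


# Hypothetical mills: only the one above the 2000MB spike limit is stopped.
mills = [Mill("quiet", 800), Mill("busy", 1900), Mill("runaway", 2500)]
print(enforce_limits(mills))
```

Note that a mill sitting above its steady state allowance but below the spike limit ("busy" here) is left alone; only a mill that exceeds the spike limit is shut down.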
We are investigating ways to address the "can't restart after forced shutdown" problem and hope to start rolling out solutions soon. The problem of using too many resources in the first place is beyond our control. That will only be fixed by improvements in the way Foundry uses memory (efficiency improvements) and a general recognition amongst Foundry users that hosting providers aren't a bottomless well of resources (consumption reduction).
To be clear: this isn't just a Worldmill problem. All hosting providers have finite resources, although of course the depth of each hosting provider's pockets will vary according to the technology they are using.
Because Foundry is so new, there isn't a lot of guidance available on how the contents of a world translate into memory consumption. Anecdotally, I know that too many scenes, too many actors, too much chat and module bugs are the main causes of problems, but it's very hard to be concrete about how many is too many. I'm going to spend some time capturing data about resource consumption across our user base and trying to establish causal links between patterns of usage and the problems they cause. I'll document this (probably here) in the hope that some kind of guidance will help Foundry GMs when creating their worlds.
The Foundry hosting community is fairly small and I'm hopeful we can work together to provide good guidance for these kinds of problems. I know the Foundry developers are also happy to work with hosting providers to identify issues and improve performance where possible.
Worldmill has many happy customers, so I guess most of them can ignore this blog post. If you are experiencing problems and haven't reached out to us yet, please let us know. We want to make things better and are actively working on improvements, guided by the people who have issues. The more representative that group is, the better our solutions are likely to be.