A method is disclosed to prevent an information system from becoming overloaded when it faces too many demands. The method uses resource, user, and task weights to reduce system load through selective cancellation of requests.
Method to prevent system crashes by selective load reduction
Disclosed is a method to add intelligence to an operating environment to prevent an electronic system from becoming overloaded with information processing demands. Rather than allowing exhaustion of resources and inevitable system failure (or practical failure by slowdown beyond usable limits), a daemon (not limited to an operating system process) would look for conditions leading to overload and kill some processes (cancel demands) to prevent failure. Furthermore, the daemon has sufficient information to determine with some accuracy which processes (jobs or requests) are more valuable
jobs with a long CPU time and high memory usage are the highest value jobs. Adjustments should be made for resources accrued by "runaway jobs", assuming there is a means for detecting these jobs (see below). Optionally, the user can input a user weight and a task weight. The user weight specifies how important the job is to the user relative to other jobs; it lets the user distinguish a critical job from a less critical one. The task weight is assigned on a per-project basis to specify which projects are more important than others, e.g., to show that project A has priority over project B. If no user weight or task weight is assigned, a default value (e.g., 1) is used. The daemon provides an interface for the user to set and change the "user weight" and for the administrator to set the "task weight", for example through commands or settings in the environment. An example equation for job score is:
Job score = resource weight × user weight × task weight
where, for example, resource weight = real memory + CPU time.
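The example equation can be illustrated with a minimal sketch. It is not part of the disclosure. The `Job` class, the field names, and the units (megabytes for real memory, seconds for CPU time) are assumptions made for illustration, since the text does not say how memory and CPU time are scaled before being added. Weights default to 1, as described above. The `kill_order` helper simply sorts jobs by ascending score, reflecting the stated goal of cancelling less valuable jobs before more valuable ones.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record; names and units are illustrative assumptions."""
    name: str
    real_memory: float         # assumed unit: megabytes
    cpu_time: float            # assumed unit: seconds
    user_weight: float = 1.0   # default when the user assigns none
    task_weight: float = 1.0   # default when the administrator assigns none


def resource_weight(job: Job) -> float:
    # Example from the text: resource weight = real memory + CPU time.
    return job.real_memory + job.cpu_time


def job_score(job: Job) -> float:
    # Job score = resource weight x user weight x task weight.
    return resource_weight(job) * job.user_weight * job.task_weight


def kill_order(jobs: list[Job]) -> list[Job]:
    # Lowest score first: the least valued jobs are candidates for cancellation.
    return sorted(jobs, key=job_score)


jobs = [
    Job("report", real_memory=200, cpu_time=50),
    Job("simulation", real_memory=800, cpu_time=3600,
        user_weight=2.0, task_weight=3.0),
    Job("scratch", real_memory=100, cpu_time=10, user_weight=0.5),
]

for job in kill_order(jobs):
    print(f"{job.name}: {job_score(job)}")
```

With these made-up numbers, "scratch" scores lowest and would be cancelled first, while "simulation" is protected both by its accumulated resources and by its higher user and task weights. A real daemon would also need the runaway-job adjustment mentioned earlier, which this sketch omits.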
When an overload condition exists, the daemon will begin to kill jobs periodically until the overload condition no longer exists. The least valued jobs are killed first. Batch jobs running under a distributed processing facility are re-queued. Notification is sent to the user who submitted a job that was killed. No new user jobs are allowed to start while in an overload condition. The operating...