----------------------------------------------------------------------
Proem
-----

Author's email: 

Please understand this README before using ba_sim.  Read the INSTALL
file for build instructions.  Read David Petrou's PhD dissertation for
an explanation of why ba_sim and its tools exist and what David
learned through them.  The dissertation is `Cluster scheduling for
explicitly-speculative tasks,' Electrical and Computer Engineering,
Carnegie Mellon University, CMU-PDL-04-112.  The dissertation and an
archive of this software may be found off of .

This file documents the `ba_sim' simulator executable.  Also included
in this archive are Perl scripts for processing ba_sim output.  They
are not documented.  If there is interest, I can provide a description
of each source file used to form ba_sim along with descriptions of the
Perl scripts.

----------------------------------------------------------------------
Terminology
-----------

When an idea is hatched, the initial terminology and organization
often do not reflect the idea well.  There are mismatches of terms and
concepts between the simulator and the dissertation models because I
have not taken the time to realign the implementation with my
increasingly better described model.  These mismatches do not affect
results because there is a mapping between the simulator model as
implemented and the simulator model as described in the dissertation.
In the simulator:

  batch users disclose task sets and request tasks when needed; when
  making a scheduling decision, the simulator combines the disclosed
  and requested queues to present candidate tasks to a single policy

  interactive users disclose task sets and request tasks when needed;
  the simulator only presents candidate tasks from the requested
  queue to a single policy

  batchactive users disclose task sets and request tasks when needed;
  the simulator presents the queues to two policies; if the requested
  task queue is not empty, the requested queue policy sees its tasks;
  otherwise, the disclosed queue policy sees tasks from the disclosed
  queue

  all users cancel task sets when outputs suggest no need to continue

  a task's visible response time is calculated as when the task
  executes minus when the task was requested (with negative times set
  to 0)

In the dissertation:

  batch users request task sets; the scheduler presents these tasks
  to a single policy; at some point, unknown to the scheduler, the
  users need the tasks

  interactive users request tasks when needed; the scheduler presents
  these tasks to a single policy

  batchactive users disclose task sets and request tasks when needed;
  the scheduler presents the queues to two policies as above

  all users cancel task sets when outputs suggest no need to continue

  a task's visible response time is calculated as when the task
  executes minus when the task was needed (with negative times set
  to 0)

The description in the dissertation is better because no `disclosed'
action exists in existing, non-speculative schedulers, which is where
users behave in a batch or interactive manner.  The simulator
implementation is isomorphic to the dissertation description under
the above mapping, and is also simpler to implement.
Another difference between the implementation and the textual
dissertation description is that when a task is `finished,' the
implementation removes it from the executed list, whereas the
dissertation keeps it in the executed list to keep certain task state
transition diagrams and descriptions simpler.

The `harness' script referenced by the dissertation is actually
called `ba_harness.pl'.  The `present' script is actually called
`ba_present.pl'.  The `improvement' script is actually called
`ba_analyze.pl'.

The three database tables in MySQL are named poorly.  I list the
tables, their function, and what they should be named.

analyze: the improvement factor between proposed and baseline user
behaviors and scheduling policies; better named `improvement'

output: ba_sim output; better named `simulation'

trace: parameters determining a particular simulation run; better
named `parameters' (the use of `trace' makes no sense and has nothing
to do with the ability of ba_sim to take trace input or generate
trace output)

----------------------------------------------------------------------
Options
-------

Run ba_sim with the `-h' option to get a list of options.  Some
listed options have been disabled deep in the source.  Thus, output
from the `-h' option is superseded by the information in this file.

The disabled options set simulator behaviors not used for thesis
results, and thus may not have been adequately tested.  `Disabled'
means that, if an attempt to use these options is made, the simulator
may die with a meaningful error message on startup, or may run and
output erroneous values.

The trace input and trace output options are turned off.  The
simulator mode of running until a certain number of users are spawned
according to an arrival process is turned off.  Heeding the specified
maximum number of requested and disclosed tasks under this simulation
mode is turned off.
----------------------------------------------------------------------
Scheduling policies
-------------------

The following are the supported simulator policies for either the
requested or disclosed queues.  The second set is only for the
disclosed queue when simulating batchactive scheduling.

fcfs: fcfs
ufb: user-fb
srpt: srpt
rfcfs: requested-fcfs
urfb: user-requested-fb

hrp: highest request probability
hrr: highest requested resources

The following policies are implemented but not described in the
dissertation and have not been recently tested:

rsrpt: requested-srpt
gsvsd: greatest visible slowdown
rand: random task
randu: random user

----------------------------------------------------------------------
Metrics
-------

The simulator outputs metrics.  Some metrics are scalars.  Others are
sets of six comma-separated numbers.  In order, they represent mean,
total, min, max, standard deviation, and count (# of samples).

The exceptions are the requested and disclosed candidates queue
metrics.  They consist of two numbers each: the average queue length
and the average weight (amount of work) in the queue.

The 6-tuples consist of one sample per user, except for visible
response time and visible slowdown, whose samples are per task.  For
example, for `all resource usage,' each user on completion reports
all the resources it used.  If there are 10 users, there will be 10
samples, and the mean `all resource usage' will be the total reported
by the users divided by 10.

To track 25% and 75% quantile statistics, set USE_QUANT in ba_stats.c
to 1.

In the following list, the name on the left of the colon is the
metric as reported in the dissertation.  On the right of the colon is
its name as output by the simulator.  The word in parentheses names
the appropriate part of the 6-tuple.
mean visible response time: visible response time (mean)

mean visible slowdown: visible slowdown (mean)

visible task throughput: finished tasks (total)

variance of visible response time: visible response time (std. dev.)

deadlines met: deadlines met (total)

variance of user requested resource usage: billable resources
(std. dev.)

mean scaled billed resources: depends on user behavior; when
simulating interactive or batchactive users, this metric should be
ignored, because the mean scaled billed resources is always 1 (the
simulator is not smart enough to report this metric as 1); when
simulating batch users, the metric is scaled resource usage (mean)

load: load

requested load: depends on user behavior; for interactive and batch
users, it is load; for batchactive users, it is requested load

uncharged load: depends on user behavior; for interactive and batch
users, it is 0; for batchactive users, it is load - requested load

# of scheduler decisions: see below for the option that enables this
metric

Additional metrics are reported that are not considered in the
dissertation.  I do not document them.  Beware in using them: some
refer to disclosed and possibly eventually requested tasks, while
others refer to disclosed and never requested tasks.

----------------------------------------------------------------------
Warmup period
-------------

Simulator metrics take time to converge to steady state according to
the warmup period documented in the dissertation.  This warmup period
is set by the constant warmup_abs_time in ba_report.c.  In this
release, the warmup period is 2 days.

When the macro BA_SIM_INSPECT (defined in ba_sim.h) is 1, no warmup
time is elided.  When BA_SIM_INSPECT is 0, the warmup time held in
warmup_abs_time is elided.  See below for the main use of
BA_SIM_INSPECT.

Only the relevant reported metrics in the dissertation heed this
warmup period.  At least the following metrics are tabulated only
after this warmup period expires.
Metrics not listed probably track the entire length of the
simulation.  (Yes, it would be nice if ba_sim output reflected which
metrics did and did not heed the warmup period.)

requested candidates queue
disclosed candidates queue
all resource usage
requested resource usage
disclosed only resource usage
scaled resource usage
billable resources
load
requested load
requested load ignoring idle time
finished tasks
visible response time
visible slowdown
deadlines met

----------------------------------------------------------------------
Watch out for uncomputed metrics
--------------------------------

Certain simulator output metrics are conditionally not computed to
save memory during simulation.  These metrics were not needed for my
thesis.  To compute these metrics, set SAVE_MEMORY in ba_report.c
to 0.  To not compute them, set this macro to 1.

The metrics affected by this macro are:

non-work conserving time
task sets alive time
no task sets alive time
tasks ready time

The internal variables in ba_report.c that hold these values are:

non_work_conserving_delta_time
task_set_alive_delta_time
no_task_sets_delta_time
task_set_ready_tasks_delta_time

which are computed from task_set_alive_periods,
task_set_ready_tasks_periods, no_task_sets_periods, and
non_work_conserving_periods.

----------------------------------------------------------------------
Adding metrics
--------------

In the dissertation I argue why visible response time, which reflects
task output need, is more important than response time, which
reflects when users submit tasks to non-speculative schedulers in an
attempt to minimize the response times they actually experience.
This is why I advocate and report visible response time.

If one wishes to know the traditional response time of tasks, the
simulator can be extended as follows:

for batch and batchactive users, response time for a task begins when
the task set in which it resides is submitted
for interactive users, response time for a task begins when that task
is requested

----------------------------------------------------------------------
Miscellaneous controls
----------------------

Certain code assertions are turned off to increase simulation speed.
To turn them on, set THOROUGH_CHECKS to 1 in ba_user.c.

To output statistics concerning the number of scheduling decisions
and how long they took to make, set COUNT_SCHEDULER_DECISIONS in
ba_sched.c to 1.

To output the number of tasks in queue at each point in time, set one
of the following macros in ba_sched.c to 1: PRINT_NUM_ALL_CANDIDATES,
PRINT_NUM_REQ_CANDIDATES, or PRINT_NUM_DIS_CANDIDATES.  The output
can be fed to qlen_to_graph.pl.

To output the major scheduling events used to generate the simulator
verification in the dissertation, set BA_SIM_INSPECT to 1 in ba_sim.h.

----------------------------------------------------------------------
[eof]