RELEASE NOTES FOR SLURM VERSION 2.0
11 February 2009 (after SLURM 1.4.0-pre8 released)


IMPORTANT NOTE:
SLURM state files in version 2.0 are different from those of version 1.3.
After installing SLURM version 2.0, plan to restart without preserving 
jobs or other state information. While SLURM version 1.3 is still running, 
cancel all pending and running jobs (e.g.
"scancel --state=pending; scancel --state=running"). Then stop and restart 
daemons with the "-c" option or use "/etc/init.d/slurm startclean".

If using the slurmdbd (SLURM DataBase Daemon), you must update it first.
The 2.0 slurmdbd will work with SLURM daemons at version 1.3.7 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and have it running before updating
any clusters that make use of it.  No real harm will come from updating
your systems before the slurmdbd, but they will not talk to each other
until the slurmdbd has also been updated.

There are substantial changes in the slurm.conf configuration file. It 
is recommended that you rebuild your configuration file using the tool
doc/html/configurator.html that comes with the distribution.

SLURM can continue to be used as a simple resource manager, but optional
plugins support sophisticated scheduling algorithms. These plugins do require 
the use of a database containing user and bank account information, so 
more administration work is required. SLURM's modular design lets you 
control the functionality that you want it to provide.

HIGHLIGHTS
* Sophisticated scheduling algorithms are available in a new plugin. Jobs
  can be prioritized based upon their age, size and/or fair-share resource 
  allocation using hierarchical bank accounts. For more information see:
  https://computing.llnl.gov/linux/slurm/job_priority.html
* An assortment of resource limits can be imposed upon individual users 
  and/or hierarchical bank accounts such as maximum job time limit, maximum 
  job size and maximum number of running jobs. For more information see:
  https://computing.llnl.gov/linux/slurm/resource_limits.html
* Advanced reservations can be made to ensure resources will be available when
  needed. For more information see:
  https://computing.llnl.gov/linux/slurm/reservations.html
* Idle nodes can now be completely powered down and automatically restarted
  when there is work available. For more information see:
  https://computing.llnl.gov/linux/slurm/power_save.html
* SLURM has been modified to allocate specific cores to jobs and job steps in
  the centralized scheduler rather than the daemons running on the individual
  compute nodes. This permits effective preemption and gang scheduling of jobs.
* New configuration parameters, PrologSlurmctld and EpilogSlurmctld, can be 
  used to support the booting of different operating systems for each job. 
  See "man slurm.conf" for details. 
* Preemption of jobs from lower priority partitions in order to execute jobs
  in higher priority partitions is now supported. The jobs from the lower 
  priority partition will resume once the preempting job completes. For more
  information see:
  https://computing.llnl.gov/linux/slurm/preempt.html
* Added support for optimized resource allocation with respect to network
  topology. Requires that switch configuration information be provided (see
  TopologyPlugin in slurm.conf and the new topology.conf file).
* Support added for Sun Constellation systems with optimized resource
  allocation for a 3-dimensional torus interconnect. For more information see:
  https://computing.llnl.gov/linux/slurm/sun_const.html
* Support added for IBM BlueGene/P systems, including High Throughput Computing
  (HTC) mode.
* Support added for checkpoint/restart using BLCR via the checkpoint/blcr
  plugin. For more information see:
  https://computing.llnl.gov/linux/slurm/checkpoint_blcr.html
  https://ftg.lbl.gov/CheckpointRestart/CheckpointRestart.shtml

CONFIGURATION FILE CHANGES (see "man slurm.conf" for details)
* The default AuthType is now "auth/munge" rather than "auth/none".
* The default CryptoType is now "crypto/munge". OpenSSL is no longer required
  by SLURM in the default configuration.
* DefaultTime has been added to specify a default job time limit in the
  partition. If not set, the partition's MaxTime is used.
* PrologSlurmctld has been added and can be used to boot nodes into a 
  particular state for each job.
* DefMemPerTask has been removed. Use DefMemPerCPU or DefMemPerNode instead.
* KillOnBadExit added to immediately terminate a job step whenever any task
  terminates with a non-zero exit code.
* Added new node state of "FUTURE". These node records are created in SLURM
  tables for future use without a reboot of the SLURM daemons, but are not
  reported by any SLURM commands or APIs.
* BatchStartTime has been added to control how long to wait for a batch job
  to start (complete Prolog, load environment for Moab, etc.).
* CompleteTime has been added to control how long to wait for a job's 
  completion before allocating already released resources to pending jobs.
* OverTimeLimit added to permit jobs to exceed their (soft) time limit by a
  configurable amount. Backfill scheduling will be based upon the soft time
  limit.
* For select/cons_res or sched/gang only: Each node's processor count must be
  specified in the configuration file. Additional resources found by SLURM
  daemons on the compute nodes will not be used.
* DebugFlags added to provide detailed logging for specific subsystems.
* Added job priority plugin.  The default PriorityType is "priority/basic",
  which preserves SLURM's existing logic (job priorities are assigned at
  submit time with decreasing value).  "priority/multifactor" is a new plugin
  which sets a job's priority based on many different configuration
  parameters as described here:
  https://computing.llnl.gov/linux/slurm/job_priority.html
* The task/affinity plugin will automatically bind a job step to the CPUs
  it has been allocated. The entity bound to (sockets, cores or threads)
  will be automatically set based upon the allocation size and task count.
  SLURM's SPANK cpuset plugin is no longer needed.
* Resource allocations can now be optimized according to network topology.
  The following switch topology configuration options have been added: 
  TopologyPlugin and in a new topology.conf file: SwitchName, Nodes, 
  Switches. More information is available in man pages for slurm.conf, 
  topology.conf, and https://computing.llnl.gov/linux/slurm/topology.html
* SrunIOTimeout has been added to optionally ping srun's tasks for better 
  fault tolerance (e.g. killed and restarted SLURM daemons on a compute node).
* ResumeDelay added to control how much time must pass after a node has been
  suspended before resuming it (e.g. powering it back up).
* BLUEGENE - Added option DenyPassthrough in bluegene.conf.  It can be set
  to any combination of X,Y,Z to disallow passthroughs when running in
  dynamic layout mode. See "man bluegene.conf" for details.
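
The DefaultTime and OverTimeLimit notes above can be illustrated with a short
Python sketch. It is not SLURM code. The function names are hypothetical, and
it assumes that all limits are whole minutes and that OverTimeLimit is simply
added to the soft limit.

```python
from typing import Optional


def default_job_time_limit(default_time: Optional[int], max_time: int) -> int:
    """Return the time limit (minutes) for a job that requests none.

    Follows the DefaultTime note: when DefaultTime is not set for the
    partition, the partition's MaxTime is used instead.
    """
    return max_time if default_time is None else default_time


def hard_time_limit(soft_limit: int, over_time_limit: int = 0) -> int:
    """Return the run time (minutes) after which the job may be terminated.

    Assumption: OverTimeLimit is a plain number of minutes added to the
    job's soft time limit. Backfill planning would still use soft_limit.
    """
    return soft_limit + over_time_limit


# A partition with MaxTime=120 and no DefaultTime, plus OverTimeLimit=10.
limit = default_job_time_limit(None, max_time=120)
print(limit, hard_time_limit(limit, over_time_limit=10))
```

In this sketch a job submitted without a time limit gets the partition's
MaxTime as its soft limit, and the extra ten minutes only moves the point at
which it would be terminated.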

COMMAND CHANGES (see man pages for details)
* --task-mem and --job-mem options have been removed from salloc, sbatch and
  srun. Use --mem-per-cpu or --mem instead.
* Added the srun option --preserve-env to pass the current values of 
  environment variables SLURM_NNODES and SLURM_NPROCS through to the 
  executable, rather than computing them from command line parameters.
* --ctrl-comm-ifhn-addr option has been removed from the srun command (it is 
  no longer useful).
* Batch jobs have an environment variable SLURM_RESTART_COUNT set when 
  restarted.
* To create a partition using the scontrol command, use the "create" command
  rather than "update" with a new partition name.
* The time format of all SLURM commands is set to ISO 8601
  (yyyy-mm-ddThh:mm:ss) unless the configure option "--disable-iso8601" is
  used at build time.
* Using sacct -S to report the status of a job no longer works.  Use sstat
  instead.
* sacct --nodes option can be used to filter jobs by allocated node.
* sacct default starttime is midnight of the previous day rather than the
  start of the database.
* sacct and sstat have been rewritten to have a more sacctmgr-like feel.
* Added the sprio command to view the factors that comprise a job's scheduling
  priority - works only with the priority/multifactor plugin.
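
Scripts that parse command output or pass start times to sacct may need to
handle the new time format. The Python sketch below shows the
yyyy-mm-ddThh:mm:ss layout and one reading of the new sacct default start
time. The function names are hypothetical, and "midnight of the previous day"
is assumed to mean 00:00:00 at the start of yesterday.

```python
from datetime import datetime, timedelta


def format_slurm_time(when: datetime) -> str:
    """Format a timestamp as yyyy-mm-ddThh:mm:ss (ISO 8601, no time zone)."""
    return when.strftime("%Y-%m-%dT%H:%M:%S")


def default_sacct_start(now: datetime) -> datetime:
    """Return 00:00:00 on the day before `now` (assumed reading of the note)."""
    yesterday = now - timedelta(days=1)
    return yesterday.replace(hour=0, minute=0, second=0, microsecond=0)


now = datetime(2009, 2, 11, 15, 30, 0)
print(format_slurm_time(now))                       # 2009-02-11T15:30:00
print(format_slurm_time(default_sacct_start(now)))  # 2009-02-10T00:00:00
```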

ACCOUNTING CHANGES
* Added ability for slurmdbd to archive and purge step and/or job records.
* Added support for Workload Characterization Key (WCKey) in accounting 
  records. This is an optional string that can be used to identify the type of
  work being performed (in addition to user ID, account name, job name, etc.).
* Added configuration parameter AccountingStorageBackupHost for fault-tolerance
  in communications to SlurmDBD.

OTHER CHANGES
* Modify PMI_Get_clique_ranks() to return an array of integers rather
  than a char * to satisfy the PMI standard. Correct logic in
  PMI_Get_clique_size() for when the srun --overcommit option is used.
* Set "/proc/self/oom_adj" for slurmd and slurmstepd daemons based upon
  the values of SLURMD_OOM_ADJ and SLURMSTEPD_OOM_ADJ environment
  variables. This can be used to prevent the daemons from being killed when
  a node's memory is exhausted.
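
The oom_adj mechanism can be pictured with the Python sketch below. It is not
the daemons' actual code. The function names are hypothetical, and treating
an unset or non-numeric variable as "leave the setting alone" is an
assumption.

```python
from typing import Mapping, Optional


def oom_adj_from_env(env: Mapping[str, str], name: str) -> Optional[int]:
    """Return the integer value of `name` in `env`, or None if unset/invalid."""
    raw = env.get(name)
    if raw is None:
        return None
    try:
        return int(raw.strip())
    except ValueError:
        # Assumption: a non-numeric value is ignored rather than rejected.
        return None


def apply_oom_adj(value: Optional[int], path: str = "/proc/self/oom_adj") -> bool:
    """Write `value` to `path`; return False when there is nothing to write."""
    if value is None:
        return False
    with open(path, "w") as handle:
        handle.write(f"{value}\n")
    return True
```

A daemon following this pattern would call something like
apply_oom_adj(oom_adj_from_env(os.environ, "SLURMD_OOM_ADJ")) once at
startup. The path parameter exists only so the sketch can be tried against an
ordinary file.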
