Twister2 Configurations

Common configurations

Checkpoint Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.checkpointing.enable | false | Enable or disable checkpointing |
| twister2.checkpointing.store | edu.iu.dsc.tws.checkpointing.stores.LocalFileStateStore | The implementation of the store to be used |
| twister2.checkpointing.store.fs.dir | ${TWISTER2_HOME}/persistent/ | Root directory of the local file system based store |
| twister2.checkpointing.store.hdfs.dir | /twister2/persistent/ | Root directory of the HDFS based store |
| twister2.checkpointing.source.frequency | 1000 | Source triggering frequency |
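As a minimal sketch of how these keys fit together in a YAML configuration file (which file they live in depends on how your deployment organizes its configuration), enabling checkpointing with the default local file store might look like this; all keys and values are taken from the table above:

```yaml
# Enable checkpointing and keep the default local file system store.
twister2.checkpointing.enable: true
twister2.checkpointing.store: "edu.iu.dsc.tws.checkpointing.stores.LocalFileStateStore"
# Root directory used by the local file system store.
twister2.checkpointing.store.fs.dir: "${TWISTER2_HOME}/persistent/"
# Source triggering frequency.
twister2.checkpointing.source.frequency: 1000
```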

Data Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.data.hadoop.home | ${HADOOP_HOME} | |
| twister2.data.hdfs.config.directory | ${HADOOP_HOME}/etc/hadoop/core-site.xml | |
| twister2.data.hdfs.data.directory | ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml | |
| twister2.data.hdfs.namenode | namenode.domain.name | |
| twister2.data.hdfs.namenode.port | 9000 | |
| twister2.data.fs.root | ${TWISTER2_HOME}/persistent/data | |
| twister2.data.hdfs.root | /twister2/persistent/data | |

Network Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.network.buffer.size | 1024000 | the buffer size to be used |
| twister2.network.sendBuffer.count | 4 | number of send buffers to be used |
| twister2.network.receiveBuffer.count | 4 | number of receive buffers to be used |
| twister2.network.channel.pending.size | 2048 | channel pending messages |
| twister2.network.send.pending.max | 4 | the send pending messages |
| twister2.network.message.group.low_water_mark | 8 | group up to 8 ~ 16 messages |
| twister2.network.message.group.high_water_mark | 16 | this is the max number of messages to group |
| twister2.network.message.grouping.size | 10 | in batch partition operations, this value will be used to create mini batches within partial receivers |
| twister2.network.ops.persistent.dirs | ["${TWISTER2_HOME}/persistent/"] | for disk based operations, this directory list will be used to persist incoming messages. This can be used to balance the load between multiple devices, by specifying directory locations from different devices. |
| twister2.network.shuffle.memory.bytes.max | 102400000 | the maximum amount of bytes kept in memory for operations that go to disk |
| twister2.network.shuffle.memory.records.max | 102400000 | the maximum number of records kept in memory for operations that go to disk |
| twister2.network.shuffle.file.bytes.max | 10000000 | size of the shuffle file (10MB default) |
| twister2.network.shuffle.parallel.io | 2 | number of parallel IO operations permitted |
| twister2.network.alltoall.algorithm.batch | ring | the partitioning algorithm |
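The common network keys above are typically tuned together. A minimal sketch, assuming they are placed in the network configuration file of your deployment and using only values listed in the table:

```yaml
# Buffer sizing for the send and receive paths.
twister2.network.buffer.size: 1024000
twister2.network.sendBuffer.count: 4
twister2.network.receiveBuffer.count: 4
# Directories used to persist incoming messages for disk based operations;
# listing directories on different devices balances the load.
twister2.network.ops.persistent.dirs: ["${TWISTER2_HOME}/persistent/"]
# Keep at most this many bytes in memory for operations that go to disk.
twister2.network.shuffle.memory.bytes.max: 102400000
# Partitioning algorithm for batch all-to-all operations.
twister2.network.alltoall.algorithm.batch: "ring"
```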

| Name | Default | Description |
|------|---------|-------------|
| twister2.network.buffer.size.stream.reduce | 1024000 | the buffer size to be used |
| twister2.network.sendBuffer.count.stream.reduce | 2 | number of send buffers to be used |
| twister2.network.receiveBuffer.count.strea.reduce | 2 | number of receive buffers to be used |
| twister2.network.buffer.size.stream.gather | 1024000 | the buffer size to be used |
| twister2.network.sendBuffer.count.stream.gather | 2 | number of send buffers to be used |
| twister2.network.receiveBuffer.count.stream.gather | 2 | number of receive buffers to be used |
| twister2.network.buffer.size.stream.bcast | 1024000 | the buffer size to be used |
| twister2.network.sendBuffer.count.stream.bcast | 2 | number of send buffers to be used |
| twister2.network.receiveBuffer.count.stream.bcast | 2 | number of receive buffers to be used |
| twister2.network.alltoall.algorithm.stream.partition | simple | the partitioning algorithm |
| twister2.network.buffer.size.stream.partition | 1024000 | the buffer size to be used |
| twister2.network.sendBuffer.count.stream.partition | 4 | number of send buffers to be used |
| twister2.network.receiveBuffer.count.stream.partition | 4 | number of receive buffers to be used |

| Name | Default | Description |
|------|---------|-------------|
| twister2.network.buffer.size.batch.reduce | 1024000 | the buffer size to be used |
| twister2.network.sendBuffer.count.batch.reduce | 2 | number of send buffers to be used |
| twister2.network.receiveBuffer.count.batch.reduce | 2 | number of receive buffers to be used |
| twister2.network.buffer.size.batch.gather | 1024000 | the buffer size to be used |
| twister2.network.sendBuffer.count.batch.gather | 2 | number of send buffers to be used |
| twister2.network.receiveBuffer.count.batch.gather | 2 | number of receive buffers to be used |
| twister2.network.buffer.size.batch.bcast | 1024000 | the buffer size to be used |
| twister2.network.sendBuffer.count.batch.bcast | 2 | number of send buffers to be used |
| twister2.network.receiveBuffer.count.batch.bcast | 2 | number of receive buffers to be used |
| twister2.network.alltoall.algorithm.batch.partition | simple | the partitioning algorithm |
| twister2.network.buffer.size.batch.partition | 1024000 | the buffer size to be used |
| twister2.network.sendBuffer.count.batch.partition | 4 | number of send buffers to be used |
| twister2.network.receiveBuffer.count.batch.partition | 4 | number of receive buffers to be used |
| twister2.network.alltoall.algorithm.batch.keyed_gather | simple | the partitioning algorithm |
| ttwister2.network.partition.ring.group.workers.batch.keyed_gather | 2 | ring group workers |
| twister2.network.buffer.size.batch.keyed_gather | 1024000 | the buffer size to be used |
| twister2.network.sendBuffer.count.batch.keyed_gather | 4 | number of send buffers to be used |
| twister2.network.receiveBuffer.count.batch.keyed_gather | 4 | number of receive buffers to be used |
| twister2.network.message.group.low_water_mark.batch.keyed_gather | 8000 | group up to 8 ~ 16 messages |
| twister2.network.message.group.high_water_mark.batch.keyed_gather | 16000 | this is the max number of messages to group |
| twister2.network.message.grouping.size.batch.keyed_gather | 10000 | in batch partition operations, this value will be used to create mini batches within partial receivers |
| twister2.network.alltoall.algorithm.batch.keyed_reduce | simple | the partitioning algorithm |
| ttwister2.network.partition.ring.group.workers.batch.keyed_reduce | 2 | ring group workers |
| twister2.network.buffer.size.batch.keyed_reduce | 1024000 | the buffer size to be used |
| twister2.network.sendBuffer.count.batch.keyed_reduce | 4 | number of send buffers to be used |
| twister2.network.receiveBuffer.count.batch.keyed_reduce | 4 | number of receive buffers to be used |
| twister2.python.port | 5400 | port offset for the python-java connection. port+workerId will be used by each worker to communicate with the python process; port-1 will be used by the client process for the initial communication |

Resource Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.client.debug | '-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5006' | use this property to debug the client submitting the job |
| twister2.resource.systempackage.copy | false | whether we have a requirement to copy the system package |

ZooKeeper related config parameters

| Name | Default | Description |
|------|---------|-------------|
| twister2.zookeeper.based.group.management | true | ZooKeeper can be used to exchange job status data and for discovery. Workers can discover one another through ZooKeeper and update their status on it, and the Dashboard can get job events through ZooKeeper. If fault tolerance is enabled, ZooKeeper is used irrespective of this parameter. |
| #twister2.resource.zookeeper.server.addresses | ip:port | when conf/kubernetes/deployment/zookeeper-wo-persistence.yaml is used, the following service name can be used as the zk address. Options: 127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002 |
| twister2.zookeeper.root.node.path | /twister2 | the root node path of this job on ZooKeeper; the default is "/twister2" |
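A minimal sketch of the ZooKeeper related keys, using the example ensemble addresses from the table above (the server addresses key is shown commented out in the shipped configuration, so uncomment it to set an explicit ensemble):

```yaml
# Use ZooKeeper for job status exchange and worker discovery.
twister2.zookeeper.based.group.management: true
# Comma separated list of ZooKeeper servers (ip:port pairs).
twister2.resource.zookeeper.server.addresses: "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002"
# Root node path of this job on ZooKeeper.
twister2.zookeeper.root.node.path: "/twister2"
```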

Core Configurations

User name that will be used in the JobID. The JobID is constructed as [username]-[jobName]-[timestamp]. If a username is specified here, that value is used; otherwise the username is taken from the shell environment. If the username is longer than 9 characters, only its first 9 characters are used. The long timestamp value is converted to an alphanumeric string; the timestamp long value = current time - 01/01/2019.

twister2.user.name:

Twister2 Job Master related settings

| Name | Default | Description |
|------|---------|-------------|
| twister2.job.master.used | true | |
| twister2.job.master.runs.in.client | true | if true, the job master runs in the submitting client; if false, the job master runs as a separate process in the cluster. By default it is true. When the job master runs in the submitting client, the job has to be submitted from a machine in the cluster. |
| twister2.worker.to.job.master.response.wait.duration | 100000 | |

WorkerController related config parameters

| Name | Default | Description |
|------|---------|-------------|
| twister2.worker.controller.max.wait.time.for.all.workers.to.join | 100000 | timeout for all workers to join the job, in milliseconds |
| twister2.worker.controller.max.wait.time.on.barrier | 100000 | timeout on barriers for all workers to arrive, in milliseconds |

Common thread pool config parameters

| Name | Default | Description |
|------|---------|-------------|
| twister2.common.thread.pool.threads | 2 | maximum number of threads to spawn on demand |
| twister2.common.thread.pool.keepalive | 10 | maximum time that excess idle threads will wait for new tasks before terminating |
| twister.python.bin | python3 | path to the python binary |

Dashboard related settings

| Name | Default | Description |
|------|---------|-------------|
| twister2.dashboard.host | http://localhost:8080 | Dashboard server host address and port. If this parameter is not specified, the job master will not try to connect to the Dashboard. |

Task Configurations

Task Scheduler Related Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.taskscheduler.streaming | roundrobin | Task scheduling mode for streaming jobs: "roundrobin", "firstfit", "datalocalityaware" or "userdefined" |
| twister2.taskscheduler.streaming.class | edu.iu.dsc.tws.tsched.streaming.roundrobin.RoundRobinTaskScheduler | Task scheduler class for the round robin streaming task scheduler |
| #twister2.taskscheduler.streaming.class | edu.iu.dsc.tws.tsched.streaming.datalocalityaware.DataLocalityStreamingTaskScheduler | Task scheduler for the data locality aware streaming task scheduler |
| #twister2.taskscheduler.streaming.class | edu.iu.dsc.tws.tsched.streaming.firstfit.FirstFitStreamingTaskScheduler | Task scheduler for the first fit streaming task scheduler |
| #twister2.taskscheduler.streaming.class | edu.iu.dsc.tws.tsched.userdefined.UserDefinedTaskScheduler | Task scheduler for the user defined streaming task scheduler |
| twister2.taskscheduler.batch | batchscheduler | Task scheduling mode for batch jobs: "roundrobin", "datalocalityaware" or "userdefined" |
| #twister2.taskscheduler.batch.class | edu.iu.dsc.tws.tsched.batch.roundrobin.RoundRobinBatchTaskScheduler | Task scheduler class for the round robin batch task scheduler |
| twister2.taskscheduler.batch.class | edu.iu.dsc.tws.tsched.batch.batchscheduler.BatchTaskScheduler | Task scheduler class for the batch task scheduler |
| #twister2.taskscheduler.batch.class | edu.iu.dsc.tws.tsched.batch.datalocalityaware.DataLocalityBatchTaskScheduler | Task scheduler for the data locality aware batch task scheduler |
| #twister2.taskscheduler.batch.class | edu.iu.dsc.tws.tsched.userdefined.UserDefinedTaskScheduler | Task scheduler for the user defined batch task scheduler |
| twister2.taskscheduler.task.instances | 2 | Number of task instances to be allocated to each worker/container |
| twister2.taskscheduler.task.instance.ram | 512.0 | RAM value to be allocated to each task instance |
| twister2.taskscheduler.task.instance.disk | 500.0 | Disk value to be allocated to each task instance |
| twister2.taskscheduler.task.instance.cpu | 2.0 | CPU value to be allocated to each task instance |
| twister2.taskscheduler.container.instance.ram | 4096.0 | Default container instance values: RAM value to be allocated to each container |
| twister2.taskscheduler.container.instance.disk | 8000.0 | Disk value to be allocated to each container |
| twister2.taskscheduler.container.instance.cpu | 16.0 | CPU value to be allocated to each container |
| twister2.taskscheduler.ram.padding.container | 2.0 | Default container padding values: default padding value of the RAM to be allocated to each container |
| twister2.taskscheduler.disk.padding.container | 12.0 | Default padding value of the disk to be allocated to each container |
| twister2.taskscheduler.cpu.padding.container | 1.0 | CPU padding value to be allocated to each container |
| twister2.taskscheduler.container.padding.percentage | 2 | Percentage padding value to be allocated to each container |
| twister2.taskscheduler.container.instance.bandwidth | 100 (Mbps) | Static default network parameters: bandwidth value to be allocated to each container instance for data locality scheduling |
| twister2.taskscheduler.container.instance.latency | 0.002 (milliseconds) | Latency value to be allocated to each container instance for data locality scheduling |
| twister2.taskscheduler.datanode.instance.bandwidth | 200 (Mbps) | Bandwidth to be allocated to each datanode instance for data locality scheduling |
| twister2.taskscheduler.datanode.instance.latency | 0.01 (milliseconds) | Latency value to be allocated to each datanode instance for data locality scheduling |
| twister2.taskscheduler.task.parallelism | 2 | Parallelism value for each task instance |
| twister2.taskscheduler.task.type | streaming | Task type of each submitted job; by default it is a "streaming" job |
| twister2.exector.worker.threads | 1 | number of threads per worker |
| twister2.executor.batch.name | edu.iu.dsc.tws.executor.threading.BatchSharingExecutor2 | name of the batch executor |
| twister2.exector.instance.queue.low.watermark | 10000 | number of tuples executed at a single pass |
| twister2.executor.stream.name | edu.iu.dsc.tws.executor.threading.StreamingSharingExecutor | name of the streaming executor |
| twister2.executor.stream.name | edu.iu.dsc.tws.executor.threading.StreamingAllSharingExecutor | |
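A minimal sketch of a task scheduler configuration that keeps the defaults from the table above (round robin for streaming jobs, the batch scheduler for batch jobs) and sets per-task resources:

```yaml
# Scheduling mode and scheduler class for streaming jobs.
twister2.taskscheduler.streaming: "roundrobin"
twister2.taskscheduler.streaming.class: "edu.iu.dsc.tws.tsched.streaming.roundrobin.RoundRobinTaskScheduler"
# Scheduling mode and scheduler class for batch jobs.
twister2.taskscheduler.batch: "batchscheduler"
twister2.taskscheduler.batch.class: "edu.iu.dsc.tws.tsched.batch.batchscheduler.BatchTaskScheduler"
# Per-worker task instances and per-instance resources.
twister2.taskscheduler.task.instances: 2
twister2.taskscheduler.task.instance.ram: 512.0
twister2.taskscheduler.task.instance.disk: 500.0
twister2.taskscheduler.task.instance.cpu: 2.0
```

Switching to another scheduler is a matter of changing the mode and the class together, for example "datalocalityaware" with the corresponding data locality scheduler classes listed above.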

Standalone configurations

Checkpoint Configurations

No specific configurations

Data Configurations

No specific configurations

Network Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.network.channel.class | edu.iu.dsc.tws.comms.mpi.TWSMPIChannel | |

Resource Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.scheduler.mpi.working.directory | ${HOME}/.twister2/jobs | working directory |
| twsiter2.resource.scheduler.mpi.mode | standalone | mode of the mpi scheduler |
| twister2.resource.scheduler.mpi.job.id | | the job id file |
| twister2.resource.scheduler.mpi.shell.script | mpi.sh | slurm script to run |
| twister2.resource.scheduler.mpi.home | | the mpirun command location |
| twister2.resource.system.package.uri | ${TWISTER2_DIST}/twister2-core-0.7.0.tar.gz | the package uri |
| twister2.resource.class.launcher | edu.iu.dsc.tws.rsched.schedulers.standalone.MPILauncher | the launcher class |
| twister2.resource.scheduler.mpi.mpirun.file | ompi/bin/mpirun | mpi run file; this assumes an mpirun that is shipped with the product. Change this to just mpirun if you are using a system wide installation of OpenMPI, or to the complete path of mpirun in case you have something custom. |
| twister2.resource.scheduler.mpi.mapby | node | mpi scheduling policy; the two possible options are node and slot. Read more at https://www.open-mpi.org/faq/?category=running#mpirun-scheduling |
| twister2.resource.scheduler.mpi.mapby.use-pe | false | use the mpi map-by modifier PE. If this option is enabled, the cpu count of the compute resource specified in the job definition will be taken into consideration. |
| twister2.resource.sharedfs | true | indicates whether the bootstrap process needs to be run to distribute the job file and core package between nodes. Twister2 assumes the job file is accessible to all nodes if this property is set to true; otherwise it will run the bootstrap process. |
| twister2.resource.fs.mount | ${TWISTER2_HOME}/persistent/fs/ | directory for the file system volume mount |
| twister2.resource.uploader.directory | ${HOME}/.twister2/repository | the uploader directory |
| twister2.resource.class.uploader | edu.iu.dsc.tws.rsched.uploaders.localfs.LocalFileSystemUploader | the uploader class |
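For a standalone cluster that already has a system wide OpenMPI installation, a minimal sketch of the relevant overrides (the values are the options described in the table above):

```yaml
# Use the system wide mpirun instead of the bundled ompi/bin/mpirun.
twister2.resource.scheduler.mpi.mpirun.file: "mpirun"
# Map MPI ranks by slot instead of the default node mapping.
twister2.resource.scheduler.mpi.mapby: "slot"
# The nodes share a file system, so skip the bootstrap distribution step.
twister2.resource.sharedfs: true
```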

Core Configurations

No specific configurations

Task Configurations

No specific configurations

Slurm configurations

Checkpoint Configurations

No specific configurations

Data Configurations

No specific configurations

Network Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.network.channel.class | edu.iu.dsc.tws.comms.mpi.TWSMPIChannel | |

Resource Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.scheduler.mpi.working.directory | ${HOME}/.twister2/jobs | working directory |
| twsiter2.resource.scheduler.mpi.mode | slurm | mode of the mpi scheduler |
| twister2.resource.scheduler.mpi.job.id | | the job id file |
| twister2.resource.scheduler.mpi.shell.script | mpi.sh | slurm script to run |
| twister2.resource.scheduler.slurm.partition | juliet | slurm partition |
| twister2.resource.scheduler.mpi.home | | the mpirun command location |
| twister2.resource.system.package.uri | ${TWISTER2_DIST}/twister2-core-0.7.0.tar.gz | the package uri |
| twister2.resource.class.launcher | edu.iu.dsc.tws.rsched.schedulers.standalone.MPILauncher | the launcher class |
| twister2.resource.scheduler.mpi.mpirun.file | twister2-core/ompi/bin/mpirun | mpi run file; this assumes an mpirun that is shipped with the product. Change this to just mpirun if you are using a system wide installation of OpenMPI, or to the complete path of mpirun in case you have something custom. |
| twister2.resource.uploader.directory | ${HOME}/.twister2/repository | the uploader directory |
| twister2.resource.class.uploader | edu.iu.dsc.tws.rsched.uploaders.localfs.LocalFileSystemUploader | the uploader class |

Core Configurations

WorkerController related config parameters

| Name | Default | Description |
|------|---------|-------------|
| twister2.worker.controller.max.wait.time.for.all.workers.to.join | 100000 | timeout for all workers to join the job, in milliseconds |
| twister2.worker.controller.max.wait.time.on.barrier | 100000 | timeout on barriers for all workers to arrive, in milliseconds |

Dashboard related settings

| Name | Default | Description |
|------|---------|-------------|
| twister2.dashboard.host | http://localhost:8080 | Dashboard server host address and port. If this parameter is not specified, the job master will not try to connect to the Dashboard. |

Task Configurations

No specific configurations

Aurora configurations

Checkpoint Configurations

No specific configurations

Data Configurations

No specific configurations

Network Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.network.channel.class | edu.iu.dsc.tws.comms.tcp.TWSTCPChannel | |

Resource Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.system.package.uri | ${TWISTER2_DIST}/twister2-core-0.7.0.tar.gz | the package uri |
| twister2.resource.class.launcher | edu.iu.dsc.tws.rsched.schedulers.aurora.AuroraLauncher | launcher class for aurora submission |
| twister2.resource.class.uploader | edu.iu.dsc.tws.rsched.uploaders.scp.ScpUploader | the uploader class. Options: edu.iu.dsc.tws.rsched.uploaders.NullUploader, edu.iu.dsc.tws.rsched.uploaders.localfs.LocalFileSystemUploader |
| twister2.resource.job.worker.class | edu.iu.dsc.tws.examples.internal.rsched.BasicAuroraContainer | container class to run in workers |
| twister2.resource.class.aurora.worker | edu.iu.dsc.tws.rsched.schedulers.aurora.AuroraWorkerStarter | the Aurora worker class |

Uploader configuration

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.uploader.directory | /root/.twister2/repository/ | the directory where the file will be uploaded; make sure the user has the necessary permissions to upload the file here |
| twister2.resource.uploader.scp.command.options | | the scp command options that will be used by the uploader; this can be used to specify custom options such as the location of ssh keys |
| twister2.resource.uploader.scp.command.connection | root@149.165.150.81 | the scp connection string sets the remote user name and host used by the uploader |
| twister2.resource.uploader.ssh.command.options | | the ssh command options that will be used when connecting to the uploading host to execute commands such as deleting files or making directories |
| twister2.resource.uploader.ssh.command.connection | root@149.165.150.81 | the ssh connection string sets the remote user name and host used by the uploader |

Client configuration parameters for submission of twister2 jobs

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.scheduler.aurora.script | ${TWISTER2_CONF}/twister2.aurora | aurora python script to submit a job to the Aurora scheduler; its default value is defined in the code and can be reset from this config file if desired |
| twister2.resource.scheduler.aurora.cluster | example | cluster name the aurora scheduler runs in |
| twister2.resource.scheduler.aurora.role | www-data | role in the cluster |
| twister2.resource.scheduler.aurora.env | devel | environment name |
| twister2.resource.job.name | basic-aurora | aurora job name |
| twister2.resource.worker.cpu | 1.0 | number of cores for each worker; a floating point number, so each worker can have fractional cores such as 0.5 or multiple cores such as 2. Default value is 1.0 core. |
| twister2.resource.worker.ram | 200 | amount of memory for each worker in the job in megabytes, as an integer; default value is 200 MB |
| twister2.resource.worker.disk | 1024 | amount of hard disk space for each worker in megabytes; this is only used when running twister2 on Aurora. Default value is 1024 MB. |
| twister2.resource.worker.instances | 6 | number of worker instances |
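A minimal sketch of the per-worker resource requests for an Aurora job, using the keys and defaults from the table above:

```yaml
twister2.resource.job.name: "basic-aurora"
twister2.resource.worker.cpu: 1.0      # cores per worker; fractional values such as 0.5 are allowed
twister2.resource.worker.ram: 200      # megabytes per worker
twister2.resource.worker.disk: 1024    # megabytes of disk per worker (used only on Aurora)
twister2.resource.worker.instances: 6  # number of worker instances
```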

Core Configurations

No specific configurations

Task Configurations

No specific configurations

Kubernetes configurations

Checkpoint Configurations

No specific configurations

Data Configurations

No specific configurations

Network Configurations

OpenMPI settings

| Name | Default | Description |
|------|---------|-------------|
| twister2.network.channel.class | edu.iu.dsc.tws.comms.tcp.TWSTCPChannel | if the channel is set to TWSMPIChannel, the job is started as an OpenMPI job; otherwise it is a regular twister2 job and OpenMPI is not started |
| kubernetes.secret.name | twister2-openmpi-ssh-key | a Secret object must be present on the Kubernetes master; its name must be specified here |

Worker port settings

| Name | Default | Description |
|------|---------|-------------|
| kubernetes.worker.base.port | 9000 | the base port number workers will use internally to communicate with each other. When there are multiple workers in a pod, the first worker gets this port number, the second worker gets the next port, and so on. Default value is 9000. |
| kubernetes.worker.transport.protocol | TCP | transport protocol for the worker: TCP or UDP. By default it is TCP; set this if it is UDP. |

NodePort service parameters

| Name | Default | Description |
|------|---------|-------------|
| kubernetes.node.port.service.requested | true | must be true if the job requests a NodePort service. A NodePort service makes the workers accessible from external entities (outside of the cluster). By default its value is false. |
| kubernetes.service.node.port | 30003 | if the NodePort value is 0, it is assigned automatically. The user can request a specific port in the NodePort range by setting this value. By default Kubernetes uses the range 30000-32767 for NodePorts; Kubernetes admins can change this range. |
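A minimal sketch that exposes the workers outside the cluster through a NodePort service pinned to a port in the default 30000-32767 range, using only the keys above:

```yaml
# Request a NodePort service and pin it to a specific port.
kubernetes.node.port.service.requested: true
kubernetes.service.node.port: 30003
# Internal worker base port and transport protocol.
kubernetes.worker.base.port: 9000
kubernetes.worker.transport.protocol: "TCP"
```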

Resource Configurations

Kubernetes Docker Image and related settings
Twister2 Docker image for Kubernetes

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.system.package.uri | ${TWISTER2_DIST}/twister2-core-0.7.0.tar.gz | the package uri |
| twister2.resource.class.launcher | edu.iu.dsc.tws.rsched.schedulers.k8s.KubernetesLauncher | |

Kubernetes related settings
Namespace to use in kubernetes; the default value is "default".

| Name | Default | Description |
|------|---------|-------------|
| kubernetes.image.pull.policy | Always | image pull policy; by default it is IfNotPresent, it could also be Always |
| kubernetes.log.in.client | true | get log messages to the twister2 client and save them in files; it is false by default |
| kubernetes.check.pods.reachable | true | before connecting to other pods in the job, check whether all pods are reachable from each pod and wait until they become reachable. When there are networking issues, pods may not be reachable immediately, so this makes sure to wait until all pods are reachable. It is false by default. |

Job configuration parameters for submission of twister2 jobs
Jobs can be loaded from these configurations, or they can be specified programmatically by using Twister2JobBuilder.

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.job.name | t2-job | twister2 job name |
| # number of workers using this compute resource | instances * workersPerPod | a Twister2 job can have multiple sets of compute resources. instances shows the number of compute resources to be started with this specification; workersPerPod shows the number of workers on each pod in Kubernetes and may be omitted in other clusters (default value is 1) |
| #- cpu | 0.5 | number of cores for each worker, may be fractional such as 0.5 or 2.4. The related example values given are: 1024 (RAM for each worker in megabytes), 1.0 (volatile disk for each worker in gigabytes), 2 (number of compute resource instances with this specification), false (only one ComputeResource can be scalable in a job), 1 (number of workers on each pod in Kubernetes; may be omitted in other clusters). A sketch follows this table. |
| twister2.resource.job.driver.class | edu.iu.dsc.tws.examples.internal.rsched.DriverExample | driver class to run |
| twister2.resource.job.worker.class | edu.iu.dsc.tws.examples.basic.HelloWorld | worker class to run. Options: edu.iu.dsc.tws.examples.internal.BasicNetworkTest, edu.iu.dsc.tws.examples.comms.batch.BReduceExample |
| twister2.resource.worker.additional.ports | ["port1", "port2", "port3"] | by default each worker has one port; additional ports can be requested for all workers in a job by providing the requested port names as a list such as this |
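A minimal sketch of a job specification with one set of compute resources. The list key name (twister2.resource.worker.compute.resources) and the field names (cpu, ram, disk, instances, scalable, workersPerPod) are assumptions reconstructed from the commented fragments above; the values are the examples given there:

```yaml
twister2.resource.job.name: "t2-job"
twister2.resource.job.worker.class: "edu.iu.dsc.tws.examples.basic.HelloWorld"
twister2.resource.job.driver.class: "edu.iu.dsc.tws.examples.internal.rsched.DriverExample"
# Assumed list key and field names; see the commented fragments in the table above.
twister2.resource.worker.compute.resources:
- cpu: 0.5          # cores per worker, may be fractional such as 0.5 or 2.4
  ram: 1024         # megabytes per worker
  disk: 1.0         # volatile disk per worker in gigabytes
  instances: 2      # compute resource instances with this specification
  scalable: false   # only one ComputeResource can be scalable in a job
  workersPerPod: 1  # Kubernetes only; may be omitted in other clusters
```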

persistent volume related settings

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.persistent.volume.per.worker | 0.0 | persistent volume size per worker in GB as a double; default value is 0.0. Set this value to zero if you do not have persistent disk support; when this value is zero, twister2 will not try to set up persistent storage for this job. |
| twister2.resource.kubernetes.persistent.storage.class | twister2-nfs-storage | the cluster admin should provide a storage provisioner; specify the storage class name that is used by the provisioner. Minikube has a default provisioner with the storage class "standard". |
| twister2.resource.kubernetes.storage.access.mode | ReadWriteMany | persistent storage access mode; it shows the access mode for workers to access the shared persistent storage. If it is "ReadWriteMany", many workers can read and write. See https://kubernetes.io/docs/concepts/storage/persistent-volumes |

K8sUploader Settings

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.class.uploader | edu.iu.dsc.tws.rsched.uploaders.k8s.K8sUploader | |

When a job is submitted, the job package needs to be transferred to the worker pods. K8sUploader provides two upload methods: (a) transferring the job package to the job pods directly from the submitting client, or (b) transferring the job package to web server pods, from which the job pods download it. You need to deploy twister2 uploader pods for the second method to work. We first check whether an uploader web server is running; if there is one, we upload the job package to the uploader web server pods. Otherwise, we upload the job package to the job pods directly from the submitting client.

| Name | Default | Description |
|------|---------|-------------|
| (uploader web server address) | http://twister2-uploader.default.svc.cluster.local | uploader web server address; it is set by default. If you are using twister2-uploader-wo-ps.yaml, there is no need to set this parameter; the default one is ok. |
| (uploader web server directory) | /usr/share/nginx/html | uploader web server directory; the job package will be uploaded to this directory. If you are using twister2-uploader-wo-ps.yaml, there is no need to set this parameter; the default one is ok. |
| (uploader web server label) | app=twister2-uploader | uploader web server label; the job package will be uploaded to the pods that have this label. If you are using twister2-uploader-wo-ps.yaml, there is no need to set this parameter; the default one is ok. |

S3Uploader Settings
To use the S3Uploader:
  • Uncomment the uploader class below.
  • Specify the bucket name.
  • If your job will run for more than 2 hours and is fault tolerant, update link.expiration.duration.

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.class.uploader | edu.iu.dsc.tws.rsched.uploaders.s3.S3Uploader | |
| twister2.s3.bucket.name | s3://[bucket-name] | s3 bucket name to upload the job package; workers will download the job package from this location |
| twister2.s3.link.expiration.duration.sec | 7200 | the job package link will be available for this much time; by default, it is 2 hours |
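A minimal sketch of an S3 based upload, following the three steps above (replace [bucket-name] with a real bucket):

```yaml
# Switch the uploader to S3.
twister2.resource.class.uploader: "edu.iu.dsc.tws.rsched.uploaders.s3.S3Uploader"
# Bucket the job package is uploaded to; workers download it from here.
twister2.s3.bucket.name: "s3://[bucket-name]"
# How long the job package link stays valid (2 hours by default).
twister2.s3.link.expiration.duration.sec: 7200
```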

Node locations related settings

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.kubernetes.node.locations.from.config | false | if this parameter is set to true, Twister2 will use the lists below for node locations: kubernetes.datacenters.list and kubernetes.racks.list (a sketch follows this table). Otherwise, it will try to get this information by querying the Kubernetes master, using the two label keys below. For this to work, the submitting client has to have admin privileges. |
| twister2.resource.rack.labey.key | rack | rack label key for Kubernetes nodes in a cluster. Each rack should have a unique label, and all nodes in a rack should share this label. Twister2 workers can be scheduled by using these label values, so better data locality can be achieved. No default value is specified. |
| twister2.resource.datacenter.labey.key | datacenter | data center label key. Each data center should have a unique label, and all nodes in a data center should share this label. Twister2 workers can be scheduled by using these label values, so better data locality can be achieved. No default value is specified. |
| - echo | ['blue-rack', 'green-rack'] | data center list with rack names |
| - green-rack | ['node11.ip', 'node12.ip', 'node13.ip'] | rack list with node IPs in them |
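A minimal sketch of providing node locations from configuration instead of querying the Kubernetes master, reconstructed from the list fragments in the table above (the exact nesting of the kubernetes.datacenters.list and kubernetes.racks.list entries is an assumption):

```yaml
# Take rack and data center information from this file.
twister2.resource.kubernetes.node.locations.from.config: true
# Data centers and the racks they contain.
kubernetes.datacenters.list:
- echo: ['blue-rack', 'green-rack']
# Racks and the node IPs they contain.
kubernetes.racks.list:
- green-rack: ['node11.ip', 'node12.ip', 'node13.ip']
```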

Kubernetes Mapping and Binding parameters

| Name | Default | Description |
|------|---------|-------------|
| kubernetes.bind.worker.to.cpu | true | statically bind workers to CPUs; workers do not move from the CPU they are started on during computation. twister2.cpu_per_container has to be an integer. By default its value is false. |
| kubernetes.worker.to.node.mapping | true | kubernetes can map workers to nodes as specified by the user; default value is false |
| twister2.resource.kubernetes.worker.mapping.key | kubernetes.io/hostname | the label key on the nodes that will be used to map workers to nodes |
| twister2.resource.kubernetes.worker.mapping.operator | In | operator to use when mapping workers to nodes based on key value. Exists/DoesNotExist check only the existence of the specified key on the node. Ref https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature |
| twister2.resource.kubernetes.worker.mapping.values | ['e012', 'e013'] | values for the mapping key; when the mapping operator is either Exists or DoesNotExist, the values list must be empty ([]) |
| (uniform worker mapping) | all-same-node, all-separate-nodes, none | uniform worker mapping; valid values are all-same-node, all-separate-nodes and none. Default value is none. |

Core Configurations

Fault Tolerance configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.fault.tolerant | false | a flag to enable/disable fault tolerance in Twister2; by default it is disabled |
| twister2.fault.tolerance.failure.timeout | 10000 | a timeout value to determine whether a worker has failed. If a worker does not send heartbeat messages for this duration in milliseconds, it is assumed to have failed. |
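A minimal sketch of turning fault tolerance on with the default failure timeout from the table above:

```yaml
# Enable fault tolerance; ZooKeeper is used regardless of
# twister2.zookeeper.based.group.management when this is true.
twister2.fault.tolerant: true
# A worker that sends no heartbeat for this many milliseconds is assumed failed.
twister2.fault.tolerance.failure.timeout: 10000
```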

Twister2 Job Master related settings

| Name | Default | Description |
|------|---------|-------------|
| twister2.job.master.runs.in.client | false | if true, the job master runs in the submitting client; if false, it runs as a separate process in the cluster. By default it is true. When the job master runs in the submitting client, the job has to be submitted from a machine in the cluster, and getLocalHost must return a reachable IP address for the job master. |
| twister2.job.master.port | 11011 | twister2 job master port number; default value is 11011 |
| twister2.worker.to.job.master.response.wait.duration | 10000 | worker to job master response wait time in milliseconds; this is for messages that wait for a response from the job master. Default value is 10 seconds = 10000. |
| twister2.job.master.volatile.volume.size | 0.0 | twister2 job master volatile volume size in GB; default value is 1.0 Gi. If this value is 0, a volatile volume is not set up for the job master. |
| twister2.job.master.persistent.volume.size | 0.0 | twister2 job master persistent volume size in GB; default value is 1.0 Gi. If this value is 0, a persistent volume is not set up for the job master. |
| twister2.job.master.cpu | 0.2 | twister2 job master cpu request; default value is 0.2 |
| twister2.job.master.ram | 1024 | twister2 job master RAM request in MB; default value is 1024 MB |

WorkerController related config parameters

| Name | Default | Description |
|------|---------|-------------|
| twister2.worker.controller.max.wait.time.for.all.workers.to.join | 100000 | timeout for all workers to join the job, in milliseconds |
| twister2.worker.controller.max.wait.time.on.barrier | 100000 | timeout on barriers for all workers to arrive, in milliseconds |

Dashboard related settings

| Name | Default | Description |
|------|---------|-------------|
| twister2.job.master.to.dashboard.connections | 3 | the number of HTTP connections from the job master to the Twister2 Dashboard; default value is 3. For jobs with a large number of workers, this can be set to a higher number. |
| twister2.dashboard.host | http://twister2-dashboard.default.svc.cluster.local | Dashboard server host address and port; this is the address when the Dashboard is running as a StatefulSet in the cluster. If this parameter is not specified, the job master will not try to connect to the Dashboard. |

Task Configurations

No specific configurations

Mesos configurations

Checkpoint Configurations

No specific configurations

Data Configurations

No specific configurations

Network Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.network.channel.class | edu.iu.dsc.tws.comms.tcp.TWSTCPChannel | |

Resource Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.mesos.scheduler.working.directory | ~/.twister2/repository | working directory for the topologies (an alternative, commented-out value is "${TWISTER2_DIST}/topologies/${CLUSTER}/${ROLE}/${TOPOLOGY}") |
| twister2.resource.directory.core-package | /root/.twister2/repository/twister2-core/ | |
| twister2.resource.directory.sandbox.java.home | ${JAVA_HOME} | location of java, picked up from the shell environment |
| twister2.mesos.master.uri | 149.165.150.81:5050 | the URI of the Mesos master |
| twister2.resource.mesos.framework.name | Twister2 framework | mesos framework name |
| twister2.resource.mesos.master.uri | zk://localhost:2181/mesos | |
| twister2.resource.mesos.framework.staging.timeout.ms | 2000 | the maximum time in milliseconds to wait for the MesosFramework to get registered with the Mesos master |
| twister2.resource.mesos.scheduler.driver.stop.timeout.ms | 5000 | the maximum time in milliseconds to wait for the Mesos scheduler driver to complete stop() |
| twister2.resource.mesos.native.library.path | /usr/lib/mesos/0.28.1/lib/ | the path to load the native mesos library from |
| twister2.resource.system.package.uri | ${TWISTER2_DIST}/twister2-core-0.7.0.tar.gz | the core package uri |
| twister2.resource.mesos.overlay.network.name | mesos-overlay | |
| twister2.resource.mesos.docker.image | gurhangunduz/twister2-mesos:docker-mpi | |
| twister2.resource.system.job.uri | http://localhost:8082/twister2/mesos/twister2-job.tar.gz | the job package uri for the mesos agent to fetch; for fetching, an http server must be running on the mesos master |
| twister2.resource.class.launcher | edu.iu.dsc.tws.rsched.schedulers.mesos.MesosLauncher | launcher class for mesos submission |
| twister2.resource.job.worker.class | edu.iu.dsc.tws.examples.internal.comms.BroadcastCommunication | container class to run in workers |
| twister2.resource.class.mesos.worker | edu.iu.dsc.tws.rsched.schedulers.mesos.MesosWorker | the Mesos worker class |
| twister2.resource.uploader.directory | /var/www/html/twister2/mesos/ | the directory where the file will be uploaded; make sure the user has the necessary permissions to upload the file here |
| #twister2.resource.uploader.directory.repository | /var/www/html/twister2/mesos/ | |
| twister2.resource.uploader.scp.command.options | --chmod=+rwx | the scp command options that will be used by the uploader; this can be used to specify custom options such as the location of ssh keys |
| twister2.resource.uploader.scp.command.connection | root@149.165.150.81 | the scp connection string sets the remote user name and host used by the uploader |
| twister2.resource.uploader.ssh.command.options | | the ssh command options that will be used when connecting to the uploading host to execute commands such as deleting files or making directories |
| twister2.resource.uploader.ssh.command.connection | root@149.165.150.81 | the ssh connection string sets the remote user name and host used by the uploader |
| twister2.resource.class.uploader | edu.iu.dsc.tws.rsched.uploaders.scp.ScpUploader | the uploader class. Options: edu.iu.dsc.tws.rsched.uploaders.NullUploader, edu.iu.dsc.tws.rsched.uploaders.localfs.LocalFileSystemUploader |
| twister2.resource.uploader.download.method | HTTP | the method that workers use to download the core and job packages; it could be HTTP, HDFS, ... |
| twister2.resource.HTTP.fetch.uri | http://149.165.150.81:8082 | HTTP fetch uri |

Client configuration parameters for submission of twister2 jobs

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.scheduler.mesos.cluster | example | cluster name the mesos scheduler runs in |
| twister2.resource.scheduler.mesos.role | www-data | role in the cluster |
| twister2.resource.scheduler.mesos.env | devel | environment name |
| twister2.resource.job.name | basic-mesos | mesos job name |
| # workersPerPod | 2 | number of workers on each pod in Kubernetes; may be omitted in other clusters. A Twister2 job can have multiple sets of compute resources: instances shows the number of compute resources to be started with this specification, and workersPerPod shows the number of workers on each pod in Kubernetes (may be omitted in other clusters; default value is 1). Example: instances: 4 (number of compute resource instances with this specification). |
| twister2.resource.worker.additional.ports | ["port1", "port2", "port3"] | by default each worker has one port; additional ports can be requested for all workers in a job by providing the requested port names as a list |
| twister2.resource.job.driver.class | edu.iu.dsc.tws.examples.internal.rsched.DriverExample | driver class to run |
| twister2.resource.nfs.server.address | 149.165.150.81 | nfs server address |
| twister2.resource.nfs.server.path | /nfs/shared-mesos/twister2 | nfs server path |
| twister2.resource.worker_port | 31000 | worker port |
| twister2.resource.desired_nodes | all | desired nodes |
| twister2.resource.use_docker_container | true | |
| twister2.resource.rack.labey.key | rack | rack label key for Mesos nodes in a cluster. Each rack should have a unique label, and all nodes in a rack should share this label. Twister2 workers can be scheduled by using these label values, so better data locality can be achieved. No default value is specified. |
| twister2.resource.datacenter.labey.key | datacenter | data center label key. Each data center should have a unique label, and all nodes in a data center should share this label. Twister2 workers can be scheduled by using these label values, so better data locality can be achieved. No default value is specified. |
| - echo | ['blue-rack', 'green-rack'] | data center list with rack names |
| - blue-rack | ['10.0.0.40', '10.0.0.41', '10.0.0.42', '10.0.0.43', '10.0.0.44', ] | rack list with node IPs in them |

Core Configurations

Twister2 Job Master related settings

| Name | Default | Description |
|------|---------|-------------|
| twister2.job.master.runs.in.client | false | if true, the job master runs in the submitting client; if false, it runs as a separate process in the cluster. By default it is true. When the job master runs in the submitting client, the job has to be submitted from a machine in the cluster. |
| #twister2.job.master.port | 2023 | twister2 job master port number; default value is 11111 |
| twister2.worker.to.job.master.response.wait.duration | 10000 | worker to job master response wait time in milliseconds; this is for messages that wait for a response from the job master. Default value is 10 seconds = 10000. |
| twister2.job.master.volatile.volume.size | 1.0 | twister2 job master volatile volume size in GB; default value is 1.0 Gi. If this value is 0, a volatile volume is not set up for the job master. |
| twister2.job.master.persistent.volume.size | 1.0 | twister2 job master persistent volume size in GB; default value is 1.0 Gi. If this value is 0, a persistent volume is not set up for the job master. |
| twister2.job.master.cpu | 0.2 | twister2 job master cpu request; default value is 0.2 |
| twister2.job.master.ram | 1000 | twister2 job master RAM request in MB; default value is 1000 MB |
| twister2.job.master.ip | 149.165.150.81 | |

WorkerController related config parameters

| Name | Default | Description |
|------|---------|-------------|
| twister2.worker.controller.max.wait.time.for.all.workers.to.join | 100000 | timeout for all workers to join the job, in milliseconds |
| twister2.worker.controller.max.wait.time.on.barrier | 100000 | timeout on barriers for all workers to arrive, in milliseconds |

Dashboard related settings

| Name | Default | Description |
|------|---------|-------------|
| twister2.dashboard.host | http://localhost:8080 | Dashboard server host address and port. If this parameter is not specified, the job master will not try to connect to the Dashboard. |

Task Configurations

No specific configurations

Nomad configurations

Checkpoint Configurations

No specific configurations

Data Configurations

No specific configurations

Network Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.network.channel.class | edu.iu.dsc.tws.comms.tcp.TWSTCPChannel | |

Resource Configurations

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.scheduler.mpi.working.directory | ${HOME}/.twister2/jobs | working directory |
| twister2.resource.job.package.url | http://149.165.xxx.xx:8082/twister2/mesos/twister2-job.tar.gz | |
| twister2.resource.core.package.url | http://149.165.xxx.xx:8082/twister2/mesos/twister2-core-0.7.0.tar.gz | |
| twister2.resource.class.launcher | edu.iu.dsc.tws.rsched.schedulers.nomad.NomadLauncher | the launcher class |
| twister2.resource.nomad.scheduler.uri | http://localhost:4646 | |
| twister2.resource.nomad.core.freq.mapping | 2000 | Nomad schedules cpu resources in terms of clock frequency (e.g. MHz), while Heron topologies specify cpu requests in terms of cores. This config maps cores to clock frequency. |
| twister2.resource.filesystem.shared | true | whether we are on a shared file system; if so, each worker will not download the core package and job package, otherwise they will download those packages |
| twister2.resource.nomad.shell.script | nomad.sh | name of the script |
| twister2.resource.system.package.uri | ${TWISTER2_DIST}/twister2-core-0.7.0.tar.gz | path to the system core package |
| twister2.resource.uploader.directory | /root/.twister2/repository/ | the directory where the file will be uploaded; make sure the user has the necessary permissions to upload the file here. Use this value if you want to run on a local machine. |
| #twister2.resource.uploader.directory | /var/www/html/twister2/mesos/ | use this value if you want to use the http server on echo |
| twister2.resource.uploader.scp.command.options | --chmod=+rwx | the scp command options that will be used by the uploader; this can be used to specify custom options such as the location of ssh keys |
| twister2.resource.uploader.scp.command.connection | root@localhost | the scp connection string sets the remote user name and host used by the uploader |
| twister2.resource.uploader.ssh.command.options | | the ssh command options that will be used when connecting to the uploading host to execute commands such as deleting files or making directories |
| twister2.resource.uploader.ssh.command.connection | root@localhost | the ssh connection string sets the remote user name and host used by the uploader |
| twister2.resource.class.uploader | edu.iu.dsc.tws.rsched.uploaders.localfs.LocalFileSystemUploader | file system uploader to be used. Options: edu.iu.dsc.tws.rsched.uploaders.scp.ScpUploader |
| twister2.resource.uploader.download.method | LOCAL | the method that workers use to download the core and job packages; it could be LOCAL, HTTP, HDFS, ... |

client related configurations for job submit

| Name | Default | Description |
|------|---------|-------------|
| twister2.resource.nfs.server.address | localhost | nfs server address |
| twister2.resource.nfs.server.path | /tmp/logs | nfs server path |
| twister2.resource.rack.labey.key | rack | rack label key for Mesos nodes in a cluster. Each rack should have a unique label, and all nodes in a rack should share this label. Twister2 workers can be scheduled by using these label values, so better data locality can be achieved. No default value is specified. |
| twister2.resource.datacenter.labey.key | datacenter | data center label key. Each data center should have a unique label, and all nodes in a data center should share this label. Twister2 workers can be scheduled by using these label values, so better data locality can be achieved. No default value is specified. |
| - echo | ['blue-rack', 'green-rack'] | data center list with rack names |
| - green-rack | ['node11.ip', 'node12.ip', 'node13.ip'] | rack list with node IPs in them |
| # workersPerPod | 2 | number of workers on each pod in Kubernetes; may be omitted in other clusters. A Twister2 job can have multiple sets of compute resources: instances shows the number of compute resources to be started with this specification, and workersPerPod shows the number of workers on each pod in Kubernetes (may be omitted in other clusters; default value is 1). Example: instances: 4 (number of compute resource instances with this specification). |
| twister2.resource.worker.additional.ports | ["port1", "port2", "port3"] | by default each worker has one port; additional ports can be requested for all workers in a job by providing the requested port names as a list |
| twister2.resource.worker_port | 31000 | worker port |

Core Configurations

Twister2 Job Master related settings

| Name | Default | Description |
|------|---------|-------------|
| twister2.job.master.runs.in.client | true | if true, the job master runs in the submitting client; if false, it runs as a separate process in the cluster. By default it is true. When the job master runs in the submitting client, the job has to be submitted from a machine in the cluster. |
| twister2.job.master.port | 11011 | twister2 job master port number; default value is 11111 |
| twister2.worker.to.job.master.response.wait.duration | 10000 | worker to job master response wait time in milliseconds; this is for messages that wait for a response from the job master. Default value is 10 seconds = 10000. |
| twister2.job.master.volatile.volume.size | 1.0 | twister2 job master volatile volume size in GB; default value is 1.0 Gi. If this value is 0, a volatile volume is not set up for the job master. |
| twister2.job.master.persistent.volume.size | 1.0 | twister2 job master persistent volume size in GB; default value is 1.0 Gi. If this value is 0, a persistent volume is not set up for the job master. |
| twister2.job.master.cpu | 0.2 | twister2 job master cpu request; default value is 0.2 |
| twister2.job.master.ram | 1000 | twister2 job master RAM request in MB; default value is 1000 MB |
| twister2.job.master.ip | localhost | the job master IP to be used; this is used only in client based masters |

WorkerController related config parameters

| Name | Default | Description |
|------|---------|-------------|
| twister2.worker.controller.max.wait.time.for.all.workers.to.join | 1000000 | timeout for all workers to join the job, in milliseconds |
| twister2.worker.controller.max.wait.time.on.barrier | 1000000 | timeout on barriers for all workers to arrive, in milliseconds |

Dashboard related settings

| Name | Default | Description |
|------|---------|-------------|
| twister2.dashboard.host | http://localhost:8080 | Dashboard server host address and port. If this parameter is not specified, the job master will not try to connect to the Dashboard. |

Task Configurations

No specific configurations