Twister2

Twister2

  • Getting Started
  • Docs
  • Tutorial
  • AI
  • Examples
  • Contribute
  • Download
  • Configurations
  • Java Docs
  • GitHub
  • Blog

›APIs

Compiling

  • Overview
  • Linux
  • MacOS
  • Maven Artifacts

App Development

  • API Overview
  • Developing Applications
  • Streaming Jobs
  • Batch Jobs

APIs

  • Worker API
  • Data API
  • Compute API
  • Operator API
  • Windowing API
  • Storm API
  • Apache Beam
  • Python API

Deployment

  • Job Submit
  • Standalone
  • Docker
  • Kubernetes
  • Minikube
  • Mesos
  • Nomad
  • Slurm
  • Dashboard
  • Logging
  • Configurations

Concepts

  • Overview
  • Architecture
  • Operators
  • Task System
  • Data Access

Resources

  • Publications

Python API

The Python API provides the ability to define Twister2 TSet jobs in python.

Getting Started

Following components are required in your system.

  1. Python 3.7
  2. JVM 8+

Setting up twister2 python support

pip3 install twister2

twister2 uses jep to create python interpreters within the JVM. Inorder for this to work properly, jep's native libraries should be accessible by the java process. Update your LD_LIBRARY_PATH environment variable to include jep.so inorder to make it accessible by java process at runtime. The location of jep can be found as follows.

pip show jep

This will give an output similar to below.

Name: jep
Version: 3.9.0
Summary: Jep embeds CPython in Java
Home-page: https://github.com/ninia/jep
Author: Jep Developers
Author-email: jep-project@googlegroups.com
License: zlib/libpng
Location: /home/username/.local/lib/python3.7/site-packages
Requires: 
Required-by: twister2

Now copy the Location and update the LD_LIBRARY_PATH as follows.

export LD_LIBRARY_PATH=/home/username/.local/lib/python3.7/site-packages/jep:$LD_LIBRARY_PATH

On latest MacOS versions, setting LD_LIBRARY_PATH will not work due to System Integrity Protect (SIP) . SIP can be disabled with following steps.

  1. Restart your Mac.
  2. Before OS X starts up, hold down Command-R and keep it held down until you see an Apple icon and a progress bar. Release. This boots you into Recovery.
  3. From the Utilities menu, select Terminal.
  4. At the prompt type exactly the following and then press Return: csrutil disable
  5. Restart

Writing your first application

Twister2 Environment

env = Twister2Environment(name="MyPython", config={"sample_config": 123}, resources=[{"cpu": 1, "ram": 1024, "instances": 2}])

Twister2Environment can be considered as the entry point to connect to the java runtime. Twister2Environment optionally allows you to configure your job and resources.

TSet Source

TSet is all about data transformations. Hence every TSet workflow starts with a source.

TSet source in python should follow below skeleton.

class SourceFunc(ABC):

    def __init__(self):
        pass

    @abstractmethod
    def has_next(self):
        return False

    @abstractmethod
    def next(self):
        return None

Shown below is a simple integer source created based on above skeleton.

from twister2.tset.fn.SourceFunc import SourceFunc

class IntegerSource(SourceFunc):

    def __init__(self, limit):
        super(IntegerSource, self).__init__()
        self.x = 0
        self.limit = limit

    def has_next(self):
        return self.x < self.limit

    def next(self):
        self.x += 1
        return self.x

Once defined, you may add this source to TSet environment as follows.

source = env.create_source(IntegerSource(10), 4)

The second parameter of create_source is the parallelism. In this case(4), twister2 will run 4 integer sources parallelly across the cluster.

Starting from a source, you may transform your data as required by utilizing twister2 TSet operations.

Example - KMeans

This example utilizes existing tools for python such as numpy and scikit-learn for data representation and computation and use twister2 for communication.

# Twister2 Python API supports Numpy
import numpy as np

from twister2.TSetContext import TSetContext
from twister2.Twister2Environment import Twister2Environment
from twister2.tset.fn.SourceFunc import SourceFunc

env = Twister2Environment(resources=[{"cpu": 4, "ram": 4096, "instances": 1}])


class PointSource(SourceFunc):

    def __init__(self, size, count):
        self.i = 0
        self.read = True
        self.size = size
        self.count = count

    def has_next(self):
        return self.read

    def next(self):
        self.i = self.i + 1
        arr = np.random.rand(self.count, self.size)
        self.read = False
        return arr


data = env.create_source(PointSource(100, 10000), 10).cache()
centers = env.create_source(PointSource(100, 200), 10).cache()


def apply_kmeans(points, ctx: TSetContext):
    # If you need to use a 3rd party library, you should import them within the function body.
    from sklearn.cluster import KMeans
    c = ctx.get_input("centroids")
    centers = c.get_partition(ctx.get_index()).consumer().__next__()
    kmeans = KMeans(init=centers, n_clusters=200, n_init=1).fit(points)
    return kmeans.cluster_centers_

# Most of the TSet operations accept a function. You can even pass a lambda instead!
mapped = data.direct().map(apply_kmeans)


def reduce_centroids(c1, c2):
    return c1 + c2


def average(points, ctx):
    import numpy as np
    return np.divide(points, 10)


reduced = mapped.all_reduce(reduce_centroids).map(average)

for i in range(10):
    mapped.add_input("centroids", centers)
    centers = reduced.cache()

centers.direct().for_each(lambda x: print(x))

Submitting Python Jobs to Twister2

Twister2 accepts two types of python jobs.

  1. Single python file (If you have just one file defining the whole job)
./bin/twister2 submit standalone python /absolute/path/to/your_script.py your_script.py  arg1 arg2
  1. Zip file (If you have multiple python files, you can submit them as a zip file)
./bin/twister2 submit standalone python_zip /absolute/path/to/your_zip.zip entry_point.py  arg1 arg2

entry_point.py is where you would initialize Twister2Environment

arg1 and arg2 in above two commands will be passed as system args to the python script.

← Apache BeamJob Submit →
  • Getting Started
    • Following components are required in your system.
    • Setting up twister2 python support
  • Writing your first application
    • Twister2 Environment
    • TSet Source
    • Example - KMeans
  • Submitting Python Jobs to Twister2
Twister2
Docs
Getting Started (Quickstart)Guides (Programming Guides)
Community
Stack OverflowProject Chat
More
BlogGitHubStar
Copyright © 2020 Indiana University