public class BatchEnvironment extends TSetEnvironment
TSetEnvironment
for batch OperationMode
.
There are 2 ways a tset be executed. 1. running a tset (at the completion, execution will be closed) 2. evaluating a tset (execution will be kept alive until 'finishEval' method is called)
And there are 3 execution options. 1. Running/ evaluating a subgraph/ DAG from a specified TSet 2. Running/ evaluating a tset and update another with the results 3. Running just a single source TSet
Constructor and Description |
---|
BatchEnvironment() |
BatchEnvironment(WorkerEnvironment wEnv) |
Modifier and Type | Method and Description |
---|---|
BaseTSet<java.lang.Integer> |
createArrowSource(java.lang.String filePath,
int parallelism) |
SourceTSet<java.lang.Object> |
createArrowSource(java.lang.String filePath,
java.lang.String fileName,
int parallelism,
java.lang.String schema) |
SourceTSet<java.lang.String[]> |
createCSVSource(java.lang.String filePath,
int datasize,
int parallelism,
java.lang.String type) |
<K,V,F extends <any>> |
createHadoopSource(Configuration configuration,
java.lang.Class<F> inputFormat,
int parallel) |
<K,V,F extends <any>,I> |
createHadoopSource(Configuration configuration,
java.lang.Class<F> inputFormat,
int parallel,
MapFunc<I,Tuple<K,V>> mapFunc) |
<K,V,F extends <any>,K2,V2> |
createKeyedHadoopSource(Configuration configuration,
java.lang.Class<F> inputFormat,
int parallel,
MapFunc<Tuple<K2,V2>,Tuple<K,V>> mapFunc) |
<K,V> KeyedSourceTSet<K,V> |
createKeyedSource(SourceFunc<Tuple<K,V>> source,
int parallelism)
Creates a Keyed Source TSet based on the
SourceFunc that produces a Tuple |
<K,V> KeyedSourceTSet<K,V> |
createKeyedSource(java.lang.String name,
SourceFunc<Tuple<K,V>> source,
int parallelism)
Same as above, but a source tset name can be provided at the instantiation
|
RowSourceTSet |
createRowSource(java.lang.String name,
SourceFunc<Row> source,
int parallelism) |
<T> SourceTSet<T> |
createSource(SourceFunc<T> source,
int parallelism)
Creates a source TSet based on the
SourceFunc |
<T> SourceTSet<T> |
createSource(java.lang.String name,
SourceFunc<T> source,
int parallelism)
Same as above, but a source tset name can be provided at the instantiation
|
SourceTSet<java.lang.String> |
createTextSource(java.lang.String filePath,
int dataSize,
int parallelism,
java.lang.String type) |
void |
eval(BaseTSet evalTSet)
Evaluates the TSet using iterative execution in the executor.
|
<T,ST extends BaseTSet<T> & StorableTBase<T>> |
evalAndUpdate(ST evalTSet,
ST updateTSet)
Similar to eval, but here, the data produced by the evaluation will be passed on to the
updateTSet
|
void |
finishEval(BaseTSet evalTset)
Completes iterative execution for evaluated TSet
|
<T> DataObject<T> |
getData(java.lang.String key) |
OperationMode |
getOperationMode()
Returns the
OperationMode |
void |
run(BaseTSet leafTset)
Runs a subgraph of TSets from the specified TSet
|
<T,ST extends BaseTSet<T> & StorableTBase<T>> |
runAndUpdate(ST runTSet,
ST updateTSet)
Runs a subgraph of TSets from the specified TSet and output results as a tset
|
void |
runOne(BaseTSet tSet)
Runs a single TSet (NO subgraph execution!)
|
addInput, close, executeBuildContext, getConfig, getDefaultParallelism, getGraph, getInputs, getNoOfWorkers, getTaskExecutor, getTSetGraph, getWorkerEnv, getWorkerID, initBatch, initCheckpointing, initStreaming, isCheckpointingEnabled, parallelize, parallelize, parallelize, parallelize, setDefaultParallelism, settBaseGraph
public BatchEnvironment(WorkerEnvironment wEnv)
public BatchEnvironment()
public OperationMode getOperationMode()
TSetEnvironment
OperationMode
getOperationMode
in class TSetEnvironment
public <T> SourceTSet<T> createSource(SourceFunc<T> source, int parallelism)
TSetEnvironment
SourceFunc
createSource
in class TSetEnvironment
T
- data typesource
- source functionparallelism
- parallelismpublic SourceTSet<java.lang.String[]> createCSVSource(java.lang.String filePath, int datasize, int parallelism, java.lang.String type)
createCSVSource
in class TSetEnvironment
public SourceTSet<java.lang.Object> createArrowSource(java.lang.String filePath, java.lang.String fileName, int parallelism, java.lang.String schema)
public SourceTSet<java.lang.String> createTextSource(java.lang.String filePath, int dataSize, int parallelism, java.lang.String type)
createTextSource
in class TSetEnvironment
public BaseTSet<java.lang.Integer> createArrowSource(java.lang.String filePath, int parallelism)
createArrowSource
in class TSetEnvironment
public <T> SourceTSet<T> createSource(java.lang.String name, SourceFunc<T> source, int parallelism)
TSetEnvironment
createSource
in class TSetEnvironment
T
- data typename
- name for the tsetsource
- source functionparallelism
- parallelismpublic <K,V> KeyedSourceTSet<K,V> createKeyedSource(SourceFunc<Tuple<K,V>> source, int parallelism)
TSetEnvironment
SourceFunc
that produces a Tuple
createKeyedSource
in class TSetEnvironment
K
- key typeV
- value typesource
- source functionparallelism
- parallelismpublic <K,V> KeyedSourceTSet<K,V> createKeyedSource(java.lang.String name, SourceFunc<Tuple<K,V>> source, int parallelism)
TSetEnvironment
createKeyedSource
in class TSetEnvironment
K
- key typeV
- value typename
- name for the tsetsource
- source functionparallelism
- parallelismpublic RowSourceTSet createRowSource(java.lang.String name, SourceFunc<Row> source, int parallelism)
public <K,V,F extends <any>> SourceTSet<Tuple<K,V>> createHadoopSource(Configuration configuration, java.lang.Class<F> inputFormat, int parallel)
public <K,V,F extends <any>,I> SourceTSet<I> createHadoopSource(Configuration configuration, java.lang.Class<F> inputFormat, int parallel, MapFunc<I,Tuple<K,V>> mapFunc)
public <K,V,F extends <any>,K2,V2> KeyedSourceTSet<K2,V2> createKeyedHadoopSource(Configuration configuration, java.lang.Class<F> inputFormat, int parallel, MapFunc<Tuple<K2,V2>,Tuple<K,V>> mapFunc)
public <T> DataObject<T> getData(java.lang.String key)
public void runOne(BaseTSet tSet)
tSet
- tset to runpublic void run(BaseTSet leafTset)
leafTset
- TSet to be runpublic <T,ST extends BaseTSet<T> & StorableTBase<T>> void runAndUpdate(ST runTSet, ST updateTSet)
T
- type of the output data objectrunTSet
- TSet to be runupdateTSet
- TSet to be updatedpublic void eval(BaseTSet evalTSet)
evalTSet
- TSet to be evaluatedpublic <T,ST extends BaseTSet<T> & StorableTBase<T>> void evalAndUpdate(ST evalTSet, ST updateTSet)
T
- typeevalTSet
- TSet to be evaluatedupdateTSet
- TSet to be updatedpublic void finishEval(BaseTSet evalTset)
evalTset
- TSet to be evaluated