T - type of the data setpublic interface TSet<T> extends TBase
TSet would abstract a Task level computation (Source/ Compute or
Sink) in a more user friendly API. A TSet would be followed by a TLink that
would expose the communication level operations performed on the computation.
Note the extensions of this interface
BatchTSet and
StreamingTSet. These would intimately separate
out the operations based on the OperationMode of the
data flow graph.
This interface only specifies the common operations for Batch and Streaming operations.| Modifier and Type | Method and Description |
|---|---|
TLink<?,T> |
allGather()
Same as gather, but all the target TSet instances would receive the gathered result in the
runtime.
|
TLink<?,T> |
allReduce(ReduceFunc<T> reduceFn)
Similar to reduce, but all instances of the target
TSet would receive the reduced
result. |
TLink<?,T> |
direct()
Returns a Direct
TLink that corresponds to the communication operation where the data
will be transferred to another TSet directly. |
TLink<?,T> |
gather()
Returns a Gather
TLink that would gather data to the target TSet instance with index
0 (in the runtime). |
<K,V> TupleTSet<K,V> |
mapToTuple(MapFunc<Tuple<K,V>,T> mapToTupleFn)
|
TLink<?,T> |
partition(PartitionFunc<T> partitionFn)
Same as above, but the parallelism will be preserved in the target
TSet. |
TLink<?,T> |
partition(PartitionFunc<T> partitionFn,
int targetParallelism)
Returns a Partition
TLink that would partition data according based on a function
provided. |
TLink<?,T> |
reduce(ReduceFunc<T> reduceFn)
Returns a Reduce
TLink that reduce data on to the target TSet instance (in the runtime)
with index 0. |
TLink<?,T> |
replicate(int replicas)
|
TSet<T> |
setName(java.lang.String name)
Sets the name
|
TSet<T> |
union(java.util.Collection<TSet<T>> tSets)
Same as above, but accepts a
Collection of TSets. |
TSet<T> |
union(TSet<T> unionTSet)
|
TSet<T> |
withSchema(Schema dataType)
Sets the data type of the
TSet output. |
TLink<?,T> direct()
TLink that corresponds to the communication operation where the data
will be transferred to another TSet directly.TLink<?,T> reduce(ReduceFunc<T> reduceFn)
TLink that reduce data on to the target TSet instance (in the runtime)
with index 0.reduceFn - Reduce functionTLink<?,T> allReduce(ReduceFunc<T> reduceFn)
TSet would receive the reduced
result.reduceFn - Reduce functionTLink<?,T> partition(PartitionFunc<T> partitionFn, int targetParallelism)
TLink that would partition data according based on a function
provided. The parallelism of the target TSet can also be specified.partitionFn - Partition functiontargetParallelism - Target parallelismTLink<?,T> partition(PartitionFunc<T> partitionFn)
TSet.partitionFn - Partition functionTLink<?,T> gather()
TLink that would gather data to the target TSet instance with index
0 (in the runtime).TLink<?,T> allGather()
<K,V> TupleTSet<K,V> mapToTuple(MapFunc<Tuple<K,V>,T> mapToTupleFn)
TupleTSet based on the MapFunc provided. This will an entry point
to keyed communication operations from a non-keyed TSet.K - type of keyV - type of valuemapToTupleFn - Map functionTLink<?,T> replicate(int replicas)
TLink that would clone/broadcast the data from this TSet.
Note that the parallelism of this TSet should be 1.replicas - Replicas of data (= target TSet parallelism)TSet<T> union(TSet<T> unionTSet)
TSet that create a union of data in both TSets. In order for
this to work both TSets should be of the same typeunionTSet - TSet to union withTSet<T> union(java.util.Collection<TSet<T>> tSets)
Collection of TSets.tSets - a collection of TSet's to union with