T
- type of the data setpublic interface TSet<T> extends TBase
TSet
would abstract a Task level computation (Source/ Compute or
Sink) in a more user friendly API. A TSet
would be followed by a TLink
that
would expose the communication level operations performed on the computation.
Note the extensions of this interface
BatchTSet
and
StreamingTSet
. These would intimately separate
out the operations based on the OperationMode
of the
data flow graph.
This interface only specifies the common operations for Batch and Streaming operations.Modifier and Type | Method and Description |
---|---|
TLink<?,T> |
allGather()
Same as gather, but all the target TSet instances would receive the gathered result in the
runtime.
|
TLink<?,T> |
allReduce(ReduceFunc<T> reduceFn)
Similar to reduce, but all instances of the target
TSet would receive the reduced
result. |
TLink<?,T> |
direct()
Returns a Direct
TLink that corresponds to the communication operation where the data
will be transferred to another TSet directly. |
TLink<?,T> |
gather()
Returns a Gather
TLink that would gather data to the target TSet instance with index
0 (in the runtime). |
<K,V> TupleTSet<K,V> |
mapToTuple(MapFunc<Tuple<K,V>,T> mapToTupleFn)
|
TLink<?,T> |
partition(PartitionFunc<T> partitionFn)
Same as above, but the parallelism will be preserved in the target
TSet . |
TLink<?,T> |
partition(PartitionFunc<T> partitionFn,
int targetParallelism)
Returns a Partition
TLink that would partition data according based on a function
provided. |
TLink<?,T> |
reduce(ReduceFunc<T> reduceFn)
Returns a Reduce
TLink that reduce data on to the target TSet instance (in the runtime)
with index 0. |
TLink<?,T> |
replicate(int replicas)
|
TSet<T> |
setName(java.lang.String name)
Sets the name
|
TSet<T> |
union(java.util.Collection<TSet<T>> tSets)
Same as above, but accepts a
Collection of TSet s. |
TSet<T> |
union(TSet<T> unionTSet)
|
TSet<T> |
withSchema(Schema dataType)
Sets the data type of the
TSet output. |
TLink<?,T> direct()
TLink
that corresponds to the communication operation where the data
will be transferred to another TSet directly.TLink<?,T> reduce(ReduceFunc<T> reduceFn)
TLink
that reduce data on to the target TSet instance (in the runtime)
with index 0.reduceFn
- Reduce functionTLink<?,T> allReduce(ReduceFunc<T> reduceFn)
TSet
would receive the reduced
result.reduceFn
- Reduce functionTLink<?,T> partition(PartitionFunc<T> partitionFn, int targetParallelism)
TLink
that would partition data according based on a function
provided. The parallelism of the target TSet
can also be specified.partitionFn
- Partition functiontargetParallelism
- Target parallelismTLink<?,T> partition(PartitionFunc<T> partitionFn)
TSet
.partitionFn
- Partition functionTLink<?,T> gather()
TLink
that would gather data to the target TSet instance with index
0 (in the runtime).TLink<?,T> allGather()
<K,V> TupleTSet<K,V> mapToTuple(MapFunc<Tuple<K,V>,T> mapToTupleFn)
TupleTSet
based on the MapFunc
provided. This will an entry point
to keyed communication operations from a non-keyed TSet
.K
- type of keyV
- type of valuemapToTupleFn
- Map functionTLink<?,T> replicate(int replicas)
TLink
that would clone/broadcast the data from this TSet
.
Note that the parallelism of this TSet
should be 1.replicas
- Replicas of data (= target TSet parallelism)TSet<T> union(TSet<T> unionTSet)
TSet
that create a union of data in both TSet
s. In order for
this to work both TSets should be of the same typeunionTSet
- TSet to union withTSet<T> union(java.util.Collection<TSet<T>> tSets)
Collection
of TSet
s.tSets
- a collection of TSet's to union with