public abstract class BatchTSetImpl<T> extends BaseTSetWithSchema<T> implements BatchTSet<T>
BaseTSet.StateType
Constructor and Description |
---|
BatchTSetImpl() |
Modifier and Type | Method and Description |
---|---|
BatchTSetImpl<T> |
addInput(java.lang.String key,
StorableTBase<?> input)
Adds data to this TSet
|
AllGatherTLink<T> |
allGather()
Same as gather, but all the target TSet instances would receive the gathered result in the
runtime.
|
AllReduceTLink<T> |
allReduce(ReduceFunc<T> reduceFn)
Similar to reduce, but all instances of the target
TSet would receive the reduced
result. |
CachedTSet<T> |
cache()
Runs this TSet and caches the data to an in-memory
DataPartition and exposes the data as another TSet. |
DirectTLink<T> |
direct()
Returns a Direct
TLink that corresponds to the communication operation where the data
will be transferred to another TSet directly. |
GatherTLink<T> |
gather()
Returns a Gather
TLink that would gather data to the target TSet instance with index
0 (in the runtime). |
BatchEnvironment |
getTSetEnv()
tset env
|
CachedTSet<T> |
lazyCache()
Performs caching lazily.
|
PersistedTSet<T> |
lazyPersist()
Performs persisting lazily.
|
<K,V> KeyedTSet<K,V> |
mapToTuple(MapFunc<Tuple<K,V>,T> mapToTupleFn)
|
PartitionTLink<T> |
partition(PartitionFunc<T> partitionFn)
Same as above, but the parallelism will be preserved in the target
TSet . |
PartitionTLink<T> |
partition(PartitionFunc<T> partitionFn,
int targetParallelism)
Returns a Partition
TLink that would partition data according based on a function
provided. |
PersistedTSet<T> |
persist()
Similar to cache, but the data is stored in a disk based
DataPartition . |
PipeTLink<T> |
pipe() |
ReduceTLink<T> |
reduce(ReduceFunc<T> reduceFn)
Returns a Reduce
TLink that reduce data on to the target TSet instance (in the runtime)
with index 0. |
ReplicateTLink<T> |
replicate(int replications)
|
ComputeTSet<T,java.util.Iterator<T>> |
union(java.util.Collection<TSet<T>> tSets)
Same as above, but accepts a
Collection of TSet s. |
ComputeTSet<T,java.util.Iterator<T>> |
union(TSet<T> other)
|
BatchTSetImpl<T> |
withSchema(Schema schema)
Sets the data type of the
TSet output. |
getInputSchema, getOutputSchema, setOutputSchema
addChildToGraph, addChildToGraph, equals, getId, getInputs, getName, getParallelism, getStateType, hashCode, isMutable, rename, setMutable, setStateType, setTSetEnv, toString
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
build, getINode
generateID, getTBaseGraph
public BatchEnvironment getTSetEnv()
Buildable
getTSetEnv
in interface Buildable
getTSetEnv
in class BaseTSet<T>
public DirectTLink<T> direct()
TSet
TLink
that corresponds to the communication operation where the data
will be transferred to another TSet directly.public ReduceTLink<T> reduce(ReduceFunc<T> reduceFn)
TSet
TLink
that reduce data on to the target TSet instance (in the runtime)
with index 0.public PartitionTLink<T> partition(PartitionFunc<T> partitionFn, int targetParallelism)
TSet
public PartitionTLink<T> partition(PartitionFunc<T> partitionFn)
TSet
TSet
.public GatherTLink<T> gather()
TSet
TLink
that would gather data to the target TSet instance with index
0 (in the runtime).public AllReduceTLink<T> allReduce(ReduceFunc<T> reduceFn)
TSet
TSet
would receive the reduced
result.public AllGatherTLink<T> allGather()
TSet
public <K,V> KeyedTSet<K,V> mapToTuple(MapFunc<Tuple<K,V>,T> mapToTupleFn)
TSet
TupleTSet
based on the MapFunc
provided. This will an entry point
to keyed communication operations from a non-keyed TSet
.mapToTuple
in interface BatchTSet<T>
mapToTuple
in interface TSet<T>
K
- type of keyV
- type of valuemapToTupleFn
- Map functionpublic ComputeTSet<T,java.util.Iterator<T>> union(TSet<T> other)
TSet
public ComputeTSet<T,java.util.Iterator<T>> union(java.util.Collection<TSet<T>> tSets)
TSet
Collection
of TSet
s.public ReplicateTLink<T> replicate(int replications)
TSet
public CachedTSet<T> cache()
StoringData
DataPartition
and exposes the data as another TSet.cache
in interface StoringData<T>
public CachedTSet<T> lazyCache()
StoringData
TSet
is evaluated explicitly.lazyCache
in interface StoringData<T>
public PersistedTSet<T> persist()
StoringData
DataPartition
. This method would also expose the
checkpointing ability to TSet
s.persist
in interface StoringData<T>
public PersistedTSet<T> lazyPersist()
StoringData
lazyPersist
in interface StoringData<T>
public BatchTSetImpl<T> addInput(java.lang.String key, StorableTBase<?> input)
BatchTSet
addInput
in interface AcceptingData<T>
addInput
in interface BatchTSet<T>
key
- the key used to store the given TSetinput
- a @StorableTBase
TSet to be added as an inputpublic BatchTSetImpl<T> withSchema(Schema schema)
TSet
TSet
output. This will be used in the packers for efficient
SER-DE operations in the following TLink
swithSchema
in interface TSet<T>
schema
- data type as a MessageType
TSet