public abstract class BatchRowTSetImpl extends BaseTSetWithSchema<Row> implements BatchRowTSet
BaseTSet.StateType
Modifier | Constructor and Description |
---|---|
protected | BatchRowTSetImpl() |
protected | BatchRowTSetImpl(BatchEnvironment tSetEnv, java.lang.String name, int parallelism, Schema inputSchema) |
Modifier and Type | Method and Description |
---|---|
TBase | addInput(java.lang.String key, StorableTBase<?> input) - Allows users to pass in other TSets as inputs for a TSet. |
StorableTBase<Row> | cache() - Runs this TSet, caches the data to an in-memory DataPartition, and exposes the data as another TSet. |
BatchRowTLink | direct() - Direct/pipe communication. |
BatchEnvironment | getTSetEnv() - Returns the TSet environment. |
BatchRowTLink | join(BatchRowTSet rightTSet, CommunicationContext.JoinType type, java.util.Comparator<Row> keyComparator) - Joins with another BatchRowTSet. |
StorableTBase<Row> | lazyCache() - Performs caching lazily. |
StorableTBase<Row> | lazyPersist() - Performs persisting lazily. |
BatchRowTLink | partition(PartitionFunc<Row> partitionFn, int targetParallelism, int column) - Returns a Partition TLink that partitions data according to the provided function. |
StorableTBase<Row> | persist() - Similar to cache, but the data is stored in a disk-based DataPartition. |
BatchRowTLink | pipe() |
BatchRowTSetImpl | setName(java.lang.String n) - Sets the name for the TBase. |
BatchRowTSet | withSchema(RowSchema schema) - Sets the data type of the TSet output. |
getInputSchema, getOutputSchema, setOutputSchema
addChildToGraph, addChildToGraph, equals, getId, getInputs, getName, getParallelism, getStateType, hashCode, isMutable, rename, setMutable, setStateType, setTSetEnv, toString
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
build, getINode
generateID, getTBaseGraph
protected BatchRowTSetImpl(BatchEnvironment tSetEnv, java.lang.String name, int parallelism, Schema inputSchema)
protected BatchRowTSetImpl()
public BatchRowTLink partition(PartitionFunc<Row> partitionFn, int targetParallelism, int column)
Description copied from interface: BatchRowTSet
Returns a Partition TLink that partitions data according to the provided function. The parallelism of the target TSet can also be specified.
Specified by: partition in interface BatchRowTSet
Parameters:
partitionFn - partition function
targetParallelism - target parallelism
column - column index to use
public BatchRowTLink join(BatchRowTSet rightTSet, CommunicationContext.JoinType type, java.util.Comparator<Row> keyComparator)
Description copied from interface: BatchRowTSet
Joins with another BatchRowTSet. Note that this TSet will be considered the left TSet.
Specified by: join in interface BatchRowTSet
Parameters:
rightTSet - right TSet
type - CommunicationContext.JoinType
keyComparator - key comparator
public BatchEnvironment getTSetEnv()
Description copied from interface: Buildable
Returns the TSet environment.
Specified by: getTSetEnv in interface Buildable
Overrides: getTSetEnv in class BaseTSet<Row>
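As a rough illustration of what a column-based partition function passed to partition(...) above might compute, the following sketch hashes the value at a column index into one of targetParallelism buckets. It does not use the Twister2 API: the class ColumnPartitionSketch, the Object[] row model, and the hash-modulo rule are all assumptions made for this example, not the library's actual partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not part of Twister2: a row is modelled as Object[].
final class ColumnPartitionSketch {

  // Picks a target partition in [0, targetParallelism) from the value
  // stored at the given column index (assumed rule: hash modulo parallelism).
  static int partitionOf(Object[] row, int column, int targetParallelism) {
    int hash = row[column] == null ? 0 : row[column].hashCode();
    return Math.floorMod(hash, targetParallelism);
  }

  // Distributes rows into targetParallelism buckets using partitionOf.
  static List<List<Object[]>> partition(List<Object[]> rows, int column, int targetParallelism) {
    List<List<Object[]>> buckets = new ArrayList<>();
    for (int i = 0; i < targetParallelism; i++) {
      buckets.add(new ArrayList<>());
    }
    for (Object[] row : rows) {
      buckets.get(partitionOf(row, column, targetParallelism)).add(row);
    }
    return buckets;
  }

  public static void main(String[] args) {
    List<Object[]> rows = Arrays.asList(
        new Object[] {1, "a"}, new Object[] {2, "b"}, new Object[] {5, "c"});
    List<List<Object[]>> buckets = partition(rows, 0, 4);
    for (int i = 0; i < buckets.size(); i++) {
      System.out.println("partition " + i + ": " + buckets.get(i).size() + " row(s)");
    }
  }
}
```

Math.floorMod keeps the result non-negative even when the hash code is negative, which a plain % would not guarantee.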
public BatchRowTLink direct()
Description copied from interface: BatchRowTSet
Direct/pipe communication.
Specified by: direct in interface BatchRowTSet
public BatchRowTLink pipe()
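The join method above takes a key comparator and treats this TSet as the left side. The sketch below shows, under simplifying assumptions, how a comparator can decide which left and right rows pair up. It is a plain nested-loop join over Object[] rows and is not how Twister2 executes joins; the class RowJoinSketch and the enum JoinKind are hypothetical names that stand in for the real TSet and CommunicationContext.JoinType types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in, not part of Twister2: a row is modelled as Object[].
final class RowJoinSketch {

  // Hypothetical join kinds used only by this sketch.
  enum JoinKind { INNER, LEFT }

  // Nested-loop join: two rows match when the comparator returns 0.
  // For LEFT, unmatched left rows are kept and padded with nulls.
  static List<Object[]> join(List<Object[]> left, List<Object[]> right,
                             JoinKind kind, Comparator<Object[]> keyComparator) {
    int rightWidth = right.isEmpty() ? 0 : right.get(0).length;
    List<Object[]> out = new ArrayList<>();
    for (Object[] l : left) {
      boolean matched = false;
      for (Object[] r : right) {
        if (keyComparator.compare(l, r) == 0) {
          out.add(concat(l, r));
          matched = true;
        }
      }
      if (!matched && kind == JoinKind.LEFT) {
        out.add(concat(l, new Object[rightWidth]));
      }
    }
    return out;
  }

  // Concatenates the columns of two rows into one joined row.
  static Object[] concat(Object[] a, Object[] b) {
    Object[] joined = Arrays.copyOf(a, a.length + b.length);
    System.arraycopy(b, 0, joined, a.length, b.length);
    return joined;
  }

  public static void main(String[] args) {
    // Rows are compared on their first column, assumed to hold an Integer key.
    Comparator<Object[]> byFirstColumn = Comparator.comparingInt(row -> (Integer) row[0]);
    List<Object[]> left = Arrays.asList(new Object[] {1, "apple"}, new Object[] {2, "pear"});
    List<Object[]> right = Arrays.asList(new Object[] {1, 0.5}, new Object[] {3, 0.9});
    for (Object[] row : join(left, right, JoinKind.LEFT, byFirstColumn)) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```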
public BatchRowTSetImpl setName(java.lang.String n)
Description copied from interface: TBase
Sets the name for the TBase.
public StorableTBase<Row> cache()
Description copied from interface: StoringData
Runs this TSet, caches the data to an in-memory DataPartition, and exposes the data as another TSet.
Specified by: cache in interface StoringData<Row>
public StorableTBase<Row> lazyCache()
Description copied from interface: StoringData
Performs caching lazily, i.e. only when the TSet is evaluated explicitly.
Specified by: lazyCache in interface StoringData<Row>
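The difference between cache() and lazyCache() above is one of timing: the first runs the TSet right away, the second defers the work. The sketch below illustrates that distinction with plain java.util.function.Supplier objects. It is not Twister2 code; EagerVsLazyCacheSketch, cacheNow, and cacheLazily are hypothetical names, and the counter only exists to show when the upstream computation actually runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in, not part of Twister2.
final class EagerVsLazyCacheSketch {

  // Eager: runs the source immediately and keeps the result in memory.
  static <T> List<T> cacheNow(Supplier<List<T>> source) {
    return new ArrayList<>(source.get());
  }

  // Lazy: returns a handle; the source runs on the first get() only,
  // and later calls reuse the stored value.
  static <T> Supplier<T> cacheLazily(final Supplier<T> source) {
    return new Supplier<T>() {
      private T value;
      private boolean evaluated;

      @Override
      public T get() {
        if (!evaluated) {
          value = source.get();
          evaluated = true;
        }
        return value;
      }
    };
  }

  public static void main(String[] args) {
    final AtomicInteger runs = new AtomicInteger();
    Supplier<List<Integer>> source = () -> {
      runs.incrementAndGet(); // count each execution of the upstream work
      return Arrays.asList(1, 2, 3);
    };

    List<Integer> eager = cacheNow(source);
    System.out.println("cached " + eager.size() + " items, runs so far: " + runs.get());

    Supplier<List<Integer>> lazy = cacheLazily(source);
    System.out.println("lazy handle created, runs so far: " + runs.get());

    lazy.get();
    lazy.get();
    System.out.println("after two reads, runs so far: " + runs.get());
  }
}
```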
public StorableTBase<Row> persist()
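Description copied from interface: StoringData
Similar to cache, but the data is stored in a disk-based DataPartition. This method also exposes the checkpointing ability to TSets.
Specified by: persist in interface StoringData<Row>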
public StorableTBase<Row> lazyPersist()
Description copied from interface: StoringData
Performs persisting lazily.
Specified by: lazyPersist in interface StoringData<Row>
public TBase addInput(java.lang.String key, StorableTBase<?> input)
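Description copied from interface: AcceptingData
Allows users to pass in other TSets as inputs for a TSet.
Specified by: addInput in interface AcceptingData<Row>
Parameters:
key - the key used to store the given TSet
input - a StorableTBase TSet to be added as an input
public BatchRowTSet withSchema(RowSchema schema)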
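Description copied from interface: BatchRowTSet
Sets the data type of the TSet output. This will be used in the packers for efficient SER-DE operations in the following TLinks.
Specified by: withSchema in interface BatchRowTSet
Parameters:
schema - data type as a MessageType
Returns:
this TSet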
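To make the remark about efficient SER-DE more concrete, the sketch below shows one way a known schema can help a packer: when the column types are fixed up front, each row can be written as raw fixed-width values with no per-value type tags. This is only an illustration under that assumption and does not reproduce Twister2's packers; SchemaPackingSketch and its ColumnType enum are hypothetical and unrelated to RowSchema or MessageType.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in, not part of Twister2: a row is modelled as Object[].
final class SchemaPackingSketch {

  // Hypothetical column types supported by this sketch.
  enum ColumnType { INT, DOUBLE }

  // Number of bytes one packed row occupies for the given schema.
  static int rowSize(ColumnType[] schema) {
    int size = 0;
    for (ColumnType type : schema) {
      size += (type == ColumnType.INT) ? Integer.BYTES : Double.BYTES;
    }
    return size;
  }

  // Writes the row's values back to back, in schema order, without type tags.
  static byte[] pack(ColumnType[] schema, Object[] row) {
    ByteBuffer buffer = ByteBuffer.allocate(rowSize(schema));
    for (int i = 0; i < schema.length; i++) {
      switch (schema[i]) {
        case INT:
          buffer.putInt((Integer) row[i]);
          break;
        case DOUBLE:
          buffer.putDouble((Double) row[i]);
          break;
        default:
          throw new IllegalArgumentException("unsupported type: " + schema[i]);
      }
    }
    return buffer.array();
  }

  // Reads the values back using the same schema to know each column's width.
  static Object[] unpack(ColumnType[] schema, byte[] bytes) {
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    Object[] row = new Object[schema.length];
    for (int i = 0; i < schema.length; i++) {
      switch (schema[i]) {
        case INT:
          row[i] = buffer.getInt();
          break;
        case DOUBLE:
          row[i] = buffer.getDouble();
          break;
        default:
          throw new IllegalArgumentException("unsupported type: " + schema[i]);
      }
    }
    return row;
  }

  public static void main(String[] args) {
    ColumnType[] schema = {ColumnType.INT, ColumnType.DOUBLE};
    byte[] packed = pack(schema, new Object[] {7, 2.5});
    Object[] row = unpack(schema, packed);
    System.out.println(packed.length + " bytes, values: " + row[0] + ", " + row[1]);
  }
}
```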