public abstract class BatchRowTSetImpl extends BaseTSetWithSchema<Row> implements BatchRowTSet
BaseTSet.StateType| Modifier | Constructor and Description |
|---|---|
protected |
BatchRowTSetImpl() |
protected |
BatchRowTSetImpl(BatchEnvironment tSetEnv,
java.lang.String name,
int parallelism,
Schema inputSchema) |
| Modifier and Type | Method and Description |
|---|---|
TBase |
addInput(java.lang.String key,
StorableTBase<?> input)
Allows users to pass in other TSets as inputs for a TSet
|
StorableTBase<Row> |
cache()
Runs this TSet and caches the data to an in-memory
DataPartition and exposes the data as another TSet. |
BatchRowTLink |
direct()
Direct/pipe communication
|
BatchEnvironment |
getTSetEnv()
tset env
|
BatchRowTLink |
join(BatchRowTSet rightTSet,
CommunicationContext.JoinType type,
java.util.Comparator<Row> keyComparator)
Joins with another
BatchTupleTSet. |
StorableTBase<Row> |
lazyCache()
Performs caching lazily.
|
StorableTBase<Row> |
lazyPersist()
Performs persisting lazily.
|
BatchRowTLink |
partition(PartitionFunc<Row> partitionFn,
int targetParallelism,
int column)
Returns a Partition
TLink that would partition data according based on a function
provided. |
StorableTBase<Row> |
persist()
Similar to cache, but the data is stored in a disk based
DataPartition. |
BatchRowTLink |
pipe() |
BatchRowTSetImpl |
setName(java.lang.String n)
Sets the name for the
TBase |
BatchRowTSet |
withSchema(RowSchema schema)
Sets the data type of the
TSet output. |
getInputSchema, getOutputSchema, setOutputSchemaaddChildToGraph, addChildToGraph, equals, getId, getInputs, getName, getParallelism, getStateType, hashCode, isMutable, rename, setMutable, setStateType, setTSetEnv, toStringclone, finalize, getClass, notify, notifyAll, wait, wait, waitbuild, getINodegenerateID, getTBaseGraphprotected BatchRowTSetImpl(BatchEnvironment tSetEnv, java.lang.String name, int parallelism, Schema inputSchema)
protected BatchRowTSetImpl()
public BatchRowTLink partition(PartitionFunc<Row> partitionFn, int targetParallelism, int column)
BatchRowTSetTLink that would partition data according based on a function
provided. The parallelism of the target TSet can also be specified.partition in interface BatchRowTSetpartitionFn - Partition functiontargetParallelism - column index to usecolumn - Target parallelismpublic BatchRowTLink join(BatchRowTSet rightTSet, CommunicationContext.JoinType type, java.util.Comparator<Row> keyComparator)
BatchRowTSetBatchTupleTSet. Note that this TSet will be considered the left
TSetjoin in interface BatchRowTSetrightTSet - right tsettype - CommunicationContext.JoinTypekeyComparator - key comparatorpublic BatchEnvironment getTSetEnv()
BuildablegetTSetEnv in interface BuildablegetTSetEnv in class BaseTSet<Row>public BatchRowTLink direct()
BatchRowTSetdirect in interface BatchRowTSetpublic BatchRowTLink pipe()
public BatchRowTSetImpl setName(java.lang.String n)
TBaseTBasepublic StorableTBase<Row> cache()
StoringDataDataPartition and exposes the data as another TSet.cache in interface StoringData<Row>public StorableTBase<Row> lazyCache()
StoringDataTSet
is evaluated explicitly.lazyCache in interface StoringData<Row>public StorableTBase<Row> persist()
StoringDataDataPartition. This method would also expose the
checkpointing ability to TSets.persist in interface StoringData<Row>public StorableTBase<Row> lazyPersist()
StoringDatalazyPersist in interface StoringData<Row>public TBase addInput(java.lang.String key, StorableTBase<?> input)
AcceptingDataaddInput in interface AcceptingData<Row>key - the key used to store the given TSetinput - a StorableTBase TSet to be added as an inputpublic BatchRowTSet withSchema(RowSchema schema)
BatchRowTSetTSet output. This will be used in the packers for efficient
SER-DE operations in the following TLinkswithSchema in interface BatchRowTSetschema - data type as a MessageTypeTSet