Caches the underlying RDD in memory.
new MaRe object
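A minimal sketch of how this might be used, assuming the se.uu.it.mare package from the MaRe repository and a spark-shell session where sc is predefined; the method name cache is an assumption based on this documentation:

```scala
import se.uu.it.mare.MaRe

// Wrap an RDD and cache the underlying data in memory, so that
// later container-based map/reduce calls do not recompute it.
val mare = new MaRe(sc.textFile("dna.txt")).cache()
```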
:: Experimental :: First collects the data locally on disk, and then reduces and writes it to a local output path using a Docker container command. This is an experimental feature (use at your own risk).
mount point where the partitions are made available to the containers
mount point where the processed partition is read back into Spark
Docker image name
Docker command
local output path
if set to true, the Docker image is pulled even if it is already present locally
intermediate results storage level (default: MEMORY_AND_DISK)
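A hedged sketch of a call; the method name collectReduce, the TextFile mount-point type, and the parameter names are assumptions based on the MaRe repository and may differ between versions:

```scala
import org.apache.spark.storage.StorageLevel
import se.uu.it.mare.{ MaRe, TextFile }

// Reduce per-partition counts to a single sum and write it to a
// local file, keeping intermediate results on memory and disk.
new MaRe(sc.textFile("counts.txt")).collectReduce(
  inputMountPoint = TextFile("/counts"),
  outputMountPoint = TextFile("/sum"),
  imageName = "ubuntu:16.04",
  command = "awk '{s+=$1} END {print s}' /counts > /sum",
  localOutPath = "/tmp/sum.txt",
  forcePull = false,
  intermediateStorageLevel = StorageLevel.MEMORY_AND_DISK)
```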
Returns the number of partitions of the underlying RDD.
number of partitions of the underlying RDD
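For example, under the same assumptions as the sketches above:

```scala
import se.uu.it.mare.MaRe

val mare = new MaRe(sc.textFile("dna.txt"))
// Each partition is processed through its own container command in a
// map call, so this also indicates the degree of parallelism.
println(s"Number of partitions: ${mare.getNumPartitions}")
```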
Maps each RDD partition through a Docker container command.
mount point where the partitions are made available to the containers
mount point where the processed partition is read back into Spark
Docker image name
Docker command
if set to true, the Docker image is pulled even if it is already present locally
new MaRe object
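A sketch of a map call, loosely adapted from the nucleotide-counting example in the MaRe README; the TextFile mount-point type and parameter names are assumptions:

```scala
import se.uu.it.mare.{ MaRe, TextFile }

// Count occurrences of "G" in each partition of a DNA text file.
// Each partition is mounted at /dna inside an Ubuntu container, and
// the per-partition count is read back from /count.
val gCount = new MaRe(sc.textFile("dna.txt"))
  .map(
    inputMountPoint = TextFile("/dna"),
    outputMountPoint = TextFile("/count"),
    imageName = "ubuntu:16.04",
    command = "grep -o 'G' /dna | wc -l > /count")
```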
input RDD underlying the MaRe object
Reduces the data to a single partition using a Docker container command. The command is applied using a tree reduce strategy.
mount point where the partitions are made available to the containers
mount point where the processed partition is read back into Spark
Docker image name
Docker command
depth of the reduce tree (default: 2, must be greater than or equal to 2)
if set to true, the Docker image is pulled even if it is already present locally
new MaRe object
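Continuing the map sketch above, the per-partition counts can be summed with a tree reduce; the parameter names and the reduce command are assumptions:

```scala
import se.uu.it.mare.{ MaRe, TextFile }

// Sum the per-partition counts into a single record. The depth
// parameter controls the shape of the reduce tree (default: 2).
val total = gCount
  .reduce(
    inputMountPoint = TextFile("/counts"),
    outputMountPoint = TextFile("/sum"),
    imageName = "ubuntu:16.04",
    command = "awk '{s+=$1} END {print s}' /counts > /sum",
    depth = 2)
  .rdd.collect()(0)
```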
Repartitions the underlying RDD to the specified number of partitions.
number of partitions for the underlying RDD
new MaRe object
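For example (same assumptions as the sketches above):

```scala
import se.uu.it.mare.MaRe

// Repartition to 4 partitions, so that a subsequent map call
// processes the data through 4 container commands.
val coarse = new MaRe(sc.textFile("dna.txt")).repartition(4)
println(coarse.getNumPartitions) // 4
```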
Repartitions data according to keyBy and org.apache.spark.HashPartitioner.
function that, given a record, computes a key
number of partitions for the resulting RDD
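A hedged sketch; the record type (String) and the parameter names keyBy and numPartitions are assumptions:

```scala
import se.uu.it.mare.MaRe

// Hash-partition tab-separated records by their first field, so that
// records sharing a key land in the same partition (and container).
val byKey = new MaRe(sc.textFile("records.tsv"))
  .repartitionBy(
    keyBy = (record: String) => record.split("\t")(0),
    numPartitions = 16)
```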
Repartitions data according to keyBy and a custom partitioner.
function that, given a record, computes a key
custom partitioner
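A sketch of plugging in a custom org.apache.spark.Partitioner; the partitioner class below is hypothetical, and the parameter names are assumptions:

```scala
import org.apache.spark.Partitioner
import se.uu.it.mare.MaRe

// Hypothetical partitioner that assigns records to partitions by the
// length of their key, just to illustrate the custom-partitioner hook.
class KeyLengthPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int =
    key.toString.length % numPartitions
}

val custom = new MaRe(sc.textFile("records.tsv"))
  .repartitionBy(
    keyBy = (record: String) => record.split("\t")(0),
    partitioner = new KeyLengthPartitioner(8))
```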
MaRe API.