Apache Arrow

Apache Arrow is an in-memory columnar data format that Spark uses to transfer data efficiently between JVM and Python processes. It is currently most beneficial to Python users who work with Pandas/NumPy data. Arrow is not used automatically: it may require minor changes to configuration or code to take full advantage of it and to ensure compatibility.
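As a rough illustration of the configuration change involved, the sketch below switches on Arrow-based transfer for an existing session. The helper names `arrow_config_key` and `enable_arrow` are hypothetical and not part of PySpark. The sketch assumes the session object exposes `version` and `conf.set`, as a `SparkSession` does, and that the toggle is `spark.sql.execution.arrow.enabled` on Spark 2.3/2.4 and `spark.sql.execution.arrow.pyspark.enabled` from Spark 3.0 onward.

```python
# Sketch only: the helper names below are hypothetical, not part of PySpark.

# Arrow-related Spark SQL settings. The key was renamed in Spark 3.0;
# the older name applies to Spark 2.3 and 2.4.
ARROW_KEY_SPARK_2 = "spark.sql.execution.arrow.enabled"
ARROW_KEY_SPARK_3 = "spark.sql.execution.arrow.pyspark.enabled"


def arrow_config_key(spark_version: str) -> str:
    """Return the Arrow toggle matching a version string such as '2.4.5'."""
    major = int(spark_version.split(".")[0])
    return ARROW_KEY_SPARK_3 if major >= 3 else ARROW_KEY_SPARK_2


def enable_arrow(spark) -> str:
    """Enable Arrow-based transfer on an existing session; return the key set."""
    key = arrow_config_key(spark.version)
    spark.conf.set(key, "true")
    return key
```

With a live session named `spark`, calling `enable_arrow(spark)` before `df.toPandas()` or `spark.createDataFrame(pandas_df)` should let those conversions use Arrow, provided PyArrow is installed on the Python side.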


