Apache Arrow
Developer(s)	Apache Software Foundation
Initial release	October 10, 2016
Stable release	13.0.0^[1] / 23 August 2023
Repository	https://github.com/apache/arrow
Written in	C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, Rust
Type	Data format, algorithms
License	Apache License 2.0
Website	arrow.apache.org

Apache Arrow is a language-agnostic software framework for developing data analytics applications that process columnar data. It defines a standardized column-oriented memory format that can represent both flat and hierarchical data for efficient analytic operations on modern CPU and GPU hardware.^[2]^[3]^[4]^[5]^[6] This design reduces or eliminates factors that limit the feasibility of working with large data sets, such as the cost, volatility, or physical constraints of dynamic random-access memory.^[7]
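To show what a column-oriented memory format looks like in practice, the following Python sketch lays out a single nullable string column in the style of the Arrow specification's variable-size binary layout. The layout uses a validity bitmap (one bit per value, least-significant bit first, with 1 meaning non-null), a buffer of int32 offsets holding one more entry than there are values, and one data buffer containing the concatenated UTF-8 bytes. The helper names `encode_string_column` and `decode_value` are hypothetical and not part of any Arrow library. The sketch also omits the buffer padding and alignment the specification recommends, as well as nested (hierarchical) types.

```python
import struct
from typing import List, Optional, Tuple


def encode_string_column(values: List[Optional[str]]) -> Tuple[bytes, bytes, bytes]:
    """Hypothetical helper: build validity, offsets and data buffers for one
    nullable string column, modeled on Arrow's variable-size binary layout.
    Padding/alignment of the buffers is deliberately omitted."""
    bitmap = bytearray((len(values) + 7) // 8)  # one bit per value, rounded up to whole bytes
    offsets = [0]
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # least-significant bit first; 1 = valid
            data += value.encode("utf-8")
        # A null slot contributes no bytes, so its start and end offsets are equal.
        offsets.append(len(data))
    offsets_buf = struct.pack(f"<{len(offsets)}i", *offsets)  # little-endian int32
    return bytes(bitmap), offsets_buf, bytes(data)


def decode_value(bitmap: bytes, offsets_buf: bytes, data: bytes, i: int) -> Optional[str]:
    """Hypothetical helper: random access to slot i straight from the buffers."""
    if not (bitmap[i // 8] >> (i % 8)) & 1:
        return None
    start, end = struct.unpack_from("<2i", offsets_buf, 4 * i)  # offsets[i], offsets[i + 1]
    return data[start:end].decode("utf-8")


bitmap, offsets_buf, data = encode_string_column(["arrow", None, "parquet"])
print(bitmap)                                      # b'\x05' (bits 0 and 2 set)
print(struct.unpack("<4i", offsets_buf))           # (0, 5, 5, 12)
print(data)                                        # b'arrowparquet'
print(decode_value(bitmap, offsets_buf, data, 2))  # parquet
```

Because value i occupies the bytes between offsets[i] and offsets[i + 1], any value can be read without scanning the rest of the column. All values of the column sit in a few contiguous buffers rather than being interleaved with other fields, as they would be in a row-oriented layout.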

Interoperability

Arrow can be used with Apache Parquet, Apache Spark, NumPy, PySpark, pandas and other data processing libraries. The project includes native software libraries written in C, C++, C#, Go, Java, JavaScript, Julia, MATLAB, Python, R, Ruby, and Rust. Arrow enables zero-copy reads, as well as fast data access and interchange between these languages and systems without serialization overhead.^[2]
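Zero-copy reads work because a consumer that knows the memory layout can interpret a shared buffer in place instead of deserializing it. Arrow arrays also carry an offset and a length, so slicing a column only adjusts those two numbers. The plain-Python sketch below illustrates both ideas with the standard library's `memoryview`. The class `Int64ColumnView` is hypothetical and is not the API of any Arrow library; real Arrow implementations share buffers across libraries and languages and also track validity bitmaps, which this sketch leaves out.

```python
from array import array
from typing import Iterator, Optional


class Int64ColumnView:
    """Hypothetical read-only view of an int64 column (not an Arrow API).

    The view holds a reference to a shared buffer plus an offset and a length.
    Slicing changes only those two numbers, so no values are copied."""

    def __init__(self, buffer: memoryview, offset: int = 0, length: Optional[int] = None):
        if buffer.format != "q":
            raise TypeError("expected a memoryview with int64 ('q') format")
        self.buffer = buffer
        self.offset = offset
        self.length = len(buffer) - offset if length is None else length
        if offset < 0 or self.length < 0 or offset + self.length > len(buffer):
            raise ValueError("slice out of bounds")

    def slice(self, start: int, length: int) -> "Int64ColumnView":
        # Zero-copy: the new view shares the same underlying buffer object.
        return Int64ColumnView(self.buffer, self.offset + start, length)

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, i: int) -> int:
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.buffer[self.offset + i]

    def __iter__(self) -> Iterator[int]:
        for i in range(self.length):
            yield self.buffer[self.offset + i]


# A "producer" writes raw bytes; a "consumer" reinterprets them in place.
raw = bytearray(array("q", [10, 20, 30, 40, 50]).tobytes())
column = Int64ColumnView(memoryview(raw).cast("q"))  # cast reinterprets the bytes, no copy
middle = column.slice(1, 3)
print(list(middle))  # [20, 30, 40]
```

Because `middle` only references `raw`, a later write to `raw` through any other view is immediately visible through the slice. That shared visibility is the observable difference between a zero-copy view and a deserialized copy.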

Applications

Arrow has been used in diverse domains, including analytics,^[8] genomics,^[9]^[7] and cloud computing.^[10]

Comparison to Apache Parquet and ORC

Apache Parquet and Apache ORC are popular examples of on-disk columnar data formats. Arrow is designed to complement these formats by handling in-memory processing.^[11] In-memory processing involves different hardware resource trade-offs than on-disk storage.^[12] The Arrow and Parquet projects include libraries for reading and writing data between the two formats.^[13]

Governance

Apache Arrow was announced by The Apache Software Foundation on February 17, 2016,^[14] with development led by a coalition of developers from other open source data analytics projects.^[15]^[16]^[6]^[17]^[18] The initial codebase and Java library were seeded by code from Apache Drill.^[14]

References

