Apache Drill
Developer(s): Apache Software Foundation
Initial release: May 19, 2015
Stable release: 1.20.3 / January 7, 2023
Repository: Drill Repository
Written in: Java
Operating system: Cross-platform
License: Apache License 2.0
Website: drill.apache.org

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by developers from MapR,[1][2] Drill was inspired by Google's Dremel system.[3] Tomer Shiran is the founder of the Apache Drill project.[5] Drill was designated an Apache Software Foundation top-level project in December 2014.[4][6]

Drill supports a variety of NoSQL databases and file systems, including Alluxio, HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores.
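As an illustration of a cross-datastore join, the minimal sketch below builds a Drill SQL statement that joins a MongoDB collection with a Parquet file. It assumes storage plugins named `mongo` and `dfs` (both ship as default configurations, although `mongo` must be enabled and pointed at a MongoDB server); the database, collection, file path, and column names are hypothetical.

```python
def cross_store_join(mongo_db: str, collection: str, parquet_path: str) -> str:
    """Return a Drill SQL query joining a MongoDB collection with a Parquet file.

    Hypothetical setup: storage plugins named `mongo` and `dfs` are enabled,
    and both sources expose a `customer_id` column.
    """
    # Drill addresses tables as <plugin>.<database or workspace>.<table>;
    # file paths under the dfs plugin are quoted with backticks.
    return (
        "SELECT o.order_id, c.name, o.total\n"
        f"FROM mongo.{mongo_db}.{collection} AS o\n"
        f"JOIN dfs.`{parquet_path}` AS c\n"
        "  ON o.customer_id = c.customer_id"
    )


print(cross_store_join("shop", "orders", "/data/customers.parquet"))
```

The query would be submitted like any other Drill query; Drill plans the join itself, so neither datastore needs to know about the other.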

Drill's datastore-aware optimizer automatically restructures a query plan to take advantage of each datastore's internal processing capabilities, for example by pushing filters down to the datastore. Drill also exploits data locality when it runs on the same nodes as the datastore.[7]

Features

One explicitly stated design goal is for Drill to scale to 10,000 or more servers and to process petabytes of data and trillions of records in seconds.[8]

Back-end support

Drill is primarily focused on non-relational datastores, including text files in Apache Hadoop, NoSQL databases, and cloud storage. A notable feature is in situ querying of local JSON and Apache Parquet files.

A new datastore can be added by developing a storage plugin. Drill's "schema-free" JSON data model enables it to query non-relational datastores in situ.[9]
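To illustrate schema-free, in situ querying, the sketch below writes a small newline-delimited JSON file whose records do not share the same fields, then builds a query that reads it directly through the `dfs` plugin, with no table definition. The file name, field names, and the choice of the system temp directory are hypothetical; the sketch assumes Drill's documented dot notation (with a table alias) for reaching into nested maps, and that records lacking a field yield NULL for it.

```python
import json
import os
import tempfile


def write_records(path: str) -> None:
    """Write newline-delimited JSON records whose fields differ per record."""
    records = [
        {"name": "Ada", "address": {"city": "London"}},
        {"name": "Grace", "tags": ["navy", "cobol"]},  # no "address" field
    ]
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def nested_field_query(path: str) -> str:
    # The table alias `t` is used to reach into the nested "address" map;
    # records without that field should return NULL for the city column.
    return f"SELECT t.name, t.address.city AS city FROM dfs.`{path}` AS t"


# Hypothetical location for the sample file.
path = os.path.join(tempfile.gettempdir(), "people.json")
write_records(path)
print(nested_field_query(path))
```

Nothing here declares a schema: the `.json` extension tells the `dfs` plugin which format reader to use, and the column structure is discovered as the file is read.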

Front-end support

Drill can be queried through JDBC, ODBC, or a REST API, from languages including Java and Python. The default installation includes a web interface that lets end users run ANSI SQL queries directly and export results as CSV files without any programming.
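As a sketch of the REST route, the Python code below uses only the standard library to build a POST request to Drill's `/query.json` endpoint with a JSON body of the form `{"queryType": "SQL", "query": ...}`. It assumes a Drill instance whose web server listens on the default port 8047 with authentication disabled, and that the JSON response carries a `rows` array; the helper names `build_query_request` and `run_query` are hypothetical. The sample query reads `employee.json` from the `cp` (classpath) plugin, which bundles sample data.

```python
import json
import urllib.request

# Assumption: Drill's web server on the default port, no authentication.
DRILL_URL = "http://localhost:8047"


def build_query_request(sql: str, base_url: str = DRILL_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to Drill's /query.json REST endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str, base_url: str = DRILL_URL) -> list:
    """Send the query and return the result rows (needs a running Drill)."""
    with urllib.request.urlopen(build_query_request(sql, base_url)) as resp:
        return json.load(resp)["rows"]


if __name__ == "__main__":
    req = build_query_request("SELECT * FROM cp.`employee.json` LIMIT 3")
    print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the payload easy to inspect; `run_query` only succeeds against a reachable Drill instance.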

Apache Superset, an open-source dashboard and data visualization application,[10] can be used to visualize data queried with Drill.


References

  1. ^ Friedman, Ellen (21 September 2015). "Apache Drill: Tracking its history as an open source community". Archived from the original on 18 March 2016.
  2. ^ "Brief About The Differences between Apache Drill Vs Presto". HitechNectar. Retrieved 2023-04-13.
  3. ^ "Spark SQL vs. Apache Drill-War of the SQL-on-Hadoop Tools". ProjectPro. Retrieved 2022-11-15.
  4. ^ "The Apache Software Foundation Announces Apache Drill as a Top-Level Project". 2 December 2014. Retrieved 2014-12-02.
  5. ^ Vizard, Michael (1 September 2021). "Apache Software Foundation updates Drill for broader SQL queries". VentureBeat. Retrieved 2022-10-20.
  6. ^ "Apache Drill Eliminates ETL, Data Transformation for MapR Database". The New Stack. 11 April 2016. Retrieved 2022-11-15.
  7. ^ "Apache Drill - Schema-free SQL for Hadoop, NoSQL and Cloud Storage". drill.apache.org. Retrieved 2015-12-29.
  8. ^ "DrillProposal - INCUBATOR - Apache Software Foundation".
  9. ^ "Frequently Asked Questions - Apache Drill". drill.apache.org. Retrieved 2015-12-29.
  10. ^ Borck, James R.; Heller, Martin; Nuñez, Steven; Oliver, Andrew C.; Pointer, Ian; Wayner, Peter (5 October 2020). "The best open source software of 2020". InfoWorld. Retrieved 2022-11-26.

Papers

Several papers influenced the creation and design of Drill.