Active projects

Accumulo: secure implementation of Bigtable
ActiveMQ: message broker supporting different communication protocols and clients, including a full Java Message Service (JMS) 1.1 client.^[2]
AGE: PostgreSQL extension that provides graph database functionality in order to enable users of PostgreSQL to use graph query modeling in unison with PostgreSQL's existing relational model
Airavata: a distributed system software framework to manage simple to composite applications with complex execution and workflow patterns on diverse computational resources
Airflow: Python-based platform to programmatically author, schedule and monitor workflows
Allura: Python-based open source implementation of a software forge
Ambari: makes Hadoop cluster provisioning, managing, and monitoring dead simple
Ant: Java-based build tool
- AntUnit: an Ant library that provides Ant tasks for testing Ant tasks; it can also be used to drive functional and integration tests of arbitrary applications with Ant
- Ivy: a very powerful dependency manager oriented toward Java dependency management, even though it could be used to manage dependencies of any kind
- IvyDE: integrate Ivy in Eclipse with the IvyDE plugin
APISIX: cloud-native microservices API gateway
Archiva: Build Artifact Repository Manager
Aries: OSGi Enterprise Programming Model
Arrow: "A high-performance cross-system data layer for columnar in-memory analytics".^[3]^[4]
AsterixDB: open source Big Data Management System
Atlas: scalable and extensible set of core foundational governance services
Avro: a data serialization system.
Apache Axis Committee
- Axis: open source, XML based Web service framework
- Axis2: a service hosting and consumption framework that makes it easy to use SOAP and Web Services
- Rampart: implementation of the WS-Security standard for the Axis2 Web services engine
- Sandesha2: an Axis2 module implementing WS-RM.
Bahir: extensions to distributed analytic platforms such as Apache Spark
Beam: an uber-API for big data
Bigtop: a project for the development of packaging and tests of the Apache Hadoop ecosystem.
Bloodhound: defect tracker based on Trac^[5]
BookKeeper: a reliable replicated log service
Brooklyn: a framework for modelling, monitoring, and managing applications through autonomic blueprints
BRPC: industrial-grade RPC framework for building reliable and high-performance services
BuildStream: tool for building/integrating software stacks
BVal: Bean Validation API Implementation
Calcite: dynamic data management framework
Camel: declarative routing and mediation rules engine which implements the Enterprise Integration Patterns using a Java-based domain specific language
CarbonData: an indexed columnar data format for fast analytics on big data platforms, e.g., Apache Hadoop and Apache Spark
Cassandra: highly scalable second-generation distributed database
Causeway (formerly Isis): a framework for rapidly developing domain-driven apps in Java
Cayenne: Java ORM framework
Celix: implementation of the OSGi specification adapted to C and C++
CloudStack: software to deploy and manage cloud infrastructure
Cocoon: XML publishing framework
Commons: reusable Java libraries and utilities too small to merit their own project
- BCEL: Bytecode Engineering Library
- Daemon: Commons Daemon
- Jelly: Jelly is a Java and XML based scripting engine. Jelly combines the best ideas from JSTL, Velocity, DVSL, Ant and Cocoon all together in a simple yet powerful scripting engine
- Logging: Commons Logging is a thin adapter allowing configurable bridging to other, well known logging systems
- OGNL: Object Graph Navigation Library
Community Development: project that creates and provides tools, processes, and advice to help open-source software projects improve their own community health
Cordova: mobile development framework
CouchDB: Document-oriented database
Apache Creadur Committee
- Rat: improves accuracy and efficiency when reviewing and auditing releases.
- Tentacles: simplifies the job of reviewing repository releases consisting of large numbers of artefacts
- Whisker: assists assembled applications to maintain correct legal documentation.
cTAKES: clinical "Text Analysis Knowledge Extraction Software" to extract information from electronic medical record clinical free-text
Curator: builds on ZooKeeper and handles the complexity of managing connections to the ZooKeeper cluster and retrying operations
CXF: web services framework
Daffodil: implementation of the Data Format Description Language (DFDL) used to convert between fixed format data and XML/JSON
DataFu: collection of libraries for working with large-scale data in Hadoop
DataSketches: open source, high-performance library of stochastic streaming algorithms commonly called "sketches" in the data sciences
Apache DB Committee
- Derby: pure Java relational database management system
- JDO: Java Data Objects, persistence for Java objects
- Torque: ORM for Java
DeltaSpike: collection of JSR-299 (CDI) Extensions for building applications on the Java SE and EE platforms
Apache Directory Committee
- Directory: LDAP and Kerberos, entirely in Java.
- Directory Server: an extensible, embeddable LDAP and Kerberos server, entirely in Java
- Directory Studio: Eclipse based LDAP browser and directory client
- Fortress: a standards-based authorization platform that implements ANSI INCITS 359 Role-Based Access Control (RBAC)
- Kerby: Kerberos binding in Java
- LDAP API: an SDK for directory access in Java
- SCIMple: an implementation of the SCIM v2.0 specification
DolphinScheduler: a distributed ETL scheduling engine with powerful DAG visualization interface
Doris: MPP-based interactive SQL data warehousing for reporting and analysis, good for both high-throughput scenarios and high-concurrency point queries
Drill: software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets
Druid: high-performance, column-oriented, distributed data store
Dubbo: high-performance, lightweight, Java-based RPC framework
ECharts: charting and data visualization library written in JavaScript
Empire-db: a lightweight relational database abstraction layer and data persistence component
EventMesh: dynamic cloud-native basic service runtime used to decouple the application and middleware layer
Felix: implementation of the OSGi Release 5 core framework specification
Fineract: Platform for Digital Financial Services
Flagon: software tool usability testing platform
Flex: cross-platform SDK for developing and deploying rich Internet applications.
Flink: fast and reliable large-scale data processing engine.
Flume: large scale log aggregation framework
Apache Fluo Committee
- Fluo: a distributed processing system that lets users make incremental updates to large data sets
- Fluo Recipes: Apache Fluo Recipes build on the Fluo API to offer additional functionality to developers
- Fluo YARN: a tool for running Apache Fluo applications in Apache Hadoop YARN
FreeMarker: a template engine, i.e. a generic tool to generate text output based on templates. FreeMarker is implemented in Java as a class library for programmers
Geode: low latency, high concurrency data management solutions
Geronimo: Java EE server
Gobblin: distributed data integration framework
Gora: an open source framework that provides an in-memory data model and persistence for big data
Griffin: an open source Data Quality solution for Big Data, which supports both batch and streaming mode. Originally developed by eBay^[6]
Groovy: an object-oriented, dynamic programming language for the Java platform
Guacamole: HTML5 web application for accessing remote desktops^[7]
Gump: integration, dependencies, and versioning management
Hadoop: Java software framework that supports data intensive distributed applications
HAWQ: advanced enterprise SQL on Hadoop analytic engine
HBase: Apache HBase software is the Hadoop database. Think of it as a distributed, scalable, big data store
Helix: a cluster management framework for partitioned and replicated distributed resources
Hive: the Apache Hive data warehouse software facilitates querying and managing large datasets residing in distributed storage.
Hop: The Hop Orchestration Platform, or Apache Hop, aims to facilitate all aspects of data and metadata orchestration.
HTTP Server: The Apache HTTP Server application 'httpd'
- mod_python: module that integrates the Python interpreter into Apache server. Deprecated in favour of mod_wsgi.
Apache HttpComponents: low-level Java libraries for HTTP
Hudi: provides atomic upserts and incremental data streams on Big Data
Iceberg: an open standard for analytic SQL tables, designed for high performance and ease of use.
Ignite: an In-Memory Data Fabric providing in-memory data caching, partitioning, processing, and querying components^[8]
Impala: a high-performance distributed SQL engine
InLong: a one-stop integration framework for massive data that provides automatic, secure and reliable data transmission capabilities
IoTDB: data store for managing large amounts of time series data in industrial applications
Jackrabbit: implementation of the Java Content Repository API
James: Java email and news server
jclouds: open source multi-cloud toolkit for the Java platform
Jena: an open source Semantic Web framework for Java
JMeter: pure Java application for load and functional testing
Johnzon: JSR-353 compliant JSON parsing; modules to help with JSR-353 as well as JSR-374 and JSR-367
JSPWiki: A feature-rich and extensible WikiWiki engine built around the standard J2EE components (Java, servlets, JSP)
Juneau: A toolkit for marshalling POJOs to a wide variety of content types using a common framework
Kafka: a message broker software
Karaf: an OSGi distribution for server-side applications.
Kibble: a suite of tools for collecting, aggregating and visualizing activity in software projects.
Knox: a REST API Gateway for Hadoop Services
Kudu: a distributed columnar storage engine built for the Apache Hadoop ecosystem
Kvrocks: a distributed key-value NoSQL database supporting rich data structures
Kylin: distributed analytics engine
Kyuubi: a distributed multi-tenant Thrift JDBC/ODBC server for large-scale data management, processing, and analytics, built on top of Apache Spark and designed to support more engines
Libcloud: a standard Python library that abstracts away differences among multiple cloud provider APIs.
Linkis: a computation middleware project, which decouples the upper applications and the underlying data engines, provides standardized interfaces (REST, JDBC, WebSocket etc.) to easily connect to various underlying engines (Spark, Presto, Flink, etc.)
Apache Logging Services Committee
- Chainsaw: a GUI log viewer.
- Log4cxx: provides logging services for C++.
- Log4j: Apache Log4j
- Log4net: provides logging services for .NET.
- Log4php: a logging framework for PHP.
Apache Lucene Committee
- Lucene Core: a high-performance, full-featured text search engine library
- Solr: enterprise search server based on the Lucene Java search library
Lucene.NET: a port of the Lucene search engine library, written in C# and targeted at .NET runtime users.
MADlib: Scalable, Big Data, SQL-driven machine learning framework for Data Scientists
Mahout: machine learning and data mining solution
ManifoldCF: Open-source software for transferring content between repositories or search indexes
Maven: Java project management and comprehension tool
- Doxia: a content generation framework, which supports many markup languages.
Mesos: open-source cluster manager
Apache MINA Committee
- FtpServer: FTP server written entirely in Java
- MINA: Multipurpose Infrastructure for Network Application, a framework to develop high performance and high scalability network applications
- SSHD: a 100% pure Java library to support the SSH protocols on both the client and server side
- Vysper: aims to be a modular, full featured XMPP (Jabber) server. Vysper is implemented in Java
Mnemonic: a transparent nonvolatile hybrid memory oriented library for Big data, High-performance computing, and Analytics
Apache MyFaces Committee
- MyFaces: JavaServer Faces implementation
- Tobago: set of user interface components based on JSF
Mynewt: embedded OS optimized for networking and built for remote management of constrained devices
NetBeans: development environment, tooling platform, and application framework
NiFi: easy to use, powerful, and reliable system to process and distribute data
Nutch: a highly extensible and scalable open source web crawler
NuttX: mature, real-time embedded operating system (RTOS)
OFBiz: Open for Business: enterprise automation software
Olingo: Client and Server for OData
Oozie: a workflow scheduler system to manage Apache Hadoop jobs.
OpenJPA: Java Persistence API Implementation
OpenMeetings: video conferencing, instant messaging, white board and collaborative document editing application
OpenNLP: natural language processing toolkit
OpenOffice: an open-source, office-document productivity suite
OpenWebBeans: Dependency Injection Platform
OpenWhisk: distributed Serverless computing platform
ORC: columnar file format for big data workloads
Ozone: scalable, redundant, and distributed object store for Hadoop
Parquet: a general-purpose columnar storage format
PDFBox: Java based PDF library (reading, text extraction, manipulation, viewer)
mod_perl: module that integrates the Perl interpreter into Apache server
Petri: deals with the assessment of, education in, and adoption of the Foundation's policies and procedures for collaborative development and the pros and cons of joining the Foundation
Phoenix: SQL layer on HBase
Pig: a platform for analyzing large data sets on Hadoop
Pinot: a column-oriented, open-source, distributed data store written in Java^[9]
Pivot: a platform for building rich internet applications in Java
PLC4X: Universal API for communicating with programmable logic controllers
Apache POI Committee
- POI: Poor Obfuscation Implementation, a library for reading and writing Microsoft Office formats
- XMLBeans: XML–Java binding tool
APR: Apache Portable Runtime, a portability library written in C
Portals: web portal related software
Pulsar: distributed pub-sub messaging system originally created at Yahoo
Qpid: AMQP messaging system in Java and C++
Ranger: a framework to enable, monitor and manage comprehensive data security across the Hadoop platform
Ratis: Java implementation of the Raft consensus protocol
RocketMQ: a fast, low latency, reliable, scalable, distributed, easy to use message-oriented middleware, especially for processing large amounts of streaming data
Roller: a full-featured, multi-user and group blog server suitable for both small and large blog sites
Royale: improving developer productivity in creating applications for wherever JavaScript runs (and other runtimes)
Rya: cloud-based RDF triple store that supports SPARQL queries
Samza: Stream Processing Framework
Santuario: XML Security in Java and C++
SDAP: integrated data analytic center for Big Science problems
SeaTunnel: a very easy-to-use ultra-high-performance distributed data integration platform that supports real-time synchronization of massive data
Sedona: big geospatial data processing engine
Serf: high performance C-based HTTP client library built upon the Apache Portable Runtime (APR) library
ServiceComb: microservice framework that provides a set of tools and components to make development and deployment of cloud applications easier
ServiceMix: enterprise service bus that supports JBI and OSGi
ShardingSphere: a database clustering system providing data sharding, distributed transactions, and distributed database management
ShenYu: Java native API Gateway for service proxy, protocol conversion and API governance
Shiro: a simple to use Java Security Framework
SINGA: a distributed deep learning library
Spatial Information System (SIS): A library for developing geospatial applications
SkyWalking: application performance management and monitoring (APM)
Sling: innovative Web framework based on JCR and OSGi
Solr: Full Text search server
SpamAssassin: email filter used to identify spam
Spark: open source cluster computing framework
Steve: STeVe is a collection of online voting tools, used by the ASF, to handle STV and other voting methods
Storm: a distributed real-time computation system.
StreamPipes: self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore (Industrial) IoT data streams
Streams: Interoperability of online profiles and activity feeds
Struts: Java web applications framework
Submarine: Cloud Native Machine Learning Platform
Subversion: open source version control (client/server) system
Superset: enterprise-ready web application for data exploration, data visualization and dashboarding
Synapse: a lightweight and high-performance Enterprise Service Bus (ESB)
Syncope: an Open Source system for managing digital identities in enterprise environments.
SystemDS: scalable machine learning
Tapestry: component-based Java web framework
Apache Tcl Committee
- Tcl integration for Apache httpd
- Rivet: Server-side Tcl programming system combining ease of use and power
- Websh: Websh is a rapid development environment for building powerful, fast, and reliable web applications in Tcl
Tez: an effort to develop a generic application framework which can be used to process arbitrarily complex directed-acyclic graphs (DAGs) of data-processing tasks and also a re-usable set of data-processing primitives which can be used by other projects
Thrift: interface definition language and binary communication protocol that is used to define and create services for numerous languages
Tika: content analysis toolkit for extracting metadata and text from digital documents of various types, e.g., audio, video, image, office suite, web, mail, and binary
TinkerPop: A graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP)
Tomcat: web container for serving servlets and JSP
- Reusable Dialog Components (RDC) Taglib: A framework for creating JSP taglibs that aid in rapid development of voice and multimodal applications
TomEE: an all-Apache Java EE 6 Web Profile stack for Apache Tomcat
Traffic Control: Built around Apache Traffic Server as the caching software, Traffic Control implements all the core functions of a modern CDN
Traffic Server: HTTP/1.1 compliant caching proxy server
Turbine: a servlet based framework that allows Java developers to quickly build web applications
TVM: an end to end machine learning compiler framework for CPUs, GPUs and accelerators
UIMA: unstructured content analytics framework
Unomi: reference implementation of the OASIS customer data platform specification
VCL: a cloud computing platform for provisioning and brokering access to dedicated remote compute resources.
Apache Velocity Committee
- Anakia: an XML transformation tool which uses JDOM and Velocity to transform XML documents into multiple formats.
- Texen: a general purpose text generating utility based on Apache Velocity and Apache Ant.
- Velocity: Java template creation engine
- Apache Velocity DVSL: a tool modeled after XSLT and intended for general XML transformations using the Velocity Template Language.
- Apache Velocity Tools: tools and infrastructure for the template engine
Apache Web Services Committee
- Axiom: an XML object model supporting deferred parsing.
- Woden: used to develop a Java class library for reading, manipulating, creating and writing WSDL documents.
Whimsy: tools that display and visualize various bits of data related to ASF organizations and processes.
Wicket: component-based Java web framework
Xalan: XSLT processors in Java and C++
Xerces: validating XML parser
Apache XML Graphics Committee
- Batik: pure Java library for SVG content manipulation
- FOP: Java print formatter driven by XSL formatting objects (XSL-FO); supported output formats include PDF, PS, PCL, AFP, XML (area tree representation), Print, AWT and PNG, and to a lesser extent, RTF and TXT
- XML Graphics Commons: common components for Apache Batik and Apache FOP
Yetus: a collection of libraries and tools that enable contribution and release processes for software projects
YuniKorn: standalone resource scheduler responsible for scheduling batch jobs and long-running services on large scale distributed systems
Zeppelin: a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems
ZooKeeper: coordination service for distributed applications

Retired projects

A retired project is one which has been closed down on the initiative of the board, the project's PMC, the PPMC or the IPMC for various reasons. It is no longer developed at the Apache Software Foundation and does not have any other duties.

Abdera: implementation of the Atom Syndication Format and Atom Publishing Protocol
ACE: a distribution framework that allows central management and distribution of software components, configuration data and other artefacts to target systems
Any23: Anything To Triples (Any23) is a library, a web service and a command line tool that extracts structured data in RDF format from a variety of Web documents
Apex: Enterprise-grade unified stream and batch processing engine
Aurora: Mesos framework for long-running services and cron jobs
AxKit: XML Application Server for Apache. It provided on-the-fly conversion from XML to any format, such as HTML, WAP or text using either W3C standard techniques, or flexible custom code
Beehive: Java visual object model
Buildr: a build system for Java-based applications, including support for Scala, Groovy and a growing number of JVM languages and tools
Chemistry: provides open source implementations of the Content Management Interoperability Services (CMIS) specification
Chukwa: Chukwa is an open source data collection system for monitoring large distributed systems
Clerezza: a service platform which provides a set of functionality for management of semantically linked data accessible through RESTful Web Services and in a secured way
Click: simple and easy-to-use Java Web Framework
Continuum: continuous integration server
Crimson: Java XML parser which supports XML 1.0 via various APIs
Crunch: Provides a framework for writing, testing, and running MapReduce pipelines
Deltacloud: provides common front-end APIs to abstract differences between cloud providers
DeviceMap: device Data Repository and classification API
DirectMemory: off-heap cache for the Java Virtual Machine
DRAT: large scale code license analysis, auditing and reporting
Eagle: open source analytics solution for identifying security and performance issues instantly on big data platforms
ECS: API for generating elements for various markup languages
ESME: secure and highly scalable microsharing and micromessaging platform that allows people to discover and meet one another and get controlled access to other sources of information, all in a business process context
Etch: cross-platform, language- and transport-independent RPC-like messaging framework
Excalibur: Java inversion of control framework including containers and components
Falcon: data governance engine
Forrest: documentation framework based upon Cocoon
Giraph: scalable Graph Processing System
Hama: Hama is an efficient and scalable general-purpose BSP computing engine
Harmony: Java SE 5 and 6 runtime and development kit
HiveMind: services and configuration microkernel
iBATIS: Persistence framework which enables mapping SQL queries to POJOs
Jakarta: server side Java, including its own set of subprojects
Jakarta Cactus: simple test framework for unit testing server-side Java code
Joshua: statistical machine translation toolkit
Apache jUDDI Committee
- Scout: Apache Scout is an implementation of the JSR 93 (JAXR).
Labs: a place for innovation where committers of the foundation can experiment with new ideas
Lens: Unified Analytics Interface
Lenya: content management system (CMS) based on Apache Cocoon
Lucy: search engine library that provides full-text search for dynamic programming languages
Marmotta: An Open Platform for Linked Data
MetaModel: provides a common interface for discovery, exploration of metadata and querying of different types of data sources.
Metron: Real-time big data security
MRUnit: Java library that helps developers unit test Apache Hadoop map reduce jobs
MXNet: Deep learning programming framework
ODE: Apache ODE is a WS-BPEL implementation that supports web services orchestration using flexible process definitions.
ObJectRelationalBridge (OJB): Object/Relational mapping tool that allowed transparent persistence for Java Objects against relational databases
Oltu - Parent: OAuth protocol implementation in Java
Onami: project focused on the development and maintenance of a set of Google Guice extensions not provided out of the box by the library itself
OODT: Object Oriented Data Technology, a data management framework for capturing and sharing data
Open Climate Workbench: A comprehensive suite of algorithms, libraries, and interfaces designed to standardize and streamline the process of interacting with large quantities of observational data and conducting regional climate model evaluations
ORO: Regular Expression engine supporting various dialects
Polygene: community based effort exploring Composite Oriented Programming for domain centric application development
PredictionIO: PredictionIO is an open source Machine Learning Server built on top of state-of-the-art open source stack, that enables developers to manage and deploy production-ready predictive services for various kinds of machine learning tasks.
REEF: A scale-out computing fabric that eases the development of Big Data applications on top of resource managers such as Apache YARN and Mesos
Regexp: Regular Expression engine
River: provides a standards-compliant JINI service
Sentry: Fine grained authorization to data and metadata in Apache Hadoop
Shale: web application framework based on JavaServer Faces
Shindig: OpenSocial container; helps start hosting OpenSocial apps quickly by providing the code to render gadgets, proxy requests, and handle REST and RPC requests
Sqoop: a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases
STDCXX: collection of algorithms, containers, iterators, and other fundamental components of every piece of software, implemented as C++ classes, templates, and functions essential for writing C++ programs
Stanbol: Software components for semantic content management
Stratos: Platform-as-a-Service (PaaS) framework
Tajo: relational data warehousing system that uses the Hadoop file system as distributed storage
Tiles: templating framework built to simplify the development of web application user interfaces.
Trafodion: Webscale SQL-on-Hadoop solution enabling transactional or operational workloads on Apache Hadoop^[10]^[11]^[12]
Tuscany: SCA implementation, also providing other SOA implementations
Twill: Use Apache Hadoop YARN's distributed capabilities with a programming model that is similar to running threads
Usergrid: an open-source Backend-as-a-Service ("BaaS" or "mBaaS") composed of an integrated distributed NoSQL database, application layer and client tier with SDKs for developers looking to rapidly build web and/or mobile applications
VXQuery: Apache VXQuery implements a parallel XML Query processor.
Wave: online real-time collaborative editing
Whirr: set of libraries for running cloud services
Wink: RESTful web services based on the JAX-RS specification
Wookie: parser, server and plugins for working with W3C Packaged Web Apps
WS Muse: implementation of the WS-ResourceFramework (WSRF), WS-BaseNotification (WSN), and WS-DistributedManagement (WSDM) specifications
Xang: XML Web Framework that aggregated multiple data sources, made that data URL addressable and defined custom methods to access that data
Xindice: XML Database
Zipkin: distributed tracing system
OpenCMIS: collection of Java libraries, frameworks and tools around the CMIS specification for document interoperability.

The above may be incomplete, as the list of retired projects changes.