Apache Stack

Messaging/Streaming

Apache Kafka

High-throughput event streaming platform; durably stores and processes real-time data streams with pub-sub and queue patterns

Similar Technologies
RabbitMQ, AWS Kinesis, Google Pub/Sub, NATS, Redpanda
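The pub-sub and queue patterns mentioned for Kafka can be sketched with consumer groups. This is a pure-Python toy, not the Kafka client API: the `deliver` function, the byte-sum partitioner, and the round-robin partition ownership are all assumptions made for illustration.

```python
from collections import defaultdict

def deliver(messages, groups, num_partitions=3):
    """Route keyed messages to consumers.

    messages: list of (key, value) pairs.
    groups:   dict mapping group name -> list of consumer names.
    Returns {group: {consumer: [values, ...]}}.
    """
    received = {g: defaultdict(list) for g in groups}
    for key, value in messages:
        # Toy partitioner (an assumption): a stable byte sum picks the partition.
        partition = sum(key.encode()) % num_partitions
        for group, consumers in groups.items():
            # Inside a group, each partition has exactly one owner (queue-like).
            # Every group still receives every message (pub-sub).
            owner = consumers[partition % len(consumers)]
            received[group][owner].append(value)
    return {g: dict(c) for g, c in received.items()}

messages = [("user-1", "login"), ("user-2", "click"), ("user-1", "logout")]
groups = {"billing": ["b1", "b2"], "audit": ["a1"]}
result = deliver(messages, groups)
```

Messages with the same key land on the same partition, so one consumer in each group sees them in order, while the single-member `audit` group receives the full stream.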
Apache Pulsar

Multi-tenant messaging with built-in geo-replication; separates storage (Apache BookKeeper) from compute (brokers) so the two scale independently, often a better fit for elastic cloud deployments than Kafka

Similar Technologies
NATS, AWS Kinesis, Google Pub/Sub, Apache Kafka
Apache Flink

Stateful stream processing with exactly-once semantics; handles complex event processing, windowing, and real-time analytics

Similar Technologies
Spark Streaming, Kafka Streams, AWS Kinesis Analytics, Google Dataflow
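Windowing, as named in the Flink entry, groups events into time buckets before aggregating. Below is a minimal sketch of a tumbling (fixed, non-overlapping) window count in plain Python. It is not the Flink API and omits watermarks, state backends, and exactly-once guarantees; the function name and event shape are assumptions.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size):
    """Count events per (window_start, key) for fixed, non-overlapping windows.

    events: iterable of (timestamp_seconds, key) pairs, in any order.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Align the timestamp down to the start of its window.
        window_start = (ts // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(1, "a"), (4, "b"), (9, "a"), (11, "a"), (19, "b"), (20, "a")]
counts = tumbling_window_counts(events, window_size=10)
```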
Apache Storm

Early real-time processing framework; largely superseded by Flink but still used for simple streaming topologies

Similar Technologies
Apache Flink, Spark Streaming, Kafka Streams
Apache Samza

Stream processing tightly integrated with Kafka; good for stateful transformations on Kafka topics

Similar Technologies
Kafka Streams, Apache Flink, Spark Streaming

Data Storage

Apache Cassandra

Masterless wide-column store optimized for writes; scales linearly across datacenters, with tunable (eventual by default) consistency

Similar Technologies
ScyllaDB, DynamoDB, Google Bigtable, Azure Cosmos DB
Apache HBase

Column-family database on Hadoop; random read/write access to billions of rows, strong consistency

Similar Technologies
Google Bigtable, DynamoDB, Cassandra
Apache CouchDB

Document database with multi-master replication; HTTP/JSON API, offline-first architecture

Similar Technologies
MongoDB, PouchDB, RavenDB, Couchbase
Apache Druid

Time-series OLAP database with sub-second query latency; excellent for event data and real-time dashboards

Similar Technologies
ClickHouse, TimescaleDB, InfluxDB, Apache Pinot
Apache Doris

MPP SQL database for interactive analytics; combines fast queries with high concurrency, alternative to ClickHouse

Similar Technologies
ClickHouse, DuckDB, Snowflake, StarRocks
Apache Pinot

Real-time OLAP store designed for user-facing analytics; ultra-low latency on fresh data, used by LinkedIn/Uber

Similar Technologies
ClickHouse, Apache Druid, Rockset
Apache Iceberg

Table format enabling ACID transactions on data lakes; schema evolution, time travel, partition management on S3/HDFS

Similar Technologies
Delta Lake, Apache Hudi, Apache XTable
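The time-travel feature listed for Iceberg rests on immutable snapshots: every commit records the full set of data files visible at that point, so older versions stay readable. The toy below illustrates that idea only. `SnapshotTable` and its methods are hypothetical names, not the Iceberg API, and real tables track manifests and metadata files rather than a Python list.

```python
class SnapshotTable:
    """Toy table: each commit records an immutable snapshot of data files."""

    def __init__(self):
        self.snapshots = [()]  # snapshot 0 is the empty table

    def commit(self, add=(), remove=()):
        current = self.snapshots[-1]
        new = tuple(f for f in current if f not in remove) + tuple(add)
        self.snapshots.append(new)      # one append stands in for an atomic commit
        return len(self.snapshots) - 1  # id of the new snapshot

    def files(self, snapshot_id=None):
        """Files visible at a snapshot; defaults to the latest one."""
        if snapshot_id is None:
            snapshot_id = len(self.snapshots) - 1
        return list(self.snapshots[snapshot_id])

table = SnapshotTable()
s1 = table.commit(add=["a.parquet", "b.parquet"])
s2 = table.commit(add=["c.parquet"], remove=["a.parquet"])

latest = table.files()      # current view of the table
as_of_s1 = table.files(s1)  # "time travel" to the earlier snapshot
```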

Big Data Processing

Apache Hadoop

Distributed filesystem (HDFS) + MapReduce processing; foundation for big data ecosystem, mostly legacy now

Similar Technologies
AWS S3, Google Cloud Storage, Azure Blob Storage, MinIO
Apache Spark

In-memory batch/streaming engine with a unified API; up to 100x faster than MapReduce on in-memory workloads; supports SQL, ML, and graph processing

Similar Technologies
Dask, Ray, Presto/Trino, AWS EMR
Apache Hive

SQL query engine over Hadoop/S3; translates SQL to MapReduce, Tez, or Spark jobs; its metastore doubles as a metadata catalog for data lakes

Similar Technologies
Presto/Trino, AWS Athena, Snowflake, Databricks SQL
Apache Pig

Dataflow scripting language for Hadoop; procedural alternative to SQL, largely replaced by Spark

Similar Technologies
Apache Spark, Apache Hive, SQL

Search/Indexing

Apache Solr

Enterprise search with full-text indexing, faceting, and geo-search; arguably more feature-rich than Elasticsearch out of the box

Similar Technologies
Elasticsearch, Meilisearch, Typesense, Algolia
Apache Lucene

Core search library providing indexing/search algorithms; powers both Solr and Elasticsearch

Similar Technologies
Tantivy, Bleve, Xapian
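The indexing/search algorithms that Lucene provides are built around an inverted index, which maps each term to the documents containing it. A minimal sketch in plain Python follows. It is not the Lucene API, and the whitespace tokenizer and AND-only query are simplifying assumptions; real analyzers, scoring, and on-disk segments are far richer.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND query: return ids of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings)

docs = {
    1: "Apache Lucene search library",
    2: "Solr builds on Lucene",
    3: "distributed search engine",
}
index = build_index(docs)
hits = search(index, "lucene search")
```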

Workflow/Orchestration

Apache Airflow

Python-based DAG scheduler for ETL pipelines; dynamic workflows, extensive integrations, monitoring/alerting

Similar Technologies
Prefect, Dagster, Temporal, AWS Step Functions, Kestra
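A DAG scheduler such as Airflow runs each task only after its upstream tasks have finished. The dependency-ordering idea can be sketched with the standard library's `graphlib` (Python 3.9+). This is not Airflow code: Airflow defines DAGs through its own classes and operators, and the task names below are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (its upstream tasks).
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}

# static_order() yields tasks so that every dependency precedes its dependents.
order = list(TopologicalSorter(dag).static_order())
```

`transform` and `validate` have no dependency on each other, so a scheduler is free to run them in parallel; only their position relative to `extract` and `load` is fixed.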
Apache NiFi

Visual dataflow automation with back-pressure handling; drag-and-drop ETL for moving and transforming data between systems

Similar Technologies
StreamSets, Talend, Airbyte, Pentaho
Apache Oozie

XML-based Hadoop workflow coordinator; legacy tool for scheduling MapReduce/Hive jobs

Similar Technologies
Apache Airflow, Prefect, Luigi

Computation

Apache Arrow

In-memory columnar format for zero-copy data sharing between processes; can be 10-100x faster than serializing and deserializing the data

Similar Technologies
Parquet (on-disk), FlatBuffers, Cap'n Proto
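The columnar, zero-copy idea behind Arrow can be illustrated with standard-library buffers: each column is one contiguous typed array, and a view over it shares memory instead of copying. This sketch uses `array` and `memoryview` within a single process and is not the Arrow library or its IPC format; the column names are made up.

```python
from array import array

# Columnar layout: one contiguous typed buffer per column, not one object per row.
prices = array("d", [9.5, 12.0, 7.25, 30.0])
quantities = array("q", [3, 1, 4, 2])

# A memoryview exposes the same underlying buffer without copying it.
view = memoryview(prices)
tail = view[2:]        # slicing a memoryview is also zero-copy

prices[2] = 8.0        # mutate the underlying column...
shared = tail[0]       # ...and the view observes the change

# Column-wise computation walks the two buffers in lockstep.
revenue = sum(p * q for p, q in zip(prices, quantities))
```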
Apache Calcite

SQL parser/optimizer framework used by many databases; provides cost-based query optimization

Similar Technologies
Presto Parser, DataFusion, Custom SQL Parsers
Apache Beam

Write-once pipelines that run on Flink/Spark/Dataflow; abstracts execution engine for portable data processing

Similar Technologies
Native Flink API, Native Spark API, Google Dataflow

Other Notable

Apache Superset

BI tool for exploring/visualizing data; connects to 40+ databases, SQL IDE, shareable dashboards

Similar Technologies
Metabase, Grafana, Tableau, Looker, Redash
Apache Parquet

Columnar file format for analytics; efficient compression/encoding, predicate pushdown, industry standard

Similar Technologies
ORC, Arrow IPC, CarbonData
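Predicate pushdown, mentioned for Parquet, lets a reader skip whole chunks of a file by checking per-chunk min/max statistics before touching the rows. The sketch below models that with plain lists. It is not a Parquet reader, and the "row group" dictionaries and function names are assumptions for illustration.

```python
def make_row_groups(values, group_size):
    """Split values into chunks and record min/max statistics for each."""
    groups = []
    for i in range(0, len(values), group_size):
        chunk = values[i:i + group_size]
        groups.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return groups

def scan_greater_than(groups, threshold):
    """Return matching values and the number of row groups actually read."""
    matches, groups_read = [], 0
    for g in groups:
        if g["max"] <= threshold:
            continue  # statistics prove no row can match, so skip the chunk
        groups_read += 1
        matches.extend(v for v in g["rows"] if v > threshold)
    return matches, groups_read

groups = make_row_groups([1, 3, 5, 7, 20, 22, 40, 41], group_size=2)
matches, groups_read = scan_greater_than(groups, threshold=21)
```

Here only two of the four chunks are read, because the first two have a maximum below the threshold.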
Apache Avro

Row-based serialization with schema evolution; compact binary format for streaming/RPC

Similar Technologies
Protocol Buffers, Thrift, MessagePack, JSON Schema
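Schema evolution in Avro means data written with one schema can be read with a newer one: fields the reader does not know are ignored, and fields the writer never wrote are filled from the reader's defaults. The sketch below mimics that resolution step on plain dictionaries. It is not the Avro library, and the `resolve` helper and simplified schema shape are hypothetical.

```python
def resolve(record, reader_schema):
    """Project an already-decoded writer record onto the reader's schema.

    reader_schema: list of field dicts with "name" and an optional "default".
    """
    resolved = {}
    for field in reader_schema:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            # Field was added after the data was written: use the default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved  # fields unknown to the reader are dropped

old_record = {"id": 7, "name": "Ada", "legacy_flag": True}
reader_schema = [
    {"name": "id"},
    {"name": "name"},
    {"name": "email", "default": None},
]
new_view = resolve(old_record, reader_schema)
```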
Apache ZooKeeper

Distributed coordination for leader election and config management; being replaced in some systems by Raft-based alternatives (for example, Kafka's KRaft mode)

Similar Technologies
etcd, Consul, Redis (with Sentinel), Kubernetes ConfigMaps
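Leader election on ZooKeeper is commonly built from sequential ephemeral nodes: each candidate registers a node with an increasing sequence number, the lowest number leads, and when that node disappears the next lowest takes over. The in-memory toy below shows only that rule. `Election` is a hypothetical class, not the ZooKeeper client API, and it has no sessions, watches, or networking.

```python
import itertools

class Election:
    """Toy election: the candidate with the lowest sequence number leads."""

    def __init__(self):
        self._seq = itertools.count()
        self.nodes = {}  # sequence number -> candidate name

    def join(self, name):
        seq = next(self._seq)      # stands in for a sequential node id
        self.nodes[seq] = name
        return seq

    def leave(self, seq):
        self.nodes.pop(seq, None)  # stands in for an ephemeral node expiring

    def leader(self):
        return self.nodes[min(self.nodes)] if self.nodes else None

election = Election()
a = election.join("node-a")
b = election.join("node-b")
c = election.join("node-c")
first_leader = election.leader()
election.leave(a)                  # the current leader drops out
next_leader = election.leader()
```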