Will Percey — Portfolio

Apache Stack

Updated Dec 2025

Messaging/Streaming

Apache Kafka

High-throughput event streaming platform; durably stores and processes real-time data streams with pub-sub and queue patterns

Similar Technologies

RabbitMQ, AWS Kinesis, Google Pub/Sub, NATS, Redpanda
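
A toy, in-memory sketch of the log model behind those two patterns. It is not Kafka's client API; `Topic`, `poll`, and `assign` are hypothetical names. Every consumer group keeps its own committed offsets, so each group sees the whole stream (pub-sub), while inside one group each partition has a single owner (queue). CRC32 stands in for the murmur2 hash Kafka's default partitioner uses.

```python
import zlib
from collections import defaultdict


class Topic:
    """Toy topic: a fixed number of partitions, each an append-only log."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: list[list[str]] = [[] for _ in range(num_partitions)]
        # Committed offsets: (group, partition) -> next offset to read.
        self.offsets: dict[tuple[str, int], int] = defaultdict(int)

    def produce(self, key: str, value: str) -> int:
        # Same key -> same partition, which preserves per-key ordering.
        # CRC32 is a stand-in; Kafka's default partitioner hashes with murmur2.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p

    def poll(self, group: str, partition: int) -> list[str]:
        # Return records past this group's committed offset, then commit.
        start = self.offsets[(group, partition)]
        records = self.partitions[partition][start:]
        self.offsets[(group, partition)] = len(self.partitions[partition])
        return records


def assign(members: list[str], num_partitions: int) -> dict[str, list[int]]:
    # Within one group each partition has exactly one owner, so each record
    # is handled by a single member (queue-style load sharing).
    # Round-robin here simplifies Kafka's pluggable partition assignors.
    plan: dict[str, list[int]] = {m: [] for m in members}
    for p in range(num_partitions):
        plan[members[p % len(members)]].append(p)
    return plan


topic = Topic(num_partitions=4)
p = topic.produce("user-1", "login")
topic.produce("user-1", "click")
analytics = topic.poll("analytics", p)  # ["login", "click"]
audit = topic.poll("audit", p)          # ["login", "click"]: independent group
again = topic.poll("analytics", p)      # []: offset already committed
plan = assign(["consumer-a", "consumer-b"], 4)
```

Real Kafka persists the partitions to disk, replicates them across brokers, and stores committed offsets in an internal topic; the sketch keeps only the offset bookkeeping that makes both patterns possible.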

Apache Pulsar

Multi-tenant messaging with built-in geo-replication; separates compute (brokers) from storage (Apache BookKeeper), which often makes it easier than Kafka to scale elastically in cloud deployments

Similar Technologies

NATS, AWS Kinesis, Google Pub/Sub, Apache Kafka

Apache Flink

Stateful stream processing with exactly-once semantics; handles complex event processing, windowing, and real-time analytics

Similar Technologies

Spark Streaming, Kafka Streams, AWS Kinesis Analytics, Google Dataflow
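
To make "windowing" concrete, here is a single-process sketch of event-time tumbling-window counting, the kind of aggregation Flink's window API expresses. It is an illustration only: `tumbling_window_counts` is a hypothetical name, and it ignores watermarks, late data, and checkpointing, which are what make Flink's version fault-tolerant and exactly-once.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_ms):
    """Count events per (key, window) for fixed, non-overlapping windows.

    events: iterable of (key, event_time_ms) tuples.
    Returns {(key, window_start_ms): count}.
    """
    counts = defaultdict(int)  # plays the role of Flink's keyed state
    for key, ts in events:
        window_start = ts - (ts % window_ms)  # align to the window boundary
        counts[(key, window_start)] += 1
    return dict(counts)


events = [
    ("sensor-a", 1_000),
    ("sensor-a", 59_999),
    ("sensor-a", 60_000),  # first event of the next one-minute window
    ("sensor-b", 30_000),
]
counts = tumbling_window_counts(events, window_ms=60_000)
print(counts)
```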

Apache Storm

Early real-time processing framework; largely superseded by Flink but still used for simple streaming topologies

Similar Technologies

Apache Flink, Spark Streaming, Kafka Streams

Apache Samza

Stream processing tightly integrated with Kafka; good for stateful transformations on Kafka topics

Similar Technologies

Kafka Streams, Apache Flink, Spark Streaming

Data Storage

Apache Cassandra

Masterless wide-column store optimized for writes; scales linearly across datacenters, with tunable consistency (eventual at the default level)

Similar Technologies

ScyllaDB, DynamoDB, Google Bigtable, Azure Cosmos DB
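
A rough sketch of the two ideas behind "masterless" and "tunable consistency": keys map onto a token ring and replicate to the next nodes clockwise, and a read is guaranteed to overlap the latest acknowledged write when R + W > N. Assumptions to flag: MD5 stands in for Cassandra's default Murmur3Partitioner, each node owns one token (no vnodes), and placement mimics SimpleStrategy without rack or datacenter awareness. `TokenRing` and `quorums_overlap` are hypothetical names.

```python
import bisect
import hashlib


class TokenRing:
    """Toy consistent-hashing ring in which each node owns a single token."""

    def __init__(self, nodes: list[str]) -> None:
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    @staticmethod
    def _token(key: str) -> int:
        # MD5 stands in for Cassandra's default Murmur3Partitioner.
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def replicas(self, key: str, rf: int) -> list[str]:
        # The first node whose token is >= the key's token owns it; the next
        # rf - 1 nodes clockwise hold the extra replicas (SimpleStrategy-like).
        if rf > len(self.ring):
            raise ValueError("replication factor exceeds node count")
        start = bisect.bisect_left(self.tokens, self._token(key))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(rf)]


def quorums_overlap(n: int, r: int, w: int) -> bool:
    # If R + W > N, every read set shares at least one replica with every
    # write set, so a read can always find the latest acknowledged write.
    return r + w > n


ring = TokenRing(["node-1", "node-2", "node-3", "node-4", "node-5"])
owners = ring.replicas("user:42", rf=3)
strong = quorums_overlap(n=3, r=2, w=2)    # QUORUM reads + QUORUM writes
eventual = quorums_overlap(n=3, r=1, w=1)  # ONE reads + ONE writes
```

With a replication factor of 3, QUORUM means 2 replicas, so QUORUM reads plus QUORUM writes overlap, while ONE/ONE trades that guarantee for latency.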

Apache HBase

Column-family database on Hadoop; random read/write access to billions of rows, strong consistency

Similar Technologies

Google Bigtable, DynamoDB, Cassandra

Apache CouchDB

Document database with multi-master replication; HTTP/JSON API, offline-first architecture

Similar Technologies

MongoDB, PouchDB, RavenDB, Couchbase

Apache Druid

Time-series OLAP database with sub-second query latency; excellent for event data and real-time dashboards

Similar Technologies

ClickHouse, TimescaleDB, InfluxDB, Apache Pinot
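
Part of why Druid queries over event data stay fast is ingestion-time rollup: timestamps are truncated to a granularity and rows with identical dimensions are pre-aggregated. The sketch below shows that idea only; it is not Druid's ingestion spec, and `rollup` with its `(timestamp, dimensions, value)` row shape is a hypothetical illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def rollup(rows, granularity_s=3600):
    """Pre-aggregate raw events by (time bucket, dimensions).

    rows: iterable of (unix_ts, dimensions_tuple, metric_value).
    Returns {(bucket_start_iso, dimensions): (row_count, metric_sum)}.
    """
    agg = defaultdict(lambda: [0, 0])
    for ts, dims, value in rows:
        bucket = ts - ts % granularity_s  # truncate to the query granularity
        key = (datetime.fromtimestamp(bucket, tz=timezone.utc).isoformat(), dims)
        agg[key][0] += 1      # how many raw rows were collapsed
        agg[key][1] += value  # summed metric
    return {k: tuple(v) for k, v in agg.items()}


rows = [
    (0, ("US", "web"), 5),
    (1_800, ("US", "web"), 3),  # same hour, same dimensions: merged
    (3_600, ("US", "web"), 1),  # next hour: new row
]
rolled = rollup(rows)
```

Three raw events become two stored rows; the cost is that individual events inside a bucket can no longer be queried.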

Apache Doris

MPP SQL database for interactive analytics; combines fast queries with high concurrency, alternative to ClickHouse

Similar Technologies

ClickHouse, DuckDB, Snowflake, StarRocks

Apache Pinot

Real-time OLAP store designed for user-facing analytics; ultra-low latency on fresh data, used by LinkedIn/Uber

Similar Technologies

ClickHouse, Apache Druid, Rockset

Apache Iceberg

Table format enabling ACID transactions on data lakes; schema evolution, time travel, partition management on S3/HDFS

Similar Technologies

Delta Lake, Apache Hudi, Apache XTable
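
A heavily simplified model of how snapshot-based table formats get ACID commits and time travel on immutable object storage: each commit produces a new snapshot listing the table's data files, and readers can pick any snapshot. This is not Iceberg's metadata layout (no manifests, manifest lists, or catalog); `ToyTable` and `Snapshot` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: frozenset[str]  # complete set of files visible in this version


@dataclass
class ToyTable:
    """Toy snapshot log (not Iceberg's real metadata format)."""

    snapshots: list[Snapshot] = field(default_factory=list)

    def commit(self, added=(), removed=()) -> int:
        # Data files are never modified in place: a commit builds a new file
        # set and appends a snapshot. In Iceberg the atomic step is swapping
        # the catalog's pointer to the new metadata file.
        current = self.snapshots[-1].data_files if self.snapshots else frozenset()
        new = Snapshot(len(self.snapshots) + 1, (current - set(removed)) | set(added))
        self.snapshots.append(new)
        return new.snapshot_id

    def scan(self, snapshot_id=None) -> frozenset[str]:
        # Time travel: read the latest snapshot, or any earlier one by ID.
        if not self.snapshots:
            return frozenset()
        if snapshot_id is None:
            return self.snapshots[-1].data_files
        return next(s.data_files for s in self.snapshots if s.snapshot_id == snapshot_id)


table = ToyTable()
v1 = table.commit(added=["part-0.parquet"])
v2 = table.commit(added=["part-1.parquet"], removed=["part-0.parquet"])  # a rewrite
latest = table.scan()
as_of_v1 = table.scan(v1)
```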

Big Data Processing

Apache Hadoop

Distributed filesystem (HDFS) plus MapReduce batch processing; the foundation of the big data ecosystem, though mostly legacy now as object storage replaces HDFS

Similar Technologies

AWS S3, Google Cloud Storage, Azure Blob Storage, MinIO
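
The MapReduce programming model, reduced to the classic word count in a single process. This is only a sketch of the three phases: real Hadoop runs mappers on HDFS input splits across a cluster and shuffles intermediate pairs over the network to reducers. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str):
    # Mapper: emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]):
    # Reducer: combine every value emitted for one key.
    return key, sum(values)


def word_count(lines):
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    # Sorting keys mirrors the sort that happens before the reduce phase.
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(pairs).items()))


counts = word_count(["the cat", "The dog"])
```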

Apache Spark

In-memory batch/streaming engine with a unified API; up to 100x faster than MapReduce on some in-memory workloads; supports SQL, ML, and graph processing

Similar Technologies

Dask, Ray, Presto/Trino, AWS EMR

Apache Hive

SQL query engine over Hadoop/S3; compiles HiveQL into MapReduce, Tez, or Spark jobs (depending on version), and its metastore serves as a metadata catalog for data lakes

Similar Technologies

Presto/Trino, AWS Athena, Snowflake, Databricks SQL

Apache Pig

Dataflow scripting language (Pig Latin) for Hadoop; a procedural alternative to SQL, largely replaced by Spark

Similar Technologies

Apache Spark, Apache Hive, SQL

Search/Indexing

Apache Solr

Enterprise search with full-text indexing, faceting, and geo-search; arguably more feature-rich than Elasticsearch out of the box

Similar Technologies

Elasticsearch, Meilisearch, Typesense, Algolia

Apache Lucene

Core search library providing indexing/search algorithms; powers both Solr and Elasticsearch

Similar Technologies

Tantivy, Bleve, Xapian
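
The core data structure a search library like Lucene builds is an inverted index: a map from each term to the documents containing it. This is a bare-bones sketch, not Lucene's API; real Lucene uses configurable analyzers, stores compressed sorted postings with positions, and ranks results (BM25 by default). `tokenize`, `build_index`, and `search_and` are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # A crude analyzer: lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Inverted index: term -> set of document IDs (its "postings list").
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_and(index, query: str) -> set[int]:
    # Boolean AND query: intersect the postings lists of all query terms.
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "Apache Lucene powers Solr",
    2: "Elasticsearch is built on Lucene",
    3: "Tantivy is written in Rust",
}
index = build_index(docs)
hits = search_and(index, "lucene solr")
```

Lookups touch only the postings for the query terms rather than scanning every document, which is why the structure scales to large corpora.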

Workflow/Orchestration

Apache Airflow

Python-based DAG scheduler for ETL pipelines; dynamic workflows, extensive integrations, monitoring/alerting

Similar Technologies

Prefect, Dagster, Temporal, AWS Step Functions, Kestra
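
What a DAG scheduler fundamentally does is resolve dependencies into waves of runnable tasks. This sketch is not Airflow's API (in Airflow you would declare tasks as operators and wire them with `>>`); it uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical ETL pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dag = {
    "extract_orders": set(),
    "extract_users": set(),
    "transform": {"extract_orders", "extract_users"},
    "load_warehouse": {"transform"},
    "refresh_dashboard": {"load_warehouse"},
}

sorter = TopologicalSorter(dag)
sorter.prepare()  # also raises CycleError if the graph is not a DAG
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose upstreams have all finished
    waves.append(ready)                 # a scheduler could run these in parallel
    sorter.done(*ready)                 # mark them succeeded, unlocking downstream

for i, wave in enumerate(waves):
    print(f"wave {i}: {wave}")
```

The two extract tasks share a wave because neither depends on the other; everything after `transform` runs strictly in sequence.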

Apache NiFi

Visual dataflow automation with back-pressure handling; drag-and-drop ETL for moving and transforming data between systems

Similar Technologies

StreamSets, Talend, Airbyte, Pentaho
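
Back-pressure in NiFi is configured per connection: once a queue reaches its object-count (or data-size) threshold, the upstream processor stops being scheduled until the queue drains. The toy below models only the object-count threshold by refusing new items; `Connection`, `offer`, and `take` are hypothetical names, not NiFi's API.

```python
from collections import deque


class Connection:
    """Toy queue between two processors with an object-count threshold."""

    def __init__(self, max_objects: int) -> None:
        self.queue = deque()
        self.max_objects = max_objects

    def back_pressure(self) -> bool:
        return len(self.queue) >= self.max_objects

    def offer(self, flowfile) -> bool:
        # While back-pressure is applied the upstream side cannot enqueue;
        # NiFi achieves this by not scheduling the upstream processor.
        if self.back_pressure():
            return False
        self.queue.append(flowfile)
        return True

    def take(self):
        # The downstream processor drains one item, relieving pressure.
        return self.queue.popleft() if self.queue else None


conn = Connection(max_objects=2)
accepted = [conn.offer(f"ff-{i}") for i in range(3)]  # third is refused
drained = conn.take()
accepted.append(conn.offer("ff-3"))                   # capacity freed again
```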

Apache Oozie

XML-based Hadoop workflow coordinator; legacy tool for scheduling MapReduce/Hive jobs

Similar Technologies

Apache Airflow, Prefect, Luigi

Computation

Apache Arrow

In-memory columnar format for zero-copy data sharing between processes and libraries; skipping serialization is often cited as 10-100x faster than serialize/deserialize round-trips

Similar Technologies

Parquet (on-disk), FlatBuffers, Cap'n Proto
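
An analogy for what "zero-copy" buys, using only the standard library rather than Arrow itself: a column of int64 values lives in one contiguous buffer, and a `memoryview` (including a slice of it) reads that buffer in place instead of copying it. Arrow's C Data Interface and memory-mapped IPC files apply the same principle across libraries and processes; nothing here is Arrow's actual API.

```python
from array import array

# One column of int64 values in a contiguous buffer, as an Arrow array would hold.
prices = array("q", [100, 250, 75, 300])

# A memoryview exposes the buffer without copying; slicing it is also zero-copy.
view = memoryview(prices)
middle = view[1:3]

# Mutating the underlying buffer is visible through the view, which shows
# that no copy was made.
prices[1] = 999
shared = middle.tolist()
```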

Apache Calcite

SQL parser/optimizer framework embedded in many data systems (e.g., Hive, Flink, Druid); provides cost-based query optimization

Similar Technologies

Presto Parser, DataFusion, custom SQL parsers

Apache Beam

Write-once pipelines that run on Flink, Spark, or Google Cloud Dataflow; abstracts the execution engine for portable data processing

Similar Technologies

Native Flink API, Native Spark API, Google Dataflow

Other Notable

Apache Superset

BI tool for exploring and visualizing data; connects to dozens of SQL databases through SQLAlchemy, with a SQL IDE and shareable dashboards

Similar Technologies

Metabase, Grafana, Tableau, Looker, Redash

Apache Parquet

Columnar file format for analytics; efficient compression and encoding, predicate pushdown via per-column statistics, and the de facto industry standard

Similar Technologies

ORC, Arrow IPC, CarbonData
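
Much of Parquet's compression on low-cardinality columns comes from dictionary encoding followed by run-length encoding of the codes. The sketch shows only that idea; Parquet's actual format uses a hybrid RLE/bit-packing scheme inside column chunks and pages, and `dictionary_encode` and `run_length_encode` are hypothetical names.

```python
def dictionary_encode(column):
    # Replace repeated values with small integer codes into a dictionary.
    dictionary, codes, index = [], [], {}
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def run_length_encode(codes):
    # Collapse consecutive repeats into (code, run_length) pairs.
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return [tuple(r) for r in runs]


country = ["US", "US", "US", "DE", "DE", "US"]
dictionary, codes = dictionary_encode(country)
runs = run_length_encode(codes)
```

Because values within one column share a type and often repeat, storing column by column is what makes these encodings effective compared with row-oriented layouts.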

Apache Avro

Row-based serialization with schema evolution; compact binary format for streaming/RPC

Similar Technologies

Protocol Buffers, Thrift, MessagePack, JSON Schema
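
Schema evolution in Avro works by resolving the writer's schema against the reader's: fields only the writer knows are skipped, and fields only the reader knows take their declared default (or fail if there is none). This sketch applies those rules to plain dicts; it ignores type promotion, aliases, and unions, and `resolve` plus the example schemas are hypothetical rather than the Avro library's API.

```python
# Schemas as plain dicts, in the spirit of Avro's JSON schema syntax.
writer_schema = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": "string"},
]}

reader_schema = {"type": "record", "name": "User", "fields": [
    {"name": "id", "type": "long"},
    {"name": "country", "type": "string", "default": "unknown"},  # added later
]}


def resolve(record: dict, writer: dict, reader: dict) -> dict:
    """Simplified Avro-style record resolution (no types, aliases, or unions)."""
    written = {f["name"] for f in writer["fields"]}
    out = {}
    for f in reader["fields"]:
        if f["name"] in written:
            out[f["name"]] = record[f["name"]]  # present in both schemas
        elif "default" in f:
            out[f["name"]] = f["default"]       # reader-only field: use default
        else:
            raise ValueError(f"no value or default for field {f['name']!r}")
    return out  # writer-only fields such as "email" are dropped


old_record = {"id": 42, "email": "a@example.com"}
resolved = resolve(old_record, writer_schema, reader_schema)
print(resolved)
```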

Apache ZooKeeper

Distributed coordination service for leader election and configuration management; being replaced in some systems by Raft-based alternatives (e.g., Kafka's KRaft mode)

Similar Technologies

etcd, Consul, Redis (with Sentinel), Kubernetes ConfigMaps
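
ZooKeeper's standard leader-election recipe relies on ephemeral sequential znodes: each candidate creates one, the lowest sequence number leads, and when the leader's session ends its znode disappears so the next candidate takes over. This in-memory simulation is not the ZooKeeper client API; `ToyEnsemble` and its methods are hypothetical, and it skips the watch on the predecessor znode that the real recipe uses to avoid a herd effect.

```python
import itertools


class ToyEnsemble:
    """In-memory stand-in for ZooKeeper's ephemeral sequential znodes."""

    def __init__(self) -> None:
        self._seq = itertools.count()
        self.znodes: dict[str, str] = {}  # znode path -> owning session

    def create_ephemeral_sequential(self, prefix: str, session: str) -> str:
        # ZooKeeper appends a zero-padded, 10-digit increasing counter.
        path = f"{prefix}{next(self._seq):010d}"
        self.znodes[path] = session
        return path

    def expire_session(self, session: str) -> None:
        # Ephemeral znodes are deleted when their owner's session ends.
        self.znodes = {p: s for p, s in self.znodes.items() if s != session}

    def leader(self, prefix: str):
        # The candidate holding the lowest sequence number is the leader.
        candidates = sorted(p for p in self.znodes if p.startswith(prefix))
        return self.znodes[candidates[0]] if candidates else None


zk = ToyEnsemble()
for node in ["broker-a", "broker-b", "broker-c"]:
    zk.create_ephemeral_sequential("/election/n_", node)
first = zk.leader("/election/n_")   # broker-a holds the lowest number
zk.expire_session("broker-a")       # the leader crashes
second = zk.leader("/election/n_")  # broker-b takes over
```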