Distributed File Retrieval Engine

Personal Project

gRPC · ZeroMQ · Java

Overview

A distributed file retrieval engine that queries a network of nodes and returns matching file paths with their term frequencies, built for low-latency communication and high concurrency.
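The single-node sketch below makes "file paths and term frequencies" concrete. It is not the project's code. The `NodeIndex` class, its regex tokenizer, and the in-memory file contents are hypothetical stand-ins for whatever each node actually reads from disk.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical per-node index: maps each file path to the term counts of its contents.
public class NodeIndex {
    // filePath -> (term -> count)
    private final Map<String, Map<String, Integer>> index = new HashMap<>();

    // Split on runs of non-letter/non-digit characters and count lower-cased terms.
    public void addFile(String path, String contents) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : contents.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) { // a leading delimiter produces one empty token
                counts.merge(token, 1, Integer::sum);
            }
        }
        index.put(path, counts);
    }

    // Return filePath -> frequency for every file on this node that contains the term.
    public Map<String, Integer> query(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        Map<String, Integer> hits = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : index.entrySet()) {
            Integer count = e.getValue().get(key);
            if (count != null) {
                hits.put(e.getKey(), count);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NodeIndex node = new NodeIndex();
        node.addFile("/data/a.txt", "gRPC and ZeroMQ; zeromq again");
        node.addFile("/data/b.txt", "only gRPC here");
        System.out.println(node.query("zeromq")); // {/data/a.txt=2}
    }
}
```

Each node would answer a query from its own shard in this path → count form. A coordinator then handles the cross-node merge.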


Role

Sole Engineer


Problem

Querying file metadata and term frequencies across distributed nodes can quickly become communication-bound. A basic server-reply architecture handles each exchange in lockstep, so under high concurrency requests queue behind one another and retrieval latency rises sharply.

Solution

Built a distributed engine that evolved from a standard server-reply model into a highly concurrent Router-Dealer communication pattern, using gRPC for structured remote procedure calls and ZeroMQ for asynchronous message queuing.
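The Router-Dealer idea can be shown without the ZeroMQ library. The sketch below is an in-process simulation built on `java.util.concurrent`, not the JeroMQ or JZMQ API. The hypothetical `Envelope` record's `clientId` plays the role of the identity frame a ROUTER socket prepends to each request. A shared queue drained by worker threads approximates the DEALER side; a real DEALER socket instead round-robins outgoing messages across connected peers.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// In-process stand-in for a ROUTER-DEALER broker (not the ZeroMQ API).
public class RouterDealerSketch {
    // clientId stands in for the identity frame a ROUTER socket attaches.
    record Envelope(String clientId, String payload) {}

    // Backend work queue drained by idle workers (approximates the DEALER side).
    private final BlockingQueue<Envelope> backend = new LinkedBlockingQueue<>();
    // One reply queue per client identity (the ROUTER side routes replies by identity).
    private final Map<String, BlockingQueue<String>> clients = new ConcurrentHashMap<>();
    private final ExecutorService workers;

    public RouterDealerSketch(int workerCount) {
        workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.submit(this::workerLoop);
        }
    }

    // Client side: hand off a request and return immediately, without waiting for a reply.
    public void send(String clientId, String request) {
        clients.computeIfAbsent(clientId, id -> new LinkedBlockingQueue<>());
        backend.add(new Envelope(clientId, request));
    }

    // Client side: wait up to timeoutMs for the next reply addressed to this identity.
    public String receive(String clientId, long timeoutMs) {
        try {
            return clients.get(clientId).poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    private void workerLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Envelope req = backend.take();
                String reply = "result for " + req.payload(); // placeholder for a real lookup
                // Route the reply back using the identity carried in the envelope.
                clients.get(req.clientId()).add(reply);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shutdownNow() ends the loop here
        }
    }

    public void shutdown() {
        workers.shutdownNow();
    }

    public static void main(String[] args) {
        RouterDealerSketch broker = new RouterDealerSketch(4);
        broker.send("client-A", "term=zeromq");
        broker.send("client-B", "term=grpc");
        System.out.println(broker.receive("client-A", 1000)); // result for term=zeromq
        System.out.println(broker.receive("client-B", 1000)); // result for term=grpc
        broker.shutdown();
    }
}
```

The benefit of the pattern shows in `send`. A client hands off its request and continues. Replies for different clients can then complete in any order, because each reply carries the identity needed to route it back.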


Architecture

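A distributed node network that uses gRPC for strict service definitions and ZeroMQ for high-throughput messaging, arranged in a Router-Dealer topology. A ROUTER socket tags each incoming request with the sender's identity so the reply can be routed back, while a DEALER socket spreads requests across nodes without waiting on each reply in turn.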

Key Design Decisions

  • Implementation of the ZeroMQ Router-Dealer pattern to handle asynchronous, non-blocking requests
  • gRPC integration for strongly typed, cross-service communication and payload serialization
  • Distributed querying of file paths and term frequencies, with per-node results aggregated into a single response
  • Optimized for low-latency network communication and for scaling across many concurrent nodes
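The following sketch shows how a coordinator might merge per-node answers and rank them during aggregation. It assumes each node returns a map of file path → frequency for the queried term and that nodes own disjoint shards. The `ResultAggregator.topK` helper and the keep-the-larger-count rule for duplicates are illustrative choices, not taken from the project.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical coordinator-side merge: each node reports filePath -> frequency
// for the queried term; the coordinator combines them and keeps the top k.
public class ResultAggregator {
    public static List<Map.Entry<String, Integer>> topK(List<Map<String, Integer>> nodeResults, int k) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> partial : nodeResults) {
            // Shards are assumed disjoint; if a path is reported twice, keep the larger count.
            partial.forEach((path, freq) -> merged.merge(path, freq, Math::max));
        }
        return merged.entrySet().stream()
                // Highest frequency first; ties broken by path for a deterministic order.
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> fromNodes = List.of(
                Map.of("/n1/a.txt", 2, "/n1/b.txt", 5),
                Map.of("/n2/c.txt", 3));
        System.out.println(topK(fromNodes, 2)); // [/n1/b.txt=5, /n2/c.txt=3]
    }
}
```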

Challenges

  • Migrating the initial server-reply architecture to a Router-Dealer pattern without dropping in-flight requests
  • Handling network partitions and node timeouts during term frequency aggregation
  • Ensuring thread safety and memory efficiency when parsing large file directories concurrently
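A scatter-gather that gives each node a deadline is one common way to handle node timeouts during aggregation. In this hypothetical sketch, `ScatterGather.query` fans out one `CompletableFuture` per node. When a node is too slow, `completeOnTimeout` (Java 9+) substitutes an empty result, and a failed node is treated the same way. Partial results are merged concurrently into a `ConcurrentHashMap`. Node calls are modeled as plain `Supplier`s rather than real gRPC or ZeroMQ requests.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical scatter-gather with a per-node deadline. Each Supplier stands in
// for a remote call that returns filePath -> term frequency for one node.
public class ScatterGather {
    public static Map<String, Integer> query(Map<String, Supplier<Map<String, Integer>>> nodes,
                                             long timeoutMs, ExecutorService pool) {
        Map<String, Integer> merged = new ConcurrentHashMap<>();
        CompletableFuture<?>[] calls = nodes.values().stream()
                .map(node -> CompletableFuture.supplyAsync(node, pool)
                        // Slow node: complete early with an empty partial result.
                        .completeOnTimeout(Map.<String, Integer>of(), timeoutMs, TimeUnit.MILLISECONDS)
                        // Failed node: contribute nothing instead of failing the whole query.
                        .exceptionally(err -> Map.of())
                        // Callbacks may run on different threads, hence the concurrent map.
                        .thenAccept(partial -> partial.forEach(
                                (path, freq) -> merged.merge(path, freq, Math::max))))
                .toArray(CompletableFuture[]::new);
        CompletableFuture.allOf(calls).join(); // every node has answered, failed, or timed out
        return merged;
    }

    // Simulates network delay; returns early if the pool is shut down.
    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Map<String, Supplier<Map<String, Integer>>> nodes = new LinkedHashMap<>();
        nodes.put("node-1", () -> Map.of("/n1/a.txt", 2));
        nodes.put("node-2", () -> { sleep(3000); return Map.of("/n2/b.txt", 7); }); // too slow
        nodes.put("node-3", () -> { throw new IllegalStateException("node down"); });
        System.out.println(query(nodes, 500, pool)); // only node-1 contributes
        pool.shutdownNow();
    }
}
```

`completeOnTimeout` only completes the future early. It does not cancel the slow task, which keeps its pool thread until it finishes or the pool is shut down.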

Impact

  • Delivered a distributed architecture that executes queries in parallel without blocking on individual node replies
  • Deepened foundational knowledge of low-level network protocols and message queue orchestration