
Personal Project
Distributed File Retrieval Engine
Overview
A distributed file retrieval engine that queries a network of nodes and returns matching file paths with their term frequencies, built for low-latency communication under high concurrency.
Role
Sole Engineer
Problem
Querying file metadata and term frequencies across distributed nodes makes network communication the main bottleneck. A basic server-reply (request-reply) architecture processes requests in lockstep, so it blocks under high concurrency and latency rises sharply as concurrent retrieval requests queue up.
Solution
Built a distributed engine that evolved from a standard server-reply model into a concurrent Router-Dealer communication pattern, using gRPC for structured remote procedure calls and ZeroMQ for asynchronous message queuing.
Architecture
A network of distributed nodes that uses gRPC for strictly defined service interfaces and ZeroMQ for high-throughput messaging, arranged in a Router-Dealer topology.
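The routing idea behind this topology can be illustrated with a minimal sketch. It is a standard-library model of the pattern, not ZeroMQ itself and not the project's code: queues and threads stand in for the ROUTER and DEALER sockets, and the names `INDEX` and `serve` are hypothetical. The point it shows is that each request carries its client identity, so the front end can accept new requests without waiting for earlier replies, and any free worker can answer.

```python
import queue
import threading

# Hypothetical in-memory index standing in for a node's term-frequency data.
INDEX = {"zeromq": 3, "grpc": 5}


def serve(requests, worker_count=3):
    """Route (client_id, term) requests through a worker pool and
    return the replies grouped by client identity."""
    backend = queue.Queue()  # work shared by all workers (DEALER side)
    outbox = queue.Queue()   # replies, each tagged with a client identity

    def worker():
        while True:
            envelope = backend.get()
            if envelope is None:  # shutdown sentinel
                return
            client_id, term = envelope
            # The identity travels with the payload, so any worker can reply.
            outbox.put((client_id, term, INDEX.get(term, 0)))

    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in workers:
        thread.start()

    # ROUTER side: enqueue every request immediately, never waiting on a reply.
    for client_id, term in requests:
        backend.put((client_id, term))
    for _ in workers:
        backend.put(None)  # one sentinel per worker
    for thread in workers:
        thread.join()

    # Deliver each reply to the client named in its envelope.
    routed = {}
    while not outbox.empty():
        client_id, term, count = outbox.get()
        routed.setdefault(client_id, {})[term] = count
    return routed
```

Calling `serve([("a", "grpc"), ("b", "zeromq")])` groups each reply under the client that sent the request, regardless of which worker handled it or in what order the workers finished.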
Key Design Decisions
- Adopted the ZeroMQ Router-Dealer pattern to handle requests asynchronously, without blocking
- Integrated gRPC for strongly typed cross-service communication and payload serialization
- Distributed queries for file paths and term frequencies across nodes, with per-node results aggregated into a single response
- Optimized for low-latency network communication and for scaling to many concurrently active nodes
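The aggregation step in the list above could look roughly like the sketch below. It assumes each node answers with a mapping of file path to term frequency and that counts for the same path are summed across nodes. Both are assumptions, since the actual result format is not described here, and the function name `aggregate` is hypothetical.

```python
from collections import Counter


def aggregate(node_results):
    """Merge per-node {path: count} maps and rank paths by total frequency."""
    totals = Counter()
    for result in node_results:
        totals.update(result)  # adds counts for paths already seen
    # Sort by descending count, then by path, so ties are deterministic.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

For example, `aggregate([{"/a.txt": 2, "/b.txt": 1}, {"/b.txt": 4}])` returns `[("/b.txt", 5), ("/a.txt", 2)]`.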
Challenges
- Migrating the initial server-reply architecture to a Router-Dealer pattern without dropping in-flight requests
- Handling network partitions and node timeouts during term frequency aggregation
- Ensuring thread safety and memory efficiency when parsing large file directories concurrently
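One common way to keep a slow or unreachable node from stalling aggregation is a scatter-gather with a deadline. The sketch below is an illustration under stated assumptions, not the project's implementation: nodes are modelled as plain callables, the function name `gather` is hypothetical, and it uses only the standard library (Python 3.9 or later for `cancel_futures`).

```python
from concurrent.futures import ThreadPoolExecutor, wait


def gather(nodes, term, timeout):
    """Query every node in parallel.

    `nodes` maps a node name to a callable taking the search term.
    Returns (results, failed): results from nodes that answered in time,
    and the sorted names of nodes that timed out or raised an error.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(nodes)))
    futures = {pool.submit(query, term): name for name, query in nodes.items()}

    # Block until every node has answered or the deadline passes.
    done, not_done = wait(futures, timeout=timeout)

    # Do not wait for stragglers; calls already running finish in the background.
    pool.shutdown(wait=False, cancel_futures=True)

    results = {}
    failed = [futures[future] for future in not_done]
    for future in done:
        try:
            results[futures[future]] = future.result()
        except Exception:
            failed.append(futures[future])  # node answered with an error
    return results, sorted(failed)
```

The caller gets partial results plus the list of nodes that did not contribute, so it can decide whether to retry them or return an incomplete answer.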
Impact
- Implemented a distributed architecture that serves concurrent retrieval requests in parallel without blocking
- Deepened foundational knowledge of low-level network protocols and message queue orchestration