InfoQ Homepage Distributed Systems Content on InfoQ
-
Replacing Database Sequences at Scale Without Breaking 100+ Services
The article discusses the challenges faced during a migration from a relational database to NoSQL, focusing on the importance of database sequences for unique identifiers. It outlines the development of a new sequence service using DynamoDB and a two-tier caching architecture.
-
Event-Driven Patterns for Cloud-Native Banking: Lessons from What Works and What Hurts
Event-driven architecture helps banks decouple systems, scale services, and create clear activity trails. But it also introduces complexity, new failure modes, and operational challenges. Chris Tacey-Green explains where it adds value in banking systems and the practical patterns, such as inbox/outbox and stable event contracts, needed to make it reliable.
-
Configuration as a Control Plane: Designing for Safety and Reliability at Scale
Configuration has evolved from static deployment files into a live control plane that directly shapes system behavior. The evolution of configuration management highlights why misconfigurations can trigger large outages and how hyperscalers deploy changes safely using staged rollouts, validation, blast radius limits, and automated rollback at scale.
-
One Cache to Rule Them All: Handling Responses and In-Flight Requests with Durable Objects
Traditional caching fails to stop "thundering herds" where multiple clients trigger the same work during a miss. This article proposes using Cloudflare Durable Objects to treat in-flight work and finished results as two states of one cache entry. By routing to a single owner, systems eliminate redundant tasks. This pattern replaces complex locks with simple promises, simplifying the system design.
-
Scaling Cloud and Distributed Applications: Lessons and Strategies
The article shares goals and strategies for scaling cloud and distributed applications, focusing on lessons learned from cloud migration at Chase.com at JP Morgan Chase. The discussion centers on three primary goals and the strategies addressing the goals, concluding how these approaches were achieved in practice. For those managing large-scale systems, these lessons provide valuable guidance!
-
Engineering Principles for Building a Successful Cloud-Prem Solution
Discover how Cloud-Prem solutions combine cloud efficiency with on-premise control, meeting data sovereignty and compliance demands while optimizing operational costs and enhancing customer security.
-
Analyzing Apache Kafka Stretch Clusters: WAN Disruptions, Failure Scenarios, and DR Strategies
Proficient in analyzing the dynamics of Apache Kafka Stretch Clusters, I assess WAN disruptions and devise effective Disaster Recovery (DR) strategies. With deep expertise, I ensure high availability and data integrity across multi-region deployments. My insights optimize operational resilience, safeguarding vital services against service level agreement violations.
-
Designing Resilient Event-Driven Systems at Scale
Learn how to design resilient event-driven systems that scale. Explore key patterns like shuffle sharding and decoupling queues to handle load spikes and failures. Understand common pitfalls like over-relying on retries and neglecting observability for robust, scalable architectures.
-
Distributed Cloud Computing: Enhancing Privacy with AI-Driven Solutions
Distributed cloud, PETs, and AI enable secure, private data processing. This integration enhances collaboration, security, and compliance across marketing, finance, and healthcare, addressing the growing need for data protection.
-
Reactive Real-Time Notifications with SSE, Spring Boot, and Redis Pub/Sub
Explore the power of reactive programming for building scalable real-time notification systems. Using Spring Boot Reactive and Spring WebFlux, leverage non-blocking operations to handle high-volume, asynchronous data flows efficiently. Discover how Redis Pub/Sub enables event-driven messaging and how the SSE protocol provides persistent connections for instant client updates without polling.
-
Taking Advantage of Cell-Based Architectures to Build Resilient and Fault-Tolerant Systems
Cell-based architectures offer a robust approach to building resilient systems. They achieve this through the core principles of isolation, autonomy, and replication. Each cell manages its resources and makes decisions autonomously. Observability for cell-based architecture requires a tailored approach to address the unique challenges and opportunities presented by this distributed system design.
-
Article Series: Cell-Based Architectures: How to Build Scalable and Resilient Systems
In this article series, we take readers on a journey of discovery and provide a comprehensive overview and in-depth analysis of many key aspects of cell-based architectures, as well as practical advice for applying this approach to existing and new architectures.