Architecture Styles and Design Patterns
What is an Architecture Style?
An Architecture Style is a coarse-grained pattern that provides an abstract framework for a family of systems.
There are four main architectural styles of Data Processing: Batch, Real-time, Event-driven, and Streaming.
Batch
Batch processing runs high-volume, repetitive data jobs during a specified window. Tools that support this style of processing are commonly known as Data Integration tools, ETL (Extract, Transform, Load) tools, or ELT (Extract, Load, Transform, also called SQL "push-down") tools. Modern tools such as Talend Data Fabric go beyond these basic capabilities by adding Data Governance and by supporting all of the architectural styles, not just batch.
- Latency tolerant
- Complex transformations
- Massive volumes
- Code-less specifications
- Metadata reuse
One of the classic design patterns implemented with Batch Data Integration is loading a Data Warehouse for business analytics and reporting.
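As a rough illustration of the batch pattern, the sketch below runs an extract, transform, and load pass into a warehouse-style table. It is a minimal sketch under stated assumptions, not how Talend implements a Job: the `SOURCE_ORDERS` rows, the `fact_orders` table, and all function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows, standing in for a nightly extract from an operational system.
SOURCE_ORDERS = [
    {"order_id": 1, "customer": " alice ", "amount_cents": 1250, "order_date": "2024-01-01"},
    {"order_id": 2, "customer": "BOB", "amount_cents": 990, "order_date": "2024-01-01"},
    {"order_id": 3, "customer": "alice", "amount_cents": 400, "order_date": "2024-01-02"},
]


def extract(rows, since_date):
    """Extract: keep only rows on or after the start of the batch window."""
    return [r for r in rows if r["order_date"] >= since_date]


def transform(rows):
    """Transform: normalise customer names and convert cents to a decimal amount."""
    return [
        (r["order_id"], r["customer"].strip().lower(), r["amount_cents"] / 100, r["order_date"])
        for r in rows
    ]


def load(conn, rows):
    """Load: upsert by primary key so that re-running a batch does not duplicate rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_batch(conn, since_date):
    """Run one batch window and return per-customer totals for reporting."""
    load(conn, transform(extract(SOURCE_ORDERS, since_date)))
    return conn.execute(
        "SELECT customer, SUM(amount) FROM fact_orders GROUP BY customer ORDER BY customer"
    ).fetchall()
```

Calling `run_batch(sqlite3.connect(":memory:"), "2024-01-01")` loads all three sample orders. Because the load step upserts on `order_id`, running the same window twice leaves the table unchanged, which is one common way to make a scheduled batch safe to restart.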
Real-time
Wikipedia describes Real-time computing as:
Real-time computing (RTC), or reactive computing is the computer science term for hardware and software systems subject to a "real-time constraint", for example from event to system response. Real-time programs must guarantee response within specified time constraints, often referred to as "deadlines".
Real-time processing delivers a near-instantaneous response to an action or event. Most mission-critical applications are real-time.
- API Services
- Creation of Data Services - SOAP or REST services implemented in Talend Studio using the same palette of components used to create Batch Jobs.
- Routes - Graphically design Camel Routes in Talend Studio to implement SOAP or REST services
- Deployment to Talend Runtime, as a Microservice or as a Microservice within a Container
- Logging and Monitoring
- Continuous Integration / Deployment
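The "deadline" idea from the definition above can be sketched as a service handler that either answers within a time budget or returns an explicit timeout response. This is only an approximation: `lookup_price`, `handle_request`, and the 50 ms budget are hypothetical, and a general-purpose runtime such as Python's `asyncio` gives a soft deadline, not the hard guarantee that true real-time systems require.

```python
import asyncio


async def lookup_price(sku, delay_s):
    """Hypothetical backend call; delay_s simulates how long the work takes."""
    await asyncio.sleep(delay_s)
    return {"sku": sku, "price": 42}


async def handle_request(sku, delay_s, deadline_s=0.05):
    """Answer within deadline_s seconds, or return an explicit timeout response."""
    try:
        result = await asyncio.wait_for(lookup_price(sku, delay_s), timeout=deadline_s)
        return {"status": "ok", **result}
    except asyncio.TimeoutError:
        # The backend call is cancelled by wait_for when the deadline passes.
        return {"status": "deadline_exceeded", "sku": sku}
```

For example, `asyncio.run(handle_request("A-1", 0.0))` completes normally, while a simulated delay of half a second exceeds the budget and yields the `deadline_exceeded` response instead of leaving the client waiting.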
Streaming
Wikipedia describes Streaming as follows:
Streaming data is data that is continuously generated by different sources. Such data should be processed incrementally using Stream Processing techniques without having access to all of the data. In addition, it should be considered that concept drift may happen in the data which means that the properties of the stream may change over time. It is usually used in the context of big data in which it is generated by many different sources at high speed.
For more information on Streaming and its use cases, see "What is Streaming data?".
- Low Latency
- Simple transformations and Aggregations
- Small batches, often known as 'micro-batching'
- Fault Tolerant
- Minimum risk of data loss
- Sliding Window Capability
The diagram below shows how a Spark engine processes data streams in the form of micro-batches.
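Micro-batching and the sliding window capability listed above can be sketched in a few lines of plain Python. This is a simplified illustration, not Spark code: the function names are hypothetical, and batches here are cut by record count, whereas a streaming engine typically cuts them by time interval.

```python
from collections import deque
from itertools import islice


def micro_batches(stream, batch_size):
    """Group a (possibly unbounded) iterable into small lists, one per micro-batch."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def sliding_window_sums(stream, batch_size, window_batches):
    """After each micro-batch, yield the sum over the last `window_batches` batches."""
    window = deque(maxlen=window_batches)  # the oldest batch total drops out automatically
    for batch in micro_batches(stream, batch_size):
        window.append(sum(batch))  # simple per-batch aggregation
        yield sum(window)          # aggregate across the sliding window
```

With the values 1 through 8, a batch size of 2, and a window of 2 batches, the per-batch totals are 3, 7, 11, and 15, so `list(sliding_window_sums(range(1, 9), 2, 2))` gives `[3, 10, 18, 26]`. Each result only needs the most recent batches, which is what allows the stream to be processed incrementally without holding all of the data.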
Event-driven
Wikipedia defines an Event-driven architecture as follows:
Event-driven architecture (EDA) is a software architecture paradigm promoting the production, detection, consumption of, and reaction to events.
An event can be defined as "a significant change in state". For example, when a consumer purchases a car, the car's state changes from "for sale" to "sold". A car dealer's system architecture may treat this state change as an event whose occurrence can be made known to other applications within the architecture. From a formal perspective, what is produced, published, propagated, detected or consumed is a (typically asynchronous) message called the event notification, and not the event itself, which is the state change that triggered the message emission. Events do not travel, they just occur. However, the term event is often used metonymically to denote the notification message itself, which may lead to some confusion. This is due to Event-Driven architectures often being designed atop message-driven architectures, where such communication pattern requires one of the inputs to be text-only, the message, to differentiate how each communication should be handled.
As described, this style (especially for data processing) is most commonly associated with message-driven architectures. Other examples that could be implemented with Talend include a Route that polls an FTP server and processes files once their upload is complete, and a web service that starts an asynchronous process, that is, the web service does not wait for the process to complete before responding to its client.
- Message based
- Ensured delivery
- Restart/recovery
- Transaction oriented
The diagram below shows an enterprise bus where messages are published to topics and read by subscribers.
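The publish and subscribe relationship can be sketched as a toy in-process bus, reusing the car sale example from the definition above. All names here (`MessageBus`, the `car.sold` topic, the two logs) are hypothetical. The sketch delivers notifications synchronously and keeps nothing on disk, so it does not provide the ensured delivery or restart/recovery that a real message broker would.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-process bus: publishers and subscribers share only a topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable to be invoked for every notification on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, notification):
        """Deliver the event notification to each subscriber; return how many received it."""
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(notification)
        return len(handlers)


bus = MessageBus()
billing_log, inventory_log = [], []

# Two independent consumers react to the same state change.
bus.subscribe("car.sold", billing_log.append)
bus.subscribe("car.sold", inventory_log.append)

# The notification describes the state change ("for sale" to "sold").
delivered = bus.publish("car.sold", {"vin": "VIN-123", "state": "sold"})
```

The publisher never references the billing or inventory consumers directly. Adding a third subscriber requires no change to the publishing code, which is the decoupling that makes this style attractive.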
What is a Design Pattern?
A Design Pattern is a general reusable solution to a commonly occurring problem within a given context.
- Represent field-tested solutions to common design problems
- Are generally repeatable by most IT professionals involved with design
- Can be used to ensure consistency in how systems are designed and built
- Can become the basis for design standards for our Job Designs, Routes, etc.
Architecture Style | Design Pattern | Examples |
---|---|---|
Batch | | Loading a Data Warehouse at a defined frequency, daily incremental data loads, FTP transfers, data replication, etc. |
Real-time | | Salesforce updates, reading queues, Enterprise Service Bus message reads, API services for integration |
Streaming | | Leaderboards, Twitter streams, live streams |
Event-driven | | Sensor data on event occurrence, workflow events, file triggers |