Tags: #integrationPatterns #integrationArchitecture #reuse #reusability #event #soa #synchronous #complexity
Published: 06.05.2021
Last Updated: 06.05.2021
Naming conventions in the following text and its diagrams derive from MuleSoft, since that is the platform I currently work with most in the SOA model. They can be replaced with different names as needed, or as used in an organization implementing a SOA Integration Platform.
All information presented here is based on my own research and professional experience. I welcome all comments and constructive criticism, and I'm open to discussion.
This article requires a basic understanding of Synchronous Integration Patterns in a Service Oriented Architecture. To get a better grasp of the topic you might want to read the article Synchronous Integration Patterns.
One of the key aspects of SOA is reusability and creating the capacity to reuse when designing an integration service. This reusability can happen on multiple levels, from code snippets to runtimes and services. In this article I will focus on services and integration applications, as in how to reuse them in the context of SOA and Integration Patterns.
If we considered integration not as a whole platform, but rather just as a single data exchange, it might not make any sense to put a mediator between the two systems that try to communicate; it does not seem profitable to do so. And this might be very true, depending on the scale we are dealing with. If we try to integrate a few systems with very few interactions in a small company, where those integrations have a low impact on business, then perhaps a clean microservices approach would be more appropriate. For a somewhat bigger company a broker type integration might be ideal. If our scale reaches interactions counted in hundreds, they might not be manageable anymore. Where is the border between point to point, broker type and SOA integrations? Well, that is a matter of debate and of our ability to calculate the complexity of said integrations. Calculating this is quite a difficult topic and in my opinion deserves a separate article, which I will attempt another time, but I'll try to provide some examples to support the explanation of reusability. For now I'll focus mostly on the differences between Client-Server (CS) and SOA.
The complexity model for integration has many variables and it is very hard to calculate in a unified way (lots of matrices). So for the purpose of showing reusability I have made some simplifications and limited the number of variables. The boundaries for this article are as follows:
Variables:
Logic nodes - as in logic created to extract, transform, orchestrate and merge data, for Client-Server
Connections - maintaining configuration and invocation of connections between nodes (does not include extraction logic)
Scoring:
connection
scored as 1 point, unique to the level of host, port and possibly partial URI. Different resource URIs on the same application do not count as unique,
reusing an already counted connection (e.g. another SAPI consumer, another service in an EAPI consumed by the system it is assigned to) scores as 0 points,
logic node:
client systems:
1 point for each data extraction logic node - the need to learn the provider's system API, call logic, language,
1 point for each merge/orchestration logic node - when a single p2p integration is not sufficient to serve the business needs and would create an aggregated service in SOA,
0 points for a client using an EAPI, since its API is tailored to the consuming client and standardized, which mitigates the cost of building complex data extraction and data model translation logic,
integration application:
1 point for each integration application logic implementation (EAPI, PAPI, SAPI),
0 points for a complete reuse of already established integration application logic implementation,
0.5 point for establishing a new service in an already existing integration application, as in an extension of an existing logic node,
Different models of simple synchronous integration architecture, from top: client-server, broker architecture, service oriented architecture.
If we'd just look in the context of the diagram above, a simple point to point does not require an integration platform. This is very true, but only in a context of a single integration. If we'd consider the complexity of such integration, counting each connection and a node between system A and B, then complexity score would be as follows:
CS: 2, as it has a single connection and an additional logic node built in the client system,
SOA: 5, two logic nodes in two integration applications, three connections
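To make the scoring concrete, the rules from the boundaries section can be encoded as a short script. This is my own illustrative sketch; the function names and parameters are invented and are not part of any formal model.

```python
# Illustrative encoding of the scoring rules (my own sketch):
# - a unique connection (host/port/partial URI) scores 1, a reused one 0,
# - client logic scores 1 per extraction node and 1 per merge/orchestration
#   node, but 0 when the client consumes a tailored EAPI,
# - integration application logic scores 1 per new EAPI/PAPI/SAPI and 0.5
#   per additional service in an already existing application.

def connection(reused: bool = False) -> float:
    return 0.0 if reused else 1.0

def client_logic(extraction_nodes: int = 0, merge_nodes: int = 0,
                 uses_eapi: bool = False) -> float:
    return 0.0 if uses_eapi else float(extraction_nodes + merge_nodes)

def app_logic(new_apps: int = 0, extra_services: int = 0) -> float:
    return new_apps + 0.5 * extra_services

# Re-scoring the single A-to-B integration above:
cs = connection() + client_logic(extraction_nodes=1)   # 1 connection + 1 node
soa = (3 * connection()                # A->EAPI, EAPI->SAPI, SAPI->B
       + app_logic(new_apps=2)        # one EAPI, one SAPI
       + client_logic(uses_eapi=True))  # tailored EAPI: 0
print(cs, soa)  # 2.0 5.0
```

The two totals reproduce the CS: 2 and SOA: 5 scores above.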
SOA integrations in this comparison seem to be too complex, consuming too many resources and too much budget. This is the most common argument against integration platforms when one is to be introduced to a company that has so far had only point to point integrations. However, and in favor of all integration platform models, creating a lot of point to point integrations leads to something called a spaghetti architecture, where the number of connections is no longer maintainable and a single change (e.g. a source system changing an endpoint) can create a cascade failure.
Higher complexity comes with higher scale, so to reduce complexity we have to enforce reusability. Integration platforms designed according to SOA principles help reduce this complexity. So, starting with the Experience API, we can change how complex our landscape is.
Let's dive into a scenario. System A needs data from both system B and system C. Those two systems expose different resources/objects that do not correspond with each other and the data is needed in different contexts at different frequency and time, so it is not feasible to compose the two into a single service. In light of this, two point to point integrations are required.
Where is the reuse?
If we'd have to express this scenario as a client-server integration without an integration layer mediating between the systems, this would create two connections, where each connection consists of invoking a different interface, as in:
host,
port,
uri,
login/access credentials,
protocol (e.g. HTTP, JMS, JDBC)
data extraction logic (including a translation to a different data model)
So to invoke both systems the client would have to keep the endpoint configuration for both of them and also understand the complexity of data extraction (e.g. the right JDBC call, query parameters, URI parameters and so on). It would also have to keep login/access credentials for each and every system it integrates with. If we look at this in the context of two point to point integrations, this might not be a lot and it would be quite maintainable. But what if we scale the number up to thirty or a hundred integrations?
The reusability in this case comes from limiting the connections made by the client system (System A). Instead of a multitude of connection configurations and logic forced on the system, it has to keep:
one host,
one port,
uri per service,
one login/access credentials (to EAPI only),
one protocol implementation,
no data extraction logic of provider systems, as it is tailored to the consumer and standardized
So if we expose services as REST or SOAP APIs, there is close to no extraction logic that needs to be implemented, as in the client system only has to parse the response from XML or JSON format into whatever format data is kept in said system. As you can see, the reusability here focuses mostly on the configuration of the client system and simplifies it to the absolute minimum needed. This also changes how those systems connect on a network level and simplifies the network configuration for firewalls, IPSec and so on.
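As a hypothetical illustration of the configuration reduction described above: every host, port, credential name and URI below is invented, and the shape of the configuration is my own assumption.

```python
# Client-server: the client keeps one full endpoint configuration, one set
# of credentials and one protocol implementation per provider system.
cs_client_config = {
    "system_b": {"host": "b.internal", "port": 8443, "protocol": "https",
                 "credentials": "secret-b", "extraction": "custom REST calls"},
    "system_c": {"host": "c.internal", "port": 1521, "protocol": "jdbc",
                 "credentials": "secret-c", "extraction": "custom SQL queries"},
}

# SOA: one host, one port, one credential, one protocol implementation;
# only the service URI varies, and no provider extraction logic remains.
soa_client_config = {
    "eapi": {"host": "eapi.integration.internal", "port": 443,
             "protocol": "https", "credentials": "secret-eapi",
             "services": ["/customers", "/orders"]},
}
```

With thirty providers, the first dictionary grows to thirty entries while the second keeps a single entry with thirty URIs.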
Additionally each new service in the EAPI reuses the application runtime, resources, common code or functions and configuration. This creates a partial reuse in the logic node.
A reuse of an Experience API, a single Experience API hosting two services exposed to the consumer.
Just as a thought exercise, let's try and score complexity here:
CS: 4
2 connections,
2 for logic nodes,
SOA: 8.5
5 connections, with 1 point saved by reusing the connection between System A and its EAPI for the second service,
3.5 for logic nodes: 1.5 for the EAPI (0.5 for the second and every next EAPI service), 2 for SAPIs,
Ok, so SOA still has a higher complexity. Will that still be the case for more p2p integrations? Let's say two additional provider services:
CS: 8
4 connections,
4 for logic nodes,
SOA: 15.5
9 connections, with the connection between System A and its EAPI counted only once,
6.5 for logic nodes: 2.5 for the EAPI, 4 for SAPIs,
Adding another three provider integrations would result in the following scores:
CS: 14
7 connections,
7 for logic nodes,
SOA: 26
15 connections, with the connection between System A and its EAPI counted only once,
11 for logic nodes: 4 for the EAPI, 7 for SAPIs,
As you can see, with scale the percentage difference between the two models decreases. With two integrations SOA sits at 212.5% of the complexity that CS offers, but with a higher number of integrations the ratio decreases. This can be described by the equation SOA / CS = (3.5n + 1.5) / 2n, where n is the number of integrations required. Keep in mind this would be the true complexity only if no other elements of the integration platform were reused.
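As a numerical check on that trend, the scores can be fitted to linear formulas in n; CS = 2n and SOA = 3.5n + 1.5 is my own reconstruction from the scores above, not a formal model.

```python
# Reconstructed fit of the Experience API reuse scores (my own derivation).
def cs(n: int) -> float:
    return 2.0 * n    # 1 connection + 1 extraction logic node per integration

def soa(n: int) -> float:
    # per integration: 2 new connections, 1 new SAPI, 0.5 EAPI service,
    # plus a constant 1.5 for the shared consumer connection and EAPI base
    return 3.5 * n + 1.5

for n in (2, 4, 7, 100):
    print(f"n={n}: CS={cs(n)}, SOA={soa(n)}, ratio={soa(n) / cs(n):.1%}")
# reproduces the scores in the text (8.5 vs 4 at n=2, 15.5 vs 8 at n=4,
# 26 vs 14 at n=7) and the ratio keeps falling toward 175%
```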
Now this might seem like an increase in complexity overall, but we do not build an integration platform to host p2p integrations, so other reusable parts need to be taken into account before we'd say this is an increase of complexity.
Just to summarize and have a more proper perspective, let's focus on the complexity the client system (System A) has to handle, with the assumption it has 10 p2p integrations:
With CS integrations we have to maintain, per system integrated:
10 connection configurations,
10 data extraction logic implementations,
Whereas with SOA we have to maintain, across all systems integrated:
1 connection configuration,
a simplified data extraction logic and implementation, as the EAPI has a standardized interface tailored to the client system, enabling it to incorporate data easily into the business process,
With this focus we can see that the complexity for the client system is greatly reduced with each single integration. It can be reduced further if we consider that the system is not only using p2p integrations but also invoking aggregated services, whose complicated data transformation, orchestration and merging logic is hidden beneath the Experience Layer. More on this in the chapters below.
Let's take a peek at how reusability looks with System APIs and how it can affect the complexity of our landscape.
Another scenario. System A and B both need data from system C. Those two systems have different business uses that do not correspond with each other directly and the data is needed in different contexts at different frequency and time. In light of this, two point to point integrations are required.
Where is the reuse?
If we'd have to express this scenario as a client-server integration without an integration layer mediating between the systems, this would create two connections, where each connection consists of invoking the same provider system interface, so the following connection configuration is duplicated with system A and B:
host,
port,
uri,
login/access credentials,
protocol (e.g. HTTP, JMS, JDBC)
data extraction logic
Both client systems would have to duplicate the configuration and the complexity of the extraction logic, doing the same work twice and maintaining their own configuration and implementation. It is worth noting that while the logic might seem to be duplicated, in fact it might contain small, often coincidental differences that could lower the quality of data. Now if we look at this in the context of changes in the provider system: with only two systems to keep a stable connection and good data quality with, that's not much trouble. It gets more problematic as we scale out and have more client systems. Most of all, if we make changes (e.g. extend or narrow the scope of data, change an endpoint) there might be a problem with forcing the client systems to maintain a common understanding.
The reusability in this case comes with limiting the connections made to the provider system (system C). Instead of a multitude of connections coming in from a lot of systems it is limited to a single point of exposure by a System API that standardizes the communication. In this case the integration platform holds:
one host,
one port,
uri per service,
one login/access credentials,
one protocol implementation,
all data extraction logic of the provider system,
There is no duplication of implementation and configuration. The client systems are not forced to learn how to extract the data from the provider system; instead they both use a dedicated Experience API that in turn reuses a standardized System API exposing system data/objects/resources.
A reuse of a System API, one data provider service used by two different consumers.
Again, as a thought exercise, let's try and score complexity here:
CS: 4
2 connections,
2 for logic nodes,
SOA: 8
5 connections, with the connection between the System API and the provider system counted only once,
3 for logic nodes: 2 EAPIs, 1 SAPI,
In this case SOA also has a higher complexity. Will that still be the case for more p2p integrations? Let's add two more clients:
CS: 8
4 connections,
4 for logic nodes,
SOA: 14
9 connections, 1 point counted for the connection between System API and the corresponding provider system,
5 for logic nodes, 4 EAPIs, 1 SAPI,
In this case the reusability ratio starts at 250% (for a single integration) and falls toward a limit of 150%, a bit faster than with the Experience API, since we're not building new logic in the SAPI. Please note again that this is just in the context of creating only p2p integrations from the ground up. As before, this might seem like an overall increase in complexity, so other reusable parts need to be taken into account before we'd call it one, although based on the evidence so far we can conclude that a SOA based Integration Platform would be overkill if we only want it to support a few p2p integrations.
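The same kind of reconstruction works for the System API reuse; fitting the scores gives CS = 2n and SOA = 3n + 2 for n consumers (again, my own derivation from the numbers above, not a formal model).

```python
# Reconstructed fit of the System API reuse scores (my own derivation).
def cs(n: int) -> float:
    return 2.0 * n    # 1 connection + 1 extraction logic node per consumer

def soa(n: int) -> float:
    # per consumer: 2 connections (consumer->EAPI, EAPI->SAPI) + 1 EAPI,
    # plus a constant 2 for the single shared SAPI and its provider connection
    return 3.0 * n + 2.0

print(soa(1) / cs(1))   # 2.5 -> the 250% starting point
print(soa(4) / cs(4))   # 1.75, already below the EAPI-only curve at n=4
# the ratio keeps falling toward 3/2 = 150% as consumers are added
```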
To summarize and have a more proper perspective, let's focus on the complexity the provider system (System C) has to handle, with the assumption it has 10 p2p integrations:
With CS integrations we have to maintain, per system integrated:
10 login/access credentials,
10 configurations of privileges to access the right limited scope of data (I assume no sane person would give out admin access to a system),
Whereas with SOA we have to maintain, across all systems integrated:
1 set of login/access credentials for the integration platform,
1 configuration of privileges to access the right limited scope of data,
In this case the complexity for the provider system is greatly reduced with each single integration.
At this point, looking at the complexity reduction in both the client and the provider systems, we can say that measuring complexity is not an easy thing to do, as it has a lot of variations and requires consideration of multiple variables.
One of the core strengths of a SOA platform is reducing the need for overlapping or duplicate logic in client systems, unifying this logic for all of the service consumers. As shown in the chapter above, this can be done simply by hiding the complexity of data extraction or, as I'll discuss in this chapter, by creating service logic on the integration platform. This is another situation where the System API can be reused.
Let's assume a scenario: System A requires data from System C, creating a p2p integration with it. System B on the other hand requires data from system D, but in the given business context of their business domain, this data is insufficient and has to be enriched by data from System C, the same data that is required by System A.
Where is the reuse?
To put it simply - it's exactly the same as in the chapter above. We reuse the connection to System C by exposing the data in a System API and using it either in a p2p fashion or to compose a service that aggregates data from multiple sources and merges it into a response bounded by a business domain context.
A more complex scenario for a System API reuse. It is used as a standalone data provider for one consumer and is reused as a part of a service defined in a Process API.
System APIs, just like Experience APIs, can host multiple data exposure services. The difference is that where EAPIs expose services from different providers, aggregating them for a specific consumer, SAPIs aggregate all the exposure services for the system they are assigned to.
Our scenario here is that System C provides two different services, exposing two different data resources through its System API. Two consumer systems (A and B) wish to consume the data resources accordingly (A wants the resource exposed by service 1, B the one exposed by service 2).
Where is the reuse?
The reuse here is very similar to the one with the Experience API and focuses on the application runtime, resources, common code or functions, and configuration. So as with the reuse of a single SAPI service described above, every next service implemented in the same SAPI app would reuse the connection configuration to the system whose data it exposes.
A quick look at the complexity for this case:
CS: 4
2 connections,
2 for logic nodes,
SOA: 8.5
5 connections, with 1 point saved by reusing the connection between the SAPI and System C for the second service,
3.5 for logic nodes: 2 for EAPIs, 1.5 for the SAPI (0.5 for the second and every next SAPI service),
This would seem like exactly the same complexity as with the EAPI reuse. Just for an exercise, let's say two additional consumers:
CS: 8
4 connections,
4 for logic nodes,
SOA: 15.5
9 connections, with the connection between the SAPI and System C counted only once,
6.5 for logic nodes: 4 for EAPIs, 2.5 for the SAPI,
So just like with EAPIs, applied alone it would lead to an increase of complexity toward a ratio limit of 175%.
Since we already know that a one-sided reuse of APIs (only EAPI or only SAPI) increases complexity to 150% or more, let's try to combine the two scenarios and assess what would happen if we created a larger reuse on both sides.
Let's assume a scenario: We start off with two EAPIs invoking the same service in a SAPI. We want to extend our integration platform to other services in the same and different systems and add new consumers to those.
Two services exposed to System A and both provided by System B. The applications hosting the APIs both on the Experience Layer and on the System Layer are being reused.
Ok, let's try to work out the complexity score here. We already know the score for our starting point:
CS: 4 (2 for connections and 2 for logic nodes)
SOA: 8 (5 for connections and 3 for logic nodes: 2 EAPIs and 1 SAPI)
Let's now add a second service to the already established SAPI. The change in the complexity score is as follows:
CS: +4
2 for connections,
2 for logic nodes in the consumer systems,
SOA: +1.5
0 for connections, since the connection configuration to that SAPI is already established,
1.5 for extending the existing logic nodes (0.5 point per integration application),
So we get a score of:
CS: 8
SOA: 9.5
We get a ratio change from 200% to 118.75%. That's already a good improvement.
Now let's add another SAPI with a third service and consume it with both of the EAPIs:
CS: +4
2 for connections,
2 for logic nodes in consumer systems,
SOA: +5
3 for connections,
2 for logic nodes (0.5 point per each EAPI extension and 1 for the new SAPI),
Now we have a score of:
CS: 12
SOA: 14.5
This gives us a change in the complexity ratio to 120.8(3)%.
Ok, now let's add a fourth service and place it in the second SAPI for both EAPIs to consume. The score increase will be just like our first addition to the original scenario:
CS: +4
SOA: +1.5
Which sums up to a total of:
CS: 16
SOA: 16
As you can see, even though at first the complexity was twice as high with SOA, with each reuse added the complexity ratio followed a falling trend.
Just for a final exercise on p2p communication reuse, let's add a third EAPI that will consume two services provided by the first SAPI and one provided by the second:
CS: +6
3 for connections,
3 for logic nodes in consumer systems,
SOA: +5
3 for connections (1 for the consumer to EAPI and 2 from the EAPI to both SAPIs),
2 for logic nodes (1 for creating a new EAPI with one service and 0.5 per each following service implementation),
This brings us to a total of:
CS: 22
SOA: 21
This gives us a complexity ratio of 95.(45)%, and at this point having a SOA based integration platform is more optimal than having client-server integrations without middleware and its mediation.
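The whole combined walk-through can be replayed as a running total; the step labels below are mine, while the increments are the scores given above.

```python
# Replaying the combined EAPI+SAPI walk-through (increments from the text).
steps = [
    ("start: 2 EAPIs, 1 SAPI, 1 service", 4.0, 8.0),
    ("2nd service in the existing SAPI",  4.0, 1.5),
    ("new SAPI with a 3rd service",       4.0, 5.0),
    ("4th service in the 2nd SAPI",       4.0, 1.5),
    ("3rd EAPI consuming 3 services",     6.0, 5.0),
]
cs_total = soa_total = 0.0
for label, d_cs, d_soa in steps:
    cs_total += d_cs
    soa_total += d_soa
    print(f"{label}: CS={cs_total}, SOA={soa_total}, "
          f"ratio={soa_total / cs_total:.2%}")
# the last step lands at CS=22.0, SOA=21.0, a ratio just above 95%
```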
Just by looking at those numbers we can arrive at the conclusion that we will lower the complexity of our landscape by creating reusable services and by having more interactions with the same systems. So, for example, integrating single use case microservices this way would in fact create more unwanted complexity, unless they are logically grouped by an API Gateway to provide a single access point, as in a SAPI per microservices group, not per microservice.
Moving on to a more common SOA scenario, let's go through the reusability of a Process API and how implementing a service over this integration layer affects the complexity of the integration landscape. Let's set the scene: System A and B both need a specific set of data. This data can only be provided by combining two resources, both mastered in different systems (C and D). Those two systems have different business uses that do not correspond with each other directly, and the data is needed in different contexts at different frequencies and times. Furthermore we want, as a principle, the data to have a consistent format and relationship between objects for all consumers. In light of this, a Process API is created to govern data transformation, merging and call orchestration.
Where is the reuse?
The service provided by the Process API, when translated to a client-server model, would be represented by two separate connections to each of the systems and a logic implementation that would have to handle two different data extractions from the provider systems and the merging of data. This way we'd end up not only with duplicate configuration to call both providers in the consumer systems and duplicate privileges in the provider systems, but also with the need to duplicate the extraction and merger/transformation logic with every new consumer. As a result we'd incur the risk that the data and the relationships between different entities could be understood differently with each consumer added.
This would pose not only a maintenance risk as the configuration complexity would grow, but also the risk that the data we process would become inaccurate and the potential value to our business impaired.
This pattern also incorporates the reusability of client and provider system configuration as described in the previous chapters.
A reuse of a Process API by two different consumers.
Let's have another attempt at scoring complexity; this time it might be more interesting:
CS: 10
4 connections,
6 logic nodes implemented in the client systems (adding one point per client for the need to create logic merging the two responses into context-bounded data),
SOA: 13
8 connections,
5 integration applications, 2 EAPIs, 1 PAPI, 2 SAPIs,
In this case SOA also has a higher complexity. Now let's try and add two clients to consume the service provided by the Process API:
CS: 20
8 connections,
12 implementations of logic in the client system,
SOA: 19
12 connections,
7 integration applications, 4 EAPIs, 1 PAPI, 2 SAPIs,
The complexity difference seems to fade quite quickly. Now if we'd add another three clients to the Process API service we'd get:
CS: 35
14 connections,
21 implementations of logic in the client systems,
SOA: 28
18 connections,
10 integration applications, 7 EAPIs, 1 PAPI, 2 SAPIs,
As you can see, the complexity score ratio starts to move in favor of the SOA based Integration Platform. If we'd try to write this down as a set of equations, with n as the number of clients: CS = 5n and SOA = 3n + 7, so the ratio (3n + 7) / 5n gradually falls, with every client of the second integration, toward the limit of 60%. Having four clients of this single service makes the integration less complex than the p2p equivalent. A conclusion can be drawn from this that it is feasible to create a SOA based Integration Platform when the complexity of our data and data sources requires additional logic to give business value.
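The reconstructed equations (CS = 5n, SOA = 3n + 7 for n clients; my own fit of the scores above, not a formal model) also give the crossover point directly:

```python
# Reconstructed fit of the Process API scenario scores (my own derivation).
def cs(n: int) -> float:
    return 5.0 * n    # 2 connections + 3 logic nodes per client

def soa(n: int) -> float:
    # per client: 2 connections + 1 EAPI, plus a fixed core of
    # 1 PAPI + 2 SAPIs with their 4 internal/provider connections
    return 3.0 * n + 7.0

# SOA becomes cheaper once 3n + 7 < 5n, i.e. n > 3.5 -> from the 4th client on:
crossover = next(n for n in range(1, 100) if soa(n) < cs(n))
print(crossover)        # 4
# and the ratio (3n + 7) / 5n keeps falling toward 3/5 = 60%
```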
Process APIs, apart from being reused by different consuming Experience APIs, can on rare occasions be reused by another Process API service. Typically those services reside in the same bounded context of a business domain. At this point we have covered all the basic types of reusability; the key one here is the partial reuse of a Process API to host another service. This is more an example of the possibilities available with a SOA integration platform.
Unfortunately, at this point I don't have a unified mathematical model to calculate complexity; all of my calculations were done by hand. Just as a thought experiment, let's count the complexity score of this particular case:
CS: 12
5 for connections, two for the first service, three for the second service,
7 for logic nodes, three for the first service, four for the second service, where two points here are for the data merger/transformation logic in each client,
SOA: 14.5
8 connections,
6.5 for logic nodes: 2 EAPIs, 1.5 for the logic in the PAPI, 3 SAPIs,
At the start the difference in complexity is not that big and lands at a ratio of 120.8(3)%.
Let's add a new consumer to Service 2 and see how this changes the complexity:
CS: +7
3 for connections,
4 for logic nodes,
SOA: +3
2 for connections, consumer system to EAPI and EAPI to PAPI,
1 for logic nodes, the implementation of a new EAPI
This gives us a total of:
CS: 19
SOA: 17.5
The ratio here drops to 92.1%. Just one additional consumer of a complex service makes the complexity of the whole ecosystem lower with SOA than it is with CS.
As a different example, let's add to the original setup an Experience API that reuses two of the already existing System APIs (the Service 1 data provider from System C and the Service 2 data provider from System E) and score this:
CS: +4
2 for connections,
2 for logic nodes,
SOA: +4
3 for connections,
1 for logic nodes, implementation of the EAPI
So this gives us the same score increase. This actually works in favor of the SOA platform: why would we implement a client-server integration if there is no complexity difference between it and SOA, and we already have such a platform established?
If you managed to get to this point, I'd like to thank you for reading. Hopefully this article gave you some insight into the reusability of synchronous integration patterns on a SOA integration platform. As you probably noticed, complexity is a major factor alongside reusability. The higher the complexity of our landscape, the more we can lower it by applying a SOA integration platform, and in turn ease the burden on maintenance and future development, and improve the flexibility of the business we're supporting.
Now just to summarize a few key conclusions (or TL;DR):
Creating a SOA based integration platform from the ground up will most likely not offer much reusability and will increase complexity at the start,
Single use case systems (e.g. microservices) or single, non-reusable point to point integrations will increase complexity, so System API services should be designed to provide reusability for other systems without contextually limiting the scope of the data objects they expose,
Reusability is not limited to the integration platform itself; it also affects the amount of coding and configuration required by client and provider systems, as well as the network and security configuration in your IT landscape,
The more complex a service provided by the integration platform is, the faster we lower complexity of our IT landscape with every single client consuming the said service,
IT landscape complexity is a very difficult subject and it is quite hard to measure in an organized manner. The variables I proposed in this article can and probably should be extended, and the score per variable should probably be weighted. The scores provided here serve only as evidence of the importance of reusability and are not an accurate measurement of complexity.
Spaghetti Architecture - a style of enterprise landscape where the overall complexity makes it very hard to maintain IT systems and the interactions between them. See: https://data-sleek.com/what-is-spaghetti-architecture-and-how-to-avoid-it/
SOA, Service Oriented Architecture - a style of Integration Platform Architecture focused on providing reusability and defined services for the IP client systems in an organised and manageable fashion. See: https://en.wikipedia.org/wiki/Service-oriented_architecture
CDM, Canonical Data Model - https://en.wikipedia.org/wiki/Canonical_model
BCM, Bounded Context Model - https://en.wikipedia.org/wiki/Domain-driven_design#Bounded_context
API-led connectivity, The next step in the evolution of SOA, whitepaper, https://www.mulesoft.com/ty/wp/api-led-connectivity
Beata Skrzymowska - my wife, for being patient with me and giving me space to write, :-)
Dominik Kruszewski - a long time colleague, specialising in DevOps in the integration world, for proofreading all that I write, asking the right questions and supporting my technical OCDs with some nice discussions.
Marcin Nowiński - a fellow Integration Architect specialising in helping clients choose the right tools for their needs and addressing integration complexity, for discussions on the logic of my reasoning and doing sanity checks on said logic. (Also great editorial pointers!)
Dear Reader!
Please leave a comment under the LinkedIn post for this article! I would really love to hear your thoughts on the topic and get some constructive feedback! I also welcome any ideas for future articles!