I spent last week hacking on Event Driven Architecture with some very smart people. We came across a problem, and I thought maybe somebody has already seen this and solved it! Comments are welcome.
We were using the following pattern. We have n existing systems. Let's call them L1...Ln. Each of these systems is basically not "touchable": we can only call the existing interfaces. Each system offers something similar to the following interface:
* update(resource r)
which allows you to update some object, resource or thing.
We plan to use this interface, together with events, to keep all the systems in sync. Each system subscribes to updates from the others and applies them through update().
This works fine if the model is master-slave. In other words, there is a single master system that publishes events to all the slaves. But in reality, that isn't how most architectures work. Each system may have updates which need to be propagated to the others.
To make this work, each system allows you to be notified when any updates happen. The problem is that you cannot distinguish the root cause of the update. Some updates come from within the existing system, and of course some updates come through the external interface.
So imagine I make a change to Resource R1 in system L1. An event (E1) is issued saying there is an update to R1. This is distributed to L2 and L3. These systems in turn update their core storage in response to event E1. Now, these systems issue events (E2 and E3) that reflect the updates to those systems. Ideally we would junk these updates since they are just "echoes" of E1. Without doing that, the system will start to resonate, and feedback will take over, just like in a badly set up PA.
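To make the resonance concrete, here is a rough Python sketch of the loop. Everything in it is made up for illustration: the LegacySystem class, the event dictionaries and the run() helper are my own stand-ins, not the real interfaces of any of the systems. The only thing it assumes is what is described above: every update produces a notification, and the notification says nothing about what caused it.

```python
from collections import deque


class LegacySystem:
    """Hypothetical stand-in for one of L1...Ln."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def update(self, resource_id, value):
        self.store[resource_id] = value
        # The notification carries no hint about what caused the update.
        return {"source": self.name, "resource": resource_id, "value": value}


def run(systems, first_event, max_events=20):
    """Deliver each event to every system except the one that emitted it.

    Stops after max_events deliveries so the demonstration terminates.
    Returns (events delivered, events still waiting in the queue).
    """
    queue = deque([first_event])
    delivered = 0
    while queue and delivered < max_events:
        event = queue.popleft()
        delivered += 1
        for system in systems:
            if system.name != event["source"]:
                # Each receiver applies the update and emits its own event.
                queue.append(system.update(event["resource"], event["value"]))
    return delivered, len(queue)


systems = [LegacySystem("L1"), LegacySystem("L2"), LegacySystem("L3")]
e1 = systems[0].update("R1", "new value")  # the one genuine change
delivered, pending = run(systems, e1)
print(delivered, pending)
```

With three systems, every delivered event produces two more, so the queue grows instead of draining. The max_events cap is the only reason the loop stops.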
Spotting the echoes is actually pretty hard to do: from the outside, an echo such as E2 looks just like a genuine update such as E1.
Ideally we would have a correlation between E1 and E2, but since the existing system can't be changed, we can't do this.
Suppose we try to do this by comparing E1 and E2. The problem is there may be different data in E1 and E2. Maybe system L2 adds a "last updated" timestamp. This will mean E2 is slightly different to E1.
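A tiny Python sketch of why the straight comparison fails. The field names and values here (name, price, last_updated) are invented for the example; the real payloads will look different.

```python
# E1 as issued by L1 (hypothetical payload).
e1 = {"resource": "R1", "name": "Widget", "price": 10}

# The echo from L2: the same data plus a field L2 added on its own.
e2 = {"resource": "R1", "name": "Widget", "price": 10,
      "last_updated": "2024-01-01T12:00:00Z"}  # made-up timestamp


def is_echo_naive(original, candidate):
    """Treat the candidate as an echo only if it is identical to the original."""
    return original == candidate


print(is_echo_naive(e1, e2))  # False: the echo is not recognised
```

The extra field is enough to make the echo look like a new update, so it would be passed on.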
What about using timing? Just drop all events that come back referencing R1 for the next 10 seconds after E1 is delivered? What if someone edits the resource inside L2 and there is a genuine new event (E4) as well as the echo? The real event gets dropped.
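Sketched in Python, the timing idea looks something like this. TimeWindowFilter is a hypothetical helper of mine, and the clock is passed in as a plain number so the example is deterministic.

```python
class TimeWindowFilter:
    """Drop any event for a resource that arrives within `window_seconds`
    of the last event delivered for that resource."""

    def __init__(self, window_seconds=10.0):
        self.window = window_seconds
        self.last_delivered = {}  # resource id -> time of last delivery

    def record_delivery(self, resource_id, now):
        self.last_delivered[resource_id] = now

    def should_drop(self, resource_id, now):
        last = self.last_delivered.get(resource_id)
        return last is not None and (now - last) < self.window


window = TimeWindowFilter()
window.record_delivery("R1", now=0.0)                   # E1 delivered at t=0

echo_dropped = window.should_drop("R1", now=1.0)        # E2, the echo
genuine_dropped = window.should_drop("R1", now=4.0)     # E4, a real edit in L2
after_window = window.should_drop("R1", now=15.0)       # outside the window

print(echo_dropped, genuine_dropped, after_window)  # True True False
```

The filter has no way to tell E2 from E4. Both reference R1 inside the window, so both are dropped.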
Is there any solution? The only one we found is that we have a "Master System" (M). All updates from L1...Ln must go through M. So L2 only hears about L1's events via M. And M keeps track of the master data, and M knows what information is important and what is transient (like the aforementioned Timestamp).
Every time an update comes to the Master, it checks against its own store. If there is a genuine update (the important data for the resource has changed) then it distributes it to everyone (except maybe the source of the update). If the important data matches the existing state, it drops the message. Think of it as a feedback suppressor.
So in our case, E1 would get past (genuine update), but E2 and E3 would be dropped. But E4 would be distributed because it contains real updates to the state of R1.
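Here is a minimal sketch of that feedback suppressor in Python, under some loud assumptions: the Master class, its handle() method, the field names and the timestamps are all invented for illustration, and the state is just an in-memory dictionary. A real M would sit between two sets of topics and would need a proper store.

```python
class Master:
    """Hypothetical Master System (M) acting as a feedback suppressor."""

    def __init__(self, important_fields):
        # Which fields count as real state; everything else is transient.
        self.important_fields = important_fields
        self.state = {}  # resource id -> last known important data

    def _important(self, data):
        return {k: data[k] for k in self.important_fields if k in data}

    def handle(self, source, resource_id, data, subscribers):
        """Return the list of systems this update should be forwarded to."""
        important = self._important(data)
        if self.state.get(resource_id) == important:
            return []  # nothing important changed: an echo, drop it
        self.state[resource_id] = important
        # Genuine update: send to everyone except where it came from.
        return [s for s in subscribers if s != source]


systems = ["L1", "L2", "L3"]
master = Master(important_fields=("name", "price"))

# E1: genuine change made in L1.
e1_targets = master.handle("L1", "R1", {"name": "Widget", "price": 10}, systems)
# E2 and E3: echoes from L2 and L3, differing only in a transient timestamp.
e2_targets = master.handle("L2", "R1", {"name": "Widget", "price": 10,
                                        "last_updated": "12:00:01"}, systems)
e3_targets = master.handle("L3", "R1", {"name": "Widget", "price": 10,
                                        "last_updated": "12:00:02"}, systems)
# E4: genuine edit made inside L2.
e4_targets = master.handle("L2", "R1", {"name": "Widget", "price": 12,
                                        "last_updated": "12:00:03"}, systems)

print(e1_targets, e2_targets, e3_targets, e4_targets)
# ['L2', 'L3'] [] [] ['L1', 'L3']
```

Note that when L1 and L3 apply E4 and echo it back, M already holds the new price, so those echoes are dropped too. The comparison only works because M has been told which fields are important.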
This is a good pattern. It means we can always get the master state. But it's also considerably more complex, with more moving parts than the standard view of an Event Driven Architecture. You need to have twice as many topics, for example (because you have to have one topic for messages to flow up to the Master and a separate topic for messages to flow back out to the Ln systems).
Secondly we need a complete copy of the state (or at least a cache of recent state) PLUS we need to know the shape of the data - what is important and what is transient.
All in all, this is an interesting issue when you try to bridge between a legacy system and an event driven architecture. If anyone has a better solution, please let me know!