Considerations When Retrying Failed Messages in BizTalk or the ESB Toolkit

I was doing some research lately into a publish/subscribe scenario and it made me think of a “gotcha” that folks may not think about when building this type of messaging solution.

Specifically, what are the implications of resubmitting a failed transmission to a particular subscriber in a publish/subscribe scenario?  For demonstration purposes, let’s say I’ve got a schema defining a purchase request for a stock.

2010.01.18pubsub01

Now let’s say that this is NOT an idempotent message and the subscriber only expects a single delivery.  If I happen to send the above message twice, then 400 shares would get bought instead of 200.  So, we need guaranteed-once delivery.  Let’s also assume that we have multiple subscribers of this data who all do different things.  In this demonstration, I have a single receive port/location which picks up this message, and two send ports which both subscribe on the message type and transmit the data to different locations.

2010.01.18pubsub02

As you’d expect, if I drop a single file in, I get two files out.  Now what if the first send port fails for whatever reason?  If I change the endpoint address to something invalid, the first port will fail, and the second will proceed as normal.

2010.01.18pubsub03

You can see that this suspension is directly associated with a particular send port, so resuming this failed message (after correcting the invalid endpoint address) should ONLY target the failed send port, and not put the message in a position to ALSO be processed by the previously-successful send port.  This is verified in the scenario above.

So all is good.  BUT what happens if you leverage an external system to facilitate the repair and resubmission of failed messages?  This could be a SharePoint solution, a custom application or the ESB Toolkit.  Let’s use the ESB Toolkit here.  I went into each send port and checked the Enable routing for failed messages box.  This results in port failures being published back to the bus, where the ESB Toolkit “catch all” exception send port will pick them up.

2010.01.18pubsub04

Before testing this out, make sure you have an HTTP receive location set up.  We’ll be using this to send messages back from the ESB portal to BizTalk for reprocessing.  I hadn’t set up an HTTP receive location yet on my IIS 7 box and found the instructions here (I used an HTTP receive location instead of the ESB on-ramps because I saw the same ESB Toolkit bug mentioned here).

So once again, I changed a send port’s address to something invalid and published a message to BizTalk.  One message succeeded, one failed and there were no suspended messages because I had the failed message routing turned on.  When I visit my ESB Toolkit Management Portal I can see the failed message in all its glory.

2010.01.18pubsub05

Clicking on the error drills into the details. From here I can view the message, click Edit and choose to resubmit it back to BizTalk.

2010.01.18pubsub06

This message comes back into BizTalk with no previous context or destination target.  Rather, it’s as if I’m dropping this message into BizTalk for the first time.  This means that ALL subscribers (in my scenario here) will get the message again, including the send port that already delivered it successfully, which can cause unintended side effects.
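To make the difference between the two recovery paths concrete, here is a minimal sketch in Python. It is a toy in-memory model, not BizTalk or ESB Toolkit code: `SendPort`, `Bus`, `resume`, `resubmit_externally` and the port names are all hypothetical names invented for illustration, and the message carries only an assumed `shares` field.

```python
from dataclasses import dataclass, field


@dataclass
class SendPort:
    """Toy stand-in for a send port (hypothetical, not a BizTalk API)."""
    name: str
    healthy: bool = True
    delivered: list = field(default_factory=list)

    def send(self, message: dict) -> bool:
        # An unhealthy port models an invalid endpoint address.
        if not self.healthy:
            return False
        self.delivered.append(message)
        return True


class Bus:
    """Toy pub/sub bus: every published message goes to every port."""

    def __init__(self, ports):
        self.ports = ports
        self.suspended = []  # (port, message) pairs tied to the failing port

    def publish(self, message: dict) -> None:
        for port in self.ports:
            if not port.send(message):
                self.suspended.append((port, message))

    def resume(self) -> None:
        """In-place resume: retry only the port that originally failed."""
        pending, self.suspended = self.suspended, []
        for port, message in pending:
            if not port.send(message):
                self.suspended.append((port, message))

    def resubmit_externally(self) -> None:
        """External repair/resubmit: the message re-enters as brand new."""
        pending, self.suspended = self.suspended, []
        for _port, message in pending:
            self.publish(message)


def run_scenario(external_resubmit: bool) -> dict:
    port_a = SendPort("PortA", healthy=False)  # misconfigured endpoint
    port_b = SendPort("PortB")
    bus = Bus([port_a, port_b])
    bus.publish({"shares": 200})  # PortA fails, PortB succeeds
    port_a.healthy = True         # endpoint corrected
    if external_resubmit:
        bus.resubmit_externally()
    else:
        bus.resume()
    # Total shares each downstream system was told to buy.
    return {p.name: sum(m["shares"] for m in p.delivered) for p in bus.ports}


print(run_scenario(external_resubmit=False))  # {'PortA': 200, 'PortB': 200}
print(run_scenario(external_resubmit=True))   # {'PortA': 200, 'PortB': 400}
```

In the resume path the retry stays bound to the failed port, while in the external path the second subscriber sees the order twice.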

This is a case you may not think of when working primarily in point-to-point solutions.  How do you get around it?  A few ways I can think of:

  • Build your messages and services to be idempotent.  Who cares if a message comes once or ten times?  Ideally there is a single identifier in each message that can indicate a message is a duplicate, or the message itself is formatted in a way which is immune to retries.  For instance, instead of the message saying to buy 200 shares, we could have fields with a “before amount” of 800 and “after amount” of 1000.
  • Transform messages at the send port to destination-specific formats.  If each send port transforms the message to a destination format, then we could repair and resubmit it and only send ports looking for either the canonical format OR the destination format would pick it up.
  • Have indicators in the message to indicate targets/retries and filter those out of send ports.  We could add routing instructions to a message that specified a target system and have filters in send ports so only ports listening for that target pick up a message.  The ESB Toolkit lets us edit the message itself before resubmitting it, so we could have a field called “target” and manually populate which send port the message should aim for.
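As a rough illustration of the first and third options, the sketch below uses plain Python rather than any BizTalk artifact. The `order_id`, `before` and `after` field names, the `IdempotentSubscriber` class, and the `apply_absolute` and `port_accepts` helpers are assumptions made up for this example; only the “target” field and the 800/1000 amounts come from the discussion above.

```python
class IdempotentSubscriber:
    """Receiver that ignores duplicates based on a message identifier."""

    def __init__(self):
        self.seen_ids = set()
        self.shares_bought = 0

    def handle(self, message: dict) -> bool:
        # 'order_id' is a hypothetical unique identifier in the message.
        if message["order_id"] in self.seen_ids:
            return False  # duplicate delivery, nothing happens
        self.seen_ids.add(message["order_id"])
        self.shares_bought += message["shares"]
        return True


def apply_absolute(position: int, message: dict) -> int:
    """State-based message: move to 'after' only if we are still at 'before'."""
    if position == message["before"]:
        return message["after"]
    return position  # already applied (or stale), so leave the position alone


def port_accepts(port_name: str, message: dict) -> bool:
    """Send port filter: a missing or empty 'target' means every port."""
    target = message.get("target")
    return not target or target == port_name


# Option 1a: duplicate detection by identifier.
order = {"order_id": "ord-1", "shares": 200}
subscriber = IdempotentSubscriber()
subscriber.handle(order)
subscriber.handle(order)  # retry is ignored
print(subscriber.shares_bought)  # 200

# Option 1b: before/after amounts make the retry harmless.
change = {"before": 800, "after": 1000}
position = apply_absolute(800, change)
position = apply_absolute(position, change)  # retry leaves 1000 unchanged
print(position)  # 1000

# Option 3: the repaired message names the port it should reach.
resubmitted = {**order, "target": "PortA"}
print([p for p in ("PortA", "PortB") if port_accepts(p, resubmitted)])  # ['PortA']
```

The duplicate check pushes responsibility to each receiver, while the target filter keeps the decision in the messaging layer; which fits better depends on how much control you have over the subscribers.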

So there you go.  When working solely within BizTalk for messaging exceptions, whether or not you use pub/sub shouldn’t matter.  But if you leverage error handling orchestrations or completely external exception management systems, you need to take into account the side effects of resubmitted messages that could reach multiple subscribers.

Author: Richard Seroter

Richard Seroter is currently the Chief Evangelist at Google Cloud and leads the Developer Relations program. He’s also an instructor at Pluralsight, a frequent public speaker, the author of multiple books on software design and development, and a former InfoQ.com editor plus former 12-time Microsoft MVP for cloud. As Chief Evangelist at Google Cloud, Richard leads the team of developer advocates, developer engineers, outbound product managers, and technical writers who ensure that people find, use, and enjoy Google Cloud. Richard maintains a regularly updated blog on topics of architecture and solution design and can be found on Twitter as @rseroter.

9 thoughts

  1. Hello Richard,

    First of all, congratulations for “SOA Patterns with BizTalk Server 2009”, it’s a very good book, I’ve learned a lot. Thanks.

    Your post is exactly on the issue I’m working on at the moment.

    In my basic test case, “Ordered delivery” and “Stop sending subsequent messages on current message failure” options are activated and only one send port (FILE Adapter) is subscribing to messages.

    In order to emulate a failure, I wrote a wrong destination folder on send port. When a first message (called Message#1) comes into MessageBox, Send port service is suspended (due to folder problem) and message appears on ESB Toolkit. Good.

    If Message#2, Message#3,…Message#N arrive, they are put on suspended queue and by consequence don’t appear on ESB Portal. Good, activated options work.

    If I enter the correct folder and resubmit Message#1 from the ESB Portal, it is put at the last position of the suspended queue. That’s not good for me; it must be the first message to be sent.

    So how can I resubmit Message#1 so that it is sent to the send port before all the suspended messages, and then unblock Message#2, Message#3,…, Message#N?

    I found that if I manually resume the send port service, it sends all the suspended messages to the send port, so ordered delivery is broken.

    So, for me, a solution would be to resubmit Message#1 to a duplicate send port (in order to send Message#1 before all the suspended messages), and then resume the send port service via WMI instructions.

    What do you think about it ? Is there any other way ?

    Thanks & Regards,
    Christophe

  2. I once had the same problem as you faced. But i solved it with an envelope (a set of promoted context properties) that states where the message came from, where it should go etc. Then when i receive a message i run it through a distributor pipeline. This pipeline knows how many subscribers there are and creates copies of the message (in the disassembler stage), each with different envelope values.

    In my architecture subscriptions are all based on this custom envelope. So once it is past the receive pipeline i will get 2 or 6 messages each identical but with different context properties.

    Now if a send fails i have got my own fault portal that knows how to persist the special envelope and the message. So when i turn on Failed Message routing the message is sent to the fault portal.

    Then i can edit/resubmit the message. A designated receive pipeline will pick up the message from the FaultDB and will restore the envelope (restore context and promote what was promoted) and dump the message in the message box.

    Then the original orchestration or send port will pick it up again and try again.

    Works very well……

  3. Thanks Patrick for your solution.

    I didn’t understand how Patrick deals with the “Stop sending subsequent messages on current message failure” option: how are the suspended messages processed after the first suspended message (the only one sent to the ESB Portal) is resubmitted?

    I really need not to break ordered delivery.
    Patrick’s solution needs custom code (pipeline), I do want explore all BizTalk options.

    My proposed solution is not so good: each port needs a copy (another port) that will process the next message to be treated (before all suspended messages). What is your opinion? Have you got a better idea?

    Thanks.

    1. Well i would first store the message in a DB, then i would have a sproc pick up the next message for delivery. The delivery of the message would start. If delivery fails the message would go to the fault portal, where you can edit & resubmit the message. If delivery was successful i would write some status field in the db, causing the sproc in the db to output the next message.

      I know there is some more DB stuff involved with this approach but it doesn’t require long running orchestrations. You avoid convoys (and the persistence problems they can cause). And it’s easy to restart an already processed batch. So there are some advantages as well.

      I also know that using orchestrations puts a load on the db-tier of BizTalk, but in my opinion the db-tier should be able to accommodate the extra load caused by orchestrations. (we are talking about a serious DB-Tier i hope)

  4. Hello,

    Is resubmitting a message possible only with the HTTP adapter? Can we resubmit messages with other adapters like File or SAP?

    Thanks for help.
    Regards,
    Tarik
