Overview
Achieving a resilient topology with fast recovery from link failures is a piece of cake with Cisco Catalyst SD-WAN. However, by default, it doesn’t protect against soft failures like packet loss, high latency or jitter. As business demand for optimised application performance continues to grow, we can use Application Aware Routing (AAR) to help us out.
AAR is a feature which can be deployed at multi-homed sites that tracks the performance of each WAN link and forwards your traffic of interest (typically your business critical apps) out of whichever WAN link meets a predefined Service Level Agreement (SLA).
Key components
- Bidirectional Forwarding Detection (BFD) – BFD is a lightweight protocol which enables the collection of performance data on a WAN link (TLOC).
- Service-Level Agreements (SLAs) – A configurable object used in an AAR policy defining acceptable packet loss, latency and jitter.
- AAR policy – Where traffic of interest is matched and assigned an SLA.
Benefits of AAR
- Enhanced application performance – Make sure that your most critical business applications always use the most optimal path.
- Protect against soft failures – Shift traffic away from soft failures such as unexpected packet loss on a link.
- Efficient use of available bandwidth – Use high performance links for business relevant traffic and offload your guest traffic to a lower grade circuit.
High level overview
Below is a simple topology with a router on each site with 2 WAN links (TLOCs) each. By default the TLOCs will mesh using IPsec creating 4 possible tunnels for site to site traffic. BFD runs inside each tunnel probing its performance every second by default.
Using the Manager, an AAR policy is configured that directs traffic of interest over both TLOCs. However, if a TLOC breaches SLA (for example due to high packet loss), subsequent packets will be shifted to a TLOC that still meets SLA.

Note: This article discusses the traditional Centralised Policy approach to configuring AAR. The newer approach through Policy Groups uses the same concepts and can be easily understood once you understand this method.
AAR configuration process
- Identify traffic of interest through objects.
- Create SLA classes defining acceptable packet loss, latency and jitter.
- Create an AAR policy where traffic of interest is tied to an SLA.
- Attach the AAR policy to a centralised policy.
BFD
It’s vital to understand how BFD works before we proceed as it’s the backbone of AAR. BFD is a very lightweight protocol which runs across every tunnel within the SD-WAN fabric. By default it sends probes to the remote router every 1 second and uses the reply to gauge packet loss, latency and jitter. The BFD packet is minimally processed compared to a normal packet at the remote end ensuring CPU and memory bottlenecks don’t skew the readings.
Below shows a router with 2 TLOCs meshing to 5 other routers. Each TLOC sends BFD probes across all of its IPSec tunnels.

All measurements are used to calculate the average packet loss, latency and jitter. The default calculation process is as follows:
- BFD probes are sent every 1 second. This is called the BFD hello interval.
- AAR takes 10 minutes worth of BFD probes (600) and places them into a ‘bucket’. This is called the Polling interval.
- AAR takes 6 buckets (indexed 0 to 5) and calculates the average packet loss, latency and jitter. The number of buckets taken into account is called the Multiplier.
- When a new bucket has filled up with 600 probes, it’s inserted into index 0 and pushes the rest along, which removes the oldest measurements from index 5. This is called a sliding window.
By default the router takes 60 minutes worth of BFD measurements into consideration when calculating the values. This means it can take 10 – 60 minutes, depending on the severity of a soft fault, before AAR takes any action. As per Cisco’s recommendations, you can lower the polling interval to 2 minutes and the multiplier to 5, at which point that range shrinks to 2 – 10 minutes, but you risk false positives. There is also an option to enable ‘Enhanced Application Aware Routing (EAAR)’ which will be explained at the end.
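The bucket mechanism described above can be sketched in a few lines of Python. This is only a simplified model for illustration, not how the router implements it: the `SlidingWindow` class and its method names are made up, only latency is tracked, and each bucket is assumed to be summarised by its plain mean.

```python
from collections import deque
from statistics import mean

HELLO_INTERVAL_S = 1    # default BFD hello interval
POLL_INTERVAL_S = 600   # default polling interval (10 minutes)
MULTIPLIER = 6          # default number of buckets in the window


class SlidingWindow:
    """Hypothetical model of the AAR bucket window for one tunnel."""

    def __init__(self, multiplier=MULTIPLIER):
        # Index 0 holds the newest bucket; once the deque is full,
        # adding a bucket on the left drops the oldest one on the right.
        self.buckets = deque(maxlen=multiplier)

    def add_bucket(self, latencies_ms):
        """Summarise one polling interval of probe latencies as a bucket."""
        self.buckets.appendleft(mean(latencies_ms))

    def mean_latency(self):
        """Average across every bucket currently in the window."""
        return mean(self.buckets)


probes_per_bucket = POLL_INTERVAL_S // HELLO_INTERVAL_S  # 600 probes

window = SlidingWindow()
for _ in range(MULTIPLIER):
    window.add_bucket([20.0] * probes_per_bucket)  # healthy link at 20 ms
print(window.mean_latency())                        # 20.0

window.add_bucket([200.0] * probes_per_bucket)     # one bad polling interval
print(window.mean_latency())                        # (200 + 5 * 20) / 6 = 50.0
```

A single bad bucket only moves the mean from 20 ms to 50 ms, which is why a moderate fault can take several polling intervals to push a tunnel out of SLA.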
You can check the AAR measurements on your TLOCs as follows:
show sdwan app-route stats summary

Use this command to see your active BFD sessions:
show sdwan bfd sessions

SLA Classes
SLA Classes are objects configured in the Manager that specify the maximum acceptable packet loss, latency and jitter. Inside the AAR policy, matched traffic is assigned to one of these objects.
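Conceptually, an SLA class is a set of three ceilings that a tunnel's averaged measurements are compared against. The sketch below illustrates that comparison. The class names, field names and sample values are hypothetical, and it assumes a tunnel is in SLA while each value is at or below its ceiling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaClass:
    """Maximum acceptable values (illustrative field names)."""
    name: str
    max_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float


@dataclass(frozen=True)
class TunnelStats:
    """Averaged measurements for one tunnel (illustrative field names)."""
    color: str
    loss_pct: float
    latency_ms: float
    jitter_ms: float


def meets_sla(stats, sla):
    """Assumption: a tunnel is in SLA only if all three values are within limits."""
    return (
        stats.loss_pct <= sla.max_loss_pct
        and stats.latency_ms <= sla.max_latency_ms
        and stats.jitter_ms <= sla.max_jitter_ms
    )


# Hypothetical SLA class and tunnel measurements.
sla = SlaClass("SLA-100ms", max_loss_pct=1.0, max_latency_ms=100.0, max_jitter_ms=10.0)
healthy = TunnelStats("biz-internet", loss_pct=0.0, latency_ms=2.0, jitter_ms=1.0)
degraded = TunnelStats("public-internet", loss_pct=0.0, latency_ms=200.0, jitter_ms=1.0)

print(meets_sla(healthy, sla))   # True
print(meets_sla(degraded, sla))  # False
```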

SLA classes have 2 more options called the App Probe Class and the variance. We’ll discuss these further below.
Topology
We’ll be working with the following topology:

We have two sites with a dual homed router on each. One TLOC connects to biz-internet and another to public-internet colors. As we’re keeping everything default, each TLOC on a site will fully mesh with every available TLOC on another site resulting in 4 IPSec tunnels in total which also run BFD inside.
All 4 paths are available for traffic forwarding between sites which can be seen by using the ‘service-path’ command on one of the routers. Make sure to use the ‘all’ keyword at the end or you’ll only see one path in the output.
show sdwan policy service-path vpn 20 interface GigabitEthernet4.20 source-ip 10.1.20.10 dest-ip 10.2.20.10 protocol 1 all

You can also see this on the Manager if you go to Monitor > Devices > site1-edge1 > Troubleshooting > Simulate Flows

This is a virtual LAB inside Cisco Modelling Labs meaning the links perform pretty much identically. I’m going to add 200ms latency to one of the links to simulate some kind of fault. If you have Cisco CML you can do this by clicking on the link and applying latency.

Instantly I can see the effects on my ping from site 1 to site 2. Let’s create a policy which tells my router not to use a link with over 100ms latency.

Configuring the policy
As we’re starting from scratch we first need to create a centralised policy.
1. In the Manager GUI go to Configuration > Policies > Centralised Policy and click ‘Add Policy’. In the ‘Groups of interest’ section I defined an SLA Class, site and VPN objects.



2. Click ‘Next’, ‘Next’ and the ‘Application Aware Routing’ tab. Click ‘Add Policy’ and ‘Create New’. Name it and give it a description.
3. Click the ‘+ Sequence Type’ button and you should see a new ‘App Route’ section appear. Click ‘+ Sequence Rule’.

4. The ‘Match’ section is where you match on your traffic of interest. I’m testing with ICMP so I’ll select ‘Protocol’ and write 1 in the text field (protocol 1 is ICMP).
5. Click ‘Actions’ and ‘SLA Class List’. Select the SLA Class you created earlier. Mine will breach above 100ms and one of my links is running at 200ms, so I expect AAR not to use that link.

We’ve left all other options as default meaning we’re telling the router to use any TLOC which meets the SLA. You can be more specific if you wish. Here is a brief summary of what those options mean:
- Preferred Color – If you specify a TLOC color here, even if multiple links meet SLA, only that color will be used until SLA is breached.
- Preferred Color Group – A TLOC color group object allowing you to select primary, secondary and tertiary colors for forwarding.
- Remote Preferred Color – Forces the local router to forward traffic over to the specified remote color if in SLA.
- When SLA not met – What happens when a TLOC doesn’t meet SLA:
  - Strict/Drop – Drop all traffic.
  - Fallback to best path – Falls back to the next best available TLOC.
  - Load balance – Load balance traffic across all available TLOCs regardless of their performance metrics.
Some other options in the ‘Actions’ section available to us are:
- Backup SLA Preferred Color – If a preferred color breaches SLA, traffic will be sent over this TLOC. If this TLOC is down, the next best one is used.
- Counter – Creates a simple counter to see if the policy rule is getting hits. Use ‘show sdwan policy app-route-policy-filter’ to see via CLI.
- Log – Logs to syslog when policy rule is hit.
- Cloud SLA – Used with Cisco SD-WAN Cloud OnRamp and not covered here.
6. Click ‘Save Match and Actions’, ‘Save Application Aware Routing Policy’, ‘Next’, name the policy and click the ‘Application-Aware Routing’ tab. Specify which site and VPN this policy should apply to, click ‘Add’ and ‘Save Policy’.

7. Activate the centralised policy by clicking ‘…’ and ‘Activate’.

You can now preview your policy in CLI before pushing it.

Click ‘Activate’ when happy. You should get a ‘Success’ status on the task screen.
Outcome
As soon as the policy activated, ICMP shifted from the 200ms circuit to the next best available one.

We can now see that the service-path command only lists 2 paths instead of 4.

After returning latency to normal I could still see that service-path was only showing 2 forwarding paths. That’s because, as you may remember, AAR by default takes 60 minutes worth of data into account when calculating averages. After 10 minutes, I was able to see the first ‘bucket’ (index 0) update with the lower latency, but the mean-latency value for the public-internet TLOC was still 334, which is above the 100ms in the SLA. I will not see the router use all 4 paths again until the mean-latency drops below that value.
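To get a feel for how long that recovery takes, here is a rough sketch of the arithmetic. It assumes the windowed value is a plain mean of the bucket averages, and the latency figures are made-up round numbers rather than the readings from my lab. The function name is hypothetical.

```python
def buckets_until_in_sla(faulty_ms, healthy_ms, sla_ms, multiplier=6):
    """Count the polling intervals needed for the windowed mean to return to SLA.

    Starts with a window full of faulty buckets and pushes in one healthy
    bucket per polling interval. Returns None if the link never recovers.
    """
    window = [faulty_ms] * multiplier
    for elapsed in range(1, multiplier + 1):
        window = [healthy_ms] + window[:-1]  # newest in at index 0, oldest out
        if sum(window) / multiplier <= sla_ms:
            return elapsed
    return None


# Default timers: 10 minute polling interval, multiplier of 6.
intervals = buckets_until_in_sla(faulty_ms=200, healthy_ms=2, sla_ms=100)
print(intervals, "buckets =", intervals * 10, "minutes")  # 4 buckets = 40 minutes

# Faster timers: 2 minute polling interval, multiplier of 5.
intervals = buckets_until_in_sla(faulty_ms=200, healthy_ms=2, sla_ms=100, multiplier=5)
print(intervals, "buckets =", intervals * 2, "minutes")   # 3 buckets = 6 minutes
```

With these illustrative numbers the link needs four healthy buckets before the mean drops under 100ms, so the wait depends as much on how bad the fault was as on the timers themselves.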

Verification
To verify your router has received the AAR policy use the following command:
show sdwan policy from-vsmart app-route-policy
View the SLA classes existing on the router.
show sdwan policy from-vsmart sla-class
show sdwan app-route sla-class
Check the TLOC AAR statistics
show sdwan app-route stats summary
Show AAR policy counters
show sdwan policy app-route-policy-filter
AAR tunnel selection flowchart
Cisco documentation has a useful flowchart to help visualise the tunnel selection process.

How AAR interacts with other policies
For this we simply need to understand the order of operations which can be found in Cisco documentation.

You can see that for inbound traffic from the service VPN (LAN), traffic is first evaluated against any localised policies (ACLs) at the interface level. Only if traffic wasn’t dropped will the AAR policy be evaluated. Any step after number 2 can override the AAR actions.
Note that AAR only works for inbound traffic from service VPN to WAN. It doesn’t have any effect on traffic in the other direction.
App Probe Class
BFD is always sent with DSCP 48 meaning if QoS is configured anywhere along the path of the packet it will likely receive preferential treatment. That’s great if the router is trying to judge the overall health of the link but not so great when it needs to know how a specific application will be treated.
In an attempt to solve that issue, an App Probe Class can be configured inside the SLA Class which tells the router what DSCP value and forwarding class to use for BFD packets.

SLA Class fallback and variance
If you remember in the ‘Configuring the policy’ section we had an option for ‘Fallback to best path’ under the ‘SLA Class List’ action. I explained that if you decided to specify a preferred color and it breaches the SLA, the router will use the next best TLOC to forward your traffic.
It does that by looking at the packet loss, latency and jitter of other eligible links and picks the best performing one. An issue can arise when the best link changes frequently causing your data to flap between different paths.
This is where variance comes in. Under the SLA Class you have an option to tick ‘Fallback Best Tunnel’ which unlocks the ‘Criteria’ and ‘Variance’ fields. Below you can see I’ve enabled variance for packet loss to be 2%. What this means is that if my preferred color breaches SLA, the router will pick the next best TLOC as well as any other TLOC that has up to 2% more packet loss than the chosen one.

For example, let’s say I have 3 links as listed below, with biz-internet being my preferred color and evaluated against the SLA Class above:
- biz-internet: 3% loss, 50ms latency, 2ms jitter.
- public-internet: 7% loss, 100ms latency, 10ms jitter.
- blue: 10% loss, 150ms latency, 20ms jitter.
Under normal circumstances, biz-internet is chosen to forward traffic. Suddenly, it begins experiencing 6% packet loss, causing it to breach SLA. The next best performing link is chosen, which is public-internet. As per the loss variance setting, any TLOC with up to 2% more packet loss (9% total) than the current best link (public-internet) is also going to be used for forwarding. The third link has 10% packet loss, which is 1% more than the variance allows, so it is not used.
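The selection logic in this example can be sketched as follows. It mirrors the walkthrough above rather than the router's actual implementation: the function name is hypothetical, only loss is considered, and the breaching preferred color is assumed to be excluded from the candidates already.

```python
def select_fallback(candidates, loss_variance_pct):
    """Return the colors eligible for forwarding, best first.

    candidates maps a TLOC color to its average packet loss in percent.
    The best link is the one with the lowest loss; any other link within
    loss_variance_pct of it is treated as equally good.
    """
    best_loss = min(candidates.values())
    eligible = [
        color for color, loss in candidates.items()
        if loss <= best_loss + loss_variance_pct
    ]
    return sorted(eligible, key=candidates.get)


# biz-internet has breached SLA, so only the other two links are candidates.
print(select_fallback({"public-internet": 7, "blue": 10}, loss_variance_pct=2))
# ['public-internet']

# If blue only had 9% loss it would fall inside the 2% variance.
print(select_fallback({"public-internet": 7, "blue": 9}, loss_variance_pct=2))
# ['public-internet', 'blue']
```

Because links inside the variance band are treated as equivalent, small swings in loss between them no longer cause the traffic to flap from one path to another.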
You can configure variance in the following combinations:

Enhanced Application Aware Routing (EAAR)
Earlier on I explained the time it can take for AAR to take any action is 10 – 60 minutes by default and 2 – 10 minutes with the fastest settings as recommended by Cisco. You can verify the polling interval, multiplier and whether EAAR is enabled with the following:
show sdwan app-route params

The poll interval is displayed in ms and translates to 10 minutes, so the above are the default timers with EAAR disabled.
I won’t go too much into it here but Cisco SD-WAN routers also monitor Quality of Experience (QoE) for flows going through them which means they also measure packet loss, latency and jitter passively by utilising a ‘metadata’ header. EAAR improves the whole process by taking that data into account instead of BFD when calculating TLOC performance allowing for faster reaction times due to there being more data to calculate with.
A dampening mechanism is also introduced, which defines how long a link stays out of use after it starts meeting SLA again. This deals with the traffic path flapping issue.
EAAR can run in one of three modes:
Mode | EAAR Poll Interval | EAAR Poll Multiplier | EAAR Poll Window | SLA Dampening Multiplier | SLA Dampening Window |
---|---|---|---|---|---|
Aggressive | 10s | 6 | 10s – 60s | 120 | 20 mins |
Moderate | 60s | 5 | 60s – 300s | 40 | 40 mins |
Conservative | 300s | 6 | 300s – 1800s | 12 | 60 mins |
If the ‘Moderate’ mode is chosen, the router takes between 60s and 300s to react to SLA changes on a link. The EAAR Poll Interval is multiplied by the SLA Dampening Multiplier to get the SLA Dampening Window.
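The table values can be cross-checked with a few lines of Python. The dictionary and function names below are just for illustration; the numbers are copied from the table above.

```python
# mode: (poll interval in seconds, poll multiplier, SLA dampening multiplier)
EAAR_MODES = {
    "aggressive": (10, 6, 120),
    "moderate": (60, 5, 40),
    "conservative": (300, 6, 12),
}


def poll_window_s(mode):
    """Shortest and longest reaction time in seconds for a mode."""
    interval, multiplier, _ = EAAR_MODES[mode]
    return interval, interval * multiplier


def dampening_window_min(mode):
    """Poll interval multiplied by the dampening multiplier, in minutes."""
    interval, _, dampening_multiplier = EAAR_MODES[mode]
    return interval * dampening_multiplier // 60


for mode in EAAR_MODES:
    print(mode, poll_window_s(mode), dampening_window_min(mode))
# aggressive (10, 60) 20
# moderate (60, 300) 40
# conservative (300, 1800) 60
```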
Routers at both ends of a tunnel need to be configured to use EAAR. If one isn’t, or doesn’t support it, they fall back to AAR.
To enable EAAR in moderate mode using configuration groups go to your system profile and edit the ‘Basic’ sub-feature.

For feature templates you can find the option in the ‘Cisco System’ basic feature template.

For a CLI template add the following:
bfd enhanced-app-route enable
bfd enhanced-app-route pfr-poll-interval 60
bfd enhanced-app-route pfr-multiplier 5
bfd sla-dampening enable
bfd sla-dampening multiplier 40
We can now see EAAR is enabled.

Conclusions
AAR is something every network admin/engineer should investigate if they are already running SD-WAN. For those who aren’t, maybe it’s a great opportunity to pitch the solution to your CFO.