Retrace platform update to alerts and notifications

Retrace Platform Major Update: Alerts and Notifications

Jason Taylor Stackify Product & Company Updates

UPDATE: The previously scheduled maintenance has been rescheduled for February 14, 2018, 10:00 am CST – 2:00 pm CST to account for additional changes and scheduling conflicts with other maintenance and releases.

Stackify will be releasing a major update to a subsystem of the Retrace platform that affects how alerts are processed through the system and how users are notified of those alerts.

The aim of this release is partly to improve system performance and resolve some outstanding defects. Most importantly, though, we have incorporated a tremendous amount of valuable feedback from our users on how to make this part of the Retrace platform a better overall experience.

Release Schedule

February 14, 2018 (rescheduled from February 7), 10:00 am CST – 2:00 pm CST. Please subscribe at http://stackify.statuspage.io/ to receive real-time updates to this schedule.

Release Impact

Any open alerts at the time of this release will automatically be closed and will re-open according to their configured rules (e.g. ‘Warning’ when CPU > 70% after 10 minutes) beginning immediately after the update. If the open alert was subject to a “Snooze,” “Acknowledge,” or “Maintenance Window,” no notifications will be re-sent. Otherwise, if the monitor is still in an alert state, notifications will be re-sent as configured. This “duplicate” notification will be a one-time event.
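To make the re-notification rule concrete, here is a minimal sketch of the decision described above. It is an illustration only, not Retrace's implementation: the `OpenAlert` class, its field names, and `will_renotify` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OpenAlert:
    """Hypothetical model of an alert that was open when the update started."""
    monitor: str
    snoozed: bool = False
    acknowledged: bool = False
    in_maintenance_window: bool = False
    # Assumption: True if the monitor still breaches its configured rule
    # once monitoring resumes after the update.
    still_alerting: bool = True


def will_renotify(alert: OpenAlert) -> bool:
    """Return True if the one-time duplicate notification is expected."""
    suppressed = (
        alert.snoozed or alert.acknowledged or alert.in_maintenance_window
    )
    return alert.still_alerting and not suppressed
```

For example, `will_renotify(OpenAlert("CPU", snoozed=True))` is `False`, while an alert with none of the three states set that is still alerting yields `True`.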

During this time period, for critical monitoring, we recommend that users log in to the Retrace portal and watch for alerted monitors, since notifications may be delayed or suspended during this maintenance window. This applies to all forms of notification, including email, SMS, and Slack integration.

What’s Changed?

Alert Details

The new Alert detail view better illustrates:

  • The transitions that a particular monitor passed through while it was alerting
  • The scope of the alert
  • What applications and devices were impacted
  • When notifications were sent

Retrace new alerts details

Alert Detail Features

  1. Current Status across all associated applications and devices
  2. Timeline view of alert status
    1. Snooze/Maintenance window visualization
    2. Notification indicator
    3. Acknowledgement indicator

The timeline visualization is a composite view of the alert status across all associated applications and devices. The highest level of severity is displayed in the graph. The dark vertical lines indicate a status transition on one or more associated applications or devices.
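The "highest level of severity" rule can be sketched in a few lines. This is an assumption-laden illustration rather than the portal's actual code: the severity names other than ‘Warning’, their ordering, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed severity names and ordering; only 'Warning' appears in this post.
SEVERITY_ORDER = {"OK": 0, "Warning": 1, "Critical": 2}


def composite_status(status_by_resource: Dict[str, str]) -> str:
    """Return the most severe status across all associated resources."""
    return max(
        status_by_resource.values(),
        key=SEVERITY_ORDER.__getitem__,
        default="OK",  # no associated resources: treat as healthy
    )


def composite_timeline(
    samples: List[Tuple[int, Dict[str, str]]]
) -> List[Tuple[int, str]]:
    """Collapse per-resource statuses into one status per point in time."""
    return [(minute, composite_status(statuses)) for minute, statuses in samples]
```

With one device at ‘Warning’ and another at ‘Critical’, the composite status for that point on the timeline would be ‘Critical’.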

Associations

If the monitor being displayed is associated with multiple items, the associations section will be present at the bottom of the dialog. Expanding the section displays everywhere the monitor is being used and allows the user to navigate to that association.

This example illustrates that applications can set independent thresholds for the same monitor.

Historical alert data has been archived and will continue to be accessible from a secondary view via a link on the alerts history page.

Notification Formats

Email and SMS notifications have been revised to provide clearer content and more actionable details about what is alerting and what the impacts are.

Email details now include:

  • Name of affected resource
  • Monitor type
  • Link to alert detail in the Stackify portal
  • Sparkline showing recent history of monitor
  • List of all associated resources

Notification Groups

Retrace notifications report: QA Alerts

Notification groups now support:

  1. Contact-specific message delivery preferences for each notification type
  2. Slack integration at a more granular level
  3. Slack integration @channel messages

Slack Integration

Slack webhook integration can now be included in (or removed from) individual notification groups, rather than set at a “global” level. For customers who currently have Slack integration enabled, Slack will be added to all Notification Groups at the time of this update to preserve existing behavior. For customers enabling Slack integration in the future, Slack is not a default member of their Notification Groups and should be added to individual groups to begin receiving Slack notifications.
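For readers curious what an @channel message looks like at the webhook level, the sketch below builds the JSON body that a Slack incoming webhook accepts. Slack's `text` field and the `<!channel>` mention markup are standard Slack features; the function name and the message wording are hypothetical, and this is not a description of how Retrace formats its own Slack messages.

```python
import json


def build_slack_payload(message: str, notify_channel: bool = False) -> str:
    """Build the JSON body for a Slack incoming webhook.

    `<!channel>` is Slack's markup for an @channel mention.
    The message text itself is a made-up example.
    """
    text = f"<!channel> {message}" if notify_channel else message
    return json.dumps({"text": text})


payload = build_slack_payload("Warning: CPU > 70% for 10 minutes", notify_channel=True)
```

The resulting string would be sent as the body of an HTTP POST to the webhook URL configured for the notification group.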

Configuring Alerts

Alert configuration rules have two parts: a value threshold and a duration threshold.

  • The value threshold defines the point above or below which a reading is considered abnormal
    (e.g., X > Y, where Y is the value threshold).
  • The duration threshold determines how long the value threshold must be exceeded before an alert is raised.

Currently, Retrace allows two methods to set the duration threshold:

  1. As a unit of time (e.g. 5 minutes continuously)
  2. As a number of repetitions for a result (e.g. 5 times consecutively).
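As a rough illustration of a time-based duration threshold, the sketch below measures how long the most recent run of readings has stayed above the value threshold. The sample format, the function names, and the exact way elapsed time is measured are assumptions made for this example, not Retrace's evaluation logic.

```python
from typing import List, Tuple

# (minutes since monitoring started, measured value); a hypothetical format
Sample = Tuple[float, float]


def breach_minutes(samples: List[Sample], value_threshold: float) -> float:
    """Minutes the value has continuously exceeded the threshold,
    measured from the first reading of the current run to the latest one."""
    if not samples or samples[-1][1] <= value_threshold:
        return 0.0
    start = samples[-1][0]
    # Walk backwards until a reading falls back to or below the threshold.
    for minute, value in reversed(samples):
        if value <= value_threshold:
            break
        start = minute
    return samples[-1][0] - start


def should_alert(
    samples: List[Sample], value_threshold: float, duration_minutes: float
) -> bool:
    """True once the value threshold has been exceeded for the full duration."""
    return breach_minutes(samples, value_threshold) >= duration_minutes
```

Under these assumptions, a single reading that dips back below the threshold resets the clock, which is what "continuously" implies in the time-based method.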

Stackify has determined that the result-count method isn’t as reliable or accurate as the time-based duration threshold. Since any result-count rule could also be written as a time-based rule, Stackify will be removing the result-count method for determining duration.

Since result-count duration thresholds will no longer be supported, all existing configurations of this type will be automatically converted to time-based thresholds based on the current frequency of the monitor configuration.

For example:

“Check CPU every 2 minutes, alert as ‘Warning’ if value is over 70% for 5 consecutive checks.”

Will become:

“Check CPU every 2 minutes, alert as ‘Warning’ if value is over 70% for 10 minutes.”
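The arithmetic behind this conversion is simply the check frequency multiplied by the number of consecutive checks. The helper below is a hypothetical sketch of that calculation, not the migration code itself.

```python
def result_count_to_minutes(check_interval_minutes: int, consecutive_checks: int) -> int:
    """Convert a result-count duration threshold to an equivalent time-based one."""
    if check_interval_minutes <= 0 or consecutive_checks <= 0:
        raise ValueError("interval and check count must be positive")
    return check_interval_minutes * consecutive_checks


# The example above: checks every 2 minutes, 5 consecutive checks.
print(result_count_to_minutes(2, 5))  # prints 10
```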

Recap

Stackify has made every effort to minimize customer impact during the deployment of this release. There is no expected downtime for the Retrace portal, and all client systems will continue to be monitored. It is important to note that notifications may be delayed or suspended during this maintenance window. During the maintenance window, customers are encouraged to take a more proactive approach to system monitoring by viewing alerts in the Retrace portal.

We appreciate your business and sincerely hope that this update will be well received and will enhance your ability to produce and maintain high-quality software systems.

