Incident management is critical not just to IT Service Management, but to business as a whole. Incidents cause delays, impact customer experience, directly affect business productivity and can even damage the financial and reputational standing of a business. Even the pettiest of paper jams can hold up a project.
With an increase in IT security breaches and downtime making global headlines, the role of incident management and the need for proactive processes that reduce Mean Time to Repair (MTTR) are growing in importance.
The media attention given to high-profile incidents such as the Gmail breach, WannaCry’s impact on the NHS and, most recently, British Airways’ global IT failure has encouraged many senior managers to prioritise incident response and prevention, right up to C-level.
Although slightly frustrating for IT teams who have been emphasising the importance of investing in major incident management and cyber security, this is a promising trend in priorities. IT departments are already juggling implementing new innovations with keeping the lights on, so an increased focus on resources and attention for security will be well received.
Incidents can have catastrophic effects for multiple areas of an organisation. With financial and reputational risk to suppliers, stakeholders, customers and colleagues alike, an effective and proactive response to reduce business downtime is crucial.
During Critical IT Events, action is key. Downtime, MTTR, response time, recovery speed: time is a key component of improved incident response and service quality, from both an internal and a customer perspective.
One of the biggest problems with incident management is the threat of delays to this recovery: slow responses from teams, data that is already outdated by the time it arrives, and difficult communication between those pinpointing the root cause and those dealing with stakeholders. The longer action is delayed, the higher the risk of business impact and loss.
Frankly, it is often better to take slightly longer to ensure that the information is correct and the right action is taken, avoiding further damage. However, several delays that occur in incident management workflows put unnecessary and significant pressure on time-sensitive processes.
It is these avoidable delays that need to be addressed in order to truly optimise incident management processes. There are three key areas that need clear and concise focus in order to limit the impact of an incident: Data, Communication and Action.
In the digital age of transformation and disruption, data is king, queen and the entire royal court. Data needs to be relevant, data needs to be accurate, data needs to be accessible in real time: a tricky balance to master.
The fast-paced world of business means data can become outdated very quickly. During critical IT incidents, such as a security breach where keeping a step ahead of an intruder is vital, this can have detrimental consequences for company and customers alike.
- Real-time intelligence – Instant access to live data is a fast-growing reality in business. The faster and more accurate the data, the more informed and effective the resulting action will be. Live data facilitates faster detection and helps identify the cause of an incident much sooner.
- Break silos – Data silos are dangerous roadblocks in business processes. Whether innocently formed or purposefully hoarded, restricting large amounts of data to specific remits, people or systems can be detrimental to swift incident response.
- Filtration – There is such a thing as too much data. If information is irrelevant or presented in an unclear, confusing or misleading manner, the benefits are quickly counteracted and the additional delays required to bring order damage the productivity of the incident response.
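As a rough illustration of the filtration point, the sketch below trims a stream of alerts down to the ones that matter for the incident at hand. The `Alert` fields, the severity scale and the `filter_alerts` helper are all assumptions made for this example, not part of any particular monitoring or ITSM product.

```python
from dataclasses import dataclass

# Hypothetical severity scale for this sketch; real tools define their own.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


@dataclass(frozen=True)  # frozen makes alerts hashable, so repeats can be spotted
class Alert:
    service: str
    severity: str
    message: str


def filter_alerts(alerts, affected_services, min_severity="warning"):
    """Keep alerts for affected services at or above min_severity.

    Exact repeats are dropped and arrival order is preserved.
    """
    threshold = SEVERITY_RANK[min_severity]
    seen = set()
    relevant = []
    for alert in alerts:
        if alert.service not in affected_services:
            continue  # unrelated to this incident
        if SEVERITY_RANK[alert.severity] < threshold:
            continue  # too minor to show responders
        if alert in seen:
            continue  # identical alert already listed
        seen.add(alert)
        relevant.append(alert)
    return relevant
```

Responders then see one line per distinct problem on the affected services, rather than every heartbeat and repeat notification.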
Ensuring those involved with incident response have access to the information they need, in a practical and efficient form, will dramatically reduce delays in Root Cause Analysis and MTTR. Similarly, keeping stakeholders informed with relevant information will help ease the tensions and reputational risk an incident can cause.
Critical IT Events need a fast response. It is important that the communication channels you use are equally effective at sharing information and at enabling a response whose speed matches the severity of the incident. Time is money, and waiting for someone to find an email, return a phone call or schedule an emergency meeting creates roadblocks.
Whilst there are numerous ways to contact other members of your team, the software we use at work tends to rely heavily on email, so your communication methods must be more diverse. This is especially important if email servers are impacted by the cyber incident.
- Modern methods – Email is not a reliable enough system to base an entire incident management response process on. With multiple modern integrations and various optimised, process-driven communication platforms readily available, it is foolish to enforce a rigid and inefficient method purely because you’ve used it for nearly half a century; that’s an alarm bell to update.
- Integrated – It is important that we have open channels of communication not only with stakeholders and colleagues, but also with the affected or involved software systems. Service Management systems such as BMC or ServiceNow provide a great deal of useful insight regarding Major Incident Management, but how do we connect that information to the humans working to resolve? Having a communication method that integrates with your software as well as your colleagues is vital to an effective response.
- Intelligent – Incidents are becoming more complex, so the way we communicate about a major incident should be just as sophisticated. As most security attacks are timed to fall outside traditional business hours, IT teams need an effective method of communication that is smart enough to sense the urgency and escalate alerts when necessary. The more intelligent the communication channel, the more time the team can spend on resolving and not chasing up.
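To make the idea of escalating alerts a little more concrete, here is a minimal sketch of a time-based escalation policy. The severities, contacts, channels and timings in `POLICY` are invented for illustration; a real policy would come from your own on-call arrangements and tooling.

```python
from dataclasses import dataclass


@dataclass
class EscalationStep:
    after_minutes: int  # minutes without acknowledgement before this step applies
    contact: str
    channel: str


# Hypothetical policy: every name and timing here is an assumption.
POLICY = {
    "critical": [
        EscalationStep(0, "on-call engineer", "push notification"),
        EscalationStep(5, "on-call engineer", "phone call"),
        EscalationStep(15, "incident manager", "phone call"),
    ],
    "minor": [
        EscalationStep(0, "on-call engineer", "chat message"),
        EscalationStep(60, "team lead", "chat message"),
    ],
}


def steps_due(severity, minutes_unacknowledged, acknowledged=False):
    """Return the escalation steps that should have fired by now.

    Once someone has acknowledged the alert, nothing further is due.
    """
    if acknowledged:
        return []
    return [
        step
        for step in POLICY[severity]
        if step.after_minutes <= minutes_unacknowledged
    ]
```

For example, `steps_due("critical", 6)` returns the first two steps: the push notification went unanswered, so the on-call engineer is now being phoned. The same alert at minor severity would still be waiting quietly in chat.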
Actions speak louder than words. Talk for the sake of talk is a global efficiency killer. Whilst it’s critical to think before jumping into action, communication needs to remain task-focused and drive action. During a critical incident, messages to colleagues and customers need to remain purposeful and productive to drive the resolution and reduce business downtime.
Although most incidents differ in their resolution, there are many stages in the incident management process that contain key patterns. Often, the most tedious causes of delay are the manual “in-between” tasks, such as repeating data entry, logging into systems or checking on-call schedules. Introducing automation at even the most basic level can help alleviate these distractions and optimise the entire process, allowing teams to focus on a fix.
- Detect and Assign – When an incident occurs, knowing who is available and getting them started on the fix as soon as possible is crucial. Cyber-attacks often occur at times when fewer people will be on call and, in today’s modern business where 72% of workers will be working remotely by 2020, finding who is around can be tricky. Automating this to synchronise with your on-call schedule will ensure a faster initial response.
- Proactive triggers – Why wait until an incident occurs? Develop preventative workflows that track shifts in your metrics, alerting you to abnormal activity and threats before they develop. Highlight when memory is beginning to reach its maximum, follow diminishing connectivity, get alerted to expirations before they happen. By automating these warning procedures, your team can take the necessary action that stops a threat from becoming an incident. This is risk assessment in practice.
- Remove the manual – Whether hunting down similar historical incidents, running standard diagnostics or even sharing regular updates, the time taken to carry out each of these tasks manually could be better used elsewhere. By automating these expected tasks into smart workflows, you can speed up the resolution.
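The sketch below shows, under some simplifying assumptions, how proactive triggers and on-call assignment might fit together: two warning checks (memory and an upcoming expiry) and a lookup against an on-call rota. The function names, the 85% and 14-day thresholds and the rota format are all hypothetical choices for this example.

```python
from datetime import datetime, timedelta


def memory_warning(used_percent, warn_at=85.0):
    """Return a warning once memory use reaches the threshold, else None."""
    if used_percent >= warn_at:
        return f"memory at {used_percent:.0f}% (threshold {warn_at:.0f}%)"
    return None


def expiry_warning(name, expires_on, now, warn_days=14):
    """Return a warning if something expires within warn_days, else None."""
    remaining = expires_on - now
    if remaining < timedelta(0):
        return f"{name} has expired"
    if remaining <= timedelta(days=warn_days):
        return f"{name} expires in {remaining.days} day(s)"
    return None


def on_call_at(schedule, now):
    """Find who is on call; schedule is a list of (start, end, person)."""
    for start, end, person in schedule:
        if start <= now < end:
            return person
    return None  # gap in the rota: nobody is covering this time


def route_warnings(warnings, schedule, now):
    """Pair each real warning with whoever is on call right now."""
    assignee = on_call_at(schedule, now)
    return [(assignee, w) for w in warnings if w is not None]
```

A scheduled job could run checks like these every few minutes and hand the result of `route_warnings` to whichever messaging channel the team uses, so that the person on shift hears about a filling disk or an expiring certificate before it becomes an incident.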
Automation is a key trend for the future of service management. 9 in 10 believe automating work processes would make them more productive, so much so that 62% look for automation as a critical element of a new ITSM tool.
IT is becoming more heavily ingrained into business as a whole as IT service management continues to transition toward enterprise service management. The rise of automation and transformative technology across departments has increased the risk of IT-based incidents.
By 2020, 1 in 4 cyber-attacks will involve IoT, yet IoT accounts for just 10% of security budgets. Failure to evolve security processes and remove needless inefficiencies keeps businesses at risk and ultimately less competitive. Even a few basic changes to the admin and communication aspects of incident management will help keep enterprises proactive and protected, and maintain a high level of service and customer experience as we keep fighting the fires of IT incidents.
Intelligent communication for Incident Management
Heed’s smart messaging integration connects with leading ITSM software including ServiceNow, BMC and Cherwell to ensure the right people get the information they need to act. Automate major incident management processes with smart messaging workflows that work with you to reduce MTTR and increase productivity across the enterprise.
Find out more here: Heed