Session 702: Major Incidents and Outages: What to Do at 3 AM?

SupportWorld Live Schedule 2024

HDI's SupportWorld Live provides inspirational sessions that matter to you. With high-quality speakers and more practitioner-led case studies than ever, this conference is sure to educate and inspire you.

All times noted are in MST.

Arif Gheewala (Program Manager, UCLA Health IT)

Pass Type: 2-day Training + Standard Conference Pass, 3-day Training + Standard Conference Pass, Standard Conference Pass, VIP Access Conference Pass

Track: Achieving Service Excellence, Optimizing the Support Organization

Session Type: Case Study

Vault Recording: TBD

Audience Level: All

Hospitals can be high-speed, intense environments. At UCLA Health, the IT team supports more than 100 clinical applications across the spectrum of criticality. When one of those applications goes down, what do we do? Over the past few years, the UCLA Health IT team developed a robust playbook for handling outages and unscheduled downtime. Why is this important? What have we learned? In this session, Arif Gheewala will share UCLA Health IT's approach to creating its Major Incident Response Runbook. Regardless of your industry, learn how you can accurately detect, rapidly react, effectively respond, and accurately communicate risks and results to your own stakeholders.

Takeaways

1. Creating roles and responsibilities during outages and unscheduled downtime – who does what, what each person is responsible for, etc.

2. Checklist – a simple, easy-to-digest checklist covering the process flow, what to do, and (importantly!) whom to communicate it to

3. How we use the Knowledge Base at the Service Desk to run processes (fast!) during outages and major incidents

4. How we categorize the importance of applications

The goal of the Information Technology Major Incident Response Runbook is to detect and react to operational priority incidents, determine their scope and risk, respond appropriately to the incident, communicate the results and risk to all stakeholders, and reduce the likelihood of the incident recurring.
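
The runbook goals above read naturally as an ordered checklist with an owner per step. The following is only an illustrative sketch in Python, not UCLA Health IT's actual runbook: the phase names paraphrase the goal statement, while the task wording, the owner roles, and the names `Phase`, `ChecklistItem`, `MajorIncident`, and `new_incident` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Phase(Enum):
    """Phases paraphrased from the runbook goal statement."""
    DETECT = 1       # detect and react to the incident
    ASSESS = 2       # determine scope and risk
    RESPOND = 3      # respond appropriately
    COMMUNICATE = 4  # communicate results and risk to stakeholders
    PREVENT = 5      # reduce the likelihood of recurrence


@dataclass
class ChecklistItem:
    phase: Phase
    task: str
    owner: str  # hypothetical role name, not taken from the session
    done: bool = False


@dataclass
class MajorIncident:
    application: str
    items: List[ChecklistItem] = field(default_factory=list)

    def next_item(self) -> Optional[ChecklistItem]:
        """Return the earliest-phase unfinished item, or None if all are done."""
        pending = [item for item in self.items if not item.done]
        return min(pending, key=lambda item: item.phase.value, default=None)


def new_incident(application: str) -> MajorIncident:
    """Build an incident from a hypothetical checklist template."""
    template = [
        (Phase.DETECT, "Confirm the outage and declare a major incident", "Service Desk"),
        (Phase.ASSESS, "Determine affected users, scope, and risk", "Incident Manager"),
        (Phase.RESPOND, "Engage the application team and begin recovery", "Application Owner"),
        (Phase.COMMUNICATE, "Send status updates to stakeholders", "Communications Lead"),
        (Phase.PREVENT, "Record findings and follow-up actions", "Incident Manager"),
    ]
    items = [ChecklistItem(phase, task, owner) for phase, task, owner in template]
    return MajorIncident(application=application, items=items)
```

Calling `new_incident("example-app").next_item()` returns the detection step first; marking an item `done` advances the checklist to the next phase. Keeping the owner on each item is one simple way to answer "who does what" when someone opens the checklist in the middle of the night.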