IT & Network Operations Improvement with Generative AI — Change Management Use Cases
In the last post, I provided some LLM use cases related to incident management. In this post, I will provide some LLM use cases related to change management. First, I would like to describe some change management aspects that highlight the complexities involved. Figure 1 shows a representation of a change management process. Although Figure 1 shows various phases (Determine need for change, Develop change, Test change, Review change, Deploy change) of a change, it is important to note the following about these changes:
1. Organizations that practice agile methodologies may push changes into production every week, every day, or even every hour. Many studies have found that most incidents are caused by changes made to the production environment. Can you imagine how difficult it is to review 100 changes per hour? That may sound crazy, but large production environments can see that many changes every hour. To mitigate risk, businesses have change review boards that review change tickets as they flow into production and decide on the risk level of each change. Because reviewing so many tickets is very challenging, businesses often decide to exclude certain change tickets from the review process. Not reviewing a change ticket can be a recipe for disaster.
2. Some changes may be riskier than others to the production environment.
3. Changes have varying timelines. Some changes take longer to design, implement, test, review, and deploy than others.
4. Some changes may be conflicting.
5. When changes are reviewed, they tend to be reviewed in isolation. It is quite challenging to review many changes together and predict their combined effect on the production environment, in part because testing is, for many reasons, typically not done correctly.
6. Some changes may be made silently and in secret, bypassing any review and approval process, especially in organizations with poor governance.
7. Some changes may be made with poor auditing, especially in organizations with poor governance.
8. Some changes will usually go through a review process, for example, a critical change that is triggered by an incident.
9. Some changes will only be reviewed once. For example, an administrative command is reviewed and approved to be executed when needed. Once it is approved to execute, there is no need to review and approve the command again and again.
10. Some changes will never be reviewed until a problem happens!
11. Some changes will not be tested!
12. Some changes will need to be developed, tested, reviewed, and approved only once.
13. Some changes will always go through all phases. For example, application code changes fall into this category.
14. A change may involve many different experts from many different teams.
15. Although businesses tend to use change management tools such as Remedy or ServiceNow, there are still change-related aspects that happen without visibility. For example, the change review itself is typically done on a schedule, involves a variable number of experts, and typically happens without capturing key information.
16. The change impact on production is not typically captured unless an incident happens. Then, a review of what might have changed in the environment is conducted.
17. The change management process is typically ad hoc, even when it is well defined at a high level by some businesses. For example, unforeseen changes, such as those triggered by incidents, are not known in advance. These unforeseen changes may consist of the following:
a. A tactical change to resolve the incident quickly to restore service. The tactical change is determined only after some understanding of the incident is gained.
b. A more involved change determined through root cause analysis to permanently prevent the incident from happening again.
c. A change that is automated. For example, the change may be put together to automatically provision disk space, archive logs, …
So, how does Generative AI help here? Here are some change management use cases for Generative AI to minimize friction and streamline the change management process:
Change Ticket Summary: This is extremely helpful. In an agile organization, there may be many change tickets to review on a continuous basis. Experts may gather in a “room” to review and approve these tickets. Having a properly trained LLM summarize all change tickets that are subject to review will accelerate the review process. There are many variations (sub-use cases) of this summary. For example:
1. Summarize tickets relevant to a specific application or service.
2. Summarize tickets relevant to a specific host.
3. Summarize tickets for executives.
4. Summarize tickets for a technical audience.
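The summary variations above mostly differ in which tickets are selected and who the summary is written for. The following is a minimal sketch of how such a prompt might be assembled, not a definitive implementation. The `ChangeTicket` fields, the `AUDIENCE_STYLE` wording, and the `build_summary_prompt` function are all hypothetical names chosen for illustration; the resulting string would be sent to whichever LLM the organization uses.

```python
from dataclasses import dataclass


@dataclass
class ChangeTicket:
    # Hypothetical ticket shape; a real change management tool has many more fields.
    ticket_id: str
    application: str
    host: str
    description: str


# Assumed per-audience instructions; the wording is illustrative only.
AUDIENCE_STYLE = {
    "executive": "Write at most three sentences, focused on business risk and timing.",
    "technical": "List the affected components and the exact settings being changed.",
}


def build_summary_prompt(tickets, audience, application=None, host=None):
    """Select tickets by application and/or host and build a summary prompt."""
    selected = [
        t for t in tickets
        if (application is None or t.application == application)
        and (host is None or t.host == host)
    ]
    lines = [
        f"- [{t.ticket_id}] app={t.application} host={t.host}: {t.description}"
        for t in selected
    ]
    return (
        "Summarize the following change tickets.\n"
        f"{AUDIENCE_STYLE[audience]}\n"
        + "\n".join(lines)
    )
```

Calling `build_summary_prompt(tickets, "executive", application="billing")` would cover the first and third variations at once: only tickets for one application, written for executives.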
Change Ticket Classification: The change ticket may be classified in several ways. For example:
1. The ticket priority may be classified as high, medium, or low.
2. The ticket location may be classified as network, OS, middleware, database, application, …
3. The ticket security may be classified. For example, the change may involve sensitive information.
4. The ticket ownership may be classified as development, operations, infrastructure, …
5. You could prompt a properly trained LLM to classify the ticket based on any criteria that may exist in the ticket.
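One way to make these classifications usable by other tools is to ask for a fixed set of labels and reject anything else. The sketch below assumes the LLM is asked to reply in JSON; the `ALLOWED` label sets contain only the values named in the list above (the lists there are open-ended, so a real deployment would extend them), and the `sensitive` flag and both function names are hypothetical.

```python
import json

# Label sets taken from the examples above; extend them to match your organization.
ALLOWED = {
    "priority": {"high", "medium", "low"},
    "location": {"network", "OS", "middleware", "database", "application"},
    "ownership": {"development", "operations", "infrastructure"},
}


def build_classification_prompt(ticket_text):
    """Build a prompt that asks for one label per field, as a JSON object."""
    options = "\n".join(
        f'- "{field}": one of {sorted(values)}' for field, values in ALLOWED.items()
    )
    return (
        "Classify the change ticket below. Reply with a single JSON object "
        "containing exactly these keys:\n"
        f"{options}\n"
        '- "sensitive": true or false\n\n'
        f"Ticket:\n{ticket_text}"
    )


def parse_classification(reply):
    """Parse the model's reply and reject labels outside the allowed sets."""
    data = json.loads(reply)
    for field, values in ALLOWED.items():
        if data.get(field) not in values:
            raise ValueError(f"unexpected value for {field!r}: {data.get(field)!r}")
    if not isinstance(data.get("sensitive"), bool):
        raise ValueError("'sensitive' must be true or false")
    return data
```

Validating the reply matters because a model can return a label that was never offered; failing loudly is safer than routing a ticket on an invented category.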
Change Ticket Similarity: The change management tool can hold invaluable historical change tickets that enable a properly trained LLM to find similarities between a new change ticket and past tickets. This type of analysis can be leveraged to implement the following sub-use cases:
1. Determine the end-to-end path of a change. For example, this change ticket may have to go through architecture review, development, testing, review, … or the change ticket may skip one or more phases in the change management process.
2. Determine the various teams that need to be involved in this change ticket: which architect, development team, test team, review team, infrastructure team, … will be involved?
3. Determine the risk associated with this change ticket.
4. Determine the impacted component, service, application, host, end users, …
5. Suggest a procedure, including actual commands and/or code, to make the change.
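All of these sub-use cases start from the same step: find the past tickets that most resemble the new one, then reuse what is known about them (path, teams, risk, impact, procedure). The sketch below shows only that retrieval step, and it deliberately swaps in a simple word-overlap (Jaccard) score in place of an LLM or embedding model so that it runs without any external service. The ticket dictionaries and function names are hypothetical.

```python
import re


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two texts, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def most_similar(new_text, history, top_k=3):
    """Return up to top_k (score, past_ticket) pairs, best match first."""
    scored = [(jaccard(new_text, past["text"]), past) for past in history]
    # Sort on the score only, so the ticket dictionaries are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, past) for score, past in scored[:top_k] if score > 0]
```

The matched tickets, together with their recorded teams, risk, and outcome, could then be placed in the prompt so that the LLM answers the questions above with real history in front of it.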
Change Ticket Conflict Detection: A properly trained LLM can find similarities, differences, and conflicts between changes. For example, you can simply provide a properly trained LLM with the following prompt:
Report any conflict between the following two tickets:
1. Ticket 1: change the Java heap parameter -Xmx to 4 GB for cluster member x25frd
2. Ticket 2: change the Java heap parameter -Xmx to 3 GB for cluster member x25frd
Ticket 1 may be for application 1 and Ticket 2 may be for application 2. Both applications use the same Java heap on cluster member x25frd, so the two tickets set the same -Xmx parameter to different values. Clearly, this is a conflict that a properly trained LLM can detect. You can tune this prompt to instruct the LLM to produce output that can be fed to another application for downstream processing.
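As a rough illustration of tuning the prompt for downstream processing, the sketch below wraps the prompt above in a request for a small JSON object and checks the reply before passing it on. The JSON shape (`conflict`, `reason`) and both function names are assumptions made for this example, not part of any particular product.

```python
import json


def build_conflict_prompt(ticket_1, ticket_2):
    """Ask for a conflict verdict on two tickets in a machine-readable form."""
    return (
        "Report any conflict between the following two tickets.\n"
        "Reply with a single JSON object of the form "
        '{"conflict": true or false, "reason": "<one sentence>"}.\n'
        f"1. Ticket 1: {ticket_1}\n"
        f"2. Ticket 2: {ticket_2}"
    )


def parse_conflict_reply(reply):
    """Parse the model's reply and check that it has the requested shape."""
    data = json.loads(reply)
    if not isinstance(data.get("conflict"), bool):
        raise ValueError("'conflict' must be true or false")
    if not isinstance(data.get("reason"), str):
        raise ValueError("'reason' must be a string")
    return data


prompt = build_conflict_prompt(
    "change the Java heap parameter -Xmx to 4 GB for cluster member x25frd",
    "change the Java heap parameter -Xmx to 3 GB for cluster member x25frd",
)
```

A downstream application could then, for example, hold both tickets back from deployment whenever `conflict` is true and attach `reason` to the review record.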
With a properly trained LLM, the above use cases can minimize friction, streamline the change management process, accelerate incident resolution, proactively avoid incidents, and increase the agility of delivering innovation to the end user.