Learning Ticket Similarity with Context-sensitive Deep Neural Networks

Durga Prasad Muni, Suman Roy, Yeung Tack Yan John John Lew Chiang, Navin Budhiraja
Infosys Limited
#44 Electronic City, Hosur Road, Bangalore, India 560100
{DurgaPrasad Muni, Suman Roy, Yeung Chiang, Navin.Budhiraja}@infosys.com

Iheb Ben Abdallah
Computer Science and Electrical Engineering, École CentraleSupélec
Grande Voie des Vignes, Châtenay-Malabry, Paris, France 92290
Iheb.Benabdallah@supelec.fr
ABSTRACT
In Information Technology Infrastructure Library (ITIL) services, a sizable volume of tickets is raised every day for different issues to be resolved so that the service can be delivered without interruption. An issue is captured as a summary on the ticket and, once the ticket is resolved, the solution is also noted down on it as a resolution. It is necessary to automatically extract information from the description of tickets to improve operations such as identifying critical and frequent issues, grouping tickets based on textual content, and suggesting remedial measures for them. In an earlier work we proposed a deep learning based recommendation algorithm for recovering resolutions for incoming tickets through the identification of similar tickets. In this work we use a similar deep neural framework to compute the similarity between two tickets by taking context information into account. In particular, we append the feature representation of tickets with context information before feeding it as input to a deep neural network. Our learning algorithm improves on the performance of similarity learning with traditional techniques. In particular, the context-enriched DNN approach improves performance by 5-6% on average in comparison to the simple DNN-based approach.

CCS CONCEPTS
• Computing methodologies → Neural networks; • Applied computing → Document management and text processing;

KEYWORDS
Deep Learning; Neural Network; Deep Neural Network; Ticket similarity; Short Text Similarity; Context Information

ACM Reference format:
Durga Prasad Muni, Suman Roy, Yeung Tack Yan John John Lew Chiang, Navin Budhiraja and Iheb Ben Abdallah. 2016. Learning Ticket Similarity with Context-sensitive Deep Neural Networks. In Proceedings of ACM Conference, Washington, DC, USA, July 2017 (Conference'17), 9 pages.
DOI: 10.475/123 4

This work was done when Iheb Ben Abdallah was an intern at Infosys Ltd during July-Aug, 2017.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Conference'17, Washington, DC, USA
© 2016 Copyright held by the owner/author(s). 123-4567-24-567/08/06...$15.00
DOI: 10.475/123 4

1 INTRODUCTION
The ticketing system forms a core component of problem and configuration management for Information Technology Infrastructure Library (ITIL) services. A vast number of tickets are raised on the ticketing system by users seeking to resolve issues or concerns they face while using different support systems. These incident data in the form of tickets can be used for different purposes such as SLA calculation, forecasting, optimum resource level checking, performance metrics computation, etc. A ticketing system tries to minimize the business impact of incidents by addressing the concerns of the raised tickets. The incident tickets record symptom descriptions of issues, as well as details of the incident resolution, using a range of structured fields such as date, resolver, categories, affected servers and services, and a couple of free-form entries outlining the description/summary of issues, notes by users/administrators, etc. Once a ticket is resolved, the solution is also noted down on the ticket as a textual resolution. Manual screening of such a huge volume of tickets would be laborious and time-consuming. One needs to extract information automatically from the description of tickets to gain insights in order to improve operations like identifying critical and frequent issues, grouping tickets based on textual content, suggesting remedial measures for them, and so forth.

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction [16]. Such models typically use artificial neural networks with several layers; these are called deep neural networks. Deep neural networks (DNN) have become popular for providing efficient solutions to many problems related to language and information retrieval [4, 6, 7]. In this work we use a context-sensitive feed-forward deep neural network (FF-DNN) for computing ticket similarity. In an earlier work we proposed an automated method based on deep neural networks for recommending resolutions for incoming tickets through identification of similar tickets. We use ideas from deep structured semantic models (DSSM) for web search for such resolution recovery. We take feature vectors of tickets and pass them through a DNN to generate low-dimensional feature vectors, which are used to compute the similarity of an existing ticket with the new ticket. We select a couple of tickets which have the maximum similarity with the incoming ticket and publish their resolutions as the suggested resolutions for the latter ticket.

We modify this framework of similarity computation by taking context information into account. Context may appear in various forms, ranging from the neighboring sentences of a current sentence in
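The DSSM-style pipeline described above (map each ticket's feature vector through a feed-forward network, then compare the resulting low-dimensional vectors by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, random weights, and input features are illustrative assumptions, and tanh activations are borrowed from the DSSM formulation the paper cites.

```python
import numpy as np

def embed(x, weights, biases):
    """Feed-forward pass with tanh activations, as in an FF-DNN."""
    h = x
    for W, b in zip(weights, biases):
        h = np.tanh(W @ h + b)
    return h

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 50, 20, 8     # hypothetical layer sizes
weights = [rng.normal(0, 0.1, (d_hidden, d_in)),
           rng.normal(0, 0.1, (d_out, d_hidden))]
biases = [np.zeros(d_hidden), np.zeros(d_out)]

t1 = rng.random(d_in)   # feature vector of an existing ticket's summary
t2 = rng.random(d_in)   # feature vector of an incoming ticket's summary
sim = cosine(embed(t1, weights, biases), embed(t2, weights, biases))
print(sim)              # similarity score in [-1, 1]
```

In the context-enriched variant, the paper appends context information to the ticket representation before the network, which in this sketch would amount to feeding `np.concatenate([ticket_vec, context_vec])` as the input `x`.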
Table 5: Ticket pair similarity for Ret Data Set

Approach         sim    dis    (sim - dis)
FFDNN            0.904  0.345  0.559
Context FFDNN    1.178  0.475  0.703

Table 6: Ticket Summary Ranking for SnA Data Set

Approach          top@1  top@3  top@5  top@10
FFDNN             0.497  0.531  0.538  0.547
Context FFDNN     0.524  0.559  0.566  0.573
% of improvement  5.4    5.3    5.2    4.8

Table 7: Ticket Summary Ranking for FnB Data Set

Approach          top@1  top@3  top@5  top@10
FFDNN             0.784  0.811  0.814  0.815
Context FFDNN     0.780  0.796  0.819  0.822
% of improvement  -0.5   -1.8   0.6    0.9

Table 8: Ticket Summary Ranking for Ret Data Set

Approach          top@1  top@3  top@5  top@10
FFDNN             0.657  0.718  0.736  0.747
Context FFDNN     0.689  0.762  0.778  0.789
% of improvement  4.9    6.1    5.7    5.6

Table 9: Resolution Recommendation for SnA Data Set

Approach          top@1  top@3  top@5  top@10
FFDNN             0.366  0.459  0.485  0.511
Context FFDNN     0.361  0.458  0.484  0.517

Table 10: Resolution Recommendation for FnB Data Set

Approach          top@1  top@3  top@5  top@10
FFDNN             0.399  0.497  0.529  0.559
Context FFDNN     0.392  0.498  0.532  0.562

Table 11: Resolution Recommendation for Ret Data Set

Approach          top@1  top@3  top@5  top@10
FFDNN             0.513  0.604  0.633  0.662
Context FFDNN     0.513  0.606  0.645  0.672

Table 12: Evaluation of recommended resolutions with FFDNN for top@10

Domain  Manual Evaluation  SS Evaluation
SnA     0.564              0.511
Ret     0.710              0.662
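As a sanity check, the "% of improvement" rows in the ranking tables can be reproduced directly from the FFDNN and Context FFDNN rows. Using the Table 8 (Ret data set) values:

```python
# Table 8: Ticket Summary Ranking for Ret Data Set
ffdnn   = [0.657, 0.718, 0.736, 0.747]   # FFDNN row (top@1..top@10)
context = [0.689, 0.762, 0.778, 0.789]   # Context FFDNN row

# Relative improvement in percent, rounded to one decimal place
improvement = [round(100 * (c - f) / f, 1) for f, c in zip(ffdnn, context)]
print(improvement)  # → [4.9, 6.1, 5.7, 5.6], matching the "% of improvement" row
```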
approach) score which ranges from 0 to 1. Then we compute the average semantic similarity score over all test tickets; see Tables 6, 7, and 8. It can be seen that the context-based FFDNN performs better than the simple FFDNN approach by approximately 5% for the SnA data set and by 6% for the Ret data set. However, the performance goes down for the FnB data set; this might be for the same reason as in the contextual similarity task.

Resolution recommendation for incoming tickets: As described in Section 6.3, we recommend resolutions for a new ticket. The results are given in Tables 9, 10, and 11. They show that the context-based FFDNN performs marginally better than FFDNN.

To determine the effectiveness of the semantic similarity (SS) approach, we manually evaluated the resolutions recommended by the FFDNN approach for two data sets. We inspect the actual resolution of each ticket and the corresponding recommended resolutions for the top@10 case. We use three similarity scores: 0, 0.5 and 1. If the meanings of the actual resolution and a recommended resolution appear to be the same (using meta-language oriented informal semantics), we assign a similarity score of 1 to this pair. If we find that the meanings of the elements of this pair are not exactly the same, but there is some match, we assign a score of 0.5. Otherwise (in case the resolutions differ completely in their meaning), we score the pair 0. As before, we calculate the average manual similarity score over all test tickets. Table 12 shows that manual scoring of the similarity is slightly higher than the automated evaluation by the SS approach.

7 CONCLUSIONS
In this work we have integrated context with a deep neural network to compute similarity between tickets used in ITIL services. Our learning algorithm improves on the performance of similarity computation with a deep network alone, without contexts. We could also see some improvement in the other similarity tasks. In future we would like to examine other context models, such as PoS tags, word dependency information, word sense and domain ontology, and integrate them with a similar learning framework for performing the semantic similarity tasks discussed in this paper. This will help automate ticketing systems at different stages, in addition to automating several incident management, monitoring and event management tasks.

A GRADIENT DESCENT
We now formulate the gradient descent algorithm in our framework. This formulation is based on the one given in [22]; however, we modify the derivation by taking context vectors into account.

The DNN is trained using gradient-based numerical optimization algorithms [13], because $L(\Lambda)$ is differentiable w.r.t. $\Lambda$. $\Lambda$ consists of the weight matrices $W_k$ and $V_k$ and the bias vectors $b_k$ and $d_k$, $k = 2, 3, \ldots, N$. The parameters $\Lambda$ are updated as

$$\Lambda_t = \Lambda_{t-1} - \epsilon_t \left. \frac{\partial L(\Lambda)}{\partial \Lambda} \right|_{\Lambda = \Lambda_{t-1}}, \qquad (10)$$

where $\epsilon_t$ is the learning rate at the $t$-th iteration, and $\Lambda_t$ and $\Lambda_{t-1}$ are the model parameters at the $t$-th and $(t-1)$-th iterations, respectively.
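The parameter update of Eq. (10) is plain gradient descent over all weight matrices and bias vectors. A minimal sketch, with parameters held in a dict keyed by name (the names and shapes here are hypothetical):

```python
import numpy as np

def sgd_step(theta, grad, lr):
    """One update of Eq. (10): theta_t = theta_{t-1} - eps_t * dL/dtheta."""
    return {name: theta[name] - lr * grad[name] for name in theta}

# Toy parameters and gradients for one layer (shapes are illustrative)
theta = {"W2": np.ones((3, 3)), "b2": np.zeros(3)}
grad  = {"W2": 0.5 * np.ones((3, 3)), "b2": np.ones(3)}

theta = sgd_step(theta, grad, lr=0.1)
print(theta["W2"][0, 0])  # 1 - 0.1 * 0.5 = 0.95
```

In the paper's setting, `theta` would hold all of $W_k$, $V_k$, $b_k$, $d_k$, and `grad` the derivatives derived in the remainder of this appendix.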
$$\frac{\partial \Delta_j^m}{\partial W_N} = \frac{\partial R(T_m, T_i^{m+})}{\partial W_N} - \frac{\partial R(T_m, T_j^{m-})}{\partial W_N} \qquad (16)$$

and

$$\alpha_j^m = \frac{-\gamma \exp(-\gamma \Delta_j^m)}{1 + \sum_{T_{j''}^{m-}} \exp(-\gamma \Delta_{j''}^m)} \qquad (17)$$

Let $y_m$ and $y_i$ be the outputs of $\mathrm{DNN_R}$ with ticket summaries $T_m$ and $T_i$. Let $y_m^c$ and $y_i^c$ be the outputs of $\mathrm{DNN_{RC}}$ with the context vectors $c_m$ and $c_i$ of ticket summaries $T_m$ and $T_i$, respectively. Then

$$\frac{\partial R(T_m, T_i^m)}{\partial W_N} = \frac{\partial}{\partial W_N}\left[\frac{y_m^T y_i}{\|y_m\|\,\|y_i\|} + \frac{{y_m^c}^T y_i^c}{\|y_m^c\|\,\|y_i^c\|}\right] = \delta_{y_m}(T_m, T_i)\, h_{N-1,T_m}^T + \delta_{y_i}(T_m, T_i)\, h_{N-1,T_i}^T \qquad (18)$$

where

$$\delta_{y_m}(T_m, T_i) = (1 - y_m) \circ (1 + y_m) \circ (u v\, y_i - a v u^3 y_m)$$
$$\delta_{y_i}(T_m, T_i) = (1 - y_i) \circ (1 + y_i) \circ (u v\, y_m - a u v^3 y_i)$$
$$a = y_m^T y_i, \quad u = \frac{1}{\|y_m\|}, \quad v = \frac{1}{\|y_i\|}$$

The operator $\circ$ denotes element-wise multiplication.

The gradient of the loss function w.r.t. the $N$-th weight matrix $V_N$ of $\mathrm{DNN_{RC}}$ is

$$\frac{\partial L_m(\Lambda)}{\partial V_N} = \sum_{T_j^{m-}} \alpha_j^m \frac{\partial \Delta_j^m}{\partial V_N} \qquad (19)$$

For the hidden layers, the back-propagated errors satisfy

$$\delta_{k,T_i}(T_m, T_i) = (1 + h_{k,T_i}) \circ (1 - h_{k,T_i}) \circ W_{k+1}^T \delta_{k+1,T_i}(T_m, T_i) \quad \text{and} \quad \delta_{k,c_i}^c(c_m, c_i) = (1 + l_{k,c_i}) \circ (1 - l_{k,c_i}) \circ V_{k+1}^T \delta_{k+1,c_i}^c(c_m, c_i) \qquad (22)$$

with, at the output layer,

$$\delta_{N,T_m}(T_m, T_i) = \delta_{y_m}(T_m, T_i), \quad \delta_{N,c_m}^c(c_m, c_i) = \delta_{y_m^c}(c_m, c_i)$$
$$\delta_{N,T_i}(T_m, T_i) = \delta_{y_i}(T_m, T_i), \quad \delta_{N,c_i}^c(c_m, c_i) = \delta_{y_i^c}(c_m, c_i)$$

The gradient of the loss function w.r.t. an intermediate weight matrix $W_k$, $k = 2, 3, \ldots, N-1$, can be computed as

$$\frac{\partial L_m(\Lambda)}{\partial W_k} = \sum_{T_j^{m-}} \alpha_j^m \frac{\partial \Delta_j^m}{\partial W_k} \qquad (23)$$

where

$$\frac{\partial \Delta_j^m}{\partial W_k} = \frac{\partial R(T_m, T_i^{m+})}{\partial W_k} - \frac{\partial R(T_m, T_j^{m-})}{\partial W_k} = \delta_{k,T_m}(T_m, T_i)\, h_{k-1,T_m}^T + \delta_{k,T_i}(T_m, T_i)\, h_{k-1,T_i}^T - \delta_{k,T_m}(T_m, T_j)\, h_{k-1,T_m}^T - \delta_{k,T_j}(T_m, T_j)\, h_{k-1,T_j}^T \qquad (24)$$

The gradient of the loss function w.r.t. an intermediate weight matrix $V_k$, $k = 2, 3, \ldots, N-1$, can be computed as

$$\frac{\partial L_m(\Lambda)}{\partial V_k} = \sum_{T_j^{m-}} \alpha_j^m \frac{\partial \Delta_j^m}{\partial V_k} \qquad (25)$$
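The cosine-derivative factor inside $\delta_{y_m}$ above, $u v\, y_i - a u^3 v\, y_m$ (taken before the element-wise tanh-derivative factor $(1-y_m)\circ(1+y_m)$), can be verified numerically against finite differences. The vectors below are random stand-ins, not outputs of the actual networks:

```python
import numpy as np

def cos_grad_ym(ym, yi):
    """Analytic gradient of cos(ym, yi) w.r.t. ym: u*v*yi - a*u^3*v*ym (cf. Eq. 18)."""
    a = ym @ yi
    u = 1.0 / np.linalg.norm(ym)
    v = 1.0 / np.linalg.norm(yi)
    return u * v * yi - a * u**3 * v * ym

rng = np.random.default_rng(1)
ym, yi = rng.random(5), rng.random(5)

# Central finite differences of the cosine similarity w.r.t. each component of ym
f = lambda y: (y @ yi) / (np.linalg.norm(y) * np.linalg.norm(yi))
eps = 1e-6
num = np.zeros(5)
for k in range(5):
    e = np.zeros(5); e[k] = eps
    num[k] = (f(ym + e) - f(ym - e)) / (2 * eps)

print(np.max(np.abs(num - cos_grad_ym(ym, yi))))  # agrees up to finite-difference error
```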
where

$$\frac{\partial \Delta_j^m}{\partial V_k} = \frac{\partial R(T_m, T_i^{m+})}{\partial V_k} - \frac{\partial R(T_m, T_j^{m-})}{\partial V_k} = \delta_{k,c_m}^c(c_m, c_i)\, l_{k-1,c_m}^T + \delta_{k,c_i}^c(c_m, c_i)\, l_{k-1,c_i}^T - \delta_{k,c_m}^c(c_m, c_j)\, l_{k-1,c_m}^T - \delta_{k,c_j}^c(c_m, c_j)\, l_{k-1,c_j}^T \qquad (26)$$

Similarly, the gradient of the loss function w.r.t. the biases can be derived. The partial derivatives of $R(T_m, T_i^{m+})$ w.r.t. the biases $b_N$ and $b_k$, $k = 2, 3, \ldots, N-1$, are

$$\frac{\partial R(T_m, T_i^m)}{\partial b_N} = \delta_{y_m}(T_m, T_i) + \delta_{y_i}(T_m, T_i) \qquad (27)$$

$$\frac{\partial R(T_m, T_i^m)}{\partial b_k} = \delta_{k,T_m}(T_m, T_i) + \delta_{k,T_i}(T_m, T_i) \qquad (28)$$

The partial derivatives of $R(T_m, T_i^{m+})$ w.r.t. the biases $d_N$ and $d_k$, $k = 2, 3, \ldots, N-1$, are

$$\frac{\partial R(T_m, T_i^m)}{\partial d_N} = \delta_{y_m^c}(c_m, c_i) + \delta_{y_i^c}(c_m, c_i) \qquad (29)$$

$$\frac{\partial R(T_m, T_i^m)}{\partial d_k} = \delta_{k,c_m}^c(c_m, c_i) + \delta_{k,c_i}^c(c_m, c_i) \qquad (30)$$

REFERENCES
[1] ACL'15 2015. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1 & 2: Long and Short Papers. The Association for Computational Linguistics. http://aclweb.org/anthology/P/P15/
[2] Hadi Amiri, Philip Resnik, Jordan Boyd-Graber, and Hal Daumé III. 2016. Learning text pair similarity with context-sensitive autoencoders. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL'16), Vol. 1. 1882-1892.
[3] Chao An, Jiuming Huang, Shoufeng Chang, and Zhijie Huang. 2016. Question Similarity Modeling with Bidirectional Long Short-term Memory Neural Network. In Proceedings of the IEEE First International Conference on Data Science in Cyberspace.
[4] Yoshua Bengio. 2009. Learning Deep Architectures for AI. Foundations and Trends in Machine Learning 2, 1 (2009), 1-127.
[5] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (2003), 993-1022.
[6] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa. 2011. Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research 12 (2011), 2493-2537.
[7] Li Deng, Xiaodong He, and Jianfeng Gao. 2013. Deep stacking networks for information retrieval. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP'13, Vancouver, BC, Canada. 3153-3157.
[8] Cícero Nogueira dos Santos, Luciano Barbosa, Dasha Bogdanova, and Bianca Zadrozny. 2015. Learning Hybrid Representations to Retrieve Semantically Equivalent Questions, See [1], 694-699. http://aclweb.org/anthology/P/P15/
[9] Jianfeng Gao, Xiaodong He, and Jian-Yun Nie. 2010. Clickthrough-based translation models for web search: from word models to phrase models. In Proceedings of the 19th ACM Conference on Information and Knowledge Management, CIKM'10. 1139-1148.
[10] Jianfeng Gao, Kristina Toutanova, and Wen-tau Yih. 2011. Clickthrough-based latent semantic models for web search. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'11. 675-684.
[11] Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. 2014. Convolutional neural network architectures for matching natural language sentences. In Advances in Neural Information Processing Systems. 2042-2050.
[12] Eric H. Huang, Richard Socher, Christopher D. Manning, and Andrew Y. Ng. 2012. Improving Word Representations via Global Context and Multiple Word Prototypes. In The 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, ACL'12: Long Papers. 873-882.
[13] Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry P. Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In 22nd ACM International Conference on Information and Knowledge Management, CIKM'13. 2333-2338.
[14] Da Kuang, Jaegul Choo, and Haesun Park. 2015. Nonnegative matrix factorization for interactive topic modeling and document clustering. In Partitional Clustering Algorithms. Springer, 215-243.
[15] Quoc V. Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In Proceedings of the 31st International Conference on Machine Learning, ICML'14. 1188-1196.
[16] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep Learning. Nature 521 (2015), 436-444.
[17] Jure Leskovec, Anand Rajaraman, and Jeff Ullman. 2014. Mining of Massive Datasets (2nd ed.). Cambridge University Press.
[18] Yuhua Li, David McLean, Zuhair Bandar, James O'Shea, and Keeley A. Crockett. 2006. Sentence Similarity Based on Semantic Nets and Corpus Statistics. IEEE Trans. Knowl. Data Eng. 18, 8 (2006), 1138-1150.
[19] Rui Lin, Shujie Liu, Muyun Yang, Mu Li, Ming Zhou, and Sheng Li. 2015. Hierarchical Recurrent Neural Network for Document Modeling. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP'15. 899-907.
[20] Zhengdong Lu and Hang Li. 2013. A deep architecture for matching short texts. In Advances in Neural Information Processing Systems. 1367-1375.
[21] Jonas Mueller and Aditya Thyagarajan. 2016. Siamese Recurrent Architectures for Learning Sentence Similarity. In AAAI. 2786-2792.
[22] D. P. Muni, S. Roy, Y. T. Y. J. John L. Chiang, A. J-M Viallet, and N. Budhiraja. 2017. Recommending resolutions of ITIL services tickets using Deep Neural Network. In Proceedings of the 4th IKDD Conference on Data Science, CODS.
[23] Sascha Rothe and Hinrich Schütze. 2015. AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes, See [1], 1793-1803. http://aclweb.org/anthology/P/P15/
[24] S. Roy, D. P. Muni, J-J. Yeung T. Y., N. Budhiraja, and F. Ceiler. 2016. Clustering and Labeling IT Maintenance Tickets. In Service-Oriented Computing - 14th International Conference, ICSOC 2016, Banff, AB, Canada, Proceedings. 829-845.
[25] G. Salton and C. Buckley. 1988. Term Weighing Approaches in Automatic Text Retrieval. Information Processing and Management (1988).
[26] Aliaksei Severyn and Alessandro Moschitti. 2015. Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'15. 373-382.
[27] Aliaksei Severyn and Alessandro Moschitti. 2015. Learning to rank short text pairs with convolutional deep neural networks. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 373-382.
[28] Rupesh K. Srivastava, Klaus Greff, and Jürgen Schmidhuber. 2015. Training very deep networks. In Advances in Neural Information Processing Systems. 2377-2385.
[29] Keith Stevens, W. Philip Kegelmeyer, David Andrzejewski, and David Buttler. 2012. Exploring Topic Coherence over Many Models and Many Topics. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012, July 12-14, 2012, Jeju Island, Korea. 952-961.