Ha Manh Tran
Student: Ha Manh Tran
Title: Distributed Case-Based Reasoning for Fault Management
Committee: Jürgen Schönwälder (Jacobs), Michael Kohlhase (Jacobs), Olivier Festor (INRIA)
PhD project description
The resolution of faults in communication networks and distributed systems is to a large extent a human-driven process. Automated monitoring and event correlation systems usually produce fault reports that are forwarded to operators for resolution. Support systems such as trouble ticket systems are frequently used to organize the workflows.
Case-based reasoning (CBR) was proposed in the early 1990s to assist operators in the resolution of faults by providing mechanisms to correlate an observed fault with previously solved similar cases (faults). CBR systems are typically linked to trouble ticket systems, since the data maintained in trouble ticket systems can be used to populate the case database. Existing CBR systems for fault management usually operate only on a local case database and cannot easily share and exploit knowledge about faults and their resolution that is present at other sites. This restriction to local knowledge becomes a particular issue in environments where software components and offered services change very dynamically, so that the case database is frequently outdated.
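The retrieval step of CBR, matching an observed fault against previously solved cases, can be illustrated with a minimal sketch. It is not the system developed in this project: the `Case` record, the example case base, and the choice of Jaccard similarity over word tokens are all assumptions made purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Case:
    """A previously solved fault (hypothetical record layout)."""
    symptoms: str    # free-text description of the observed fault
    resolution: str  # how the fault was resolved


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it on whitespace into a set of tokens."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve(query: str, case_base: list[Case], k: int = 1) -> list[tuple[float, Case]]:
    """Return the k cases whose symptoms are most similar to the query."""
    q = tokenize(query)
    scored = [(jaccard(q, tokenize(c.symptoms)), c) for c in case_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Illustrative local case base; the entries are invented examples.
CASE_BASE = [
    Case("ssh connection refused port 22", "start the sshd service"),
    Case("disk full no space left on device", "rotate and compress old logs"),
    Case("dns lookup timeout resolver unreachable", "fix nameserver entry in resolv.conf"),
]

if __name__ == "__main__":
    for score, case in retrieve("connection refused on port 22", CASE_BASE):
        print(f"{score:.2f}  {case.resolution}")
```

A case base like this one is purely local, which is exactly the limitation discussed above: a fault whose resolution is known only at another site scores zero against every local case.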
With the success of general-purpose search engines like Google, it has become common practice for operators to "google" for fault messages and to search for problem resolutions in indexed public archives. Experience tells us that problems can quite often be resolved quickly after "googling" long enough. Solutions are typically found in indexed discussion forums, bug tracking and trouble ticket systems, or vendor-provided knowledge bases. While some of these data sources maintain structured information (e.g., bug tracking and trouble ticket systems), this information cannot be exploited, because a generic search engine does not understand the readily available meta information.
The goal of our project is to develop a distributed case-based reasoning system that assists operators in resolving faults by finding relevant cases more easily and effectively. The system will take advantage of peer-to-peer (P2P) technologies to achieve some degree of self-organization and to avoid centralized servers. In addition, we plan to develop and integrate semantic search mechanisms that exploit the semi-structured data that can be retrieved from the network and integrated into the system. We plan to develop and evaluate the system by running it on the PlanetLab infrastructure.
- H.M. Tran: Distributed Case-Based Reasoning for Fault Management. Jacobs University PhD Dissertation, ISBN 978-3-8322-8525-8, Shaker Verlag, 2009.
- H.M. Tran, C. Lange, G. Chulkov, J. Schönwälder, M. Kohlhase: Applying Semantic Techniques to Search and Analyze Bug Tracking Data. Journal of Network and Systems Management 17(3), Springer, September 2009.
- H.M. Tran, J. Schönwälder: Fault Resolution in Case-Based Reasoning. 10th Pacific Rim International Conference on Artificial Intelligence (PRICAI 2008), Hanoi, December 2008. Springer LNAI 5351.
- H.M. Tran, G. Chulkov, J. Schönwälder: Crawling Bug Tracker for Semantic Bug Search. 19th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM 2008), Samos Island, September 2008. Springer LNCS 5273.
- F. Liu, A. Hadjiantonis, H.M. Tran, M. Amin: An Architecture for Supporting Network Fault Recovery Management. 2nd Conference on Autonomous Infrastructure, Management and Security (AIMS 2008), Bremen, July 2008. Springer LNCS 5127.
- H.M. Tran, J. Schönwälder: Fault Representation in Case-based Reasoning. 18th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM 2007), San Jose, October 2007. Springer LNCS 4785.
- H.M. Tran, J. Schönwälder: Heuristic Search using a Feedback Scheme in Unstructured Peer-to-Peer Networks. 5th International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P 2007), Vienna, September 2007. Springer LNCS.
- H.M. Tran, J. Schönwälder: Distributed Case-based Reasoning for Fault Management. 1st Conference on Autonomous Infrastructure, Management and Security (AIMS 2007), Oslo, June 2007. Springer LNCS 4543.