SSH datasets
This page provides access to the materials accompanying the following publication:
SSH Compromise Detection using NetFlow/IPFIX
Rick Hofstede, Luuk Hendriks, Anna Sperotto, Aiko Pras. In: ACM Computer Communication Review, 2014.
More information regarding this publication can be found here. Any use of the materials provided on this page should cite this publication.
Contents
Datasets
Name | File Size | CRC |
---|---|---|
Flow data | x GB | xxx |
Log files | x GB | xxx |
Some results derived from these data can be found here.
Flow data
The flow data has been exported by a Cisco Catalyst 6500 with SUP2T supervisor module (PFC4, MSFC 5), and collected using nfcapd. Neither packet sampling nor flow sampling has been applied. The following post-processing operations have, however, been performed:
- Filtering: Only SSH data has been selected, i.e., the following nfdump filter has been used: port 22 and proto tcp.
- Anonymization: nfanon has been used for anonymizing the flow data in a prefix-preserving manner. More precisely, nfanon relies on the CryptoPAn (Cryptography-based Prefix-preserving Anonymization) module.
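To illustrate what "prefix-preserving" means for the anonymized addresses, the following is a minimal sketch, not the nfanon or CryptoPAn implementation. It assumes HMAC-SHA256 as the pseudorandom function (CryptoPAn itself is based on a block cipher), and the function names and the key are hypothetical. Each output bit is the input bit XORed with a pseudorandom bit derived only from the preceding input bits, so two addresses that share their first k bits before anonymization also share exactly their first k bits afterwards.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(address: str, key: bytes) -> str:
    """Toy prefix-preserving anonymization of an IPv4 address.

    Illustration only: HMAC-SHA256 stands in for the cipher used by CryptoPAn.
    """
    bits = format(int(ipaddress.IPv4Address(address)), "032b")
    out = []
    for i in range(32):
        # The flip bit for position i depends only on the i bits before it.
        prefix = bits[:i].encode("ascii")
        flip = hmac.new(key, prefix, hashlib.sha256).digest()[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def shared_prefix_length(a: str, b: str) -> int:
    """Number of leading bits that two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


if __name__ == "__main__":
    key = b"example-key"  # hypothetical key, for demonstration only
    a, b = "192.0.2.10", "192.0.2.200"
    anon_a, anon_b = anonymize_ipv4(a, key), anonymize_ipv4(b, key)
    print(a, "->", anon_a)
    print(b, "->", anon_b)
    print("shared prefix before:", shared_prefix_length(a, b))
    print("shared prefix after: ", shared_prefix_length(anon_a, anon_b))
```

Because the mapping is deterministic for a given key, the same original address always maps to the same anonymized address, which is what makes it possible to correlate hosts across flow records.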
Log files
The log files have been gathered from machines running various Linux operating systems. The following post-processing operations have, however, been performed:
- Merging: On some machines, the authentication logs were distributed over <hostname>.messages and <hostname>.warn. We have merged those log files, sorted them again (if necessary), and removed any duplicates introduced by the merge.
- Renaming: The file names have been changed from <hostname>.<extension> to <anonymized_IP_address>.<extension>. As such, the log files can easily be correlated with the flow data.
- Anonymization: We have replaced all usernames by "XXXXX" and all hostnames by the anonymized IP address of the host in question.
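The merging and username anonymization steps above can be sketched as follows. This is an illustrative reconstruction, not the scripts used to produce the dataset. It assumes traditional syslog lines that start with a 15-character timestamp such as "Jan  5 10:00:01" (which carries no year, so one has to be supplied), and OpenSSH-style messages of the form "... for [invalid user] <name> from ...". The function names are hypothetical. Note that deduplicating with a set also collapses identical lines that occur within a single file, which is a simplification.

```python
import re
from datetime import datetime

# Assumed message shape: "... for [invalid user] <name> from <address> ..."
USERNAME_PATTERN = re.compile(r"(for (?:invalid user )?)\S+(?= from )")


def parse_timestamp(line: str, year: int) -> datetime:
    """Parse the leading syslog timestamp; the year is an assumption."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def merge_logs(messages: list[str], warn: list[str], year: int) -> list[str]:
    """Merge two log files, drop duplicate lines, and sort by timestamp."""
    unique = set(messages) | set(warn)
    # Ties on the timestamp are broken by the line text for a stable result.
    return sorted(unique, key=lambda line: (parse_timestamp(line, year), line))


def mask_usernames(line: str) -> str:
    """Replace the username in an authentication message by 'XXXXX'."""
    return USERNAME_PATTERN.sub(r"\g<1>XXXXX", line)


if __name__ == "__main__":
    messages = [
        "Jan  5 10:00:03 host sshd[1]: Accepted password for alice from 192.0.2.1 port 5000 ssh2",
        "Jan  5 10:00:01 host sshd[2]: Failed password for invalid user admin from 192.0.2.9 port 5001 ssh2",
    ]
    warn = [
        "Jan  5 10:00:01 host sshd[2]: Failed password for invalid user admin from 192.0.2.9 port 5001 ssh2",
        "Jan  5 10:00:02 host sshd[3]: Failed password for root from 192.0.2.9 port 5002 ssh2",
    ]
    for line in merge_logs(messages, warn, year=2014):
        print(mask_usernames(line))
```

The sketch leaves out the hostname replacement; in the published files, the hostname field holds the anonymized IP address of the host.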