Difference between revisions of "SSH datasets"
Revision as of 08:33, 3 August 2014
This page provides access to the materials accompanying the following publication:
SSH Compromise Detection using NetFlow/IPFIX
Rick Hofstede, Luuk Hendriks, Anna Sperotto, Aiko Pras. In: ACM Computer Communication Review, 2014.
More information regarding this publication can be found here. Any usage of materials provided on this page should reference this publication.
Datasets
Name | File Size | Hosts |
---|---|---|
Flow data | x GB | 333 |
Log files | x GB | 333 |
Some results derived from these data can be found here.
Flow data
The flow data has been exported by a Cisco Catalyst 6500 with SUP2T supervisor module (PFC4, MSFC 5), and collected using nfcapd. Neither packet sampling nor flow sampling has been applied. The following post-processing operations have, however, been performed:
- Filtering: Only SSH data has been selected, i.e., the following nfdump filter has been used: port 22 and proto tcp.
- Anonymization: nfanon has been used for anonymizing the flow data in a prefix-preserving manner. More precisely, nfanon relies on the CryptoPAn (Cryptography-based Prefix-preserving Anonymization) module.
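To illustrate what "prefix-preserving" means in practice, the following is a minimal sketch of the general bit-by-bit construction behind this kind of anonymization. It is not nfanon's or CryptoPAn's actual implementation: it swaps in HMAC-SHA256 from the Python standard library as the keyed pseudorandom function, and the names anonymize_ipv4 and common_prefix_length, as well as the key and example addresses, are hypothetical. Its output will therefore not match the anonymized addresses in this dataset.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(address: str, key: bytes) -> str:
    """Toy prefix-preserving anonymization (assumption: HMAC-SHA256 as PRF)."""
    bits = format(int(ipaddress.IPv4Address(address)), "032b")
    out = []
    for i in range(32):
        # The flip decision for bit i depends only on the i bits before it,
        # so two addresses sharing a prefix get identical flips on that prefix.
        prefix = bits[:i].encode("ascii")
        flip = hmac.new(key, prefix, hashlib.sha256).digest()[0] & 1
        out.append(str(int(bits[i]) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading bits that two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

With this construction, two original addresses that share exactly k leading bits map to anonymized addresses that also share exactly k leading bits, which is why subnet structure remains usable for analysis after anonymization.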
Log files
The log files have been gathered from various Linux operating systems. The following post-processing operations have, however, been performed:
- Merging: On some machines, the authentication logs were distributed over <hostname>.messages and <hostname>.warn. We have merged those log files, sorted them again (if necessary), and removed any introduced duplicates.
- Renaming: The file names have been changed from <hostname>.<extension> to <anonymized_IP_address>.<extension>. As such, the log files can easily be correlated with the flow data.
- Anonymization: We have replaced all usernames with "XXXXX" and all hostnames with the anonymized IP address of the considered host.
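The merging and anonymization steps above can be sketched as follows. This is an illustration under stated assumptions, not the script that was used to prepare the dataset: it assumes classic syslog lines that start with a 15-character timestamp such as "Aug  3 08:33:01" (which carries no year, so one is supplied as a parameter), it assumes OpenSSH-style messages of the form "... for [invalid user] <name> from ...", and it interprets "introduced duplicates" as lines that occur in both files. The function names and the sample host name are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumption: usernames appear as "for <name> from" or
# "for invalid user <name> from" in sshd messages.
USER_PATTERN = re.compile(r"(\bfor (?:invalid user )?)\S+(?= from )")


def timestamp(line: str, year: int) -> datetime:
    """Parse the leading syslog timestamp; the year is supplied by the caller."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def merge_logs(messages: list, warn: list, year: int = 2014) -> list:
    """Merge two logs, drop lines duplicated across files, sort by time."""
    # Counter union keeps the maximum count per line, so a line present in
    # both files is kept once, while repeats within one file are preserved.
    merged = Counter(messages) | Counter(warn)
    return sorted(merged.elements(), key=lambda l: (timestamp(l, year), l))


def anonymize_line(line: str, hostname: str, anonymized_ip: str) -> str:
    """Replace the username with XXXXX and the hostname with the given IP."""
    line = USER_PATTERN.sub(r"\g<1>XXXXX", line)
    return re.sub(r"\b" + re.escape(hostname) + r"\b", anonymized_ip, line)
```

Renaming then amounts to writing the result to a file named after the same anonymized IP address that is used in the flow data, keeping the original extension.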