Difference between revisions of "Dropbox Traces"

Revision as of 15:44, 4 September 2012

From this page you can download the flow data used in the following paper:

  • Drago, I. and Mellia, M. and Munafò, M. M. and Sperotto, A. and Sadre, R. and Pras, A. (2012) Inside Dropbox: Understanding Personal Cloud Storage Services. In: Proceedings of the 12th ACM Internet Measurement Conference - IMC'12, Boston, Nov. 2012

As described in the paper, the data were captured at 4 vantage points in 2 European countries. Most of the data were collected from March 24, 2012 to May 5, 2012. A second dataset was collected at Campus 1 in June and July 2012 to complement the analysis.

All data were captured using Tstat, an open source monitoring tool developed at Politecnico di Torino. Tstat exports flow data containing more than 100 metrics. The source code of Tstat can be obtained from here.

Traces

First data capture

  • Campus 1
  • Campus 2 (soon)
  • Home 1 (soon)
  • Home 2 (soon)

Second data capture

Acceptable Use Policy

  • The user must not attempt to reverse engineer the anonymization procedure used to protect the data.
  • A user who notices vulnerabilities in the anonymization procedure is kindly asked to inform the repository administrators.
  • Users who write a paper using this data are asked to cite:
@inproceedings{drago2012_dropbox,
  author        = {Idilio Drago and Marco Mellia and Maurizio M. Munaf\`{o} and Anna Sperotto and Ramin Sadre and Aiko Pras},
  title         = {{I}nside {D}ropbox: {U}nderstanding {P}ersonal {C}loud {S}torage {S}ervices},
  booktitle     = {Proceedings of the 12th ACM SIGCOMM Conference on Internet Measurement},
  series        = {IMC'12},
  pages         = {},
  year          = {2012}
}

Format

All files are in a format similar to the log_tcp_complete (http://tstat.polito.it/measure.shtml#log_tcp_complete) saved by Tstat.

Tstat produces "log_tcp_complete" and "log_tcp_nocomplete" files, which log every TCP
connection that has been tracked.

A TCP connection is identified when the first SYN segment is observed, and
is ended when either:
  - the FIN/ACK or RST segments are observed;
  - no data packet has been observed (from both sides) for a default timeout
    of 10s after the three-way handshake or 5min after the last data packet
    (see TCP_SINGLETON_TIME and TCP_IDLE_TIME in param.h).

Tstat discards all connections for which the three-way handshake is not
properly seen. A connection that is correctly closed is then stored in
log_tcp_complete; any other connection is stored in log_tcp_nocomplete.
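
The rules above can be sketched in a few lines of Python. This is only an illustration, not Tstat's actual code. The Connection record and the function names are hypothetical. The mapping of the 10s value to TCP_SINGLETON_TIME and the 5min value to TCP_IDLE_TIME is inferred from the order in which the text mentions them. The sketch also assumes that "correctly closed" means a FIN/ACK or RST was observed before a timeout expired.

```python
from dataclasses import dataclass
from typing import Optional

# Default timeouts quoted in the text, in seconds. The constant names come
# from the text; which value belongs to which name is an assumption.
TCP_SINGLETON_TIME = 10.0      # no data seen after the three-way handshake
TCP_IDLE_TIME = 5 * 60.0       # no data seen after the last data packet


@dataclass
class Connection:
    """Hypothetical per-connection state (not a Tstat data structure)."""
    handshake_seen: bool                     # three-way handshake properly seen
    handshake_time: float                    # time the handshake completed
    last_data_time: Optional[float] = None   # time of the last data packet
    closed_by_fin_or_rst: bool = False       # FIN/ACK or RST observed


def has_timed_out(conn: Connection, now: float) -> bool:
    """Apply the two idle timeouts described in the text."""
    if conn.last_data_time is None:
        return now - conn.handshake_time > TCP_SINGLETON_TIME
    return now - conn.last_data_time > TCP_IDLE_TIME


def destination_log(conn: Connection, now: float) -> str:
    """Return where the connection would end up at time `now`."""
    if not conn.handshake_seen:
        return "discarded"
    if conn.closed_by_fin_or_rst:
        return "log_tcp_complete"    # assumption: FIN/ACK or RST = correctly closed
    if has_timed_out(conn, now):
        return "log_tcp_nocomplete"
    return "tracking"                # still open, nothing logged yet
```

For instance, under these assumptions a connection that completes the handshake at time 0 and never carries data would be written to log_tcp_nocomplete once more than 10 seconds have passed.
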
Both files have the same format, with values separated by spaces.
Columns are grouped by traffic direction: C2S (Client-to-Server)
and S2C (Server-to-Client).
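
Because the values are separated by spaces, a file can be read with a few lines of Python. The sketch below is a minimal example, not an official loader. The function name read_flow_rows is hypothetical, skipping lines that start with "#" is an assumption about possible header lines, and the sample text is invented purely to show the call. Fields are kept as strings because the meaning of each column is not fixed here.

```python
import io
from typing import Iterable, Iterator, List


def read_flow_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of string fields per flow record."""
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and lines starting with '#' carry no record.
        if not line or line.startswith("#"):
            continue
        yield line.split()   # split on runs of whitespace


# Invented sample input; a real file would be opened with open(path) instead.
sample = io.StringIO("#hypothetical header\n10 20 30\n\n40 50 60\n")
rows = list(read_flow_rows(sample))
```

Each row is a plain list, so a column can be picked by position once its index is known from the column description.
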
A brief description of the columns follows.