Dropbox Traces

Revision as of 16:00, 4 September 2012

This page provides the flow data used in the following paper:

  • Drago, I., Mellia, M., Munafò, M. M., Sperotto, A., Sadre, R., and Pras, A. (2012). Inside Dropbox: Understanding Personal Cloud Storage Services. In: Proceedings of the 12th ACM Internet Measurement Conference (IMC'12), Boston, Nov. 2012.

As described in the paper, the data were captured at four vantage points in two European countries. Most of the data were collected from March 24, 2012 to May 5, 2012. A second dataset was collected at Campus 1 in June and July 2012 to complement the analysis.

All data were captured using Tstat, an open-source monitoring tool developed at Politecnico di Torino. Tstat exports flow data containing more than 100 metrics. The source code of Tstat can be obtained from here.

Traces

First data capture

  • Campus 1
  • Campus 2 (soon)
  • Home 1 (soon)
  • Home 2 (soon)

Second data capture

Acceptable Use Policy

  • The user must not attempt to reverse engineer the anonymization procedure used to protect the data.
  • A user who notices vulnerabilities in the anonymization procedure is kindly asked to inform the repository administrators.
  • When writing a paper that uses this data, please cite:
@inproceedings{drago2012_dropbox,
  author        = {Idilio Drago and Marco Mellia and Maurizio M. Munaf\`{o} and Anna Sperotto and Ramin Sadre and Aiko Pras},
  title         = {{I}nside {D}ropbox: {U}nderstanding {P}ersonal {C}loud {S}torage {S}ervices},
  booktitle     = {Proceedings of the 12th ACM SIGCOMM Conference on Internet Measurement},
  series        = {IMC'12},
  pages         = {},
  year          = {2012}
}

Format

All files are in a format similar to the log_tcp_complete saved by Tstat.

The following columns are found in these traces:

############################################################################
# C2S # S2C # Short description      # Unit  # Long description            #
############################################################################
#  1  # 45  # Client/Server IP addr  # -     # IP addresses of the client/server
#  2  # 46  # Client/Server TCP port # -     # TCP port addresses for the client/server
#  3  # 47  # packets                # -     # total number of packets observed from the client/server
#  4  # 48  # RST sent               # 0/1   # 0 = no RST segment has been sent by the client/server, 1 = an RST segment has been sent
#  5  # 49  # ACK sent               # -     # number of segments with the ACK field set to 1
#  6  # 50  # PURE ACK sent          # -     # number of segments with ACK field set to 1 and no data
#  7  # 51  # unique bytes           # bytes # number of bytes sent in the payload
#  8  # 52  # data pkts              # -     # number of segments with payload
#  9  # 53  # data bytes             # bytes # number of bytes transmitted in the payload, including retransmissions
# 10  # 54  # rexmit pkts            # -     # number of retransmitted segments
# 11  # 55  # rexmit bytes           # bytes # number of retransmitted bytes
# 12  # 56  # out seq pkts           # -     # number of segments observed out of sequence
# 13  # 57  # SYN count              # -     # number of SYN segments observed (including rtx)
# 14  # 58  # FIN count              # -     # number of FIN segments observed (including rtx)
# 15  # 59  # RFC1323 ws             # 0/1   # Window scale option sent
# 16  # 60  # RFC1323 ts             # 0/1   # Timestamp option sent
# 17  # 61  # window scale           # -     # Scaling values negotiated [scale factor]
# 18  # 62  # SACK req               # 0/1   # SACK option set
# 19  # 63  # SACK sent              # -     # number of SACK messages sent
# 20  # 64  # MSS                    # bytes # MSS declared
# 21  # 65  # max seg size           # bytes # Maximum segment size observed
# 22  # 66  # min seg size           # bytes # Minimum segment size observed
# 23  # 67  # win max                # bytes # Maximum receiver window announced (already scaled by the window scale factor)
# 24  # 68  # win min                # bytes # Minimum receiver window announced (already scaled by the window scale factor)
# 25  # 69  # win zero               # -     # Total number of segments declaring zero as receiver window
# 26  # 70  # cwin max               # bytes # Maximum in-flight-size (see Tstat docs)
# 27  # 71  # cwin min               # bytes # Minimum in-flight-size
# 28  # 72  # initial cwin           # bytes # First in-flight size, or total number of unack-ed bytes sent before receiving the first ACK segment
# 29  # 73  # Average rtt            # ms    # Average RTT computed measuring the time elapsed between the data segment and the corresponding ACK
# 30  # 74  # rtt min                # ms    # Minimum RTT observed during connection lifetime
# 31  # 75  # rtt max                # ms    # Maximum RTT observed during connection lifetime
# 32  # 76  # Stdev rtt              # ms    # Standard deviation of the RTT
# 33  # 77  # rtt count              # -     # Number of valid RTT observations
# 34  # 78  # ttl_min                # -     # Minimum Time To Live
# 35  # 79  # ttl_max                # -     # Maximum Time To Live
# 36  # 80  # rtx RTO                # -     # Number of retransmitted segments due to timeout expiration
# 37  # 81  # rtx FR                 # -     # Number of retransmitted segments due to Fast Retransmit (three dup-ack)
# 38  # 82  # reordering             # -     # Number of packet reorderings observed
# 39  # 83  # net dup                # -     # Number of network duplicates observed
# 40  # 84  # unknown                # -     # Number of segments not in sequence or duplicate which are not classified as specific events
# 41  # 85  # flow control           # -     # Number of retransmitted segments to probe the receiver window
# 42  # 86  # unnece rtx RTO         # -     # Number of unnecessary transmissions following a timeout expiration
# 43  # 87  # unnece rtx FR          # -     # Number of unnecessary transmissions following a fast retransmit
# 44  # 88  # != SYN seqno           # 0/1   # 1 = retransmitted SYN segments have different initial seqno
############################################################################
# 89        # Completion time        # ms    # Flow duration from first packet to last packet
# 90        # First time             # ms    # Flow first packet since first segment ever
# 91        # Last time              # ms    # Flow last segment since first segment ever
# 92        # C first payload        # ms    # Client first segment with payload since the first flow segment
# 93        # S first payload        # ms    # Server first segment with payload since the first flow segment
# 94        # C last payload         # ms    # Client last segment with payload since the first flow segment
# 95        # S last payload         # ms    # Server last segment with payload since the first flow segment
# 96        # C first ack            # ms    # Client first ACK segment (without SYN) since the first flow segment
# 97        # S first ack            # ms    # Server first ACK segment (without SYN) since the first flow segment
# 98        # First time abs         # ms    # Flow first packet absolute time (epoch)
# 99        # C Internal             # 0/1   # 1 = client has internal IP, 0 = client has external IP
# 100       # S Internal             # 0/1   # 1 = server has internal IP, 0 = server has external IP
############################################################################
# 101       # Connection type        # -     # Bitmask stating the connection type as identified by TCPL7 inspection engine (see protocol.h)
############################################################################
# 102       # P2P type               # -     # Type of P2P protocol, as identified by the IPP2P engine (see ipp2p_tstat.h)
# 103       # P2P subtype            # -     # P2P protocol message type, as identified by the IPP2P engine (see ipp2p_tstat.c)
# 104       # ED2K Data              # -     # For P2P ED2K flows, the number of data messages
# 105       # ED2K Signaling         # -     # For P2P ED2K flows, the number of signaling (not data) messages
# 106       # ED2K C2S               # -     # For P2P ED2K flows, the number of client<->server messages
# 107       # ED2K C2C               # -     # For P2P ED2K flows, the number of client<->client messages
# 108       # ED2K Chat              # -     # For P2P ED2K flows, the number of chat messages 
############################################################################
# 109       # HTTP type              # -     # For HTTP flows, the identified Web2.0 content (see the http_content enum in struct.h)
############################################################################
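
The column numbers in the table are 1-based, and for columns 1 to 44 the server-to-client counterpart is found 44 columns later. The sketch below shows one way to read a few fields from a single flow record. It assumes that fields are separated by whitespace, which this page does not state; the function name parse_flow and the dictionary keys are illustrative only.

```python
from datetime import datetime, timezone

# 1-based column numbers taken from the table above.
S2C_OFFSET = 44           # S2C column = C2S column + 44 (columns 1..44)
COL_IP = 1
COL_PACKETS = 3
COL_UNIQUE_BYTES = 7
COL_FIRST_TIME_ABS = 98   # milliseconds since the epoch


def parse_flow(line):
    """Extract a few named fields from one flow record.

    Assumption: fields are whitespace-separated.
    """
    fields = line.split()
    if len(fields) < COL_FIRST_TIME_ABS:
        raise ValueError("expected at least %d columns, got %d"
                         % (COL_FIRST_TIME_ABS, len(fields)))

    def col(n):
        # Convert the 1-based column number to a 0-based list index.
        return fields[n - 1]

    start_ms = float(col(COL_FIRST_TIME_ABS))
    return {
        "client_ip": col(COL_IP),
        "server_ip": col(COL_IP + S2C_OFFSET),
        "c2s_packets": int(col(COL_PACKETS)),
        "s2c_packets": int(col(COL_PACKETS + S2C_OFFSET)),
        "c2s_unique_bytes": int(col(COL_UNIQUE_BYTES)),
        "s2c_unique_bytes": int(col(COL_UNIQUE_BYTES + S2C_OFFSET)),
        # Column 98 is in milliseconds, so divide by 1000 to get seconds.
        "start": datetime.fromtimestamp(start_ms / 1000.0, tz=timezone.utc),
    }
```

Keeping the column numbers as named constants makes it easy to check the code against the table, since the only arithmetic is the conversion from 1-based to 0-based indices.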

Specifically for this analysis, the following extra columns were added:

############################################################################
# 110       # C2S messages           # -     # 
# 111       # S2C messages           # -     # 
# 112       # DB host_int            # -     #
# 113       # DB service             # -     #
############################################################################
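
With the four extra columns, a complete record has 113 fields. The following sketch checks that count and returns the extra columns by name. It assumes whitespace-separated fields, and because this page does not describe the value types of columns 110 to 113, the values are kept as raw strings. The function name and the dictionary keys are illustrative only.

```python
EXPECTED_COLUMNS = 113

# 1-based column numbers of the extra columns listed above.
EXTRA_COLUMNS = {
    110: "c2s_messages",
    111: "s2c_messages",
    112: "db_host_int",
    113: "db_service",
}


def extra_columns(line):
    """Return the extra columns of one flow record as raw strings.

    Assumption: fields are whitespace-separated.
    """
    fields = line.split()
    if len(fields) != EXPECTED_COLUMNS:
        raise ValueError("expected %d columns, got %d"
                         % (EXPECTED_COLUMNS, len(fields)))
    # Convert each 1-based column number to a 0-based list index.
    return {name: fields[n - 1] for n, name in EXTRA_COLUMNS.items()}
```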