Dropbox Crawler

Revision as of 14:27, 7 January 2013 by Idiliod (talk | contribs)

Personal cloud storage is becoming more and more popular, and Dropbox is certainly the best-known example. Cloud storage already generates a huge amount of Internet traffic. Understanding how people interact with such applications is therefore essential for designing efficient cloud storage systems.

We have been doing research on the usage of Dropbox (see our results here). As a next step, we need to know what types of files people store in the service. This would allow us to understand the impact of some technologies on system performance and on network traffic, among other things. For that, we need volunteers to provide us with basic statistics (size, type, etc.) about the files stored in their folders.

Be part of the crowd: Help our research

All you need to do is run a Java application on your PC. This application will read your Dropbox folder, calculate some statistics, show everything to you for approval and, only after that, send the statistics to us.

  • Most people will be able to run the application by clicking here
  • In case your browser does not support that, you can download the package and run it: Just double click on it!


What will be captured?

What we do:

  • We read your entire Dropbox folder.
  • We collect basic statistics (the log format is described in the Format section below).
  • We send these statistics to our web server.


What we DO NOT do:

  • We do not copy any file content.
  • We do not copy file or folder names.
  • We do not copy any personal information.
  • We do not install or store anything on your computer.
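As a rough illustration of the reading step, the Java sketch below walks a folder and keeps only the size and last-modification time of each regular file. It is not the code of our client: the class `FolderScanSketch`, the holder class `FileStats`, and the choice of Unix seconds for timestamps are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch, not the actual client: collects metadata only.
class FolderScanSketch {

    // One entry per file. No content and no name is stored.
    static final class FileStats {
        final long sizeBytes;
        final long modifiedUnixSeconds; // assumed unit for this example

        FileStats(long sizeBytes, long modifiedUnixSeconds) {
            this.sizeBytes = sizeBytes;
            this.modifiedUnixSeconds = modifiedUnixSeconds;
        }
    }

    // Recursively visits every entry under root and records regular files.
    static List<FileStats> scan(Path root) throws IOException {
        List<FileStats> stats = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                BasicFileAttributes attrs =
                        Files.readAttributes(p, BasicFileAttributes.class);
                if (attrs.isRegularFile()) {
                    stats.add(new FileStats(
                            attrs.size(),
                            attrs.lastModifiedTime().toMillis() / 1000));
                }
            }
        }
        return stats;
    }

    public static void main(String[] args) throws IOException {
        // The folder path is passed on the command line.
        List<FileStats> stats = scan(Path.of(args[0]));
        System.out.println("Regular files found: " + stats.size());
    }
}
```

Only file attributes are read here; the files themselves are never opened, which matches the intent of the lists above.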


Client source code

Download the Java source code used to capture file information. The project can be opened directly in NetBeans, version 7.2.1.


Policy

We ensure that:

All data we collect are anonymized. We do not copy any file content. We do not collect any personal information or file/directory names.


We will also make our data publicly available in the near future, so that anyone will be able to use this important data source.

Format

All files are in a simple format. Each line holds the attributes of one file, separated by #.

The following columns are found in these traces:

####################################################################################################
#     # Short description # Unit  # Long description                                               #
####################################################################################################
#  1  # Length            # bytes # File size in bytes                                             #
#  2  # Modified          # -     # Last modification time of the file (Unix date/time format)     #
#  3  # MIME              # -     # File MIME type (using Magic Java Unit)                         #
#  4  # Extension         # -     # File extension (substring after the last "." in the file name) #
#  5  # MD5               # -     # MD5 hash of the first and last 8 bytes of the file             #
#  6  # MD5 of the name   # -     # MD5 hash of the file name string                               #
####################################################################################################
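To make the format concrete, the Java sketch below builds one such line for a single file. It is only an illustration under stated assumptions, not the code of our client: the class and method names are hypothetical, `Files.probeContentType` stands in for the MIME detection, the fifth column is computed over the first 8 bytes followed by the last 8 bytes (the whole file if it has 16 bytes or fewer), the name is hashed as UTF-8, and the modification time is written in Unix seconds.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of how one trace line could be produced.
class TraceLineSketch {

    // Hex-encoded MD5 digest of the given bytes.
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Substring after the last "." of the name; empty if there is no ".".
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    // ASSUMPTION: first 8 bytes followed by last 8 bytes of the file;
    // files of 16 bytes or fewer are returned whole.
    static byte[] headAndTail(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long len = raf.length();
            if (len <= 16) {
                byte[] all = new byte[(int) len];
                raf.readFully(all);
                return all;
            }
            byte[] buf = new byte[16];
            raf.readFully(buf, 0, 8);
            raf.seek(len - 8);
            raf.readFully(buf, 8, 8);
            return buf;
        }
    }

    // Builds the six columns in the order of the table, joined by "#".
    static String traceLine(Path file)
            throws IOException, NoSuchAlgorithmException {
        String name = file.getFileName().toString();
        String mime = Files.probeContentType(file); // stand-in; may be null
        long modified = Files.getLastModifiedTime(file).toMillis() / 1000;
        return String.join("#",
                Long.toString(Files.size(file)),
                Long.toString(modified),
                mime == null ? "unknown" : mime, // placeholder chosen here
                extension(name),
                md5Hex(headAndTail(file)),
                md5Hex(name.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(traceLine(Path.of(args[0])));
    }
}
```

The file name itself never appears in the output; only its MD5 hash and its extension do.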


More information

  • You can find more information on our previous work about Dropbox:

Drago, I., Mellia, M., Munafò, M. M., Sperotto, A., Sadre, R., and Pras, A. (2012). Inside Dropbox: Understanding Personal Cloud Storage Services. Proceedings of the 12th ACM Internet Measurement Conference (IMC'12), Boston, Nov. 2012.

  • This page has more information about the data we used in our research so far.

External Links

These institutes are running this research: