Cloud benchmarks

This page contains the software and data presented in:
  
* Preliminary work: [http://conferences.sigcomm.org/imc/2013/papers/imc092-dragoA.pdf "Benchmarking Personal Cloud Storage"] by Idilio Drago, Enrico Bocchi, Marco Mellia, Herman Slatman and Aiko Pras. In Proceedings of the 13th ACM Internet Measurement Conference. IMC 2013.
  
* Extended version: [http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=7096995 "Personal Cloud Storage Benchmarks and Comparison"] by Enrico Bocchi, Idilio Drago, Marco Mellia. In IEEE Transactions on Cloud Computing.
  
More results of our work on personal cloud storage can be found [[Dropbox Traces|on this page]] and [[Dropbox Crawler|on this page]].
The slides of the presentation can be downloaded from [http://www.simpleweb.org/w/images/0/07/Talk_imc2013.pdf here].
  
 
== Benchmark Scripts ==
  
* [http://traces.simpleweb.org/cloud_benchmarks/benchmark_tools.tar.gz Download the scripts for running the benchmarks and post-processing the data]
  
The scripts are written in Python and require the netifaces and pcapy packages. See the README for instructions on how to execute the benchmarks.
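The benchmarks upload files of a given size generated from a fixed seed (the --seed and --bytes options of delta_encoding.py). Below is a minimal sketch of such reproducible payload generation using only the Python standard library; the function name and generation scheme are illustrative, not taken from the scripts:

```python
import random


def make_payload(seed: int, n_bytes: int) -> bytes:
    """Return n_bytes of pseudo-random data derived only from seed.

    The same (seed, n_bytes) pair always produces the same bytes, so a
    benchmark run can be repeated with identical input files.  Random
    data is also essentially incompressible, which keeps client-side
    compression from skewing transfer measurements.
    """
    rng = random.Random(seed)
    return rng.randbytes(n_bytes)  # requires Python 3.9+


if __name__ == "__main__":
    a = make_payload(123134, 10000)  # same values as the example invocation
    assert a == make_payload(123134, 10000) and len(a) == 10000
    print("payload generation is reproducible")
```

Writing such a payload into the monitored synchronized folder (or transferring it over FTP, as the --ftp options of the scripts suggest) then gives repeatable transfer experiments.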
  
=== How to execute the benchmarks ===

 sudo ./delta_encoding.py -i wlan0 --seed 123134 --bytes 10000 --test 3 -o /tmp/output/ --ftp 1.1.1.1 --port 2121 --user "user_name" --passwd "password" --folder="."

Important remarks:

1 - The folder <nowiki>ftp://user:pass@server/folder/</nowiki> '''must''' be in a synchronized folder of the storage tool.

2 - The file delta_encoding.py '''must not''' be in a synchronized folder; otherwise, the .pyc files created at run-time will disturb the experiment.

3 - The folder /tmp/output/ '''must not''' be in a synchronized folder, for the same reasons as above.

4 - Disable as many processes as possible on the benchmarking machine. This will minimize external interference with the test.

5 - If the storage system is running on a virtual machine, make sure the host machine is powerful enough to support the load. Check also whether the virtual machine limits or shapes the network traffic.

== Traces ==

Some traffic traces that generated the results in the IMC paper can be downloaded from these links:

{| class="wikitable" style="text-align: center; width: 400px; height: 100px;"
|-
! scope="col" | Provider
! scope="col" | File Size
|-
! scope="row" | [http://traces.simpleweb.org/cloud_benchmarks/cloud_drive.tar.gz Amazon Cloud Drive]
| 197M
|-
! scope="row" | [http://traces.simpleweb.org/cloud_benchmarks/dropbox.tar.gz Dropbox]
| 88M
|-
! scope="row" | [http://traces.simpleweb.org/cloud_benchmarks/gdrive.tar.gz Google Drive]
| 70M
|-
! scope="row" | [http://traces.simpleweb.org/cloud_benchmarks/skydrive.tar.gz Microsoft SkyDrive]
| 69M
|-
! scope="row" | [http://traces.simpleweb.org/cloud_benchmarks/wuala.tar.gz LaCie Wuala]
| 63M
|}

These traces produce the results in Figure 7 of the paper. More details about this dataset can also be found in Chapter 5 of:

* Drago, I. (2013) [http://eprints.eemcs.utwente.nl/24136/ "Understanding and Monitoring Cloud Services"]. PhD thesis, University of Twente. CTIT Ph.D. Thesis Series No. 13-279. ISBN 978-90-365-3577-9.

== Traffic Identification Example ==

The identification of cloud storage traffic in the testbed can be done using the FQDNs that cloud storage clients use to contact the servers. This can be achieved, for example, by means of the methodology implemented in Tstat DN-Hunter. [http://www.simpleweb.org/w/images/3/3c/Cloud_storage_fqdn.txt.zip Here we post a list of FQDNs used by three popular providers as an example].
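The FQDN-based identification described above can be sketched in a few lines of Python: take each flow's server FQDN (e.g., recovered from DNS responses, as DN-Hunter does) and map it to a provider by longest matching domain suffix. The suffix table below is an illustrative placeholder, not the posted FQDN list:

```python
def classify_fqdn(fqdn, suffixes):
    """Map a server FQDN to a provider via longest domain-suffix match.

    Returns the provider name, or None when no known suffix matches.
    """
    fqdn = fqdn.lower().rstrip(".")
    best, best_len = None, -1
    for suffix, provider in suffixes.items():
        if (fqdn == suffix or fqdn.endswith("." + suffix)) and len(suffix) > best_len:
            best, best_len = provider, len(suffix)
    return best


# Illustrative entries only -- see the linked file for FQDNs actually observed.
SUFFIXES = {
    "dropbox.com": "Dropbox",
    "drive.example.test": "ExampleDrive",
}

if __name__ == "__main__":
    print(classify_fqdn("client1.dropbox.com", SUFFIXES))  # Dropbox
    print(classify_fqdn("www.unrelated.org", SUFFIXES))    # None
```

A complete classifier additionally needs the DNS-to-flow association (which DN-Hunter reconstructs from the traffic itself) and the full suffix list for each provider.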
  
 
== Acceptable Use Policy ==

* When writing a paper using software or data from this page, please cite:

 @inproceedings{drago2013_imc,
   author        = {Idilio Drago and Enrico Bocchi and Marco Mellia and Herman Slatman and Aiko Pras},
   title         = {Benchmarking Personal Cloud Storage},
   booktitle     = {Proceedings of the 13th ACM Internet Measurement Conference},
   series        = {IMC'13},
   pages         = {205--212},
   year          = {2013}
 }
  
== Paper abstract ==

Personal cloud storage services are data-intensive applications that already produce a significant share of Internet traffic. Several solutions offered by different companies attract more and more people. However, little is known about each service's capabilities, architecture and - most of all - the performance implications of its design choices. This paper presents a methodology to study cloud storage services. We apply our methodology to compare 5 popular offers, revealing different system architectures and capabilities. The implications of the different designs on performance are assessed by executing a series of benchmarks. Our results show no clear winner, with all services suffering from some limitations or having potential for improvement. In some scenarios, the upload of the same file set can take seven times longer, wasting twice as much capacity. Our methodology and results are thus useful as both a benchmark and a guideline for system design.

== Acknowledgments ==

This work was partly funded by the Network of Excellence project Flamingo (ICT-318488) and the EU-IP project mPlane (n-318627). Both projects are supported by the European Commission under its Seventh Framework Programme.
