Author(s):

  • Rabl, Tilmann
  • Sadoghi, Mohammad
  • Jacobsen, Hans-Arno
  • Gómez-Villamor, Sergio
  • Muntés-Mulero, Victor
  • Mankovskii, Serge

Abstract:

As the complexity of enterprise systems increases, the need for monitoring and analyzing such systems also grows. A number of companies have built sophisticated monitoring tools that go far beyond simple resource utilization reports. For example, based on instrumentation and specialized APIs, it is now possible to monitor single method invocations and trace individual transactions across geographically distributed systems. This high level of detail enables more precise forms of analysis and prediction but comes at the price of high data rates (i.e., big data). To maximize the benefit of data monitoring, the data has to be stored for an extended period of time for subsequent analysis. This new wave of big data analytics imposes new challenges, especially for application performance monitoring systems. The monitoring data has to be stored in a system that can sustain the high data rates and, at the same time, enable an up-to-date view of the underlying infrastructure. With the advent of modern key-value stores, a variety of data storage systems have emerged that are built with a focus on the scalability and high data rates that predominate in this monitoring use case. In this work, we present our experience and a comprehensive performance evaluation of six modern (open-source) data stores in the context of application performance monitoring, as part of a CA Technologies initiative. We evaluated these systems with data and workloads that can be found in application performance monitoring, as well as in on-line advertisement, power monitoring, and many other use cases. We present our insights not only as performance results but also as lessons learned and our experience relating to the setup and configuration complexity of these data stores in an industry setting.

Document:

https://doi.org/10.14778/2367502.2367512

References:
  1. Application response measurement (arm) issue 4.1 v1. http://www.opengroup.org/management/arm/.
  2. The real overhead of managing application performance. http://www.appdynamics.com/blog/2011/05/23/the-real-overhead-of-managing-application-performance/.
  3. J. Buten. Performance & security of applications, tools, sites, & social networks on the internet. In Health 2.0 Europe, 2010.
  4. R. Cattell. Scalable sql and nosql data stores. SIGMOD Record, 39(4):12–27, 2010.
  5. F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. In OSDI, pages 205–218, 2006.
  6. B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. Pnuts: Yahoo!’s hosted data serving platform. PVLDB, 1(2):1277–1288, 2008.
  7. B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with ycsb. In SoCC, pages 143–154, 2010.
  8. G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon’s highly available key-value store. In SOSP, pages 205–220, 2007.
  9. M. Driscoll. One billion rows a second: Fast, scalable olap in the cloud. In XLDB, 2011.
  10. D. Erdody. Choosing a key-value storage system (cassandra vs. voldemort). http://blog.medallia.com/2010/05/-choosing_a_keyvalue_storage_sy.html.
  11. R. Field. Java virtual machine profiling interface specification (jsr-163). http://jcp.org/en/jsr/summary?id=163.
  12. C. Gaspar. Deploying nagios in a large enterprise environment. In LISA, 2007.
  13. S. Ghemawat, H. Gobioff, and S.-T. Leung. The google file system. In SOSP, pages 29–43, 2003.
  14. J. Hugg. Key-value benchmarking. http://voltdb.com/company/blog/key-value-benchmarking.
  15. P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. Zookeeper: Wait-free coordination for internet-scale systems. In USENIX ATC, pages 145–158, 2010.
  16. L. H. Jeong. Nosql benchmarking. http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/.
  17. R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. J. Abadi. H-store: a high-performance, distributed main memory transaction processing system. PVLDB, 1(2):1496–1499, 2008.
  18. I. Konstantinou, E. Angelou, C. Boumpouka, D. Tsoumakos, and N. Koziris. On the elasticity of nosql databases over cloud management platforms. In CIKM, pages 2385–2388, 2011.
  19. A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. SIGOPS Operating Systems Review, 44(2):35–40, 2010.
  20. M. L. Massie, B. N. Chun, and D. E. Culler. The ganglia distributed monitoring system: Design, implementation, and experience. Parallel Computing, 30(7):817–840, 2004.
  21. MySQL. MySQL. http://www.mysql.com/.
  22. S. Patil, M. Polte, K. Ren, W. Tantisiriroj, L. Xiao, J. López, G. Gibson, A. Fuchs, and B. Rinaldi. Ycsb++: benchmarking and performance debugging advanced features in scalable table stores. In SoCC, pages 9:1–9:14, 2011.
  23. P. Pirzadeh, J. Tatemura, and H. Hacigumus. Performance evaluation of range queries in key value stores. In IPDPSW, pages 1092–1101, 2011.
  24. S. Sanfilippo. Redis. http://redis.io/.
  25. Z. Shao. Real-time analytics at facebook. In XLDB, 2011.
  26. B. H. Sigelman, L. A. Barroso, M. Burrows, P. Stephenson, M. Plakal, D. Beaver, S. Jaspan, and C. Shanbhag. Dapper, a large-scale distributed systems tracing infrastructure. Technical Report dapper-2010-1, Google Inc., 2010.
  27. The Apache Software Foundation. Apache Hadoop. http://hadoop.apache.org/.
  28. The Apache Software Foundation. Apache HBase. http://hbase.apache.org/.
  29. Voldemort. Project Voldemort. http://project-voldemort.com/.
  30. VoltDB. VoltDB. http://voltdb.com/.
