Monday, August 6, 2018

Notes on SnappyData

  • A new IMDG (in-memory data grid) focused on OLTP + OLAP and streaming
  • the core idea is that they combine Spark + GemFire
    • Spark for the OLAP side
    • GemFire for the OLTP side
    • When data is imported, it can be stored in row-based or column-based tables
  • how do they handle updates to column-based data?
    • they use delta row updates (merging updates in memory) and lazily apply them to the column-based data; see the sketch below
  • all data is stored in memory, and can overflow to disk when memory is tight
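A minimal sketch of that idea in Python (the class and names are my own illustration, not SnappyData's actual code): writes land in a row-oriented delta buffer, and a lazy merge folds them into the column-oriented layout before scans.

class ColumnTable:
    def __init__(self, columns):
        self.columns = {c: [] for c in columns}   # column store: name -> list of values
        self.delta = []                           # row-based delta buffer awaiting merge

    def insert(self, row):
        # OLTP-style write: a cheap append to the in-memory delta buffer
        self.delta.append(row)

    def merge_delta(self):
        # Lazy update: fold the buffered rows into the columnar layout
        for row in self.delta:
            for name, values in self.columns.items():
                values.append(row.get(name))
        self.delta.clear()

    def scan(self, column):
        # OLAP-style read: merge first so the column is complete
        self.merge_delta()
        return self.columns[column]

t = ColumnTable(["id", "amount"])
t.insert({"id": 1, "amount": 10.0})
t.insert({"id": 2, "amount": 5.5})
print(sum(t.scan("amount")))   # 15.5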


Sincerely, Yuan

Sunday, October 9, 2016

Notes on MySQL on Ceph

Key takeaways:
-          Ceph works well as IOPS-intensive cloud storage for MySQL and similar databases.
-          LAMP is the most common application stack on OpenStack; MySQL on Ceph provides lower TCO than AWS
-          MySQL performance on SSD-backed Ceph (RBD + filestore) looks quite good.
-          For reads, containerized MySQL has performance parity with the bare-metal setup and is much better than the VM-based setup
-          For writes, performance is almost the same across the 3 setups (bottleneck in filestore?). Tuning the QEMU IO path does not help the write side.




Some detailed notes:

·         Diverse organizations are now looking to model the very successful public cloud Database-as-a-Service (DBaaS) experience with private or hybrid clouds, using their own hardware and resources.
·         Any storage for MySQL must provide low latency IOPS throughput
·         People can choose SSD-backed Ceph RBD at a price point that is even more favorable than public cloud offerings
·         Cinder driver trends: RBD is the default reference driver since Apr 2016, along with LVM
·         Pros and cons for different configurations of MySQL deployment
·         Hardware configuration for OSD server:
o   networking: 10Gb
o   memory: 32GB
o   CPU: E5-2650 V4
o   OSD storage: P3700 800GB x 1
o   OS storage: S3510 80GB Raid1
·         Architecture considerations for MySQL on Ceph
·         Public cloud performance (SSD)
·         Tests showed one NVMe disk per 10 cores can provide an ideal level of performance
·         Ceph provides much lower TCO than AWS
·         Testing Env
o   5x OSD nodes
§  2x E5-2650 v3 (10 cores)
§  32GB memory
§  2x 80GB OS storage
§  4x P3700 (800GB) for OSD storage
§  10Gb NIC x2
§  8x 6TB SAS (not used?)
o   3x monitors
§  2x E5-2630 v3 (8 cores)
§  64GB mem
§  2x 80GB for OS storage
§  Internal P3700 (800GB)
§  10Gb x2
o   12x Clients
§  2x E5-2670 v2
§  64GB mem
o   Software tuning/config:
§  sysbench
§  Percona MySQL server
·         buffer pool: 20% of the dataset (12.8GB?)
·         flushing: innodb_flush_log_at_trx_commit = 0
·         parallel doublewrite buffer: based on Percona's tests, the performance is better than community MySQL (https://www.percona.com/blog/2016/05/09/percona-server-5-7-parallel-doublewrite/ )
§  Ceph:
·         network stack:
o   net.ipv4.ip_forward=1
net.core.wmem_max=125829120
net.core.rmem_max=125829120
net.ipv4.tcp_rmem= 10240 87380 125829120
net.ipv4.tcp_wmem= 10240 87380 125829120
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.core.netdev_max_backlog = 10000
vm.swappiness=1
·         filestore:
o   filestore_xattr_use_omap = true
filestore_wbthrottle_enable = false
filestore_queue_max_bytes = 1048576000
filestore_queue_committing_max_bytes = 1048576000
filestore_queue_max_ops = 5000
filestore_queue_committing_max_ops = 5000
filestore_max_sync_interval = 10
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
filestore_op_threads = 6
·         Journal:
o   journal_max_write_entries = 1000
journal_queue_max_ops = 3000
journal_max_write_bytes = 1048576000
journal_queue_max_bytes = 1048576000
·         MISC:
o   # Op tracker
osd_enable_op_tracker = false
# OSD Client
osd_client_message_size_cap = 0
osd_client_message_cap = 0
# Objector
objecter_inflight_ops = 102400
objecter_inflight_op_bytes = 1048576000
# Throttles
ms_dispatch_throttle_bytes = 1048576000
# OSD Threads
osd_op_threads = 32
osd_op_num_shards = 5
osd_op_num_threads_per_shard = 2
·         Testing results:
o   Performance comparison of MySQL on
§  baremetal + KRBD
§  containers + KRBD
§  VM + QEMU RBD + various QEMU IO settings (QEMU event loop optimization across multiple VMs: see https://github.com/qemu/qemu/blob/master/docs/multiple-iothreads.txt, http://blog.vmsplice.net/2011/03/qemu-internals-overall-architecture-and.html )

Thanks, -yuan


Monday, May 9, 2016

quick tests of Swift on Go

So I did some quick tests of Swift on Go over the weekend; here are the detailed steps

  • deploy the golang env
    • wget the latest golang package
    • set GOPATH
    • add go to PATH
  • Set up SAIO
    • stop proxy/object server
  • git checkout feature/hummingbird
    • make get test all
    • make install
  • stop the proxy/object servers, and start the hummingbird ones
    • hummingbird start proxy
    • hummingbird start object
  • looks like the current code is still not stable
    • PUT hangs on some failures: index out of range

Sincerely, Yuan

Friday, January 30, 2015

Using Multiple Backends in OpenStack Swift


OpenStack Swift is a highly available, distributed, eventually consistent object/blob store. Object Storage is ideal for cost-effective, scale-out storage. It provides a fully distributed, API-accessible storage platform that can be integrated directly into applications or used for backup, archiving, and data retention. For more information please refer to http://docs.openstack.org/developer/swift/.
Since v2.0 Swift supports multiple storage policies, which allow for some level of segmenting the cluster for various purposes through the creation of multiple object rings. There is a separate ring for account databases and container databases, and there is also one object ring per storage policy. By supporting multiple object rings, Swift allows the application and/or deployer to essentially segregate the object storage within a single cluster. However, since the Juno release Swift has another great feature: support for pluggable backends. Thanks to the highly abstracted DiskFile API in the object server, storage vendors are able to use different backend storage to store objects easily. There are several common things among these projects:
  • These projects are implemented as new WSGI applications for the object server. Swift's DiskFile abstraction is the engine of these multiple-backend solutions.
  • These projects are trying to leverage the Swift/S3 APIs to be able to join the object storage market or the OpenStack ecosystem.
  • Currently these projects are mostly in a POC state and not very active in development.

Local disk backend

By default Swift will use local disks as the storage devices in object servers. In this implementation, a user-uploaded file will be stored as an individual file in the local filesystem on top of the disks. The metadata will be stored as filesystem extended attributes along with the file. This requires a filesystem that supports extended file attributes, like XFS or ext4.

The DiskFile API in the object server is a set of REST-like interfaces, such as READ, WRITE and DELETE. In this local disk backend, these interfaces are mostly implemented with the POSIX APIs; e.g., a WRITE request will eventually call os.write() in Python.
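As a rough illustration (not the actual Swift DiskFile code; the path and metadata format below are assumptions), a local-disk PUT boils down to writing the body to a file and attaching the metadata as an extended attribute on a Linux filesystem:

import json
import os

def put_object(path, body, metadata):
    # Write the body to a temp file, attach metadata as an xattr, fsync,
    # then atomically move it into place (real Swift also uses hashed
    # directories, timestamped filenames and pickled metadata).
    tmp = path + '.tmp'
    with open(tmp, 'wb') as f:
        f.write(body)
        os.setxattr(f.fileno(), 'user.swift.metadata',
                    json.dumps(metadata).encode('utf-8'))
        os.fsync(f.fileno())
    os.rename(tmp, path)

put_object('/srv/node/sdb1/demo.data', b'hello',
           {'Content-Type': 'text/plain'})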
To use this backend, you only need to copy the sample object-server.conf. Note that the default WSGI Application should be:

[app:object-server]
use = egg:swift#object



The other backend solutions need to implement these interfaces on top of their own storage APIs.

In-memory backend

This is a sample implementation that ships with the Swift distribution. In this implementation, a user-uploaded file will be stored in an in-memory hash table (a Python dict), along with its metadata. Each key is the combination of account, container and object name. The corresponding value is the contents of the object plus its metadata.


filesystem[name] = (data, metadata)


A PUT request in DiskFile is then a simple Python dict update; a toy version of the idea is sketched below. This solution is currently more of a prototype and is not suitable for production environments, since all of the data will be lost if the object servers are shut down.
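A toy version (the helper names are mine, not the real in-memory DiskFile):

# Hypothetical in-memory "object store": one dict keyed by the object path.
filesystem = {}

def put_object(account, container, obj, data, metadata):
    name = '/%s/%s/%s' % (account, container, obj)
    filesystem[name] = (data, metadata)   # both are gone after a restart

def get_object(account, container, obj):
    return filesystem['/%s/%s/%s' % (account, container, obj)]

put_object('AUTH_test', 'photos', 'cat.jpg', b'...', {'Content-Type': 'image/jpeg'})
data, metadata = get_object('AUTH_test', 'photos', 'cat.jpg')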
To use this backend, you need to change the default WSGI Application in object-server.conf to:


[app:object-server]
use = egg:swift#mem_object


And then restart the object servers.

Swift-Ceph backend

Currently this is a stackforge project initiated by eNovance. This implementation uses Ceph as the storage backend for Swift. The Swift object rings are configured with only 1 replica, while Ceph can be configured with 3 replicas. This means that from the view of Swift only one copy of each object is stored in the cluster, but the Ceph cluster keeps 3 copies and does the consistency/replication work. The general design is a new class derived from DiskFile which translates Swift reads/writes into RADOS object reads/writes using librados. An object in Swift will be stored as a RADOS object in Ceph, with its name being the combination of account, container and object name. The account/container DBs are still stored in Swift as before for now; the project also plans to store these SQLite DBs in Ceph later.
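A hedged sketch of that translation using the librados Python bindings (the pool name and object-naming scheme here are my assumptions, not necessarily the project's actual layout):

import json
import rados

# Connect to the Ceph cluster described in ceph.conf (pool name is illustrative).
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('swift-objects')

def put_object(account, container, obj, body, metadata):
    # One Swift object maps to one RADOS object named account/container/object.
    name = '/%s/%s/%s' % (account, container, obj)
    ioctx.write_full(name, body)
    ioctx.set_xattr(name, 'user.swift.metadata',
                    json.dumps(metadata).encode('utf-8'))

def get_object(account, container, obj, length=4 << 20):
    return ioctx.read('/%s/%s/%s' % (account, container, obj), length)

put_object('AUTH_test', 'photos', 'cat.jpg', b'...', {'Content-Type': 'image/jpeg'})
print(get_object('AUTH_test', 'photos', 'cat.jpg'))

ioctx.close()
cluster.shutdown()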

[Figure: Swift-Ceph backend architecture]

This solution is implemented as a WSGI application. To use this backend, you need to install the swift-ceph-backend project and change the default WSGI Application in object-server.conf to

[app:object-server]
use = egg:swift_ceph_backend#rados_object


And then restart the object servers.

Swift-On-File backend

The Swift-on-File project is also a stackforge project, started by Red Hat. Currently it is a Swift object server implementation that enables users to access the same data both as an object and as a file. Data can be stored and retrieved through Swift's REST interface or as files from NAS interfaces including native GlusterFS, NFS and CIFS.

To use this backend, you need to install swiftonfile project and then change the default WSGI Application in object-server.conf to

[app:object-server]
use = egg:swiftonfile#object

You also need an NFS partition or GlusterFS volume mounted at /mnt/swiftonfile

The object ring is then configured with 1 replica only; all of the consistency/replication work is handled in the GlusterFS/NFS layer.
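The appeal is that the on-disk layout mirrors the Swift namespace, so the same data is reachable as a plain file. Roughly like this (an illustration only; the actual swiftonfile directory layout may differ):

import os

MOUNT = '/mnt/swiftonfile'   # the NAS mount from above

def object_path(account, container, obj):
    # The Swift object name maps to an ordinary file path, so NFS/GlusterFS
    # clients see the same data as files.
    return os.path.join(MOUNT, account, container, obj)

print(object_path('AUTH_test', 'photos', 'cat.jpg'))
# /mnt/swiftonfile/AUTH_test/photos/cat.jpg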

Seagate Kinetic backend

Swift over Seagate is a project started by SwiftStack and Seagate. Currently it is still experimental, built on the beta Kinetic library. Swift clusters using Kinetic drives allow access to any drive and, thus, any object. For the current Kinetic integration, a fraction of the object server commands (the object daemon) are embedded within the proxy server as a logical construct, shown below.
[Figure: current Swift/Kinetic integration: https://developers.seagate.com/download/attachments/1769521/Current%20Swift%3AKinetic.jpg?version=1&modificationDate=1381516991000&api=v2]
There are some other deployment options with Kinetic devices. As this project is still under development, there is not much documentation ready; you need to check the latest code for the details.





Tuesday, November 19, 2013

Nov'13 OS Design Summit Notes on Swift

*Storage Policies
John introduced the recent changes on the Swift side: DiskFile, DBBroker and the Storage Policies feature. With this feature, cloud service providers can offer more flexible services, like auto-tiering. Paul gave a demo showing a cluster serving with 2 and 3 replicas at the same time. People showed great interest in this feature. I think this should be the most important feature in Icehouse.

 

* Swift profiling middleware and tools

Zhang Hua from IBM Beijing demonstrated a profiling tool for Swift. It is actually a middleware that collects all of the function calls into a temp file and then does post-processing to get the CPU/memory/IO time and latency. It should be quite useful for developers and system administrators to find the bottleneck of the current setup. I asked about the overhead, and it seems OK if you are investigating large I/O patterns. In addition, this tool can do the latency breakdown work, which is exactly what we want!

 

*Swift drive workloads and Kinetic Open Storage

Seagate shared their new disk tech: a normal SATA disk with a small controller add-on (something like a NAS product). They are trying to eliminate the filesystem layer by using NoSQL-style get/put over the network. There were lots of concerns about the functionality missing on the NoSQL side compared to a filesystem. I asked about the Swift integration, and it seems Swift will only use this for the object server; the account/container servers will remain the same. I think this could be a good direction, but it will take a long time to become usable.

 

*Plugging in Backend Swift Services

Clay showed some key code on how to use the DiskFile/DBBroker classes in the Swift-Kinetic case. It seems quite easy, but I guess the code is not actually working yet. Later I met Caitlin from Nexenta, who is the author of the first version of LFS for Swift. Her opinion is that Swift is not very efficient even if you have a powerful backend, and she said they have given up on this. I think this is a great feature that allows other storage vendors to plug their products into Swift. Also, some of the performance issues might get fixed, like the LOSF (lots of small files) issue.

 

*Swift operation experiences with hot contents

One engineer from KT shared a lot of experience on how they cache 'hot content' at several levels: CDN, proxy server, object server. They are proposing some middleware inside the object server to cache objects dynamically, and they are improving SOS to work better and more easily with CDNs. The key issue is how to identify hot content; they have not found a good way to do this and are asking for help. Actually this is the key question we met in the POC version on the EC side.

link: https://etherpad.OpenStack.org/p/Swift-kt

 

*Supporting a global online messaging service

An engineer from Line started with some of the key requirements they have for Swift. Most of the requirements are not within the scope of Swift, so I guess some external software is needed to handle them. E.g., they want to support 1 billion users, and the authentication part became the bottleneck before requests even went to Swift; they are currently using a custom application with Redis. In general this was a good session and lots of requirements were raised.

link: https://etherpad.OpenStack.org/p/Swift-for-messenger

 

* Making Swift more robust in handling failures

Chuck from RAX shared some thoughts on Swift error handling. He is proposing that we should improve the reliability and minimize the consistency window. Overall this is great work on the consistency services. I also mentioned our work on Flashcache to him, and he said some similar work is ongoing inside RAX.

link: https://etherpad.OpenStack.org/p/icehouse_Swift_robust

 

* Metadata search

Engineers from HP did some great work. They patched Swift to store account/container/object metadata in a NoSQL database, and they encourage the community to work on this. It seems that this needs some middleware work.

link: https://etherpad.OpenStack.org/p/icehouse-Swift-metadata-search


Sincerely, Yuan

Wednesday, December 26, 2012

Domain changed to zhouyuan.me

After three years on the old domain, I have finally switched to a better one.
A new starting point, and new progress to come.


life4rent
@Shanghai