https://www.datadoghq.com/2014/03/monitoring-couchbase-performance-datadog/

 

Monitor key Couchbase metrics

March 31st, 2014 by Justin Slattery


Justin Slattery (@jdslatts) is the Sr. Director of Software Development at MLS Digital and runs his own blog, fzysqr.

At Major League Soccer, we have been using Datadog in production for almost a year. Datadog has become our exclusive performance monitoring and graphing tool because it strikes the right balance between ease of use, flexibility, and extensibility, and it gives our team tremendous leverage.

We love the fact that the Datadog team decided to make their agent an open-source project. This makes it super simple to create your own custom checks and contribute them back to the community. We did just that six months ago when we wrote a new check for Couchbase. The Couchbase integration we developed was based on the existing CouchDB check; it simply iterates through every metric available through the Couchbase REST API and reports each one to Datadog.
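To give a sense of how little code that takes, here is a minimal sketch of a check along those lines. It is illustrative only: it assumes the agent's AgentCheck base class, the standard Couchbase bucket-stats REST endpoints, and the requests library, and the server URL and credentials are placeholders. The real integration does considerably more validation and error handling.

import requests

from checks import AgentCheck  # base class shipped with the Datadog agent


class CouchbaseCheck(AgentCheck):
    def check(self, instance):
        # Placeholders: point these at your own cluster in the check's config file.
        base_url = instance.get('server', 'http://localhost:8091')
        auth = (instance.get('user', 'Administrator'),
                instance.get('password', 'password'))

        buckets = requests.get(base_url + '/pools/default/buckets',
                               auth=auth).json()
        for bucket in buckets:
            name = bucket['name']
            stats = requests.get(
                '%s/pools/default/buckets/%s/stats' % (base_url, name),
                auth=auth).json()

            # Every numeric series the REST API exposes lives under op.samples;
            # report the latest sample as a gauge, tagged by bucket.
            for metric, values in stats['op']['samples'].items():
                if values:
                    self.gauge('couchbase.by_bucket.%s' % metric,
                               values[-1], tags=['bucket:%s' % name])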

What is Couchbase?

If you haven’t heard of it before, Couchbase is a distributed NoSQL database. Despite a similar name and shared heritage, Couchbase is a very different product from the more widely recognized CouchDB. I won’t go into the differences between the two here, but Couchbase is certainly worth checking out. We have built several products on top of it, including our API and our real-time match center, Golazo.


Being able to monitor and profile Couchbase performance alongside our application metrics has been critical to identifying and resolving performance and availability issues in our products.

Key Couchbase Metrics to Monitor

To monitor Couchbase efficiently, we need two different perspectives: the cluster as a whole and individual application buckets.

1. At the cluster level, we want to identify which buckets are consuming the most resources.

2. At the application level, we want to know how many requests are not handled by upstream caching and instead trigger Couchbase operations.

For cluster monitoring, we break metrics out by bucket so we can identify which buckets are under the most load. For application monitoring, we filter down to the appropriate buckets.

With Datadog we monitor the following metrics. For each metric you will find a short summary of what it measures, how to query for it in Datadog, and an example to illustrate it.
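Each "In Datadog" line below is the expression we drop into a graph in the Datadog UI. If you prefer to pull the same series programmatically, a rough sketch using the Datadog Python client might look like the following; the package name, API call, and keys here are assumptions rather than exactly what we run.

import time

from datadog import initialize, api

# Placeholders: use your own Datadog API and application keys.
initialize(api_key='YOUR_API_KEY', app_key='YOUR_APP_KEY')

now = int(time.time())
resp = api.Metric.query(
    start=now - 3600,
    end=now,
    query='avg:couchbase.by_bucket.ops{*} by {bucket}',
)

# One series comes back per bucket; print the latest (timestamp, value) pair.
for series in resp.get('series', []):
    print('%s -> %s' % (series['scope'], series['pointlist'][-1]))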

Operations per second

In Datadog: couchbase.by_bucket.ops by {bucket}

What this measures: This metric measures the total number of gets, sets, incrs, and decrs happening on the bucket each second. It does not include any view operations. It makes it easy to see which app/bucket is getting the most traffic and is helpful for capacity planning and issue triage.

[Graph: Easy to see which app/bucket is getting the most traffic.]

View operations per second

In Datadog: couchbase.by_bucket.couch_views_ops by {bucket}

What this measures: In Couchbase, views are indexes defined by precomputed MapReduce functions. This metric measures how many reads per second the views in each bucket are serving.

[Graph: What app is abusing views the most?]

Current connections

In Datadog: couchbase.by_bucket.curr_connections by {host}

What this measures: This metric simply counts the number of connections per host. We use it to make sure we don’t have anything unexpected in our environment configuration, such as forgetting to add one of the Couchbase nodes to the load balancer.

Total objects

In Datadog: couchbase.by_bucket.curr_items by {bucket}

What this measures: This metric counts the total number of stored objects per bucket. We watch it to track the growth rate of our buckets. A few of our buckets should never grow beyond a few thousand objects, so increasing numbers on this graph would be a warning sign.

We actually just caught a serious problem in Golazo thanks to this metric. A runaway process started adding new objects to the bucket at an alarming rate. The graph below helped us catch the issue before it could cause an outage.

[Graph: Uh-oh, something doesn’t look right here…]

Resident item ratio

In Datadog: couchbase.by_bucket.vb_active_resident_items_ratio by {bucket}

What this measures: This number represents the percentage of a bucket’s active items that are resident in memory rather than only on disk.

The expected value of this metric will vary by application. We expect some of our apps to stay around 100% and others to hover closer to 10%. Ideally you want this metric as close to 100% as possible so that your app’s most active objects are “hot” and won’t trigger a (much) slower disk read when requested.

[Graph: The higher, the better, but each app will be different.]

Memory Headroom

In Datadog: couchbase.by_bucket.ep_mem_high_wat by {bucket} - couchbase.by_bucket.mem_used by {bucket}

What this measures: If the memory used reaches the high water mark, Couchbase starts ejecting active objects. Keeping track of the gap between the two gives you an indication of when you need to allocate more memory to a bucket. The bright line below shows that one of our buckets has no headroom. Not good.

[Graph: One of these buckets has run out of memory…]
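If you want to double-check this number straight from Couchbase, both values are in the bucket stats that the REST API returns. A rough sketch, with the endpoint and field names matching what our check reads and the URL and credentials as placeholders:

import requests

# Placeholders: adjust the URL and credentials for your cluster.
BASE = 'http://localhost:8091'
AUTH = ('Administrator', 'password')


def memory_headroom(bucket):
    stats = requests.get(
        '%s/pools/default/buckets/%s/stats' % (BASE, bucket), auth=AUTH).json()
    samples = stats['op']['samples']
    # Headroom = bytes remaining before the bucket hits its high water mark
    # and Couchbase starts ejecting active objects.
    return samples['ep_mem_high_wat'][-1] - samples['mem_used'][-1]


print(memory_headroom('default'))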

Cache miss ratio

In Datadog: (couchbase.by_bucket.ep_bg_fetched by {bucket} / couchbase.by_bucket.cmd_get by {bucket}) * 100

What this measures: This composite metric measures the ratio of requested objects that are fetched from disk as opposed to memory. This number should be as close to 0 as possible. You can use it in conjunction with the resident item ratio and memory headroom metrics to understand whether your bucket has enough capacity to keep the most requested objects in memory.

The example below shows what it looks like when a bucket starts to run out of capacity to keep all active items in memory. This is the same bucket as above.

[Graph: Anything above zero here is a warning sign.]
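As a quick worked example of the formula, with made-up numbers:

# Worked example of the cache miss ratio formula (numbers are illustrative).
ep_bg_fetched = 5.0    # disk fetches per second
cmd_get = 1000.0       # get operations per second

cache_miss_ratio = ep_bg_fetched / cmd_get * 100
print('%.2f%%' % cache_miss_ratio)   # 0.50%; anything well above zero is a warning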

Disk reads per second

In Datadog: couchbase.by_bucket.ep_bg_fetched by {bucket}

What this measures: This metric is the raw number of disk fetches per second. This number is used in our cache miss rate calculation (above), but is worth watching on its own as well so that it is not masked by a higher number of gets per second. Again, this is the same bucket as above.

[Graph: Disk reads should average zero for a healthy bucket.]

Ejections

In Datadog: couchbase.by_bucket.ep_num_value_ejects by {bucket}

What this measures: This measures the number of objects being ejected from memory for the bucket. Any spike in this value could indicate that something is wrong, such as unexpected memory pressure on that bucket.

The example below shows what this looks like when it happens. This is the same bucket as the previous three graphs.

[Graph: Couchbase is kicking active items out of memory to make space for new objects.]

Disk write queue

In Datadog: couchbase.by_bucket.disk_write_queue by {bucket}

What this measures: Couchbase eventually persists all objects to disk. This queue measures how many objects are waiting to be written to disk. It should always be a low number; a queue that keeps growing over time is a sign that the cluster is unhealthy. The graph below shows a temporary spike from one of our apps during a recent deployment with data migrations. That is a non-issue as long as the queue stays low or at zero during normal load.

[Graph: One of our apps queues up rapid writes during deployment.]

Out of memory errors

In Datadog: couchbase.by_bucket.ep_tmp_oom_errors by {bucket} and couchbase.by_bucket.ep_oom_errors by {bucket}

What this measures: These two metrics measure the number of times per second that a request is rejected due to memory pressure. Temp errors mean that Couchbase is making more room by ejecting objects and the request should be retried later. Non-temp errors mean that the bucket has hit its memory quota. Non-temp errors should trigger an alarm.
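One way to wire up that alarm is with a metric monitor created through the Datadog API. The sketch below uses the Datadog Python client; the query, threshold, and notification handle are illustrative assumptions, not our production monitor.

from datadog import initialize, api

# Placeholders: use your own Datadog API and application keys.
initialize(api_key='YOUR_API_KEY', app_key='YOUR_APP_KEY')

# Multi-alert: triggers separately for each bucket that reports a
# non-temporary OOM error in the last five minutes.
api.Monitor.create(
    type='metric alert',
    query='max(last_5m):max:couchbase.by_bucket.ep_oom_errors{*} by {bucket} > 0',
    name='Couchbase bucket hit its memory quota',
    message='{{bucket.name}} is rejecting requests because it is out of memory. @your-pager-handle',
)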

Couchbase Metrics & Datadog

Couchbase has a ton of other metrics that can be monitored and the Datadog integration exposes all of them. Luckily for us, the admin GUI already displays most of these metrics visually. Simply find a metric that you want to add to Datadog and hover over it. The tooltip will tell you what specifically is getting measured. If you’d like to gain this visibility, you can try Datadog for free for 14 days.

Couchbase also has great documentation. If you’re interested in learning more about these metrics or more about how Couchbase manages its memory and active working set, I recommend reading more about its architecture.

If you are interested in learning more about MLS Digital, check out our blog!


I can never find this when I go looking for it, so I’m saving it here for now..

perfmon.html




SSH tunneling

June 19, 2014, 14:17

I use this occasionally but have to look it up every time, so here are my notes..
Reference: http://www.hanbit.co.kr/network/view.html?bi_id=547

Used together with screen, you can leave the tunnel running and forget about it.

$ screen -S mysession   # start a named screen session
$ screen -ls            # list running screen sessions

After establishing the SSH connection below, detach from the screen with Ctrl+a, d and the tunnel will keep running.

 

The tunneling commands:

To listen on a port on local host A and forward it to remote host B:

ssh -4 -L<local port (host A)>:<remote host (host B)>:<remote port (host B)> <remote server (host B)>

$ ssh -L<port 1>:<hostname>:<port 2> <server>

See 『SSH, The Secure Shell: The Definitive Guide』, p. 329, local port forwarding.

To listen on a port on remote host B and forward it to local host A:

ssh -4 -R<remote port (host B)>:<local host (host A)>:<local port (host A)> <remote server (host B)>

$ ssh -R<port 1>:<hostname>:<port 2> <server>

See 『SSH, The Secure Shell: The Definitive Guide』, p. 329.

