Monitoring theory, from scratch — events and transactions.

If only monitored events could be this colorful and tasty. Alas, they are a different beast.

Events

A system has all sorts of things happening in it all the time. An event could be as detailed as ‘register AX changed its value to 0’ or ‘a packet has been received’, or as high-level as ‘a customer has just logged in’. Some of these are expected and some are not (or perhaps expected but not welcome, as with nasty third-party integrations). By observing events directly, we don’t need to wait for them to trigger a metric change.

  1. Iterate and perform RCAs. Haven’t we heard that one before? Your logging should look more like a living organism than a stone. Ask yourself if a log could have been helpful in diagnosing a problem. Do I need more logging? Less logging? Better filtering? Better aggregation?
    I’d also like to add — be brave and send logging patches for underlying products or your cloud provider. Be a good netizen. Thank you.
  2. Collect more, display less. That one also rings a familiar bell. Letting an operator access the data they need for a particular analysis without always displaying it can be the best of both worlds. One recommendation that goes beyond the lessons of metrics: capabilities such as template analysis and aggregation of repeated messages are a great help in separating the wheat from the chaff, offering both a metric view of repeating events and the ability to filter for items of interest.
    Then again, considering the number of things that can possibly be collected, a tasteful limit is in order. Usually, intuition coupled with an RCA process will suffice.
  3. Context matters. A lot. This is a new development compared with our experience of metrics and thresholds. Events can say things that are far more specific, including stack traces, the line of code that triggered the event, particular environment state, or particular metrics and indicators. Tap this extra richness and expressivity when collecting events, and consider it in your RCA process as you improve your log collection.
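
To make the "template analysis and aggregation of repeated messages" idea a bit more concrete, here is a minimal sketch in Python. It is not how any particular logging product does it; the masking rules and the names `to_template` and `aggregate` are made up for illustration. The idea is simply to replace the variable parts of each message with placeholders, so that repeats collapse into one template with a count.

```python
import re
from collections import Counter

# Hypothetical masking rules (an assumption for this sketch): each pair is
# a pattern for a variable part of a message and the placeholder it becomes.
# Order matters: IPs are masked before bare numbers.
_MASKS = [
    (re.compile(r"\b\d+\.\d+\.\d+\.\d+\b"), "<ip>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\b\d+\b"), "<num>"),
]


def to_template(message: str) -> str:
    """Reduce a log message to its template by masking variable parts."""
    for pattern, placeholder in _MASKS:
        message = pattern.sub(placeholder, message)
    return message


def aggregate(messages):
    """Count how many messages share each template."""
    return Counter(to_template(m) for m in messages)


if __name__ == "__main__":
    logs = [
        "user 42 logged in from 10.0.0.1",
        "user 7 logged in from 10.0.0.2",
        "disk 3 failed",
    ]
    # Most frequent templates first: a metric-like view of repeating events.
    for template, count in aggregate(logs).most_common():
        print(count, template)
```

The counts give you the "metric view" of repeating events, while the rare templates are often the items of interest worth filtering for. A real implementation would keep the raw messages around too, so the operator can still drill down.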
Coralogix, a managed logging solution

Transactions

As if that’s not overwhelming already, there is more. It probably shouldn’t come as much of a surprise that, somewhat like synthetics, there is a form of aggregated, distributed event collection: transaction tracing.
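
As a rough illustration of what "aggregated distributed event collection" can mean, here is a small Python sketch. It assumes every service stamps its events with a shared transaction ID; the `Event` class, its fields, and the helper names are hypothetical and not taken from any tracing product. Real tracers typically also record parent/child relationships and durations, which this sketch leaves out.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    trace_id: str      # shared by every event of one transaction (assumed)
    service: str       # which component emitted the event
    name: str          # what happened
    timestamp: float   # seconds; clocks are assumed roughly in sync
    context: dict = field(default_factory=dict)  # extra context, e.g. user ID


def new_trace_id() -> str:
    """Create an ID at the start of a transaction and pass it downstream."""
    return uuid.uuid4().hex


def assemble_traces(events):
    """Group events from all services by transaction, ordered by time."""
    traces = defaultdict(list)
    for event in events:
        traces[event.trace_id].append(event)
    for trace_events in traces.values():
        trace_events.sort(key=lambda e: e.timestamp)
    return dict(traces)


if __name__ == "__main__":
    tid = new_trace_id()
    collected = [
        Event(tid, "billing", "charge card", 2.0, {"amount": 10}),
        Event(tid, "web", "checkout clicked", 1.0, {"user": "alice"}),
    ]
    for event in assemble_traces(collected)[tid]:
        print(event.timestamp, event.service, event.name, event.context)
```

The point of the sketch is the grouping step: individual events from different services only become a transaction once something ties them together, which is why the ID has to travel with the request.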

Transaction tracing has the additional benefit of explaining what happened to my last salary
Lumigo, a transaction tracing monitoring solution
Epsagon, another transaction monitoring solution.

Gil Bahat (she/her)

A Gil, of all trades. DevOps roles are often called “a one man show”. As it turns out, I’m not a man and never was. Welcome to this one (trans) woman show.