Having come out of a long product-development hibernation, I am now out talking about CloudMunch 2.0 with prospects and customers. One question I keep getting is: why is this a significant version for you? Why not 1.3 or 1.4; why is it a 2.0? I have given off-the-cuff answers on some occasions, but then thought I should write down specifically why this version stands apart. So here goes a series.

The first and foremost reason is ChatOps. We have rewritten the platform to give “self-service” a new dimension via ChatOps. When we were thinking about rewriting the core, we said we needed to build the platform to work natively with chat systems, so that any communication could be plugged in via ChatOps. Our idea was to bring self-service to our end users, our business, our developers and our ops teams. Let me expand on this below.

1. End user messages: As I mentioned in an earlier blog post, a team using CloudMunch can get a quick view of the operations its members are performing, be it an application deployed, a build performed, an asset or environment created, or even a system integrated. This helps the team understand the actions happening within it and keeps everyone on the same page.

2. Platform messages: There were two options in front of us for monitoring application stability. One was log monitoring. As you all know, this means logs for multiple services on multiple nodes, which needs a lot of focus on log shipping and consolidation, and on running queries to figure out what is going on. The other option was to have the product itself report issues as and when it finds them, or when it thinks an operation may end in error. We picked the latter, and the channel for these alerts is a chat channel. This way our teams in the channel get to know what is going on in each environment without having to log in to any system. Every suspicious call, be it due to security or stability, hits our Ops channel for engineering to look at and find what the issue could be.

With containers everywhere, the Ops team can see what issues each developer faces in their own container via this channel. The discussion between ops and dev sometimes goes like this: “Hey Mr. Developer, looks like you have not deployed the latest plugin of this integration on your laptop; I see an error.” This is even before the developer notices that their container is hitting an error!

While log analysis is useful after the fact, if we build applications with “chat centricity” in mind, it is easier to manage changing code effectively.

3. Business messages: Today, given the “as-a-service” economy, the business needs to know what is happening in its application more than anything else. This is one of the biggest inputs it gets on a day-to-day basis for decisions ranging from what the next quarter's strategy needs to be to what should go into the next sprint. Given this, getting real-time business alerts on what users are accessing, and what they are not able to understand, becomes critical.

If some of the operational alerts can be plugged into chat channels, the business can take these as triggers to dig deeper and take the necessary action on product strategy and customer feedback sessions, and loop back with engineering for more details. This, in fact, is what is done at CloudMunch today.

4. Operational messages: While we focus on application alerts and business alerts, the key is to keep the system up, even in the worst of situations. ChatOps can play an important role in providing infrastructure feedback on how various systems are doing and alerting before anything goes wrong, be it heavy system usage, a port left open, or an auto-scaled service not returning to normal. Bringing these onto a common channel helps engineering and ops relate the behaviour to the changes being made.
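To make the four message types above a little more concrete, here is a minimal Python sketch of how a product could push its own alerts into chat channels instead of relying on log queries. This is only an illustration, not CloudMunch's actual code: the channel names, the `Alert` fields, the `route_alert` function and the `InMemoryChat` stand-in are all hypothetical. The one detail taken from the post is that platform alerts land in the Ops channel.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical mapping of message type to chat channel.
# Only "platform alerts go to the Ops channel" comes from the post;
# the other channel names are assumptions for illustration.
CHANNELS: Dict[str, str] = {
    "end_user": "#team-activity",
    "platform": "#ops",
    "business": "#business",
    "operational": "#ops",
}
DEFAULT_CHANNEL = "#ops"


@dataclass
class Alert:
    category: str  # one of the keys in CHANNELS
    source: str    # environment, service or container that raised the alert
    text: str      # human-readable description


def format_alert(alert: Alert) -> str:
    """Render an alert as a single chat line."""
    return f"[{alert.category}] {alert.source}: {alert.text}"


def route_alert(alert: Alert, send: Callable[[str, str], None]) -> str:
    """Pick the channel for an alert, send it, and return the channel used.

    `send(channel, message)` is whatever posts to the chat system; it is
    injected so the routing logic does not depend on a specific chat tool.
    Unknown categories fall back to DEFAULT_CHANNEL.
    """
    channel = CHANNELS.get(alert.category, DEFAULT_CHANNEL)
    send(channel, format_alert(alert))
    return channel


@dataclass
class InMemoryChat:
    """Stand-in for a real chat client; it just records what was sent."""
    messages: List[Tuple[str, str]] = field(default_factory=list)

    def send(self, channel: str, message: str) -> None:
        self.messages.append((channel, message))


if __name__ == "__main__":
    chat = InMemoryChat()
    route_alert(Alert("platform", "dev-container", "plugin is out of date"), chat.send)
    route_alert(Alert("business", "signup-flow", "users are dropping off at step 2"), chat.send)
    for channel, message in chat.messages:
        print(channel, message)
```

In a real setup the `send` function would call the chat system's API or webhook. Keeping it injectable means the same routing works for any chat tool, which is the spirit of plugging any communication in via ChatOps.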

What we are finding with this approach is that, with all alerts treated equally, the whole team looks for the reasons behind them and starts making sense of the data as a whole. This not only brings teams together, it also helps resolve issues faster and increases mutual respect among all groups. Taken together, this has changed the culture at CloudMunch, and hence 2.0. Hope this makes sense.

My next reason is containers, and the significant impact they have on the quality of service we bring to our customers.

Send in your thoughts.