
Top 8 Server Monitoring Tools

Zabbix, Microsoft Endpoint Configuration Manager, SevOne Network Data Platform, Oracle Enterprise Manager, SolarWinds Server and Application Monitor, Nagios XI, Infraon IMS, ServiceNow Discovery
  1. We are able to monitor our virtual infrastructure, virtual machines, Windows servers, databases, and the network using Simple Network Management Protocol (SNMP). We can pull almost all the metrics we want, receive notifications, and integrate with Telegram for certain devices that are critical, such as UPSs.
  2. Offers good patching. The technical support is good.
  3. The comprehensiveness of this solution's collection of network performance and flow data is one of the basics in the field for what it does. It meets all of our needs, from the most straightforward collection capabilities right up to NetFlow and even telemetry. Beyond basic SNMP collection, the product also supports what we need for the future with telemetry streaming. So it's very comprehensive.
  4. I like that it's stable. Given the infrastructure's size, the introduction they offer is very useful; it helps with an overall understanding of the product.
  5. Monitoring of processes and services is the most valuable feature. It is not just the server alone in terms of CPU or memory; we can go in depth into services and processes.
  6. Nagios XI helps us monitor the bandwidth of the internet connection, HTTP, DNS, Active Directory services, and Exchange data availability. We have multiple servers to monitor databases, server availability, and ping.
  7. We use the solution to automatically trigger processes that help us resolve issues. The whole IT process has been automated, such as mapping all the users and the escalation process. If any issue happens, we get the report by SMS and WhatsApp. If there is a critical issue that has to be sorted out, such as the entire data center being down, an alarm is raised.
  8. It does a good job of collecting the data necessary for data centers and IT operations. The initial setup is pretty easy.

Find out what your peers are saying about Zabbix, Microsoft, SevOne and others in Server Monitoring. Updated: October 2021.
542,721 professionals have used our research since 2012.

Advice From The Community

Read answers to top Server Monitoring questions. 542,721 professionals have gotten help from our community of experts.
Ariel Lindenfeld
Ranjith Kumar
Real User

There are multiple angles a consultant should look at for monitoring per se; let me list a few.

1. Having a separate tool to monitor servers, the network, and so on is the traditional method, and it is no longer a value proposition. Look for a tool that can do full-stack monitoring of the environment. This reduces unnecessary integration effort and the fragmented data that comes from multiple integration points, and it ensures the data flow is seamless so that the environment can be managed from a single console.

2. The product you select should allow you to extend to AI-based methods, as these are going to create a huge impact on infrastructure operations. How complex they are to build is a fair question, but it is always good to start early so you are not left behind in the AIOps race.

3. The product should be deployable from Docker/container images, which helps in scaling the monitoring solution horizontally.

4. Strong event management should be available to help with event correlation, deduplication, and so on. It may be considered obsolete in the future, but I believe it is going to be around for some time, until deep learning and machine learning algorithms mature.

5. Integration capabilities with third-party systems (API, SNMP, TCP, log).

6. Finally, cost plays a major role, so consider what you want in the environment. Product selection should be driven by the need to solve and proactively detect issues in your environment and to add the values above.

There are other pointers, like ease of use, support, and user experience, which are a must for any product.

Hope it helps!!
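The event deduplication mentioned in point 4 above can be sketched in a few lines. This is a minimal illustration, not any particular product's engine; the `Event` fields and the five-minute window are assumptions chosen for the example:

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class Event:
    source: str       # e.g. host name
    check: str        # e.g. "cpu", "disk"
    severity: str     # "warning" or "critical"
    timestamp: float = field(default_factory=time.time)

class EventDeduplicator:
    """Suppress repeats of the same (source, check) pair within a window."""

    def __init__(self, window_seconds: float = 300.0):
        self.window = window_seconds
        self._last_seen: Dict[Tuple[str, str], float] = {}

    def should_alert(self, event: Event) -> bool:
        key = (event.source, event.check)
        last = self._last_seen.get(key)
        self._last_seen[key] = event.timestamp
        # Alert on the first occurrence, or once the window has elapsed.
        return last is None or event.timestamp - last >= self.window

dedup = EventDeduplicator(window_seconds=300)
print(dedup.should_alert(Event("web01", "cpu", "warning", timestamp=1000.0)))  # True
print(dedup.should_alert(Event("web01", "cpu", "warning", timestamp=1100.0)))  # False (repeat)
print(dedup.should_alert(Event("web01", "cpu", "warning", timestamp=1400.0)))  # True (window over)
```

A real event manager would also correlate across sources (e.g. collapse 50 host-down events behind one switch-down event), but the same keyed-suppression idea is usually at its core.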

Monitor Jimmy
User

There are four things you should keep in mind when looking for a monitoring system.

1. Do not take the articles that review and compare multiple monitoring systems too seriously. These articles usually focus too much on how many sensors a system delivers and too little on what really matters.

2. Look more at the stuff that lives forever: how the monitoring system handles data.

- What capabilities does it have when it comes to dealing with dependencies?
- Does it store data in a way that makes it easy to implement AI?
- How well can it handle notifications?
- How scalable is it?
- How easy is it to implement custom sensors?
- Does it have any useful features that other monitoring systems do not have?

Bjørn Willy Stokkenes, the architect of Probeturion wrote an interesting article about these things on LinkedIn:
https://www.linkedin.com/pulse/5-things-great-monitoring-system-should-help-bj%C3%B8rn-willy-stokkenes/

3. Does the vendor deliver proper support?
- Do they answer quickly?
- Do they understand your questions or do they make you send a lot of unrelated information about your settings and so on?
- Do they offer to support you in setting up your monitoring system?
- Do they offer to build custom sensors for you?

4. Do not get fooled by a low price. Remember, your time and your workers' time are worth a lot of money. Sometimes saving 90% on the purchase of an IT system can make you lose 100 times more in wasted man-hours.
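On the "custom sensors" question above: in many systems a sensor is little more than a callable that returns a named reading, which the scheduler polls. A stdlib-only sketch; the `Sensor` signature and `disk_free_percent` helper are hypothetical, for illustration:

```python
import shutil
from typing import Callable, Dict, List, Tuple

# A "sensor" here is just a callable returning (metric name, value, unit).
Sensor = Callable[[], Tuple[str, float, str]]

def disk_free_percent(path: str = "/") -> Tuple[str, float, str]:
    """Report free disk space on `path` as a percentage of capacity."""
    usage = shutil.disk_usage(path)
    return ("disk_free", 100.0 * usage.free / usage.total, "%")

def run_sensors(sensors: List[Sensor]) -> Dict[str, Tuple[float, str]]:
    """Collect one reading from each registered sensor."""
    readings = {}
    for sensor in sensors:
        name, value, unit = sensor()
        readings[name] = (value, unit)
    return readings

value, unit = run_sensors([disk_free_percent])["disk_free"]
print(f"disk_free: {value:.1f}{unit}")
```

If writing something like this against your own checks takes minutes rather than days, the system scores well on the "how easy is it to implement custom sensors?" question.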

Mark Towler (Ipswitch, Inc.)
Vendor

I think there are three things that should be considered along with the other comments here:

CONTEXT - what else connected to that server is being monitored? Diagnosing faults can be tricky, and it's made much more difficult if you have to go from one monitoring tool for the server to (many?) others for all the devices connected to it. A tool that shows the server in context with all the things it's connected to can make diagnosing network issues simple.

SELF-HEALING - half the time the tried-and-true power cycling of the device in question solves the problem. If the admin understands the system and knows that the server will occasionally require rebooting, why wake him up at 2am? The monitoring solution should be able to automatically execute self-healing actions like this based on preset conditions. This makes the difference between a 2AM call and a note in the admin's inbox when he gets in the next morning.
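The self-healing idea above reduces to a small decision function: restart automatically up to a preset cap, then escalate to a human. A minimal sketch; `MAX_AUTO_RESTARTS` and the action names are illustrative, not any specific product's API, and a systemd host is assumed for the execution step:

```python
import subprocess

MAX_AUTO_RESTARTS = 2  # preset cap before escalating to a human

def decide_action(service_is_active: bool, restarts_so_far: int) -> str:
    """Self-healing policy: restart a failed service up to a cap, then page."""
    if service_is_active:
        return "none"
    if restarts_so_far < MAX_AUTO_RESTARTS:
        return "restart"        # automatic fix: a note in the morning inbox
    return "page-oncall"        # cap exceeded: this one warrants the 2AM call

def apply_action(action: str, service: str) -> None:
    """Execute the chosen action (on a systemd host, a shell-out suffices)."""
    if action == "restart":
        subprocess.run(["systemctl", "restart", service], check=False)

print(decide_action(service_is_active=False, restarts_so_far=0))  # restart
print(decide_action(service_is_active=False, restarts_so_far=2))  # page-oncall
```

Keeping the policy separate from the execution also makes the "wake him up or not" rules easy to review with the admin who owns the service.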

PROACTIVE ALERTS - if the user notices the network is down, you're already losing money and gaining ill will. A good monitoring tool will let you know when failures are about to happen and alert you before they start impacting your users.
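One common way to implement proactive alerting is trend extrapolation: fit a slope to recent samples and estimate when a resource runs out. A minimal least-squares sketch; the sample format and capacity figure are assumptions for the example:

```python
from typing import List, Optional, Tuple

def hours_until_full(samples: List[Tuple[float, float]],
                     capacity: float) -> Optional[float]:
    """Estimate hours until a metric reaches capacity via a least-squares slope.

    samples: (hours, used) pairs. Returns None if usage is flat or shrinking.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    slope_num = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    slope_den = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = slope_num / slope_den
    if slope <= 0:
        return None
    latest_t, latest_u = samples[-1]
    return (capacity - latest_u) / slope

# Disk usage in GB sampled hourly, growing ~2 GB/hour toward a 100 GB disk:
samples = [(0, 80.0), (1, 82.0), (2, 84.0), (3, 86.0)]
print(hours_until_full(samples, capacity=100.0))  # 7.0
```

An alert fired when the estimate drops below, say, 48 hours reaches the admin well before users ever see a full disk.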

Finally, it's nice to have a monitoring tool that will alert the entire IT team via something like Slack in case the admin in question is unable to respond in a timely manner.

Reba Gaines
Real User

Security around which protocols are supported and which are not, as it relates to security, i.e. FIPS, etc.

Which OSes and databases are supported, for capacity planning and clustering support.

What technologies can be monitored.

D6B8
User

Updated product (or one that continues to get regular updates), ease of use, and aesthetically pleasing.

NetworkOb0a3 (Network Operation Center Team Leader at a recruiting/HR firm with 1,001-5,000 employees)
Real User

IMO, I like to engage the app/system/service owners and ask them what they want to see monitored. The experts are usually the people who built the service you are monitoring. Since an engineer is going to get the call at 2 AM when the alarm you set up trips, it's important to work closely with them so you can iron out good thresholds for the warning and then the alarm. Engage the NOC and see if there is first-level support they can do to avoid that 2 AM call.

I stick with a default base template constructed from the OS vendor's recommendations, and then we tweak it to be more accurate for our environment. Server/OS monitoring is pretty standard across the board; I find it's the application/service monitoring that takes a lot more thought.

In the end, one question usually wraps up the meeting: when do you want me to wake you up at 2 AM? What condition on the system warrants this call? When do you want me to send an automatic email for awareness? When do you want a ticket and email only?

Every organization will have its own method for monitoring, and it should be an ever-growing and evolving process. Every outage should have an RCA, and the monitors should be reviewed. Did we know this was coming? Could we have alerted sooner and avoided user impact? How should we monitor going forward?
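The "when do you want the 2 AM call?" agreement boils down to a tiered threshold map per metric. A minimal sketch; the tier names and the thresholds are examples to be negotiated with the service owner, not defaults from any tool:

```python
def notification_tier(value: float, ticket_at: float,
                      email_at: float, call_at: float) -> str:
    """Map a metric reading to the escalation tier agreed with the owner."""
    if value >= call_at:
        return "call"     # the 2 AM wake-up: this condition warrants it
    if value >= email_at:
        return "email"    # automatic email for awareness
    if value >= ticket_at:
        return "ticket"   # ticket and email only, reviewed next morning
    return "ok"

# Hypothetical CPU-percentage thresholds for one service:
print(notification_tier(65.0, ticket_at=60.0, email_at=75.0, call_at=90.0))  # ticket
print(notification_tier(80.0, ticket_at=60.0, email_at=75.0, call_at=90.0))  # email
print(notification_tier(95.0, ticket_at=60.0, email_at=75.0, call_at=90.0))  # call
```

Writing the tiers down as data like this also makes the post-RCA review concrete: after each outage you can ask which threshold should move, instead of debating from memory.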

reviewer1352679 (IT Technical Architect at an insurance company with 5,001-10,000 employees)
Real User

Server is a rather vague term in these days of virtualization, but it also gets to the point: context. If you are monitoring any entity, that entity's context in the environment is the most important thing to consider, which means you need to understand the role of the "server", who will consume the information, and how to put data in context. In today's markets, capturing data is typically not your issue; your issue is presenting the data in a context that is useful to your users. A server admin, who typically buys the monitor for the server, has to consider the developer and business analyst requirements, not just their own. For example, is a Windows server running at 90% meaningful? Is a jump in I/O latency from 10 to 100 ms important? Is a process that consumed 2% CPU jumping to 10% CPU going to cause you a problem? How about memory paging? I think the point is obvious: you have to understand the context of the information shared with the user consuming it.


Each of the above is a case I have seen where at times it matters and at other times not so much:

- 60% CPU on a multi-threaded host, but cpu0 is pegged, limiting its ability to delegate work to other CPUs, or the process is single-threaded.

- A 10 to 100 ms jump in I/O latency: not a problem for a non-prod server writing to legacy disk frames with a lower SLA.

- A jump from 2% to 10% CPU for a process: a process running on 50 virtual instances on a single physical host now consumes multiple physical CPUs, which is very expensive. This view is typically more important to someone doing capacity analysis, who may or may not be the server admin.
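The first case (a healthy-looking average hiding a pegged cpu0) shows why an average-only threshold misleads. A sketch of a check that also inspects per-core readings; the limits are illustrative assumptions:

```python
from typing import List

def cpu_looks_saturated(per_core_utilization: List[float],
                        avg_limit: float = 80.0,
                        core_limit: float = 95.0) -> bool:
    """Flag saturation on the host average OR on any single pegged core.

    A single-threaded process can peg one core while the host average
    looks healthy, so the average alone is not a safe signal.
    """
    average = sum(per_core_utilization) / len(per_core_utilization)
    return average >= avg_limit or max(per_core_utilization) >= core_limit

# Four-core host averaging only 32.5%, but with cpu0 pegged:
print(cpu_looks_saturated([100.0, 10.0, 10.0, 10.0]))  # True
print(cpu_looks_saturated([40.0, 30.0, 30.0, 30.0]))   # False
```

The same "don't trust the aggregate" logic applies to the I/O and per-process cases: the right check depends on which consumer of the data (admin, developer, capacity planner) the alarm is for.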


I split context into two capabilities: (1) observability, where a human manually puts things together, and (2) AI, where a machine helps give context.


(1) Observability: How easy is it to express your metrics in a graphical format? This may include the ability to find your data, multiple options to visualize (bar, line, top N, hot spot, ...), and the ability to personalize and share the view. You get a bonus if there are ways to simplify reusing a dashboard template, simply updating the context while the tiles remain meaningful. Think about creating a view for an app, then replicating it so each team can easily change it to their own context. The interface needs to be usable by people who are not server admins. Don't underestimate the complexity for someone to "find" their data, especially if they don't administer the platforms.


(2) AI: How does a machine help me understand the context of information? This may have other applications, but alarming is the easiest to consider. If I can quickly correlate a CPU utilization alarm to a business transaction that is running slow, I have context. A typical server admin has no concept of the applications running, at least not at any decent-sized company. In the same way, a developer doesn't have sensitivity to the utilization of the underlying hosts (we won't even consider the physical host) if they are running on any kind of shared infrastructure. The alarms generated, to be meaningful, should be presented in the context of the critical business transactions running on the server. The AI needs to assist both admins and developers (or a DevOps engineer) to become aware of the event and quickly triage it through impact and root-cause analysis. Improving MTTR should be the goal for any event.

reviewer1195575 (Managing Director at a tech services company with 1-10 employees)
Real User

Well, first, there are a lot of different types of servers and domains they serve, so it depends what the purpose of the server is.
The physical server's components will stop it providing any service if they fail, so it's easy to see that it's vital to monitor them all. But if that server is in a cluster, where other physical servers take over when it fails, then the importance of monitoring each component diminishes.
If the server is in the cloud, private or public, then maybe you're not the owner and just pay a fee, in which case monitoring isn't of much use to you. Maybe. If the availability and performance of the service you subscribe to can impact YOUR business, then monitoring to see how the service is delivering what you pay for should be considered.
If the "server" is not physical but virtual, then the objects you monitor will be similar, but others that impact the virtual hosting need to be added.
If it's a database, web, application, or other type of server, of which there are many, then the type and list of monitored data needed to detect adverse trends and resource starvation will be different each time.
Whatever the "server", we also cite the old adage: "You can't manage what you don't measure."
Even if the server isn't viewed from a business or commercial perspective, everyone is judged sooner or later on their capacity to get the job done in a reasonable time. Monitoring should provide just enough visibility of anything that might stop that objective from being met.
Thus, what's important to look for is whatever data monitoring can provide that will reduce the risk of: 1. compromise from a security perspective; 2. business impact through resource starvation; and 3. changes that can impact the intended behaviour.

