Kentik's AIOps Network Traffic Intelligence platform unifies network operations, performance, security, and business intelligence.
With a purpose-built big data engine delivered as public or private SaaS, Kentik captures a high-resolution view of actual network traffic data and enriches it with critical application and business data, so every network event or analysis can be tied to revenue & costs, customer & user experience, performance & risk.
Sample customers: Pandora, Yelp, Neustar, Box, University of Washington, Zoom, Tata, and Cogent.
We use it almost exclusively for flow data. We use that for a variety of things from network optimization to network capacity to security events, including DDoS protection, etc.
We're using the SaaS version.
The drill-down into detailed views of network activity helps us to quickly pinpoint locations and causes. Anecdotally, it has decreased our mean time to remediation. On a per-incident basis, it could save anywhere from five minutes to 60 minutes.
We also believe it has improved our total network uptime. We haven't done any direct before-and-after comparison, though.
Again, anecdotally, it has sped up our security team's ability to respond to attacks that did not surface as readily, prior to having the flow log data.
One of the valuable features is the intuitive nature of building out reports, and then triggering actions based on specific metrics from those reports. It has a really good UI, and the ability to surface data through the reporting functions is pretty good. That's helped a lot in the security space. If you get a massive 100 Gbps attack coming through, saturating links, you can surface that really quickly and then act to engage DDoS protection or other mitigations from the IPS.
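The kind of metric-triggered action described above can be sketched in a few lines. This is a hypothetical illustration, not Kentik's actual alerting API; the threshold values and function name are assumptions chosen to match the "100 Gbps attack saturating links" scenario:

```python
# Hypothetical DDoS trigger logic; thresholds and names are illustrative,
# not Kentik's real alerting configuration.

ATTACK_THRESHOLD_BPS = 100e9   # flag anything approaching a 100 Gbps surge
BASELINE_MULTIPLIER = 5        # or a sudden jump well above the recent baseline

def should_trigger_mitigation(samples_bps, current_bps):
    """Return True if the latest bit rate looks like an attack.

    samples_bps: recent per-interval bit rates for the link.
    current_bps: the latest observed bit rate.
    """
    baseline = sum(samples_bps) / len(samples_bps)
    return (current_bps >= ATTACK_THRESHOLD_BPS
            or current_bps >= BASELINE_MULTIPLIER * baseline)

# Normal diurnal traffic around 8-10 Gbps does not trip the alert...
normal = should_trigger_mitigation([8e9, 9e9, 7e9], 10e9)
# ...but a surge toward link saturation does.
attack = should_trigger_mitigation([8e9, 9e9, 7e9], 95e9)
```

In practice the "act" step would then call out to the DDoS scrubber or IPS; that part is deliberately left out here.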
The real-time visibility across our network infrastructure is really good. One of the things that we love it for is our global backbone visualization. Being able to see that utilization in real-time is pretty critical for us.
It also proactively detects network performance degradation and things like availability issues and anomalies when used in concert with the SevOne network management system. In conjunction with that — with all of our polling and availability data coming from that NMS — the flow data provides that type of insight.
We also use Kentik's months of historical data for forensic work. We do 90 days.
I believe they're already working on this, but I would love for them to create better integrations from network flow data to application performance — tracing — so that we could overlay that data more readily. With more companies going hybrid, flow logs and flow data, whether it be VPC or on-prem, matched with application performance and trace data, is pretty important.
The other area would be supplanting companies like SevOne and other companies that are really good in the NMS space, specifically for SNMP data.
We've had it since before I took over this space and the Kentik relationship; the initial contract started in 2017. We're going on three years.
The stability has been very good. There was only one outage or impacting event that I can remember in the past year. It took them a couple of days to fix it, but the impact was remediated through some mitigation they did on their end to prevent it from causing us too much headache. They got it down to where it only affected some long-term reporting, which wasn't super-critical for us. It wasn't too big a deal.
So far, Kentik has scaled for what we've done with it and we haven't hit any scale issues to date. I don't know if we're a very large user compared to some of their other customers so I don't know if we're a good example to discuss scale, per se. But we haven't encountered any scale issues from our side.
We don't have plans to expand the use of Kentik, other than increasing licenses to gather flow data for more devices. We buy per license and we have 75 or 100 licenses. The size of the teams that use it is 100 people or so. They are security engineers, network engineers, network health analysts, and threat-intelligence folks.
Their tech support is phenomenal. They tell us about an issue before we even get to it.
With the incident that I mentioned in the context of the solution's stability, even before we experienced any issues relating to it, they had already reached out to us and let us know what was going on. They gave us some timelines, and their ongoing communication kept us informed throughout the incident, which headed off any kerfuffle from the executive layer. That can be a giant headache in those types of situations, but they managed it perfectly, were proactive with their communication, and we didn't hear a peep from anyone about it.
I wasn't involved in the initial setup, but there is time involved for us to set up the checks for the flow data and to set up the reports. Depending on what someone is setting up, it could take five minutes or it could take a couple of days. It just depends on what they're implementing with it.
I'm sure we have data available to show ROI but I don't have it available. Where Kentik is bringing us the most value is in the security realm, in terms of attack prevention, but ROI on that is hard to measure.
There have been other folks in our company who have tested a variety of things. Prior to Kentik they went through an evaluation phase, from what I understand, and vetted out a variety of solutions. I believe that what made Kentik stand out was pricing and the intuitive user-experience.
The biggest lesson in using Kentik is that as we continue to use it and learn more, we learn about the use cases that are valuable. Initially, when I came over to the team, we weren't using it to its fullest capabilities. As we started to understand the capabilities and dive into specific areas with Kentik's own customer success engineers, we learned that we needed to change our thought process a little bit: how we thought about flow logs and what they could provide insight into.
My advice would be to leverage their customer success engineers upfront and don't let them go until you've hit all your use cases. Constantly be in touch with them to understand what some of the forward-thinking ideas are and what some of the cutting-edge use cases are that their other customers might be getting into.
We don't make use of Kentik's ability to overlay multiple datasets, like orchestration, public cloud infrastructure, network paths, or threat data onto our existing data. That is something we're evaluating. We're currently talking with a couple of teams that are moving to AWS, teams that would like to use Kentik to potentially capture VPC flow logs and overlay that with their application performance data. That is something that is currently on-hold, pending some other priority work. We will probably dive back into that, with that team, around mid-2020.
For maintenance, it requires less than one full-time engineer because it's a SaaS model.
In terms of overall vendor partnership, I'd give Kentik a nine out of 10. They're right up there as one of my best partners to work with, amongst all the contracts that I own. They're very customer-centric. They're always available. There's nothing too small or too big that I can't ask them to help with, and they seem to be willing and able to jump in no matter what. That customer focus — which is a theme across the digital world right now, with companies trying to do more of that — is something Kentik does a really good job of embodying.
We use it for traffic management. And when we want to set up new locations or a new market with our own CDN, we use it to scope what kind of internet traffic there is and what kinds of connections we should prepare.
We also use it for some alerting and reporting, like if traffic shifts very much on the link or toward a certain ISP. That could potentially tell us that there are problems or something that we should check out.
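The traffic-shift alerting described here boils down to comparing each ISP's share of total traffic across two time windows. Below is a minimal sketch of that idea, written as standalone Python; the ISP names, the 10 percent threshold, and the function name are illustrative assumptions, not anything from Kentik's product:

```python
def share_shift(prev_bps, curr_bps, min_shift=0.10):
    """Flag ISPs whose share of total traffic moved by more than min_shift.

    prev_bps / curr_bps: {isp_name: bits_per_second} for two windows.
    Returns {isp_name: change_in_share} for ISPs worth checking out.
    """
    prev_total = sum(prev_bps.values()) or 1
    curr_total = sum(curr_bps.values()) or 1
    shifts = {}
    for isp in set(prev_bps) | set(curr_bps):
        delta = (curr_bps.get(isp, 0) / curr_total
                 - prev_bps.get(isp, 0) / prev_total)
        if abs(delta) >= min_shift:
            shifts[isp] = delta
    return shifts

# Traffic that was split 50/50 last window now skews 80/20 toward isp-a:
# both ISPs are flagged, one gaining share and one losing it.
flagged = share_shift({"isp-a": 50e9, "isp-b": 50e9},
                      {"isp-a": 80e9, "isp-b": 20e9})
```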
We're not super-advanced users, but we also use the API in the product. We have some tooling that we've written around these use cases that pulls data from the Kentik database.
We send the dataflow to Kentik, in their cloud. We don't have any software installed on-prem here or in our data centers. As a company, we've always tended toward not having to manage more hardware and software than necessary. We're extremely happy with having it in the cloud and we're not afraid of sending this data to them in the cloud. We pretty much trust them.
Using the drill-down into detailed views of network activity, we can see where we might have bad performance. Maybe it's in the US and is from a specific ISP. Seeing that we have general bad performance from them doesn't help us that much when troubleshooting with them. When we drill down, we can see that the users we have the most problems with are from this city or that state.
Also, some of these tools can be pretty complex, but what I really like is that when we get new team members we can easily onboard them into the tool. They can be up and running and doing fairly advanced queries very quickly. That's been a positive for us.
Kentik's API has really helped us as well. We have tooling where we can look at a certain POP and then pull the data out of Kentik and make decisions on that in another application. We also use it for cost calculations, since we have the real-time traffic data and we have a pretty good understanding of what the different links cost, and what the data costs on those links. The tooling pulls real-time data or weekly averages and we do calculations on how we're doing per gigabyte in cost.
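The cost-per-gigabyte calculation described above is simple arithmetic once the traffic averages are in hand. Here is a hedged sketch of what such tooling might compute; the link names, dollar figures, and bit rates are made-up sample inputs, and a real version would fetch the weekly averages from Kentik's query API rather than hard-coding them:

```python
# Illustrative cost-per-GB calculation; links, costs, and rates are
# sample data standing in for values pulled from the Kentik API.

def cost_per_gb(link_monthly_cost_usd, avg_bps, seconds=30 * 24 * 3600):
    """Monthly link cost divided by gigabytes carried at the average rate."""
    gb_carried = avg_bps * seconds / 8 / 1e9   # bits -> bytes -> GB
    return link_monthly_cost_usd / gb_carried

links = {
    # link name: (monthly cost in USD, weekly-average bps)
    "transit-a": (5000, 2.0e9),
    "peer-b": (1200, 1.5e9),
}
per_gb = {name: cost_per_gb(cost, bps) for name, (cost, bps) in links.items()}
```

With these sample numbers, the peering link works out cheaper per gigabyte than the transit link, which is exactly the kind of comparison the tooling feeds into traffic-management decisions.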
I can only guess at how much the solution decreases our mean time to remediation, compared to if we had written our own tools. We have had Kentik from day one. I can only imagine a world where we had tried to develop this ourselves and how that would have looked. Compared to what we would have had, I would say it has decreased our MTTR by three times. It all comes down to the drill-down functionality and how easy it is to use the interface; all of the data that you can get out of it very quickly, with all the different graphing options. I would guess if we had developed our own tool, it wouldn't be nearly that advanced where we could add multiple datasets and do graphing. We probably would have had to do a lot of SQL queries ourselves to get to whatever we wanted, especially if we had trickier things to try and remediate. But it's hard for me to say since we've used it for so long.
It also helps with our total network uptime. The anomaly detector is pretty good at detecting weird things, like when traffic drops. But we also have a lot of our own tooling for this. Kentik is not a monitoring solution for us in that sense. It's more on top of what we have. But we have seen weird things where traffic has moved, situations which we probably wouldn't have caught with our own systems. So it gives additional benefits on top of the more rudimentary or standard tooling that we have.
For us, it's valuable to get a general understanding of how we serve different networks on the internet from our CDN. We're extremely happy with the different classifications you can make and also the ease of drilling down. It's a very easy tool to use. You only need 10 minutes and you pretty much have the hang of it, and that's really good.
We're pretty happy with the API functionality. It's web, and it's very simple to set up queries. It has served us well and you don't need to be an expert on the API or the product to set these things up.
It also detects anomalies proactively, but the same is not so true when it comes to real network problems, since they tend to just happen. Sometimes we can see performance degrading over time or we can see traffic drops where we're not expecting them and that could be a problem, but it's not very proactive in that case. But it's pretty good. In fairly real-time we get alerts and can act on them.
They've added a lot of features in the beta product that is coming out, things we told them about before. We asked for a way to hook Kentik up with external tools, like peering databases, to correlate data on the potential networks that exist and see what we can do.
They've been working on a cost calculator, which would be great for us, so we don't have to do it ourselves.
This is all in the beta now. Those have been my main issues so far, and since we're not a super-large, global internet service provider, we probably use 20 percent of all the features, or even less. So there aren't any major issues that annoy me on a day-to-day basis. We're extremely happy and it seems like they're listening to whatever feedback we have given in the past.
We've been using Kentik for about five years. We were one of the early customers.
The stability is very good.
Today, we only have 12 devices. We're probably not even close to reaching any limits. I would guess if we had thousands of them it might be a different story.
In terms of expanding our usage, we should probably look at the cloud part of the product. The DDoS part might be interesting as well. It's something that we haven't had time to really dig into. It's there and it's free, so why not? But then, our network setup is fairly simple. We don't operate any global backbones or the like. That's why we don't use some of its features. And we're not an internet service provider, so we don't need to understand a million things about what our users are doing.
We have never used their tech support, so that's probably a very good thing. We have never had any weird problems with the product where we had to file support tickets or anything like that. It's just been smooth sailing for us. I don't know if we've been lucky or if the product is just super-stable.
I run into Kentik people at different conferences around the world, so we usually sit down and talk. We don't spend that much time with our account manager. Since we've been a customer for so long we have met everyone in the company from the early days. So we have pretty good contacts.
We made the decision to go with Kentik instead of building something ourselves, and that was mainly due to the graphing features of the product, which are really excellent; the drill-down features. For us to develop something like that ourselves would have taken a lot of time.
It was in their very early days. We met Kentik at some conference and we thought, "Hey, this looks like a cool product and something that we probably need." So we started a trial and were very happy with the product and we continued using it. We really like that they understood our use case. The people who worked at Kentik back then were people who came from the same background as ours, with CDNs and content delivery.
We were extremely happy with the features; they were exactly what we were after. Back then, one big plus for us was not having to operate our own hardware, like appliances, in data centers. Since we're an internet company, we're not afraid of sending data to the cloud, a process which might concern a bank, for example. It was pretty much a no-brainer to continue using the product.
Back then, the setup was really straightforward. There was not much configuration to be done on our side and then data just magically appeared in the portal.
Our deployment took about a day. We only had a few routers and a few POPs back then. We did the setup in three or four locations, so it was fairly small. Today, everything is completely automated on our side. When we set up new locations, we make sure that all the configuration is done automatically. The only thing we need to do is to go in and add the site in Kentik. Pretty much everything else happens automatically on our end. So there really isn't anyone involved in deploying it, per se.
We didn't really have an implementation strategy.
Given that it's a SaaS solution, it also doesn't really require anybody to work to maintain it or administrate it. We push data in and it goes away after 30 days. On an ad hoc basis, where we need a dashboard or something specific, someone may spend an hour creating that in the tool. That's not really maintenance, it's more our using the product.
We have seen a return on our investment. We need to have a tool like this, and I can just imagine, if we were to look at the engineering team's hours that would need to be spent on writing something — if we wanted to do it ourselves in-house — the return on investment comes from not having to build and maintain that system. And, of course, if we can spot errors fairly quickly... because if you mess up big, it costs a lot of money, and fast. It's pretty good to be able to see those kinds of things almost in real-time. It's been good for us.
We would probably have to spend a couple of hours per week to maintain an in-house tool. It really depends on how big or how complex the solution we would have built would be. But to be on par with Kentik, that would have been a pretty huge task for us to do and maintain.
We didn't look into any of their competitors at the time. It was very early days. We were in the build-up phase. I know some of their competitors and they're more clumsy when it comes to the graphing part. And we didn't want appliances. For us, a company that doesn't operate that many routers, pricing is not a huge deal, which it could be for other companies with thousands and thousands of devices to monitor. For us, it was a very good tradeoff to not have to deal with the on-prem hardware.
My advice would depend on the network and what your use case is, but I would not underestimate the importance of how easy it is to use. If I were to sell this product to someone else, that's exactly what I would tell them: how easy it is to use. Easy tools get used. If you have a beast of a system where it takes 20 minutes to get the query out, then you're probably not going to use it as much.
The biggest lesson I've learned from using Kentik is that when it's easy to drill down into data, you tend to do it more. We have spotted so many things that we would never have spotted if this had been a less "real-time-ish" product.
Collecting data is usually very simple, but presenting it in a good way such that people can actually access it and model it as they want, that's the tricky part. Having a tool that is as easy as Kentik is to work with, gives the team motivation to add more stuff to look at.
We don't use its months of historical data for forensic work. We're using it as a real-time snapshot. You can buy the ability to go back further in time. With our license we only have the 30-day period but we rarely even look at 30 days. We usually look at a week to get the cycle of the traffic peaks that we have when people use our service on the weekends. That usually gives us a pretty good average for a month. Of course, we have other tools that we have built ourselves to do more long-term analysis, if we want to see how our traffic has grown.
We also don't make use of Kentik's ability to overlay multiple datasets, at least today. We probably should look at more of these things. We only use it for traffic management or to get an understanding of our traffic flows from the private CDN. We don't look at any trap detection. We do have a very large Google Cloud installed base where we could potentially use that, but we haven't gotten around to doing it.
We have eight people who look at Kentik. They're all working in content delivery. We don't expose it to managers or senior managers. Our structure is a bit different than some companies; we try to solve a problem very close to the problem. So it's basically my team that looks at it and they make the decisions. It's not like we have dashboards for managers and things like that. We do have the cost calculations, but we abstract that away by writing our own tooling to get the data out. It's just network engineers and the product managers for the content delivery network who look at it.
I would rate Kentik a strong nine out of 10. There is always room for improvement here and there, but overall, for our use case, it's been working really well. We haven't had any real issues. I could imagine that if you have a bigger, more complex network, you could run into some issues, but we haven't.
I like the fact that they come from the same background as we do and that they understand, at least from my perspective, the content part and what it's all about. They've been very easy to work with and very keen to listen to feedback. I am super-happy with the product.
We mainly use it for visibility into our traffic but we use it for DDoS detection as well.
We're the third-largest tier-one in the world but, prior to deploying Kentik, we were flying largely blind regarding our IP traffic. We didn't have any kind of visibility into where we should be upgrading capacities. Gaining visibility into the traffic with a network at our scale has been huge.
We've been able to do traffic analysis when we're looking at bringing on a customer or, more specifically, when renewing and re-terming a customer. We can take a look at their traffic profiles and put dollars and cents around it. What does it cost us to haul this customer's traffic? Are we making money on this customer's traffic? How much are we making? That allows us to gauge where we can do things, re-term-wise, and still make money.
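Putting "dollars and cents" around a customer's traffic profile, as described above, reduces to a margin calculation. This is a minimal sketch under assumed numbers — the revenue, traffic volume, and haul cost are invented for illustration and the function name is mine, not part of any Kentik workflow:

```python
# Illustrative customer-profitability check; all figures are sample data.

def margin(revenue_usd, traffic_gb, cost_per_gb_usd):
    """Monthly margin: what the customer pays minus what their bits cost to haul."""
    return revenue_usd - traffic_gb * cost_per_gb_usd

# A customer pushing 40 TB/month at an assumed $0.004/GB haul cost:
# at $500/month we keep a healthy margin...
keep = margin(500, 40_000, 0.004)
# ...but re-terming them down to $120/month would put us underwater.
lose = margin(120, 40_000, 0.004)
```

That is the shape of the question the traffic analysis answers: how far a re-term can go while the traffic still makes money.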
We can also do customer prospecting. We can look at our traffic and say, "Hey, here's traffic, either to or from networks, that aren't on-net. If we were to bring them on-net, we would be monetizing traffic that we're currently handling either for free or in some other way. If we were to bring it on, we'd be making money from it."
It has also helped our organization to save money in backbone planning. Previously, if a specific path was full, we would have to throw more bandwidth at it. I think that's what a lot of networks still do. Kentik allows us to see where traffic is really going and coming from. So we've been able to be much smarter about where we choose to upgrade paths. Throwing bandwidth at a path means paying for however many more waves. If the traffic goes between A and C instead of A and B, and that path happens to be $1,000 a month cheaper, we can make those kinds of changes. We've definitely been able to save money that way.
In addition, the drill-down into detailed views of network activity very much helps to quickly pinpoint locations and causes. We have a handful of saved queries, especially for some of our guys in the NOC who may not be senior-network-engineering-level types, that can be run. It lets them see things at a high level and say, "Okay, there's a spike here." They can drill in from there and get what they're actually after. It's generally DDoS-related in that specific scenario.
We have also used Kentik's months of historical data for forensic work. It tells us what the heck happened. When you're in it, you're just trying to do what you can to get things working again. That historical view allows us to go back and say, "Okay, we had this major outage last week. We know that it was partially due to this, but what actually happened and what was impacted by what was going on?"
Kentik has also decreased our mean time to remediation, with DDoS especially, but also with peering-related issues. We're able to identify and do stuff there as well, more quickly than we were previously. Shooting from the hip, I would say it has decreased our MTTR by 20 percent.
The most valuable features have been anything around traffic engineering: being able to determine the source or destination of a surge of traffic, whether it's DDoS-related, or a customer just happened to have a sudden uptick in traffic. Being able to tell where that's coming from or where it's going to enables us to do things based on that. Prior to having Kentik we were totally blind to that level of detail.
We haven't seen anything else that comes even close to Kentik's real-time visibility across our network infrastructure, and we've demo'ed everything under the sun. We're fans.
We also use it to ingest metrics, log data at scale and business context information, for network analytics, primarily around traffic profitability analysis. For that purpose, it works pretty well. We're able to get traffic statistics, in an adjustable way, out of Kentik and then we marry them with our financials. Bing, bang, boom, we know what our traffic actually costs us.
Version 4 of the platform is good and going in the right direction. It's starting to answer questions before they're asked. The mindset to date has been, "Hey, I've got a question. Let me go to Kentik to get the answer." They're moving more in a direction where they are saying, "Hey, here's information that you may be interested in or may need," before the question has to explicitly be asked. Continuing to move in that direction would be a good thing.
We've been using Kentik for about three years.
The stability has been great.
We get emails every now and again that say, "We're going to be doing something," or "We've got maintenance," or, "There was a five-minute outage." We've never been impacted by it.
Using it as a service, it scales indefinitely for our use purposes. That's why we did the as-a-service solution. Scaling is their problem. We didn't want to worry about it. From our vantage point, it scales to infinity.
All in, there are between 30 and 40 people who use it on a regular basis. We certainly have more users in the system than that, but there are 30 to 40 at a given time. They are mainly our engineering team, which includes the peering guys, myself and my team, and our core backbone guys who handle mostly long-haul stuff. Within our NOC, a number of people use it for troubleshooting. And we've created some custom dashboards for our sales and sales engineering folks. Those dashboards make data easy for them to digest. They can go in via a nice, pretty portal, type in a network they might be interested in, and all the data they could possibly want, in an easily digestible format, is right in their faces.
We definitely have plans to increase usage. We'd like to get it into the hands of more of our salespeople. Only a small fraction of them are currently using it, mainly the guys in the carrier space. I'd love to get it into the hands of our enterprise people as well. But there are limitations on our side, including available cycles to get our guys up to speed on that kind of thing. The other thing we've also looked at doing is potentially opening it up to our customers and giving them a view into their traffic. We haven't gotten there yet, but those are things we've looked into and are looking into.
Our interactions with their tech support are very good. Response times are generally measured in minutes, which is nice to see. You don't see that very often. They take ownership when we have issues. But it's usually more questions from our side than anything else. They're on it. They actually care, which you don't see very often in customer support areas.
When there is something missing, we are generally able to go to them and work with them on it. Within a reasonable amount of time, it's generally added. At the moment, we've got what we're looking for.
The last issue they helped us with was due to the fact that we do a lot of traffic engineering, especially as it relates to peering. Once we got Kentik we'd say, "Hey this peer is congested. Let's go take a look at what the source addresses are or the destination addresses are so that we can do some traffic engineering around that." They added in a mechanism that allows you to do that whole exercise with the click of one button, which made life for that specific path a whole lot easier.
We communicated that to our customer success rep.
We were using a homebrew solution previously, which was not NetFlow based; it was BTU-based, which was vendor-specific. We are, obviously, a multi-vendor shop, so it only gave us limited visibility.
We switched to have the ability to see much more than what we were seeing. Kentik was platform-independent. There was also the fact that compared to what they were offering, nothing else on the market had the same feature set. Kentik already had more, and that was three years ago. They have been innovators in the space and have continued to push on the available features since. And most important, for us, was the price point. It was highly competitively priced. It was a no-brainer.
We did look into the on-prem option. Within our group, we're just not set up to do that. We're not server guys. And the pricing on the as-a-service-solution was such that it still made sense to go that route for us.
It took us about a day-and-a-half to fully deploy. It wasn't that big a deal.
We had to roll out the device-level config that would start exporting the data to Kentik, but that was incredibly straightforward. There was no impact to doing so. We automated that and were able to push it out to our entire network in about that day-and-a-half, and we were fully up and going without any kind of hitch.
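The rollout described here amounts to rendering one small flow-export stanza per device and pushing it with existing automation. Below is a hedged sketch of such a script: the CLI syntax is generic NetFlow-style pseudo-config rather than any specific vendor's exact commands, and the collector address, port, and device names are placeholders:

```python
# Sketch of an automated flow-export rollout. The config syntax is
# generic NetFlow-style pseudo-config, not a specific vendor's CLI;
# the collector IP, port, and device names are placeholders.

TEMPLATE = """\
flow exporter KENTIK
 destination {collector}
 transport udp {port}
 source {source_iface}
"""

def render(collector, port, source_iface):
    """Render the flow-export stanza for one device."""
    return TEMPLATE.format(collector=collector, port=port,
                           source_iface=source_iface)

configs = {
    device: render("203.0.113.10", 9995, "Loopback0")
    for device in ["edge-1", "edge-2", "core-1"]
}
# A real rollout would now push each rendered config through the existing
# automation (NETCONF, Ansible, etc.); no push is attempted here.
```

Because there is nothing device-specific beyond the template variables, pushing this across an entire network in a day or so is plausible, which matches the experience described above.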
On our side, it was just me who was involved. It was super-simple. I wrote a script, it deployed, and we were up and going.
And there is no overhead when it comes to maintenance.
It's hard to quantify ROI. How do you put the numbers around our use? Anecdotally, we definitely feel we're getting value from it. We are a fiscally conservative organization, and when we've renewed with Kentik it's never even been a question. It's, "Yes, we're renewing."
Without speaking directly about numbers, it's about the cost of a cross-connect, per device per month. Of course, some people are paying $50 a month for cross-connect and some people paying $500 a month for cross-connect. With volume, etc., it's somewhere in between. But with a network of our size and scale, we've got the volume such that we're able to get pretty aggressive pricing on things that we consume.
There are no other costs in addition to the licensing fee for us. It's one-and-done.
We've checked out Arbor, SolarWinds; you name it, we've tried it. We've had some in-house-developed stuff that we tried for about a year. Kentik really blew everything out of the water.
Arbor had a lot of what we were looking for. The problem is that they quit innovating a decade ago, and their price is ridiculous. Arbor is also device-based. You have to stick a big, massive machine in your network and each of those only supports up to about five devices. We're in an environment where we have hundreds and hundreds and hundreds of core devices. So that obviously wouldn't have scaled.
Go for it. The other solutions out there just don't compare. It has definitely been worth it for us. Anytime anyone asks us, we definitely recommend it.
We were expecting to be able to see and understand more about our traffic. I don't think any of us thought we would rely on it as much as we now do.
We have looked into making use of Kentik's ability to overlay multiple datasets onto our existing data and it's something we are thinking about. We're just not there yet within our organization.
It gives us visibility into stuff going on in our network but I don't think it necessarily helps uptime. Where it could help uptime is for specific customers when it's DDoS-related. It helps us quickly determine what's going on with DDoS, where we couldn't have before. But for our network, as a whole, it just allows us to see what's going on. It doesn't do anything itself.
It doesn't reduce the number of attacks that we need to defend against. The internet is a wild place. With a network of our scale, there is something under attack literally every minute of every day, every day of the year. What it does is allow us to see quickly — immediately — what is actually going on, and then take actions around that.
I rate it a nine out of 10. We're happy with it.
For our purposes, where we're at today, and even in the past, to analyze flows and to pull specific data and understand where our traffic is going to — which AS path — that's primarily the value that I extrapolate from Kentik.
It's mostly on-prem. We do some stuff with GCP and AWS, but it was all primarily licensed-based, based on the number of pieces of equipment we have on-prem that we actually attach it to. We have over 55 edge nodes and about 10 compute nodes.
We can actually see what we're doing now. When it comes to making an educated decision on a number of things, if you have no visibility into what you're doing, you really can't make that decision. Collecting that data and having those metrics first-hand, in real-time, allows us to make an educated decision, versus an uneducated guess.
Kentik has proactively detected network performance degradation, availability issues, and anomalies. Previously, we had no visibility. When we had congestion, things would actually happen and it was hard to troubleshoot where they were coming from. Pinpointing that was one of the first things we were able to do.
A specific example is where we had a number of tenants that were created that were getting DDoS'ed. We couldn't understand how or why we were getting DDoS'ed because we had no visibility. We were guessing. Kentik opened things up and showed us where the traffic was coming from and how we could go about mitigating it.
It lets us understand what those attacks are, versus not actually knowing where they're coming from or how they're affecting us. It cuts down the time it takes for us to troubleshoot and actually mitigate by about 50 percent, guaranteed, if not more. But we're running a bunch of GRE IP sectionals. It's not like we have huge amounts of capacity. But for some of our large customers, it really has helped us detect what the problem is, instead of guessing.
At my previous company, it improved our total network uptime by about 20 percent. I wouldn't correlate that back to Kentik in my current company.
The most valuable feature is being able to pull traffic patterns; to and from destinations. We're able to understand where our traffic is going, our top talkers from an AS set, as well as where our traffic's coming from.
The only downside to Kentik, something that I don't like, is that it's great that it shows you where these anomalies lie, but it's not actionable. Kentik is valuable, don't get me wrong, but if it had an actionable piece to it... I keep telling them, "Man, you need to find a way to make it actionable because if you could actually mitigate, it'd be huge what you guys could do."
The way things are, we have to have some sort of DDoS mitigation, like Arbor or something of that nature. Once the anomaly is detected, that's great, but then you have to mitigate. If Kentik had mitigation, or if they could acquire a solution and throw it onto their platform and have that portion available, that would be huge.
I have been using Kentik at this company for about a year and, prior to that, I used it at a previous job for about another year.
Coming into this company, I felt they were flying blind, meaning they didn't really have anything from a monitoring standpoint. They didn't understand how decisions were made. And to make educated decisions, you actually have to have the proper tools in place. Kentik was a tool that I know works really well.
Kentik has pretty good intuition, as a company, as to where the market sits and what they're into. They don't delude themselves. They really focus. They've been pretty good. I know the leadership over there and it seems like between Justin and Avi, they're good at what they do and that's why I'll continue to use them.
Anywhere I go, I'm going to use Kentik if I have the chance.
I am in what's called the "data explorers," which is our organization's free-form, "write your own database query with a GUI" to get some numbers out. I do that because I'm usually looking to solve very specific problems or to get very specific questions answered. I'm very familiar with the GUI and it does what I need it to do.
For our company, one of the major uses of it is in our sales organization. They run a lot of customer prospecting using it. Using the API stack, we ended up writing our own, internal sales tool webpage which does a lot of queries on the back-end to get info from the on-prem database.
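As a rough sketch of the kind of back-end call such an internal tool makes: the endpoint and auth headers below follow Kentik's published v5 query API, but the dimensions, metric, device names, and credentials are invented for illustration.

```python
# Illustrative sketch of querying the Kentik v5 query API the way an
# internal sales tool might. Endpoint and header names follow Kentik's
# documented API; all query values here are hypothetical examples.
import json
import urllib.request

API_URL = "https://api.kentik.com/api/v5/query/topXdata"

def build_query(device_name, lookback_seconds=3600, topx=8):
    """Build a top-destinations query payload for one device (illustrative)."""
    return {
        "queries": [{
            "query": {
                "dimension": ["AS_dst"],            # group flows by destination ASN
                "metric": "bytes",
                "device_name": [device_name],
                "lookback_seconds": lookback_seconds,
                "topx": topx,                       # return only the top talkers
            },
            "bucket": "Top Destinations",
        }]
    }

def run_query(payload, email, token):
    """POST the query with Kentik's standard auth headers; returns parsed JSON."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "X-CH-Auth-Email": email,
            "X-CH-Auth-API-Token": token,
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A front-end page can then render the returned series; the point is that the raw database is reachable programmatically, so a pre-canned sales view is essentially a saved payload like this.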
We are using the on-prem deployment. We did use the SaaS version initially for the test, to see if it met our needs, but for production we decided to go for the on-prem deployment. The reason we went with the on-prem — and I'm not involved in the purchasing aspects — was because at the level of our flows and data rates, when you're doing the cloud solution you're also paying for the hardware. I believe it was determined that a one-time cost for us in buying the hardware, and then just doing the software license part, ends up being more cost-effective. I cannot speak to anyone else's particular pricing model or the amount of data they're sending. That may make it a very different equation. I have a feeling that the on-prem would really only work for the really large customers.
For our organization, the sales-prospecting is really invaluable. We had a previous tool that I wasn't really involved with but which was, to my understanding, very hard to use and which was — I won't say misdesigned — but designed strangely. With this tool I have been able to work with some of the front-end sales-developer people to tighten down the queries that they wanted to use to get the information out. Once they had that, they could go into their sales portal and put them in there. I can help them with the information because I know what it's coming from. I help them make queries: for example, "The customers in New York who are going to Chicago." Whatever that turns out to be, I know what it is. Whereas, with the other tool I didn't really know necessarily how it was working along its model.
We also have alerting from it for attacks and capacity utilization, which we didn't have before. The great thing about it is that it doesn't just say, "Okay, this link is overloaded," but it does what's called planning or trending. It says, "Hey, this IP usually has ten hosts talking to it. In the past hour, it has had 10,000 hosts talking to it." It will show things that might not necessarily be a situation where something is being overloaded, but which are still events that happened on the network and which we wouldn't have seen before at all.
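The "usually ten hosts, suddenly 10,000" style of alerting can be sketched as a simple baseline comparison. This is a toy illustration of the concept only, not Kentik's actual trending algorithm; the window and ratio are made-up parameters.

```python
# Toy sketch of baseline-based alerting, the idea described above: alert
# when a metric departs wildly from its recent norm, not when it crosses
# a fixed threshold. This is NOT Kentik's actual algorithm.
from collections import deque

class BaselineAlert:
    def __init__(self, window=24, ratio=10.0):
        self.history = deque(maxlen=window)  # recent samples, e.g. hourly
        self.ratio = ratio                   # how far from normal is "anomalous"

    def observe(self, hosts_talking):
        """Record one sample; return True if it is anomalous vs. the baseline."""
        if self.history:
            baseline = sum(self.history) / len(self.history)
            anomalous = baseline > 0 and hosts_talking > baseline * self.ratio
        else:
            anomalous = False  # no baseline yet
        self.history.append(hosts_talking)
        return anomalous

detector = BaselineAlert()
for h in [10, 12, 9, 11, 10]:
    detector.observe(h)          # builds the "usually ten hosts" baseline
print(detector.observe(10_000))  # True: 10,000 hosts vs. a baseline near 10
```

The value, as described above, is surfacing events that would never trip a fixed utilization threshold.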
Kentik has also helped to decrease our mean time to remediation in the case of attacks. We're able to pull out the IP that's being attacked and take action on it. Before, we couldn't find that out easily. That process has gone from slow to fast. Attacks happen no matter what. We have a lot more visibility into them, we can see where they're coming from, and that has definitely helped us take action against some of our customers who are continually launching attacks. Maybe it's decreased the number of attacks in that we have found out which customers were doing them and terminated them. But the tool itself doesn't help us reduce the number.
Having access to the flow data is the most valuable part. It gives me the Ultimate Exit type of information, which I wouldn't get in a basic flow-analysis engine. Also, I am able to do a lot of work on the visualization end to create different visualizations and different ways to get information out of it.
The real-time visibility across our network infrastructure is good.
The drill-down into detailed views of network activity helps to quickly pinpoint locations and causes. All the information is there. As an organization, we're still trying to figure out the best way to use it across all different skill levels. I worked with some of the sales developers to get a sales view. I'm working with the NOC to get a NOC view, because it is a very information-dense product. Someone who doesn't know what they're doing will easily get lost in it. But it does light up dashboards and views for people who aren't as skilled with it, to answer questions easily.
I would like to see them explore the area of cost analysis.
We started with a trial just about two years ago and then we signed the contract for it at the end of 2017.
The stability is fine. It's a software product so there are going to be issues. There are problems that happen with it occasionally. They notice it, they send out a message, and it gets resolved. But there are no qualms at all about stability.
As far as the hardware goes, I can't speak to that. Hardware is as hardware does, but I presume they have enough stability or excess — spare capacity — in our cluster that I don't hear about anything. Every once in a while I'll hear that a fan died, a hard drive died, but there is no impact to the function of the platform.
The scalability is great. We've had no issues with it. Our network is very large.
Obviously, you want to be a little — I don't know say cautious — but a little aware of what you're doing. It's the same thing as when you use a database. If you run a query: "Show me all zip codes starting with two," you're going to get a huge number. What you really meant is, "Show me all the zip codes starting in two in Maryland." That's a very different query and that will get you a much faster response because you're already only looking in Maryland. Without having someone to help guide you through that process and who knows what a database does, it's very easy to write bad queries.
One of the great things about this product is that it takes away that "middleman," that developer between the user of the tool and the raw database. At many companies, you have the database of customer information, for example. Then you have the users of that data who need it to make tickets and resolve issues. And in between them, there's a developer who figures out what the customer service people need to know: "Oh, you need to know all tickets of this customer in the past week." Or, "You need to know all the tickets that are open right now." The developer pre-writes those queries for them so they don't have to do it. What Kentik does is it eliminates that layer. I can slice data any way I want on the platform. But with that comes the caution that, if I write a query that is stupid, it's going to take a long time. There are ways to write queries which are smart and ways to write queries which are stupid. That's where it does take a little bit of time to understand how it works. Once I know how to do it, I can easily help other people make dashboard queries so that they don't need to know that.
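The zip-code analogy from the interview can be made concrete. This toy Python sketch (rows invented for illustration) shows why adding the selective condition changes how much data the query has to touch; the same discipline applies to any dashboard query built on the platform.

```python
# Toy illustration of query selectivity: the "smart" query adds a
# selective filter (state) so the pattern match applies to far fewer
# rows. The data set here is invented purely for illustration.
records = [
    {"state": "MD", "zip": "21201", "bytes": 120},
    {"state": "MD", "zip": "20601", "bytes": 80},
    {"state": "VA", "zip": "22030", "bytes": 300},
    {"state": "NY", "zip": "10001", "bytes": 50},
]

def broad_query(rows):
    """'All zip codes starting with 2' -- matches most of the data set."""
    return [r for r in rows if r["zip"].startswith("2")]

def narrow_query(rows):
    """'Zip codes starting with 2, in Maryland' -- the state filter prunes first."""
    return [r for r in rows if r["state"] == "MD" and r["zip"].startswith("2")]

print(len(broad_query(records)), len(narrow_query(records)))  # 3 2
```

On a real flow database the difference is not two rows versus three but billions versus millions, which is why an unscoped query "takes a long time" while a well-scoped one comes back fast.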
Tech support is very good. They have a form on their front-end where you can submit a problem request. The cool thing about it is that it takes a snapshot of the query that's being made so they can immediately see what you're looking at. If you have a problem like, "Hey, why does this graph have this jump here?" they will see that right away, and then you can go back and forth with them. I've been working with them a lot on different issues and I've always had very good support from them.
The previous tool we used was an internal module we developed. The previous solution was very sales-driven. It wasn't very good and it was not our main expertise. We had a programmer-and-a-half doing it. There were two parts of the problem. One part was data ingest at our scale. How do you ingest this much stuff? And the second was how do you visualize it? Those are both hard problems that are different from one another. Our skill set is not really that good in either one. It was easier and made more sense to outsource those aspects to people who do know how to do that.
The initial setup was straightforward. We had the on-prem deployment, so they sent us a list of the stuff. I wasn't involved in the setup of the hardware, but our data center guy said it was straightforward. You put this rack here and plug this in. And as far as the computer equipment goes, the great thing about NetFlow is that it is a very standard industry protocol. It is what it is and it's pretty much done.
In terms of how to best utilize the information you have and what you know about your network, and to give it to the platform in a way that is good, that is still very easy for a network like ours. But someone who is a lot less rigorous about their internal topology, descriptions, or other meta-information may find it harder. You don't need to be doing best practices, just reasonable practices. If you're already doing reasonable stuff, it'll be okay. But if you don't have very good standards for your network in terms of descriptions and the like, you're going to have a bad day. But you were already going to have a bad day. It's not fair to knock the platform for that. There needs to be some way to get that meta-information into the platform, to be able to say: What's a customer? What's a peer? What's a core link? If you can't do that, then you have other problems.
We signed with Kentik at the end of 2017. There were a couple of months where we were spinning up the hardware, where we didn't really do any setup. They sent us a list and we did some due diligence to make sure that we had the right buys, etc. It's going to be different for an on-premise versus a cloud solution. But once we got up and running, things went very very quickly.
If you follow good practices in your network, it's very easy. If you have a very sloppy network with bad descriptions, where you can't write a rule that says a description that starts with "customer" is a customer and a description that starts with "core" is a core, because they're all just "port to go," you're going to have a bad time. That's really work that needs to already be there in a good network. Our network was already designed with a standards base. So our setup was very fast. It took weeks, if not days. Once we put the first few routers into the platform to see how the API was working, we were able to run all the rest through.
It took one to one-and-a-half people for the setup, excluding the on-prem hardware installation, which I wasn't a part of at all. I'm not a developer, so we had a developer who did the API work to add them into the platform. I guided that API work. It's really not that complex.
We did not work with a third party. You absolutely do not need that. This is not like HP OpenView or Salesforce, where you need people who know the system to get going. The API is very easy to use. There's not a ton of Salesforce-level business logic in it.
Basically, it's a NetFlow collector, ingester, database, and a UI front-end to make reports. While it depends on how much in-house capability you have, most people should be able to do this without a problem.
I can't give you numbers. It's something which is very hard to quantify. I have no idea what the investment is, and how do you calculate the return. Is the return that a salesperson closed a deal that they wouldn't have before? I'm sure somebody could, but beyond "good," I wouldn't know what to tell you about ROI.
It's a great product and the company is great. The company iterates and they move fast to add new things. When we did our first trials, almost two years ago, more than once, although not routinely, I would see a missing filter set. You could filter on this element here, but you couldn't over there, and that's an error. I would put it in as a request and it would get resolved, sometimes in hours. The responsiveness is great.
It really takes a while to figure out the best way to use it for yourself. There is just a ton of information in there. Don't get dissuaded at first. They will help you work through it. You will need to understand your network, and you will understand your own network very well by the end of it.
The biggest thing Kentik has given us is the amount of visibility we have into our own network now and knowing what's going on at a given time. It's giving us the detailed NetFlow records and the ability to visualize them with different tables, with different timeframes, with different filters. It really provides a lot of detailed information on what's going on right now that we just didn't have. It may not have changed the business, but being able to know, "Hey this customer is always sending traffic to Spain and they stopped for two days. And then they started again." What's going on with that? The more information we have to give to our staff about what's going on in the network, the happier the customers are.
Things are moving in different directions at a global or industry level: Old-ops versus AIOps versus DevOps, etc. We are not a very large company, so a lot of these other things are, to my mind, kind of "buzz-wordy" to an extent. I know that they're moving in that direction with V4, which is good, but for me I just want to know exactly what I'm putting in and what I'm getting out. A lot of solutions I've seen in the past have been very "hand-wavy." They will tell you when they see trends, but they aren't really good at the stuff I need to do. I need to say, "Hey, this line is overloading. What do I need to do right now?" It's been really great to have that ability to go in, put down exactly what I want to look at, and get the answers right away.
Of course, if you want to answer more complicated questions, there are limits to how easy you can make it. If you are asking a very complicated question, you need to know what you are doing. It's like any real intelligence or NetFlow analysis tool. It can get complicated if you're asking complicated questions. But the great thing about it is that it does expose all that functionality. So if you want to ask a very complicated question, you can do that.
In terms of the solution's months of historical data for forensic work, we don't have much call for that level of analysis. We do have the 60- or 90-day retention for the full data set, but we haven't had the need to go back that far at that resolution. That doesn't mean we won't. But as of right now, if we do have an issue with abuse, where we need to look at the full data set, we're doing that within a couple of weeks or even a week. So for us, it has not been a plus that we've had the whole data set for the 90 days, but that's what we decided to do when we went with the on-prem.
We don't do any public cloud. We don't use it for any threat stuff like that. I could see an enterprise using it to be able to pull in stuff from Google Cloud or Amazon cloud — known routers. We don't do any of those kinds of things. We're trying to figure out the way for us to do it, to feed the list of customers who have bots on their network back to our abuse team to handle. We have that information available to us. We just need to figure out the right way to handle it as an organization.
We're a very small personnel organization and we don't deliver a lot of products. We just deliver one product. We don't do security or cloud.
I wouldn't say it has helped to improve our total network uptime, but we're not really using it for that purpose. Obviously in an attack we would see what happened, we can see traffic shift and take action based on that traffic. But I wouldn't call that actual downtime. Things got overloaded and now they're not overloaded.
In terms of maintenance, it runs by itself. For the on-prem, we bought the hardware and they monitor the servers because it's their own software running on it. We'll get a mail saying, "Hey, the fan on this is funny." We can just swap it out. Beyond that, there really isn't maintenance per se.
I just say to the business units, "Hey this data's here. What do you want?" I sit down with them and figure out what they want and how often they want it. I then walk them through how to use it for themselves. One of the great things about it is that it's really just a front-end GUI to a database. But that's also a downside of it because it's really complicated. Someone who doesn't know anything about a database is going to have a hard time. But, if I sit with someone in our company and say, "What is it you want to know?" I can walk them through how to do it and, at the end of it, leave them with a dashboard and they can do it. It really depends on their own initiative and what they want to use it for.
The number of users in our organization is less than ten. The sales team is now starting to use it, but they're not really using the product itself. They're using our internal sales page which makes API calls to their back-end to get graphs. They're not really users in the sense that they're using the UI, they're making queries, they're making dashboards, or playing with all the parameters. They just have a constrained view that the sales development organization said, "This is what I want them to know. Give it to them." Those few hundred salespeople are using it, but they're just really consumers of the data that I, in consultation with the sales development people, said, "This is the data you're getting." Beyond that, there are a few in the NOC, people in abuse, people in planning, and me, who use it for different purposes.
We're using Kentik for flow data, so we can do things like peering management and interconnection research, as well as capacity management. We also use it fairly heavily in our tech cost-reporting so we can see things such as how many dollars per gigabyte and how much we're using.
The deployment model is cloud, which Kentik provides.
Before using the solution, we had to do all these manual tasks, such as running all these queries manually, and building our tech cost-report used to be a two or two-and-a-half-week effort. Using Kentik, and the automation that it provides us, we've brought that down to a day or two, which is a massive time savings.
Our capacity managers would say the visibility and the dashboards have improved the way our organization functions. They can see, at a glance, which of our data centers is serving which countries, and that really helps them in their planning.
In terms of the solution's months of historical data for forensic work, we reformulated the way we calculate costs so we had to go back into the historical data and use that raw data to calculate the costs again. It wasn't necessarily forensic networking, but it was forensic cost and business work.
The queries, in general, are great, being able to add tagging to the flows.
Having the API access allows us to do a great deal of automation around a lot of our reporting and management tools. I'm the manager of automation at our company, so for me, that's the big winner.
We're using the dashboard, but that's not as high-priority a use case for us.
We're also using Kentik to ingest metrics. It's a useful feature, and its response time, whenever we're pulling back the data, is better than our on-prem solution's. That's one of the points in its favor.
There is room for improvement around the usability of the API. It's a hugely complex task to call it and you need a lot of backing to be able to do it. I should say, as someone who's not in networking, maybe it's easier for people who are in networking, but for me that one part is not very user-friendly.
It's very stable. We have not had any issues that I've heard of since it was brought online.
So far we've been able to more than double and triple our use cases for it and it hasn't really even hiccupped. The scalability seems good.
We have data centers all around the world and Kentik is in every data center. We plan to build many more, and it's going to be included into each of those builds. We're using it globally and expect to keep growing its use.
Technical support has been excellent. We've had many novel use cases which we've had to send back to them, and their response times, and the solutions that they've given us, have always been better than satisfactory. They've met our needs and then some.
We're also using a competitor, Druid, which is an open-source solution that we host on-prem. So we're actually doing both cloud and on-prem solutions. At the moment we use each solution to verify the other.
We decided to bring in Kentik partially because of cost. Our particular instantiation of Druid is enormously expensive. The director of capacity management wanted to spin up Kentik, originally as a PoC, to see what kind of data we could get out of it for a lower price point. But then he quit, and we're still just going along with it. Eventually a decision will be made between them, but initially it was cost that pushed us toward Kentik, to see if we could get more bang for our buck.
A related difference between the two solutions is that with Kentik, and the infrastructure owned by someone else since it's a SaaS solution, the overhead on our side is much lower. We're also not necessarily on the hook for dedicated infrastructure costs. So, the price is amazing.
With Druid, for every node that's recording all of this data that goes across the routers and such, there are no limits. With Kentik, occasionally, we'll drop data or we'll get a spike in the traffic that will make the data unreliable. That does not happen in Druid because it's not metered the same way.
So far, we have seen ROI by going with Kentik through the time savings I already spoke about, with the ability to automate. But there is also ROI just straight up on the cost. It's considerably cheaper, and from what we've found, the data is just as rich. It's a very meaty data point in some of our planning for the future.
We have an annual contract with Kentik that we renew each year for a set number of licenses. We also have some burstable licenses which we can spin up and spin down, and those are paid as they are used. We have 20 of those licenses but we don't get charged for them unless we use them.
Rely on the customer service reps. That would be my biggest piece of advice because they've got all the good tips and tricks.
The user base of Kentik in our company is very small, about 15 people or less. That includes our interconnection managers, peering managers, and capacity managers, as well as my small team of software developers.
For deployment and maintenance of the solution it requires one person or less. Nobody needs to make it their full-time job, which is nice. Those responsibilities are spread across several people. One of the interconnection managers helps me, for example.
Overall, I would rate Kentik at eight out of ten. The fact that we can lose data whenever we have traffic spikes, which our business does pretty regularly, is what would keep any solution from a ten, because I can't always, for every data point, say this is accurate. Occasionally there's an asterisk next to the data.
The primary need is to really understand where our traffic is going, not just the transit ASNs — we know that — but where else is it going? How much traffic are we sending to those other ASNs?
Of course, DDoS is also another use case for us. We have identified DDoS attacks with it.
And we're also using alerting now to help us understand when service owners are perhaps utilizing more than they should.
We had an event with one of our service centers, internally, and we were able to get them to understand that they were causing adverse effects for our customers on our circuits because they were over-utilizing circuits when they should not have been doing so. Kentik allowed us to peel back the entire network aspect of what they were doing and it allowed us to get an agreement from them that they would police themselves regarding their traffic, so that we did not have to do so for them.
And it allowed us to continue to have shared resources rather than duplicating everything. We were able to continue to allow them to utilize our transit, or our shared network connections, rather than saying, "Okay, you can't use this anymore. You have to duplicate everything." As a result, we're saving, in this case, about $40,000 a year, because we're not duplicating the network. If you understand what's happening, you can say, "Okay, this is what you can do, this is what you can't do." You can't get to that point unless you understand what's happening first, and Kentik allowed us to do that.
The solution has proactively detected network performance degradation or anomalies. For instance, right now I'm tracking another service center that is trying to provide a backup solution going to one of the cloud providers. What's happening is that their traffic is not hashing, it's not load-balancing over multiple circuits. I can easily prove that because I can pull up the circuits and see all of the flows from this particular service owner going over one circuit. That's an anomaly Kentik detected and I can go back to the service center and tell them. And it alerts me when it's happening, when it's getting too high, when it's about to saturate the circuit. It then tells me, "Oh, by the way, they're doing it again." That is very helpful.
The drill-down into detailed views of network activity helps to quickly pinpoint locations and causes, especially if you set it up properly so you have all your routers and your interfaces. It's super-easy. In this case, it sends me an alert. I pull up the dashboard and it's all right there. It tells me everything. For example, when I pull up the alert that I got this morning it gives me a traffic overview and tells me, before I've done anything in the source or destination ASNs, which service center it is, if I have a separate ASN for them. It shows where it's going and how much traffic is spiking. It gives me the total traffic in bits per second and packets per second, as well as source country, destination country, subnet — everything. It's telling me exactly who, what ports, and everything that is causing the anomalous traffic. If you have it pre-set-up, it just takes you through to the dashboard with everything already there. That's super-helpful because I can go back to the service center and tell them that they're saturating the link and this is how they're saturating it. I have proof.
I have also used Kentik's months of historical data for forensic work, especially with my old job. I was at a service provider previously and we got DDoS'd all the time, constantly. It was much easier for me to go back in time and look at some of these DDoS events and look at the signatures so I could just figure out which buckets most of them fit into. I could say, "Okay, I had these many incidents, these are the different types of issues I saw, and maybe if we take these actions we might be able to stop this kind and that kind of DDoS." It was much easier for me to go back and look at it as a holistic view.
In addition, it has decreased our mean time to remediation for anomalous traffic events. For instance — and I'm not in the operations team — it has certainly allowed the operations team to detect and figure out what's happening much more quickly than they previously could.
At my previous company, it probably went from about a 30-minute detection to about a ten-minute detection, and that included making sure we understood which IP address was being attacked. As a service provider you can see what the interface is, but the question is which IP address on the interface is being attacked. That's the thing that you get much faster and you're able to surgically black hole that IP address, as opposed to shutting down the entire port for the customer. That kind of thing is huge.
Kentik has also improved our total network uptime. We're able to catch customer-affecting incidents much faster than we previously could. And at my previous company I can say wholeheartedly that it improved uptime because when you can detect quickly, you're not shutting down ports, you can get to the router faster, and the router is not falling over anymore because it's being attacked.
In terms of reducing the number of attacks we have to defend against, at the previous company I would say it did, because I did all the analytical work and we were able to determine a couple of different types of attacks that we might be able to defend a little bit better. Here, it has reduced the number of internal incidents we've had. Service owners were not really thinking properly about how they were using the network and had service-affecting incidents that they didn't know about. If you point it out, they stop doing it, if you have data for that. Before, we weren't really able to point it out in a way that they understood. Now, it's much easier for us to detect it, clearly determine that it was them, and then say, "Could you stop this? Don't do that."
The analytics part is really important for me. I have seen some things pop up periodically that I did not expect, so it is important for me to dig into them. The ability for me to look at the traffic and see where it's going to is extremely important.
I really love the Data Explorer. I use it all the time to go in and craft exactly what I need to see. I'm able to then take that story and explain it to the executives. I've done that a couple of times and it is helpful.
And I'm really liking the alerting. It's super-helpful.
In terms of the solution's real-time visibility across our network infrastructure, I have not found any other monitoring or NetFlow visualization tool that gives me the kind of information I get from Kentik. If I need to take a deep dive into something I see, it's really easy for me to do that. Whereas with most other tools I would need five or six of them to get that kind of data, with Kentik I have it all in one place. Data visualization is extremely important.
I've checked out the V4 version of the interface and it's still a little bit clunky for me to use. I still go back to the old interface. That's definitely one that they still need to work on. It doesn't seem like everything that you get in the V3, the older interface, is there. For instance, I was trying to add a user or do the administrative tasks in V4, and I couldn't figure out where I was supposed to do that. The interface just wasn't working for me so I went back to V3 to do that stuff.
Also, on the alerting side, there's the traffic overview page that an alert takes you to, and sometimes I really want to share it with someone. On most other pages you can get a quick URL to share that particular view, but I can't do that on the traffic overview page that comes from an alert. That would be really helpful.
Generally speaking, I have found it to be fairly stable. Do they have periodic outages? Yes. But almost never is the whole thing down, it's just one aspect that is down. I haven't really had an issue with it.
At my previous company we probably had one of the largest installations ever; that company is one of the biggest ISPs in the world, with 200,000 NetFlow flows per second. So Kentik is pretty scalable. The scale I'm dealing with now is minimal in comparison. It's a different world.
I have used technical support a few times and they have been knowledgeable and easy to work with. I really like them. I haven't had any issues at all. I've dealt with a lot of vendors, so it's like a breath of fresh air for me.
If you've ever dealt with Cisco, or a telecom vendor, you know what I mean. I send an email to Kentik and within a few hours I've got something back, asking me a couple of questions and helping me fix the problem. It's a vastly different experience, because if you try that with Cisco, for instance, or one of the other network equipment vendors, you're in for a very long process. And if you call a telecom company, same deal. You're probably not going to get a human the first time; if you get an email, it's going to be automated. It just takes forever. But Kentik is very quick.
We have DDoS mitigation providers but they don't really provide the analytics. They detect and mitigate, but they don't really provide you any information on what's really happening.
At my previous company we had tried, several times, to build our own solution, and I can tell you that it was not terribly successful. We could only ever get analytics on one very small use case, as opposed to all of the use cases that Kentik has. I was intimately involved with each one of those attempts, so I can tell you it was not easy.
At my previous location the solution was on-prem and I helped with the entire process of getting it into the network. I helped them do the proof of concept, I helped do the executive briefing, I helped do the modeling of the entire implementation, and I also helped and worked on the implementation itself.
Because it was an on-prem setup I found it pretty straightforward. We had to do a whole bunch of work on the network to get it working properly because you have to change all of your configurations to make sure it's sending to the right locations, but otherwise, it went very smoothly. They told us we had one of the fastest implementations ever. From the time that we actually started the implementation, it was only about a month, and we actually got all the routers in there too. And that was with a huge, massive, on-prem installation. Probably one of their biggest ever.
For the servers, Kentik worked with our IT department, but for the network stuff, for anything that was on the routers, we deployed it ourselves.
At my current company, they have the cloud solution, and I was not a part of the installation. I'm not sure why they decided to go with cloud versus on-prem. I don't understand it. I know why my other company went on-prem but I don't know why they did cloud versus on-prem here.
I worked with Kentik directly and I had a very good experience with them. They were knowledgeable, helpful, and easy to work with. They used Slack and it was very easy for us to communicate with them, even across teams. They were working with our IT team and the backbone engineering team. It was very easy.
We are working in one country where transit is very expensive and Kentik has allowed us to identify those peers we're sending traffic to so that we can then get onto the exchanges in that country and significantly reduce the cost of our transit in that country.
Japan, for example, tends to have higher transit costs. We brought up our exchanges there and then targeted a lot of peers so that we're not spending five bucks a meg or so to send traffic to them.
I don't have a final cost analysis yet, but I can tell you that the IX is much cheaper than the transit is. And our customers are getting much better latency. Our latency numbers have decreased by about ten percent because we're peering directly with the customer at an exchange. That's one place I can say the ROI is great.
We did the same thing with any of our transit providers as targets. If we can privately peer with someone somewhere, rather than have them go over transit, we target those peers and pull them off of the transit. Anytime we can do that, it's much cheaper.
I believe pricing is by device, the number of devices with BGP sessions, and then by the amount of flow you expect from that device, if I remember correctly. We did ours on a yearly basis. That was easier for us. I think they will happily do multi-year if you want.
Carefully analyze your routers and how much flow they're sending to a collector. I would also suggest minimizing the number of routers that have to send BGP, so that you have a good enough view of BGP without every router maintaining a BGP session. Those are my suggestions for implementation.
The biggest lesson I have learned from using Kentik is "don't do it yourself." At my previous company they were being very stubborn and they didn't want to use an off-the-shelf product, so I went through three iterations of a netflow interface trying to get it correct, and I kept telling them, "Okay, but there's a product out there that does this. So please let's stop spending all this money." And they went so far as to spend a couple of million dollars on hardware to deploy it out to the network and everything, and we still ended up going to Kentik. That is one of the biggest things I learned, that sometimes you cannot do it all. You have to go to someone who's an expert in a particular kind of big data, and that's what they are.
We don't currently make use of the solution's ability to overlay multiple data sets, such as orchestration, public cloud infrastructure, network path, or threat data, onto existing data. But with the public cloud providers we are working with, we are looking at pulling in VPC logs so that we can see whether we're getting the performance we need out of them. That's the next step with this product for us.
We're not pulling in other data sources like logs or ThousandEyes data, for instance, at this point. We did talk to Kentik about trying to pull ThousandEyes data in and marrying it with their product. But not quite yet. I hope to add that into the product as well at some point. We do use BGP as another metric to figure out what's happening with the different paths.
We probably have about 30 users. Everything from our monitoring team is in there so they're working with me on pulling together an interface that uses the API to pull the data out of Kentik to put it on one of our internal interfaces. That way, some people won't have to log in to get some data. It's more of an executive view for them. But some of our executives actually have access to Kentik too. We have a couple of network backbone engineering executives who have access and who do look at it. Then we have a lot of our operations team, the network architecture and backbone engineering. They all have access. It's a wide range.
In terms of deployment and maintenance, there are two of us who put things in. I've created users. One thing we are going to do is automate getting the routers in there. I would generally suggest, and this is what I did previously, that you write scripts to do your updating of everything, plus a script that just does it automatically for you. That's super-helpful.
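As a rough illustration of that kind of automation, a script could pull the device inventory over Kentik's REST API and diff it against a local router list. The v5 endpoint URL and the auth header names below are assumptions based on common usage of the API; verify them against the current API reference before relying on them.

```python
import json
import urllib.request

# Assumed Kentik v5 REST endpoint -- check against current API docs.
API_URL = "https://api.kentik.com/api/v5/devices"

def build_request(email: str, token: str) -> urllib.request.Request:
    """Construct an authenticated request to list devices.

    The X-CH-Auth-* header names are assumptions; confirm them in the
    Kentik API documentation for your account."""
    return urllib.request.Request(
        API_URL,
        headers={
            "X-CH-Auth-Email": email,
            "X-CH-Auth-API-Token": token,
            "Content-Type": "application/json",
        },
    )

def list_devices(email: str, token: str) -> list:
    """Fetch the device inventory so a cron job can compare it to the
    router list and flag anything that still needs to be added."""
    with urllib.request.urlopen(build_request(email, token)) as resp:
        return json.load(resp).get("devices", [])
```

A nightly job could then call `list_devices` and alert (or add devices) whenever a production router is missing from the inventory.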
In this environment we don't have that many routers in it. It's about 40 to 50 routers at the moment. We mainly use it on their engines. We're starting to work with our security team to get it from data center to data center as well. That's really limited by our need for security rather than how we would use it entirely. At my previous company, when I left, we had 667 routers in it. It was used everywhere for everything. We absolutely have plans to increase usage of Kentik at my current company. I'm working with our security team to get approval to do that. I have to meet their security needs in order to expand the usage.
Honestly, it is one of those products that I would suggest to almost any network operator. I would go with a ten out of ten as my rating. I have not felt like this about any other company out there. It has just been so useful for me on so many different levels from operations, to ROI. It's just helpful.
We have put it on half of our large monitoring screens. Sometimes it is actually easier to identify incoming attack traffic using Kentik than it is using our own gear.
Even when we know what the traffic is, it allows us to jump directly into the next steps of our process more quickly, since we can visually see everything in one place and on one screen through the customizable dashboards.
Instead of just total traffic in bits or packets, we can get protocol, destination port, TCP flags; everything you might want.
Kentik has been remarkable at anticipating the design requirements of their customers. They have provided everything that I might want already. After using it for over six months constantly, I am still discovering new things.
The only times I’ve felt, “I wish I could use this to do XYZ,” I’ve contacted support and it turned out I could already do it; I just didn’t know how, whether through the existing controls or via a combination of query types.
Perhaps a better example would be visibility into how tagging is captured, and a method of comparing my tagged interfaces on Kentik’s side. Right now, I can go in and look at all of the interfaces they’re receiving flow for, and sort/filter the list, but there is no easy way for me to compare them against my nodes. I should add, though, that’s not really a missing feature of their product; it’s just something that would help me troubleshoot my own (potentially broken) systems.
I add the tags to my own devices, not them. However, if we’ve made a mistake on our side, it’s a basic row-by-row comparison. I believe there is a way to use their SQL query feature to pull a better comparison but a method of using the GUI would be nice.
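That row-by-row comparison can also be scripted outside the GUI. The sketch below is a minimal example, assuming you can export (device, interface) pairs from both your own systems and from Kentik; the column names and data shapes are hypothetical.

```python
import csv

def load_interfaces(path: str) -> set:
    """Read (device, interface) pairs from a CSV export with assumed
    'device' and 'interface' column headers."""
    with open(path, newline="") as f:
        return {(r["device"], r["interface"]) for r in csv.DictReader(f)}

def compare(local, remote) -> dict:
    """Diff two collections of (device, interface) pairs: what we tag
    locally versus what Kentik reports receiving flow for."""
    local, remote = set(local), set(remote)
    return {
        "missing_in_kentik": sorted(local - remote),
        "unexpected_in_kentik": sorted(remote - local),
    }
```

Running `compare(load_interfaces("local.csv"), load_interfaces("kentik.csv"))` gives the two mismatch lists directly, instead of eyeballing rows side by side.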
I have used this solution for about eight months. For the first five months I used it as a standard user; then my organization created a separate admin account for me.
We have not experienced any stability issues. Other than the planned maintenance, which is short, it is always available and working great.
There have been a few very minor bugs; for instance, the auto-refresh was not working on the dashboards. When we notified them of it, they responded in less than an hour; they had replicated the issue and were working on a fix. A day later, it was done.
We have not scaled the product past the current level we are at. However, I don’t see how that could ever be an issue. You just send them the flow from your devices.
If you’re scaling, you make sure your interfaces are sending the data and you're golden.
The level of technical support is beyond any vendor that I have ever worked with before.
The service is totally hosted by Kentik, with a web portal and API. I have not had issues with it being available; I have never tried to get to it expecting it to be available and had it not load. Occasionally we’ll get an email or a pop-up notification in the web UI that planned maintenance will take Kentik down for an hour or so; these come a few days in advance.
The only issue we have had of a technical nature was with their dashboards. A dashboard is a custom page you build and lay out manually with different Data Explorer queries; you then turn on auto-refresh and let it continue to build the graphs as time moves on. This auto-refresh feature stopped working after an update to the Kentik UI’s look and feel. When we noticed it was not functioning, we sent them an email. They responded quickly and told us they had replicated the issue and would work on a fix. The next day they told us to try it again, and they had indeed fixed it already. I rarely get such prompt attention to an issue.
I have used SolarWinds in another company. You get a very simple, non-configurable type of view with green, yellow, red and ingress/egress numbers. It doesn’t compare to the analytical capabilities that Kentik has.
It was set up before I joined this organization.
I am not a part of the purchasing or evaluation in any way. We still use Cacti for general stuff, but Kentik has replaced it on half of our boards so far.
While I was not a part of the implementation, if you know how to set up NetFlow on your device, just point it at Kentik. They have another setup option for a sensor that lives in your network. I have only heard of it; never used it or spoken to anyone that has.
This product is easily the best network monitor that I’ve ever seen or heard about.
The DDoS alerting was, at first, the most useful. It was able to alert the entire team of more than 20 that the issues with the website were actually network based, instead of, say, bad code. In time, we mitigated the DDoS attack surface, so the usefulness is still there. We just don't see it every day.
Now we use Kentik for more nuanced traffic insight. This is ad hoc usually, but we do email 'peering' reports daily to the lead network engineers. This gives them some view into new traffic patterns we are picking up in IXes.
I find it very useful to see when traffic destined for a prefix that we prefer to ingress on the East Coast actually ingresses or egresses on the West Coast. It shows the difference between actual BGP paths and regional expectations.
The alerting ability is greatly improved. I think there is some movement still to make this into a 'dumb mode' vs 'expert mode'. There is the SQL-like syntax, but that is expert+.
I have used Kentik for 2.5 years.
We rarely, if ever, had any stability issues.
I have not had any scalability issues.
Technical support is second to none.
We used in-house, hand-built things. All based on binary RRDs or worse.
Initial setup was very straightforward. Nothing I needed too much help with.
There is a large difference between BGP and normal nodes in the licensing. I don't think this plays out for the best for either the customer or Kentik. Being able to split off the BGP vs. PPS requirements would be good.
We've evaluated almost everything except SiLK.
Use the technical support if you need it. They are excellent.
DDoS Alarming allows us to get a feel for the bandwidth of an attack and determine if mitigation is needed to prevent collateral damage. Secondly, the flow analysis lets us look at how traffic is transiting our network. This allows us to optimize metrics to reduce cost.
Kentik answers the flow questions: what are my flows, where are they going, and what can I do to better optimize my connectivity? Kentik also baselines flow behavior and can alert you when there are abnormal flows, such as a DDoS attack.
We now have real metrics on DDoS attack vectors and use the alerting dashboard to gather information used in CLI filters and eventually in RTBH.
Firstly, my Dashlane password manager attempts to fill in the dimensions field for me, so I just turn off my password manager when that occurs.
Secondly, it’s sometimes difficult to order the dimensions correctly when trying to make Sankey flow diagrams. It’d be nice if there were a knob somewhere in my user settings that made the dimensions box a single column from top to bottom, so I don’t have to spend extra time trying to drag a dimension into the correct column to get the order right.
I have used Kentik since April of 2016; usually four times a week.
We have not encountered any stability issues.
We have not encountered any scalability issues. Kentik allows us to set sampling of flows on a per device basis.
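With per-device sampling, the collector sees only one flow record in N, so approximate totals are recovered by multiplying the sampled counters by the sampling rate. A minimal sketch of that standard extrapolation (the numbers are made-up examples):

```python
def estimate_totals(sampled_bytes: int, sampled_packets: int, rate: int):
    """Extrapolate true traffic volume from 1-in-N sampled flow
    counters by scaling each counter by the sampling rate N."""
    return sampled_bytes * rate, sampled_packets * rate

# e.g. a router sampling 1 in 1000 that reported 2 MB across 1,500 packets
est_bytes, est_packets = estimate_totals(2_000_000, 1_500, 1000)
```

This is why tuning the per-device sampling rate trades flow volume sent to the collector against the resolution of small flows.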
Technical support is proactive in letting us know when we accidentally stop sending them flows. Additionally, when asking for help in configuring BGP settings, they have expert level knowledge in CLI configuration of network devices.
We did trials on a few competitor solutions. They were too slow, too complex, and required lots of on-premises touches to fix their equipment. They crashed often and they had poor customer service.
Initial setup was relatively straightforward. We had to evaluate which method of flow export/ingestion to use, implement the samplicator instance and then send Kentik the flows. We also had to exchange some information for BGP and SNMP settings.
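The samplicator step mentioned above replicates one incoming UDP flow stream to several collectors, so devices only need to export once. As a rough illustration of the idea (this is not the actual samplicator tool; the addresses and ports are placeholders), a fan-out loop looks like this:

```python
import socket

def fan_out(packet: bytes, destinations):
    """Pair one received flow datagram with every collector it should
    be resent to -- the replication a samplicator performs."""
    return [(dest, packet) for dest in destinations]

def run(listen_port: int, destinations):
    """Listen for NetFlow/sFlow UDP datagrams and replicate each one
    to all configured collectors."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", listen_port))
    while True:
        packet, _src = sock.recvfrom(65535)
        for dest, payload in fan_out(packet, destinations):
            sock.sendto(payload, dest)

# Example with placeholder collectors:
# run(2055, [("10.0.0.5", 2055), ("flows.example.net", 9995)])
```

Note that plain re-sending like this rewrites the UDP source address, which is why real deployments often use samplicator's spoofing mode so collectors still see the original exporter IP.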
I’ve told others that they charge based on the number of devices and provide a discount for education customers. In my role, I haven’t been exposed to the cost of the product.
We looked at Plixer Scrutinizer.
If they haven’t already decided to use it, I typically log into my portal and show them its capabilities. Then I let them know they can get a trial for their network. If they have already decided to use the product, I tell them they are in capable hands, because customer support knows networks and servers very well.
Kentik is a mature software product and is provided as SaaS. This is very valuable to me, because I don’t have to maintain a server, worry about resources, updates to software or OS, hard-drive space, or backups. I can just use Kentik to get the data in the format I want, such as reports or ad-hoc information.
Good NetFlow visualization and analysis tools are valuable to anybody who needs to understand the traffic that flows in and out the network.
This solution helps with planning, security, and troubleshooting. Kentik’s data explorer allows easy access to all data, grouped by any variable. You can do a quick overview of the top ten users or peer performance with very little effort.
One of our Network Operations Centers has a large overview screen with a web browser that shows Kentik and the data explorer running. This provides a constant overview of live traffic sorted by the source port.
We use Kentik to monitor the network and get alerts from its alert module if there is a DDoS or other attack on our network.
Kentik is constantly improving. I have seen their alert portion grow this year to include a new Beta that allows you to use automatic mitigation with multiple platforms.
Kentik is used when customers call in to troubleshoot their internet service and to decide on new peering partners.
I would like to see more granular user and security rights. Currently, a user can be a member or an administrator. I would like to limit what a user can see, be it IP or interface. I would like to be able to give my customers access to the data explorer with just their data.
We have been using this solution for about a year.
We did not encounter any issues with stability.
We did not encounter any issues with scalability.
The technical support is very good. Emails to email@example.com are answered quickly and competently. Kentik keeps in touch, listens to feedback, and cares.
We used multiple NetFlow products. Kentik shines with the ease of running reports and looking at data.
I was involved in the installation. The setup was easy; exporting NetFlow records is all that is needed. I also set up a BGP session with Kentik, which allows Kentik to see the AS path and record it with each flow record.
The benefit in our case outweighs the cost. We use Kentik at the core of our network, which provides us with a central and detailed view of our traffic.
We have used open-source options such as NfSen, as well as SolarWinds.
I would suggest giving it a try. A demo can be set up in no time. You can see for yourself whether or not you like it.