Please share with the community what you think needs improvement with VMware SRM.
What are its weaknesses? What would you like to see changed in a future version?
The main challenge is the speed of failing over. Switchovers can cause a bit of downtime, because you are shutting down on one side and powering up on the other.
Unfortunately, SRM is not stable, so it requires continuous monitoring. If there are any issues with vCenter, SRM doesn't work well; as a result, it generates a lot of tickets and productivity drops.
We've had configuration issues on occasion. We start to fail over, and then we have to call it off because the configuration is not right, or the data stores aren't configured correctly in the secondary data center. Oftentimes, it is just the experience level of the team, and we have to bring in the vendor to help and validate our configuration.
VMware SRM does not have the capacity to do DR tests. Whenever we ran tests, we had issues with the root cause analysis. Only 70 to 80 percent of our tests were successful because the vCenters were overloaded, which is why we were having capacity issues. We have also had a problem when adding a regular VM to the replicated storage. By default, it will show an error. However, there is no monitoring mechanism that tells you that a regular, non-VR VM is not supposed to be on the replicated SRM storage. Whenever we did testing, we had to figure out that a regular VM was there and remove it manually.
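The manual check described above (finding a regular VM sitting on replicated storage before a test) could be scripted against an inventory export. What follows is only a minimal sketch: the `VM` record, the `find_stray_vms` helper, and the datastore names are hypothetical and are not part of any SRM or vCenter API. How the inventory is collected is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    # Hypothetical inventory record, not an SRM or vCenter object.
    name: str
    datastore: str
    protected: bool  # True if the VM is part of a protection group


def find_stray_vms(vms, replicated_datastores):
    """Return sorted names of unprotected VMs that live on replicated datastores."""
    replicated = set(replicated_datastores)
    return sorted(
        vm.name for vm in vms
        if vm.datastore in replicated and not vm.protected
    )


# Made-up sample inventory for illustration only.
inventory = [
    VM("app01", "ds-repl-01", True),
    VM("scratch07", "ds-repl-01", False),  # regular VM on replicated storage
    VM("build03", "ds-local-02", False),   # regular VM on local storage: fine
]

stray = find_stray_vms(inventory, ["ds-repl-01"])
print(stray)
```

Running a check like this before each DR test would flag the VM to move or remove, rather than discovering it from a failed test.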
I would like to see this solution be more scalable. We currently use our security in addition to VMware SRM.
VMware has introduced two newer versions of the solution, SRM 6.5 and 6.7. I don't have any experience with these two products. However, with version 6, which we are using, we faced a problem specifically when creating recovery plans. After the creation of the recovery plan, an issue sometimes occurred in the GUI, in vCenter. I'm not sure if that has since been resolved. We've also faced issues with the licensing. If you don't choose a specific license, you can only cover around five or ten virtual machines. The biggest issue for us is that this product does not have any demo for customers. They should offer demos so that clients can try it out before they commit to buying a license.
I would say VMware has room for improvement with this product. I am sure it is probably better in their 7.0 version, but there are still some bugs in the 6.0 version that relate to using it with different browsers. I think a lot of what I run into is related to the 6.0 version. I believe a lot of those bugs have been fixed in the UI once you upgrade to 7.0.
The decision to move to another product is a matter of room for improvement around functionality and requirements that we had with AWS and moving to the cloud. We are not going to be procuring any more licensing for SRM when we make the move to the cloud. We were looking at a cloud-native solution in order to provide the same functionality as the SRM provides but in the cloud. That is just a matter of the changing environment. If the functionality of SRM could be replicated in the cloud, that would be the improvement we are looking for in the product.
When used in conjunction with storage replication software, it is not possible to separate and fail over an individual VM. When the VMs are sitting on the same storage LUN, the granularity is not sufficient. Ideally, we should be able to choose one virtual machine and separate it from the rest. If the price were more competitive, then it would be very good.
I would like to see a detailed history of the events for each site because I have found difficulty with that. The two vCenters have to be synchronized, which sometimes gives us problems because Kerberos does not tolerate a time difference of more than five minutes.
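The five-minute Kerberos tolerance mentioned above can be checked ahead of time. This is a minimal sketch that assumes you have already obtained a timestamp from each site by some means; the `skew_ok` helper and the sample times are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Tolerance taken from the review above: five minutes between the two sites.
MAX_SKEW = timedelta(minutes=5)


def skew_ok(site_a_time, site_b_time, max_skew=MAX_SKEW):
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(site_a_time - site_b_time) <= max_skew


# Made-up timestamps; in practice these would be read from each site.
protected_site = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
recovery_site = protected_site + timedelta(minutes=6)

if not skew_ok(protected_site, recovery_site):
    print("Clock skew exceeds five minutes; fix time sync before pairing sites.")
```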
The interface is not easy to use and can be made more user-friendly.
The configuration process could be improved.
I would say a lot could be changed to improve the product in terms of troubleshooting and supportability. About every two weeks, we had an incident somewhere in the software stack. We faced problems with the vRA (vRealize Automation) multiple times. We had to fix the problem and redeploy it more than once to get it to work properly. Then we had to completely redo our replication. That is a big drawback because it means we had to cancel other plans that had already been scheduled. To summarize briefly: the software needs a lot of enhancement in quality and functionality for it to be very useful. For support of VMware version 3, a more recent patch needs to be released. A few times fixes were released, but we had already upgraded to those latest levels and the known compatibility problems were still not fixed.
The replication advantage the product has does not work for all VMs. For example, if a VM has a very high change frequency and the VM is big (in one case our VM was 42 terabytes), the data just does not get across in the migration. So the product is really not able to handle either very big VMs or a very large change frequency. I remember we tried it with one Data Mart SQL database where we do continuous ETLs (Extract, Transform and Load). The data reloads on a daily basis, and the replication takes too long to complete. The afternoon after the migration started, we were more or less at 50%. By the evening, we were at 70%. We scratched it when the data reloaded and started all over again. We found no means to accelerate that. By the time you appear to be progressing, you have to redo the migration. So that is another disadvantage when trying to use SRM.
There are a lot of minor things that need to be in place on both sides of the migration to make it work. If something goes wrong in the middle of the migration, you will have a tough time trying to troubleshoot it.
The product has an insufficient method of logging, an insufficient level of operability, and an insufficient level of detailed technical tracing. This lack of information means you cannot immediately pinpoint the issues to troubleshoot them. It cost us multiple weekends of lost time while trying to troubleshoot because we did not get this information from the product. The things I would like to see for sure in a new release are:
* Fix all minor connectivity issues with auto-recovery.
* Auto-diagnose, auto-identify, and auto-correct issues as they occur, and at least try to fix the issues a few times before allowing it to fail. If the fix is not successful, then at least inform users that the fix attempt was made and the particular area where the issue is suspected, so that users do not lose hours to troubleshooting.
* Open up the solution to be more environmentally agnostic. It should not be so strongly integrated with vCenter. It should be loosely coupled with vCenter and allow other solutions.
* Make the product more robust and much faster. Many replications we have initiated took two weeks before going to the switchover. A lot happens in two weeks. It seems like an eternity when you have no idea why replications stalled over that long a period of time.
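The replication problem this reviewer describes (a big VM whose data changes faster than it can be copied) can be estimated on paper before starting. The sketch below uses a deliberately simple linear model: the initial sync only makes progress at the link throughput minus the change rate. The 42 TB size comes from the review; the throughput and change-rate figures are assumptions for illustration, not measurements, and `hours_to_sync` is a hypothetical helper.

```python
def hours_to_sync(size_tb, throughput_tb_per_hour, change_tb_per_hour):
    """Estimate hours for an initial sync under a simple linear model.

    Returns None when the data changes at least as fast as it can be
    copied, in which case the sync never catches up.
    """
    net_rate = throughput_tb_per_hour - change_tb_per_hour
    if net_rate <= 0:
        return None
    return size_tb / net_rate


# 42 TB is from the review; 0.5 and 0.25 TB/hour are assumed figures.
estimate = hours_to_sync(42, 0.5, 0.25)
print(estimate)  # 168.0 hours, i.e. one week

# If the change rate matches the throughput, the sync cannot finish.
print(hours_to_sync(42, 0.5, 0.5))  # None
```

A result of None, or an estimate longer than the interval between data reloads, suggests the replication will keep restarting as described above.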
What I think can be improved is the data replication aspect. For example, I know of another replication solution called RP for VM. I don't really know how to use it since I've never used it before, but I've read about it. I know its features and I've spoken to some IT practitioners who work with Dell EMC and have experience with RP for VM, and they gave me the feeling that RP for VM is better than VMware replication technology. The argument is that RP for VM has the ability to keep your application going even when there is a loss of connectivity, whereas in VMware you have to have something like 50% connectivity for the configuration. So in that respect, RP for VM has a feature that makes it better than VMware solutions. I guess VMware should make sure they are on top of their virtualization and data replication solution, more than every other company. Overall, I can't point to any other thing, apart from whatever feature makes some people think artificial DNE is better than the replication application and SRM. If they can just take care of that then I don't think there's anything else.
Cost is definitely an area where the product could be improved; it should have cheaper pricing. The product could also be faster, and of course in IT everything is about pricing.
There are sometimes performance issues when working with outside links, and it would be better if this were improved. You need a lot of knowledge to work with the interface because it is not really easy to use, and it would be great if the dashboard were simplified.
If you have a failover case, you need to work on it manually. It would be helpful if this could be automated. It would simplify things.
The DR drill report is good but needs to be improved, and a replication monitoring feature is not available.
I would like to see better integration with other storage solutions. I would hope to see that within the next two or three years.
We all know it's really hard to get good pricing and cost information.
Please share what you can so you can help your peers.