Get our free report covering Databricks, Dataiku, IBM, and other competitors of Microsoft Azure Machine Learning Studio. Updated: January 2022.
564,143 professionals have used our research since 2012.

Read reviews of Microsoft Azure Machine Learning Studio alternatives and competitors

Data Scientist at an energy/utilities company with 10,001+ employees
Real User
Top 20
Has a good feature set but it needs samples and templates to help invite users to see results
Pros and Cons
  • "Imageflow is a visual tool that helps make it easier for business people to understand complex workflows."
  • "The product needs samples and templates to help invite users to see results and understand what the product can do."

What is our primary use case?

I am a data scientist here and that is my official role. I own the company. Our team is quite small at this point. We have around five people on the team and we are working with about five different businesses. The projects we get from them are massive undertakings. Each of us on the team takes multiple roles in our company and we use multiple tools to help best serve our clients. We are trying to look at creative ways that different solutions can be integrated and we try to understand what products we can use to create solutions for client companies that will be effective in meeting their needs.  

We are personally using Databricks for certain projects where we want to consider creating intelligent solutions. I have been working on Databricks as part of my role in this company, trying to see if there are any kind of standard products that we can use with it to create solutions. We know that Databricks integrates with Airflow, so that is something that we are exploring right now as a potential solution for enabling a creative response. We are exploring the cloud as an option. Databricks is available in Azure and we are currently figuring out the viability of using that as a cloud platform. So we are exploring the way Databricks and Azure integrate at the same time to give us this type of flexibility.  

What we use it for right now is more like asset management. If we have a lot of assets and we get a lot of real-time data, we certainly want to do some processing on some of this data, but we do not want to have to work on all of it in real-time. That is why we use Databricks. We push the data from Azure through Databricks, work on the data algorithm in Databricks, and execute it from Azure, probably with an RPA (Robotic Process Automation) tool or something of that sort. It intelligently offloads real-time processing.  

What is most valuable?

Of the available feature set, I like the Imageflow feature a lot. It is very interesting. It gives me clarity on the execution of a process. I can draw the complete flow from start to finish in the exact way that I want it to execute. It is more visual and it is also easier for the people in businesses where I make presentations to understand.  

When I demonstrate a process to a business and show them the approach I am taking using code and technical language, then of course not many are going to understand that. But when I show them the process in terms of the graphical layout Imageflow helps provide, then they will be able to understand it much easier. They understand why I am choosing a particular way of executing the process and why I am taking certain steps in the way I have chosen to do it. The point is to help other people understand the solution more clearly.  

What needs improvement?

I think the automatic categorization of variables needs to be improved. The current functionality does not always identify the features of the collected data efficiently. That is probably the only thing I can think of. Apart from that, I have not explored the product enough yet to go into more depth because there is only one asset project that I have taken on right now. Because I own this company, I have been doing more to run it than to explore this product very deeply. But when you get any form of data inside there, if it could understand what type of variables there are and what features the data has, it would help massively in taking processing to the next step. If it does not exactly identify the variables, you may have to modify them a little.  

Apart from working with Databricks to understand its capabilities, I am also trying to learn Apache Spark right now. Some members of my team want to work with Apache Spark as a solution, so at this point we are evaluating both and plan to use either Spark or Databricks.  
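To make the variable categorization point concrete, here is a minimal sketch of what automatically inferring variable types from raw column values could look like. It is not Databricks functionality; the function name `infer_variable_type`, the `max_categories` threshold, and the sample columns are all hypothetical and chosen only for illustration.

```python
from datetime import datetime


def infer_variable_type(values, max_categories=10):
    """Guess a column's type from its raw string values (illustrative heuristic)."""
    # Ignore missing entries so a few blanks do not change the verdict.
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not present:
        return "empty"

    def all_parse(parser):
        # True only if every remaining value is accepted by the parser.
        for v in present:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(float):
        return "numeric"
    if all_parse(datetime.fromisoformat):
        return "datetime"
    distinct = set(present)
    # Few distinct values that repeat suggests a categorical variable.
    if len(distinct) <= max_categories and len(distinct) < len(present):
        return "categorical"
    return "text"


# Hypothetical asset-monitoring columns, all read in as strings.
columns = {
    "temperature": ["21.5", "22.0", "", "19.8"],
    "recorded_at": ["2021-03-01", "2021-03-02", "2021-03-03", "2021-03-04"],
    "asset_state": ["running", "idle", "running", "running"],
    "operator_note": ["valve replaced", "no issues", "sensor drift seen", "restarted pump"],
}
inferred = {name: infer_variable_type(vals) for name, vals in columns.items()}
print(inferred)
```

A heuristic like this still misclassifies some columns (for example, integer codes that are really categories would be reported as numeric), which is the kind of case where the variables have to be adjusted by hand.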

As far as what might be added, some custom algorithm samples would be useful. All of the other products of this type — Azure, AWS, SageMaker — they all have customizable algorithms. You have the capability to implement a sort of workflow from that by modifying things in the sample and changing it to fit your purposes. Probably that is something that might help in doing some small NDP (Near-Data Processing) development. It might not help in the project directly, but it will help while we work on some NDP development of our own so that we can quickly evaluate how something is going to work. Templates or other samples could make working on things easier.  

That would also help massively in getting people to understand the potential of what the product can actually do. But I also think not many people would strongly agree with this. Many people go to the first solution they can think of that they know very well already in the IT field even if they could imagine that something could be better.  

To get the value out of this technology, people will need to come to accept it. Technical people will accept Databricks more if they understand that this is something that they can use and start working on without a lot of experience. Adopting it will take time for new users who have no experience. But to feel like they can have success with a product, they have to execute something in a very short time and see how it can work. When you talk about AI — or really when you talk about anything new — people do not initially want to invest the time in discovery. These processes do take time to learn, but with templates or samples, you get to see immediately what the possibilities are and what you might get out of it. Then when they try something of their own and are able to get it working in less than a week's time, they will be encouraged to look into the product and the technology some more.  

For how long have I used the solution?

We have been using the Databricks product for approximately three months.  

What do I think about the stability of the solution?

It is very hard to comment on the stability right now. We will need more time to experience the product in actual usage to render any opinions about stability accurately at that level.  

What do I think about the scalability of the solution?

We have not really gotten to the point of scaling and testing scalability at this point. We only have two people involved with the product. One is a data scientist and one is a data engineer.  

How was the initial setup?

The initial setup was not complex at all. The documentation is good. It is clear and not very difficult to understand. Because the documentation is good, the installation is fine.  

We did the implementation by ourselves — within our team and with the help of the documentation. But I would not say that we have already deployed the model yet. This is an ongoing process, as there are certain inputs that changed over time.  

So we have not implemented the product completely, but we have made progress with the product and our understanding of it. It is good, but our company is still trying to get much better data from it. At this point, it is like the data is just junk and more junk. So we are now working toward the goal of improving the result. Whenever the data result gets better, we'll try to implement the workflow to see how it performs. I would say it will probably take two to three months more before we actually get good data.  

Which other solutions did I evaluate?

I did have some experience with SageMaker before looking at Databricks, but apart from that we have not been looking into any of the other solutions that are available. We were just exploring a few of the different solutions that the members of the team already have experience with. Most of the team came to our company with some experience using Azure, most of them came with experience in EBS (Elastic Block Store), and some of them came with experience on various other platforms. We wanted to mine that knowledge and just explore some of these possibilities to see which one works with all of us as a team.  

What other advice do I have?

On a scale from one to ten where one is the worst and ten is the best, I would rate Databricks overall as around a 7 or 7.5. If we had more experience with it and could be sure we had a solid understanding of what it could do and the reliability, I might recommend it with a better score. I do not think I should give it more than a seven for now.  

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Scientist at a university with 5,001-10,000 employees
Real User
Top 10
Super scalable, awesome stability, open-source, and cost-effective
Pros and Cons
  • "It is open-source, and it is being worked on all the time. You don't have to pay all the big bucks like Azure and Databricks. You can just use your local machine with the open-source TensorFlow and create pretty good models."
  • "It would be nice to have more pre-trained models that we can utilize within layers. I utilize a Mac, and I am unable to utilize AMD GPUs. That's something that I would definitely like to be able to access within TensorFlow since most of it is with CUDA ML. This only matters for local machines because, in Azure, you can just access any GPU you want from the cloud. It doesn't really matter, but the clients that I work with don't have cloud accounts, or they don't want to utilize that or spend the money. They all see it as too expensive and want to know what they can do on their local machines."

What is our primary use case?

With TensorFlow, it is all just personal research that I've done. I'm hoping to bring it to work. TensorFlow is one of the most commonly used platforms for machine learning and deep learning. I specialize in natural language processing and computer vision. Right now, a lot of the clientele work that I have is basic data science of just cleaning and managing data and getting it to fit. I am planning to give a nice example of what we could do by building models that actually predict things that they're looking to do. The models that they have right now are literally just basic, statistical, and linear regression models. They can easily be outperformed with just a very shallow Deep Neural Network.

It is usually on-prem. We run all programs on local machines. A lot of our clients are more old school.

What is most valuable?

It is open-source, and it is being worked on all the time. You don't have to pay all the big bucks like Azure and Databricks. You can just use your local machine with the open-source TensorFlow and create pretty good models. 

What needs improvement?

It would be nice to have more pre-trained models that we can utilize within layers. 

I utilize a Mac, and I am unable to utilize AMD GPUs. That's something that I would definitely like to be able to access within TensorFlow since most of it is with CUDA ML. This only matters for local machines because, in Azure, you can just access any GPU you want from the cloud. It doesn't really matter, but the clients that I work with don't have cloud accounts, or they don't want to utilize that or spend the money. They all see it as too expensive and want to know what they can do on their local machines.

For how long have I used the solution?

I have been using this solution for a year.

What do I think about the stability of the solution?

It is awesome.

What do I think about the scalability of the solution?

It is super scalable. You can parallelize it. You can even visualize all the different nodes with TensorBoard. There are so many cool apps you can use. It is heavily used in big industries.

How are customer service and technical support?

I have not used support at all.  

How was the initial setup?

It is not hard at all as long as you read the documentation.

What's my experience with pricing, setup cost, and licensing?

It is open-source software. You don't have to pay all the big bucks like Azure and Databricks.

What other advice do I have?

I would definitely advise understanding your data and what you're doing because it may not be worth the time if you're going to dive deep into Deep Neural Networks or even just basic Convolutional Neural Networks when you don't really need to. What's the point of building a regressor that is going to be scalable with TensorFlow if all you're trying to do is basic statistics? It depends on the size of the data science work that you're doing.

You can just use your local machine with the open-source TensorFlow and create pretty good models. Getting it into production depends on the security of the system. I don't know what the data engineers are going to have to do to close the pipelines.

I would rate TensorFlow a ten out of ten any day.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Rushabh-Shah
Manager - Analytics at Bigtree Entertainment Pvt Ltd (Bookmyshow)
Real User
Top 20
Stable, scalable, and offers excellent drag and drop features
Pros and Cons
  • "The solution has excellent drag and drop functionality. There's no need for coding."
  • "The solution just lacks in terms of data visualization. That is why we use Tableau and Qlik in our organization. They help to pick up the slack. If data visualization was added, Alteryx would be a very good tool, and much more complete."

What is our primary use case?

We primarily use the solution for ETL purposes and also for a lot of our modeling and scheduling.

What is most valuable?

The solution has excellent drag and drop functionality. There's no need for coding. There are nodes ready for us to use as well. It makes everything extremely easy. The user-friendliness is a big draw for us.

What needs improvement?

The solution just lacks in terms of data visualization. That is why we use Tableau and Qlik in our organization. They help to pick up the slack. If data visualization was added, Alteryx would be a very good tool, and much more complete.

The solution could maybe use a connector as a direct input for MySQL or something like that. Right now, we use ODBC connectors for MySQL. If a direct connector was available on this product that would be very helpful, especially for our organization.

For how long have I used the solution?

I've been using the solution for five years.

What do I think about the stability of the solution?

The stability of the solution is good. We don't have issues with the product crashing and we don't find there are bugs and glitches that affect its quality. It's reliable.

What do I think about the scalability of the solution?

The scalability is very smooth. If a company needs to expand the service, they should be able to do so easily.

How are customer service and technical support?

The technical support is great.

There's a very good community that surrounds the solution. However, you can also reach out to Alteryx directly. They're just a call away if you need them. I would say our organization is satisfied with the level of service they provide.

What's my experience with pricing, setup cost, and licensing?

The pricing of the solution is a bit on the expensive side. We use an Alteryx Server and Designer as well. The server itself is a bit pricey.

What other advice do I have?

We're just a customer. We don't have a business relationship with the company.

Our organization is dealing with large amounts of data, yet we do not have the money to buy a Microsoft product like SQL. Alteryx is a pretty good alternative to handle the ETL part of the workload.

We've been largely satisfied with the solution.

I would rate the solution eight out of ten overall. We've been happy with the results that have been provided to us so far. If the solution offered a connector as a direct input for MySQL, or better data visualization, I'd rate it higher.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.