Weka Overview

Weka is the #3 ranked solution in top Anomaly Detection Tools and the #4 ranked solution in top Data Mining tools. PeerSpot users give Weka an average rating of 8 out of 10. Weka is most commonly compared to KNIME: Weka vs KNIME. The top industry researching this solution is comms service providers, accounting for 32% of all views.
What is Weka?
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
Weka Buyer's Guide

Download the Weka Buyer's Guide including reviews and more. Updated: January 2022

Weka Pricing Advice

What users are saying about Weka pricing:
"Currently, I am using an open-source version so I don't know much about the price of this solution."

Weka Reviews

Abuto Vincent
Data Scientist at Freelancer
Real User
Top 5 Leaderboard
Relatively stable with excellent accuracy and there's no need to know coding
Pros and Cons
  • "With clustering, if it's a yes, it's a yes, if it's a no, it's a no. It gives you a 100% level of accuracy of a model that has been trained, and that is in most cases, usually misleading. Classification is highly valuable when done as opposed to clustering."
  • "The filter section lacks some specific transformation tools. If you want to change a variable from a numeric variable to a categorical variable, you don't have a feature that can enable you to change a variable from a numeric variable to a categorical variable."

What is our primary use case?

I've handled different projects with this solution since college. The most recent project that I handled was for a company from India. They were looking for a classification with regard to the type of engines that cars have and the pollution levels that they have.

There was a mixture of text data that had to be classified. There was the need to transform the text data to a data type that would be easily classified. When employing text data you can't do classification directly. I had to clean the data and program all the variables to suit the required information.

How has it helped my organization?

The client has not provided me with a review yet.

However, I have benefited greatly as it's given me the confidence to tackle tedious projects that clients did not want to tackle in either Python or any other data analytics or data science software. By gaining that confidence, knowing very well that you have enough analytical skills, and now translating that to software-based platforms and juggling around with the same, has actually given me some level of proof and confidence. 

Besides experience, the fear of not wanting to handle things is what might hold you down. As far as I know, with the number of years that I've used Weka, the confidence part is the most important item I receive from the solution.

What is most valuable?

The features that I found most valuable are the classification features. They have a lot of information and a lot of intel. With classification, there's always a chance to split the data into two datasets. You can split one dataset into training and test sets, and the performance can easily be identified after you've trained a model.
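
As a rough illustration of the train/test idea described here, the following is a minimal sketch in plain Python. It is not Weka's API; the function name `train_test_split`, the 30% test fraction, and the toy rows are assumptions made only for this example.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Toy dataset: (row id, yes/no class label). Purely illustrative values.
rows = [(i, "yes" if i % 2 == 0 else "no") for i in range(10)]
train, test = train_test_split(rows)
print(len(train), len(test))
```

The model is trained on `train` only, and its performance is then measured on `test`, which it has never seen.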

With clustering, if it's a yes, it's a yes, if it's a no, it's a no. It gives you a 100% level of accuracy of a model that has been trained, and that is in most cases, usually misleading. Classification is highly valuable when done as opposed to clustering.

What needs improvement?

If you open the software, there's a section labeled Filter, where you choose your filter. The filter section lacks some specific transformation tools. If you want to change a variable from numeric to categorical, there is no feature that enables you to do that. This needs to be improved.

Also, under the classification section, there are some cases in which you cannot use test data alone or training data alone. Under classification and clustering as well, when you're performing classification on a dataset, there needs to be an option to supply the training data first and then supply the test data later on.

If they went full open-source, like Python and R, it would help the growth of the solution.

For how long have I used the solution?

I used Weka in my undergraduate studies. I've majored in mathematics for the past several years.

I had used the solution for three years, as I started using Weka when I was in the second year of school for daily tasks. Then I used it for two years for my masters. I didn't use it for my Ph.D. program as I was researching how to use Python and integrate it. During my studies, I used it for five years and in the field, in the actual field of employment, for contracts and employment, I've also employed it for another two years, so that makes it a total of seven years that I've used the solution.

What do I think about the stability of the solution?

The stability depends on the predictor variables and the independent variables that have been employed. You have to find predictor variables that are a true representation of the response variable. If the predictor variables are a true representation of the response variable or variables, then you definitely have higher percentages, which is a true reflection of the classification algorithm and performance element.

If you've selected the most accurate predictor variable or independent variable, then there will be a highly stable solution. However, if you've selected a predictor variable that does not accurately relate to the response variable, then the solution will not be very stable.

The reasons why an individual might be loyal to the services of a telecommunication company or might opt for another telecommunication company revolve around things like the subscription rate fee, for example, and the speed of offering services. We look at things that relate to the company itself, and the financial relations of individuals that are subscribed to services of the company. However, things like age and gender are not very important. When you choose the best variable, you will definitely get a highly stable solution.

What do I think about the scalability of the solution?

I've never tackled very, very big datasets with this solution in the way I've tackled them with other data science software. However, from what I know, it is highly scalable and can handle very big datasets without any complications.

A very big dataset is a dataset that has, for example, more than 100,000 rows, or rows that run into the millions.

How are customer service and technical support?

The technical support is highly responsive. The university developed a 360-degree problem-solving platform. You find it entirely manageable and interactive. Alongside that, if by any chance you're stuck at any point, someone will get back to you via email or live chat. They're highly interactive, and of course, they're there. It's why this is a very, very good platform.

Which solution did I use previously and why did I switch?

I have experience with solutions like R, Bacillus, and Python for data science. Python, in itself, has the best visualization ever. You can clearly see graphs nicely floated, normal distributions perfectly done, key distribution perfectly done, perfectly elaborated and perfectly labeled. 

This is different from Weka, as, when you want to visualize everything all at once, you'll get a tiny graph. With Python, you can visualize all the graphs, and it doesn't matter whether they have picked your 100. All you have to do is change the scale of the graph, and then you will have a longer chart, but with highly defined graphs. If you want to visualize one particular graph, then the visualization will also be clear. 

A pro on the side of Weka is that you do not need to have programming skills. With Weka, you just point, drag, drop, click, and run, and things show up. If you are a data scientist or a data analyst and you don't have enough coding skills, then Weka is the right tool to use. But if you are good at coding, then you can go to Python.

How was the initial setup?

The process is straightforward. They did excellent work in easing up the processes of actually installing Weka. They also have tutorials on different platforms that make it easier for one individual to make references, go back and forth, and clear errors. 

The data and the specifications that I've set up, when running any machine learning algorithm using Weka, are easy. For instance, let's say I have a dataset with 400 rows and another dataset with maybe 50,000 rows. If I am to classify a specific variable that has a yes or no entry using folds, the one with 400 rows will be classified faster. The one with 50,000 rows will be classified slower, as we have to do 10-fold cross-validation on 50,000 rows. How long it takes really depends on the size of the dataset.
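
To make the point about dataset size concrete, here is a small plain-Python sketch of how 10-fold cross-validation partitions the rows. It does not use Weka; `k_fold_indices` is a hypothetical helper written for this illustration. Each of the 10 rounds trains on nine tenths of the rows, so the total number of rows processed grows in proportion to the dataset size.

```python
def k_fold_indices(n_rows, k=10):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_rows))
        yield train_idx, test_idx
        start += size

# Compare the two dataset sizes mentioned in the review.
for n in (400, 50_000):
    trained_rows = sum(len(train_idx) for train_idx, _ in k_fold_indices(n))
    print(n, "rows ->", trained_rows, "training rows processed across 10 folds")
```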

What's my experience with pricing, setup cost, and licensing?

I use both the paid and the open-source versions of the product. If you're a client and you don't want very many details incorporated in your solution, then we will go full open-source. Open source doesn't have very many solution alignment incorporations. However, the paid version has very many options and stuff that needs to be incorporated when providing a solution. It depends on the specifications of a client which we would use. It's not about the price.

What other advice do I have?

The solution is a desktop application. I did not deploy it on the cloud, actually. It's an application that is on my desktop, on my laptop.

If they want their task done faster, and they do not have enough coding expertise, this is definitely an excellent solution to choose. If they want additional experience, Python and R might be a good option. With Weka, it looks like you're using maybe something like Microsoft Power BI. With Python or R you're actually giving a data scientist a run for his money, as things change every day and things evolve and you have to dig deeper and provide new stuff.

Overall, I'd rate the solution nine out of ten. It's tied with R in terms of how I would rate it. However, I find Python the best.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Solution Architect / Data Scientist (upwork) at Freelancer
Real User
Top 20
Has a good machine algorithm for clustering systems but is lacking a few newer algorithms
Pros and Cons
  • "I like the machine algorithm for clustering systems. Weka has larger capabilities. There are multiple algorithms that can be used for clustering. It depends upon the user requirements. For clustering, I've used DBSCAN, whereas for supervised learning, I've used AVM and RFT."
  • "I believe is there are a few newer algorithms that are not present in the Weka libraries. Whereas, for example, if I want to have a solution that involves deep learning, so I don't think that Weka has that capability. So in that case I have to use Python for ... predict any algorithms based on deep learning."

What is our primary use case?

Weka is a machine learning tool where we can use supervised and unsupervised learning tools to detect anomalies, for clustering, or for classification.

The deployment method depends on the business's requirements. When I worked at the Air Force, it was all cloud. I deployed it on the cloud but that was treated as on-premise because that is confined within the Air Force. It depends upon the requirement of the user. If they want it on-premise, I can provide that. If they want it to be hosted on AWS or any other cloud services, that can also be done.

How has it helped my organization?

Our customers wanted a SQL-based query to generate anomalies based on the data. We had a good experience when there is a small dataset or there is a known set of attributes. If you have at least a definition of the differences between attributes, then you can use SQL, whereas in machine learning it is quite different. You don't have a case; it is a kind of fuzzy logic being used to detect anomalies.

When they were using SQL, they had quality data. We used Weka for a learning period, meaning how much data we used to train a model to generate a condition. SQL was generating thousands of anomalies and those were not correct, because of the attributes they were using, and SQL can only be used when there is at least a known difference between attributes.

When I used Weka for processing, I used these kinds of algorithms and it was very clear when I tested that output of the string algorithm using different techniques. I ran another Java program to check whether these anomalies are being properly predicted or not. So there I found that Weka is quite helpful compared to other programming techniques or the SQL-based solutions.

What is most valuable?

I like the machine algorithm for clustering systems. Weka has larger capabilities. There are multiple algorithms that can be used for clustering. It depends upon the user requirements. For clustering, I've used DBSCAN, whereas, for supervised learning, I've used AVM and RFT. 

Weka is useful for analyzing any data set you want to analyze or if you want to run algorithms of small data sets. When it comes to the enterprise solution, you can use Weka libraries or at least this algorithm that is very available in the Weka libraries. In Java, I can manipulate all these algorithms and the libraries of Weka to produce the desired result for a customer.

What needs improvement?

I believe there are a few newer algorithms that are not present in the Weka libraries. If I want to have a solution that involves deep learning, I don't think that Weka has that capability. In that case, I have to use Python to predict any algorithms based on deep learning.

What do I think about the stability of the solution?

Weka is a stable solution. It has been working well for the past two years. I spoke to a few of my work colleagues. Even a 40-year-old was built over on PowerPoint Weka frequencies still works well. So Weka is definitely a stable solution.

What do I think about the scalability of the solution?

Weka is not horizontally scalable. If I had to run a large dataset over Weka I would have to have a very large usage. If I add another node into Weka and I want to have a cluster environment for Weka, it will not work. If I have data from various sources and it's a large amount of data, and if it's possible to split it into various parts, I can install Weka on four machines and then write a program to move this data onto the four machines.

In that way, Weka can be horizontally scalable, but as a solution, it is not horizontally scalable. It is vertically scalable.

Weka doesn't require maintenance. Once the solution is left and it is deployed nobody is required to maintain it. Weka is quite stable, it doesn't cause any problems. If you want to deploy this in your enterprise, they help to properly implement those profits. Once it is properly implemented no maintenance is required.

How are customer service and technical support?

I have never used their technical support. 

Which solution did I use previously and why did I switch?

Python is quite a hostile solution. If I get data it may not be in the format I request to run an analysis. Python is quite handy and it is easier than Weka to implement.

Weka provides a UI. If a person is very new to machine learning or if somebody wants to run an analysis, Weka requires minimal programming, but you need to have the knowledge of artificial learning. If somebody doesn't know it, they can't implement it.

How was the initial setup?

The initial setup was very straightforward. I have been doing Java programming for the last 20 years. Java is quite easy for me. It is written in Java and it is open-source. All courses are available in the first course of the Weka library.

When I tried to implement a Weka solution along with Java for any customer, it is quite straightforward because I just need to put a dependency of their JAR file inside the project and then I can use all their function and capabilities that are provided by Weka. That can be applied very well. There is good documentation of that and there are examples of the processes where Weka's features could be implemented. It is quite easy to use.

The amount of time it takes to deploy depends on the requirements. For performance, it took me only a day, meaning eight hours of work, and I could provide a solution for the Weka part only. For the UI and for other things, that is different. 

Hardware took quite some time because the data was too large. Weka is not capable of handling a large amount of data. They wanted the solution to be in Java and we didn't have any other libraries to do that. So I split that data into smaller chunks and then I ran these algorithms on each smaller data set. I combined that data and then manually produced the results. In that case, it took around six months to provide them a solution. It can take a day, or it can take up to six months.
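
The chunk-and-combine approach described above can be sketched in plain Python as follows. This is not the reviewer's actual code (which was written in Java against the Weka libraries); `chunked`, `flag_outliers`, the threshold, and the sample readings are all hypothetical stand-ins used only to show the shape of the workflow.

```python
def chunked(rows, chunk_size):
    """Yield consecutive slices of rows, each at most chunk_size long."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]

def flag_outliers(chunk, threshold=100):
    """Hypothetical stand-in for whatever algorithm is run on each chunk."""
    return [value for value in chunk if value > threshold]

# Illustrative data only; a real dataset would be far larger.
readings = [12, 150, 8, 99, 101, 7, 300, 42, 5, 120]

combined = []
for chunk in chunked(readings, chunk_size=4):
    combined.extend(flag_outliers(chunk))  # run per chunk, then merge the results
print(combined)
```

This only works cleanly when each chunk can be processed independently of the others; algorithms that need to see the whole dataset at once would need a different combining step.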

Implementing the algorithm doesn't take much of your time. What takes time is how much data a customer has and how clean the data is. In terms of performance, it was quite a good data set. Every field of their attributes was available. There was a feature called collation-based features and I used that and it collated the results within a few minutes. Based on that, I implemented KLN on that. It is quite dependent on the data set the customer provides, how clean the data is, and what the output they want out of that data set is.

What was our ROI?

I think Weka is definitely a good investment, that is why we still use it. It has performance analytics as well so I think it is a better solution than others.

What other advice do I have?

Weka is pretty comprehensive and easy to use.

This is the first time that I used machine learning. I have a master's in technology. I analyze small data to get insights into algorithms. I learned a lot from all the files, then I implemented those into a Dell program.

It has many features that are not available and there is not much development since it is open source. It should be developed faster. I would rate Weka a six out of ten for these reasons. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Scientist - Upwork at Freelancer
Real User
Top 10 Leaderboard
Straightforward and easy-to-use, but not as easy-to-use as other solutions
Pros and Cons
  • "Working with complicated algorithms in huge datasets is really easy in Weka."
  • "Within the basic Weka tool, I don't see many tools that are available where we can analyze and visualize the data that well."

What is our primary use case?

I work a lot with university students.

One of the latest projects I did was related to a classification problem. I had to use different algorithms such as neural networks, Support Vector Machines, nearest neighbor algorithm, decision trees — those types of different algorithms in order to do the machine learning parts. 

I can't remember the exact data set that I recently worked with, but when it comes to machine learning and data mining, I have worked with different data sets. I use many algorithms in Weka in order to train and test those data sets.

How has it helped my organization?

In one circumstance, a client of mine wanted to cluster their data into different classes in order to identify their different values. I used the given data set, which I mainly preprocessed using Weka, and I was able to identify valuable clusters for them. The clustering was very useful for them; I could identify the different features and the traits of those clusters and communicate my results to the customer.

What needs improvement?

More accurate documentation should be published by the Weka company — that would be really helpful. When it comes to data visualization, I think there are lots of ways in which the data could be visualized, like pie charts. There are many more, but within the basic Weka tool, I don't see many tools that are available where we can analyze and visualize the data that well. If they could improve that area, I think it would be really good. They should focus more on data visualization, that would be really great as I have experienced many issues relating to this.

For how long have I used the solution?

I learned Weka during my MSC, around two years back. From time to time, I used it for different projects, data visualization, machine learning, and using different algorithms through Weka. That's an experience that I have gained. Actually, many of the projects that I have done have been through Upwork.

What do I think about the stability of the solution?

Stability-wise, it's good. The main issue that I have is related to the output. If everything could be more dynamic, and if the visualization, the final output, was better, then we would be able to gain a lot more from Weka — It would be more powerful like Python and other languages, as well. As a tool, it would be great. It's a stable environment, but I think proper documentation, if available, is needed; that would be great. 

What do I think about the scalability of the solution?

When it comes to capacity, I'm not too experienced with handling large numbers of data in Weka, so I can't really comment on the scalability.

How are customer service and technical support?

The technical support isn't that great. On a scale from one to ten, I would give their support a rating of five to six.

I have very little experience when it comes to requesting support with Weka's official site. The support has been good, but it hasn't been quick — it takes some time. Generally speaking, with platforms such as Stack Overflow, the customer service is not that great.

Which solution did I use previously and why did I switch?

Currently, I am also using Tableau, SPSS, Python, TensorFlow, and a couple of other machine-learning platforms. 

Compared to Weka, there are thousands and thousands of materials available in Python and R Programming. Their support teams are great and if you have questions, you'll get answers very quickly. Python is compatible with many other platforms as well, for example, you can use TensorFlow. You can go very deep into neural networks and everything can be implemented in programming languages, such as Python and R.

When it comes to Weka, I have not seen very deep neural networks — that kind of stuff is very complicated. It can be done, but it's very complicated. It's much easier with Python. That is one of the main differences that I've seen. I feel like Python is more popular than R Programming, but either way, we have the ability to do the same stuff with both programming languages. Overall, I feel like Python is easier to work with.

How was the initial setup?

Installing Weka is not that hard, it's really easy. Loading the data set into the Weka tool, and analyzing it is a bit tricky in the beginning, but when you're used to it, it's not too hard. We can easily use different classification algorithms, and we can train the data sets using those classification algorithms and save them. Then, we can easily use those models to test the data sets again. So, it's not that hard, it's easy. That's something good that I have experienced in Weka; setting up is also really easy, it's not hard at all.

Overall, it takes roughly 15 minutes to set up this solution.

Sometimes it can be a bit hard to identify the proper documentation packages to install into Weka. If that could be improved, it would be really great.

What about the implementation team?

Typically, I have my own implementation strategy that I follow; however, I would like more experience in this area.

I am looking forward to learning more about deploying these big concepts in cloud environments — enterprise applications as well. I haven't had the chance to do that yet but I am looking forward to getting into deeper areas related to Weka.

What's my experience with pricing, setup cost, and licensing?

Currently, I am using an open-source version so I don't know much about the price of this solution.

What other advice do I have?

The basic configuration is very easy. Compared to writing code in Jupyter Notebook, it's really easy to handle and work with very complicated algorithms in Weka. There are some steps that are not very simple, but overall, it's very easy. It's easy to load data and implement different algorithms with Weka. From my experiences so far, that's the basic advantage with Weka — it's easy to use, easy to handle, and once you learn it, it's not that hard to work with.

Working with complicated algorithms in huge datasets is really easy in Weka. Training datasets is equally easy and it's quite speedy as well — the same goes implementation-wise. Without writing immeasurable amounts of code, we can quickly perform machine learning using Weka. That's the main advantage of Weka.

Overall, on a scale from one to ten, I would give Weka a rating of six.

If they improved the visualization issues, the documentation issues, and the implementation capabilities, I would give them a higher rating. According to my knowledge, there are not any boundaries when it comes to machine learning. The possibilities are endless, it's really big.  

It would be really helpful if pre-process data sets were used in machine learning as well — If more data visualization options and pre-processing options were supported. That's something very basic that we need when doing machine learning. If that could be improved, that would be really great. And if more documentation was available, again, that would be great. You can find specific knowledge on YouTube, but you can't go much further than that because the resources are just not available. These are the reasons why I am giving it a six. 

With Python and R, you can do anything — you have that confidence, but with Weka, I don't have that confidence.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Freelance Data Scientist at Freelancer
Real User
Top 20
Can plug in any machine learning algorithm and it works perfectly but needs better visualization
Pros and Cons
  • "Weka is a very nice tool, it needs very small requirements. If I want to implement something in Python, I need a lot of memory and space but Weka is very lightweight. Anyone can implement any kind of algorithm, and we can show the results immediately to the client using the one-page feature. The client always wants to know the story. They want the result."
  • "If you have one missing value in your dataset and this missing value belongs to a specific attribute and the attribute is a numeric attribute and there is only one missing data, whenever you import this data, the problem is that Weka cannot understand that this is a numeric field. It converts everything into a string, and there is no way to convert the string into numerical math. It's really very complicated."

What is our primary use case?

My domain is pure data analysis and data science machine learning. The first time I used Weka, five years back, I did a research project. I prefer to work with Weka whenever I have small and clear projects.

Weka is a very nice tool and it helped me to solve any machine learning problem in one minute. For machine learning algorithms such as classification or support vector machines, I used this tool to implement those algorithms.

Whenever I get any work on any other platform (for example, in R), what I do initially is run the data set in the Weka platform first. It gives me a clear view that this data set has certain attributes and offers some observations. I can implement different machine learning algorithms if this is a classification problem.

I use two or three algorithms. If we find that the performance of the logistic regression is good then I can implement those in other platforms also.  Weka is a good tool for any analysis. 

There are some missing values there. We can replace the missing values using the mean values. I use that filter to see which values were replaced. It's in the filter section: we go to unsupervised, then replace missing values.
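
For readers who have not used this kind of filter, here is a minimal plain-Python sketch of mean replacement for a single numeric column. It is not Weka's implementation; the function name and the sample values are assumptions for illustration, with `None` standing in for a missing entry.

```python
def replace_missing_with_mean(values):
    """Return a copy of values where None entries are replaced by the mean of the rest."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to average, so leave the column unchanged
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]

# Illustrative numeric attribute with two missing entries.
pollution_level = [10, None, 20, 30, None]
filled = replace_missing_with_mean(pollution_level)
print(filled)
```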

I use that filter to replace missing data. Weka has the option to check important attributes. I use that internally, I found that everything is important. Then initially I applied my dataset to implement the classification problem. 

There is less demand for projects that require Weka as opposed to R or Python.

How has it helped my organization?

Weka is a very nice tool, it needs very small requirements. If I want to implement something in Python, I need a lot of memory and space but Weka is very lightweight. Anyone can implement any kind of algorithm, and we can show the results immediately to the client using the one-page feature. The client always wants to know the story. They want the result. 

Accuracy is 90% and we can show the predicted results. What the predicted error is and what the true positive rate is. This is what the user wants. They don't want to know how complex your queries are. They don't bother about it.
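
The figures mentioned here (accuracy, error, true positive rate) can all be derived from the actual and predicted labels. The sketch below shows the arithmetic in plain Python; the function name `binary_metrics` and the ten sample labels are assumptions for this example and are not taken from any real project output.

```python
def binary_metrics(actual, predicted, positive="yes"):
    """Compute accuracy, error rate, and true positive rate for a two-class problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        # Share of actual positives that the model found.
        "true_positive_rate": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Illustrative labels only.
actual    = ["yes", "yes", "no", "no", "yes", "no", "no",  "yes", "no", "no"]
predicted = ["yes", "no",  "no", "no", "yes", "no", "yes", "yes", "no", "no"]
metrics = binary_metrics(actual, predicted)
print(metrics)
```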

It saves time for the developer and provides a summarized view without unnecessary information for the customer. 

What is most valuable?

Performance is one of the most valuable features. I can plug in any machine learning algorithm and it works perfectly. It is very easy to use the filter. Weka has a good number of filter options. If I compare it with other platforms, I can just use a filter with Weka and apply it. If I want to convert numeric data into categorical data, it is only a one-minute job. We just use a filter and it works. In other platforms, I need to write at least four or five lines of code and I have to check the data.
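
As an illustration of what a numeric-to-categorical conversion involves when written by hand, here is a short plain-Python sketch. It shows two common variants: treating each distinct number as its own category, and equal-width binning. The function names, bin labels, and sample values are assumptions for this example and do not reproduce Weka's filters.

```python
def numeric_to_nominal(values):
    """Treat each distinct number as a category label; return (converted, labels)."""
    labels = [str(v) for v in sorted(set(values))]
    return [str(v) for v in values], labels

def discretize(values, bins=3):
    """Equal-width binning: map each number to one of `bins` labeled intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # avoid zero width when all values are equal
    # The maximum value would land in bin index `bins`, so clamp it to the last bin.
    return [f"bin{min(int((v - low) / width), bins - 1)}" for v in values]

cylinders = [4, 6, 4, 8]              # illustrative values
converted, labels = numeric_to_nominal(cylinders)
print(converted, labels)

emissions = [0, 5, 10, 15, 30]        # illustrative values
print(discretize(emissions, bins=3))
```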

I'm very comfortable with Weka, for these kinds of things, especially their filter and their Classification Algorithm. They have a good number of algorithms. If I want to do this kind of thing with Python, it will take 20 lines of code minimum.

What needs improvement?

If you have one missing value in your dataset, and this missing value belongs to a numeric attribute, then whenever you import this data, Weka cannot understand that this is a numeric field. It converts everything into a string, and there is no way to convert the string back into a numeric value. It's really very complicated. You will be lucky if you get clean data. Every time we get this kind of data with missing values, we check how many missing values there are, and if there are very few, we just remove them from the dataset itself before importing it.

There is no use of algorithm pipelines. In Python, we create a pipeline. First, we use a clustering algorithm, say k-means clustering. From the resulting clusters, we choose one, and on that cluster we implement another algorithm. This kind of pipeline is missing in Weka.
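
To make the idea concrete, here is a toy version of such a pipeline in plain Python: cluster with k-means, keep one cluster, then fit a model on it. It is only a sketch under simplifying assumptions (one feature, two clusters, a straight-line fit as the second algorithm, made-up data); the reviewer does not say which library or model they actually use, and all function names here are hypothetical.

```python
from statistics import mean


def kmeans_1d(xs, centers, iterations=20):
    """Plain k-means on a single feature, starting from the given centers."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest center for each point.
        labels = [
            min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            for x in xs
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [x for x, lab in zip(xs, labels) if lab == k]
            new_centers.append(mean(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def cluster_then_fit(xs, ys, chosen_cluster):
    """Pipeline: cluster on x, keep one cluster, fit a line on that subset."""
    labels, _ = kmeans_1d(xs, centers=[min(xs), max(xs)])
    sub_x = [x for x, lab in zip(xs, labels) if lab == chosen_cluster]
    sub_y = [y for y, lab in zip(ys, labels) if lab == chosen_cluster]
    return labels, fit_line(sub_x, sub_y)


# Made-up data: the low-x group follows y = 2x, the high-x group is flat.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [2.0, 4.0, 6.0, 5.0, 5.0, 5.0]
labels, (slope, intercept) = cluster_then_fit(xs, ys, chosen_cluster=0)
print(labels, slope, intercept)
```

The point of chaining the steps in one function is that the model only ever sees the rows of the chosen cluster, which is the behaviour the reviewer finds missing.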

There is also a problem with the visualization. It can do only two or three types of visualizations.

What do I think about the stability of the solution?

The solution is stable. There are no problems with Weka.

What do I think about the scalability of the solution?

They have a good set of machine learning algorithms. There aren't any problems with building algorithms.  

The best thing is that it always gives the result in one output page. The page shows which algorithm I have implemented, the accuracy of the model, the matrix, and the output of that specific classifier. It gives everything. As a single machine learning tool, Weka is a very good tool.
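
For readers unfamiliar with the figures on such a summary page, the sketch below shows how accuracy, the true positive rate, and a two-class matrix of outcomes (assumed here to be a confusion matrix) are derived from actual and predicted labels. The labels are invented and `binary_summary` is a hypothetical helper, not Weka's own code.

```python
def binary_summary(actual, predicted, positive="yes"):
    """Confusion matrix, accuracy, and true positive rate for two classes."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    total = tp + fn + fp + tn
    return {
        # Rows are actual classes, columns are predicted classes.
        "confusion_matrix": [[tp, fn], [fp, tn]],
        "accuracy": (tp + tn) / total if total else 0.0,
        "true_positive_rate": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for ten instances.
actual = ["yes", "yes", "yes", "no", "no", "no", "no", "yes", "no", "no"]
predicted = ["yes", "yes", "no", "no", "no", "yes", "no", "yes", "no", "no"]
summary = binary_summary(actual, predicted)
print(summary["accuracy"])            # 0.8
print(summary["true_positive_rate"])  # 0.75
```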

How was the initial setup?

The initial setup is very easy. If we get a medium-sized set of data that is clean, and if we get the tiered instruction, then Weka is the best tool. For simple analysis, I first implement different kinds of models in Weka. If logistic regression gives me good prediction and accuracy, I then implement it in R or Python. I use Weka as my first tool.

The time it takes to implement depends on the dataset. Going through data inspection, data cleaning, model implementation, choosing the best model, and getting the accuracy takes a minimum of around two hours.

What other advice do I have?

Weka is a very simple tool and it has built-in algorithms, which we do not need to implement. It gives concise results that we can display to our clients. Weka is also a very useful tool for filtering. There is a set of built-in filters that we can use to filter our data.

If you want to take a sample of the data, say a specific percentage, or convert one data type to another, Weka has good filtering features. We can also use clustering and association rules in Weka. These are the advantages of Weka. If I compare it with R and Python, both can do things better than Weka. There is no doubt. But it is not easy to implement algorithms in R and Python: you need at least 20 lines of code and you need a specific setup. You need to import the dataset. You need to use different kinds of packages. Compared with Weka, those are a bit complex. Weka cannot use its visualization power.
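
As an illustration of the percentage sampling mentioned above, a hand-written version in Python might look like the sketch below. The helper name `sample_percentage`, the fixed seed, and the sampling-without-replacement choice are assumptions made for the example; they do not describe how Weka's filters work internally.

```python
import random


def sample_percentage(rows, percent, seed=0):
    """Return a random sample containing about `percent` % of the rows.

    Sampling is without replacement; a fixed seed keeps it reproducible.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    rows = list(rows)
    k = round(len(rows) * percent / 100)
    return random.Random(seed).sample(rows, k)


# Hypothetical dataset of 200 row ids, sampled at 10%.
rows = list(range(200))
subset = sample_percentage(rows, percent=10)
print(len(subset))  # 20
```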

I would rate Weka a six out of ten. The visualization and statistical analysis need improvement.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
ChristianRivera
Freelance Engineer at Autonomo
Real User
Top 5 Leaderboard
You can standardize data in an easier way but it should work with big data
Pros and Cons
  • "There are many options where you can fill all of the data pre-processing options that you can implement when you're importing the data. You can also normalize the data and standardize it in an easier way."
  • "The product is good, but I would like it to work with big data. I know it has a Spark integration they could use to do analysis in clusters, but it's not so clear how to use it."

What is our primary use case?

I used Weka for my Master's thesis. I've also used it a couple of times for personal use, for a quick analysis or a graph. You can do a reselection more quickly, get the graph, put it in your report, and do classification. If any project is present, I could develop it.

How has it helped my organization?

For my Master's thesis, I could do a quick analysis. We had a huge amount of data. It was not big data, but it was many gigabytes. I had to work with it in batches, but it helped me to apply prediction models and test some hypotheses about why some people were migrating, depopulating the rural towns and moving to the outskirts of the big cities. In this case, I used many tools, such as Access, Weka, and some GIS (geographic information system) tools for mapping. But Weka helped me with this analysis in an easy way. I could select and apply models. It helped me to do it faster.
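
The reviewer does not say how the batching was done. As one possible reading, the sketch below shows the general pattern in Python: split a stream of values into fixed-size batches and accumulate a statistic across them, so the whole dataset never has to sit in memory at once. The helper names and the batch size are hypothetical.

```python
from itertools import islice


def batches(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def running_mean(values, batch_size):
    """Mean of a stream of numbers, computed one batch at a time."""
    total, count = 0.0, 0
    for chunk in batches(values, batch_size):
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else 0.0


print(list(batches(range(7), 3)))   # [[0, 1, 2], [3, 4, 5], [6]]
print(running_mean(range(1, 11), 4))  # 5.5
```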

What is most valuable?

There are many data pre-processing options that you can apply when you're importing the data. You can also normalize and standardize the data more easily. In Python you have to write some lines to do that, but in Weka it is part of data pre-processing. For model evaluation, you can build models for classification.
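
For reference, the two operations named above are usually understood as min-max scaling to the range 0 to 1 (normalizing) and rescaling to zero mean and unit standard deviation (standardizing). The sketch below is a hand-written Python version under that assumption; it uses the population standard deviation, which may differ from what a given tool uses, and the function names and data are made up.

```python
from statistics import mean, pstdev


def normalize(values):
    """Min-max scaling: the smallest value maps to 0.0, the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Z-scores: subtract the mean, divide by the population std deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up numeric column.
column = [2.0, 4.0, 6.0, 8.0]
normalized = normalize(column)
standardized = standardize(column)
print(normalized)
print(standardized)
```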

What needs improvement?

The product is good, but I would like it to work with big data. I know it has a Spark integration that could be used to do analysis in clusters, but it's not so clear how to use it. The improvement I want is mostly about how to handle large amounts of data. The project in my thesis was not so big. It was not 100 gigabytes, but for sure these tools could be really useful. They should integrate it better with Spark and have better cluster processing.

For how long have I used the solution?

I have been using Weka for one year. 

What do I think about the stability of the solution?

It crashes sometimes, but I still think the product is good. You need a good computer to do a better analysis. I think it's stable, but it can grow. It's good; it's not the best product I have used for data mining, but it can give good results. You have to have patience, and you have to take care of your memory and CPU resources. It could improve its performance, but it is stable.

What do I think about the scalability of the solution?

I have not exported Python and then used it in a Kubernetes application. In that way, scalability is not so good. It should improve. With the amount of data it can process without crashing, it has to work better with other plugins or integration with other tools. The scalability is not so good.

How are customer service and technical support?

We don't use their official support, but we use their forums. There's a lack of information on specific points, but in general, I could find the answer to 80% of my questions.

Which solution did I use previously and why did I switch?

I have previously used KNIME and Orange. KNIME has better reviews. Weka is used, but it has not added many features in the past few years. I think Weka and Orange are a bit stuck. KNIME has grown a lot by adding more plugins, more capabilities for machine learning, and more algorithms.

KNIME is more widespread and has better functionality.

How was the initial setup?

The initial setup was straightforward. It is very intuitive and the tool is easy to use. You do need some previous knowledge, because you have to know what some of the parameters are and how they influence the final model.

What other advice do I have?

Weka is a good way to start in data mining. You have to be clear on the basic concepts behind the models or algorithms you are going to use, so you will want to test or do some research first. But for production, it's not the best option. It would be a good tool for prototyping. KNIME is the best tool for data mining.

Weka is good for structured table data. You can use many supervised or unsupervised algorithms, but it's very difficult to get interpretable results from the multilayer option it has. It's not so easy to understand neural networks if you work with Weka. It is better to use it with tree-based or clustering algorithms than with neural networks. For those, there are other tools that can be more useful.

I would rate Weka a seven out of ten. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Data Science at Freelancer on UpWork
Real User
Top 10 Leaderboard
An excellent tool for data classification and clustering
Pros and Cons
  • "The path of machine learning in classification and clustering is useful. The GUI can get you results. No programming is needed. No need to write down your script first or send to your model or input your data."
  • "If there are a lot more lines of code, then we should use another language."

What is our primary use case?

I have only used Weka for classification and clustering. I have also used classification with embossing.

What is most valuable?

The path of machine learning in classification and clustering is useful. The GUI can get you results. No programming is needed. There is no need to write a script first, send it to your model, or input your data.

Weka classification is very valuable. It gives results, such as prediction results. Weka is a little bit better than other tools I have expertise in, and much better for the classification path and the clustering path.

If you are going with some predictions that a procedure recalls, it's better than any other tool like R Programming and Python. In machine learning, like deep learning, if the network works, I can run it with the console buttons. 

What needs improvement?

I think there is a little bit of space for improvement.

For how long have I used the solution?

I've been using Weka for five years.

What do I think about the stability of the solution?

I'm not sure if it's reliable. It's a little difficult to get results, especially if you are on some other programs like Tableau.

How was the initial setup?

There is no complexity in the setup. It took a total of 10 minutes to set it up.

What's my experience with pricing, setup cost, and licensing?

I like how the classification and prediction work. We should use Weka because the path is very big and much better. If there are a lot more lines of code, then we should use another language.

Which other solutions did I evaluate?

I enjoy using Weka most of the time for machine learning and development. We only perform a task from data mining, classification, and collecting on top of Weka. TIBCO Jaspersoft is only for masterwork and analysis and visualization.

What other advice do I have?

I would give Weka a nine out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Xavier Suriol
Freelancer at Freelancer
Real User
Top 5 Leaderboard
Very easy to implement with great regression trees and association rules.
Pros and Cons
  • "I mainly use this solution for the regression tree, and for its association rules. I run these two methodologies for Weka."
  • "Not particularly user friendly."

What is our primary use case?

I mainly use this solution for regression trees and for association rules. I also use it for some descriptive statistics, because they are very easy.

What is most valuable?

It is quicker than using languages like R and Python. Wizards make the job more "comfortable" than syntax in languages.

What needs improvement?

Help documentation could be more user friendly. For instance, all ordinary manuals in R follow the same structure, with examples ready to be run and often with an interpretation of the outputs. For some packages, R has the so-called “Vignettes”, with plenty of explanations and pictures, like in a book. I don’t think Weka has such examples. In Weka packages, documentation is not so “uniform”: it does not follow the same structure, as it is written by different (free-style) authors.


For how long have I used the solution?

I've been using this solution for the last five or six years. 

What do I think about the stability of the solution?

For my requirements, the stability is fine. 

How are customer service and technical support?

There is support but I have never used it. 

How was the initial setup?

The initial setup was really easy, simpler than a solution such as the R language. The installation and launch for the linearization and regression tree were very quick. But getting into every last corner was problematic and might have taken a month or more.

What other advice do I have?

My main recommendation is that if you want artificial intelligence or machine learning, go for an easy and quick tool like Weka. Otherwise, any programming language will have a higher entry cost.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.