Databricks Overview

Databricks is the #1 ranked solution in Streaming Analytics tools and the #2 ranked solution in top Data Science Platforms. IT Central Station users give Databricks an average rating of 8 out of 10. Databricks is most commonly compared to Microsoft Azure Machine Learning Studio. Databricks is popular among the large enterprise segment, which accounts for 57% of users researching this solution on IT Central Station. The top industry researching this solution is computer software, with professionals from computer software companies accounting for 27% of all views.
What is Databricks?

Databricks creates a Unified Analytics Platform that accelerates innovation by unifying data science, engineering, and business. It utilizes Apache Spark to help clients with cloud-based big data processing. It puts Spark on “autopilot” to significantly reduce operational complexity and management cost. The Databricks I/O module (DBIO) improves the read and write performance of Apache Spark in the cloud. An increase in productivity is ensured through Databricks’ collaborative workplace.

Databricks was previously known as Databricks Unified Analytics, Databricks Unified Analytics Platform, Redash.

Databricks Buyer's Guide

Download the Databricks Buyer's Guide including reviews and more. Updated: November 2021

Databricks Customers

Elsevier, MyFitnessPal, Sharethrough, Automatic Labs, Celtra, Radius Intelligence, Yesware

Pricing Advice

What users are saying about Databricks pricing:
  • "Licensing on site I would counsel against, as on-site hardware issues tend to really delay and slow down delivery."
  • "I am based in South Africa, where it is expensive adapting to the cloud, and then there is the price for the tool itself."
  • "Whenever we want to find the actual costing, we have to send an email to Databricks, so having the information available on the internet would be helpful."
  • "The price is okay. It's competitive."

Databricks Reviews

VP
Data Scientist at an energy/utilities company with 10,001+ employees
Real User
Top 20
Has a good feature set but it needs samples and templates to help invite users to see results

Pros and Cons

  • "Imageflow is a visual tool that helps make it easier for business people to understand complex workflows."
  • "The product needs samples and templates to help invite users to see results and understand what the product can do."

What is our primary use case?

I am a data scientist here and that is my official role. I own the company. Our team is quite small at this point. We have around five people on the team and we are working with about five different businesses. The projects we get from them are massive undertakings. Each of us on the team takes multiple roles in our company and we use multiple tools to help best serve our clients. We are trying to look at creative ways that different solutions can be integrated and we try to understand what products we can use to create solutions for client companies that will be effective in meeting their needs.  

We are personally using Databricks for certain projects where we want to consider creating intelligent solutions. I have been working on Databricks as part of my role in this company, trying to see if there are any kind of standard products that we can use with it to create solutions. We know that Databricks integrates with Airflow, so that is something that we are exploring right now as a potential solution for enabling a creative response. We are exploring the cloud as an option. Databricks is available in Azure and we are currently figuring out the viability of using that as a cloud platform. So we are exploring the way Databricks and Azure integrate at the same time to give us this type of flexibility.  

What we use it for right now is more like asset management. If we have a lot of assets and we get a lot of real-time data, we certainly want to do some processing on some of this data, but you do not want to have to work on all of it in real-time. That is why we use Databricks. We push the data from Azure through Databricks and work on the data algorithm in Databricks and execute it from Azure with probably an RPA (Robotic Process Automation) or something of that sort. It intelligently offloads real-time processing.  

What is most valuable?

Of the available feature set, I like the Imageflow feature a lot. It is very interesting. It gives me clarity on the execution of a process. I can draw the complete flow from start to finish in the exact way that I want it to execute. It is more visual and it is also easier for the people in businesses where I make presentations to understand.  

When I demonstrate a process to a business and show them the approach I am taking using code and technical language, then of course not many are going to understand that. But when I show them the process in terms of the graphical layout Imageflow helps provide, then they will be able to understand it much easier. They understand why I am choosing a particular way of executing the process and why I am taking certain steps in the way I have chosen to do it. The point is to help other people understand the solution more clearly.  

What needs improvement?

I think the automatic categorization of variables needs to be improved. The current functionality is not always efficiently identifying the features of the data that is collected. Probably that is the only thing I can think of. Apart from that, I have not explored the product enough yet to go into more depth because there is only one asset project that I have taken on right now. Because I own this company, I have been doing more to run it than to explore this product very deeply. But when you get any form of data inside there, if it could understand what type of variables there are and what features the data has, it would help massively in taking processing to the next step. If it does not exactly identify the variables you may have to modify them a little. Apart from working with Databricks to understand its capabilities, I am also trying to learn Apache Spark right now. Some members of my team want to work with Apache Spark as a solution and at this point, we are evaluating both and we are planning to use Spark or Databricks.  

As far as what might be added, some custom algorithm samples would be useful. All of the other products of this type — Azure, AWS, SageMaker — they all have customizable algorithms. You have the capability to implement a sort of workflow from that by modifying things in the sample and changing it to fit your purposes. Probably that is something that might help in doing some small NDP (Near-Data Processing) development. It might not help in the project directly, but it will help while we work on some NDP development of our own so that we can quickly evaluate how something is going to work. Templates or other samples could make working on things easier.  

That would also help massively in getting people to understand the potential of what the product can actually do. But I also think not many people would strongly agree with this. Many people go to the first solution they can think of that they know very well already in the IT field even if they could imagine that something could be better.  

To get the value out of this technology, people will need to come to accept it. Technical people will accept Databricks more if they understand that this is something that they can use and start working on without a lot of experience. Adopting it will take time for new users who have no experience. But to feel like they can have success with a product, they have to execute something in a very short time and see how it can work. When you talk about AI — or really when you talk about anything new — people do not initially want to invest the time in discovery. These processes do take time to learn, but with templates or samples, you get to see immediately what the possibilities are and what you might get out of it. Then when they try something of their own and are able to get it working in less than a week's time, they will be encouraged to look into the product and the technology some more.  

For how long have I used the solution?

We have been using the Databricks product for approximately three months.  

What do I think about the stability of the solution?

It is very hard to comment on the stability right now. We will need more time to experience the product in actual usage to render any opinions about stability accurately at that level.  

What do I think about the scalability of the solution?

We have not really gotten to the point of scaling and testing scalability at this point. We only have two people involved with the product. One is a data scientist and one is a data engineer.  

How was the initial setup?

The initial setup was not complex at all. The documentation is good. It is clear and not very difficult to understand. Because the documentation is good, the installation is fine.  

We did the implementation by ourselves — within our team and with the help of the documentation. But I would not say that we have already deployed the model yet. This is an ongoing process, as there are certain inputs that changed over time.  

So we have not implemented the product completely, but we have gotten to advance with the product and our understanding of it. It is good, but our company is still trying to get much better data from it. At this point, it is like the data is just junk and more junk. So we are now working toward that goal of improving the result. Whenever the data result gets better, we'll try to implement the workflow to see how it performs. I would say it will probably take two to three months more before we actually get good data.  

Which other solutions did I evaluate?

I did have some experience with SageMaker before looking at Databricks, but apart from that we have not been looking into any of the other solutions that are available. We were just exploring a few of the different solutions that the members of the team already have experience with. Most of the team came to our company with some experience using Azure, most of them came with experience in EBS (Elastic Block Store), and some of them came with experience on various other platforms. We wanted to mine that knowledge and just explore some of these possibilities to see which one works with all of us as a team.

What other advice do I have?

On a scale from one to ten where one is the worst and ten is the best, I would rate Databricks overall as around a 7 or 7.5. If we had more experience with it and could be sure we had a solid understanding of what it could do and the reliability, I might recommend it with a better score. I do not think I should give it more than a seven for now.  

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
MM
Lead Data Architect at a government with 1,001-5,000 employees
Real User
Top 10
Good integration with majority of data sources through Databricks Notebooks using Python, Scala, SQL, R.

Pros and Cons

  • "The initial setup is pretty easy."
  • "Overall it's a good product, however, it doesn't do well against any individual best-of-breed products."

What is our primary use case?

We used Databricks in AWS on top of S3 buckets as a data lake. The primary use case was providing consistent, ACID-compliant data sets with full history and time series that could be used for analytics.

How has it helped my organization?

Databricks (Delta Lake) and the underlying file storage (data lake) are at the centre of the organisation's enterprise data hub. Most of our data is structured (CSV files) and some is semi-structured (JSON files), but we are beginning to ingest unstructured data (PDF files) and use Natural Language Processing (Textract) to obtain insights driven by key words.

What is most valuable?

The Databricks notebooks with SQL and Python provide good intuitive development environment. The Delta Lake, the reading of underlying file storage, the delta tables mounted on top of data lake (AWS in our case) are providing full ACID compliance, good connectivity and interoperability.  
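The "full history" this reviewer mentions can be illustrated with a Delta Lake time-travel read. This is a minimal sketch, not the reviewer's own code: it assumes a notebook where `spark` is an active SparkSession with Delta Lake available, and the helper name `read_table_version` and the table path are hypothetical.

```python
def read_table_version(spark, path, version):
    """Load one historical version of a Delta table.

    `spark` is expected to be an active SparkSession with Delta Lake
    available, as in a Databricks notebook. `version` is the table
    version number to read (0 is the first commit).
    """
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)  # Delta Lake time-travel option
        .load(path)
    )


# Hypothetical usage inside a notebook, where `spark` is predefined:
# first_snapshot = read_table_version(spark, "s3://<bucket>/<table>", 0)
```

Passing the session in as an argument keeps the helper free of notebook globals, which also makes it easy to exercise outside a cluster.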

The initial setup is fairly straightforward. The stability is good.

What needs improvement?

The product is quite ambitious. It's trying to become a centralized platform for all data ingestion, transformation, and analytics needs. It may encounter stiff competition from best-of-breed solutions powered by open source software.

Overall it's a good product, however, it might get challenged over time by individual best-of-breed products.

For example in the area of Data Science, RStudio seems to be the industry standard at the moment. The RStudio IDE is richer, and there are more out-of-the-box functionalities like push-button publishing, etc. It's more difficult to run R within Databricks. Especially when it comes to synchronizing the R packages, it lags behind. It's not even supporting the latest version of R 1.3. I believe eventually all analytics will converge into data science. The analytics of the future will be data science, because predicting the future will be one of the most prevalent use cases. The stuff we used to do before, slicing and dicing, drilling through, trend analysis, etc. will become redundant operations after the analytics toolsets become powered by AI/ML and fully automated. Unless organisations acquire platforms that can cater for machine learning and artificial intelligence, including natural language processing, they will have a hard time surviving.

With Databricks I would like to see more integration with and accommodation of  open-source products. This could be controversial, as it could question the whole configuration and the purpose of the product. I'm pretty sure Microsoft is trying to position it in a monopoly market as they did with Windows and MS Office so that they could charge the premium. We are beginning to see the similar product strategy behind Databricks. 

For how long have I used the solution?

I've been working with Databricks for about two years. 

What do I think about the stability of the solution?

From what I know and from what I've heard, talking to our data operations team, it is stable and it's quite powerful. 

What do I think about the scalability of the solution?

Running on top of Spark ensures fast processing and elasticity for coping with big data volumes, up to 2 petabytes. You can spin up the cluster very quickly, and shut it down. It's elastic.

How are customer service and technical support?

Excellent customer service from Databricks. Very proactive, constantly attuned to customer needs, even connected us with other customers for knowledge exchange. 

Which solution did I use previously and why did I switch?

I am an IT Consultant and in the past have used different solutions for ETL on top of databases, particularly if we are talking about data warehousing. However, in the last 6 years I have seen large clients using a mixture of open source and proprietary technologies, like the Informatica stack with a data lake in AWS, or Kafka Confluence with MQ Series on top of mainframes and a data lake in AWS, Databricks and Azure data lake, etc.

How was the initial setup?

It was pretty easy to set up. At least, that is my understanding. I'm not the data engineer though. I don't actually do installs and configurations. I explore features and build them in my architecture designs.

What about the implementation team?

We implemented Databricks through a vendor, and the vendor was pretty good.

What was our ROI?

Don't really know.

What's my experience with pricing, setup cost, and licensing?

I can't speak on pricing of the solution. It's not an aspect of the solution I deal with directly.

Which other solutions did I evaluate?

The options were Talend, EMC Isilon, native AWS services, and others.

What other advice do I have?

In my current capacity as an Architect and an end user of Databricks, I would say I do have confidence that Databricks can provide a wealth of functionalities to start with.

My advice to future adopters of Databricks would be to be careful about the overall architectural roadmap for this application: adopt a flexible, modular, microservices-like architecture whose components could be replaced in the future should they be deemed inadequate to cater for evolving business needs.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
RameshCh
Sr. BigData Architect at ITC Infotech
MSP
Top 5
Very elastic, easy to scale, and a straightforward setup

Pros and Cons

  • "It's easy to increase performance as required."
  • "Instead of relying on a massive instance, the solution should offer micro partition levels. They're working on it, however, they need to implement it to help the solution run more effectively."

What is our primary use case?

We work with clients in the insurance space mostly. Insurance companies need to process claims. Their claim systems run under Databricks, where we do multiple transformations of the data. 

What is most valuable?

The elasticity of the solution is excellent.

The storage, etc., can be scaled up quite easily when we need it to.

It's easy to increase performance as required.

The solution runs on Spark very well.

What needs improvement?

Instead of relying on a massive instance, the solution should offer micro partition levels. They're working on it, however, they need to implement it to help the solution run more effectively.

They're currently coming out with a new feature, which is Delta Lake. It will come with a new layer of data compliance.

For how long have I used the solution?

We've been using the solution for two years.

What do I think about the stability of the solution?

I don't see any issues with stability going down to the cluster. It would certainly be fine if it's maintained. It's highly available even if things are dropped. It will still be up and running. I would describe it as very reliable. We don't have issues with crashing. There aren't bugs and glitches that affect the way it works.

What do I think about the scalability of the solution?

The system is extremely scalable. It's one of its greatest features and a big selling point. If a company needs to scale or expand, they can do so very easily.

We require daily usage from the solution even though we don't directly work with Databricks on a day to day basis. Due to the fact that we schedule everything we need and it will trigger work that needs to be done, it's used often. Do you need to log into the database console every day? No. You just need to configure it one time and that's it. Then it will deliver everything needed in the time required.

How are customer service and technical support?

We use Microsoft support, so we are enterprise customers for them. We raise a service request for Databricks, however, we use Microsoft. Overall, we've been satisfied with the support we've been given. They're responsive to our needs.

Which solution did I use previously and why did I switch?

We work with multiple clients and this solution is just one of the examples of products we work with. We use several others as well, depending on the client.

It's all wrappers between the same underlying systems. For example, Spark. It's all open-source. We've worked with them as well as the wrappers around it, whether the company was labeled Databrary, IBM insights, Cloudera, etc. These wrappers are all on the same open-source system.

If we work with Azure data, we use Databricks. Otherwise, we have to create a VM separately. Those things are not needed because Azure is already providing those things for us.

How was the initial setup?

The situation may have been a bit different for me than for many users or organizations. I've been in this industry for more than 15 or 17 years. I have a lot of experience. I also took the time to do some research and preparation for the setup. It was straightforward for me.

The deployment with Microsoft usually can be done in 20 minutes. However, it can take 40 to 45 minutes to complete. An organization only requires one person to upload the data and have complete access to the account.

What about the implementation team?

I deployed the solution myself. I didn't require any assistance, so I didn't enlist any resellers or consultants to help with the process.

What's my experience with pricing, setup cost, and licensing?

The solution is expensive. It's not like a lot of competitors, which are open-source.

What other advice do I have?

There isn't really a version, per se. 

It's a popular service. I'd recommend the solution. The solution is cloud-agnostic right now, so it really can go into any cloud. It's the users who will be leveraging installed environments that can have these services, no matter if they are using Azure or Ubiquiti, or other systems.

I don't think you can find any other tool or any other service that is faster than Databricks. I don't see that right now. It's your best option.

Overall, I'd rate the solution eight out of ten. The reason I'm not giving it full marks is that it's expensive compared to open source alternatives. Also, the configuration is difficult, so sometimes you need to spend a couple of hours to get it right.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
TB
Data Scientist at iOCO
Real User
Top 20
Good built-in optimization, easy to use with a great user interface

Pros and Cons

  • "The built-in optimization recommendations halved query run times and allowed us to reach decision points and deliver insights very quickly."
  • "The product could be improved by offering an expansion of their visualization capabilities, which currently assists in development in their notebook environment."

What is our primary use case?

We are using this solution to run large analytics queries and prepare datasets for SparkML and ML using PySpark.

We ran on multiple clusters set up for a minimum of three and a maximum of nine nodes having 16GB RAM each.

For one ad hoc requirement, a 32-node cluster was required.

Databricks clusters were set for autoscaling and to time out after forty minutes of inactivity. Multiple users attached their notebooks to a cluster. When some workloads required different libraries, a dedicated cluster was spun up for that user.
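The cluster settings described above can be sketched as a cluster definition. This is a minimal sketch, assuming the field names used by the Databricks Clusters REST API (`autoscale`, `autotermination_minutes`); the helper `build_cluster_spec`, the cluster name, the runtime version, and the node type are placeholders and not values from the review.

```python
import json


def build_cluster_spec(name, min_workers=3, max_workers=9, idle_minutes=40):
    """Return a cluster definition as a plain dict.

    Worker counts and the idle timeout follow the review; spark_version
    and node_type_id are placeholders to be replaced with real values.
    """
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": name,
        "spark_version": "<runtime-version>",  # placeholder
        "node_type_id": "<node-type-with-16GB-RAM>",  # placeholder
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": idle_minutes,
    }


spec = build_cluster_spec("shared-analytics")  # hypothetical cluster name
print(json.dumps(spec, indent=2))
```

A dedicated cluster for a user who needs different libraries would be a second call with its own name, for example `build_cluster_spec("dedicated-user-cluster")`.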

How has it helped my organization?

Databricks took care of all the underlying cluster management seamlessly. We could configure our clusters to run and deliver results without any delays due to hardware configuration or installation issues.

Databricks allowed us to go from non-existent insights (because the datasets were just too large) to immediate and rich insights once the datasets were ingested into our PySpark notebooks.

What is most valuable?

Immense ease in running very large scale analytics, with a convenient and slick UI. This saved us from having to tweak, tune, dive into deeper abstractions, get involved in procurement, and also having to wait for other workloads to run.

The built-in optimization recommendations halved query run times and allowed us to reach decision points and deliver insights very quickly.

The Delta data format proved excellent. Databricks had already done the heavy lifting and optimized the format for large scale interactive querying. They saved us a lot of time.

What needs improvement?

The product could be improved by offering an expansion of their visualization capabilities, which currently assists in development in their notebook environment. Perhaps a few connectors that auto-deploy to a reporting server?

More parallelized Machine Learning libraries would be excellent for predictive analytics algorithms.

For how long have I used the solution?

I have been using this solution for three years.

What do I think about the stability of the solution?

This solution is stable and proved very robust. When very obvious programmatic recommendations were not followed, causing memory overruns on a driver, the clusters required restarting.

What do I think about the scalability of the solution?

Absolutely, seamlessly, and massively scalable, within only budgetary limits. Also, the product itself offers real-time efficiency and optimization recommendations. 

How are customer service and technical support?

So brilliant, it was never required. Their documentation is comprehensive, clear, simple, and thorough. 

Which solution did I use previously and why did I switch?

Previously I used Hive and Livy in Zeppelin on an in-house Hadoop installation. The queries constantly threw exceptions and timeouts and the necessary configuration changes proved time-consuming and problematic. Databricks, on the other hand, simply made all those problems vanish. 

How was the initial setup?

Setup and Support are single-click.

What about the implementation team?

We used an in-house team for implementation.

What was our ROI?

Our ROI was of the order of USD $75k per year for one deployment. We were able to switch our workloads from an onsite Hadoop cluster, billed to our department for more than USD $100k per year, to a Databricks workspace in Azure for a quarter of that expenditure. 
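The figures above are consistent with each other, as a quick calculation shows. This is a minimal sketch using the reviewer's round numbers; because the on-site bill is stated as "more than" USD 100k, the result is best read as a lower bound. The function name `annual_saving` is illustrative.

```python
def annual_saving(previous_cost, fraction_of_previous):
    """Saving per year when the new cost is a fraction of the old one."""
    new_cost = previous_cost * fraction_of_previous
    return previous_cost - new_cost


# USD 100k per year on-site, a quarter of that on Databricks (from the review).
print(annual_saving(100_000, 0.25))  # 75000.0
```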

Further, we were able to transparently and efficiently scale our queries to run in under fifteen minutes per major analytics use case, whereas we had been subject to unstable queries and highly brittle data flows on the in-house Hadoop cluster.

We are further reducing spending on our traditional RDBMS solution by offloading reporting workloads to the Databricks PySpark notebooks, which is reducing our expensive datacenter resources and freeing up RDBMS resources for OLTP loads. 

What's my experience with pricing, setup cost, and licensing?

Set up a cluster in your cloud of choice, but Databricks' service might also be very competitive as their pricing units will be built in. 

Licensing on site I would counsel against, as on-site hardware issues tend to really delay and slow down delivery.

Which other solutions did I evaluate?

I evaluated Hortonworks, Livy, and Zeppelin. These were unsuitable due to the unavailability of sufficiently skilled personnel.

What other advice do I have?

By investing in people skilled in data querying, Python coding, and even basic Data Science, a Databricks setup will reward the business. 

Once the Databricks data flows are established, it is a matter of a few incremental steps to opening up streaming and running up-to-the-minute queries, allowing the business to build its data-driven processes. 

Databricks continues to advance the state-of-the-art and will be my go-to choice for mission-critical PySpark and ML workflows. 

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Disclosure: I am a real user, and this review is based on my own experience and opinions.
SN
Head of Data & Analytics at a tech services company with 11-50 employees
Real User
Top 5
Helpful integration with Python and notebooks, but it should be more user-friendly and less complicated to use

Pros and Cons

  • "The integration with Python and the notebooks really helps."
  • "Databricks is not geared towards the end-user, but rather it is for data engineers or data scientists."

What is our primary use case?

We are a consulting house and we employ solutions based on our customers' needs. We don't generally use products internally.

I am a certified data engineer from Microsoft and have worked on the Azure platform, which is why I have experience with Databricks. Now that Microsoft has launched Synapse, I think that there will be more use cases.

What is most valuable?

You can spin up an Azure Databricks cluster, and integrating with it is seamless.

The integration with Python and the notebooks really helps.

What needs improvement?

There is definitely room for improvement.

This is the type of solution where you need to have people with technical expertise to use it.  Other products are self-service and can be employed by end-users. Databricks is not geared towards the end-user, but rather it is for data engineers or data scientists. I'm not sure whether Databricks is working towards it, or not.

It would be nice if it were more user-friendly, where you don't have to rely on Power BI or a visualization tool. I know that there is integration in the notebook where you can do it, but still, the relationships and semantics make it more difficult. It would be better to do it right in Databricks. You could put them within the portal and I don't have to log out and bring that into Power BI and then visualize.

What do I think about the stability of the solution?

We have not done any major implementation yet, although I think it's stable to an extent. I can't comment on it in terms of benchmark and experiencing any issues. It works seamlessly in the places where I've used it.

What do I think about the scalability of the solution?

Our implementations have been small and we haven't needed to scale as of yet. 

Databricks can help you to build a data lake, and it's something that they need to help make more popular. People are slowly understanding it because if you look, there are lots of data lakes that people are trying to create. I'm not intimate with it, but the concept seems complicated. I think they need to write up something where videos can explain it better. What I have seen on YouTube is quite complicated for an end-user to understand.

How was the initial setup?

The initial setup is easy. It's not difficult when you are used to Azure.

What's my experience with pricing, setup cost, and licensing?

I am based in South Africa, where it is expensive adapting to the cloud, and then there is the price for the tool itself. 

The cost is difficult to estimate. I've got customers who went to the cloud and then they realized that the costs were more, compared to what they used to be on-premises. Also, because our exchange rate is so weak, I would always advocate that prices being lower is better, although I don't know how feasible it is.

What other advice do I have?

From a purely technical perspective, I would rate Databricks an eight out of ten. However, there is a failure in terms of user adoption. After I look at other products, including Synapse, those are better. I still feel that Databricks is quite complicated for the average person.

I would rate this solution a five out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Abhijith Dattatreya
Business Intelligence and Analytics Consultant at a tech services company with 201-500 employees
Consultant
Top 10
Easy to switch loads between clusters and automation is easy using the API

Pros and Cons

  • "Automation with Databricks is very easy when using the API."
  • "Some of the error messages that we receive are too vague, saying things like "unknown exception", and these should be improved to make it easier for developers to debug problems."

What is our primary use case?

I am a developer and I do a lot of consulting using Databricks.

We have been primarily using this solution for ETL purposes. We also do some migration of on-premises data to the cloud.

What is most valuable?

The most valuable feature is the ability to switch loads between multiple clusters.

Automation with Databricks is very easy when using the API.
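As a rough illustration of the kind of automation meant here, the sketch below builds a request that triggers an existing job through the Databricks Jobs REST API (`POST /api/2.1/jobs/run-now`). The workspace URL, token, job ID, and the helper name `build_run_now_request` are placeholders assumed for the example, not details from this review. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_run_now_request(host: str, token: str, job_id: int) -> urllib.request.Request:
    """Build (but do not send) a request that triggers an existing Databricks job."""
    url = f"{host.rstrip('/')}/api/2.1/jobs/run-now"
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values: replace with a real workspace URL, access token, and job ID.
request = build_run_now_request(
    "https://example.cloud.databricks.com", "<personal-access-token>", 123
)
print(request.get_method(), request.full_url)
# To actually trigger the run, pass the request to urllib.request.urlopen(request).
```

A scheduler or CI pipeline could call a helper like this to start jobs without anyone opening the workspace UI.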

The ability to write code and SQL in the same interface is useful.

It is easy to connect notebooks to a cluster.

There are a large number of inbuilt functions that help to make things easier.

What needs improvement?

Some of the error messages that we receive are too vague, saying things like "unknown exception", and these should be improved to make it easier for developers to debug problems. As it is now, we have to go into the driver logs to identify the error messages properly. 

There is not much information about Databricks available online, such as cost. Whenever we want to find the actual costing, we have to send an email to Databricks, so having the information available on the internet would be helpful.

I would like to see integration with Power BI or Tableau for the business users. They may use Databricks to check on things, but it will be a little bit complicated for them. The GUI interfaces for Tableau and Power BI are ones that they are used to, so the integration would help.

For how long have I used the solution?

I have been using Databricks for about five and a half years.

What do I think about the stability of the solution?

We have found that in the development environment, Databricks is pretty stable. We have had problems where something works in development but does not work in production, and this can happen when the version is updated and certain features have been deprecated. This means that more testing is required before moving to production, but this is the only drawback that we have seen.

Basically, when we move across versions we have found issues, but otherwise, it's pretty stable.

What do I think about the scalability of the solution?

Scalability is one of the main features of Databricks. We have used datasets ranging from one hundred megabytes to one terabyte in size, and we can manage, so it's easily scalable.

We have a large company with between 400 and 500 people using this solution.

How are customer service and technical support?

We have not reached out for technical support on Databricks.

How was the initial setup?

I found the initial setup easy because I had previously worked on Spark.

If somebody goes through the training, which is available on the website, then it should be straightforward. I don't think that it is very hard.

When it comes to developing things based on use cases, it can take between three days and two weeks, plus two to three days for testing and deploying it. I would say that for an entire use case, it will take a maximum of three weeks.

What other advice do I have?

My advice for developers who are interested in working with this solution is to first go through the Spark architecture.

I would rate this solution a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
OB
Security Consulting, Manager at a computer software company with 1,001-5,000 employees
Vendor
Top 5 Leaderboard
A scalable solution to quickly process and analyze streams of information

Pros and Cons

  • "Databricks helps crunch petabytes of data in a very short period of time."
  • "Costs can quickly add up if you don't plan for it."

What is our primary use case?

We are working with Databricks and SMLS in the financial sector for big data and analytics. There are a number of business cases for analysis related to debt there. Several clients are working with it, analyzing data collected over a period of time and planning the next steps in multiple business divisions.

My organization is a professional consulting service. We provide services for the other organizations, which implement and use them in a production environment. We manage, implement, and upgrade those services, but we don't use them.

What is most valuable?

Databricks helps crunch petabytes of data in a very short period of time for data scientists or business analysts. It helps with fraud analysis, finance, projections, etc. I like it.

This is exactly the purpose of big data and analytics. It provides the mechanism to process and analyze a stream of information. It's best for share analysis and stream analysis.

What needs improvement?

Costs can quickly add up if you don't plan for it. 

For how long have I used the solution?

I've been using Databricks for just over a year.

What do I think about the stability of the solution?

Databricks is stable. It also helps that their support is included as part of the service.

What do I think about the scalability of the solution?

Databricks is scalable. The only issue is how much money you have for it. For example, if you need to run 100 servers, each with eight cores and 256 gigabytes of RAM, you run out of money easily. It's charged to your credit card or your account, and you'll have to pay for it if you don't plan for that in advance.
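To make the planning point concrete, here is a minimal back-of-the-envelope sketch for a 100-server cluster. The hourly rate is a made-up placeholder, not a real Databricks or cloud provider price, and the function name is hypothetical; substitute the rates from your own contract.

```python
def estimate_cluster_cost(nodes: int, hours: float, hourly_rate_per_node: float) -> float:
    """Return the total cost of running `nodes` servers for `hours` hours."""
    return nodes * hours * hourly_rate_per_node


# Hypothetical rate in USD per node-hour; NOT an actual price.
HYPOTHETICAL_RATE = 2.0

daily_cost = estimate_cluster_cost(nodes=100, hours=24, hourly_rate_per_node=HYPOTHETICAL_RATE)
monthly_cost = daily_cost * 30  # assumes the cluster is never shut down

print(f"Daily: ${daily_cost:,.2f}  Monthly: ${monthly_cost:,.2f}")
```

Even at a modest assumed rate, a cluster left running around the clock adds up quickly, which is why budgeting in advance matters.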

How are customer service and technical support?

Databricks technical support is excellent. They provided their responses on time, and they're useful. However, I don't have extensive experience with them.

Which solution did I use previously and why did I switch?

I have used different Microsoft solutions before.

How was the initial setup?

The initial setup depends on the readiness of the team working with Databricks. There is no single template that makes it easy, and it isn't easy. It can be complex to set up if you don't have a really good plan.

You can get into this environment at least for a test. You can do it in the lab, follow it step by step, and that'll take about an hour. The difficulty depends on the business requirements.

If it's for training purposes, you can do it in about half an hour, and you're good to go. If you need it to support a business, it will be much more rigorous because multiple divisions would be interested in running their own environment, working with their data.

What's my experience with pricing, setup cost, and licensing?

The price is okay. It's competitive. 

What other advice do I have?

If you're thinking of implementing Databricks, I would recommend working with professionals. It'll help you save time. Also, plan the work and work the plan. Otherwise, it'll be a waste of time and money.

On a scale from one to ten, I would give Databricks a nine.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Oscar Estorach
Chief Data-strategist and Director at theworkshop.es
Real User
Top 5 Leaderboard
Flexible, stable, and reasonably priced

Pros and Cons

  • "The solution is very easy to use."
  • "The integration of data could be a bit better."

What is our primary use case?

We primarily use the solution for retail and manufacturing companies. It allows us to build data lakes.

What is most valuable?

The solution is very easy to use. 

The storage on offer is very good. 

The solution is perfect for dealing with big data.

The artificial intelligence on offer is very good.

The product is quite flexible.

We have found the solution to be stable. 

The cloud services on offer are very reasonably priced.

Technical support is very good. They also have very good documentation on offer to help you navigate the product and learn about its offerings. 

What needs improvement?

The solution works very well for us. I can't recall any missing features or anything the solution really lacks. It's very complete. 

It would help if there were different versions of the solution on offer.

The integration of data could be a bit better.

For how long have I used the solution?

I've worked for about 20 to 25 years in business intelligence analytics and have worked with Databricks for about four years at this point. 

What do I think about the stability of the solution?

The stability of the solution is very good. It doesn't crash or freeze. There are no bugs or glitches. Its performance is very good.

What do I think about the scalability of the solution?

The scalability is quite good. A company that needs to expand it can do so with ease.

We only have four people on the solution at this time. The front-end users never use the product directly. The companies aren't that big here. If the economy improves, we'll likely have more of a need for the product.

How are customer service and technical support?

I've dealt with technical support in the past and have found them to be very good. They are helpful and responsive. We are satisfied with their level of service.

Which solution did I use previously and why did I switch?

I work with Databricks, Cloudera, and Snowflake.

How was the initial setup?

The solution is on the cloud and therefore there isn't really an installation process that you need to go through. You only really need to configure the clusters. 

Within the clusters, you configure according to how many platforms you need, or if you want to, you can build a cluster for artificial intelligence. You just configure it as required. 

What's my experience with pricing, setup cost, and licensing?

The pricing of the product is very reasonable. The fact that it is on the cloud makes it a less expensive option. Other solutions that are on-premises are quite expensive.

What other advice do I have?

We are customers and end-users. 

Databricks is on the cloud, and therefore we're always on the latest version of the solution. It's constantly updated for us so that we have access to the latest updates and upgrades.

I'd rate the solution at a nine out of ten. The capability of the product is quite good and we are very satisfied with it overall. 

I'd recommend the solution to other companies and organizations.

Which deployment model are you using for this solution?

Public Cloud

Disclosure: I am a real user, and this review is based on my own experience and opinions.