This is the backup and resources page for my talk on the cultural, business, and technology aspects of making the most of Big Data, particularly Hive and Hadoop.
A few years ago Red Gate had a rich pool of data on feature usage across our applications. Teams enthusiastically added to the database until SQL Server ground to a halt and no one used the data any more. This session tells how we used Hadoop to get the value back, and how we spread that value to the whole business. It goes into the tools, technical and cultural, we used to make big data work.
This is the real-world story of how a 300-employee business experienced and solved a classic big data problem using a range of Hadoop- and Hive-based technologies. We found and demonstrated the value locked up in our growing body of unwieldy data, and opened it up as a useful source of information for our product development teams.
We’ll look at how Red Gate spread the solution throughout the business, putting the data into the hands of ordinary business analysts, marketers and developers, and how we hurdled barriers to acceptance and adoption. I will also cover the issues we hit, and overcame, with data cleanliness and quality, and with exposing complex data in an accessible way to non-data-scientists, along with some of the tools we used and developed along the way.
If you are looking to get value out of your big data, or to take your Hadoop experiment beyond the lab and into the rest of your organisation, this talk will help you avoid some of the pitfalls. I will also introduce some of the technical and cultural tools you need to make big data work for your business.
This talk was first delivered at Code PaLOUsa 2014. Slides are available on SlideShare.
With NoSQL, NewSQL and plain old SQL, there are so many tools around that it’s not always clear which is the right one for the job.
HDInsight is Microsoft’s Hadoop PaaS offering on Azure. Microsoft have partnered with HortonWorks to bring a hosted version of the HortonWorks Data Platform to the Azure service, which is great, but there’s more. The really clever thing they’ve done is to essentially replace (or at least sideline) the HDFS part of Hadoop and use Azure Storage blobs in its place. This means your cluster doesn’t need to be persistent: all your data lives off-cluster, but you still get the benefits of distributed compute, and pretty close to the benefits of data localisation that you get with regular Hadoop. This lets you use the nice cheap redundant storage all the time and only pay for huge compute nodes when you need them. But how do you manage turning clusters on and off all the time?
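One answer is to treat clusters as transient: because the data lives in Azure Storage rather than on the cluster, you can create a cluster just for a job and delete it when the job finishes. Below is a minimal sketch of that pattern in Python. The helper functions and names in it (`create_cluster`, `submit_hive_job`, `delete_cluster`, `usage-analysis`, `mystorageaccount`) are hypothetical placeholders, not a real Azure API; swap in whatever you actually use to drive HDInsight (Azure PowerShell, the Azure SDK, or its REST interface).

```python
# A minimal sketch of the transient-cluster pattern, under the assumption
# that the data lives permanently in Azure Storage and the cluster is
# purely disposable compute. All three helpers are hypothetical stubs.
import contextlib


def create_cluster(name, storage_account, nodes):
    # Placeholder: provision an HDInsight cluster attached to an
    # existing Azure Storage account.
    print(f"provisioning {nodes}-node cluster '{name}' over '{storage_account}'")


def submit_hive_job(cluster, hiveql):
    # Placeholder: run a HiveQL query on the named cluster.
    print(f"running on {cluster}: {hiveql}")


def delete_cluster(name):
    # Placeholder: tear the cluster down. The data in Azure Storage
    # is untouched, so nothing is lost.
    print(f"deleting cluster '{name}'")


@contextlib.contextmanager
def transient_cluster(name, storage_account, nodes):
    # Create the cluster for the duration of a job, then always delete it
    # (even if the job fails), so you only pay for compute while the job
    # is actually running.
    create_cluster(name, storage_account, nodes)
    try:
        yield name
    finally:
        delete_cluster(name)


# Usage: the data sits in blob storage permanently; the cluster exists
# only for the duration of this block.
with transient_cluster("usage-analysis", "mystorageaccount", nodes=16) as cluster:
    submit_hive_job(cluster, "SELECT feature, COUNT(*) FROM usage GROUP BY feature")
```

The context manager is the useful bit: the cluster is deleted even when the Hive job throws, so a failed run never leaves an expensive cluster burning money overnight.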