Strata Hadoop World NY 2016 - Hadoop use cases Track
Strata Hadoop World NY 2016 featured the following interesting talks in its Hadoop use cases track:
Hadoop application architectures: Architecting a next-generation data platform for real-time ETL, data analytics, and data warehousing by Jonathan Seidman, Gwen Shapira, Mark Grover, and Ted Malaska
Jonathan Seidman, Gwen Shapira, Mark Grover, and Ted Malaska demonstrate how to architect a modern, real-time big data platform and explain how to leverage components like Kafka, Impala, Kudu, Spark Streaming, and Spark SQL with Hadoop to enable new forms of data processing and analytics such as real-time ETL, change data capture, and machine learning.
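To make the real-time ETL piece concrete, here is a minimal sketch of the kind of Kafka-to-Spark-Streaming ingest pipeline such an architecture typically starts with. The topic name, broker address, and field layout are illustrative assumptions, not details from the session, and a production pipeline would write each micro-batch out to Kudu or HDFS rather than just counting records.

```python
# Minimal sketch: consuming events from Kafka with Spark Streaming for real-time ETL.
# Topic name, broker address, and record layout are illustrative assumptions.
import json

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="realtime-etl-sketch")
ssc = StreamingContext(sc, batchDuration=5)  # 5-second micro-batches

# Direct (receiver-less) Kafka stream; yields (key, value) string pairs.
events = KafkaUtils.createDirectStream(
    ssc,
    topics=["clickstream"],                               # assumed topic
    kafkaParams={"metadata.broker.list": "kafka-broker:9092"})

# Light transformation step: parse JSON payloads and keep well-formed records.
parsed = (events.map(lambda kv: kv[1])
                .map(json.loads)
                .filter(lambda rec: "user_id" in rec))

# A real pipeline would persist each batch to Kudu/HDFS; here we only count.
parsed.count().pprint()

ssc.start()
ssc.awaitTermination()
```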
How the largest US healthcare dataset in Hadoop enables patient-level analytics in near real time by Navdeep Alam
The need to find efficiencies in healthcare is becoming paramount as our society and the global population continue to grow and live longer. Navdeep Alam shares his experience and reviews current and emerging technologies in the marketplace for working with unbounded, de-identified patient datasets of billions of rows in an efficient, scalable way.
Planning your SQL-on-Hadoop cluster for a multiuser environment with heterogeneous and concurrent query workloads by Jun Liu and Zhaojuan Bian
Many challenges exist in designing an SQL-on-Hadoop cluster for production in a multiuser environment with heterogeneous and concurrent query workloads. Jun Liu and Zhaojuan Bian draw on their personal experience to address these challenges, explaining how to determine the right size of your cluster with different combinations of hardware and software resources using a simulation-based approach.
Creating real-time, data-centric applications with Impala and Kudu by Marcel Kornacker and Todd Lipcon
Todd Lipcon and Marcel Kornacker explain how to simplify Hadoop-based data-centric applications with the CRUD (create, read, update, and delete) and interactive analytic functionality of Apache Impala (incubating) and Apache Kudu (incubating).
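As a rough illustration of what CRUD plus interactive analytics over a Kudu-backed Impala table can look like from application code, the sketch below issues statements through the impyla client. The host, table name, and columns are assumptions for illustration; the table is assumed to already exist and be stored in Kudu, since the exact CREATE TABLE syntax for Kudu tables has varied across Impala versions.

```python
# Rough sketch: CRUD against a Kudu-backed Impala table via the impyla client.
# Host, table name, and columns are assumptions for illustration only.
from impala.dbapi import connect

conn = connect(host="impala-daemon.example.com", port=21050)
cur = conn.cursor()

# Create: insert a new row into the (assumed, pre-existing) Kudu table.
cur.execute("INSERT INTO user_profiles (id, name, score) VALUES (42, 'alice', 0.97)")

# Read: interactive analytic query over the same table.
cur.execute("SELECT name, score FROM user_profiles WHERE score > 0.9")
for name, score in cur.fetchall():
    print(name, score)

# Update/upsert: Impala supports UPDATE and UPSERT for Kudu tables.
cur.execute("UPSERT INTO user_profiles (id, name, score) VALUES (42, 'alice', 0.99)")

# Delete: remove the row by primary key.
cur.execute("DELETE FROM user_profiles WHERE id = 42")

cur.close()
conn.close()
```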
Big data processing with Hadoop and Spark, the Uber way by Praveen Murugesan
Praveen Murugesan explains how Uber leverages Hadoop and Spark as the cornerstones of its data infrastructure. Praveen details Uber's current data architecture, outlines some of the unique data processing challenges the company faced, and describes its approach to solving key issues in order to keep powering Uber's real-time marketplace.
How a Spark-based feature store can accelerate big data adoption in financial services by Kaushik Deka and Phil Jarymiszyn
Kaushik Deka and Phil Jarymiszyn discuss the benefits of a Spark-based feature store, a library of reusable features that allows data scientists to solve business problems across the enterprise. Kaushik and Phil outline three challenges they faced—semantic data integration within a data lake, high-performance feature engineering, and metadata governance—and explain how they overcame them.
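One way to read "feature store" concretely is as a library of named, reusable DataFrame transformations that different teams can compose. The sketch below is an assumption about what such a library might look like in PySpark, not the presenters' actual implementation; the registry pattern, column names, and data path are all hypothetical.

```python
# Hypothetical sketch of a reusable feature library on Spark DataFrames.
# The registry pattern, column names, and path are assumptions, not the talk's design.
from pyspark.sql import SparkSession, functions as F

FEATURES = {}

def feature(name):
    """Register a DataFrame -> DataFrame transformation under a reusable name."""
    def register(fn):
        FEATURES[name] = fn
        return fn
    return register

@feature("txn_count_90d")
def txn_count_90d(df):
    # Number of transactions per customer in the last 90 days.
    recent = df.filter(F.datediff(F.current_date(), F.col("txn_date")) <= 90)
    return recent.groupBy("customer_id").agg(F.count("*").alias("txn_count_90d"))

@feature("avg_txn_amount")
def avg_txn_amount(df):
    # Average transaction amount per customer.
    return df.groupBy("customer_id").agg(F.avg("amount").alias("avg_txn_amount"))

def build_features(df, names):
    """Join the requested features into a single customer-level feature table."""
    out = None
    for name in names:
        feat = FEATURES[name](df)
        out = feat if out is None else out.join(feat, "customer_id", "outer")
    return out

if __name__ == "__main__":
    spark = SparkSession.builder.appName("feature-store-sketch").getOrCreate()
    txns = spark.read.parquet("/data/lake/transactions")  # assumed path
    build_features(txns, ["txn_count_90d", "avg_txn_amount"]).show()
```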
Zillow: Transforming real estate through big data and data science by Jasjeet Thind
Zillow pioneered providing access to unprecedented information about the housing market. Long gone are the days when you needed an agent to get comparables and prior sale and listing data. And with more data, data science has enabled more use cases. Jasjeet Thind explains how Zillow uses Spark and machine learning to transform real estate.
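Zillow's actual models are of course far more involved, but as a toy illustration of the Spark-plus-machine-learning combination the talk refers to, the sketch below fits a basic regression on hypothetical listing features with Spark MLlib. The schema, data path, and model choice are assumptions, not Zillow's approach.

```python
# Toy illustration only: a basic home-price regression with Spark MLlib.
# Feature names, data path, and model choice are assumptions, not Zillow's approach.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import GBTRegressor
from pyspark.ml import Pipeline

spark = SparkSession.builder.appName("home-value-sketch").getOrCreate()

# Assumed schema: sale_price plus a few numeric listing attributes.
listings = spark.read.parquet("/data/listings")  # hypothetical path
train, test = listings.randomSplit([0.8, 0.2], seed=42)

assembler = VectorAssembler(
    inputCols=["sqft", "bedrooms", "bathrooms", "lot_size", "year_built"],
    outputCol="features")
gbt = GBTRegressor(featuresCol="features", labelCol="sale_price", maxIter=50)

model = Pipeline(stages=[assembler, gbt]).fit(train)
model.transform(test).select("sale_price", "prediction").show(10)
```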
Hadoop and Spark at ING: An overview of the architecture, security, and business cases at a large international bank by Bas Geerdink
Bas Geerdink offers an overview of the evolution that the Hadoop ecosystem has taken at ING. Since 2013, ING has invested heavily in a central data lake and data management practice. Bas shares historical lessons and best practices for enterprises that are incorporating Hadoop into their infrastructure landscape.