Strata Hadoop World NY 2016 - IoT & real-time Track
Strata Hadoop World NY 2016 has the following interesting talks in its IoT & real-time sessions:
Learn stream processing with Apache Beam by Tyler Akidau and Jesse Anderson
Come learn the basics of stream processing via a guided walkthrough of the most sophisticated and portable stream processing model on the planet—Apache Beam (incubating). Tyler Akidau and Jesse Anderson cover the basics of robust stream processing (windowing, watermarks, and triggers) with the option to execute exercises on top of the runner of your choice—Flink, Spark, or Google Cloud Dataflow.
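The event-time windowing idea the tutorial covers can be illustrated with a toy sketch in plain Python. It does not use the Beam SDK. The function names (`window_start`, `sum_per_window`) and the (event_time, value) tuple format are assumptions made for illustration. The sketch shows that keying fixed windows by event time, rather than by arrival order, still puts out-of-order events in the right window.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Toy model of fixed (tumbling) event-time windows; this is not the Beam API.
# Each event is an assumed (event_time_seconds, value) pair.
Event = Tuple[int, int]
Window = Tuple[int, int]  # [start, end)


def window_start(event_time: int, size: int) -> int:
    """Start of the fixed window of `size` seconds that contains event_time."""
    return event_time - event_time % size


def sum_per_window(events: Iterable[Event], size: int) -> Dict[Window, int]:
    """Sum values per event-time window.

    Windows are keyed by event time, not arrival order, so events that
    arrive out of order still land in the correct window.
    """
    totals: Dict[Window, int] = defaultdict(int)
    for event_time, value in events:
        start = window_start(event_time, size)
        totals[(start, start + size)] += value
    return dict(totals)


if __name__ == "__main__":
    # Arrival order differs from event-time order.
    arrivals: List[Event] = [(12, 5), (3, 1), (61, 2), (59, 4), (7, 3)]
    print(sum_per_window(arrivals, size=60))
```

Run as a script, this prints `{(0, 60): 13, (60, 120): 2}`: the late-arriving events at times 59 and 7 are still counted in the first minute.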
Conquer the time series data pipeline with SMACK by Patrick McFadin
We as an industry are collecting more data every year. IoT, web, and mobile applications send torrents of bits to our data centers that have to be processed and stored, while users expect an always-on experience—leaving little room for error. Patrick McFadin explores how successful companies do this every day with powerful data pipelines built with SMACK: Spark, Mesos, Akka, Cassandra, and Kafka.
An introduction to Apache Kafka by Ian Wrigley
Ian Wrigley demonstrates how to leverage the capabilities of Apache Kafka to collect, manage, and process stream data for both big data projects and general-purpose enterprise data integration. Ian covers system architecture, use cases, and how to write applications that publish data to, and subscribe to data from, Kafka—no prior knowledge of Kafka required.
Powering real-time analytics on Xfinity using Kudu by Sridhar Alla and Kiran Muglurmath
Sridhar Alla and Kiran Muglurmath explain how real-time analytics on Comcast Xfinity set-top boxes (STBs) help drive several customer-facing and internal data-science-oriented applications and how Comcast uses Kudu to fill the gaps in batch and real-time storage and computation needs, allowing Comcast to process the high-speed data without the elaborate solutions that were needed until now.
Apache Kafka: The rise of real-time data and stream processing by Neha Narkhede
Neha Narkhede explains how Apache Kafka serves as a foundation for streaming data applications that consume and process real-time data streams and introduces Kafka Connect, a system for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library. Neha also describes the lessons companies like LinkedIn learned building massive streaming data architectures.
Watermarks: Time and progress in Apache Beam (incubating) and beyond by Slava Chernyak
Watermarks are a system for measuring progress and completeness in out-of-order streaming systems and are utilized to emit correct results in a timely manner. Given the trend toward out-of-order processing in existing streaming systems, watermarks are an increasingly important tool when designing streaming pipelines. Slava Chernyak explains watermarks and explores real-world applications.
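A minimal sketch of how a watermark can decide when a window is complete, assuming one simple heuristic: the watermark trails the largest event time seen so far by a fixed bound (`max_delay`). This is only one way to produce a watermark and is not how Beam runners compute theirs; the class `WatermarkedWindowSum` and its fields are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy sketch, not Beam or Flink code. The watermark is a heuristic that
# trails the largest event time seen so far by an assumed bound, max_delay.
Result = Tuple[int, int, int]  # (window start, window end, sum)


class WatermarkedWindowSum:
    def __init__(self, window_size: int, max_delay: int) -> None:
        self.window_size = window_size
        self.max_delay = max_delay
        self.watermark: float = float("-inf")  # no progress known yet
        self.open_windows: Dict[int, int] = {}  # window start -> running sum
        self.late_events: List[Tuple[int, int]] = []

    def process(self, event_time: int, value: int) -> List[Result]:
        """Ingest one event and return any windows the watermark has now closed."""
        start = event_time - event_time % self.window_size
        if start + self.window_size <= self.watermark:
            # The watermark already declared this window complete: late data.
            self.late_events.append((event_time, value))
        else:
            self.open_windows[start] = self.open_windows.get(start, 0) + value
        # Advance the watermark monotonically; it never moves backwards.
        self.watermark = max(self.watermark, event_time - self.max_delay)
        return self._emit_complete()

    def _emit_complete(self) -> List[Result]:
        """Pop and return every open window whose end is at or behind the watermark."""
        done = sorted(s for s in self.open_windows if s + self.window_size <= self.watermark)
        return [(s, s + self.window_size, self.open_windows.pop(s)) for s in done]


if __name__ == "__main__":
    agg = WatermarkedWindowSum(window_size=10, max_delay=5)
    for t, v in [(1, 1), (8, 2), (12, 4), (4, 1), (16, 3), (2, 9)]:
        for result in agg.process(t, v):
            print("complete:", result)
    print("late:", agg.late_events)
```

With `max_delay=5`, the window [0, 10) is emitted only when the event at time 16 pushes the watermark to 11, and the event at time 2 that arrives afterwards is classified as late. Choosing the bound trades latency (a larger bound means later results) against completeness (a smaller bound means more late data).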
Triggers in Apache Beam (incubating) by Kenneth Knowles
Triggers specify when a stage of computation should emit output. With a small language of primitive conditions, triggers provide the flexibility to tailor a streaming pipeline to a variety of use cases and data sources. Kenneth Knowles delves into the details of language- and runner-independent semantics for triggers in Apache Beam and explores real-world implementations in Google Cloud Dataflow.
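To make "a small language of primitive conditions" concrete, here is a toy Python model in which a trigger is simply a predicate over a window's state and composite triggers are built from simpler ones. The names `after_count`, `after_watermark`, `any_of` and `run` are hypothetical and only loosely modeled on Beam's trigger primitives; this is not the Beam API. Each pane is emitted in discarding mode, holding only the elements buffered since the previous firing.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy model of composable triggers, loosely inspired by Beam's trigger
# primitives. All names here are hypothetical; this is not the Beam API.


@dataclass
class PaneState:
    window_end: int
    watermark: int
    count_since_fire: int


Trigger = Callable[[PaneState], bool]


def after_count(n: int) -> Trigger:
    """Fire once at least n elements have arrived since the last firing."""
    return lambda s: s.count_since_fire >= n


def after_watermark() -> Trigger:
    """Fire once the watermark has reached the end of the window."""
    return lambda s: s.watermark >= s.window_end


def any_of(*triggers: Trigger) -> Trigger:
    """Composite: fire when any sub-trigger would fire."""
    return lambda s: any(t(s) for t in triggers)


def run(events: List[Tuple[str, int]], window_end: int, trigger: Trigger) -> List[List[int]]:
    """Feed ("elem", value) and ("wm", time) events into a single window.

    Returns the emitted panes. Each pane holds only the elements buffered
    since the previous firing (discarding mode); empty panes are skipped.
    """
    # An initial watermark of -1 assumes event times are non-negative.
    state = PaneState(window_end=window_end, watermark=-1, count_since_fire=0)
    buffer: List[int] = []
    panes: List[List[int]] = []
    for kind, x in events:
        if kind == "elem":
            buffer.append(x)
            state.count_since_fire += 1
        else:  # "wm": the watermark advances to x
            state.watermark = max(state.watermark, x)
        if buffer and trigger(state):
            panes.append(buffer)
            buffer = []
            state.count_since_fire = 0
    return panes


if __name__ == "__main__":
    sample = [("elem", 1), ("elem", 2), ("elem", 3), ("wm", 10), ("elem", 4)]
    print(run(sample, 10, any_of(after_count(2), after_watermark())))
    print(run(sample, 10, after_watermark()))
```

Composing a count trigger with a watermark trigger yields an early pane every two elements, an on-time pane when the watermark passes the window end, and a late pane for each element that arrives afterwards: the first call prints `[[1, 2], [3], [4]]`, while the watermark-only trigger prints `[[1, 2, 3], [4]]`.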
Analytics for large-scale time series and event data by Ira Cohen
Time series and event data form the basis for real-time insights about the performance of businesses such as ecommerce, the IoT, and web services, but gaining these insights involves designing a learning system that scales to millions and billions of data streams. Ira Cohen outlines a system that performs real-time machine learning and analytics on streams at massive scale.
Fast cars, big data: How streaming data can help Formula 1 by Ted Dunning
Modern cars produce data. Lots of data. And Formula 1 cars produce more than their fair share. Ted Dunning presents a demo of how data streaming can be applied to the analytics problems posed by modern motorsports. Although he won't be bringing Formula 1 cars to the talk, Ted demonstrates a physics-based simulator to analyze realistic data from simulated cars.
Pulsar: Real-time analytics at scale leveraging Kafka, Kylin, and Druid by Tony Ng
Enterprises are increasingly demanding real-time analytics and insights. Tony Ng offers an overview of Pulsar, an open source real-time streaming system used at eBay. Tony explains how Pulsar integrates Kafka, Kylin, and Druid to provide flexibility and scalability in event and metrics consumption.
Implementing extreme scaling and streaming in finance by Jim Scott
Jim Scott outlines the core tenets of a message-driven architecture and explains its importance in real-time big data-enabled distributed systems within the realm of finance.
Shifting cities: A case study in data visualization by Brian Kahn and Edward Wisniewski
Radish Lab teamed up with science news nonprofit Climate Central to transform temperature data from 1,001 US cities into a compelling, simple interactive that received more than 1 million views within three days of launch. Alana Range and Brian Kahn offer an overview of the process of creating a viral, interactive data visualization with teams that regularly produce powerful data stories.
Twitter's real-time stack: Processing billions of events with Heron and DistributedLog by Karthik Ramasamy
Twitter generates billions and billions of events per day. Analyzing these events in real time presents a massive challenge. Karthik Ramasamy offers an overview of the end-to-end real-time stack Twitter designed in order to meet this challenge, consisting of DistributedLog (the distributed and replicated messaging system) and Heron (the streaming system for real-time computation).
When one data center is not enough: Building large-scale stream infrastructures across multiple data centers with Apache Kafka by Ewen Cheslack-Postava
You may have successfully made the transition from single machines and one-off solutions to large, distributed stream infrastructures in your data center. But what if one data center is not enough? Ewen Cheslack-Postava explores resilient multi-data-center architecture with Apache Kafka, sharing best practices for data replication and mirroring as well as disaster scenarios and failure handling.
Implementing streaming architecture with Apache Flink: Present and future by Kostas Tzoumas
Apache Flink has seen incredible growth during the last year, both in development and usage, driven by the fundamental shift from batch to stream processing. Kostas Tzoumas demonstrates how Apache Flink enables real-time decisions, makes infrastructure less complex, and enables extremely efficient, accurate, and fault-tolerant streaming applications.
How GE analyzes billions of mission-critical events in real time using Apache Apex, Spark, and Kudu by Venkatesh Sivasubramanian and Luis Ramos
Opportunities in the industrial world are expected to outpace consumer business cases. Time series data is growing exponentially as new machines get connected. Venkatesh Sivasubramanian and Luis Ramos explain how GE makes it faster and easier for systems to access (using a common layer) and perform analytics on a massive volume of time series data by combining Apache Apex, Spark, and Kudu.
How to achieve zero-latency IoT and FSI data processing with Spark by Yaron Haviv
Yaron Haviv explains how to design real-time IoT and FSI applications, leveraging Spark with advanced data frame acceleration. Yaron then presents a detailed, practical use case, diving deep into the architectural paradigm shift that makes the powerful processing of millions of events both efficient and simple to program.
Stream analytics in the enterprise: A look at Intel’s internal IoT implementation by Moty Fania
Moty Fania shares Intel’s IT experience implementing an on-premises IoT platform for internal use cases. The platform was designed as a multitenant platform with built-in analytical capabilities and based on open source big data technologies and containers. Moty highlights the lessons learned from this journey with a thorough review of the platform’s architecture.
Amazon Kinesis: Real-time streaming data in the AWS cloud by Roy Ben-Alta
Roy Ben-Alta explores the Amazon Kinesis platform in detail and discusses best practices for scaling your core streaming data ingestion pipeline as well as real-world customer use cases and design pattern integration with Amazon Elasticsearch, AWS Lambda, and Apache Spark.