Speaker: Jicheng Shi / Sr. Software Engineer @ Kyligence
Title: Apache Kylin 2.0 – From Classic OLAP to Realtime Data Warehouse
Apache Kylin, which started as a big data OLAP engine, is reaching v2.0. Armed with snowflake schema support, a full SQL interface, and the ability to consume realtime streaming data, Apache Kylin is closing the gap to a realtime data warehouse.
This talk will present the latest features of Apache Kylin v2.0 and introduce the technical thinking and designs behind them.
Since v1.6, Apache Kylin has supported micro-batch data loading from Kafka, enabling near-realtime analysis with a latency of minutes. Starting from v2.0, Apache Kylin will consume realtime records from Kafka up to the latest second, analyzing realtime and historical data on a single platform.
Apache Kylin used to support star schema only, which is quite a limitation in some real-world cases. In v2.0, by supporting snowflake schema directly, users can import an arbitrary E-R model into Kylin, which covers the most comprehensive data models out of the box. This is a big step forward for business deployment.
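To make the star versus snowflake distinction concrete, here is a minimal Python sketch. It does not use Kylin or its API; the table names, keys, and values are all hypothetical. In a star schema every dimension joins directly to the fact table, while in a snowflake schema a dimension may itself reference another dimension, so a query needs a second hop.

```python
# Hypothetical tables; names and values are made up for illustration only.
SALES = [  # fact table
    {"product_id": 1, "amount": 10},
    {"product_id": 2, "amount": 5},
    {"product_id": 3, "amount": 7},
]
PRODUCT = {  # dimension table, keyed by product_id
    1: {"name": "novel", "category_id": 10},
    2: {"name": "noodles", "category_id": 20},
    3: {"name": "atlas", "category_id": 10},
}
CATEGORY = {  # dimension of a dimension: this second hop makes it a snowflake
    10: {"name": "book"},
    20: {"name": "food"},
}


def amount_by_category(sales):
    """Sum amount per category by following fact -> product -> category."""
    totals = {}
    for row in sales:
        product = PRODUCT[row["product_id"]]                # first hop (star-style join)
        category = CATEGORY[product["category_id"]]["name"]  # second hop (snowflake join)
        totals[category] = totals.get(category, 0) + row["amount"]
    return totals


print(amount_by_category(SALES))
```

With star schema only, the category name would have to be flattened into the product table before modeling; supporting the second hop directly avoids that preprocessing step.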
The SQL features of Kylin have been improving continuously. Some recent updates are window function support, a percentile function, time functions, and more.
And as always, Apache Kylin focuses on replacing online calculation with offline pre-calculation, making it quite different from other SQL-on-Hadoop solutions. With ever-growing data volumes, pre-calculation (and Apache Kylin) may be the only way to ensure a constant query response time on big data.
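The idea of trading online calculation for offline pre-calculation can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not Kylin's actual cube format or build algorithm: the rows, the dimension names, and the helpers `build_cube` and `query` are hypothetical. The offline step aggregates a measure for every combination of dimensions; the online step then answers a query with a single lookup, so its cost does not grow with the number of raw rows.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (country, category, day, amount)
ROWS = [
    ("TW", "book", "2017-03-01", 10),
    ("TW", "food", "2017-03-01", 5),
    ("US", "book", "2017-03-02", 7),
    ("TW", "book", "2017-03-02", 3),
]
DIMENSIONS = ("country", "category", "day")


def build_cube(rows):
    """Offline step: pre-aggregate SUM(amount) for every subset of dimensions."""
    cube = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in dims)
                cuboid[key] += row[-1]
            cube[dims] = dict(cuboid)
    return cube


def query(cube, **filters):
    """Online step: answer SUM(amount) under equality filters with one lookup."""
    dims = tuple(i for i, name in enumerate(DIMENSIONS) if name in filters)
    key = tuple(filters[DIMENSIONS[i]] for i in dims)
    return cube[dims].get(key, 0)


CUBE = build_cube(ROWS)
print(query(CUBE, country="TW"))
print(query(CUBE, country="TW", category="book"))
print(query(CUBE))
```

The trade-off is visible even here: three dimensions produce 2^3 = 8 pre-aggregated groupings, so storage and build time grow quickly with the number of dimensions, in exchange for query time that stays flat as the raw data grows.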
Maximizing the utilization of Hadoop computing power is the biggest challenge for a Hadoop administrator. In this talk I will explain how we use machine learning to build a predictive model of computing power requirements and set the MapReduce scheduler parameters dynamically, to maximize the utilization of our Hadoop cluster's computing power.
Apache Kylin committer; formerly a Software Engineer on the Microsoft Commerce team.
- DataCon.TW 2017 session voting
- Data Science in the Enterprise