Speaker: Josh Yeh / Software Engineer @ Cloudera
Title: Data Science in the Enterprise
“Machine learning is all the rage. ML offers great opportunities for enterprises that already capture vast amounts of data, and Cloudera’s customers use our platform to solve ML problems every day.
However, the reality is that getting data from an enterprise data hub is no trivial task for a data scientist. Data access must be secured through Kerberos authentication, and the tools and libraries that data scientists use often conflict with each other, which creates a management problem. A data scientist could download data to a laptop for modeling, but that creates data silos and small-dataset problems, in addition to data governance problems for cluster administrators. Data scientists also want to shorten the time it takes to put a model into production, which is really hard in today’s environment.
Cloudera developed Cloudera Data Science Workbench (CDSW) to help make data science easier at enterprise scale. In this talk, I will review a few problems that today’s data scientists face and talk about how CDSW makes data scientists’ lives easier. Finally, I will do a demo.”
- Apache Kylin 2.0 – From Classic OLAP to Realtime Data Warehouse
- The Rise of Open Source Data Platforms: An Insider’s view