All Videos Tagged Databricks (Data Science Central) - Data Science Central 2019-08-26T09:03:21Z https://www.datasciencecentral.com/video/video/listTagged?tag=Databricks&rss=yes&xn_auth=no DSC Webinar Series: From Pandas to Apache Spark™ 2019-07-03T19:18:10.864Z Tim Matteson https://www.datasciencecentral.com/profile/2edcolrgc4o4b <a href="https://www.datasciencecentral.com/video/dsc-webinar-series-from-pandas-to-apache-spark"><br /> <img src="https://storage.ning.com/topology/rest/1.0/file/get/3189210470?profile=original&amp;width=240&amp;height=135" width="240" height="135" alt="Thumbnail" /><br /> </a><br />***Please be aware there is a slight audio issue from approximately 10:45 to 13:00 in the recording***<br /> <br /> Presenting Koalas, a new open source project unveiled by Databricks that brings the simplicity of pandas to the scalability of Apache Spark™.<br /> <br /> Data science with Python has exploded in popularity over the past few years, and pandas has emerged as the linchpin of the ecosystem. When data scientists get their hands on a data set, pandas is often the tool they use to explore it. It is the ultimate tool for data wrangling and analysis.
In fact, pandas’ read_csv is often the very first command students run in their data science journey.<br /> <br /> The problem? pandas does not scale well to big data: it was designed for data sets small enough for a single machine to handle. Apache Spark, on the other hand, has emerged as the de facto standard for big data workloads. Today many data scientists use pandas for coursework and small data tasks. When they work with very large data sets, they must either migrate their code to PySpark's similar but distinct API or downsample their data so that it fits in pandas.<br /> <br /> Now with Koalas, data scientists get the best of both worlds and can move from a single machine to a distributed environment without learning a new framework.<br /> <br /> In this latest Data Science Central webinar, the developers of Koalas will show you how:<br /> <br /> Koalas removes the need to decide whether to use pandas or PySpark for a given data set<br /> For work initially written in pandas for a single machine, Koalas allows data scientists to scale up their code on Spark by simply switching out pandas for Koalas<br /> Koalas unlocks big data for more data scientists in an organization, since they no longer need to learn PySpark to leverage Spark<br /> <br /> Speakers:<br /> Tony Liu, Product Manager, Machine Learning - Databricks<br /> Tim Hunter, Sr.
Software Engineer and Technical Lead, Co-Creator of Koalas - Databricks<br /> <br /> Hosted by:<br /> Stephanie Glen, Editorial Director - Data Science Central DSC Webinar Series: Patterns for Successful Data Science Projects 2019-03-14T21:54:32.450Z Tim Matteson https://www.datasciencecentral.com/profile/2edcolrgc4o4b <a href="https://www.datasciencecentral.com/video/dsc-webinar-series-patterns-for-successful-data-science-projects"><br /> <img src="https://storage.ning.com/topology/rest/1.0/file/get/1424893480?profile=original&amp;width=240&amp;height=135" width="240" height="135" alt="Thumbnail" /><br /> </a><br />Running data science workloads is a challenge whether you run them on your laptop, on an on-premises cluster, or in the cloud. While buying a 100% managed service is an option, such services can be expensive and lack extensibility. Therefore, many companies opt for open source data science tools like scikit-learn and Apache Spark’s MLlib to balance functionality and cost.<br /> <br /> However, even if a project succeeds at a point in time with a given set of tools, it becomes harder and harder to maintain as data volumes increase and the demand for real-time results pushes the technology to its limits.
New projects also struggle as new challenges of scale invalidate earlier assumptions.<br /> <br /> In this latest Data Science Central webinar, we will discuss patterns that we see companies use to succeed with their data science projects.<br /> <br /> Key takeaways will be:<br /> <br /> Strategies for reducing cognitive load for you and your team<br /> How to execute a program that is simple and effective<br /> How to make the best use of the tool ecosystem<br /> <br /> Speaker:<br /> Bill Chambers, Data Scientist - Databricks<br /> <br /> Hosted by:<br /> Rafael Knuth, Contributing Editor - Data Science Central DSC Webinar Series: An Expert’s Guide to Apache Spark™ 2018-05-23T22:52:20.713Z Tim Matteson https://www.datasciencecentral.com/profile/2edcolrgc4o4b <a href="https://www.datasciencecentral.com/video/dsc-webinar-series-an-expert-s-guide-to-apache-spark"><br /> <img src="https://storage.ning.com/topology/rest/1.0/file/get/2781548053?profile=original&amp;width=240&amp;height=135" width="240" height="135" alt="Thumbnail" /><br /> </a><br />Apache Spark™ has become the de facto data processing and AI engine in enterprises today due to its speed, ease of use, and sophisticated analytics.
As the first Unified Analytics engine to unify data with AI, Spark allows data engineering and data science teams to simplify data preparation and model training, enabling innovative AI use cases that leverage advanced analytics such as machine learning, graph analytics, and deep learning.<br /> <br /> Join Bill Chambers, author of the book “Spark: The Definitive Guide,” and Matei Zaharia, Chief Technologist and Co-founder of Databricks and the original creator of Apache Spark™, in this Data Science Central webinar as they break down the basic operations and common functions of Spark and walk through sample use cases where Spark has helped accelerate AI innovation.<br /> <br /> In this webinar, we will cover:<br /> <br /> A gentle overview of big data and Spark<br /> Expert guidance on how to use, deploy, and maintain Spark<br /> The fundamentals of monitoring, tuning, and debugging Spark<br /> An exploration of machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine learning library<br /> <br /> Speakers:<br /> Bill Chambers, Product Manager -- Databricks<br /> Matei Zaharia, Co-founder and Chief Technologist -- Databricks<br /> <br /> Hosted by:<br /> Bill Vorhies, Editorial Director -- Data Science Central Matei Zaharia’s Predictions for 2018: Big Data and AI Highlights 2018-02-02T17:43:21.210Z Tim Matteson https://www.datasciencecentral.com/profile/2edcolrgc4o4b <a href="https://www.datasciencecentral.com/video/matei-zaharia-s-predictions-for-2018-big-data-and-ai-highlights"><br /> <img src="https://storage.ning.com/topology/rest/1.0/file/get/2781532298?profile=original&amp;width=240&amp;height=135" width="240" height="135" alt="Thumbnail" /><br /> </a><br />Over the past few years, AI and big data have powered numerous technologies that have changed the way we live, from autonomous cars to conversational systems to personalization. As a result, the excitement around these technologies has spiked. But how can we separate the hype from reality, and which advances will make a practical impact next?<br /> <br /> In this DSC webinar, Databricks co-founder and Stanford computer science professor Matei Zaharia, who started the Apache Spark project in 2009, will share his perspective on which big data and AI trends will come to fruition in 2018.
He will discuss how centering organizations around high-quality data will be the main driver of AI, which AI applications are seeing broad success in practice, and how new technologies, including deep learning, data marketplaces, and cloud computing, will affect the computing landscape.<br /> <br /> Join this webinar to learn about:<br /> <br /> The current state of big data and AI<br /> New innovations taking place in research<br /> Key challenges that companies face in getting value from data and AI<br /> Matei’s predictions for how companies and the technology industry will overcome these challenges in 2018<br /> Speaker: Matei Zaharia, Co-founder and Chief Technologist -- Databricks<br /> <br /> Hosted by: Bill Vorhies, Editorial Director -- Data Science Central Apache Spark and Agile Model Development 2017-10-25T04:38:33.912Z Tim Matteson https://www.datasciencecentral.com/profile/2edcolrgc4o4b <a href="https://www.datasciencecentral.com/video/apache-spark-and-agile-model-development"><br /> <img src="https://storage.ning.com/topology/rest/1.0/file/get/2781532037?profile=original&amp;width=240&amp;height=135" width="240" height="135" alt="Thumbnail" /><br /> </a><br />The combination of deep learning with Apache Spark has the potential for tremendous impact in many industry sectors. Apache Spark has quickly become a critical technology for data scientists to explore, understand, and transform massive datasets and to build and train advanced machine learning models. Machine learning and AI have begun to unlock new possibilities that are creating a competitive advantage for companies. However, companies continue to struggle to increase the productivity of data scientists. The biggest hurdle to accelerating innovation has been the time it takes to train, validate, and deploy models.<br /> <br /> Join us for this latest Data Science Central webinar and hear from Richard Garris, Principal Solutions Architect at Databricks, as he shares his experiences across multiple industries helping customers adopt best practices for building deep learning pipelines that drive agile model development.<br /> <br /> You will learn how to:<br /> <br /> Quickly train, validate, and deploy different models by leveraging Apache Spark and cloud technologies<br /> Iterate through multiple models quickly by unifying the cloud infrastructure with big data processing capabilities<br /> Integrate workflows between data engineering and data science teams, simplifying data augmentation for model iteration<br /> Increase collaboration and agility within data science teams to improve quality and decrease time-to-production<br /> <br /> Speakers:<br /> Richard Garris, Principal Solutions Architect -- Databricks<br /> Wayne Chan, Senior Product Marketing Manager -- Databricks<br /> <br /> Hosted by:<br /> Bill Vorhies, Editorial Director --
Data Science Central