All Videos Tagged Big (Data Science Central) - Data Science Central
2019-08-25T09:15:05Z
https://www.datasciencecentral.com/video/video/listTagged?tag=Big&rss=yes&xn_auth=no

DSC Webinar Series: From Pandas to Apache Spark™
2019-07-03T19:18:10.864Z, Tim Matteson (https://www.datasciencecentral.com/profile/2edcolrgc4o4b)
https://www.datasciencecentral.com/video/dsc-webinar-series-from-pandas-to-apache-spark

***Please be aware there is a slight audio issue from approximately 10:45-13:00 in the recording***

Presenting Koalas, a new open source project unveiled by Databricks that brings the simplicity of pandas to the scalability of Apache Spark™.

Data science with Python has exploded in popularity over the past few years, and pandas has emerged as the lynchpin of the ecosystem. When data scientists get their hands on a data set, pandas is the most common exploration tool. It is the ultimate tool for data wrangling and analysis.
In fact, pandas’ read_csv is often the very first command students run in their data science journey.

The problem? pandas does not scale well to big data. It was designed for small data sets that a single machine could handle. Apache Spark, on the other hand, has emerged as the de facto standard for big data workloads. Today many data scientists use pandas for coursework and small data tasks. When they work with very large data sets, they have to either migrate their code to PySpark's close but distinct API or downsample their data so that it fits in pandas.

Now with Koalas, data scientists get the best of both worlds and can make the transition from a single machine to a distributed environment without needing to learn a new framework.

In this latest Data Science Central webinar, the developers of Koalas will show you how:

Koalas removes the need to decide whether to use pandas or PySpark for a given data set
For work that was initially written in pandas for a single machine, Koalas allows data scientists to scale up their code on Spark by simply switching out pandas for Koalas
Koalas unlocks big data for more data scientists in an organization since they no longer need to learn PySpark to leverage Spark

Speakers:
Tony Liu, Product Manager, Machine Learning - Databricks
Tim Hunter, Sr. Software Engineer and Technical Lead, Co-Creator of Koalas - Databricks

Hosted by:
Stephanie Glen, Editorial Director - Data Science Central

DSC Webinar Series: Harness the Power of Big Data Analytics
2018-07-03T21:17:09.572Z, Tim Matteson (https://www.datasciencecentral.com/profile/2edcolrgc4o4b)
https://www.datasciencecentral.com/video/dsc-webinar-series-harness-the-power-of-big-data-analytics

From BI to AI, the need for Big Data and analytics is pervasive and transformational. However, Big Data technologies such as Hadoop or Spark are still quite complicated and not leveraged to their full capacity by business practitioners.
New technologies are available to leverage the power of big data platforms for self-service data preparation and automated machine learning to help organizations get the most out of their analytics initiatives and unlock the full potential of their Big Data investments.

In this latest Data Science Central webinar you will learn the essentials you need for a modern data and analytics strategy, ways to expand your strategy development repertoire, and emerging approaches, as well as:

Why big data solutions like Hadoop and Spark are ideal for machine learning and advanced analytics initiatives
What Automated Machine Learning for Big Data is and how it can change your approach to ML
How Self-Service Data Preparation reduces the work required to deliver clean data at scale for predictive modeling
How to leverage Big Data platforms to rapidly deliver more accurate predictions for ML initiatives

Speakers:
Raju Penmatcha, PhD, Customer Facing Data Scientist -- DataRobot
Connor Carreras, Manager for Customer Success, Americas -- Trifacta

Hosted by:
Bill Vorhies, Editorial Director -- Data Science Central

DSC Webinar Series: An Expert’s Guide to Apache Spark™
2018-05-23T22:52:20.713Z, Tim Matteson (https://www.datasciencecentral.com/profile/2edcolrgc4o4b)
https://www.datasciencecentral.com/video/dsc-webinar-series-an-expert-s-guide-to-apache-spark

Apache Spark™ has become the de facto data processing and AI engine in enterprises today due to its speed, ease of use, and sophisticated analytics. As the first Unified Analytics engine to unify data with AI, Spark allows data engineering and data science teams to simplify data preparation and model training, enabling innovative AI use cases that leverage advanced analytics like machine learning, graph analytics, and deep learning.

Join Bill Chambers, author of the book “Spark: The Definitive Guide,” and Matei Zaharia, Chief Technologist and Co-founder of Databricks and the original creator of Apache Spark™, in this Data Science Central webinar as they break down the basic operations and common functions of Spark and walk through sample use cases where Spark has helped accelerate AI innovation.

In this webinar, we will cover:

A gentle overview of big data and Spark
Expert guidance on how to use, deploy and maintain Spark
The fundamentals of monitoring, tuning, and debugging Spark
An exploration into machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library

Speakers:
Bill Chambers, Product Manager -- Databricks
Matei Zaharia, Co-founder and Chief Technologist -- Databricks

Hosted by:
Bill Vorhies, Editorial Director -- Data Science Central

DSC Webinar Series: The State of Data Preparation in 2018
2018-04-06T00:49:16.408Z, Tim Matteson (https://www.datasciencecentral.com/profile/2edcolrgc4o4b)
https://www.datasciencecentral.com/video/dsc-webinar-series-the-state-of-data-preparation-in-2018

Over the past few years, data preparation has emerged as a stand-alone category within data management and analytics. A technology category that originated out of joint research across UC Berkeley and Stanford, it is now recognized as a critical technology by end users, organizations and industry analysts alike. Data preparation has evolved tremendously since the category first emerged in 2015. So what’s new? How far have we come? Where are we headed in the future?

Join Dresner Advisory Services’ Chief Research Officer, Howard Dresner, for an interactive webinar that will provide an overview of the data preparation market.
In the session, Howard will review findings from his 2018 “Wisdom of the Crowds Market Study” on data preparation, compiled from end user responses.

In this latest Data Science Central webinar, we will cover the following topics:

How data preparation is being used within organizations: which users and departments use data prep?
What are the most critical features of data preparation technologies?
Differences between traditional ETL technologies and this new generation of data preparation tools.

Speakers:
Howard Dresner, Chief Research Officer -- Dresner Advisory Services
Will Davis, Director of Product Marketing -- Trifacta

Hosted by:
Bill Vorhies, Editorial Director -- Data Science Central

DSC Webinar Series: Toolsets to Streamline the Workflow for Today’s Data Analytics
2015-02-26T19:12:38.087Z, Andrei Macsin (https://www.datasciencecentral.com/profile/AndreiMacsin)
https://www.datasciencecentral.com/video/dsc-webinar-series-toolsets-to-streamline-the-workflow-for-today

Organizations today are all too familiar with the complicated and often painful nature of handling the flood of data involved in their analytics process. It’s likely that you will need multiple tools to ingest, clean and merge your data before you can even start to build your models and analyze your datasets for business insights. And it takes still more time to integrate them into a visualization application so that the data can ultimately be reviewed by the business decision makers within any given organization.

In this webinar we will explore a new set of tools provided by Alteryx and Qlik that streamlines this otherwise highly inefficient process, making it simple, fast, intuitive, and cost-effective. You will gain deeper insights into your data within hours rather than weeks.

Speakers:

Michael Snow, Partner Marketing Director, Alteryx
Gene Rinas, Senior Solutions Engineer, Alteryx
Jesús Centeno, Alliances Technology Manager, Qlik

Hosted by: Tim Matteson, Cofounder, Data Science Central