
Igor Bobriakov's Blog (50)

Practical Apache Spark in 10 minutes. Part 3 - DataFrames and SQL

Spark SQL is the part of the Apache Spark big data framework designed for processing structured and semi-structured data. It provides a DataFrame API that simplifies and accelerates data manipulation. A DataFrame is a special type of object, conceptually similar to a table in a relational database. It represents a distributed collection…
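The "table split across a cluster" idea can be made concrete with a toy, pure-Python sketch. This is not Spark's API: `ToyDataFrame`, `from_rows` and `_map_partitions` are hypothetical names. In the sketch, rows with named columns are split into partitions, and a filter-then-projection runs on each partition independently before the rows are gathered back. Unlike Spark, it evaluates eagerly and has no query optimizer. In real PySpark the same query would be roughly `df.where(df.age > 40).select("name")`, or `spark.sql("SELECT name FROM people WHERE age > 40")` after `df.createOrReplaceTempView("people")`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

Row = dict[str, Any]


class ToyDataFrame:
    """Hypothetical stand-in for a DataFrame (not Spark's API):
    rows with named columns, split into partitions processed independently."""

    def __init__(self, columns: list[str], partitions: list[list[Row]]):
        self.columns = columns
        self.partitions = partitions

    @classmethod
    def from_rows(cls, columns: list[str], rows: list[tuple],
                  num_partitions: int = 2) -> "ToyDataFrame":
        records = [dict(zip(columns, r)) for r in rows]
        # Round-robin assignment of rows to partitions
        return cls(columns, [records[i::num_partitions] for i in range(num_partitions)])

    def _map_partitions(self, fn: Callable[[list[Row]], list[Row]]) -> list[list[Row]]:
        # Apply fn to every partition concurrently, loosely like executors on a cluster
        with ThreadPoolExecutor() as pool:
            return list(pool.map(fn, self.partitions))

    def where(self, predicate: Callable[[Row], bool]) -> "ToyDataFrame":
        # Keep only the rows that satisfy the predicate, partition by partition
        return ToyDataFrame(self.columns,
                            self._map_partitions(lambda p: [r for r in p if predicate(r)]))

    def select(self, *cols: str) -> "ToyDataFrame":
        unknown = set(cols) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        # Project each row onto the requested columns
        return ToyDataFrame(list(cols),
                            self._map_partitions(lambda p: [{c: r[c] for c in cols} for r in p]))

    def collect(self) -> list[Row]:
        # Gather rows from all partitions back into one list
        return [row for part in self.partitions for row in part]


people = ToyDataFrame.from_rows(["name", "age"],
                                [("Alice", 34), ("Bob", 45), ("Carol", 29), ("Dan", 52)])
result = people.where(lambda r: r["age"] > 40).select("name").collect()
print(result)  # [{'name': 'Bob'}, {'name': 'Dan'}]
```

The design choice to express every operation through a per-partition function mirrors why a DataFrame scales: no step needs to see the whole table at once until the final gather.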


Added by Igor Bobriakov on July 18, 2018 at 10:01pm — No Comments

Practical Apache Spark in 10 minutes. Part 2 - RDD

Spark’s primary abstraction is a distributed collection of items called a Resilient Distributed Dataset (RDD). It is a fault-tolerant collection of elements that can be operated on in parallel. RDDs can be created from Hadoop InputFormats (such as HDFS files) or by transforming other RDDs.
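Spark makes an RDD fault-tolerant by having it remember its lineage: the parent RDD and the transformation that produced it. A lost partition can then be recomputed instead of restored from a replica. The sketch below is a single-process toy model of that idea, not Spark's API. `ToyRDD`, `get_partition` and `lose_partition` are hypothetical names. One simplification: every computed partition is kept here, which resembles calling `persist()` in Spark, whereas Spark recomputes by default. With a real `SparkContext` named `sc`, the same pipeline would be `sc.parallelize([1, 2, 3, 4, 5, 6], 2).filter(lambda x: x % 2 == 0).map(lambda x: x * x)`.

```python
from typing import Any, Callable, Optional


class ToyRDD:
    """Hypothetical, single-process model of an RDD (not Spark's API).

    Each RDD knows its parent and the function that derives its partitions,
    so a lost partition can be rebuilt from that lineage.
    """

    def __init__(self, compute: Callable[[int], list], num_partitions: int,
                 parent: Optional["ToyRDD"] = None, op: str = "source"):
        self._compute = compute            # partition index -> list of elements
        self.num_partitions = num_partitions
        self.parent = parent
        self.op = op
        self._cache: dict[int, list] = {}  # computed partitions (similar to persist())

    @classmethod
    def parallelize(cls, data: list, num_partitions: int = 2) -> "ToyRDD":
        chunks = [data[i::num_partitions] for i in range(num_partitions)]
        return cls(lambda i: list(chunks[i]), num_partitions)

    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        # Transformation: only records how to derive the new RDD; nothing runs yet
        return ToyRDD(lambda i: [fn(x) for x in self.get_partition(i)],
                      self.num_partitions, self, "map")

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(lambda i: [x for x in self.get_partition(i) if pred(x)],
                      self.num_partitions, self, "filter")

    def get_partition(self, i: int) -> list:
        if i not in self._cache:
            # Never computed, or lost: rebuild it from the parent via lineage
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate the machine holding partition i failing
        self._cache.pop(i, None)

    def collect(self) -> list:
        # Action: forces every partition to be computed
        return [x for i in range(self.num_partitions) for x in self.get_partition(i)]

    def lineage(self) -> list[str]:
        node, ops = self, []
        while node is not None:
            ops.append(node.op)
            node = node.parent
        return list(reversed(ops))


nums = ToyRDD.parallelize([1, 2, 3, 4, 5, 6], num_partitions=2)
squares_of_evens = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(squares_of_evens.lineage())  # ['source', 'filter', 'map']
print(squares_of_evens.collect())  # [4, 16, 36]

squares_of_evens.lose_partition(1)
recovered = squares_of_evens.get_partition(1)  # recomputed from lineage
print(recovered)                   # [4, 16, 36]
```

Because `map` and `filter` only build new `ToyRDD` objects, nothing is computed until `collect` (an action) runs. This is the same lazy-evaluation split between transformations and actions that Spark uses.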



Added by Igor Bobriakov on July 17, 2018 at 11:07pm — No Comments

Comparison of Top 6 Python NLP Libraries

Natural language processing (NLP) is becoming very popular, a trend that has become especially noticeable alongside the development of deep learning. NLP is a field of artificial intelligence aimed at understanding and extracting important information from text, and at training models on text data. The main tasks include speech…


Added by Igor Bobriakov on July 17, 2018 at 3:00am — 2 Comments

Practical Apache Spark in 10 minutes. Part 1 - Ubuntu installation

Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics. It was originally developed at UC Berkeley in 2009, and Databricks was later founded by the creators of Spark in 2013.

The Spark engine runs in a variety of…


Added by Igor Bobriakov on July 13, 2018 at 2:33am — No Comments

Top 10 Data Science Use Cases in Insurance

The insurance industry is regarded as one of the most competitive and least predictable business spheres. It is inherently tied to risk and has therefore always depended on statistics. Nowadays, data science has changed this dependence forever.

Now, insurance companies have a wider range of…


Added by Igor Bobriakov on July 11, 2018 at 11:18pm — 1 Comment

Installing and running Ubuntu in VirtualBox

Oracle VM VirtualBox is a suite of applications, system services, and drivers that emulates computer hardware within the operating system on which VirtualBox is installed. Almost any operating system can be installed on a virtual machine. For example, on a physical computer running Windows, you can create a virtual machine running Linux and use both operating systems simultaneously. This is what we will do in this article.…


Added by Igor Bobriakov on July 6, 2018 at 3:28am — No Comments

Comparison of top data science libraries for Python, R and Scala [Infographic]

Data science is a promising and exciting field that is developing rapidly. The range of data science use cases and their influence is continuously expanding, and the toolkit for implementing these applications is growing fast. Therefore, data scientists should know which solutions are best suited to particular tasks.

So while many languages can be useful for a data scientist, these three remain the most popular and…


Added by Igor Bobriakov on June 29, 2018 at 3:30am — No Comments

Top 20 Python libraries for data science in 2018

Python continues to hold a leading position in solving data science tasks and challenges. Last year we published a blog post overviewing the Python libraries that proved most helpful at that moment. This year, we expanded our list with new libraries and took a fresh look at the ones we had already covered, focusing on the updates made during the year.

Our selection actually…


Added by Igor Bobriakov on June 13, 2018 at 2:30am — 2 Comments

Top 20 R Libraries for Data Science in 2018 [Infographic]

R is a well-known and increasingly popular tool in the data science field. It is a programming language and software environment designed primarily for statistical computing, so its interface and structure are well suited to scientific tasks. Moreover, R has one of the most developed library ecosystems, counting thousands of packages that solve a wide variety of problems.

Although there are many general-purpose…


Added by Igor Bobriakov on May 22, 2018 at 2:00am — 2 Comments

Top 7 Data Science Use Cases in Finance

In recent years, the ability of data science and machine learning to handle a number of principal financial tasks has become an especially important topic. Companies want to know what improvements these technologies bring and how they can reshape their business strategies.

To help you answer these questions, we…


Added by Igor Bobriakov on May 14, 2018 at 4:30am — No Comments



© 2019   Data Science Central ®   Powered by
