Spark is a powerful tool that can be applied to many interesting problems, some of which we have discussed in previous posts. Today we will consider another important application: streaming. Streaming data arrives continuously as small records from many different sources. There are many use cases for streaming technology…
Added by Igor Bobriakov on July 30, 2018 at 3:53am
Thus, data has become very important for anyone who wants to make profitable business decisions. Moreover, a…
Added by Igor Bobriakov on July 26, 2018 at 8:00am
The vast possibilities of artificial intelligence are attracting growing interest in modern information technology. One of its most promising and fastest-evolving directions is machine learning (ML), which has become an essential part of many aspects of our lives. ML has found successful applications in Natural Language Processing, Face…
Added by Igor Bobriakov on July 24, 2018 at 10:12pm
Spark SQL is the part of the Apache Spark big data framework designed for processing structured and semi-structured data. It provides a DataFrame API that simplifies and accelerates data manipulation. A DataFrame is a special type of object, conceptually similar to a table in a relational database. It represents a distributed collection…
Added by Igor Bobriakov on July 18, 2018 at 10:01pm
Spark’s primary abstraction is a distributed collection of items called a Resilient Distributed Dataset (RDD). It is a fault-tolerant collection of elements that can be operated on in parallel. RDDs can be created from Hadoop InputFormats (such as HDFS files) or by transforming other RDDs.
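To make the two properties above more concrete (parallel operations over a partitioned collection, and fault tolerance), here is a toy pure-Python model. It is a hypothetical illustration, not Spark's API: the `ToyRDD` class, its `compute_partition` method and the list-of-lists partitions are all invented for this sketch. It mirrors the RDD idea that transformations are recorded lazily as a lineage, so any partition can be rebuilt by replaying that lineage on its source data.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

# Hypothetical toy model of an RDD, for illustration only (not Spark's API).
Step = Tuple[str, Callable]


class ToyRDD:
    def __init__(self, partitions: List[List[int]], lineage: Optional[List[Step]] = None):
        # Base partitions stand in for input splits such as HDFS blocks.
        self.partitions = partitions
        # Lineage: the ordered list of transformations applied to every partition.
        self.lineage = lineage or []

    def map(self, fn: Callable) -> "ToyRDD":
        # Transformations are lazy: they only extend the lineage of a new ToyRDD.
        return ToyRDD(self.partitions, self.lineage + [("map", fn)])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(self.partitions, self.lineage + [("filter", pred)])

    def compute_partition(self, i: int) -> list:
        # Replaying the lineage on one base partition is also how a lost
        # partition would be rebuilt after a failure.
        data = list(self.partitions[i])
        for kind, fn in self.lineage:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data

    def collect(self) -> list:
        # Action: evaluate all partitions concurrently, then concatenate in order.
        with ThreadPoolExecutor() as pool:
            parts = pool.map(self.compute_partition, range(len(self.partitions)))
            return [x for part in parts for x in part]


base = ToyRDD([[1, 2, 3], [4, 5, 6], [7, 8, 9, 10]])
even_squares = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(even_squares.collect())             # [4, 16, 36, 64, 100]
print(even_squares.compute_partition(1))  # [16, 36]
```

Nothing is computed until `collect()` is called, and recomputing partition 1 on its own needs only its source data plus the recorded steps. In real Spark the starting point would instead be something like `sc.parallelize(...)` or `sc.textFile(...)` on a `SparkContext`.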
Added by Igor Bobriakov on July 17, 2018 at 11:07pm
Natural language processing (NLP) is very popular today, a trend that has become especially noticeable against the backdrop of advances in deep learning. NLP is a field of artificial intelligence aimed at understanding text, extracting important information from it, and training models on text data. The main tasks include speech…
Apache Spark is a powerful open-source processing engine built around speed, ease of use, and sophisticated analytics. It was originally developed at UC Berkeley in 2009; Databricks was founded later, in 2013, by the creators of Spark.
The Spark engine runs in a variety of…
Added by Igor Bobriakov on July 13, 2018 at 2:33am
The insurance industry is regarded as one of the most competitive and least predictable business sectors. It is directly tied to risk, so it has always depended on statistics. Today, data science has changed this dependence forever.
Now, insurance companies have a wider range of…
Oracle VM VirtualBox is a suite of applications, system services, and drivers that emulate computer hardware within the operating system on which VirtualBox is installed. Almost any operating system can be installed on a virtual machine. For example, on a physical Windows computer you can create a virtual machine running Linux and use both operating systems simultaneously. That is what we will do in this article.…
Added by Igor Bobriakov on July 6, 2018 at 3:28am