Jwork.ORG's Blog (20)

HandWiki encyclopedia of datascience

In 2020, HandWiki became the largest online wiki encyclopedia for major science topics (physics, math, etc.) and computing. It has more than 105,000 scholarly articles, incorporating the current Wikipedia articles, scholarly articles submitted to the Wikipedia foundation (but later…


Added by jwork.ORG on January 16, 2020 at 6:30pm — No Comments

How to make your own Wiki from Wikipedia using Python

Here is a short blog post I was asked to write about making a personal Wiki from Wikipedia. It shows the basic steps in text processing, so I hope it will be useful for data scientists. It also requires some knowledge of MediaWiki setup on a web server, and some (not very advanced) knowledge of the Python programming language. It takes only a few days to create this Wiki with Wikipedia articles if you know…


Added by jwork.ORG on October 24, 2019 at 1:21am — No Comments

Wikis for publishing scholarly articles on data science and software

By now you may already know that adding scholarly articles to the English version of Wikipedia is difficult due to the "notability" concept and tight control by anonymous editors (see this article). In recent years, entire Wikipedia topics and articles dedicated to software and data…


Added by jwork.ORG on September 28, 2019 at 4:30am — No Comments

Calculating exclusion limits for a new theory in hardcore science

In many fields of science, it is important to understand the relevance of new theories or hypotheses in a description of experimental data, assuming that such data are already well represented by predictions of some well-accepted theory. A popular statistical method for setting upper limits (also called exclusion limits) on model parameters of a new theory is based on the CLs method.…


Added by jwork.ORG on April 11, 2019 at 3:00pm — No Comments

Best dynamically-typed programming languages for data analysis

One can seriously argue about which programming language is the best for data analysis, but there is one universal metric that can define your choice: speed of calculations. Therefore, the word "best" in the title means the languages that lead to the most performant applications. If the most performant program can also be written in an easy-to-use, easy-to-learn, dynamically-typed…


Added by jwork.ORG on January 26, 2019 at 2:54pm — No Comments

Evaluation and comparison of open source software suites for data mining and knowledge discovery

An article by A. H. Abdulrahman, J. M. Luna, M. A. Vallejo and S. Ventura with the title "Evaluation and comparison of open source software suites for data mining and knowledge discovery" (published by Wiley in "Data Mining and Knowledge Discovery", Vol. 7, Issue 3, 2017; see this link) provides the research community with an extensive study on different features included in any data mining tool. The final score for…


Added by jwork.ORG on October 30, 2018 at 3:18pm — No Comments

Statistical analysis on the Android platform

Last week a new release of AWork (version 2.0) was submitted to Google Play (see the AWork link). It finally supports Android 8+ devices with high-resolution screens. AWork is a complete programming environment for Android devices…


Added by jwork.ORG on October 1, 2018 at 3:30pm — No Comments

Popularity of software programs for data science using recent reviews

In this article we discuss the popularity of various software programs used for data analysis that are mentioned in reviews published online between 2017 and 2018. We used 14 reviews listed in the article Popularity of software programs for data…


Added by jwork.ORG on September 6, 2018 at 6:00pm — No Comments

Using Multi-Layer Recurrent Neural Network for language models

Here is another example of how to use a Multi-Layer Recurrent Neural Network (RNN package) designed for character-level language models. This neural network was trained using 165,000+ real titles of acts submitted to the Congress from CONGRESS.GOV. The training was performed using a GPU. Then the trained RNN was used to create "fake" titles. Use this link to find…


Added by jwork.ORG on August 17, 2018 at 4:29pm — No Comments

Everipedia as a desk reference for data mining topics

One interesting metric for checking the usefulness of Everipedia as a desk reference for data mining is the number of relevant articles. Go to Everipedia and search for "data mining". You will get 7 articles. Then go to Wikipedia and search for "data mining". You will see 4 articles (overlapping with similar Everipedia articles).

Another example. Try the word "smoothing" which is a popular topic in data analysis.…


Added by jwork.ORG on August 2, 2018 at 1:34pm — No Comments

DataMelt published Java API documentation

The DataMelt computational platform for data analysis has organized its Java documentation:


Added by jwork.ORG on June 23, 2018 at 5:24pm — No Comments

Image identification using a convolutional neural network

This blog explores a typical image identification task using a convolutional ("Deep Learning") neural network. For this purpose we will use a simple JavaCNN package by D. Persson, and make our example small and concise using the Python scripting language. This example can also be rewritten in Java, Groovy, JRuby or any scripting language supported by the Java virtual machine.

This example will use images in the grayscale format (PGM). The name "PGM" is an acronym derived from…


Added by jwork.ORG on May 31, 2018 at 1:30pm — No Comments

Neural network classification of data using Smile

Data classification is the central data-mining technique used for sorting data, understanding data and performing outcome predictions. In this small blog we will use the Smile library, which includes many methods for supervised and unsupervised data classification…


Added by jwork.ORG on March 13, 2018 at 4:00pm — No Comments

Recasting Java neural networks in Python

Many neural network applications implemented in Java, such as Neuroph, Encog and Joone, may look rather different when switching from the Java language to Python with the help of the DMelt computing environment. First of all, they look simpler. You can use your favorite Python tricks to load and display data. The Python coding is simpler for viewing and fast modifications. It does not require recompiling after each change. At the same time, the platform…


Added by jwork.ORG on July 29, 2017 at 1:00pm — No Comments

Coding graphs for data mining in Python using Java platform

Graphs belong to graph theory, a field of mathematics. For data analysis that requires searches for particular patterns, graph-based data mining becomes an important technique. Indeed, in real life, most of the data we have to deal with can be represented as graphs. A typical graph consists of vertices (nodes, cells), and of edges that…


Added by jwork.ORG on June 19, 2017 at 5:30pm — No Comments

Data analysis with DMelt

Data mining (sometimes called knowledge discovery) is the process of analyzing and summarizing data into useful information that can be used to understand common features and the origin of data, and to extract hidden predictive information. Data mining is used in science, engineering, modeling and analysis of financial markets.

In this article we will discuss a free data-analysis framework called DMelt (The DataMelt project,…


Added by jwork.ORG on June 13, 2016 at 5:00pm — No Comments

DataMelt 1.5 with improved 3D graphics is released

Finally, a new version of DataMelt, a Java-based data-analysis framework based on open-source software, has been released. This release features significantly improved graphics to display data and mathematical objects in 3D. The updated canvas (called HPlotXYZ) uses Jzy3d and JOGL 2 to deploy the native OpenGL library. A few examples of images with data in 3D…


Added by jwork.ORG on May 25, 2016 at 2:47pm — No Comments

New book on data mining and statistics

New book:

Numeric Computation and Statistical Data Analysis on the Java Platform (by S.Chekanov)

710 pages. Springer International Publishing AG. 2016. ISBN 978-3-319-28531-3.

About this book: Numerical computation, knowledge discovery…


Added by jwork.ORG on March 30, 2016 at 1:35pm — No Comments

Data analysis using Python on the Java platform

According to the TIOBE Index for January 2016, the Java popularity index has reached 21%, leaving behind C++ (6%), while the Python index is only 3.8%. These numbers can be different for data analyst positions, of course, where Python is likely to be more popular than Java.

But how about merging Python with Java? This is exactly what DMelt data…


Added by jwork.ORG on January 15, 2016 at 4:30pm — No Comments

Free data mining programs for everyday use

If you want to get started quickly with data analysis, here is my advice on free software programs that I use every day for data analysis, statistics and data mining.

R-package - software for statistical computing written in C. Script-oriented.

  • Pros: widely used, simple, extensive documentation.
  • Cons: fewer options for graphics compared to competitors, no…

Added by jwork.ORG on December 23, 2015 at 3:30pm — No Comments

© 2020 Data Science Central ®