JIANG Buxing's Blog (13)

A New Way to Compare and Reorganize Data

In day-to-day business, data from different sources often shares the same structure. Sometimes the data sets are independent and do not overlap, such as the sales data each branch office exports from its own database. At other times the data overlaps heavily: in a typical end-to-end business process, every system and department is likely to enter data into its own data store. To compare the overlapping data and find and…
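The excerpt stops before describing the author's own approach, so the Python below is only a minimal sketch of the general task, not the method the article proposes. It assumes each source is a list of records with the same fields and a hypothetical `id` key, and it splits the records into those found in only one source, those that are identical, and those that conflict.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def compare_sources(a: List[Record], b: List[Record], key: str = "id"):
    """Compare two same-structured record lists by a key field.

    Returns (only_in_a, only_in_b, identical, conflicts), where conflicts
    holds (record_from_a, record_from_b) pairs that share a key but differ.
    If a source repeats a key, the later record wins in this sketch.
    """
    index_a = {r[key]: r for r in a}
    index_b = {r[key]: r for r in b}

    only_in_a = [r for k, r in index_a.items() if k not in index_b]
    only_in_b = [r for k, r in index_b.items() if k not in index_a]

    identical: List[Record] = []
    conflicts: List[Tuple[Record, Record]] = []
    for k, rec_a in index_a.items():
        rec_b = index_b.get(k)
        if rec_b is None:
            continue  # key exists only in source A, already collected above
        if rec_a == rec_b:
            identical.append(rec_a)
        else:
            conflicts.append((rec_a, rec_b))
    return only_in_a, only_in_b, identical, conflicts


# Hypothetical exports from two systems that record the same orders.
sales = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]
finance = [{"id": 2, "amount": 260}, {"id": 3, "amount": 80}]
print(compare_sources(sales, finance))
```

Matching on a single key keeps the sketch simple; real data may need deduplication or a composite key before a comparison like this is meaningful.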

Added by JIANG Buxing on April 10, 2018 at 12:30am — No Comments

Relational Algebra Is the Root of SQL Problems

There is no doubt that SQL is the most widely used language for processing structured data. Not only is it adopted by all relational database products, but implementing it is also a goal of many newly invented big data platforms. Yet in many respects SQL is inconvenient for handling the various computation and query demands that arise. The procedurality issue discussed in the previous article is only a superficial one. SQL’s problems are rooted in its theoretical foundation, the relational…

Added by JIANG Buxing on January 14, 2018 at 10:30pm — 5 Comments

SQL: A Supposed English-like Language

SQL was envisioned as an English-like language, and simple SQL statements do read like English sentences. SQL composes a statement the way English would, attaching English prepositions that are not needed, whereas other mainstream programming languages use English words only as mnemonics for concepts or operations and produce formal program statements rather than English sentences. For example, the FROM clause is the main part of a query, yet it is placed after the SELECT list rather than at the start of a SQL statement, and the BY in the…

Added by JIANG Buxing on January 1, 2018 at 11:30pm — 3 Comments

An Open Computational System Brings Slim Databases

As discussed in the previous article, database obesity caused by numerous intermediate tables and stored procedures is rooted in a closed computational system. If an independent computing engine provided computing capability outside the database, the database could slim down.

With a separate computing engine, the database-generated intermediate data doesn’t have to be stored as data tables; instead, it can be stored in the file system to be further computed…
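To make the idea of file-based intermediate data concrete, here is a minimal Python sketch under assumed names: `save_intermediate` writes a result set to a CSV file instead of a database table, and `total_by_region` performs a further computation on that file outside the database. The `region`/`amount` schema and the file name are hypothetical, and a real computing engine would offer far more than this.

```python
import csv
import os
import tempfile
from collections import defaultdict


def save_intermediate(rows, path):
    """Persist an intermediate result set as a CSV file instead of a database table.

    Assumes every row has exactly the hypothetical columns 'region' and 'amount'.
    """
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["region", "amount"])
        writer.writeheader()
        writer.writerows(rows)


def total_by_region(path):
    """Second-stage computation: read the file back and sum amounts per region."""
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["region"]] += float(row["amount"])
    return dict(totals)


# Hypothetical rows standing in for a query result exported from the database.
rows = [
    {"region": "East", "amount": 120.0},
    {"region": "West", "amount": 80.0},
    {"region": "East", "amount": 30.0},
]
path = os.path.join(tempfile.mkdtemp(), "intermediate_sales.csv")
save_intermediate(rows, path)
print(total_by_region(path))  # {'East': 150.0, 'West': 80.0}
```

Because the intermediate result lives in an ordinary file, it can be deleted once the downstream computation no longer needs it, without touching the database schema.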

Added by JIANG Buxing on December 12, 2017 at 12:30am — 3 Comments

Closed Computational System Leads to Bloated Databases

Many large organizations find that, after years of operation, their databases (or data warehouses) are crammed with huge numbers of old data tables, sometimes tens of thousands of them. People have long forgotten why the tables were created, and many have been useless for a long time. Yet all of them are kept for fear of deleting something by mistake, which creates a heavy operation and maintenance workload. Moreover, a large number of stored procedures continuously feed data into these tables, seriously consuming the…

Added by JIANG Buxing on November 15, 2017 at 1:00am — No Comments

A View on the Difficulty of Stored Procedure Migration

The difficulty of migrating stored procedures to other databases is a perennial target of criticism, and it is almost always mentioned whenever the shortcomings of stored procedures are listed.

Migrating stored procedures that handle complex business logic is particularly problematic, because their code depends on the unique features and syntax of each database and therefore has to be rewritten whenever the database changes. The cost won’t be very high if…

Added by JIANG Buxing on October 30, 2017 at 12:00am — No Comments

Stored Procedures: A Seemingly Nice Tool with Hidden Problems

Stored procedures are widely used in database computing, and the controversy around them is just as long-standing. By analyzing their two recognized merits, we try to identify the potential risks they pose and the scenarios where they apply.

The stored procedure keeps the user interface and business logic separate!

Today, separating the user interface from business logic is a basic principle of application development. Different from the back-end data processing…

Added by JIANG Buxing on October 16, 2017 at 11:30pm — 1 Comment

What You Possibly Don’t Know About Columnar Storage

Columnar storage is a familiar data storage technique used by many data warehousing products because it performs well in many computing scenarios. Within the industry, the technique is often treated as a synonym for high performance.

But is columnar storage a perfect strategy? A Google search shows that criticism of it is mainly about data modification. There is little discussion of its use in read-only data analysis and computing, which will be taken care…
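The usual argument for columnar storage in read-only analysis is that a query touching one column need not read the others. The Python sketch below models that with a hypothetical three-column order table and simply counts the values each layout reads; it ignores compression, disk blocks, and the trade-offs the full article goes on to discuss.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]  # hypothetical schema: (order_id, customer, amount)
COLUMN_NAMES = ("order_id", "customer", "amount")


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Re-lay out row-oriented records as one list per column."""
    return {name: [r[i] for r in rows] for i, name in enumerate(COLUMN_NAMES)}


def sum_amount_row_store(rows: List[Row]) -> Tuple[float, int]:
    """Row layout: every field of every row is read, though only 'amount' is needed."""
    total, values_read = 0.0, 0
    for row in rows:
        values_read += len(row)
        total += row[2]
    return total, values_read


def sum_amount_column_store(cols: Dict[str, list]) -> Tuple[float, int]:
    """Column layout: only the 'amount' column is scanned."""
    amounts = cols["amount"]
    return sum(amounts), len(amounts)


rows = [(1, "alice", 10.0), (2, "bob", 5.5), (3, "carol", 4.5)]
print(sum_amount_row_store(rows))                 # (20.0, 9)
print(sum_amount_column_store(to_columns(rows)))  # (20.0, 3)
```

Both layouts give the same total; the difference is only in how much data is touched, which is why the benefit grows with the number of columns a table has and shrinks when a query needs most of them.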

Added by JIANG Buxing on September 12, 2017 at 11:00pm — No Comments

The Triple-layered Reporting Architecture

By JIANG Buxing 

In the conventional reporting architecture, the reporting tool connects directly to the data sources, with no data computing layer in between. Most of the time the middle layer isn’t needed, because the computations can be handled by the data source and the reporting tool respectively. But development experience has taught us that for certain types of reports, the computations are not suitable to be handled either by the data source or the…
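The excerpt does not say which reports need a middle layer, so the following Python sketch assumes one plausible case: a report that combines data from two sources (a hypothetical sales table and a hypothetical targets file). The computing layer joins them and derives the figures, leaving the reporting layer only to present them. All function names and data are illustrative.

```python
from typing import Dict, List


# Layer 1: data sources (hypothetical stand-ins for a database table and an exported file).
def load_sales_db() -> List[Dict]:
    return [{"branch": "North", "sales": 300.0}, {"branch": "South", "sales": 200.0}]


def load_targets_file() -> List[Dict]:
    return [{"branch": "North", "target": 250.0}, {"branch": "South", "target": 400.0}]


# Layer 2: computing layer that joins the sources and derives report-ready rows,
# so neither the data source nor the reporting tool has to do this work.
def compute_attainment(sales: List[Dict], targets: List[Dict]) -> List[Dict]:
    target_by_branch = {t["branch"]: t["target"] for t in targets}
    result = []
    for s in sales:
        target = target_by_branch.get(s["branch"])
        if target:  # skip branches with a missing or zero target
            result.append({"branch": s["branch"],
                           "attainment": round(s["sales"] / target, 2)})
    return result


# Layer 3: reporting layer that only formats the prepared rows.
def render_report(rows: List[Dict]) -> str:
    lines = ["Branch  Attainment"]
    lines += [f"{r['branch']:<7} {r['attainment']:.0%}" for r in rows]
    return "\n".join(lines)


print(render_report(compute_attainment(load_sales_db(), load_targets_file())))
```

Keeping the cross-source join in its own layer means the same computed rows could feed several reports without duplicating the logic in each report template.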

Added by JIANG Buxing on August 27, 2017 at 10:30pm — 3 Comments

How Big Is a Terabyte of Data?

By JIANG Buxing

A mile doesn’t seem long, and a cubic mile seems small compared with the size of the earth. So you may be surprised to hear that the entire world’s population could fit in a cubic mile of space. Hendrik Willem van Loon, a Dutch-American writer, made a similar point in one of his books.
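The cubic-mile claim is easy to check with a little arithmetic. The Python sketch below assumes each person is modeled as a box 6 ft tall, 1.5 ft wide, and 1 ft thick (a deliberately generous size, and an assumption rather than a figure from the article), and counts how many such boxes fit in one cubic mile.

```python
FEET_PER_MILE = 5280

# Assumed body "box" in cubic feet: 6 ft tall x 1.5 ft wide x 1 ft thick.
PERSON_VOLUME_FT3 = 6 * 1.5 * 1


def people_per_cubic_mile(person_volume_ft3: float = PERSON_VOLUME_FT3) -> int:
    """Number of whole person-sized boxes that fit in one cubic mile."""
    cubic_mile_ft3 = FEET_PER_MILE ** 3  # 147,197,952,000 cubic feet
    return int(cubic_mile_ft3 // person_volume_ft3)


print(f"{people_per_cubic_mile():,}")  # 16,355,328,000
```

Under that assumption a cubic mile holds about 16.4 billion people, comfortably more than the world's population.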

Teradata is a famous provider of database…

Added by JIANG Buxing on August 20, 2017 at 9:30pm — No Comments

Invest More beyond Tools to Improve Reporting Performance

As a front-end data service intended for end users, reports in applications draw a lot of concern about their performance. Every user wishes that, once the parameters are entered, the query and aggregation results would come back at once. A wait of under twenty seconds is tolerable, but three to five minutes will exhaust a user’s patience and make for the worst product experience.

But why is a report so slow? And how should we optimize its…

Added by JIANG Buxing on August 10, 2017 at 12:30am — No Comments

Don’t Allow Yourself to Be Hoodwinked by Unstructured Data Analytics Technology

By JIANG Buxing, Data Scientist 

As the big data concept gains momentum, unstructured data analytics technology is becoming a hot topic. It is said that 80% of all data in an enterprise is unstructured, which is roughly true when measured by storage space, given the large size of audio and video data. With such huge amounts of data at hand, some technique is required to analyze them.

But make sure you are not misled by the universal unstructured data analytics…

Added by JIANG Buxing on July 24, 2017 at 12:00am — No Comments

Four Ways to Improve Back-end Performance for Multidimensional Analysis

By JIANG Buxing, Data Scientist 

Multidimensional analysis, commonly called OLAP, is an interactive data analysis process that performs operations such as rotation, slicing and dicing, and drill-down on a data cube. The structure of its back-end computation is simple, as shown by the following SQL:

SELECT D,..., SUM(M), ... FROM C WHERE D'=d' AND ... GROUP BY D,...

The statement…
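To show what the SQL template computes, here is an in-memory Python sketch of the same pattern. It assumes that C is a fact table (the cube) held as a list of dicts, that each D is a dimension column, and that M is a measure column; the column names in the example are hypothetical. The WHERE conditions become `slicers`, the GROUP BY dimensions become `group_dims`, and the result maps each group to SUM(M).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def olap_query(cube: Iterable[Dict], group_dims: List[str],
               slicers: Dict[str, object], measure: str) -> Dict[Tuple, float]:
    """Equivalent of: SELECT <group_dims>, SUM(<measure>) FROM cube
    WHERE <dim> = <value> AND ... GROUP BY <group_dims>."""
    totals: Dict[Tuple, float] = defaultdict(float)
    for row in cube:
        # WHERE D'=d' AND ...: keep only rows matching every slice condition
        if all(row[d] == v for d, v in slicers.items()):
            key = tuple(row[d] for d in group_dims)  # GROUP BY D,...
            totals[key] += row[measure]              # SUM(M)
    return dict(totals)


# Hypothetical fact rows: three dimensions and one measure.
cube = [
    {"year": 2017, "region": "East", "product": "A", "amount": 10.0},
    {"year": 2017, "region": "East", "product": "B", "amount": 5.0},
    {"year": 2017, "region": "West", "product": "A", "amount": 7.0},
    {"year": 2016, "region": "East", "product": "A", "amount": 3.0},
]
print(olap_query(cube, ["region"], {"year": 2017}, "amount"))
```

Drilling down corresponds to adding a dimension to `group_dims`, for example `["region", "product"]`, and slicing corresponds to adding an entry to `slicers`. This full scan is the naive baseline; it does nothing to address back-end performance.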

Added by JIANG Buxing on July 20, 2017 at 5:30pm — No Comments

© 2019 Data Science Central ®