.

SQL is envisioned as an English-like language. Simple SQL statements read like English sentences. SQL writes a statement in an English way with English prepositions unnecessarily attached, while other main programming languages only use English words as the mnemonic of a certain concept or operation, producing formal program statements instead of English sentences. For example, the FROM clause is the main part of a query but it is put in the end of a SQL statement, and the BY in the SQL GROUP BY clause is in fact unnecessary.  

Originally, the English-like design aimed to enable non-IT end users who can read and write English to make queries in SQL. In reality, the idea, however well-intended, fails, if not completely. Most of the non-IT users can only write very simple queries in SQL. But for such queries, applications are usually equipped with convenient and interactive visual interface to assist the process. There’s no need to hand-code them, making the design lose its significance. In fact it is the programmers who use SQL most frequently to perform queries. They also use the language to program, and it doesn’t make much difference for them in understanding an English-like language and a non-English-like language. Actually the English-like design is more a hindrance than an advantage.

SQL is extremely strict about the syntax implementation. Any deviation will result in a refusal from the interpreter, so SQL users must learn to observe the rule of syntax. This is the common feature of all programming languages. By contrast, natural languages are flexible about the grammar and thus can accommodate some degree of ambiguities. SQL lacks this flexibility natural languages have. It wasn’t easy to achieve such a feature in the emerging era of SQL when the technology wasn’t so advanced.

 

The English-like nature of SQL has had big consequences, besides the lack of the merit of the English language. The simulation of English grammar doesn’t make SQL easier to grasp; rather, it serves to complicate it.

The biggest consequence of English-like design is that SQL becomes non-procedural. The program logic is performed step by step, during which variables are used to store intermediate values for use in later steps. The English language, however, uses a small number of conjunctions or relative pronouns to connect and define a relationship between clauses or between a main clause and a subordinate clause. It writes as much information as possible in one sentence. To express complex meanings exactly, it uses one or more subordinate clauses introduced by relative pronouns. SQL imitates this feature of English language by putting many actions like SLECT, WHERE and GROUP in one statement, producing multi-level nested subqueries for complicated queries. For example, SQL uses the WHERE clause and HAVING clause for the same action. The case is rare in other programming languages.

Stepwise coding effectively facilitates understanding and execution. A query is easy to understand if done in steps but becomes difficult to understand if handled in one statement. An example is to find clients whose sales are two times higher than the average. Naturally we will compute the average and then find the eligible clients. This needs only two statements. To do it in SQL, a long statement with a subquery is needed. The example isn’t difficult as it only involves the two-layer logic. For the English-like SQL, its description of a two-level relationship isn’t so difficult to understand. In reality, there are many queries that have three to five layers of subqueries that are real obstacles to understanding.

SQL’s lack of interest in stepwise coding leads to long statements. The complicated SQLs programmers have to write are often measured by K, instead of by lines. There’s a world difference in understanding a 100-line piece of code written in a single one statement and in 100 statements. It’s not only laborious to produce an extremely long statement, but it’s possible that the programmer themselves can’t understand it after a long time. A non-stepwise long statement is also difficult to debug and takes long to write.

 

About the procedurality, fans of SQL defend the language by saying that SQL users only need to take care of what they want and don’t need to care about how to do it. Machines will find the method of doing it, making it unnecessary for syntax to support the procedurality.

That’s nonsense!

Every programming language supports the “how” at some degree. Coding in assembly language needs to take care of the actions of the register and memory, but doesn’t need to concern about the actions of the lower-level NAND gate. With SQL there’s no need to handle data operations at the physical storage level (file system, internal memory and hard disk), but we still need to handle operations at the logical level (tables and fields). In a sense a SQL statement is a description of the computing logic. A complicated multi-level nested query also offers the execution steps while expressing the computing target. In other words, SQL describes a computing target by stating the execution steps, though the way is rather sophisticated.  

 

Not an enthusiast of stepwise computation though, SQL is to some extent procedure-oriented. The stored procedure is a way of executing a SQL statement in a step-by-step fashion. Invoking a SQL statement with an external program can also achieve procedural computing. Without taking into consideration the low performance resulted from generating temporary tables (for storing intermediate values) and performing database I/O (used to get computing results during the invocation of SQL statements by an external program), both stored procedures and invocation by external programs provide equal functionalities. But to deal with the performance issue in handling huge amounts of data, we still need long SQL statements. The techniques are a valuable supplement to the SQL’s insufficient procedurality ability when the amount of data being handled isn’t large and the performance issue can be ignored.

Views: 2230

Comment

You need to be a member of Data Science Central to add comments!

Join Data Science Central

Comment by Peter Vanroose on January 4, 2018 at 11:46pm

I agree with the conclusion that SQL is syntactically very strict, so it's not at all a natural language.

But I'm sorry to say that I disagree with the conclusion of this post that SQL is non-procedural!

It's up to you, the writer of the SQL SELECT statement, whether you want your query to be stepwise or not!

This is thanks to the "WITH" clause, containing so-called common table expressions (CTEs). Forget temporary tables: that's passé; your intermediate variables are the CTEs, and your stepwise procedural logic specifies how a next CTE depends on one or more base tables or previous CTEs.

To take your example: finding clients whose sales are at least two times higher than the average.

Step 1: derive the sales per client (and place that intermediate result in a first, two-column CTE);
Step 2: determine the average of those values (and place that intermediate result in a second CTE, a 1x1 table);
Step 3: filter the first CTE based on twice the value in the seond CTE.

So that would give something like: (just inventing some table & column names)

   WITH
clientSales(clientName,salesPerClient) AS
(SELECT clientName, SUM(sales) FROM baseTable GROUP BY clientName),
avgSales(salesAverage) AS
(SELECT AVG(salesPerClient) FROM clientSales)
SELECT clientName FROM clientSales,avgSales
WHERE salesPerClient >= 2 * salesAverage
Comment by jaap Karman on January 4, 2018 at 11:27am

@Kevin. It is even more confusing as in the article of Jiang.
As you are using SAS it will be a native SAS dialect SQL-99. It is missing the tree value logic.

It also lacking common native data types as varchar packed bignint blob clob 

When going for a DBMS conecction Oracle DB 2 Teradata or whatever there will be a native dialect  SQL-2013.
There will be grouping functions waiht rank and more. The SQL lag fucntion is not the SAS lag function.
Those grouped oredered fucntions is the some of the SAS function SQL did not hae in the early days.
Date datetime processing (temporal data) and geospatial as is XML has become SQL standard features.

SQL is very different to procedural languages but can perform very well knowing the behavior how the specific DBMS behind that is tuned and should behave. Something that is hard work and not commonly being done in the data science area.   

Comment by Kevin Friesen on January 4, 2018 at 10:40am

Just one comment: There is a difference in what HAVING and WHERE do. In many uses it does produce the same result, but there are usages where the choice between the two results in different output. I use SQL within SAS, so it is possible that this difference is unique to this work-space, but I don't think so. 

© 2021   TechTarget, Inc.   Powered by

Badges  |  Report an Issue  |  Privacy Policy  |  Terms of Service