Home » Uncategorized

Stored Procedures: A Seemingly Nice Tool with Hidden Problems

The stored procedure is as widely used in database computing as the controversy around the technique is long-standing. By analyzing its two recognized merits, we try to locate the potential risks it poses and its application scenarios.

 

The stored procedure keeps user interface and business logic separate!

Today, it is a basic principle to separate user interface from business logic for application development. Different from the backstage data processing logic, the user interface runs in a variety of application environments, including PC, mobile phone and so on, and is volatile thanks to frequent changes. It’s cost-effective if the data processing code and the user interface code are independently developed while the maintenance and redevelopment of the user interface and the change of data processing logic can be done simultaneously.

Those in favor of the stored procedure believe that the technique is a way to keep user interface and business logic separate. The stored procedure is executed in the database in backstage, feeding data to the frontend system without having to take care of the format and any change of the interface. Encapsulating all data processing logics into stored procedures helps standardize data input/output interface, making it convenient to manage data access privileges.

But by further thinking we may know that stored procedures are not the only way to keep the user interface and the business logic separated. We can create a data access layer for data input and output to implement the goal. In fact some applications do use such a way, though it’s not common. So what’s the reason behind the rare use of the data access layer? What’s the difference between the two ways?

The reason lies in the degree of development difficulty.

The data processing logic often involves the batch processing of structured data, which a non-database-specific programming language can barely handle. Even a sum operation in Java needs many lines of code, let alone the filtering and grouping operations. The SQL-based stored procedure, however, has good support for structured data processing, making it easier to write complicated logic compared with high-level languages, though it is difficult to write and debug.

In other words, the stored procedure is effective in an effort to separate the user interface and the business logic. But its strength in implementing the backstage data processing logic is based on SQL’s set-oriented processing approach, which means the effect comes from fast and convenient development process, instead of an alternative application structure.

For computing scenarios wherein the computing ability of the stored procedure is unavailable, such as those involving the multi-database source and non-database source, a data access layer independent of the database is needed.

Can the stored procedure really keep user interface and business logic separate?

The concept of separating the user interface and the business logic is ambiguous, because there isn’t a clear definition of interface and there isn’t such a rule that the interface shouldn’t accommodate data computing tasks.

A typical scenario is reporting. A report needs to be displayed on the interface, and it is most likely that it will undergo a series of changes, making the reporting volatile. Obviously any change to a report needs to be done on the interface. The report development, however, always involves complex data preparation process. Contrary to many online recommendations that the stored procedure be used in handling the data preparation for complex reports, insisting on putting the data preparation process into a stored procedure and executing it as the backend logic won’t get benefit from the separation of user interface and business logic. Rather, this will bring great trouble.

Generally the report templates are drawn by the reporting tool and stored as regular files in the application. If the stored procedure is used to prepare the report data source, the two closely related parts – the template and the stored procedure – for report development will be physically stored in different places. Any change to the report will need to change both parts at the same time, which not only is error-prone but may increase the communication cost (because different people may be in charge of the two parts). A stored procedure in a shared database is probably used by more than one report, even more than one application, and a change to it could disable the other modules. Using the stored procedure to prepare report data source risks damaging the application’s module structure and increasing the application coupling, and this will result in a rise in maintenance cost.

The use of stored procedure also leads to the conflict between security and efficiency. In principle report development requires only the read-only access privilege for the database. But if the data source is prepared with the stored procedure, the privileges of compiling and executing the stored procedure should be open to the report developers, who would be able to perform any operations over the database, posing potential risks to security. Safety regulations will thus be stringently enforced and all uploaded stored procedures will subject to a lot scrutiny, which reduces efficiency, because a job that can be done by only the report developers has to involve more staff.

If there’s a convenient computing engine independent of the database, we can get away from the stored procedure’s problems by moving computations that are volatile and closely connected with the user interface outside of the database to integrate with the application. This way the maintenance cost can be reduced. We can even perform computations that are more stable outside of the database to address the issues of multi-database and non-database sources.  An application structure without stored procedures is a more reasonable one.

Does the stored procedure have high computational performance?

According to tests, using stored procedures to prepare data gives a higher performance compared with retrieving data out via SQL and then manipulating it outside of the database. But why the stored procedure performs better?

According to some views on the internet, stored procedures are fast because they are precompiled, while the SQL statements need to be compiled as they are executed. In fact, compared with the time spent in computing the data, the time taken to compile SQL statements can be ignored. A SQL statement that needs to be repeatedly executed with different parameters can also be precompiled. Some programmers embed different parameters in one SQL and send a different SQL to the database each time for execution. Then the compilation time can be quite long.

The truth is that stored procedures are fast because data is computed in the database. An application outside of the database must use the database interface to access the data. Most of the database interfaces, however, perform poor, particularly the JDBC interface intended for the JAVA applications. The JDBC interface will be called each time when a SQL statement is sent to the database for execution, which brings down the execution speed. There is network latency if the application and the database are not in the same computer, but it isn’t more serious than the poor performance of an interface. For external storage computing, the time taken to retrieve data from the database is usually longer than the data computing itself.

Moreover, stored procedures have a bad execution performance. According to tests on a top commercial database, the stored procedure is nearly one order of magnitude slower by retrieving data out row by row when performing a certain computation (such as a sum over a big table’s  field), while one SQL statement can get it done. Even a high-level language like Java is much faster than the stored procedure when performing the same computation by retrieving data from the file system. It’s easier to write parallel code for the external storage computing to make full use of the multiple CPUs of modern servers. But generally stored procedures don’t support the parallel processing. If there are too many stored procedures in a database for computation, the database will be further burdened and the already slow stored procedure will become slower when performing concurrent operations.

The so-called better performance of stored procedures is more a counterpoint of the inefficient database I/O than a desirable feature.

At present only the relational databases have relatively good ability of keeping transaction consistency. They are more suitable for working as the OLTP backend system, and the data collected in the frontend system will directly go into them. As a result, a large amount of original data will be stored in the database. If the database data is computed with external strategy, the overall performance will be very poor due to the extremely slow retrieval process. The use of the stored procedure, however, will achieve a good performance thanks to the in-database handling of data, though the computing itself is slow. So we can’t totally refuse the stored procedures with relational databases dominating the industry.