Throughout this article, you will learn what you need to know about query optimization in a DBMS, so that you can considerably improve the response times of these systems.
Main DBMS today.
Query optimization: what is a DBMS?
Before we get to tips and recommendations for optimizing the queries of your website or applications, it is important that you know a little about the DBMSs that exist today.
A DBMS, short for Database Management System, is a set of programs that lets users manipulate the information housed in a database: extracting, storing, and modifying it.
In addition, a DBMS helps provide the necessary security for the database, manages and controls the flow of user inputs and outputs, and supports recovering information that has been corrupted or lost. Importantly, it is also where query optimization takes place.
In other words, for an application or website that stores data to work correctly, a database system is usually necessary; without one, the application would be of little use.
Among the most notable DBMSs today are Microsoft SQL Server; CouchDB and MongoDB, both document-oriented; and MySQL, one of the best known and most widely used: a relational, open source system used by platforms like WordPress.
If you want to know more about databases, and specifically about MySQL, we recommend the following article, where you will learn more about this widely used software: Data types in MYSQL.
What is query optimization?
Basically, it means improving the response times of the DBMS so that it can return the requested information to its users as quickly as possible. Some applications are so complex that a query takes a long time to return an answer, and in many cases the way that answer is computed is not the most "optimal", that is, the best possible.
Many optimizers are cost-based: they generate several candidate plans for the same query, estimate the cost of each one, and choose the plan with the lowest estimated cost as the path to follow.
An important point is that users do not access the optimizer directly. A query first goes through an analysis (parsing) step, and only after that step is it handed to the optimizer.
How do they work?
Most query optimization is implemented by means of a tree of nodes, which can also be drawn graphically. Each node in the tree represents part of a plan, and each of those parts is nothing more than a simple operation.
A node can have child nodes, each with its own operation, and the output of the children feeds the operation of their parent. The leaves of this "node tree" are typically the operations that read the tables, and the root produces the final result of all the operations in the tree.
An important detail: in database management systems, many of these nodes are JOINs, which combine records from tables in a database (several tables, or a single table joined with itself). In fact, the word "join" literally means "to unite."
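The tree described above can be sketched as a small data structure. This is only an illustration, not how any particular DBMS stores its plans: the `PlanNode` class, the table names, and the join conditions are all hypothetical, and the sketch assumes the common convention that leaf nodes read tables while the root produces the final result.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    """One node of a query plan tree: an operation plus its input nodes."""
    operation: str                      # e.g. "JOIN" or "SCAN"
    detail: str = ""                    # table name or join condition
    children: List["PlanNode"] = field(default_factory=list)

    def render(self, depth: int = 0) -> str:
        """Return the subtree as indented text, one node per line."""
        line = "  " * depth + f"{self.operation} {self.detail}".rstrip()
        return "\n".join([line] + [c.render(depth + 1) for c in self.children])


# Hypothetical plan for joining three tables: customers, orders, items
plan = PlanNode("JOIN", "orders.id = items.order_id", [
    PlanNode("JOIN", "customers.id = orders.customer_id", [
        PlanNode("SCAN", "customers"),
        PlanNode("SCAN", "orders"),
    ]),
    PlanNode("SCAN", "items"),
])

print(plan.render())
```

Printing the tree shows the two JOIN nodes at the top and the three table scans as leaves, indented according to their depth.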
With that said, one of the factors with the greatest influence on query optimization is the order in which the tables are processed, that is, the order in which the JOINs are performed. Joining the small tables first, rather than the large ones, often helps because it keeps the intermediate results small; done the other way around, the query could take much longer than anticipated.
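To see why join order matters, here is a rough sketch that estimates how many rows each order produces along the way. Everything in it is an assumption made for illustration: the table names, the row counts, the selectivities, and the cost model itself (cost = total rows produced by each join step). Real optimizers use far richer statistics.

```python
from fractions import Fraction
from typing import Dict, FrozenSet, Sequence

# Hypothetical table sizes (number of rows)
ROWS: Dict[str, int] = {"countries": 10, "customers": 1_000, "orders": 100_000}

# Hypothetical join selectivities: the fraction of row pairs that match.
# Pairs without an entry have no join condition (a full cross product).
SELECTIVITY: Dict[FrozenSet[str], Fraction] = {
    frozenset({"countries", "customers"}): Fraction(1, 10),
    frozenset({"customers", "orders"}): Fraction(1, 1_000),
}


def rows_produced(order: Sequence[str]) -> int:
    """Sum the estimated output sizes of every join step, joining left to right."""
    joined = [order[0]]
    size = Fraction(ROWS[order[0]])
    total = Fraction(0)
    for table in order[1:]:
        size *= ROWS[table]
        # Apply every join condition linking the new table to those already joined
        for other in joined:
            size *= SELECTIVITY.get(frozenset({other, table}), Fraction(1))
        joined.append(table)
        total += size
    return int(total)


small_first = rows_produced(["countries", "customers", "orders"])
large_first = rows_produced(["orders", "customers", "countries"])
cross_product_first = rows_produced(["countries", "orders", "customers"])
print(small_first, large_first, cross_product_first)
```

Under these made-up numbers, starting with the two small tables produces the fewest rows, and joining two tables that share no condition first is by far the worst choice, even though all three orders return the same final result.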
Many optimizers use a variant of the algorithm introduced by the System R database project, which uses dynamic programming to work through a series of analysis and search stages and arrive at the cheapest plan it can find. A plan whose results come out in a useful order may be preferred over other plans, because that order can save work later and reduce response times further.
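The following is a heavily simplified sketch in the spirit of that approach, not the actual System R algorithm: it only searches over join orders, using dynamic programming to keep the cheapest plan found for each subset of tables. It ignores access paths, join methods, and result ordering, and the tables, row counts, selectivities, and cost model are all assumptions.

```python
from fractions import Fraction
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical statistics
ROWS = {"countries": 10, "customers": 1_000, "orders": 100_000}
SELECTIVITY = {
    frozenset({"countries", "customers"}): Fraction(1, 10),
    frozenset({"customers", "orders"}): Fraction(1, 1_000),
}


def result_size(tables: FrozenSet[str]) -> Fraction:
    """Estimated row count of joining a set of tables (the same for any order)."""
    size = Fraction(1)
    for t in tables:
        size *= ROWS[t]
    for a, b in combinations(sorted(tables), 2):
        size *= SELECTIVITY.get(frozenset({a, b}), Fraction(1))
    return size


def best_join_order(tables: Iterable[str]) -> Tuple[Tuple[str, ...], int]:
    """Return the cheapest left-to-right join order and its estimated cost."""
    names = sorted(tables)
    # best[subset] = (cost, order) of the cheapest plan joining exactly that subset
    best: Dict[FrozenSet[str], Tuple[Fraction, Tuple[str, ...]]] = {
        frozenset({t}): (Fraction(0), (t,)) for t in names
    }
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            subset = frozenset(combo)
            out = result_size(subset)
            # Try each table as the last one joined, reusing the best smaller plan
            candidates = []
            for last in combo:
                prev_cost, prev_order = best[subset - {last}]
                candidates.append((prev_cost + out, prev_order + (last,)))
            best[subset] = min(candidates)
    cost, order = best[frozenset(names)]
    return order, int(cost)


order, cost = best_join_order(ROWS)
print(order, cost)
```

The key design choice is that each subset of tables is solved once and reused, so the search does not have to re-evaluate every possible permutation from scratch.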
What are Tuples?
A tuple is one of the most important concepts in a database. In the mathematical definition, a tuple is an object that holds a group of values. In computing the definition does not differ much, except that a tuple corresponds to a row of a specific table; the rows are therefore what contain the stored data.
As in mathematics, the tuples stored in a table have no inherent order, since a table is a set of tuples rather than a list. For the same reason, a tuple cannot be duplicated or replicated: mathematically, a set cannot contain the same element twice.
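These two properties can be illustrated with a Python set of named tuples. The `Customer` type and its values are made up for the example.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    """A hypothetical row type with two columns."""
    id: int
    name: str


# A table modelled as a set of tuples: no order, no duplicates
relation = {
    Customer(1, "Ana"),
    Customer(2, "Luis"),
    Customer(1, "Ana"),   # exact duplicate, dropped by the set
}

# The same rows written in a different order form the same set
same_rows_other_order = {Customer(2, "Luis"), Customer(1, "Ana")}

print(len(relation))
print(relation == same_rows_other_order)
```

Keep in mind that this reflects the mathematical model. In practice, an SQL table can hold duplicate rows unless a key or constraint forbids it.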
Optimization processes
Query optimization follows a series of steps. Here we will simply name them, and in the next section we will give some tips to keep in mind to improve performance.
The first step is the internal representation of the query, which must have a series of characteristics (mentioned in the next section) and be able to represent logical expressions. The second step is conversion to canonical form: an equivalent expression, derived from the original, is found and becomes the canonical form of the query, with the aim of improving its performance.
The third step is the choice of low-level procedures, where the options for evaluating the query are considered, such as indexes and alternative access paths. The fourth and final step is the generation and choice of query plans.
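As a toy illustration of the second step, the sketch below rewrites simple filter conditions into one agreed form, so that two queries written differently but meaning the same thing end up identical. The representation is an assumption made for the example: each condition is a `(left, operator, right)` tuple, column names are strings, and constants are numbers. A real DBMS works on a much richer internal representation.

```python
from typing import Iterable, Tuple

Comparison = Tuple[object, str, object]

# Operator to use when the two sides of a comparison are swapped
FLIP = {"<": ">", ">": "<", "<=": ">=", ">=": "<=", "=": "="}


def canonical_comparison(left, op: str, right) -> Comparison:
    """Put the column name on the left and the constant on the right."""
    if isinstance(left, str):
        return (left, op, right)
    return (right, FLIP[op], left)


def canonical_conjunction(comparisons: Iterable[Comparison]) -> Tuple[Comparison, ...]:
    """Canonical form of an AND of comparisons: normalized, deduplicated, sorted."""
    return tuple(sorted({canonical_comparison(*c) for c in comparisons}))


# Two hypothetical WHERE clauses that mean the same thing:
#   5 < age AND city_id = 3
#   city_id = 3 AND age > 5 AND age > 5
query_a = [(5, "<", "age"), ("city_id", "=", 3)]
query_b = [("city_id", "=", 3), ("age", ">", 5), ("age", ">", 5)]

print(canonical_conjunction(query_a))
print(canonical_conjunction(query_a) == canonical_conjunction(query_b))
```

Once equivalent queries share a single form, the later steps only need to handle that one form instead of every way a user might have written it.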
Some tips for query optimization
In this section we go beyond the four optimization steps already named and look at some aspects to take into account to improve response time, which is the main point of this post. Something very important is having a plan (as mentioned before), because with one the process speeds up considerably.
Another relevant point is the choice of a strategy for carrying out the whole query. This is divided into two parts: selecting an algorithm, which is in charge of executing each operation, and selecting specific, well-defined indexes, which helps avoid problems and delays.
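The effect of index selection can be observed with SQLite, which ships with Python and can report the plan it chose through its `EXPLAIN QUERY PLAN` statement. The table, the data, and the index name below are hypothetical, and the exact wording of the plan text differs between SQLite versions, so treat this as a sketch rather than a reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1_000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = 42"


def plan_details(connection: sqlite3.Connection, sql: str) -> list:
    """Return the human-readable description of each step in the chosen plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]   # the description is the last column


# Without an index, the only option is to read the whole table
plan_without_index = plan_details(conn, QUERY)

# After adding an index on the filtered column, the optimizer can use it
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_with_index = plan_details(conn, QUERY)

print(plan_without_index)
print(plan_with_index)
```

The first plan should describe a scan of the `orders` table, while the second should mention `idx_orders_customer`. MySQL and other systems offer a similar `EXPLAIN` statement, although its output format is different.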
As for the plan itself, building it consists of two phases that are closely related to optimization. The first phase is generating logical expressions that are equivalent to the main expression. By "expressions" we mean the specific operations used to search for or obtain the data, so every alternative expression must return the same result as the main one.
The second phase builds on the results of the first. Here the new logical expressions that were obtained are recorded, so that they can serve as alternatives from which evaluation plans are generated later; this is why the phase is so important.
All of the above will considerably improve the response time of queries. You can also keep the following points in mind for more efficient and effective querying: have a good starting point before moving on to the next stages, and leave enough freedom for further optimizations to be made to the query.
In the video below, you can learn more about query optimization in a graphic way, which will help you better understand everything related to this aspect of computing, since it is quite difficult to explain in writing alone.