Dec
27

MySQL Infrastructure

This article provides an in-depth introduction to the logical architecture of MySQL and the execution process of SQL query statements within MySQL. First, the article explains in detail the roles of connectors, query caches, and analyzers in the execution of SQL query statements, as well as the difference between long and short joins. Then, it explains how MySQL determines the execution plan and starts executing the statement through the processing of the optimizer and executor stages. The executor executes the query based on the engine definition of the table, using the interfaces provided by the engine, and the logic of execution on indexed and non-indexed tables is presented. Finally, the logical architecture of MySQL and the stages of the SQL statement execution process are summarized to provide the reader with an initial impression. The entire article helps readers gain a deeper understanding of the inner workings of MySQL by disassembling its underlying architecture and explaining the execution process of SQL query statements.

This is the first article in the column. I want to talk to you about the basic architecture of MySQL. We often say that when you look at something, you should never get stuck in the details. You should first take a bird's eye view of the whole picture, which can help you understand the problem from a high dimension. The same is true for learning MySQL. When we use the database, we usually see it as a whole. For example, if you have a simple table with only one ID field, when you execute the following query statement: 

mysql> select * from T where ID=10;

We only see that a statement is entered and a result is returned, but we don't know the execution process of this statement in MySQL. So today I want to disassemble MySQL with you and see what "parts" are inside. I hope that through this disassembly process, you will have a deeper understanding of MySQL. In this way, when we encounter some anomalies or problems in MySQL, we can get to the essence and locate and solve the problem more quickly. Below I give a basic architecture diagram of MySQL, from which you can clearly see the execution process of SQL statements in various functional modules of MySQL. MySQL logical architecture diagram Generally speaking, MySQL can be divided into two parts: the server layer and the storage engine layer. The server layer includes connectors, query cache, analyzer, optimizer, executor, etc., covering most of the core service functions of MySQL, as well as all built-in functions (such as date, time, math and encryption functions, etc.). All cross-storage engine functions are implemented in this layer, such as stored procedures, triggers, views, etc. The storage engine layer is responsible for data storage and extraction. Its architecture mode is plug-in-based and supports multiple storage engines such as InnoDB, MyISAM, Memory, etc. The most commonly used storage engine now is InnoDB, which has become the default storage engine since MySQL version 5.5.5. That is to say, when you execute create table to create a table, if you do not specify the engine type, InnoDB is used by default. However, you can also select other engines by specifying the type of storage engine, such as using engine=memory in the create table statement to specify the use of the memory engine to create a table. Different storage engines have different ways of accessing table data and support different functions. In the following article, we will discuss the choice of engine. It is not difficult to see from the figure that different storage engines share a server layer, that is, the part from the connector to the executor. You can first have an impression of the name of each component. Next, I will take you through the entire execution process in conjunction with the SQL statement mentioned at the beginning, and look at the role of each component in turn. The first step of the connector is to connect to the database. At this time, the connector will receive you. The connector is responsible for establishing a connection with the client, obtaining permissions, maintaining and managing the connection. The connection command is generally written like this: 

mysql -h$ip -P$port -u$user -p

After entering the command, you need to enter the password in the interactive dialogue. Although the password can also be written directly after -p in the command line, this may cause your password to be leaked. If you are connecting to a production server, it is strongly recommended that you do not do this. The mysql in the connection command is a client tool used to establish a connection with the server. After completing the classic TCP handshake, the connector will start to authenticate your identity, and the username and password you entered are used at this time.

• If the username or password is incorrect, you will receive an "Access denied for user" error, and the client program will terminate execution.
• If the username and password authentication is successful, the connector will find out the permissions you have in the permission table. After that, the permission judgment logic in this connection will depend on the permissions read at this time. This means that after a user successfully establishes a connection, even if you modify the permissions of this user with an administrator account, it will not affect the permissions of the existing connection. After the modification is completed, only the newly created connection will use the new permission settings. After the connection is completed, if you do not take any subsequent actions, the connection is idle, and you can see it in the show processlist command. The figure in the text is the result of show processlist. The line with "Sleep" in the Command column indicates that there is an idle connection in the system now.
If the client is silent for too long, the connector will automatically disconnect it. This time is controlled by the parameter wait_timeout, and the default value is 8 hours. If the client sends a request again after the connection is disconnected, it will receive an error reminder: Lost connection to MySQL server during query. At this time, if you want to continue, you need to reconnect and then execute the request. In the database, a long connection means that after the connection is successful, if the client continues to have requests, the same connection will be used all the time. A short connection means that the connection is disconnected after a few queries are executed each time, and a new one is established for the next query. The process of establishing a connection is usually complicated, so I suggest that you try to reduce the number of connection establishment actions during use, that is, try to use long connections. However, after using long connections, you may find that sometimes the memory occupied by MySQL increases very quickly. This is because the memory temporarily used by MySQL during execution is managed in the connection object. These resources will be released only when the connection is disconnected. Therefore, if long connections accumulate, it may cause too much memory to be forcibly killed by the system (OOM). From the phenomenon, it means that MySQL restarts abnormally. How to solve this problem? You can consider the following two solutions.


Disconnect long connections regularly. After using for a period of time, or after the program determines that a large query that occupies memory has been executed, disconnect the connection, and then reconnect to query.

If you are using MySQL 5.7 or later, you can reinitialize the connection resources by executing mysql_reset_connection after each large operation. This process does not require reconnection and re-permission verification, but will restore the connection to the state when it was just created.

After the query cache connection is established, you can execute the select statement. The execution logic will come to the second step: query cache. When MySQL receives a query request, it will first check the query cache to see if the statement has been executed before. The previously executed statements and their results may be directly cached in the memory in the form of key-value pairs. The key is the query statement, and the value is the query result. If your query can find the key directly in this cache, then the value will be directly returned to the client. If the statement is not in the query cache, it will continue to the subsequent execution phase. After the execution is completed, the execution result will be stored in the query cache. You can see that if the query hits the cache, MySQL does not need to perform the subsequent complex operations and can directly return the result, which is very efficient. However, in most cases, I would recommend that you do not use the query cache. Why? Because the query cache often has more disadvantages than advantages. The query cache is invalidated very frequently. As long as there is an update to a table, all query caches on this table will be emptied. Therefore, it is very likely that you have worked hard to store the results, but before you use them, they are all emptied by an update. For databases with high update pressure, the query cache hit rate will be very low. Unless your business has a static table that is updated once in a long time. For example, a system configuration table, then the query on this table is suitable for query cache. Fortunately, MySQL also provides this "on-demand use" method. You can set the parameter query_cache_type to DEMAND, so that query cache is not used for default SQL statements. For statements that you are sure to use query cache, you can explicitly specify it with SQL_CACHE, like the following statement:mysql> select SQL_CACHE * from T where ID=10;
It should be noted that the entire function of query cache was directly deleted in MySQL 8.0, which means that this function is completely gone since 8.0. If the analyzer does not hit the query cache, it will start to actually execute the statement. First, MySQL needs to know what you want to do, so it needs to parse the SQL statement. The analyzer will first do "lexical analysis". You enter a SQL statement composed of multiple strings and spaces, and MySQL needs to identify what the strings are and what they represent. MySQL recognizes from the keyword "select" you enter that this is a query statement. It also recognizes the string "T" as "table name T" and the string "ID" as "column ID". After completing these identifications, you need to do "syntax analysis". Based on the results of lexical analysis, the syntax analyzer will determine whether the SQL statement you entered meets the MySQL syntax according to the syntax rules. If your statement is incorrect, you will receive an error reminder "You have an error in your SQL syntax", for example, the following statement select misses the initial letter "s". mysql> elect * from t where ID=1;

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'elect * from t where ID=1' at line 1


Generally, syntax errors will prompt the first error location, so you should pay attention to the content immediately after "use near". Optimizer After the analyzer, MySQL knows what you want to do. Before starting to execute, it must also be processed by the optimizer. The optimizer decides which index to use when there are multiple indexes in the table; or when a statement has multiple table associations (joins), it determines the connection order of each table. For example, if you execute the following statement, this statement is to execute the join of two tables:mysql> select * from t1 join t2 using(ID) where t1.c=10 and t2.d=20;

• You can first extract the ID value of the record with c=10 from table t1, then associate it with table t2 based on the ID value, and then determine whether the value of d in t2 is equal to 20.
• You can also first extract the ID value of the record with d=20 from table t2, then associate it with t1 based on the ID value, and then determine whether the value of c in t1 is equal to 10. The logical results of these two execution methods are the same, but the execution efficiency will be different, and the role of the optimizer is to decide which plan to use. After the optimizer stage is completed, the execution plan of this statement is determined, and then enters the executor stage. If you still have some questions, such as how the optimizer selects the index, is it possible to choose the wrong one, etc., it doesn’t matter, I will explain the content of the optimizer separately in the following article. MySQL knows what you want to do through the analyzer and how to do it through the optimizer, so it enters the executor stage and starts executing the statement. When starting to execute, it must first determine whether you have the permission to execute the query on this table T. If not, an error of no permission will be returned, as shown below (in engineering implementation, if the query cache is hit, the permission will be verified when the query cache returns the result. The query will also call precheck to verify the permission before the optimizer). mysql> select * from T where ID=10;

ERROR 1142 (42000): SELECT command denied to user 'b'@'localhost' for table 'T'
If you have permission, open the table and continue to execute. When opening the table, the executor will use the interface provided by the engine according to the engine definition of the table. For example, in the table T in our example, the ID field has no index, so the execution process of the executor is as follows:

Call the InnoDB engine interface to get the first row of this table, and determine whether the ID value is 10. If not, skip it. If it is, save this row in the result set;

Call the engine interface to get the "next row", and repeat the same judgment logic until the last row of this table is obtained.

The executor returns the record set consisting of all rows that meet the conditions in the above traversal process to the client as a result set.

At this point, the statement is executed. For tables with indexes, the execution logic is similar. The first call is the "get the first row that meets the conditions" interface, and then the "next row that meets the conditions" interface is looped. These interfaces are already defined in the engine. You will see a rows_examined field in the slow query log of the database, indicating how many rows were scanned during the execution of this statement. This value is accumulated every time the executor calls the engine to obtain data rows. In some scenarios, the executor is called once, and multiple rows are scanned inside the engine, so the number of rows scanned by the engine is not exactly the same as rows_examined. We will have a special article later to talk about the internal mechanism of the storage engine, which will have detailed explanations. Summary Today I introduced you to the logical architecture of MySQL. I hope you have a preliminary impression of the various stages of the complete execution process of a SQL statement. Due to space limitations, I just went through each link with an example of a query. If you still have questions about the details of each link, don’t worry, I will mention them again in the practical chapter later. I will leave you a question. If there is no field k in table T, and you execute this statement select * from T where k=1, then it will definitely report an error of "this column does not exist": "Unknown column 'k' in 'where clause'". At which stage do you think this error was reported? Thank you for listening, you are welcome to leave me a message, and you are also welcome to share it with more friends to read together

Contact

Missing something?

Feel free to request missing tools or give some feedback using our contact form.

Contact Us