What is Baked in the Baker's Dozen?

What is Baked in the Baker's Dozen?
58 min read
30 April 2020

On April 8, PostgreSQL feature freeze took place, so only features committed earlier will get into version PostgreSQL 13. Probably, this version can hardly be considered revolutionary, since it has no conceptual changes. Some of critical patches were late to get into it, such as Table and Functions for the JSON/SQL standard, which had been desirable to be part of PostgreSQL 12, along with the JSONPath patch; plug-in warehouses did not appear either — only the interface is being finalized. The list of improvements is still impressive. We prepared a pretty complete overview of the patches of the Baker's Dozen.

Changes to SQL commands


The initdb and createdb utilities, as well as the CREATE COLLATION command, have the LOCALE parameter, which enables specifying the values of LC_CTYPE and LC_COLLATE at the start. Now the CREATE DATABASE command has it as well:

CREATE DATABASE db_koi8r TEMPLATE template0 


The ALTER VIEW command now enables changing the column name in a view. Earlier, we had to recreate the view to do this.

Suppose we forgot to give a name to a column:

CREATE VIEW uptime AS SELECT current_timestamp, current_timestamp - pg_postmaster_start_time();
SELECT * FROM uptime;
       current_timestamp       |    ?column?     
 2020-03-23 15:37:00.088824+03 | 04:18:24.897856

We can correct this:
ALTER VIEW uptime RENAME COLUMN "?column?" TO uptime; SELECT * FROM uptime;

       current_timestamp       |     uptime      
 2020-03-23 15:37:40.726516+03 | 04:19:05.535548


Now we can turn a generated column of a table into a usual one, that is, remove the expression:

CREATE TABLE payments ( id integer PRIMARY KEY, amount numeric(18,2), income_tax numeric(18,2) GENERATED ALWAYS AS (amount*0.13) STORED ); INSERT INTO payments(id, amount) VALUES(1, 42); \d payments

                                     Table "public.payments"
   Column   |     Type      | Collation | Nullable |                   Default            
 id         | integer       |           | not null |
 amount     | numeric(18,2) |           |          |
 income_tax | numeric(18,2) |           |          | generated always as ((amount * 0.13))
    "payments_pkey" PRIMARY KEY, btree (id)

Suppose we changed our mind, and now we prefer to define income_tax explicitly. So we just remove the expression:
ALTER TABLE payments ALTER COLUMN income_tax DROP EXPRESSION; \d payments

                   Table "public.payments"
   Column   |     Type      | Collation | Nullable | Default
 id         | integer       |           | not null |
 amount     | numeric(18,2) |           |          |
 income_tax | numeric(18,2) |           |          |
    "payments_pkey" PRIMARY KEY, btree (id)

Certainly, the existing data in the column are still there:
SELECT * FROM payments;

 id | amount | income_tax
  1 |  42.00 |       5.46


If we need to delete a database before all the users are disconnected, we can benefit from the new FORCE option of the DROP DATABASE command.

We connect to the new database:

SELECT dblink_connect('dbname=db');


And now we delete it forcing open connections to terminate, same way as pg_terminate_backend does:


The ALTER TYPE command enables us to change various properties of base data types, including the storage strategy. Earlier, it could be specified only in the CREATE TYPE command.

Rather than create a new base type for demo purposes, we will use the existing tsquery type. But first we will create a separate database and connect to it:


For the tsquery data type, the plain storage strategy is used, so columns in tables of this type get the same strategy:
SELECT typname,typstorage FROM pg_type WHERE typname = 'tsquery';

 typname | typstorage
 tsquery | p
CREATE TABLE queries (query tsquery);
SELECT attname, attstorage FROM pg_attribute WHERE attrelid = 'queries'::regclass AND attname = 'query';
 attname | attstorage
 query   | p

If we need to use another strategy for new tables, we can change the base type:
ALTER TYPE tsquery SET (storage=external);

SELECT typname,typstorage FROM pg_type WHERE typname = 'tsquery';

 typname | typstorage
 tsquery | e

The storage type in the new tables will also change:

CREATE TABLE new_queries (query tsquery);
SELECT attname, attstorage FROM pg_attribute WHERE attrelid = 'new_queries'::regclass AND attname = 'query';
 attname | attstorage
 query   | e

Please keep in mind that we cannot change the strategy intended for use of TOAST back to plain:
ALTER TYPE tsquery SET (storage=plain);

ERROR:  cannot change type's storage to PLAIN

That's why we conducted the experiments in a separate database, which we won't mind to delete.


The CREATE STATISTICS command can gather lists of most common values for selected combinations of table columns. The number of most common values gathered is defined by the default_statistics_target parameter. The value for specific statistics can now be changed by the command:



As we know, in the SELECT command, the syntax defined in the SQL standard can be used instead of the LIMIT option:
SELECT * FROM (VALUES (1,'1.1'), (2,'2.1'), (2,'2.2'), (3,'3.1')) t(a,b) ORDER BY a FETCH FIRST 2 ROWS ONLY;

 a |  b  
 1 | 1.1
 2 | 2.1
(2 rows)

Now FETCH also supports WITH TIES option, which adds to the output all «familial» rows (rows equal to already selected if only sort condition is taken into account):
SELECT * FROM (VALUES (1,'1.1'), (2,'2.1'), (2,'2.2'), (3,'3.1')) t(a,b) ORDER BY a FETCH FIRST 2 ROWS WITH TIES;

 a |  b  
 1 | 1.1
 2 | 2.1
 2 | 2.2
(3 rows)

Built-in functions and data types


The new get_random_uuid function returns version 4 UUID (random value):
SELECT gen_random_uuid();


The function is useful for generation of unique values of the UUID type in distributed systems.
Formerly, we had to use the uuid-ossp or pgcrypto library.

min_scale and trim_scale for values of type numeric

The min_scale function determines how many significant figures are contained in the fractional part a number, and the trim_scale function removes zeros:
SELECT n, min_scale(n), trim_scale(n) FROM (VALUES(42.42000)) as t(n);

    n     | min_scale | trim_scale
 42.42000 |         2 |      42.42

gcd and lcm

This is an addition to a set of mathematical functions. Now we can quickly find the greatest common divisor (gcm) and least common multiple (lcm):
SELECT gcd(54,24), lcm(54,24);

 gcd | lcm
   6 | 216

Aggregate functions min and max for the pg_lsn data type

For the pg_lsn data type, min and max aggregate functions are added, which enables execution of queries such as:
SELECT min(restart_lsn) FROM pg_replication_slots; SELECT min(sent_lsn) FROM pg_stat_replication;

Check of the type modifier of the returned function value

In previous versions, the type modifier was not checked for the function value returned.
Suppose that a type is available to store currency units and a function that returns the amount of income taxes:

CREATE TYPE currency AS (
    amount numeric(18,2),
    code   text
CREATE FUNCTION income_tax(quantity currency) RETURNS currency
    AS 'SELECT quantity.amount * 0.13, quantity.code' LANGUAGE SQL;

When calling the function, we expect two decimal places, but we get four. Even the explicit type cast after the function call does not help (the third column):

SELECT amount, code, amount::numeric(18,2)
FROM income_tax(ROW(42.42, 'RUR'))\gx
-[ RECORD 1 ]--
amount | 5.5146
code   | RUR
amount | 5.5146

In version 13, the result is correct:

SELECT amount, code, amount::numeric(18,2)
FROM income_tax(ROW(42.42, 'RUR'))\gx
-[ RECORD 1 ]
amount | 5.51
code   | RUR
amount | 5.51

Localized names in to_date() and to_timestamp()

The to_date and to_timestamp functions learned to recognize localized names of months and of week days. Formerly, only English names could be used:
SELECT to_date('Вторник, 24 Марта 2020', 'TMDay, DD TMMonth YYYY');


normalize and IS NORMALIZED

For compliance with the SQL standard, the normalize() function was added to normalize a Unicode string and the IS NORMALIZED predicate to check whether the string is normalized.

SELECT str, normalize(str, NFC) AS nfc,
       str IS NFC NORMALIZED AS is_nfc_normalized,
       normalize(str, NFKC) AS nfkc,
       str IS NFKC NORMALIZED AS is_nfkc_normalized
FROM (VALUES(U&'\0032\2075')) AS vals(str)\gx
-[ RECORD 1 ]------+---
str                | 2⁵
nfc                | 2⁵
is_nfc_normalized  | t
nfkc               | 25
is_nfkc_normalized | f

More details on Unicode normalization forms.

xid8 type and xid8_current() function for 64-bit transaction IDs

The new xid8 data type was added for a 64-bit transaction ID. But this does not mean that PostgreSQL migrated to 64-bit transactions: everything works exactly as before. But certain functions return the new type; for example: it is recommended to use pg_current_xact_id rather than the old txid_current function, which returned int8, and so forth.

New polymorphic data types of anycompatible family

anycompatibleanycompatiblearrayanycompatiblenonarray and anycompatiblerange types are added. Unlike types of the anyelement family, the new types enable using not exactly the same, but actually compatible types.

In the following example, the maximum function takes integer and numeric arguments defined as anycompatible. The value returned is cast to the type common for the two:

CREATE FUNCTION maximum(a anycompatible, b anycompatible) RETURNS anycompatible

SELECT f, pg_typeof(f) FROM maximum(42, 42.42) f;
   f   | pg_typeof
 42.42 | numeric

Moreover, anycompatible- and any- are two independent type sets:

CREATE FUNCTION first_elems(a anyarray, b anycompatiblearray)
    RETURNS TABLE(a anyelement, b anycompatible) AS $$
        SELECT ROW(a[array_lower(a,1)], b[array_lower(b,1)]);
    $$ LANGUAGE sql;
SELECT first_elems(ARRAY[1,2,3], ARRAY['a','b','c']) AS str, 
       first_elems(ARRAY[1.1,2.2,3.3],ARRAY[true,false]) bool;
  str  |  bool   
 (1,a) | (1.1,t)

Procedural languages

Transform for bool type to PL/Perl

A while ago, TRANSFORM by Ivan Panchenko (Deputy CEO of Postgres Professional) was committed — bool_plperl. Postgres passes Boolean values as t or f to PL/Perl, but for Perl, t and f are not the true and false values of the Boolean variable, but just the letters «t» and «f». There are a few ways to resolve this issue (see email), but to create TRANSFORM for bool is the most efficient.

Faster execution of simple expression in PL/pgSQL

Simple expressions (at least not containing access to tables and not requiring locking) will be executed faster. Formerly, in such cases time was inefficiently wasted on invoking the planner in each loop.

CREATE OR REPLACE FUNCTION slow_pi() RETURNS double precision AS $$
        a double precision := 1;
        s double precision := 1;
        r double precision := 0;
        FOR i IN 1 .. 10000000 LOOP
            r := r + s/a; a := a + 2; s := -s;
        END LOOP;
        RETURN 4*r;
    $$ LANGUAGE plpgsql;

Calling slow_pi() in PostgreSQL 12:
SELECT slow_pi();

    (1 row)
    Time: 13060,650 ms (00:13,061)

Doing the same in PostgreSQL 13:

SELECT slow_pi();
    (1 row)
    Time: 2108,464 ms (00:02,108)

Trustable extensions instead of pg_pltemplate

The system catalog now contains one table less. pg_pltemplate is removed. It stored properties of procedural languages that were needed to execute CREATE LANGUAGE. But it was decided to move the properties from pg_pltemplate into scripts of appropriate language extensions and get rid of the table itself. But to implement the idea, the database owner (non-superuser) must be able to create the trusted language from an extension script. This is because currently to create plpgsql for instance, the database owner does not need to be superuser.

So, the following was done. A new Boolean parameter, trusted, was added to the extensions' control file. If both trusted and superuser parameters are turned on, not only superuser can create the extension, but also a user with CREATE privilege for the current database (including its owner). When the script of such an extension is run, privileges of the superuser who initialized the cluster will be used. This superuser will also own the objects created by the extension, while the extension itself will be owned by the user who created it.

The following are important implications of these changes:
Trustable extensions open the floodgates for third-party developers to creation of other trusted languages. It was hardcoded in pg_pltemplate that plpython pertains to the second version of the language. Dropping pg_pltemplate is a step (necessary, although not sufficient) towards transitioning to python3.


B-tree deduplication

A critical and long awaited patch (the work on it started as early as in Y2015) created by Anastasia Lubennikova (Postgres Professional) and Peter Geoghegan was finally committed by Peter. Anastasia already reported on this at PGconf India. Postgres learned to considerably reduce sizes of B-tree indexes by means of deduplication, that is, by skimping on repeating index keys. These indexes were deeply reworked to enable compression without losing backward compatibility of indexes. The idea of deduplication was borrowed from a more flexible architecture of GIN indexes.

In these indexes, situations where a lot of rows match a key occur more frequently than in B-tree. For example: in the case of text processing, one lexeme usually occurs in several documents. And it is stored in the index only once. B-tree indexes could not ensure this until recently.

B-tree indexes are different from GIN mainly in leaf pages. Depending on the number of rows related to the same value of a key, there are the following possibilities: a page contains only the «posting list», that is, the list of TIDs (tuple IDs of rows being indexed) if the list is short, but if there are a lot of TIDs, instead of value lists, new «branches of trees» are stored — references to other pages of the «posting list» type or other branches of the tree (they are called «posting trees»).

Such a tree-like structure is similar to B-tree, but differs in significant details: for example, lists for moving across pages of the same tree level are unidirectional in GIN rather than bidirectional. Therefore (among the rest), it is uneasy to ensure good backward compatibility of new deduplicated indexes. And enhancements really took more than 3 years. It was also needed to perfect vacuum techniques (microvacuum) and other details.

In performance tests, all the indexes to which deduplication was applicable were compressed by approximately 3 times. Compression of duplicates also helps unique indexes by eliminating the issue of index bloat up when a table changes at a high rate of INSERT/DELETE. This behavior can be turned on/off manually at index setup.

Avoid full GIN index scan when possible

Now we can avoid a walk through a whole GIN index in certain cases. Some operations, although supported by GIN index, are actually performed by a full index scan. For example: let's consider an index for full-text search by the tsvector column. If the search query looks like «anything but a given word», we will have to read the whole index. If the same query also includes another condition that does not require a full index scan, the whole index will be scanned regardless.

With the new optimization, first a more restrictive condition will be used and then the results will be rechecked so as to also take into account the other limitation. Compare the number of pages that were read in version 12 (Buffers):

SELECT * FROM mail_messages WHERE tsv @@ to_tsquery('!vacuum') AND tsv @@ to_tsquery('analyze');
                                             QUERY PLAN                                             
Bitmap Heap Scan on mail_messages (actual rows=5864 loops=1)
   Recheck Cond: ((tsv @@ to_tsquery('!vacuum'::text)) AND (tsv @@ to_tsquery('analyze'::text)))
   Heap Blocks: exact=5167
   Buffers: shared hit=24 read=27405
   ->  Bitmap Index Scan on mail_messages_tsv_idx (actual rows=5864 loops=1)
         Index Cond: ((tsv @@ to_tsquery('!vacuum'::text)) AND (tsv @@ to_tsquery('analyze'::text)))
         Buffers: shared hit=24 read=22238
 Planning Time: 0.283 ms
 Execution Time: 3258.234 ms

with the number of buffers in the new version:

SELECT * FROM mail_messages WHERE tsv @@ to_tsquery('!vacuum') AND tsv @@ to_tsquery('analyze');
                                             QUERY PLAN                                             
Bitmap Heap Scan on mail_messages (actual rows=5864 loops=1)
   Recheck Cond: ((tsv @@ to_tsquery('!vacuum'::text)) AND (tsv @@ to_tsquery('analyze'::text)))
   Heap Blocks: exact=5156
   Buffers: shared hit=5179
   ->  Bitmap Index Scan on mail_messages_tsv_idx (actual rows=5864 loops=1)
         Index Cond: ((tsv @@ to_tsquery('!vacuum'::text)) AND (tsv @@ to_tsquery('analyze'::text)))
         Buffers: shared hit=23
 Planning Time: 0.250 ms
 Execution Time: 8.779 ms

We can come across a similar situation when using trigrams and when checking for being array elements.
Opclass parameters
Many index access methods in PostgreSQL are actually a «framework» that takes over a high-level implementation of a search algorithm, work with pages and locks, as well as with WAL. And it's operator classes that associate the method with specific data types and operators.

Operators could not have parameters as yet. For example, GiST index with the tsvector_ops operator class can be used for full-text search (refer here for GiST operator classes). This operator class uses a signature tree, and the signature must be fixed-length (124 bytes). Now we can explicitly specify the length, which enables us to control the balance between the index size and efficiency (the number of hash collisions):
CREATE INDEX ON mail_messages USING gist(tsv tsvector_ops(siglen=1024));

For a start, similar changes were also made to other GiST operator classes that use a signature tree; this involves the hstoreintarray and pg_trgm extensions.
But the main idea behind proposing this change is a possibility to pass a JSONPath expression to a GIN index in order to index only the necessary part of the JSON document rather than the entire document. In many cases this will enable us to drastically reduce the index sizes. But this work is yet to be done.

This Oleg Bartunov's idea is implemented by Nikita Glukhov and Alexander Korotkov (all from Postgres Professional).

(box, point) operator is added

A missing operation to be used in kNN for GiST and SP-GiST was added. In PostgreSQL 12, when working with geometric types point and box, a distance operator (point, box) can be used, which will speed up search with GiST and SP-GiST indexes. But the reverse operator (box, point) was not implemented although box already recognized distances to more complex types — polygons and circles.

CREATE TABLE points(point point);
CREATE TABLE boxes(box box);
INSERT INTO points VALUES('1,2','3,4');
INSERT INTO boxes VALUES(box(point(3,4),point(5,6)), box(point(13,14),point(15,16)));

In PostgreSQL 12:

SELECT p.point, b.box, b.box   p.point distance FROM points AS p, boxes AS b ORDER BY distance;
ERROR:  operator does not exist: box  point

Inversely, everything is OK:

SELECT p.point, b.box, p.point   b.box distance FROM points AS p, boxes AS b ORDER BY distance;
 point |       box       |      distance
 (1,2) | (5,6),(3,4)     | 2.8284271247461903
 (2,1) | (5,6),(3,4)     | 3.1622776601683795
 (1,2) | (15,16),(13,14) | 16.970562748477143
 (2,1) | (15,16),(13,14) | 17.029386365926403

And in PostgreSQL 13:

SELECT p.point, b.box, b.box   p.point distance FROM points AS p, boxes AS b ORDER BY distance;
 point |       box       |      distance
 (1,2) | (5,6),(3,4)     | 2.8284271247461903
 (2,1) | (5,6),(3,4)     | 3.1622776601683795
 (1,2) | (15,16),(13,14) | 16.970562748477143
 (2,1) | (15,16),(13,14) | 17.029386365926403

GiST and SP-GiST indexes will accelerate on this operation too.
Note what this query returns in PostgreSQL 13:
SELECT count(*) FROM pg_operator WHERE oprname = '';


But if we run the same query in PostgreSQL 12, we will get 20 rows: as many as 8 operators were added to the list in version 13.


Support for jsonpath .datetime() method
It is one of non-qualified patches from a large series of JSONPath patches, which was late to be ready by the PostgreSQL 12 release. It is part of JSON/SQL standard. The issue is that all functions in the series of JSONPath patches are immutable, but comparison of dates takes into account the time zone, which can vary during a session.

In such cases, we allow existing immutable functions to throw an error on non-immutable comparisons. At the same time, this patch contains functions named with the _tz suffix, which work fine with timezone.

jsonb_set_lax function

In general, «lax» is a non-strict (unlike «strict») mode of functions manipulating jsonb. In this case, the function will be operational when one of the arguments that it takes equals NULL. Unlike the strict version — jsonb_set(), it has an additional argument that indicates actions in the case of NULL. The options are: use_json_null / raise_exception / return_target / delete_key. These options were proposed by concerned users.

Optimization of some jsonb functions

A great deal was optimized, mainly by Nikita Glukhov (Postgres Professional). It does not make sense to discuss each item here: first, the abundance of them would swell the article that is pretty large as it is; second, the changes deal with the internal structure, which is interesting not to everyone. So, let's only list most of the changes:

  1. JsonbExtractScalar() function is optimized.
  2. #>> operator, as well as jsonb_each_text() and jsonb_array_elements_text() functions are optimized.
  3. Recognition of the JsonbContainer type in get_jsonb_path_all() is optimized.
  4. Fetch of the first token from the JsonbIterator iterator is replaced with a light-weight JsonbContainerIsXxx() macro.
  5. Key retrieval is more convenient now — findJsonbKeyInObject.
  6. Storage of the result of findJsonbValueFromContainer() and getIthJsonbValueFromContainer() is optimized.
  7. The get_jsonb_path_all() function is optimized, it no longer uses an iterator.
    Code of JsonbValueAsText is rewritten.

As they say, information on patches of SQL/JSON: JSON_TABLE and SQL/JSON: functions could be here. We expected them and hoped to see them committed. Unfortunately this did not happen. The patches are large and involve a lot of files. We believe they will get into PostgreSQL 14. And JSONPath is left on its own so far.

Utilities and extensions


The benchmarking utility got a number of improvements. Execution statistics for tasks at the initialization phase, a more visualized output, a possibility to look up the code of the built-in scriptsand testing on a partitioned table of accounts were added.

Besides, the \aset command was added, which is similar to \gset, but enables setting the values for variables of several queries sent in one go. The following string sent to the server for execution will set the values of both variables one and two:

SELECT 1 AS one \; SELECT 2 AS two \aset


pg_dump learned to dump the data of foreign tables. The --include-foreign-data parameter enables specifing a list of foreign servers whose table data will be dumped.

Such dumping should be used cautiously. Data do not necessarily need to be loaded to a foreign server. In addition, a foreign server is not unlikely to be unavailable during recovery. Or a foreign server can allow only reading, but not writing.

A series of small patches adds more convenience to working in psql:

  • Autocomplete is improved for several commands.
  • In addition to \echo, which sends a string to STDOUT, the new \warn command sends a string to the standard error output (STDERR).
  • The \d command for a TOAST table provides information on the index and table. And for an index on a partitioned table, we can use the \d+ command to get a list of index partitions with table spaces.
  • In the \dt+ command, the new «Persistence» column shows whether tables are unlogged or permanent.
  • We can output a query edited by the \e command to the screen if we do not put ; (or the \g* command) at the end of the query. Now, on return from the editor to the psql window, we cannot see the text of the command prior to its execution.
  • The default prompt will include the information on the status of the current transaction. To this end, a special character %x is added to the PROMPT1 and PROMPT2 variables.
  • New commands appeared for getting additional information on access methods: \dAc\dAf\dAo\dAp.
  • In \g, now any options supported by \pset can be specified in parentheses. They will affect only the current transaction.


Some changes relate to connecting to PostgreSQL:

  • Inaccuracy is fixed in the descriptions of the host and hostadr parameters and the related inconsistency in the output of the \conninfo command of the psql utility.
  • If the key of the client's certificate is stored in an encrypted form, you can enter the password only interactively. The new sslpassword parameter will allow non-interactive key decryption.
  • Two new parameters sslminprotocolversion and sslmaxprotocolversion enable specifying limitations of the version of the SSL/TCL protocol for which connection is permitted.


The new --jobs parameter of the reindexdb utility specifies the number of connections to the database for which indexes will be rebuilt simultaneously.


The limitations of the utility are gradually removed and the capabilities increased.
First, pg_rewind can now write information for recovery (as pg_basebackup can do it) and also start the recovery and subsequent shutdown of the instance unless it was shut down via a checkpoint (earlier, this could be done only manually).

Second, pg_rewind learned to work with WAL archive.
When the utility finds the point of divergence of WAL between two servers, it must make up a list of all pages that need to be copied to the target server in order to eliminate the differences. To do this, the utility needs all WAL files starting at the point found. Earlier, if the necessary WAL files were unavailable on the target cluster, the utility could not do the job.

With this patch by Alexey Kondratov (Postgres Professional), pg_rewind will be able to read missing WAL segments from the archive of WAL files using the restore_command parameter if the new key -c or --restore-target-wal is provided.


pg_waldump will extracts information about the prepared transaction from the record.

The amcheck extension learned to better detect corruption in B-tree indexes.
By the way, now error messages in the server log about corrupted pages will differ for indexes and tables.


The heap_tuple_infomask_flags function of the pageinspect extension extracts information from the values of fields infomask and infomask2 returned by the heap_page_items function. This is useful for investigating situations related to data corruption.


Superuser can allow regular users to connect without the password at the level of user names maps:

ALTER USER MAPPING FOR non-superuser SERVER server
    OPTIONS (ADD password_required 'false');

Among the rest, this is done to enable using sslkey and sslcert as connection parameters.


The adminpack extension contains a new function — pg_file_sync. It can be used to do fsync for files that the server writes to disk, for instance, by means of pg_file_write or COPY TO.



The shared memory of a server contains not only a large buffer cache, but also a number of other, simpler, caches (for example: for transaction statuses). They use a simple algorithm of evicting least frequently used pages (simple least-recently-used, or SLRU). These caches «just worked» so far, but now the time is ripe for monitoring them; first of all the developers of PostgreSQL core need this to figure out whether anything needs to be changed in the caches. To this end, a new, pg_stat_slru, view was added.

A new columnleader_id, was added to the pg_stat_activity view. For processes involved in execution of parallel queries, it is filled with the pid of the leader process. And for the leader process, leader_id equals pid.
The following query shows what processes execute which queries in the parallel mode:

SELECT query, leader_pid, 
    array_agg(pid) filter(WHERE leader_pid != pid) AS members
  FROM pg_stat_activity
 WHERE leader_pid IS NOT NULL
 GROUP BY query, leader_pid;

The list of wait events was changed. Two new events were added: BackupWaitWalArchive and RecoveryPause. And two more were given more adequate names: RecoveryWalStream -> RecoveryRetrieveRetryInterval and RecoveryWalAll -> RecoveryWalStream.

There are also two new events that occur on a replicaRecoveryConflictSnapshot (a conflict with VACUUM that deleted needed tuples) and RecoveryConflictTablespace (a conflict related to deletion of a table space).


The pg_stat_statements extension regarded queries with the FOR UPDATE clause and without it as the same query so far. Now queries with FOR UPDATE are treated differently.

The amount of information gathered also increased. From now on, not only information on resources for execution of commands is tracked, but also statistics on generated WAL records. New columns of the view are: wal_bytes — the amount of WAL records generated, wal_records — the number of WAL records generated, wal_num_fpw — the number of full page writes.

This was achieved thanks to the developed infrastructure to track WAL usage. So, now EXPLAIN with the new WAL option will also show the number of WAL records generated:

CREATE TABLE t (id int);
              QUERY PLAN              
 Insert on t (actual rows=0 loops=1)
   WAL:  records=1  bytes=59
   ->  Result (actual rows=1 loops=1)

The auto_explain extension, VACUUM with VERBOSE and autovacuum also use the infrastructure developed and will output WAL amounts.

Back to pg_stat_statements. If the new pg_stat_statements.track_planning parameter is turned on, additional, planner, statistics will be written: the number of plan creations, the total planning time and the maximum time of a single planning, as well as the average and standard deviation.

One more patch, unrelated to pg_stat_statements, deals with tracking the resources allocated to the planner. EXPLAIN with the BUFFERS option will report the number of buffers used during planning:

                   QUERY PLAN                   
 Seq Scan on pg_class (actual rows=386 loops=1)
   Buffers: shared hit=9 read=4
 Planning Time: 0.782 ms
   Buffers: shared hit=103 read=11
 Execution Time: 0.219 ms


  • The log_statement_sample_rate parameter now defines the share of SQL commands being logged if the duration of their execution is greater than the value of log_min_duration_sample(new parameter).
    Note that the value of the log_min_duration_statement parameter takes precedence, i. e., if log_min_duration_statement is less than log_min_duration_sample, all the commands will be logged regardless of the share defined inlog_statement_sample_rate.
    In addition, the new log_transaction_sample_rate parameter allows us to define the share of transactions for which all the commands will be logged regardless of their execution time.
  • The values of the bind variables can now be logged along with commands that caused an error condition (which is defined by the log_min_error_statement parameter). The maximum size of values logged is defined by the new log_parameter_max_length_on_error parameter. By default, it equals zero, which turns the output off.
    Turning log_parameter_max_length_on_error on entails additional overhead costs for execution of all SQL commands rather than only erroneous ones, but provides for more accurate error diagnostics.
    For non-erroneous statements (which are logged as defined by the log_statements and log_duration parameters), a parameter was also added to limit the size: log_parameter_max_length. By default, entire values are logged, as before.
  • The process type can be logged now in the server log (pg_stat_activity.backend_type). To this end, the %b special character can be provided in the log_line_prefix parameter. And if the log has the csv format (log_destination=csvlog), the backend_type column is already there.

Progress report

New views pg_stat_progress_analyze and pg_stat_progress_basebackup allow us to track execution of gathering statistics by the ANALYZE command and of a backup by the pg_basebackup utility, respectively.


Evaluation of immutable functions in the FROM clause during planning
The patch by Alexander Kuzmenkov and Alexandr Parfenov (both from Postgres Professional) helps when the FROM clause contains a call to a function that is actually a constant. In this case, instead of doing the join, the value of the constant is substituted into the query where appropriate.

This is how it happens, for instance, in a full-text search query:

SELECT subject, ts_rank_cd(tsv, q) AS rank
FROM mail_messages, to_tsquery('english', 'tuple') q
WHERE tsv @@ q
                              QUERY PLAN                              
   Sort Key: (ts_rank_cd(mail_messages.tsv, '''tuple'''::tsquery)) DESC
   ->  Bitmap Heap Scan on mail_messages
         Recheck Cond: (tsv @@ '''tuple'''::tsquery)
         ->  Bitmap Index Scan on mail_messages_tsv_idx
               Index Cond: (tsv @@ '''tuple'''::tsquery)

There is no join here, and the value of 'tuple'::tsquery is substituted into the query as early as during planning. In version 12, the picture was quite different:

SELECT subject, ts_rank_cd(tsv, q) AS rank                            
FROM mail_messages, to_tsquery('english', 'tuple') q
WHERE tsv @@ q                            
                          QUERY PLAN                         
   Sort Key: (ts_rank_cd(mail_messages.tsv, q.q)) DESC
   ->  Nested Loop
         ->  Function Scan on q
         ->  Bitmap Heap Scan on mail_messages
               Recheck Cond: (tsv @@ q.q)
               ->  Bitmap Index Scan on mail_messages_tsv_idx
                     Index Cond: (tsv @@ q.q)

Incremental sort

In cases where a sort by many keys (k1, k2, k3…) is required, the planner can now use the awareness that the data is already sorted by a few of the first keys (say, k1 and k2). In this case, we can avoid re-sort of all the data from scratch, but divide them into sequential groups with the same values of k1 and k2 and only do additional sort by k3.

This way, the entire sort is decomposed into several sequential smaller-size sorts. This reduces the size of the memory required and also allows the first data to be output earlier than the entire sort is complete.

For example, in the demo database the tickets table has an index on the ticket_id column. The data obtained from the index will be definitely sorted by ticket_id, therefore, the following query will do an incremental sort:

EXPLAIN (analyze, costs off, timing off)
SELECT * FROM tickets ORDER BY ticket_no, passenger_id;
                                  QUERY PLAN                                  
 Incremental Sort (actual rows=2949857 loops=1)
   Sort Key: ticket_no, passenger_id
   Presorted Key: ticket_no
   Full-sort Groups: 92184 Sort Method: quicksort Memory: avg=31kB peak=31kB
   ->  Index Scan using tickets_pkey on tickets (actual rows=2949857 loops=1)
 Planning Time: 2.137 ms
 Execution Time: 2230.019 ms

The incremental sort functionality can be turned off by the enable_incrementalsort parameter. In such a case, the sort will take noticeably more time:

SET enable_incrementalsort = off;
EXPLAIN (analyze, costs off, timing off)
SELECT * FROM tickets ORDER BY ticket_no, passenger_id;
                              QUERY PLAN                               
 Gather Merge (actual rows=2949857 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort (actual rows=983286 loops=3)
         Sort Key: ticket_no, passenger_id
         Sort Method: external merge  Disk: 107024kB
         Worker 0:  Sort Method: external merge  Disk: 116744kB
         Worker 1:  Sort Method: external merge  Disk: 107200kB
         ->  Parallel Seq Scan on tickets (actual rows=983286 loops=3)
 Planning Time: 3.994 ms
 Execution Time: 12291.844 ms

Alexander Korotkov (Postgres Professional) suggested the idea of incremental sort as early as in Y2013, and then seven years later, James Coleman brought the patch to the state committed by the community.

Speedup truncates of relation forks

When TRUNCATE is performed, shared buffers are scanned to remove buffers of the table from the shared memory. Formerly, scan was done three times, for each fork of the table: MAIN (the main, data, fork), FSM (the free space map), VM (the visibility map). Now the logic is changed: the buffers are scanned only once rather than three times. For large values of shared_buffers, this yields considerable gains.

Optimized partial TOAST decompression

When there is no need to read the entire TOAST, but only a slice at the beginning or not far from the beginning is sufficient, it makes no sense to decompress TOAST entirely. Compressed TOAST is read in iterations: we read a slice and if there is no data needed, we decompress more and read further. This is suggested by the Google Summer of Code student Binguo Bao, who provides an example:

CREATE TABLE slicingtest (
     id serial PRIMARY KEY,
     a text
INSERT INTO slicingtest (a) SELECT
    repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000) AS a FROM
SELECT sum(length(substr(a, 0, 20))) FROM slicingtest;
Time: 28.123 ms

With the patch, this is faster by an order of magnitude:

Time: 2.306 ms

Parallel VACUUM

In his related article, Egor Rogov explains this critical step to parallelization in detail. In short: «Masahiko Sawada's patch that enables vacuuming in the parallel mode. The table itself is still vacuumed by one (leader) process, but to vacuum indexes, this process can now launch background worker processes, one for each index. In the manual mode, this enables us to speed up vacuuming of large tables with several indexes; autovacuum does not use this functionality yet.» — translated.

Trigger autovacuum on tuple insertion

For this patch (also known as Berserk autovacuum), we should thank Darafei Praliaskouski, who suggested a solution to the following problem: autovacuum never comes to append-only tables since they do not contain «dead» tuples. Because of this, the visibility map does not get updated, which makes index-only scans inefficient. But when vacuum does come to prevent transaction id wraparound, it needs to do very much work instantaneously. This situation is fixed now: row inserts will trigger autovacuum. Two new server parameters were added (autovacuum_vacuum_insert_threshold and autovacuum_vacuum_insert_scale_factor), similar to existing parameters for deletes and updates (autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor).

Memory-Bounded Hash Aggregation

Aggregation that uses hashing may require more RAM than the planner expected and than specified in work_mem. Formerly this planner error resulted in ignoring work_mem and allocating as much memory as needed for the operation or for arrival of OOM Killer. Now the algorithm can avoid falling outside the range of work_mem and use temporary files on disk when necessary. To control the behavior of the planner, new parameters were added: enable_groupingsets_hash_disk and enable_hashagg_disk.

Optimize update of tables with generated columns

In version 12, generated columns were recomputed for any update of a row, even if this update did not affect these columns at all. Now they will be recomputed only if it is really necessary (if their base columns changed).

This optimization can, for instance, considerably speed up updates of tables with a generated column of the tsvector type since the to_tsvector() function is pretty expensive.

Allow trigger to get updated columns

This small patch adds the bitmap of updated columns to the TriggerData structure. General-purpose trigger functions, such as tsvector_update_trigger() or lo_manage(), can use this information to avoid doing extra work.

Using multiple extended statistics for estimates

In PostgreSQL 12, the planner could not use several extended statistics together for one table. For example: imagine the situation when there are two extended statistics on different sets of columns and a query contains columns both from one set and the other. The planner can now access all the available information.

Infrastructure for parallelization and COPY (See also this patch.)

Parallelism in PostgreSQL still works only for read queries. For write queries, there are issues, and one of them is locking the processes that perform the same task in parallel, all in a common parallel group). It is taken to be that locks of such processes do not conflict — for example: several processes can hold an exclusive lock on the same table. This requires the developers of the core to be especially scrupulous, otherwise, deadlocks would continuously arise.
But there are two exceptions:

  • Relation extension lock, which is acquired when new pages are added at the end of a data file.
  • Page lock, which is used when moving elements of a GIN index from the pending list to the main tree.

(This article provides more details.)
Locks like these must conflict even between processes in one parallel group — this is what this patch implements. But these locks can never cause deadlocks, therefore, they are not checked.

In general, nothing changes for users, but this patch is important because first, it paves the way to parallel INSERT and COPY and second, it eliminates one of the bottlenecks of PostgreSQL under heavy load (of which you can hear in this presentation at HL++).


EDH SKIP primes are replaced
This is about discontinuing use of EDH primes (Diffie-Hellman ephemeral keys) via the already outdated SKIP protocol.

Change authentication defaults for initdb
Default access settings for local and network connections at the start of initdb are changed. Now in pg_hba.conf, instead of the trust authentication method, peer will be used for local connections (or md5, if peer is not supported) and md5 for network connections. Initially, more liberal measures were discussed: a warning in the documentation. And then stricter ones were considered: scram-sha-256. As a result, it was decided not to go beyond peer and md5 so far.

Using explicit_bzero

This is a critical patch. OS functions bzero() and explicit_bzero() fill specified memory areas with bytes containing '\0' (for an example on Linux, refer here). These patches are only a beginning: there are quite a few memory areas where passwords and other sensitive information can remain. It was decided to begin with libpq, where an entire file with passwords can remain in memory after reading .pgpass, and with cleaning after closing a connection. be-secure-common.c now includes erasing a secret phrase that is input in SSL and that shows up in the error string (path).

Add password_protocol connection parameter to libpq

This patch enables libpq to check which password transfer protocol is used when connecting. After receiving this parameter, libpq will reject authentication if the protocol is weaker than the specified one. The default value of this parameter is plaintext, that is, all protocols will do.

Add mandatory access control for TRUNCATE

This patch allows extensions to build in mandatory access control (MAC) for the TRUNCATE operation. Now the sepgsql extension will also check the privileges for this operation. SELinux Reference Policy and Redhat-based Linux distributions do not support checks of permissions for db_table {truncate} by SELinux. In cases like this, sepgsql with deny_unknown equal to 1 will be used and TRUNCATE will not be performed.

Exposure related to GUC value of ssl_passphrase_command

This is a simple, but useful patch. Now the value of the ssl_passphrase_command parameter will be visible only to superuser. This parameter specifies an external command to be called when a password is required to decrypt an SSL file, such as a private key.


Simple collation versions for glibc

Databases store version numbers for ICU collations. For every use of a rule (collation, comparison of symbols) the stored version number is checked against the current OS version of the ISU library and in case of discrepancies, a warning is issued. This allows us to detect indexes that were built using a changed collation and may therefore appear incorrect and need rebuilding. After rebuilding the indexes using the ALTER COLLATION ... REFRESH VERSION command, the collation version in the database gets updated and warnings are no longer issued.

But this was the case only for ICU. Now the version number is also stored for libc collations:

SELECT collname, collversion FROM pg_collation WHERE collname = 'ru_RU';
 collname | collversion
 ru_RU    | 2.27

This enables us to issue warnings when the library changes in the OS. It is relevant in view of transitioning to glibc 2.28, where a lot of collations changed and appropriate indexes will need rebuilding.

But until we transition to 2.28, all is quiet:

NOTICE:  version has not changed

Full-text search

Full-text search for Greek language
No comment.

Add absolute value to dict_int

A capability to remove the sign of a number was added to the dict_int template dictionary (same as the extension).

SELECT ts_lexize('intdict', '-123');
(1 row)
SELECT ts_lexize('intdict', '-123');
(1 row)

So this time the absolute value is recognized.


BEFORE ROW triggers for partitioned tables

In version 12, row triggers BEFORE cannot be created on a partitioned table, while there is no problem to create them on separate partitions. Now the BEFORE FOR EACH ROW trigger created on a partitioned table will be automatically inherited and work for all partitions, but the partition key of the trigger on UPDATE will be able to vary only in the range of the current partition.

Support adding partitioned tables to logical replication publications

Earlier, addition of a partitioned table to a publication resulted in an error:

ERROR:  "p" is a partitioned table
DETAIL:  Adding partitioned tables to publications is not supported.
HINT:  You can add the table partitions individually.

Now this does work.

Advanced partition matching for partition-wise join

Starting with version 11, the planner can join partitioned tables in a partition-wise fashion, but only when partition bounds are exactly the same. Now the algorithm is improved: it will work when a partition of one table is fully contained in a partition of the other even if their sizes are different (for example: if one table is partitioned by days, and the other — by months). The new algorithm is functional for range and list partitioning.

Fix handling of full outer joins by partition-wise join

Partition-wise joins also work now for full outer joins with the USING clause.


There are no drastic advancements with respect to PostgreSQL 12 so far in this attractive and challenging, but difficult area. There are no ready-to-use pluggable starages (zheap and so on). The old friend heap) is still the only choice. But development of API is in progress.

Abstracting relation sizing code

Robert Haas rewrote the code by changing its architecture in favor of abstract forks in order to avoid code duplication in future. This part relates to estimate_rel_size — the size of table forks.

Allow table AM's to cache stuff in relcache

This patch makes the possibilities of time management for table access methods closer to those of index methods.
tableam vs. TOAST
TOAST is mainly designed for the heap storage concept, therefore, when table access methods are created, one can choose between two ways to go: whether help developers of the new methods to integrate there inserts, updates and deletes of TOAST rows, or delegate work with TOAST to the code that uses heap storage, which is traditional for PostgreSQL. A set of 5 patches to implement INSERT, UPDATE and DELETE operations uses slots of tuples and can help those going either way.


fsync error handling in pg_receivewal, pg_recvlogical

Struggle with the fsync() issue continues. PostgreSQL believes that a successful call to fsync() means that all data of the file is on disk, but sometimes this does not happen (it depends on the OS) and can cause data loss. It was decided that in PostgreSQL 13, we would need to handle the pg_receivewal and pg_recvlogical utilities. At present, the default behavior is like this: the utilities will log fsync errors, restore the connection and continue as if nothing ever happened. As a result, WAL will contain information about successfully copied files, which actually were not correctly copied to disk. So, it is better to terminate the utility. The future of pg_dumppg_basebackuppg_rewind and pg_checksums was also discussed, but it was decided not to go beyond the above two.

Safeguards against incorrect fd flags for fsync()
This patch checks whether the flags are set correctly when receiving the file descriptor for fsync() — directories are open only for reading, while files are open for writing or for both.

Backup and replication

Pause recovery if PITR target not reached

If WALs are over and recovery_target_time is not reached, the server stops the recovery and proceeds to a regular work. Now the behavior will be different. The recovery process will pause and put information into the log, and the administrator will be able to add missing WAL segments to continue the recovery.

Add ignore_invalid_pages parameter

If a recovery process comes across a reference to an incorrect page in a WAL record on a replica, panic happens. Turning on the ignore_invalid_pages parameter can help to handle it. The recovery will continue with possible loss of consistency or data, as well as with other severe consequences. The parameter is intended for server developers and must be used in cases where it is anyway needed to complete the recovery and run the replica by all means.

Online change of primary_conninfo

This Sergey Kornilov's patch allows us to change primary_conninfoprimary_slot_name and wal_receiver_create_temp_slot parameters without restarting the server. It's for this sake that in the 12-th release, the recovery.conf file was dropped.

Backup manifests

The pg_basebackup tool now creates a «manifest» — a JSON file that contains information on the backup copy created (filenames and file sizes, necessary WAL files, as well as checksums of anything and everything).
The new pg_validatebackup utility validates backup copies against the manifest and also uses the pg_waldump utility to check availability of WAL files needed for recovery and make sure they are not corrupted (only WAL files inside the backup copy itself rather than in the archive are meant).
This will allow us to detect situations when backup files got corrupted or disappeared or when recovery became impossible due to lack of necessary WAL files.

Restricting maximum keep segments by repslots

A replication slot is a convenient but dangerous technique: if a client does not read data from the slot in due time, unread WAL records may occupy all the server space. Now the max_slot_wal_keep_size parameter will enable us to impose a limitation on the maximum disk space that can be occupied by unread data. If at a next checkpoint, it appears that the size is exceeded, the slot is invalidated and the space gets free.


Support of Unix sockets on Windows
Windows 10 supports unix-domain sockets although they are turned off by default.


There are two new appendixes in the documentation.
Appendix M. Glossary appeared after a long discussion. There are currently 101 terms in the glossary.

As before, it is possible to colorize diagnostic messages of console utilities by means of the PG_COLOR variable. But now this is documented in Appendix N. Color Support. Peter Eisentraut's initial intent in this patch was to turn on colored output by default. And those who did not want this were suggested that they set the NO_COLOR variable. But opponents of the color differentiation appeared to be in the majority among those who discussed the patch. So it was decided to document available capabilities. And we got a new first-level documentation section.

Source: habr.com

Alex 10.6K
Joined: 4 years ago
In case you have found a mistake in the text, please send a message to the author by selecting the mistake and pressing Ctrl-Enter.
Comments (0)

    No comments yet

You must be logged in to comment.

Sign In / Sign Up