Section 29: Tuning PostgreSQL for Spatial¶
PostgreSQL is a very versatile database system, capable of running efficiently in very low-resource environments and environments shared with a variety of other applications. In order to ensure it will run properly for many different environments, the default configuration is very conservative and not terribly appropriate for a high-performance production database. Add the fact that geospatial databases have different usage patterns, and the data tend to consist of fewer, much larger records than non-geospatial databases, and you can see that the default configuration will not be totally appropriate for our purposes.
All of these configuration parameters can edited in the database configuration file, C:\Documents and Settings\%USER\.opengeo\pgdata\%USER. This is a regular text file and can be edited using Notepad or any other text editor. The changes will not take effect until the server is restarted.
An easier way of editing this configuration is by using the built-in “Backend Configuration Editor”. In pgAdmin, go to File > Open postgresql.conf.... It will ask for the location of the file, and navigate to C:\Documents and Settings\%USER\.opengeo\pgdata\%USER.
This section describes some of the configuration parameters that should be adjusted for a production-ready geospatial database. For each section, find the appropriate item in the list, double-click on the line to edit the configuration. Change the Value to the recommended value as described, make sure the item is Enabled, the click OK.
These values are recommendations only; each environment will differ and testing is required to determine the optimal configuration. But this section should get you off to a good start.
Defines the amount of memory that internal sorting operations and hash tables can consume before the database switches to on-disk files. This value defines the available memory for each operation; complex queries may have several sort or hash operations running in parallel, and each connected session may be executing a query.
As such you must consider how many connections and the complexity of expected queries before increasing this value. The benefit to increasing is that the processing of more of these operations, including ORDER BY, and DISTINCT clauses, merge and hash joins, hash-based aggregation and hash-based processing of subqueries, can be accomplished without incurring disk writes.
Default value: 1MB
Recommended value: 16MB
Defines the amount of memory used for maintenance operations, including vacuuming, index and foreign key creation. As these operations are not terribly common, the default value may be acceptable. This parameter can alternately be increased for a single session before the execution of a number of CREATE INDEX or VACUUM calls as shown below.
SET maintenance_work_mem TO '128MB'; VACUUM ANALYZE; SET maintenance_work_mem TO '16MB';
Default value: 16MB
Recommended value: 128MB
Sets the amount of memory used for write-ahead log (WAL) data. Write-ahead logs provide a high-performance mechanism for insuring data-integrity. During each change command, the effects of the changes are written first to the WAL files and flushed to disk. Only once the WAL files have been flushed will the changes be written to the data files themselves. This allows the data files to be written to disk in an optimal and asynchronous manner while ensuring that, in the event of a crash, all data changes can be recovered from the WAL.
The size of this buffer only needs to be large enough to hold WAL data for a single typical transaction. While the default value is often sufficient for most data, geospatial data tends to be much larger. Therefore, it is recommended to increase the size of this parameter.
Default value: 64kB
Recommended value: 1MB
This value sets the maximum number of log file segments (typically 16MB) that can be filled between automatic WAL checkpoints. A WAL checkpoint is a point in the sequence of WAL transactions at which it is guaranteed that the data files have been updated with all information before the checkpoint. At this time all dirty data pages are flushed to disk and a checkpoint record is written to the log file. This allows the crash recovery process to find the latest checkpoint record and apply all following log segments to complete the data recovery.
Because the checkpoint process requires the flushing of all dirty data pages to disk, it creates a significant I/O load. The same argument from above applies; geospatial data is large enough to unbalance non-geospatial optimizations. Increasing this value will prevent excessive checkpoints, though it may cause the server to restart more slowly in the event of a crash.
Default value: 3
Recommended value: 6
This is a unit-less value that represents the cost of a random page access from disk. This value is relative to a number of other cost parameters including sequential page access, and cpu operation costs. While there is no magic bullet for this value, the default is generally conservative. This value can be set on a per-session basis using the SET random_page_cost TO 2.0 command.
Default value: 4.0
Recommended value: 2.0
This is the parameter that controls the cost of a sequential page access. This value does not generally require adjustment but the difference between this value and random_page_cost greatly affects the choices made by the query planner. This value can also be set on a per-session basis.
Default value: 1.0
Recommended value: 1.0
After these changes are made, save changes and reload the configuration.
- This is done by right-clicking on the Server PostGIS (localhost:54321) in pgAdmin, selecting to Disconnect
- Clicking Shutdown in the OpenGeo Dashboard, then clicking Start.
- Finally reconnecting to the server in pgAdmin (right-click on the server and select Connect).
Table Of Contents
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License. Feel free to use this material, but we ask that you please retain the OpenGeo branding, logos and style.