
Published at Mar 6, 2024

Database Design

PostgreSQL Performance Tuning: Designing and Implementing Your Database Schema


Authors: Emery Mizero and Carlota Soto

So far, in this blog series about PostgreSQL performance tuning, we’ve covered how to size your database, the key parameters you need to optimize, and database indexes. Now, we’re adding one more piece of the puzzle—schema design.

As a Support Engineer at Timescale, one of the topics I get asked most about is how to design a database schema: essentially, the database skeleton that defines how data is stored in your relational database. Optimizing your schema design is fundamental to scaling PostgreSQL databases. A good schema will help you achieve optimal query performance, minimize I/O operations, ensure efficient CPU and memory utilization for backend processes—and even reduce storage requirements.

We’ve already discussed common design choices for your PostgreSQL tables (narrow vs. medium vs. wide table) and considerations for your overall database design (single-table vs. multiple-table) in some of our previous articles. One of the most significant takeaways we shared is that to decide between narrow/medium/wide or single/multiple tables, you’ll have to determine what’s most important for you (e.g., ease-of-use, cost of up-front design, extensibility, the need for multi-tenancy) and pick the design that has the most potential according to your priorities.

Today, we’ll keep building on the topic of schema design by presenting concrete implementation considerations to optimize query performance and storage efficiency while scaling PostgreSQL.

A scaling database is like a living product: before finalizing your schema and sending it to the manufacturing floor, you must have a conversation about how your data is collected, how it will be ingested, and ultimately, how it will be accessed by downstream consumers. This ensures that your product (a.k.a. your database) will be up to the task of handling the expected workload, avoiding failures and unexpected bottlenecks that may be hard to resolve once you are up and running in production.

Let’s dive straight into it.

Best Practices for Designing Your Database Schema

If you can, avoid too many backfills or updates

At Timescale, we’re experts on time-series data (or data indexed by time), so we’re very familiar with what are traditionally described as append-only workloads. Common types of time series include metrics, events, IoT data, and stock prices. Each piece of data captures information about something useful at a given point in time; therefore, it is rarely updated.

Ingesting append-only data will give you optimal performance, especially if you’re adding data chronologically (or to the most recently created partitions). Append-only ingestion involves adding new records to the end of a table without updating or deleting existing rows. This pattern minimizes the overhead associated with updating existing data and maintaining indexes: since no existing data is modified, there's less contention and less work for the system.

As support engineers who’ve seen many different use cases, we recommend always trying to write to the most recent partitions (or chunks, in Timescale terminology) and sizing them so they fit in memory. Writing data in this manner ensures that a consistent and predictable portion of the data (the most recent) remains hot.

If you can design your schema to accommodate such an ingestion pattern, your insert performance will be at its best. If you’re using partitioning (as hypertables do), determining the right partition size will be key to doing so. For advice on partition size, check out our recent blog post on the topic.
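As a rough illustration, here is a minimal sketch of how you might control partition size in Timescale. The metrics hypertable and the intervals are hypothetical; the right interval depends on your ingest rate and available memory.

-- Create a hypertable with daily chunks (hypothetical "metrics" table).
SELECT create_hypertable('metrics', 'time', chunk_time_interval => INTERVAL '1 day');

-- If ingest volume grows and the most recent chunk (plus its indexes) no longer
-- fits in memory, shrink the interval used for newly created chunks.
SELECT set_chunk_time_interval('metrics', INTERVAL '6 hours');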

What happens when not all your data is ingested chronologically (e.g., if you’re migrating an existing dataset) and backfilling is necessary? Ingestion with backfill usually implies a performance hit, depending on the range covered by the incoming data and the number of partitions targeted by these inserts.

The likely problem in backfill scenarios is memory eviction: if your data does not fit in memory or its access pattern targets many different partitions, you will encounter frequent memory eviction, which can significantly degrade your data ingestion.

To mitigate this issue with backfilling, consider the following best practices:

  • Try to backfill data during periods when the database load is lowest. This ensures that normal operations are least affected by the intensive backfilling process.

  • Instead of inserting one record at a time, batch the data together: that is, insert hundreds or thousands of rows per INSERT statement. This helps reduce the transactional overhead and can increase the efficiency of data ingestion (see the sketch after this list).

  • If the data to be backfilled can be processed in parallel, consider using parallel insert processes targeting different partitions. This helps utilize system resources more efficiently. If you’re using Timescale, you can do this with the timescaledb-parallel-copy tool.

  • If you have indexes that aren't crucial for the current operations, consider dropping them before the backfill and recreating them afterward. Every index update can slow down data insertion, as we covered in the previous article of this series.

  • Consider temporarily increasing shared_buffers during the backfill process. This might help reduce memory evictions, but keep the overall recommendations on shared_buffers in mind (and remember to revert to your default once you’re done backfilling).

  • You can also temporarily adjust work_mem. Again, make sure to do it carefully while considering potential drawbacks.

  • Another thing you could look into is the write-ahead log (WAL). During any modification to the data, PostgreSQL first writes those changes to the WAL, ensuring that you can recover the database using this log even in the event of a crash. However, when you're backfilling large volumes of data, the default WAL configuration might not be optimal and can become a bottleneck. You could consider increasing wal_buffers, allocating more memory for buffering WAL data so that more transactional changes can be held in memory before a flush is needed. commit_delay is another interesting parameter to adjust, as it enables more transactions to be grouped together.
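To make a few of these tips concrete, here is a minimal backfill sketch. The table name, index name, and values are hypothetical, and the session-level setting shown is something you would tune (and revert) for your own workload.

-- Session-level tweak for the backfill session only; revert (or disconnect) when done.
SET work_mem = '256MB';

-- Drop a non-essential index before the backfill to avoid per-row index maintenance
-- (hypothetical "metrics_device_idx" index on a hypothetical "metrics" table).
DROP INDEX IF EXISTS metrics_device_idx;

-- Batch many rows into a single INSERT instead of one statement per row.
INSERT INTO metrics (time, device_id, value) VALUES
  ('2023-01-01 00:00:00+00', 1, 20.1),
  ('2023-01-01 00:00:05+00', 2, 19.7),
  ('2023-01-01 00:00:10+00', 3, 21.3); -- ...extend to hundreds or thousands of rows per statement

-- Recreate the index once the backfill is complete.
CREATE INDEX metrics_device_idx ON metrics (device_id, time DESC);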

Lastly, let’s briefly cover another scenario: ingestion with updates. This is often accompanied by the ON CONFLICT clause and requires a uniqueness constraint that must be checked on every insert. Just like backfill, the window of time and the order in which your data is inserted will determine whether you have aggressive memory eviction of more recent data or more desirable memory eviction of old data that has already been synced to disk.

The third tip in one of our timeless blog posts—13 Tips to Improve PostgreSQL Insert Performance—touches on why avoiding data uniqueness checks during ingestion can improve your performance.
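For illustration, here is a hedged sketch of that ingestion-with-updates pattern, using a hypothetical readings table. The unique constraint is what forces a check on every insert.

CREATE TABLE readings (
    device_id integer     NOT NULL,
    time      timestamptz NOT NULL,
    value     double precision,
    UNIQUE (device_id, time)  -- this constraint must be checked on every insert
);

-- Upsert: insert the row, or update it if the (device_id, time) pair already exists.
INSERT INTO readings (device_id, time, value)
VALUES (42, '2024-03-06 00:00:00+00', 20.5)
ON CONFLICT (device_id, time)
DO UPDATE SET value = EXCLUDED.value;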

Design your schemas around your query pattern

Query patterns have a profound influence on schema design in PostgreSQL. When designing your schema, it's essential to understand the most common and critical operations your database will perform. I have many years of experience querying data from very large PostgreSQL databases, and I have always relied on the following guidelines:

  • Leverage highly selective indexes. As we covered in the previous article of this series, indexing is an efficient way to speed up data retrieval operations in a database. A highly selective index is one that filters out a large portion of a table's rows, drastically reducing the search space for a query and allowing PostgreSQL to quickly locate and access the relevant rows.

  • Return a reasonable resultset (generally below 10,000 records). The volume of data returned by a query not only impacts the time it takes to retrieve the data but also affects the bandwidth, memory, and processing resources on both the server and the client side. By limiting the resultset to a reasonable number, you ensure that the database isn't overburdened. You can put this into practice by using the LIMIT clause or by implementing pagination using LIMIT and OFFSET (see the sketch after this list). This might be worth optimizing from your application layer as well, for example, by encouraging users to provide specific filtering criteria when running a query (e.g., prompting them to select a specific date range when fetching a metric).

  • Bring the amount of intermediate data during query processing as close as possible to the number of records in the final resultset. Every SQL query goes through multiple processing phases before arriving at the final result. If intermediate stages handle a lot of data, it can lead to increased I/O operations, higher memory usage, and prolonged processing times. Strive to design queries in such a way that the intermediate data volume aligns closely with the size of the final resultset. This can be achieved by filtering data early in the query or by judiciously using joins and subqueries.

  • Cache computed data where necessary. Recomputing data repeatedly can be resource-intensive, especially for complex calculations or aggregations. Instead of performing these calculations every time data is queried, consider caching the computed results.
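As a quick illustration of these guidelines, here is a hedged query sketch against a hypothetical metrics table: it filters early on a selective, index-friendly predicate and caps the resultset with LIMIT/OFFSET pagination.

SELECT time, device_id, value
FROM metrics
WHERE device_id = 42                     -- highly selective, index-friendly filter
  AND time >= now() - INTERVAL '1 day'   -- filter early to keep intermediate data small
ORDER BY time DESC
LIMIT 1000 OFFSET 0;                     -- first page; the next page would use OFFSET 1000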

If you are using Timescale, the guidelines above translate into these very concrete best practices:

  • Query hypertables for recent data or specific time windows. Queries accessing the most recent data or specific short time windows will perform optimally in hypertables—you can query them directly.

  • Query continuous aggregates for analytical queries over larger windows of time or large portions of historical data. For queries that require looking at longer time intervals (e.g., monthly averages), use continuous aggregates (see the sketch after this list). They allow you to pre-aggregate data, making it much faster and more efficient to retrieve analytics over historical data. They also allow you to drop the original raw data in the parent hypertable if required for storage savings.

  • Leverage user-defined actions for complex computations. By doing so, you can offload complex computations to be executed as data comes in, storing the results in a separate cache or result table so they can be performed on smaller datasets at a time. These computations could involve complex joins, mathematical operations, or even machine learning predictions. Once the computations are stored, your application can query the cache table for a snappy response.
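Here is a minimal continuous aggregate sketch, assuming a hypothetical metrics hypertable: it pre-aggregates hourly averages so analytical queries over long time ranges don't have to scan the raw data, and a refresh policy keeps it up to date.

CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS bucket,
       device_id,
       avg(value) AS avg_value
FROM metrics
GROUP BY bucket, device_id;

-- Refresh the aggregate every hour, covering data between 3 hours and 1 hour old.
SELECT add_continuous_aggregate_policy('metrics_hourly',
    start_offset      => INTERVAL '3 hours',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');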

Partitioning is your friend (and hypertables are your superhero)

We couldn’t give advice on schema design for large PostgreSQL tables without mentioning partitioning.

Since its inception, Timescale has worked to address challenges that developers tend to encounter while working with time-series data, including how to scale PostgreSQL to deal with the ingestion and retention of massive amounts of data. Database partitioning is a common way in which PostgreSQL users solve the problem of very big tables that get sluggish. This article dives deep into the topic of partitioning and when you should consider it.

In terms of how to implement partitioning for your large PostgreSQL tables without complicating your life, hypertables are the answer. When you create a hypertable (which looks just like a regular PostgreSQL table), partitions will be automatically created as you insert data and transparently deleted after some time if you set up a retention policy.

Hypertables can help you boost your ingest and query performance significantly (especially if you’re dealing with large tables and/or high ingest rates), and they can also save you tons of time in data management.
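As a minimal sketch (assuming an existing, still-empty metrics table with a time column), turning a regular table into a hypertable and attaching a retention policy is just two function calls:

-- Convert the table into a hypertable partitioned by time.
SELECT create_hypertable('metrics', 'time');

-- Automatically drop chunks once all of their data is older than 90 days.
SELECT add_retention_policy('metrics', INTERVAL '90 days');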

Design your schemas to minimize storage cost

In PostgreSQL, the efficient use of storage space is influenced by column ordering due to alignment padding, which is determined by the size of column types. For instance, an eight-byte bigint type will begin its storage at an address that's a multiple of eight. Similarly, a four-byte integer type will start at an address that's a multiple of four. This pattern is consistent with the two-byte smallint and the one-byte char types.

Consider a scenario where a smallint (two bytes) is placed right before a bigint (eight bytes), starting at address zero. The smallint occupies addresses zero and one. The subsequent bigint, however, won't begin at address two, since that address is not eight-byte aligned. Instead, it starts at address eight, spanning to address 15, leaving addresses two through seven unutilized. These vacant addresses could have accommodated types like integer, smallint, or char.

To optimize storage and minimize wasted space due to alignment padding, it's advisable to arrange columns in the table definition from largest to smallest data type.

In the example below, the ‘bad_column_order’ hypertable uses eight additional bytes per row of data compared to the ‘good_column_order’ hypertable (56 vs. 48 bytes), which translates into roughly 14 percent space savings per row for the better ordering.

create table bad_column_order (
    id    bigint                   not null,
    val_a smallint                 not null,
    val_b integer                  not null,
    val_c smallint                 not null,
    time  timestamp with time zone not null
);

create index bad_column_order_time_idx on bad_column_order(time desc);

select create_hypertable('bad_column_order', 'time', chunk_time_interval => INTERVAL '1 day');

insert into bad_column_order (id, val_a, val_b, val_c, time)
values (1, 15, 123456789, 50, now());

select pg_column_size(bad_column_order.*) from bad_column_order;

 pg_column_size
----------------
             56

create table good_column_order (
    id    bigint                   not null,
    time  timestamp with time zone not null,
    val_b integer                  not null,
    val_a smallint                 not null,
    val_c smallint                 not null
);

create index good_column_order_time_idx on good_column_order(time desc);

select create_hypertable('good_column_order', 'time', chunk_time_interval => INTERVAL '1 day');

insert into good_column_order (id, val_a, val_b, val_c, time)
values (1, 15, 123456789, 50, now());

select pg_column_size(good_column_order.*) from good_column_order;

 pg_column_size
----------------
             48

When Your Use Case Demands Multiple Tables: Extra Considerations

A single table is often sufficient to deal with multi-tenant scenarios; we often recommend that the separation of data be handled at the application level. However, there can be operational requirements and regulations that demand such separation be pushed down to database objects (or even to different databases entirely), and thus that the data be split across many tables.

For considerations on the schema design choice of single table vs. multiple, review our previous article on the topic. If you choose to have multiple tables or hypertables, consider the following:

Beware of vacuuming

If you run a large number of updates/deletes against multiple tables or hypertables, you might need to pay close attention to optimizing your vacuum strategy. Vacuuming hundreds, if not thousands, of hypertables is often associated with sustained increases in CPU utilization, which may severely impair your overall database performance. You don’t want this to happen.

This topic deserves its own article—stay tuned for that. In the meantime, we hope this helps: you can build a BPFtrace program to monitor your PostgreSQL vacuum operations.
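A couple of standard PostgreSQL statistics views also give you a quick, first-pass picture of vacuum activity; this sketch is generic and not Timescale-specific.

-- Tables autovacuum has visited most often, plus their current dead-tuple counts.
SELECT relname, n_dead_tup, last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
ORDER BY autovacuum_count DESC
LIMIT 20;

-- Vacuum operations currently in progress.
SELECT pid, relid::regclass AS table_name, phase, heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_vacuum;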

Don’t run out of background workers

If you are a Timescale user leveraging most of the features that Timescale has to offer, each hypertable will be accompanied by a number of continuous aggregates and retention policies. Running each one of these processes consumes one of your allocated worker processes, configured under ‘timescaledb.max_background_workers’, which is, in turn, a subset of ‘max_worker_processes’ (a parameter we covered in part II of this series).

If you have many hypertables and, therefore, more policies/jobs, the likelihood that multiple policies will run at the same time increases, potentially exhausting your background workers and leading some of the policies to fail. To address this problem, we released a feature in TimescaleDB 2.9.0 that allows you to create policies with fixed schedules and stagger them enough to guarantee that the number of jobs running simultaneously will never exceed the timescaledb.max_background_workers config value.

That said, you may be limited by the amount of time it takes to run a job and the job’s schedule interval. For instance, assume that you have multiple policies where each one takes five seconds to complete and is scheduled to run every minute. If you can only consume two background workers (i.e., only two jobs can run at any given time), then you can have at most 24 such policies defined. Completion time for jobs may vary, but a straightforward staggering implementation would be based on each job’s expected longest time to complete.
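Here is a hedged staggering sketch using user-defined actions, assuming TimescaleDB 2.9 or later. The procedure names are hypothetical; the idea is simply that two fixed-schedule jobs running every minute start 30 seconds apart, so they never pile up on the same background workers.

-- Two fixed-schedule jobs, offset by 30 seconds.
SELECT add_job('refresh_cache_a', INTERVAL '1 minute',
               initial_start  => '2024-03-06 00:00:00+00',
               fixed_schedule => true);

SELECT add_job('refresh_cache_b', INTERVAL '1 minute',
               initial_start  => '2024-03-06 00:00:30+00',
               fixed_schedule => true);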

To continue reading on how to configure timescaledb.max_background_workers in relation to max_parallel_workers and max_worker_processes, check our documentation on the topic and our previous article.

Conclusion

Of all the things you can do to fine-tune your PostgreSQL performance, adequately designing your database schema is one of the most impactful, but it is also the one that raises the most doubts. As a support engineer, I hope to have shed some light on the best practices you should follow, including tips for leveraging Timescale functionality if you’re a Timescale user.

If you have any questions and happen to be a Timescale user, remember that you can always turn to us for advice. Shoot us a message at [email protected], even if you’re still on a trial.
