Doing things right or do right things? How to find row count of every table in database efficiently.


One Data Engineer had to replicate data from one well-known database vendor to a less-known database vendor.

He used select count(*) to validate that tables row counts were equal on the source and target.

It worked so slowly, that he got fired without ever knowing whether the table's content was equal or not.

Often laziness is a first step towards efficiency. Rather than doing count(*) on each table, the unfortunate DBA could have used internal statistics that every decent database vendor is maintaining, stored in the system views.

Need to take into consideration that it will never be 100% accurate and will depend on a few things

  • How often do database objects change
  • What is the internal or manual schedule for statistics refresh.
  • For a lot of database vendors, statistics will get refreshed automatically only when the changed data is more than 10% of the total table rows but this is usually configurable per table.
  • The percentage of rows used to calculate the statistics. The most accurate statistics are gathered when all rows in the table are processed (i.e. a 100% sample)
  • How often DBA have scheduled custom statistics to refresh.
  • How much approximation each database vendor allows themselves to have.
  • Keep in mind that there are usually no statistics on external tables

I will list here system objects/views that can be utilized to view table sizes for some database vendors.





















Will be happy to hear in comments if I have touched anything that is alive in this cosmos.

Yours,

Maria


"Do what you do so well that they will want to see it again and bring their friends " / Walt Disney.



Comments

Popular posts from this blog

Unlocking Microsoft Fabric: A Simple Guide when you only have a personal account.

ETL to ELT journey: Break free your Transformations and discover Happiness and Data Zen

The backbone your data pipelines have been waiting for.