
Best disk saving strategy for “replacement inserts”

Every day I delete hundreds of thousands of records from a large table, then I do some calculations (with new data) and replace every one of the records that I previously deleted. I thought doing a regular vacuum tbl would do the trick. I know it doesn't return disk space to the server, but (based on the pg docs) I thought that because I was inserting about as many records as I was deleting, I wouldn't lose any/much disk space. However, after moving the table to a different namespace (for an unrelated reason) the table went from 117GB to 44GB! So...

Is there a better strategy than this, so my table doesn't bloat:

    delete from tbl where ...etc...                  -- hundreds of thousands of rows removed
    insert into tbl (...etc...) values (...etc...)   -- hundreds of thousands of rows added back (fresh calcs)
    -- ... repeat the above about 10 times a day ...
    vacuum tbl

https://www.postgresql.org/docs/9.6/static/sql-vacuum.html

PostgreSQL 9.6

What I actually did to reduce the table size is in my answer here: "integer out of range and remaining disk space too small to convert id to bigint and other solutions"

Edit 1: The drawbacks of VACUUM FULL are too restrictive for me. I am processing data 24/7, so I can't have the exclusive lock it takes, and my available disk space is pretty limited at any point in time. I'm trying to go about this in a better way.

Answer1:

What you are looking for is "dead space equilibrium", as I like to call it. If you've got, say, 1M rows and you want to delete and replace 100k rows, you can do it in different ways. Let's suppose you delete 100k and insert 100k right away. The db won't have time to vacuum up those old dead rows, so now your 1M-row table has 100k dead rows in it. Over the next 24 hours vacuum will kick in and mark their space as reusable, and the next time you delete / insert, you'll create 100k more dead rows, then reuse (most of) the space freed by the previous 100k dead rows. Your 1M-row table now has ~100k dead rows again, which will get reused next time, and so on.
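
One way to check whether a table has reached this equilibrium is to sample PostgreSQL's statistics view pg_stat_user_tables after each cycle. This is a sketch assuming the table is literally named tbl, as in the question. n_dead_tup should rise after each delete/insert and drop back after (auto)vacuum, while the total size levels off instead of growing cycle after cycle.

```sql
-- Sample these numbers after each delete/insert cycle ('tbl' is the name used in the question).
-- n_live_tup / n_dead_tup are estimates maintained by the statistics collector.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size  -- heap + indexes + TOAST
FROM pg_stat_user_tables
WHERE relname = 'tbl';
```

If total_size keeps climbing while n_dead_tup stays high between cycles, vacuum is not keeping up with the delete/insert rate.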

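You want to reach a point where your deletes/inserts (or updates) and vacuum are creating / reclaiming dead tuples at the same rate. Once they are, the table stops growing and settles at roughly its live size plus one batch of dead space.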
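
Here is a minimal sketch of reaching that even rate with the question's workload. The table below is a hypothetical stand-in (its columns and the batch filter are invented for illustration, not the asker's schema). The idea is to run a plain VACUUM after each DELETE and before the matching INSERT, instead of once after all ten cycles. Plain VACUUM takes only a SHARE UPDATE EXCLUSIVE lock, so normal reads and writes continue. It can only reclaim rows that no still-open transaction can see, so long-running transactions will keep it from freeing space.

```sql
-- Hypothetical stand-in for the question's "tbl"; columns are illustrative only.
CREATE TABLE IF NOT EXISTS tbl (
    id    bigserial        PRIMARY KEY,
    batch integer          NOT NULL,
    val   double precision NOT NULL
);

-- One replacement cycle, run in autocommit mode
-- (VACUUM cannot run inside a transaction block).
DELETE FROM tbl WHERE batch = 1;   -- the old rows become dead tuples
VACUUM tbl;                        -- frees their space for reuse (if no open snapshot still needs them)
INSERT INTO tbl (batch, val)       -- the fresh rows can now fill the freed space
SELECT 1, random()
FROM generate_series(1, 100000);

-- Optional alternative: let autovacuum fire after about one batch of dead rows
-- instead of the default 20% of the table (these values are assumptions to tune).
ALTER TABLE tbl SET (autovacuum_vacuum_scale_factor = 0.0,
                     autovacuum_vacuum_threshold   = 100000);
```

Repeating this cycle about ten times a day should keep the dead space near one batch. Without it, several batches can pile up before the single VACUUM at the end.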
