I'm trying to create a data import mechanism for a database that requires high availability to readers while serving irregular bulk loads of new data as they are scheduled.
The new data involves just three tables with new datasets being added along with many new dataset items being referenced by them and a few dataset item metadata rows referencing those. Datasets may have tens of thousands of dataset items.
The dataset items are heavily indexed on several combinations of columns with the vast majority (but not all) reads including the dataset id in the where clause. Because of the indexes, data inserts are now too slow to keep up with inflows but because readers of those indexes take priority I can not remove the indexes on the main table but need to work on a copy.
I therefore need some kind of working table that I copy into, insert into and reindex before quickly switching it to become part of the queried table/view. The question is how do I quickly perform that switch?
I have looked into partitioning the dataset items table by a range of dataset id, which is a foreign key, but because this isn't part of the primary key SQL Server doesn't seem make that easy. I am not able to switch the old data partition with a readily indexed updated version.
Different articles suggest use of partitioning, snapshot isolation and partitioned views but none directly answer this situation, being either about bulk loading and archiving of old data (partitioned by date) or simple transaction isolation without considering indexing.
Is there any examples that directly tackle this seemingly common problem?
<strong>What different strategies do people have for really minimizing the amount of time that indexes are disabled for when bulk loading new data into large indexed tables?</strong>Answer1:
Notice, that partitioning on a column requires the column to be part of the clustered index key, not part of the primary key. The two are independent.
Still, partitioning imposes lots of constraints on what you operations you can perform on your table. For example, switching only works if all indexes are aligned and no foreign keys reference the table being modified.
If you can make use of partitioning under all of those restrictions this is probably the best approach. Partitioned views give you a more flexibility but have similar restrictions: All indexes are obviously aligned and incoming FKs are impossible.
Partitioning data is not easy. It is not a click-through-wizard-and-be-done solution. The set of tradeoffs is very complex.