Modern enterprises accumulate a large amount of information stored in databases. One way to deal with a growing database is to partition the database and distribute the data over several database partitions. Database partitioning allows parallel processing of database query and other database administration tasks (e.g. database backup), and database performance is improved. On the other side, more database partitions generate additional processing and communication overhead. In current database systems the data (tables and indexes) are coupled with the DB partitions. There is no simple method available to assign data to another DB partition, and a fast data redistribution method is needed to move the data from existing database partition schema into the new database partition schema. This article proposes an efficient method for assigning existing data volumes to a new database partition schema without data redistribution.Thus the shared nothing concept will be enhanced by the flexibility to adopt the number of database partitions to the database size without any data redistribution.
Method and system assigning clustered data to database partitions without data redistribution
Modern enterprises accumulate a large amount of information stored in databases. As the information in these enterprises grows dramatically, the databases grow as well. One way to deal with a growing database is to partition the database and distribute the data over several database partitions. Each database partition (DB partition) has dedicated system resources (e.g. CPUs, memory, I/O adapters and storage devices). This concept is known as shared nothing concept for databases.
Database partitioning allows parallel processing of database query and other database administration tasks (e.g. database backup). Database performance is improved with parallel processing. On the other side, more database partitions generate additional processing and communication overhead. To achieve the optimal ratio between total processing overhead and database performance, the number of database partitions should grow with the amount of data in the database.
But in current database systems the data (tables and indexes) are coupled with the DB partitions. There is no simple method available to assign data to another DB partition. Long running data redistribution processes have to run to change data distribution over database partitions. Data has to be moved physically record by record with database specific data redistribution functions, which includes much database processing overhead and need a lot of system resources. For that reason a fast data redistribution method is needed to move the data from existing database partition schema into the new database partition schema.
This publication relates to a method for database management systems and, more particularly, to a method for efficient and flexible assignment of existing data volumes to a new database partition schema without the conventional data redistribution .
A database may be partitioned into one or more database partitions. Each database partition (DB partition) has database server resources assigned, like CPUs, memory and I/O devices. The data of a database is physically stored in a data storage, which could be local disks of a database server or one or more separate data storage devices connected to a database server.
The data within a relational database are organized in relational database tables and indexes. The records of a database table are grouped in disjunct sets, which are called "data clusters". One or more columns of the database table are used to determine the data cluster, in which a record of a database table is stored. These columns are called cluster key columns. A hash or value range function can be applied on the cluster key columns of a table record to determine the data cluster, in which the table record will be stored.
All records of a data cluster have t...