SQL Server 2000 Performance Guidelines - Schema
From Guidance Share
- J.D. Meier, Srinath Vasireddy, Ashish Babbar, Sharon Bjeletich and Alex Mackman
Devote the Appropriate Resources to Schema Design
Too many organizations design tables at the last minute when the tables are needed for their queries. Take the time and devote the resources that are needed to gather the business requirements, to design the right data model, and to test the data model. Make sure that your design is appropriate for your business and that the design accurately reflects the relationships between all objects. Changing a data model after your system is already in production is expensive, time consuming, and inevitably affects a lot of code.
Separate OLAP and OLTP Workloads
OLAP and OLTP workloads on one server have to be designed to not impede each other. OLAP and reporting workloads tend to be characterized by infrequent, long-running queries. Users are rarely waiting impatiently for the queries to complete. OLTP workloads tend to be characterized by lots of small transactions that return something to the user in less than a second. Long-running queries for analysis, reports, or ad-hoc queries may block inserts and other transactions in the OLTP workload until the OLAP query completes.
If you need to support both workloads, consider creating a reporting server that supports the OLAP and reporting workloads. If you perform lots of analysis, consider using SQL Server Analysis Services to perform those functions.
Normalize First, Denormalize for Performance Later
You achieve a good, logical database design by applying normalization rules to your design. Normalization provides several benefits such as reducing redundant data. When you reduce redundant data, you can create narrow and compact tables. However, overnormalization of a database schema may affect performance and scalability. Obtaining the right degree of normalization involves tradeoffs. On the one hand, you want a normalized database to limit data duplication and to maintain data integrity. On the other hand, it may be harder to program against fully normalized databases, and performance can suffer.
Addresses are one part of a data model that is typically denormalized. Because many systems store multiple addresses for companies or people over long periods of time, it is relationally correct to have a separate address table and to join to that table to always get the applicable address. However, it is common practice to keep the current address duplicated in the person table or even to keep two addresses because this type of information is fairly static and is accessed often. The performance benefits of avoiding the extra join generally outweigh the consistency problems in this case.
The following denormalization approaches can help:
- Start with a normalized model, and then denormalize if necessary. Do not start with a denormalized model and then normalize it. Typically, each denormalization requires a compensating action to ensure data consistency. The compensating action may affect performance.
- Avoid highly abstracted object models that may be extremely flexible but are complex to understand and result in too many self-joins. For example, many things can be modeled by using an Object table, an Attributes table, and a Relationship table. * This object model is very flexible, but the self-joins, the alias joins, and the number of joins become so cumbersome that it is not only difficult to write queries and understand them, but performance and scalability suffer. For an abstract object model, try to find some common object types that can be used as subtypes under the generic Object type, and then try to find the best balance between flexibility and performance.
Define All Primary Keys and Foreign Key Relationships
Primary keys and foreign key relationships that are correctly defined help ensure that you can write optimal queries. One common result of incorrect relationships is having to add DISTINCT clauses to eliminate redundant data from result sets.
When primary and foreign keys are defined as constraints in the database schema, the server can use that information to create optimal execution plans.
Declarative referential integrity (DRI) performs better than triggers do, and DRI is easier to maintain and troubleshoot than triggers are. DRI is checked by the server before the server performs the actual data modification request. When you use triggers, the data modification requests are inserted in the Inserted and Deleted temporary system tables, and the trigger code is run. Depending on the trigger code, the final modifications are then made or not made,
The sample screen shot in Figure 14.2 shows an execution plan that accesses only one table, although two tables are included in a join in the query. Because there is a declared foreign key relationship between the authors table and the titleauthor table, and the au_id column in the titleauthor table is not allowed to be null, the optimizer knows it does not need to access the authors table to resolve the query. The result of the SET STATISTICS IO command also shows that the authors table is never accessed.
Figure 14.2: Sample execution plan
Define All Unique Constraints and Check Constraints
Unique constraints and check constraints provide more information for the optimizer to use to create optimal execution plans. A unique constraint gives the optimizer information about the expected results. A check constraint can be used to determine whether a table or index has to be accessed to find a result set.
Figure 14.3 shows a query that references a table that is not scanned at execution time because the optimizer knows from the check constraint that no rows can be returned. To try this example, create a check constraint on the Quantity column that allows only values greater than zero. The SET STATISTICS IO command output shows no physical or logical reads and a scan count of zero. The output shows this because the constraint information answered the query.
Figure 14.3: Example of a check constraint that prevents unnecessary reads
For more information, see MSDN article, "SET STATISTICS IO," at http://msdn.microsoft.com/library/en-us/tsqlref/ts_set-set_0q0f.asp.
Choose the Most Appropriate Data Type
Choose the most appropriate data type, with the appropriate size and nullability. Consider each of the following when you are choosing a data type:
- Try to choose the smallest data type that works for each column. Also, choose the most appropriate type because both explicit and implicit conversions may be costly in terms of the time that it takes to do the conversion. They also may be costly in terms of the table or index scans that may occur because the optimizer cannot use an index to evaluate the query.
- Try to avoid nullable foreign key columns to limit the amount of outer joins that might need to be written. Outer joins tend to be more expensive to process than inner joins. If there are cases where the foreign key value may not be known, consider adding a row in the other table that would be the unknown case. Some database architects use one row for the unknown case, one row for the case that is not applicable, and one row for the case that is not determined yet. This approach not only allows for inner joins rather than outer joins, but it provides more information about the actual nature of the foreign key value.
- Columns that use the text data type have extra overhead because they are stored separately on text/image pages rather than on data pages. Use the varchar type instead of text for superior performance for columns that contain less than 8,000 characters.
- The sql_variant data type allows a single column, parameter, or variable to store data values of different data types like int and nchar. However, each instance of a sql_variant column records the data value and additional metadata. The metadata includes the base data type, maximum size, scale, precision, and collation. While sql_variant provides flexibility, the use of sql_variant affects performance because of the additional data type conversion.
- Unicode data types like nchar and nvarchar take twice as much storage space compared to ASCII data types like char and varchar. The speed factors specific to SQL Server are discussed in the article referenced in the following "More Information" section. However, note that strings in the Microsoft .NET Framework and in the Microsoft Windows 2000 kernel are Unicode. If you need or anticipate needing Unicode support, do not hesitate to use them.
For more information, see the "Performance and Storage Space" section of "International Features in Microsoft SQL Server 2000" on MSDN at http://msdn.microsoft.com/library/en-us/dnsql2k/html/intlfeaturesinsqlserver2000.asp.
Use Indexed Views for Denormalization
When you have joins across multiple tables that do not change frequently, such as domain or lookup tables, you can define an indexed view for better performance. An indexed view is a view that is physically stored like a table. The indexed view is updated by SQL Server when any of the tables that the indexed view is based on are updated. This has the added benefit of pulling I/O away from the main tables and indexes.
Partition Tables Vertically and Horizontally
You can use vertical table partitioning to move infrequently used columns into another table. Moving the infrequently used columns makes the main table narrower and allows more rows to fit on a page.
Horizontal table partitioning is a bit more complicated. But when tables that use horizontal table partitioning are designed correctly, you may obtain huge scalability gains. One of the most common scenarios for horizontal table partitioning is to support history or archive databases where partitions can be easily delineated by date. A simple method that you can use to view the data is to use partitioned views in conjunction with check constraints.
Data-dependent routing is even more effective for very large systems. With this approach, you use tables to hold partition information. Access is then routed to the appropriate partition directly so that the overhead of the partitioned view is avoided.
If you use a partitioned view, make sure that the execution plan shows that only the relevant partitions are being accessed. Figure 14.4 shows an execution plan over a partitioned view on three orders tables that have been horizontally partitioned by the OrderDate column. There is one table per year for 1996, 1997, and 1998. Each table has a PartitionID column that has a check constraint. There is also a partition table that includes a PartitionID and the year for that partition. The query then uses the partition table to get the appropriate PartitionID for each year and to access only the appropriate partition.
Although the graphical query plan includes both tables in the plan, moving the mouse over the Tip on the Filter icon shows that this is a start filter, as seen in the STARTUP clause in the argument of the filter. A start filter is a special type of filter that you want to see in plans that use partitioned views.
Figure 14.4: An execution plan that shows the filter details
Note that the SET STATISITCS IO output shown in Figure 14.5 shows that only the Orders98 table was actually accessed.
Figure 14.5: SET STATISTICS IO output
For more information about the graphical execution plan, see "Graphically Displaying the Execution Plan Using SQL Query Analyzer" on MSDN at http://msdn.microsoft.com/library/en-us/optimsql/odp_tun_1_5pde.asp.