Gethyn Ellis, SQL Consultant, Northdoor
8th October 2019
There is a saying in the aviation industry that “any fool can fly a plane, it’s just that hardly anyone can land the thing”. I think you can adapt that analogy to the world of databases. “Anyone can build a database but not everyone can make it perform well.” This is especially true if your database platform comes with an easy to use graphical user interface such as SQL Server’s management studio.
If this is the situation you find yourself in, what can you do to make your database perform to the best of its ability?
Get your initial design correct and you will save a lot of time and effort (and headaches) later.
In this post, I will assume that you are building or managing an Online Transaction Processing (OLTP_ database, one that has regular read and writes and likely it would be connected to an application or website. Such as an insurance claims system or HR system. Rather than a data warehouse for example which is more for analysing large amounts of static data and is, therefore, will very likely to have a different design.
If you are working with or building an OLTP database hopefully you have heard of the term normalisation? Ultimately this is the process that underpins all OLTP databases and will ensure that your tables are structured correctly. This Wikipedia page is a good starting point and also the tutorial on W3 Schools is useful too
When it comes to designing your tables there are some important rules that can often be overlooked by developers.
Firstly, make sure you are using the most appropriate data types. For example, if you need to hold a date then use a date field, never use a text field. Globally dates are written in all sorts of different ways, for example in North America they write their dates with the month first (October 2nd 2019 or 10/2/2019), as opposed to day first which is the convention we follow in the UK (2nd October 2019 or 2/10/2019). As you can see from this example, this has the potential to lead to all sorts of issues of misinterpretation and confusion. From a performance perspective, the continuous conversion of data types can have an overhead on the database optimiser leading to bad performance. So use the correct data type for the data being held.
Also, ensure your fields and data types are the right size for your expected data. Using date as an example, if you only need to hold a date and time to the nearest second use the smalldatetime or datetime2. If you use datetime you will be storing the time to the nearest millisecond taking up valuable disk space. The same goes for integers, do you need an int, bigint or smallint. These are all database design decisions. By using the correct size data type you are potentially reducing the number of reads needed to retrieve data, and this helps to improve performance.
In transactional systems where a table data will be re-read/updated or deleted it is advisable to assign a Primary Key to the table. Ideally, the primary key should be a narrow column and be written to sequentially. IDENTITY columns provide a mechanism to auto-generate values and most often tend to be primary keys. Primary Keys are not only important in ensuring data integrity but so often critical to ensuring database performance. By default creating a primary key on a table in SQL Sever will create a clustered index on the primary column(s). This can really help the optimizer retrieve data and really help performance.
When it comes to text data, we have an extra complication of Unicode versus non Unicode n(var)char or (var)char. Unicode datatypes such as nvarchar take more space to hold the character but can use this space to hold many different types of characters. If you don’t expect to hold data with characters from non-Latin languages then don’t waste all that space with a Unicode field. A word of caution is also that whilst you may see datatypes text and ntext these are deprecated datatypes, these should be avoided. It is strongly advised to use varchar(max) or nvarchar(max) for storing larger textual data in SQL Server.
Another thing often overlooked is constraints, setting these correctly can make a huge difference to the query engine when it tries to run your queries. I have seen adding a constraint on a column change a query from a few minutes to a few seconds, here are some examples:
NULLs.: If you know that your field must always have a value make sure you set it to “Not NULL”. Follow this up by setting default values for as many of these columns as possible for new records
Unique constraints can make a huge difference to the query engine as well. If it knows a column is unique, once it has found one value it knows there is no further values in the table. If you know that a row of data will be unique then tell SQL server and it will be put to good use.
This leads to a special type of constraint called a primary key which is a combination of a unique constraint and a Not NULL constraint. It ensures that every row in a table is unique and is identifiable. Every database table should really have a primary key for optimum performance. If you don’t have an obvious unique column then use an identity column, this is a column that automatically populates the column with a new unique ID each time you create a new row. But be careful, your primary key should be as small as possible for good performance.
The final part of your design is to create indexes for your normalised tables. If you have normalised your database you will most likely have lots of tables that are linked together in what is known as a one to many relationship. Using a primary key – foreign key relationship. Each link will go from a unique column or primary key (known as the parent table or the “one” in the one to many relationship) in one table to many records with that value in another table (known as the child table or the “many” side of the relationship) Make sure that your foreign key columns in the many tables have an index. The query engine will make good use of these keys when you run queries (joins) that pull data from both tables.
Apply these rules to every table you create and you will have a good foundation for your database.