apparelrefa.blogg.se - Sql collate

SQL COLLATE HOW TO
SQL COLLATE CODE

If the Case Sensitive rule (_CS) is used: And the last part is only used by the sort and comparison rules.Īs said before, the sensitivity or insensitivity will only be used in rules (sorts or comparisons, also used in the indexes): Case sensitiv “_CS” or Case Insensitiv “_CI”

SQL COLLATE CODE

This will also define the underlining code page. Traditional-Spanish-100, case-insensitive, …Īs you can see, the first part of the collation name design the language that will be used. German-PhoneBook-100, case-insensitive, … Here below part of the result of the query (2397 rows), with the different collations we could use in Western Europe: Nameįrench, case-insensitive, accent-sensitive, … > SELECT name, COLLATIONPROPERTY(name, 'CodePage') AS CodePage, description To list all collations supported in a SQL-Server instance with their corresponding code pages use the following query:

SQL COLLATE HOW TO

How to find out which collation uses wich code page? The drawback of multi-bytes is very easy to imagine, because each character will be stored in 2 bytes, this will need more disk and memory space and more CPU to read, write and compare the characters. In this particular case the collation is only used for sort and comparison rules. And as said before, this is also used for “char”, “varchar” and text when a collation uses the code page “0”.įor Unicode, SQL-Server uses a “UCS-2” code page (Universal Character Set, using only 2 bytes), the underlining code page is identical for all collations and version of SQL Server (since version 7.0) and data do not need to be convert when the collation is changed as this is Unicode and is indipendent from the collation. The column types “nchar”, “nvarchar” and “ntext” are always using the Unicode code page. This means that all code pages (storage) are always case sensitiv and accent sensitiv! This is why the underlining code page must make a difference between all characters, and keep the upper/lower case difference and all stored accents. If this is not the case, then we will lose information… Keep also in mind that each different character that is stored must be returned exactly as it was given. This is also the reason why the 17th code page (Code) is multi-byte and is always used for languages that include more than 256 different characters.

In this case only the Unicode (multi-bytes) can resolve the problem. But this has a drawback, because a single-byte can only store 256 (2^8) different values, which means that only 256 different characters can be defined (one value=one character).įor a standard language or when you mix different similar languages (same code page) this is not a problem, but this becomes a problem as soon as you need more than 256 different characters in the different languages you want to use (that use different code pages). That means that all language characters (and signs) will use only one byte to be stored on disk (or in the memory). Only column types “char”, “varchar” and “text” can use non-Unicode code pages (only one of the 16 single-byte code pages).

Why are different code pages used?įor performance and space (disk-memory) reasons, 16 code pages are single-byte (called non-Unicode) and only one is multi-bytes (Unicode using UCS-2).

SQL Server 2008R2 supports more than 2000 different collations, but these are only stored in 17 different code pages. But it will also define the underline “code page” used to store physically the data of this object. The collation determines the rules SQL Server takes to sort and to compare the characters in an object (generaly a table column). Let’s start with a few questions: What does a Collation exactly define? So keep in mind the best practices, try to keep the same collation definition on all objects, databases and instance. This is due to internal collation convertions, but this is also a very big topic and will not be explained here in detail. You will see that this is not so easy, it can generate lots of problems and in some cases, information can be lost (wrong convertions).Īnd this is not all, if you mix collations between objects, databases and instances, you can expect very bad query performances. In this blog I will explain the methodology that allows to change the collation of all objects (not only the database) and the consequences of these changes on the data. And as they don’t know what they do, they often only perform the small and easiest part of it, changing only the collation on the database. Lots of DBAs are changing the collation of their SQL Server databases without knowing exactly what they do and the consequences of this change.