How To Create A Reference Table In Sql
CREATE TABLE (Azure Synapse Analytics)
Applies to: Azure Synapse Analytics, Analytics Platform System (PDW)
Creates a new table in Azure Synapse Analytics or Analytics Platform System (PDW).
To understand tables and how to use them, see Tables in Azure Synapse Analytics.
Note
Discussions about Azure Synapse Analytics in this article apply to both Azure Synapse Analytics and Analytics Platform System (PDW) unless otherwise noted.
Note
Serverless SQL pool in Azure Synapse Analytics supports only external and temporary tables.
Transact-SQL Syntax Conventions
Syntax
    -- Create a new table.
    CREATE TABLE { database_name.schema_name.table_name | schema_name.table_name | table_name }
        (
          { column_name <data_type> [ <column_options> ] } [ ,...n ]
        )
        [ WITH ( <table_option> [ ,...n ] ) ]
    [;]

    <column_options> ::=
        [ COLLATE Windows_collation_name ]
        [ NULL | NOT NULL ] -- default is NULL
        [ IDENTITY [ ( seed, increment ) ] ]
        [ <column_constraint> ]

    <column_constraint> ::=
        {
            DEFAULT constant_expression
          | PRIMARY KEY NONCLUSTERED NOT ENFORCED -- Applies to Azure Synapse Analytics only
          | UNIQUE NOT ENFORCED -- Applies to Azure Synapse Analytics only
        }

    <table_option> ::=
        {
            CLUSTERED COLUMNSTORE INDEX -- default for Azure Synapse Analytics
          | CLUSTERED COLUMNSTORE INDEX ORDER (column [,...n])
          | HEAP -- default for Parallel Data Warehouse
          | CLUSTERED INDEX ( { index_column_name [ ASC | DESC ] } [ ,...n ] ) -- default is ASC
        }
        {
            DISTRIBUTION = HASH ( distribution_column_name )
          | DISTRIBUTION = ROUND_ROBIN -- default for Azure Synapse Analytics
          | DISTRIBUTION = REPLICATE -- default for Parallel Data Warehouse
        }
        | PARTITION ( partition_column_name RANGE [ LEFT | RIGHT ] -- default is LEFT
            FOR VALUES ( [ boundary_value [,...n] ] ) )

    <data_type> ::=
          datetimeoffset [ ( n ) ]
        | datetime2 [ ( n ) ]
        | datetime
        | smalldatetime
        | date
        | time [ ( n ) ]
        | float [ ( n ) ]
        | real [ ( n ) ]
        | decimal [ ( precision [ , scale ] ) ]
        | numeric [ ( precision [ , scale ] ) ]
        | money
        | smallmoney
        | bigint
        | int
        | smallint
        | tinyint
        | bit
        | nvarchar [ ( n | max ) ] -- max applies only to Azure Synapse Analytics
        | nchar [ ( n ) ]
        | varchar [ ( n | max ) ] -- max applies only to Azure Synapse Analytics
        | char [ ( n ) ]
        | varbinary [ ( n | max ) ] -- max applies only to Azure Synapse Analytics
        | binary [ ( n ) ]
        | uniqueidentifier
Arguments
database_name
The name of the database that will contain the new table. The default is the current database.
schema_name
The schema for the table. Specifying schema is optional. If blank, the default schema will be used.
table_name
The name of the new table. To create a local temporary table, precede the table name with #. For explanations and guidance on temporary tables, see Temporary tables in Azure Synapse Analytics.
column_name
The name of a table column.
Column options
COLLATE
Windows_collation_name
Specifies the collation for the expression. The collation must be one of the Windows collations supported by SQL Server. For a list of Windows collations supported by SQL Server, see Windows Collation Name (Transact-SQL).
NULL | NOT NULL
Specifies whether NULL values are allowed in the column. The default is NULL.
[ CONSTRAINT constraint_name ] DEFAULT constant_expression
Specifies the default column value.
Argument | Explanation |
---|---|
constraint_name | The optional name for the constraint. The constraint name is unique within the database. The name can be reused in other databases. |
constant_expression | The default value for the column. The expression must be a literal value or a constant. For example, these constant expressions are allowed: 'CA' , 4 . These constant expressions aren't allowed: 2+3 , CURRENT_TIMESTAMP . |
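As a minimal sketch of the rule in the table above, the following statement uses only literal defaults. The table, column, and constraint names (dbo.CustomerDefaults, state_code, quantity, df_state_code, df_quantity) are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: both defaults are literal values, which the rule above allows.
CREATE TABLE dbo.CustomerDefaults
(
    customer_id int     NOT NULL,
    state_code  char(2) CONSTRAINT df_state_code DEFAULT 'CA',
    quantity    int     CONSTRAINT df_quantity   DEFAULT 4
    -- Not allowed per the rule above: DEFAULT 2+3 or DEFAULT CURRENT_TIMESTAMP
)
WITH ( DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX );
```

The distribution is ROUND_ROBIN here because a DEFAULT constraint can't be defined on a distribution column.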
Table structure options
For guidance on choosing the type of table, see Indexing tables in Azure Synapse Analytics.
CLUSTERED COLUMNSTORE INDEX
Stores the table as a clustered columnstore index. The clustered columnstore index applies to all of the table data. This behavior is the default for Azure Synapse Analytics.
HEAP
Stores the table as a heap. This behavior is the default for Analytics Platform System (PDW).
CLUSTERED INDEX ( index_column_name [ ,...n ] )
Stores the table as a clustered index with one or more key columns. This behavior stores the data by row. Use index_column_name to specify the name of one or more key columns in the index. For more information, see Rowstore Tables in the General Remarks.
LOCATION = USER_DB
This option is deprecated. It's syntactically accepted, but no longer required and no longer affects behavior.
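The HEAP and CLUSTERED INDEX options described above might be used as in the following sketch. The table and column names are hypothetical, and the choice of key columns is an assumption made only to show the ASC and DESC keywords from the syntax.

```sql
-- Hypothetical staging table stored as a heap (no index).
CREATE TABLE dbo.StageOrders
(
    order_id   int  NOT NULL,
    order_date date NOT NULL,
    amount     decimal(15,2)
)
WITH ( DISTRIBUTION = ROUND_ROBIN, HEAP );

-- Hypothetical table stored as a rowstore clustered index with two key columns.
CREATE TABLE dbo.OrdersByDate
(
    order_id   int  NOT NULL,
    order_date date NOT NULL,
    amount     decimal(15,2)
)
WITH ( DISTRIBUTION = ROUND_ROBIN, CLUSTERED INDEX ( order_date DESC, order_id ASC ) );
```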
Table distribution options
To understand how to choose the best distribution method and use distributed tables, see Distributing tables in Azure Synapse Analytics.
DISTRIBUTION = HASH ( distribution_column_name )
Assigns each row to one distribution by hashing the value stored in distribution_column_name. The algorithm is deterministic, which means it always hashes the same value to the same distribution. The distribution column should be defined as NOT NULL because all rows that have NULL are assigned to the same distribution.
DISTRIBUTION = ROUND_ROBIN
Distributes the rows evenly across all the distributions in a round-robin fashion. This behavior is the default for Azure Synapse Analytics.
DISTRIBUTION = REPLICATE
Stores one copy of the table on each Compute node. For Azure Synapse Analytics the table is stored on a distribution database on each Compute node. For Analytics Platform System (PDW), the table is stored in a SQL Server filegroup that spans the Compute node. This behavior is the default for Analytics Platform System (PDW).
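Because hash distribution sends equal values (including NULL) to the same distribution, it can be useful to check how evenly rows ended up spread. The following is a sketch only: the table dbo.SalesByCustomer is hypothetical, and DBCC PDW_SHOWSPACEUSED is used here on the assumption that it is available in your environment.

```sql
-- Hypothetical hash-distributed table; the distribution column is NOT NULL,
-- so no rows pile up in a single distribution because of NULL values.
CREATE TABLE dbo.SalesByCustomer
(
    customer_id int NOT NULL,
    sale_amount decimal(15,2)
)
WITH ( DISTRIBUTION = HASH ( customer_id ), CLUSTERED COLUMNSTORE INDEX );

-- Reports row counts and space used per distribution for the table.
DBCC PDW_SHOWSPACEUSED('dbo.SalesByCustomer');
```

Large differences in row counts between distributions would suggest that the chosen distribution column is skewed.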
Table partition options
For guidance on using table partitions, see Partitioning tables in Azure Synapse Analytics.
PARTITION ( partition_column_name RANGE [ LEFT | RIGHT ] FOR VALUES ( [ boundary_value [,...n] ] ))
Creates one or more table partitions. These partitions are horizontal table slices that allow you to apply operations to subsets of rows regardless of whether the table is stored as a heap, clustered index, or clustered columnstore index. Unlike the distribution column, table partitions don't determine the distribution where each row is stored. Instead, table partitions determine how the rows are grouped and stored within each distribution.
Argument | Explanation |
---|---|
partition_column_name | Specifies the column that Azure Synapse Analytics will use to partition the rows. This column can be any data type. Azure Synapse Analytics sorts the partition column values in ascending order. The low-to-high ordering goes from LEFT to RIGHT in the RANGE specification. |
RANGE LEFT | Specifies the boundary value belongs to the partition on the left (lower values). The default is LEFT. |
RANGE RIGHT | Specifies the boundary value belongs to the partition on the right (higher values). |
FOR VALUES ( boundary_value [,...n] ) | Specifies the boundary values for the partition. boundary_value is a constant expression. It can't be NULL. It must either match or be implicitly convertible to the data type of partition_column_name. It can't be truncated during implicit conversion so that the size and scale of the value don't match the data type of partition_column_name. If you specify the PARTITION clause but don't specify a boundary value, the resulting table has one partition. If you specify one boundary value, the resulting table has two partitions: one for the values lower than the boundary value and one for the values higher than the boundary value. If you move a partition into a non-partitioned table, the non-partitioned table will receive the data, but will not have the partition boundaries in its metadata. |
See Create a partitioned table in the Examples section.
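As a small illustration of the single-boundary case described in the table above, the following sketch creates a table with one boundary value and therefore two partitions. The table and column names are hypothetical.

```sql
-- Hypothetical table with a single boundary value.
-- With RANGE RIGHT, the boundary value belongs to the partition on the right:
--   Partition 1: order_year <  2020
--   Partition 2: order_year >= 2020
CREATE TABLE dbo.OrdersByYear
(
    order_id   int NOT NULL,
    order_year int NOT NULL
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION ( order_year RANGE RIGHT FOR VALUES ( 2020 ) )
);
```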
Ordered Clustered columnstore index option
Clustered columnstore index (CCI) is the default for creating tables in Azure Synapse Analytics. Data in CCI is not sorted before being compressed into columnstore segments. When creating a CCI with ORDER, data is sorted before being added to index segments and query performance can be improved. See Performance Tuning with Ordered Clustered Columnstore Index for details.
An ordered CCI can be created on columns of any data types supported in Azure Synapse Analytics except for string columns.
Users can query the column_store_order_ordinal column in sys.index_columns to find the column(s) a table is ordered on and their sequence in the ordering.
Check Performance tuning with ordered clustered columnstore index for details.
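A query along the following lines could be used to list the ordered columns. It is a sketch that assumes a value of 0 in column_store_order_ordinal means the column is not part of the ordering; the join to sys.columns is added only to show column names.

```sql
-- List every column that takes part in an ordered clustered columnstore index,
-- together with its position in the ordering.
SELECT
    OBJECT_NAME(ic.object_id)    AS table_name,
    c.name                       AS column_name,
    ic.column_store_order_ordinal
FROM sys.index_columns AS ic
JOIN sys.columns AS c
    ON  ic.object_id = c.object_id
    AND ic.column_id = c.column_id
WHERE ic.column_store_order_ordinal <> 0
ORDER BY table_name, ic.column_store_order_ordinal;
```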
Data type
Azure Synapse Analytics supports the most commonly used data types. Below is a list of the supported data types along with their details and storage bytes. To better understand data types and how to use them, see Data types for tables in Azure Synapse Analytics.
For a table of data type conversions, see the Implicit Conversions section, of CAST and CONVERT (Transact-SQL).
datetimeoffset [ ( n ) ]
The default value for n is 7.
datetime2 [ ( n ) ]
Same as datetime, except that you can specify the number of fractional seconds. The default value for n is 7.
n value | Precision | Scale |
---|---|---|
0 | 19 | 0 |
1 | 21 | 1 |
2 | 22 | 2 |
3 | 23 | 3 |
4 | 24 | 4 |
5 | 25 | 5 |
6 | 26 | 6 |
7 | 27 | 7 |
datetime
Stores date and time of day with 19 to 23 characters according to the Gregorian calendar. The date can contain year, month, and day. The time contains hours, minutes, and seconds. As an option, you can display three digits for fractional seconds. The storage size is 8 bytes.
smalldatetime
Stores a date and a time. Storage size is 4 bytes.
date
Stores a date using a maximum of 10 characters for year, month, and day according to the Gregorian calendar. The storage size is 3 bytes. Date is stored as an integer.
time [ ( n ) ]
The default value for n is 7.
float [ ( n ) ]
Approximate number data type for use with floating point numeric data. Floating point data is approximate, which means that not all values in the data type range can be represented exactly. n specifies the number of bits used to store the mantissa of the float in scientific notation. n dictates the precision and storage size. If n is specified, it must be a value between 1 and 53. The default value of n is 53.
n value | Precision | Storage size |
---|---|---|
1-24 | 7 digits | 4 bytes |
25-53 | 15 digits | 8 bytes |
Azure Synapse Analytics treats n as one of two possible values. If 1 <= n <= 24, n is treated as 24. If 25 <= n <= 53, n is treated as 53.
The Azure Synapse Analytics float data type complies with the ISO standard for all values of n from 1 through 53. The synonym for double precision is float(53).
real [ ( n ) ]
The definition of real is the same as float. The ISO synonym for real is float(24).
decimal [ ( precision [ , scale ] ) ] | numeric [ ( precision [ , scale ] ) ]
Stores fixed precision and scale numbers.
precision
The maximum total number of decimal digits that can be stored, both to the left and to the right of the decimal point. The precision must be a value from 1 through the maximum precision of 38. The default precision is 18.
scale
The maximum number of decimal digits that can be stored to the right of the decimal point. Scale must be a value from 0 through precision. You can only specify scale if precision is specified. The default scale is 0, and so 0 <= scale <= precision. Maximum storage sizes vary, based on the precision.
Precision | Storage bytes |
---|---|
1-9 | 5 |
10-19 | 9 |
20-28 | 13 |
29-38 | 17 |
money | smallmoney
Data types that represent currency values.
Data Type | Storage bytes |
---|---|
money | 8 |
smallmoney | 4 |
bigint | int | smallint | tinyint
Exact-number data types that use integer data. The storage is shown in the following table.
Data Type | Storage bytes |
---|---|
bigint | 8 |
int | 4 |
smallint | 2 |
tinyint | 1 |
bit
An integer data type that can take the value of 1, 0, or NULL. Azure Synapse Analytics optimizes storage of bit columns. If there are 8 or fewer bit columns in a table, the columns are stored as 1 byte. If there are from 9 to 16 bit columns, the columns are stored as 2 bytes, and so on.
nvarchar [ ( n | max ) ] -- max applies only to Azure Synapse Analytics.
Variable-length Unicode character data. n can be a value from 1 through 4000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). Storage size in bytes is two times the number of characters entered + 2 bytes. The data entered can be zero characters in length.
nchar [ ( n ) ]
Fixed-length Unicode character data with a length of n characters. n must be a value from 1 through 4000. The storage size is two times n bytes.
varchar [ ( n | max ) ] -- max applies only to Azure Synapse Analytics.
Variable-length, non-Unicode character data with a length of n bytes. n must be a value from 1 to 8000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of data entered + 2 bytes.
char [ ( n ) ]
Fixed-length, non-Unicode character data with a length of n bytes. n must be a value from 1 to 8000. The storage size is n bytes. The default for n is 1.
varbinary [ ( n | max ) ] -- max applies only to Azure Synapse Analytics.
Variable-length binary data. n can be a value from 1 to 8000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size is the actual length of data entered + 2 bytes. The default value for n is 7.
binary [ ( n ) ]
Fixed-length binary data with a length of n bytes. n can be a value from 1 to 8000. The storage size is n bytes. The default value for n is 7.
uniqueidentifier
Is a 16-byte GUID.
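To see how some of the data type rules above play out, a sketch like the following could be used. The table dbo.TypeDemo and its columns are hypothetical; the comments restate the rules from this section rather than measured results, and the catalog query is one assumed way to inspect the stored definitions.

```sql
-- Hypothetical table exercising the float and decimal rules described above.
CREATE TABLE dbo.TypeDemo
(
    id          int NOT NULL,
    approx_low  float(10),      -- n in 1-24 is treated as 24 (4 bytes per the table above)
    approx_high float(30),      -- n in 25-53 is treated as 53 (8 bytes per the table above)
    price       decimal(9,2),   -- precision 1-9: 5 storage bytes
    total       decimal(19,4),  -- precision 10-19: 9 storage bytes
    label       nvarchar(50)    -- up to 50 characters, two bytes each + 2 bytes
)
WITH ( DISTRIBUTION = ROUND_ROBIN, HEAP );

-- Inspect the type, length, precision, and scale recorded for each column.
SELECT
    c.name       AS column_name,
    t.name       AS type_name,
    c.max_length,
    c.precision,
    c.scale
FROM sys.columns AS c
JOIN sys.types AS t
    ON c.user_type_id = t.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.TypeDemo')
ORDER BY c.column_id;
```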
Permissions
Creating a table requires permission in the db_ddladmin fixed database role, or:
- CREATE TABLE permission on the database
- ALTER SCHEMA permission on the schema that will contain the table.
Creating a partitioned table requires permission in the db_ddladmin fixed database role, or
- ALTER ANY DATASPACE permission
The login that creates a local temporary table receives CONTROL, INSERT, SELECT, and UPDATE permissions on the table.
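The permissions listed above might be granted as in the following sketch. The user name TableCreator is hypothetical, and the role-membership call assumes sp_addrolemember is available in your environment; use either the individual grants or the role membership, not necessarily both.

```sql
-- Option 1: grant the individual permissions to a hypothetical database user.
GRANT CREATE TABLE TO [TableCreator];
GRANT ALTER ON SCHEMA::dbo TO [TableCreator];

-- Needed in addition only if the user will create partitioned tables.
GRANT ALTER ANY DATASPACE TO [TableCreator];

-- Option 2: add the same hypothetical user to the db_ddladmin fixed database role.
EXEC sp_addrolemember 'db_ddladmin', 'TableCreator';
```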
General Remarks
For minimum and maximum limits, see Azure Synapse Analytics capacity limits.
Determining the number of table partitions
Each user-defined table is divided into multiple smaller tables that are stored in separate locations called distributions. Azure Synapse Analytics uses 60 distributions. In Analytics Platform System (PDW), the number of distributions depends on the number of Compute nodes.
Each distribution contains all table partitions. For example, if there are 60 distributions and four table partitions plus one empty partition, there will be 300 partitions (5 x 60 = 300). If the table is a clustered columnstore index, there will be one columnstore index per partition, which means you'll have 300 columnstore indexes.
We recommend using fewer table partitions to ensure each columnstore index has enough rows to take advantage of the benefits of columnstore indexes. For more information, see Partitioning tables in Azure Synapse Analytics and Indexing tables in Azure Synapse Analytics.
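The arithmetic above can be written out as a quick calculation. The 60-million-row table size in the second query is a made-up figure used only to show how few rows each columnstore index receives when they are spread evenly.

```sql
-- 60 distributions x (4 table partitions + 1 empty partition)
SELECT 60 * (4 + 1) AS total_physical_partitions;        -- 300

-- Hypothetical 60,000,000-row table spread evenly over those partitions.
SELECT 60000000 / (60 * (4 + 1)) AS avg_rows_per_partition; -- 200000
```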
Rowstore table (heap or clustered index)
A rowstore table is a table stored in row-by-row order. It's a heap or clustered index. Azure Synapse Analytics creates all rowstore tables with page compression; this behavior isn't user-configurable.
Columnstore table (columnstore index)
A columnstore table is a table stored in column-by-column order. The columnstore index is the technology that manages data stored in a columnstore table. The clustered columnstore index doesn't affect how data is distributed; rather, it affects how the data is stored within each distribution.
To change a rowstore table to a columnstore table, drop all existing indexes on the table and create a clustered columnstore index. For an example, see CREATE COLUMNSTORE INDEX (Transact-SQL).
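For the simplest case, a heap with no other indexes, the conversion described above could look like the following sketch. The table name dbo.StageSales and index name cci_StageSales are hypothetical.

```sql
-- Hypothetical rowstore table stored as a heap.
CREATE TABLE dbo.StageSales
(
    sale_id     int NOT NULL,
    sale_amount decimal(15,2)
)
WITH ( DISTRIBUTION = ROUND_ROBIN, HEAP );

-- Convert the heap to a columnstore table by creating a clustered columnstore index.
-- A table that already has indexes would need them dropped first, as noted above.
CREATE CLUSTERED COLUMNSTORE INDEX cci_StageSales ON dbo.StageSales;
```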
For more information, see these articles:
- Columnstore indexes versioned feature summary
- Indexing tables in Azure Synapse Analytics
- Columnstore Indexes Guide
Limitations and Restrictions
You can't define a DEFAULT constraint on a distribution column.
Partitions
When using partitions, the partition column can't have a Unicode-only collation. For example, the following statement fails.
CREATE TABLE t1 ( c1 varchar(20) COLLATE Divehi_90_CI_AS_KS_WS) WITH (PARTITION (c1 RANGE FOR VALUES (N'')))
If boundary_value is a literal value that must be implicitly converted to the data type in partition_column_name, a discrepancy will occur. The literal value is displayed through the Azure Synapse Analytics system views, but the converted value is used for Transact-SQL operations.
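To see the boundary values that the system views report for a partitioned table, a catalog query along these lines could be used. It is a sketch that assumes the partition catalog views shown are available in your environment; the table name myTable is taken from the examples in this article.

```sql
-- List each partition of a table with the boundary value recorded in the catalog.
SELECT
    t.name                      AS table_name,
    p.partition_number,
    pf.boundary_value_on_right, -- 1 for RANGE RIGHT, 0 for RANGE LEFT
    r.value                     AS boundary_value
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON t.object_id = i.object_id
JOIN sys.partitions AS p
    ON  i.object_id = p.object_id
    AND i.index_id  = p.index_id
JOIN sys.partition_schemes AS ps
    ON i.data_space_id = ps.data_space_id
JOIN sys.partition_functions AS pf
    ON ps.function_id = pf.function_id
LEFT JOIN sys.partition_range_values AS r
    ON  pf.function_id = r.function_id
    AND r.boundary_id  = p.partition_number
WHERE t.name = 'myTable'
  AND i.index_id <= 1
ORDER BY p.partition_number;
```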
Temporary tables
Global temporary tables that begin with ## aren't supported.
Local temporary tables have the following limitations and restrictions:
- They're visible only to the current session. Azure Synapse Analytics drops them automatically at the end of the session. To drop them explicitly, use the DROP TABLE statement.
- They can't be renamed.
- They can't have partitions or views.
- Their permissions can't be changed. GRANT, DENY, and REVOKE statements can't be used with local temporary tables.
- Database console commands are blocked for temporary tables.
- If more than one local temporary table is used within a batch, each must have a unique name. If multiple sessions are running the same batch and creating the same local temporary table, Azure Synapse Analytics internally appends a numeric suffix to the local temporary table name to maintain a unique name for each local temporary table.
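A common pattern for working within these restrictions is to drop a local temporary table explicitly before re-creating it in the same session. The following is a sketch; the table name #myTable and its columns are illustrative.

```sql
-- Drop the temporary table if it is left over from earlier in the session.
IF OBJECT_ID('tempdb..#myTable') IS NOT NULL
BEGIN
    DROP TABLE #myTable;
END

-- Create the local temporary table; it is visible only to the current session.
CREATE TABLE #myTable
(
    id       int NOT NULL,
    lastName varchar(20)
)
WITH ( DISTRIBUTION = ROUND_ROBIN, HEAP );

-- Drop it explicitly instead of waiting for the session to end.
DROP TABLE #myTable;
```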
Locking behavior
Takes an exclusive lock on the table. Takes a shared lock on the DATABASE, SCHEMA, and SCHEMARESOLUTION objects.
Examples for columns
A. Specify a column collation
In the following example, the table MyTable is created with two different column collations. The column mycolumn1 has the default collation, Latin1_General_100_CI_AS_KS_WS. The column mycolumn2 has the collation Frisian_100_CS_AS.
CREATE TABLE MyTable ( mycolumn1 nvarchar, mycolumn2 nvarchar COLLATE Frisian_100_CS_AS ) WITH ( CLUSTERED COLUMNSTORE INDEX ) ;
B. Specify a DEFAULT constraint for a column
The following example shows the syntax to specify a default value for a column. The colA column has a default constraint named constraint_colA and a default value of 0.
CREATE TABLE MyTable ( colA int CONSTRAINT constraint_colA DEFAULT 0, colB nvarchar COLLATE Frisian_100_CS_AS ) WITH ( CLUSTERED COLUMNSTORE INDEX ) ;
Examples for temporary tables
C. Create a local temporary table
The following example creates a local temporary table named #myTable. The table is specified with a three-part name, and the table name itself starts with #.
CREATE TABLE AdventureWorks.dbo.#myTable ( id int NOT NULL, lastName varchar(20), zipCode varchar(6) ) WITH ( DISTRIBUTION = HASH (id), CLUSTERED COLUMNSTORE INDEX ) ;
Examples for table structure
D. Create a table with a clustered columnstore index
The following example creates a distributed table with a clustered columnstore index. Each distribution will be stored as a columnstore.
The clustered columnstore index doesn't affect how the data is distributed; data is always distributed by row. The clustered columnstore index affects how the data is stored within each distribution.
CREATE TABLE MyTable ( colA int CONSTRAINT constraint_colA DEFAULT 0, colB nvarchar COLLATE Frisian_100_CS_AS ) WITH ( DISTRIBUTION = HASH ( colB ), CLUSTERED COLUMNSTORE INDEX ) ;
E. Create an ordered clustered columnstore index
The following example shows how to create an ordered clustered columnstore index. The index is ordered on SHIPDATE.
CREATE TABLE Lineitem WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX ORDER(SHIPDATE)) AS SELECT * FROM ext_Lineitem
Examples for table distribution
F. Create a ROUND_ROBIN table
The following example creates a ROUND_ROBIN table with three columns and without partitions. The data is spread across all distributions. The table is created with a CLUSTERED COLUMNSTORE INDEX, which gives better performance and data compression than a heap or rowstore clustered index.
CREATE TABLE myTable ( id int NOT NULL, lastName varchar(20), zipCode varchar(6) ) WITH ( CLUSTERED COLUMNSTORE INDEX );
G. Create a hash-distributed table
The following example creates the same table as the previous example. However, for this table, rows are distributed (on the id column) instead of randomly spread like a ROUND_ROBIN table. The table is created with a CLUSTERED COLUMNSTORE INDEX, which gives better performance and data compression than a heap or rowstore clustered index.
CREATE TABLE myTable ( id int NOT NULL, lastName varchar(20), zipCode varchar(6) ) WITH ( DISTRIBUTION = HASH (id), CLUSTERED COLUMNSTORE INDEX );
H. Create a replicated table
The following example creates a replicated table similar to the previous examples. Replicated tables are copied in full to each Compute node. With this copy on each Compute node, data movement is reduced for queries. This example is created with a CLUSTERED INDEX, which gives better data compression than a heap. A heap may not contain enough rows to achieve good CLUSTERED COLUMNSTORE INDEX compression.
CREATE TABLE myTable ( id int NOT NULL, lastName varchar(20), zipCode varchar(6) ) WITH ( DISTRIBUTION = REPLICATE, CLUSTERED INDEX (lastName) );
Examples for table partitions
I. Create a partitioned table
The following example creates a table similar to the one in example F, with the addition of RANGE LEFT partitioning on the id column. It specifies four partition boundary values, which results in five partitions.
CREATE TABLE myTable ( id int NOT NULL, lastName varchar(20), zipCode int) WITH ( PARTITION ( id RANGE LEFT FOR VALUES (10, 20, 30, 40 )), CLUSTERED COLUMNSTORE INDEX ) ;
In this example, data will be sorted into the following partitions:
- Partition 1: col <= 10
- Partition 2: 10 < col <= 20
- Partition 3: 20 < col <= 30
- Partition 4: 30 < col <= 40
- Partition 5: 40 < col
If this same table were partitioned RANGE RIGHT instead of RANGE LEFT (default), the data would be sorted into the following partitions:
- Partition 1: col < 10
- Partition 2: 10 <= col < 20
- Partition 3: 20 <= col < 30
- Partition 4: 30 <= col < 40
- Partition 5: 40 <= col
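The RANGE RIGHT variant described above could be written as follows. This is a sketch that reuses the columns from the example; the table name myTableRight is hypothetical and is used only to avoid clashing with myTable.

```sql
-- Same columns and boundary values as the example above, but with RANGE RIGHT,
-- so each boundary value belongs to the partition on its right:
--   Partition 1: id < 10, Partition 2: 10 <= id < 20, ... Partition 5: 40 <= id
CREATE TABLE myTableRight
(
    id       int NOT NULL,
    lastName varchar(20),
    zipCode  int
)
WITH
(
    PARTITION ( id RANGE RIGHT FOR VALUES ( 10, 20, 30, 40 ) ),
    CLUSTERED COLUMNSTORE INDEX
);
```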
J. Create a partitioned table with one partition
The following example creates a partitioned table with one partition. It doesn't specify any boundary value, which results in one partition.
CREATE TABLE myTable ( id int NOT NULL, lastName varchar(20), zipCode int) WITH ( PARTITION ( id RANGE LEFT FOR VALUES ( )), CLUSTERED COLUMNSTORE INDEX ) ;
K. Create a table with date partitioning
The following example creates a new table named myTable, with partitioning on a date column. By using RANGE RIGHT and dates for the boundary values, it puts a month of data in each partition.
CREATE TABLE myTable ( l_orderkey bigint, l_partkey bigint, l_suppkey bigint, l_linenumber bigint, l_quantity decimal(15,2), l_extendedprice decimal(15,2), l_discount decimal(15,2), l_tax decimal(15,2), l_returnflag char(1), l_linestatus char(1), l_shipdate date, l_commitdate date, l_receiptdate date, l_shipinstruct char(25), l_shipmode char(10), l_comment varchar(44)) WITH ( DISTRIBUTION = HASH (l_orderkey), CLUSTERED COLUMNSTORE INDEX, PARTITION ( l_shipdate RANGE RIGHT FOR VALUES ( '1992-01-01','1992-02-01','1992-03-01','1992-04-01','1992-05-01', '1992-06-01','1992-07-01','1992-08-01','1992-09-01','1992-10-01', '1992-11-01','1992-12-01','1993-01-01','1993-02-01','1993-03-01', '1993-04-01','1993-05-01','1993-06-01','1993-07-01','1993-08-01', '1993-09-01','1993-10-01','1993-11-01','1993-12-01','1994-01-01', '1994-02-01','1994-03-01','1994-04-01','1994-05-01','1994-06-01', '1994-07-01','1994-08-01','1994-09-01','1994-10-01','1994-11-01', '1994-12-01' )) );
See also
CREATE TABLE AS SELECT (Azure Synapse Analytics)
DROP TABLE (Transact-SQL)
ALTER TABLE (Transact-SQL)
sys.index_columns (Transact-SQL)
Source: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-azure-sql-data-warehouse
Posted by: kirbythimakeent.blogspot.com