SQLBolt - Lesson 16
Creating Tables
One of the hardest things to do in data science (at least, to me) is pick up another person’s collected data. It will often take a few minutes of “getting calibrated” to understand the rows from simple things like understanding naming conventions (some are really bad) to more complex things like what story the data is telling.
The more complicated problems will end up needing some visualization tools which is something we’ll look at later, so for now let’s not think too big. However, one of the best ways to become more familiar with a dataset is to build it from scratch. Since we’re building it from scratch, that means we’re going to need some tables to store the data in.
CREATE TABLE IF NOT EXISTS mytable (
column DataType TableConstraint DEFAULT default_value,
another_column DataType TableConstraint DEFAULT d_v,
...
);
The syntax works like this; we start with a IF NOT EXISTS
statement. This will prevent the creation of a duplicate table error, since two tables cannot exist with the same name. You can generally name your table whatever you want, but we like to stick to conventional naming practices; try to remember to use snake casing (snake_case
) and be as descriptive as you can without being too descriptive (think grade_table
vs. grades_for_my_senior_year_2024_2025
).
Inside parenthesis goes the schema for the table; the rows and the types expected for them. You can also use optional constraints on values being inserted or even a default value in the case of NULLS.
Here’s a really basic example;
CREATE TABLE movies(
id INTEGER PRIMARY KEY,
title TEXT,
director TEXT,
year INTEGER,
length_minutes INTEGER
);
Table Data Types
Type | Description |
---|---|
INTEGER , BOOLEAN |
Whole integer values, basic number or age. Bools will be represented as 0 or 1 |
FLOAT , DOUBLE , REAL |
More precise decimal numbers. Used for measurements or fractions |
CHARACTER(num_chars) , VARCHAR(num_chars) , TEXT |
Text-based types, used to store any information in quotations. First two are used when more precise memory management is required |
DATE , DATETIME |
Date and date with time, respectively |
BLOB |
Binary data |
Table Constraints
Certainly not a comprehensive list, but a few that are common and useful.
Constraint | Description |
---|---|
PRIMARY KEY |
Values in this column are unique and are identifies for each individual row |
AUTOINCREMENT |
Integer value that goes up automatically as data is inserted |
UNIQUE |
Ensures no other row in the database has the same value for this specific column |
NOT NULL |
Cannot be empty or NULL |
CHECK (expression) |
Allows you to basically have a conditional statement checked before inserting data |
FOREIGN KEY |
Ensures that there exists a value in another table that corresponds to the newly inserted value |
Next: Lesson 17 & 18