As we iterate over the results, the values for each column specified in the query are returned as a tuple, with each tuple corresponding to a different row.
Filtering with WHERE clauses
Often we want only the subset of rows that satisfy one or more conditions.
We can restrict the results using an extra WHERE clause:
SELECT <column(s) of interest>
FROM <corresponding table(s)>
WHERE <condition(s) records must match>
The conditions may be specified using (in)equality operators (=, !=, <, >, <=, >=) and additional operators such as IN, BETWEEN and LIKE.
Conditions can be logically combined using AND and OR, and negated using NOT.
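As a minimal sketch of how these operators behave, the snippet below builds a small in-memory SQLite table and runs one query per operator. The table contents are made up for illustration, and the name demo_connection is an assumption chosen so that it does not clash with the connection object used elsewhere in these notes.

```python
import sqlite3

# Illustrative in-memory table; the rows are invented for this sketch.
demo_connection = sqlite3.connect(":memory:")
demo_connection.execute(
    "CREATE TABLE Survey (taken INTEGER, person TEXT, quant TEXT, reading REAL)"
)
demo_connection.executemany(
    "INSERT INTO Survey VALUES (?, ?, ?, ?)",
    [
        (1, "dyer", "rad", 9.82),
        (2, "lake", "sal", 0.05),
        (3, "lake", "temp", -21.5),
        (4, "roe", "sal", 41.6),
        (5, "pb", "temp", -18.5),
    ],
)

# IN: the value must match one of the listed alternatives.
in_rows = demo_connection.execute(
    "SELECT taken FROM Survey WHERE person IN ('lake', 'roe') ORDER BY taken"
).fetchall()

# BETWEEN: the value must lie in an inclusive range.
between_rows = demo_connection.execute(
    "SELECT taken FROM Survey WHERE reading BETWEEN 0 AND 10 ORDER BY taken"
).fetchall()

# LIKE: pattern matching, where % stands for any run of characters.
like_rows = demo_connection.execute(
    "SELECT taken FROM Survey WHERE quant LIKE 't%' ORDER BY taken"
).fetchall()

# AND / NOT: combine and negate conditions.
combined_rows = demo_connection.execute(
    "SELECT taken FROM Survey WHERE reading < 0 AND NOT person = 'pb' ORDER BY taken"
).fetchall()

print(in_rows, between_rows, like_rows, combined_rows)
```

The ORDER BY clause is included only so that the rows come back in a predictable order.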
Example: using a WHERE clause
In our example, we may wish to select all columns from the Survey table, returning only rows where the value for the reading column is negative.
This could be implemented as a SQL query as follows:
for result in connection.execute("SELECT * FROM Survey WHERE reading < 0"):
    print(result)
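Conditions can also relate columns from different tables, for example to return the readings taken by the person whose family name is Lake:
for result in connection.execute(
    """
    SELECT Person.family, Survey.reading
    FROM Survey, Person
    WHERE Survey.person = Person.id AND Person.family = 'Lake'
    """
):
    print(result)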
The family name of the person who took the reading for all returned results is Lake as expected.
While the previous query works, this kind of combination of data from several tables is more commonly expressed using the JOIN and ON keywords:
for result in connection.execute(
    """
    SELECT Person.family, Survey.reading
    FROM Survey
    JOIN Person ON Survey.person = Person.id
    WHERE Person.family = 'Lake'
    """
):
    print(result)
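To check that the two forms are interchangeable here, the following sketch runs both against a small in-memory database and compares the results. The two Person rows are taken from the Person table in these notes, while the Survey readings and the name demo_connection are assumptions made for illustration.

```python
import sqlite3

# Small stand-in database; the Survey readings are invented for this sketch.
demo_connection = sqlite3.connect(":memory:")
demo_connection.executescript(
    """
    CREATE TABLE Person (id TEXT, personal TEXT, family TEXT);
    CREATE TABLE Survey (taken INTEGER, person TEXT, quant TEXT, reading REAL);
    INSERT INTO Person VALUES ('dyer', 'William', 'Dyer'), ('lake', 'Anderson', 'Lake');
    INSERT INTO Survey VALUES
        (1, 'dyer', 'rad', 9.82),
        (2, 'lake', 'sal', 0.05),
        (3, 'lake', 'temp', -21.5);
    """
)

# Join expressed through a condition in the WHERE clause.
implicit_join = """
    SELECT Person.family, Survey.reading
    FROM Survey, Person
    WHERE Survey.person = Person.id AND Person.family = 'Lake'
    ORDER BY Survey.reading
"""

# The same join expressed with the JOIN and ON keywords.
explicit_join = """
    SELECT Person.family, Survey.reading
    FROM Survey
    JOIN Person ON Survey.person = Person.id
    WHERE Person.family = 'Lake'
    ORDER BY Survey.reading
"""

implicit_rows = demo_connection.execute(implicit_join).fetchall()
explicit_rows = demo_connection.execute(explicit_join).fetchall()
print(implicit_rows == explicit_rows)
```

Using JOIN and ON keeps the condition that links the tables separate from the condition that filters the rows, which tends to make longer queries easier to read.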
import pandas as pd

dataframe = pd.read_sql_table("Person", f"sqlite:///{database_temp_file.name}")
dataframe.head()
         id   personal    family
0      dyer    William      Dyer
1        pb      Frank   Pabodie
2      lake   Anderson      Lake
3       roe  Valentina   Roerich
4  danforth      Frank  Danforth
Non-relational databases
Sometimes the rigid structure of a relational database is restrictive.
Non-relational or NoSQL databases can store data even when it does not have a consistent structure.
This flexibility is useful, for example, when our data comes in various forms, or when we are still gathering it and therefore don’t know what the best structure to use will be.
There are many NoSQL database systems, such as MongoDB and Neo4j.
Instead of tables with fixed columns, these databases model their contents in different ways.
For example, MongoDB treats its contents as documents containing multiple fields.
The fields don’t need to be the same across documents.
This means we can store entries with different types of information in the same database without issue.
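The idea of documents with differing fields can be sketched without a database at all. The snippet below uses plain Python dictionaries as stand-ins for documents; it does not use MongoDB or its API, and the field names, values and the find helper are all hypothetical.

```python
# Plain dictionaries standing in for documents; all fields and values are hypothetical.
documents = [
    {"person": "lake", "quantity": "sal", "reading": 0.05},
    {"person": "roe", "quantity": "temp", "reading": -21.5, "instrument": "thermometer"},
    {"person": "dyer", "notes": "No reading taken."},
]


def find(collection, **criteria):
    """Return the documents whose fields match every given key-value pair."""
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


# Documents without a given field are simply skipped rather than causing an error.
with_readings = [doc for doc in documents if "reading" in doc]
lake_documents = find(documents, person="lake")

print(len(with_readings), lake_documents)
```

Because no document is required to have any particular field, code that reads the data has to check for missing fields itself.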
Instead of SQL, each system usually offers its own libraries for accessing a database through a programming language (for example, MongoDB has the pymongo package for Python, and similar packages for other languages).
Benefit: Access can be more intuitive, without needing to learn a new language.
Downside: Systems differ in how they expect users to structure and access their data.
Comparison
The two approaches are most effective in different situations:
SQL
Emphasises data integrity but requires rigid structure.
Structure focuses on eliminating duplication and improving performance.
Has an established presence and set of tools.
NoSQL
Offers flexibility but leaves checks up to the user.
More concerned with availability and scaling, even if it means replicating data.
Systems show more frequent innovation.
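The difference in where checks happen can be illustrated with a small sketch. It assumes a hypothetical two-column table with NOT NULL constraints in an in-memory SQLite database, and again uses a plain list of dictionaries as a stand-in for a document store rather than a real NoSQL system.

```python
import sqlite3

# Relational side: the schema states that both columns must be present.
demo_connection = sqlite3.connect(":memory:")
demo_connection.execute(
    "CREATE TABLE Survey (person TEXT NOT NULL, reading REAL NOT NULL)"
)

try:
    # A row with a missing reading violates the NOT NULL constraint.
    demo_connection.execute("INSERT INTO Survey VALUES ('lake', NULL)")
    relational_rejected = False
except sqlite3.IntegrityError:
    relational_rejected = True

# Document-style side: a plain list accepts the incomplete entry without complaint,
# so finding such entries is left to the user.
documents = []
documents.append({"person": "lake"})
incomplete = [doc for doc in documents if "reading" not in doc]

print(relational_rejected, len(incomplete))
```

Some NoSQL systems do offer optional validation, so this should be read as an illustration of the default trade-off rather than a statement about any particular system.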
Summary
Databases provide a principled way of storing, sharing and querying large amounts of data.
Relational databases can be queried through SQL.
NoSQL databases offer more flexibility at the expense of potentially fewer guarantees on integrity and consistency.
Footnotes
Connolly and Begg (2014). Database Systems: A Practical Approach to Design, Implementation, and Management.