Commit a57f5e90 authored by Emi Simpson

[new arch] Document the Query module

parent 478a065d
'''
# Query Flow
In order to allow efficient testing, analysis, and type checking, Mystic's actions
**describe** what sort of queries they want to perform, rather than performing them during
the function call. A separate module, :mod:`mystic.query_driver`, is responsible for executing the
described queries, but this module is responsible for describing them.
Typically, the execution flow looks a little like this:
- An endpoint produces a :class:`Query`
- The driver calls :func:`Query.get_query()`, which produces the SQL and arguments to be
  run
- The driver reports the results of executing the SQL back to the :class:`Query` using
  :func:`Query.handle_results()`
- The query can then report one of three things:
  - The query is finished, and executed successfully, producing a value
  - The query encountered an expected error or was unsuccessful
  - The query hasn't finished resolving yet
- In either of the first two cases, the value is returned from the driver. In the last
  case, the query provides another query for the driver to execute in its place, and
  the cycle continues from step 2
Most queries are simple, and will return after executing a single SQL statement. However,
it's not uncommon to need to execute multiple SQL statements or to conditionally execute
an SQL statement. For this reason, the :class:`BoundQuery` class is provided, which
allows multiple queries to be chained together.
**NOTE:** In this module, "Query" refers to a representation of a query, typically a class
which, when constructed with several arguments, can describe a series of SQL statements
and interpret their results, before finally returning some structured and typed data.
In an attempt to separate a "Query" in this sense from what is typically called a query in
SQL (that is, an SQL statement which is fed into a database returning an untyped tuple of
raw information), this documentation refers to the former notion as a Query (or
:class:`Query`) and the latter notion as an SQL statement or the execution of an SQL
statement.
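## Example
The flow described above can be sketched as a small loop. This is only an illustration
under simplifying assumptions, not the code in :mod:`mystic.query_driver`: the outcome
classes are stand-ins, `get_query()` returns a plain tuple instead of a
:class:`QueryRequest`, and `drive`, `Double`, `DoubleTwice`, and `fake_execute` are
hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

# Simplified stand-ins for this module's outcome classes
@dataclass(frozen=True)
class Finished:
    value: Any

@dataclass(frozen=True)
class Error:
    value: Any

@dataclass(frozen=True)
class Unfinished:
    next_query: Any

def drive(query: Any, execute: Callable[[str, Tuple[Any, ...]], Any]) -> Any:
    # Hypothetical driver loop: keep executing statements until the query
    # stops reporting Unfinished, then return the Finished or Error outcome
    while True:
        sql, args = query.get_query()
        outcome = query.handle_results(execute(sql, args))
        if not isinstance(outcome, Unfinished):
            return outcome
        query = outcome.next_query

@dataclass(frozen=True)
class Double:
    n: int
    def get_query(self) -> Tuple[str, Tuple[Any, ...]]:
        return ('SELECT %s * 2', (self.n,))
    def handle_results(self, results: int) -> Finished:
        return Finished(results)

@dataclass(frozen=True)
class DoubleTwice:
    n: int
    def get_query(self) -> Tuple[str, Tuple[Any, ...]]:
        return ('SELECT %s * 2', (self.n,))
    def handle_results(self, results: int) -> Unfinished:
        # One more statement is needed, so hand the driver a follow-up query
        return Unfinished(Double(results))

def fake_execute(sql: str, args: Tuple[Any, ...]) -> int:
    # Stands in for a database: doubles the single argument
    return args[0] * 2

outcome = drive(DoubleTwice(3), fake_execute)
```

Because the queries never touch a database themselves, swapping `fake_execute` for a
real executor changes nothing about the queries, which is what makes them easy to test.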
'''
from mystic.request import ProjectID, UncheckedPID, UserID
from mystic.sources import Source
......@@ -6,6 +43,12 @@ from enum import Enum, auto
from typing import Any, Callable, cast, Collection, Generic, List, NamedTuple, NoReturn, Optional, Protocol, Sequence, Tuple, TypeVar
class SqlErrorCode(Enum):
"""
Represents possible error codes produced by SQL
Only error codes that can be returned as part of an IntegrityError are mapped, and in
some cases, multiple real error codes map to a single virtual error code.
"""
DUP_ENTRY = auto()
NO_REFERENCED_ROW = auto()
ROW_IS_REFERENCED = auto()
......@@ -13,20 +56,98 @@ class SqlErrorCode(Enum):
BAD_NULL_ERROR = auto()
class SqlIntegrityError(NamedTuple):
"""
An Integrity error produced by SQL
This typically means that the database is in an invalid state to accommodate an
otherwise valid query. For example, adding a duplicate entry would produce an
IntegrityError, not because the Query was wrong, but because the target entry already
existed in the database.
In this way, IntegrityErrors can be used to glean small amounts of information about
the state of the database
"""
error_code: SqlErrorCode
"""
A machine-readable code indicating what type of error this is
"""
error_message: str
"""
A developer-readable message explaining the error in a little bit more detail
"""
class QueryResult(NamedTuple):
"""
Represents a successful (non-sql-error) result from a :class:`QueryRequest`
This is the main (only) way that :class:`Query`s can retrieve information about the
database. Typically, this class is constructed by whatever is driving the Query, and
then passed to :func:`Query.handle_results()`.
This class aims to summarize the main elements of a response concisely, exposing all
main data immutably and in a controlled fashion. Technically the `next` and `all`
methods do perform IO under the hood, but for the purposes of the Query, they may as
well just be lazy immutable data.
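## Example
As a rough illustration of how a driver might build one of these, the sketch below wraps
a DB-API cursor. It makes several assumptions: `sqlite3` from the standard library
stands in for the real database driver (so placeholders are `?` rather than `%s`), the
`QueryResult` here is a local copy of the fields documented on this class, and
`result_from_cursor` and the `demo` table are hypothetical.

```python
import sqlite3
from typing import Any, Callable, NamedTuple, Optional, Sequence, Tuple

class QueryResult(NamedTuple):
    # Local copy of the documented fields, so the sketch is self-contained
    row_count: int
    last_row_id: int
    next: Callable[[], Optional[Tuple[Any, ...]]]
    all: Callable[[], Sequence[Tuple[Any, ...]]]

def result_from_cursor(cursor: sqlite3.Cursor) -> QueryResult:
    # Hypothetical helper: hand over the bound fetch methods instead of
    # eagerly copying rows, so rows are only read when the Query asks
    return QueryResult(
        cursor.rowcount,
        cursor.lastrowid or 0,
        cursor.fetchone,
        cursor.fetchall,
    )

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE demo (id INTEGER PRIMARY KEY, url TEXT)')

# A write: the interesting fields are row_count and last_row_id
inserted = result_from_cursor(
    conn.execute('INSERT INTO demo (url) VALUES (?)', ('https://example.org/a',))
)

# A read: the interesting fields are next and all
selected = result_from_cursor(conn.execute('SELECT id, url FROM demo'))
first_row = selected.next()   # pops one row off the lazy sequence
remaining = selected.all()    # whatever has not been consumed yet
```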
"""
row_count: int
"""
The number of rows affected by the execution of the SQL statement
"""
last_row_id: int
"""
The primary key of the last row affected by the execution of the SQL statement
This may not be an `int` where the primary key isn't an `int`. As such, no guarantees
are made about this field for tables which don't feature an `int` primary key.
"""
next: Callable[[], Optional[Tuple[Any, ...]]]
"""
Retrieve the next item in the query results
Think of this as a lazy sequence of items. Each call pops one element off of the lazy
list.
"""
all: Callable[[], Sequence[Tuple[Any, ...]]]
"""
Produce a sequence of all records returned by the execution of the query
"""
class QueryRequest(NamedTuple):
"""
A description of a single SQL statement
Where a :class:`QueryResult` is how :class:`Query`s learn about the results of a
query, a :class:`QueryRequest` is how they describe what SQL to run in the first
place.
This is as simple as providing a string containing an SQL statement as well as
arguments to be substituted in
"""
query: str
"""
A string containing a single SQL statement
Variable substitutions are possible using PyMySQL's substitution syntax. The actual
values to be substituted in should be placed in the :attr:`args` attribute.
As always with SQL statements, directly placing any kind of data into the statement
using concatenation or regular string formatting is **very dangerous**. Instead,
prefer to use substitutions using :attr:`args`
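## Example
To make the difference concrete, here is a small sketch. The `QueryRequest` below is a
local stand-in with the same two fields, and the `slug` column on `projects` is an
assumption made for illustration.

```python
from typing import Any, List, NamedTuple, Tuple, Union

class QueryRequest(NamedTuple):
    # Local stand-in mirroring the two documented fields
    query: str
    args: Union[Tuple[Any, ...], List[Any]]

untrusted = "x'; DROP TABLE projects; --"

# Dangerous: the input becomes part of the SQL text itself
unsafe = QueryRequest(
    "SELECT project_id FROM projects WHERE slug = '" + untrusted + "'",
    (),
)

# Preferred: the statement holds only a %s placeholder, and the value
# travels separately in args for the database driver to escape
safe = QueryRequest(
    'SELECT project_id FROM projects WHERE slug = %s',
    (untrusted,),
)
```

In the second form the statement text is constant no matter what the user typed.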
"""
args: Tuple[Any, ...] | List[Any]
"""
Arguments to be substituted into :attr:`query`
"""
@staticmethod
def _normalize_query_string(q: str) -> list[str]:
"""
Normalizes a query string, like that in :attr:`query` into an array of tokens
This is used for equality checking: by controlling for whitespace and
punctuation, queries which are identical in effect but not character-for-character
can be correctly marked as identical
"""
return [
word
for line in q\
......@@ -59,18 +180,58 @@ E = TypeVar('E', covariant=True)
@dataclass(frozen=True)
class Finished(Generic[T]):
"""
Designates that a query has executed successfully, producing a result
Part of a meta-type of :class:`Finished`, :class:`Error`, and :class:`Unfinished`.
Any of these three classes can be reported by a query, and each describes a different
state.
**See also:** :meth:`Query.handle_results`
"""
value: T
@dataclass(frozen=True)
class Error(Generic[E]):
"""
Designates that a query has encountered an expected error
Part of a meta-type of :class:`Finished`, :class:`Error`, and :class:`Unfinished`.
Any of these three classes can be reported by a query, and each describes a different
state.
**See also:** :meth:`Query.handle_results`
"""
value: E
@dataclass(frozen=True)
class Unfinished(Generic[T, E]):
"""
Designates that a query has partially completed, but more parts are left to run
Part of a meta-type of :class:`Finished`, :class:`Error`, and :class:`Unfinished`.
Any of these three classes can be reported by a query, and each describes a different
state.
**See also:** :meth:`Query.handle_results`
"""
next_query: 'Query[T, E]'
"""
The query that should now be run in place of the original
"""
@dataclass(frozen=True)
class Noop(Generic[T]):
"""
A :class:`Query` that always returns a fixed value
Crucially, this is one of two ":class:`Query`s" that **don't** trigger an SQL query
when run with the default driver. However, a database connection will still need to
be opened, so this should typically only be used where at least one real query is
being used.
See :class:`ENoop` for the erroring counterpart of this "query".
"""
value: T
def get_query(self) -> QueryRequest:
return QueryRequest('',(self.value,))
......@@ -80,6 +241,12 @@ class Noop(Generic[T]):
@dataclass(frozen=True)
class ENoop(Generic[T]):
"""
A :class:`Query` that always produces an error with a fixed value
Along with :class:`Noop`, this is one of two queries which do not produce an SQL
query when run with the default driver.
"""
value: T
def get_query(self) -> QueryRequest:
return QueryRequest('',(self.value,))
......@@ -88,15 +255,111 @@ class ENoop(Generic[T]):
return Error(self.value)
class Query(Protocol[T, E]):
"""
Describes a class which may be used to describe an SQL query
The :mod:`mystic.queries` module is themed around the idea of describing SQL queries
and their results. This protocol, then, is the centerpiece of the module. The
:class:`Query` protocol describes two methods necessary to describe a Query - one to
describe an SQL statement and one to deal with the result.
**Remember:** A :class:`Query` only *describes* a series of SQL statements. This
class has no methods for *executing* those statements. If you have a `Query` you want
to run, you'll need a driver, such as :func:`mystic.query_driver.drive_query_to_end()`
"""
def get_query(self) -> QueryRequest:
"""
Describe the next SQL statement necessary for this Query
This produces a single SQL statement and its arguments. If a query depends on
multiple statements being executed, it should define one :class:`Query` for each
stage/statement, and have the first Query's :meth:`handle_results` return
:class:`Unfinished` with the second stage, and so on until all statements have been
executed.
"""
...
def handle_results(self, results: QueryResult | SqlIntegrityError) -> Finished[T] | Error[E] | Unfinished[T, E]:
"""
Interpret the results of the execution of an SQL statement
After a statement is executed, one of two things can happen: Either the statement
can execute successfully, producing zero or more records and a number of rows
affected, or it can fail.
The first case is represented using a :class:`QueryResult`, while the second is
represented using an :class:`SqlIntegrityError`. Note that only integrity errors
are caught, as other errors typically indicate either a programming error (which
will be handled by the driver and displayed along with some debug info) or a
connection error, which should also be handled by the driver.
It is the responsibility of the :class:`Query` to interpret these results. The
interpretations available for a query to make are as follows:
- :class:`Finished`: The statement executed normally, and the intended result was
achieved. The normal execution flow can continue.
- :class:`Error`: The query encountered an anticipated issue. Processing should
stop, but because we know what the issue is, we know what it means and how to
deal with it. The normal ("happy path") execution flow should be halted, and
we should divert to the exception path.
- :class:`Unfinished`: The statement executed normally, but at least one more
statement needs to be executed before results are available. The associated
:class:`Query` is a new, different query which should be executed instead of
this one.
- Raising an exception: An unexpected error was encountered, which we don't know
anything about, least of all what it means. It wasn't even supposed to be
possible in the first place. Execution flow should stop and an error should be
printed.
"""
...
Q1 = TypeVar('Q1', covariant=True)
Q2 = TypeVar('Q2', covariant=True)
@dataclass(frozen=True)
class BoundQuery(Generic[Q1, Q2, E]):
"""
A :class:`Query` which executes a second :class:`Query` conditional on the results of a first
The :class:`BoundQuery` class enables the monadic behavior necessary to the
description of complex queries. A `BoundQuery` is unique in that it doesn't have any
preprogrammed SQL statement associated with it. Instead, it acts as the glue between
two other queries.
A `BoundQuery` takes two arguments. The first is a normal :class:`Query`. The second
is a function which takes as an argument the output of the first query. This function
then returns *another query*.
This is executed as follows:
- First, the normal (first) query is executed like normal.
- Once it's finished, it will produce some value
- That value is passed to the function to produce a second query
- That second query is run until it yields a value
- That value is returned as the final result.
If either query produces an :class:`Error` at any point in this process, that error is
returned instead. For this reason, the :class:`Error` types of both queries need to
be identical. You can reconcile two mismatched error types using a
:class:`MappedQuery`.
## Example
Suppose you want to retrieve some information from the database, run some (pure in a
functional sense) algorithm on it, and write the result back to the database. This
flow requires two SQL statements to be run - one to retrieve the data, and one to
write it back after the code is finished executing. However, the data that the second
query needs to write back depends on the data that the first query produces.
Here's how you would solve this conundrum using a `BoundQuery`:
```python
output = BoundQuery(
RetreiveTheDataQuery(), # The first step
lambda my_data: ( # The data retrieved above will be available here
StoreTheData( # After the above step, we want to write the data back
run_my_algorithm(my_data) # And here's the data that we want to write
)
)
)
```
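## Sketch of the chaining
For readers curious how the steps listed above could be expressed with the
:class:`Finished` / :class:`Error` / :class:`Unfinished` outcomes, here is one possible
arrangement. It is a sketch under assumptions, not necessarily how this class is
written: `Bound`, `Const`, `Fail`, and `drive` are hypothetical stand-ins, and no SQL
is actually executed.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Finished:
    value: Any

@dataclass(frozen=True)
class Error:
    value: Any

@dataclass(frozen=True)
class Unfinished:
    next_query: Any

@dataclass(frozen=True)
class Const:
    # Behaves like Noop: no real SQL, always succeeds with a fixed value
    value: Any
    def get_query(self):
        return ('', ())
    def handle_results(self, results):
        return Finished(self.value)

@dataclass(frozen=True)
class Fail:
    # Behaves like ENoop: always fails with a fixed value
    value: Any
    def get_query(self):
        return ('', ())
    def handle_results(self, results):
        return Error(self.value)

@dataclass(frozen=True)
class Bound:
    query: Any
    transformation: Callable[[Any], Any]
    def get_query(self):
        # The only statement to describe is the first query's statement
        return self.query.get_query()
    def handle_results(self, results):
        outcome = self.query.handle_results(results)
        if isinstance(outcome, Finished):
            # First query done: continue with the query built from its value
            return Unfinished(self.transformation(outcome.value))
        if isinstance(outcome, Unfinished):
            # First query has more stages: keep the binding wrapped around them
            return Unfinished(Bound(outcome.next_query, self.transformation))
        return outcome  # An Error short-circuits the chain

def drive(query):
    while True:
        query.get_query()  # a real driver would execute this statement
        outcome = query.handle_results(None)
        if not isinstance(outcome, Unfinished):
            return outcome
        query = outcome.next_query

chained = drive(Bound(Const(2), lambda v: Const(v + 1)))
failed = drive(Bound(Fail('nope'), lambda v: Const(v + 1)))
nested = drive(Bound(Bound(Const(1), lambda v: Const(v * 10)), lambda v: Const(v + 1)))
```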
"""
query: Query[Q1, E]
transformation: Callable[[Q1], Query[Q2, E]]
def get_query(self) -> QueryRequest:
......@@ -114,6 +377,25 @@ E1 = TypeVar('E1', covariant=True)
E2 = TypeVar('E2', covariant=True)
@dataclass(frozen=True)
class MappedQuery(Generic[Q1, Q2, E1, E2]):
"""
A :class:`Query` which transforms the output/error of an underlying `Query`
A `MappedQuery` doesn't have an SQL statement of its own to run. Rather, it is used
to transform the results of another query. Does one query return a `bool` error
value, but you want to convert it into a `str`? Use a `MappedQuery`! Need to format
that `int` into an :class:`mystic.outcome.Outcome`? `MappedQuery`!
Here's how it works: The :class:`MappedQuery` constructor takes three values. The
first is the underlying :class:`Query`, which will be executed as normal. Then, it
takes two functions. The first takes whatever the output of the query will be and
transforms it into a shape that suits your needs. The second does the same, except on
the error value of the query. What results is a :class:`Query` that has the output
and error types not of the original query, but of your functions. Handy!
Only the transformation function that corresponds to the actual output of the query
will be run. That is, if the query you passed succeeds, your error transforming
function will never run. Keep that in mind!
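## Example
The behaviour described above can be illustrated with a small stand-in. This is a
sketch only: `Mapped`, `Const`, and `Fail` are hypothetical simplifications, multi-stage
(:class:`Unfinished`) queries are left out, and no SQL is executed.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass(frozen=True)
class Finished:
    value: Any

@dataclass(frozen=True)
class Error:
    value: Any

@dataclass(frozen=True)
class Const:
    # Always succeeds with a fixed value
    value: Any
    def handle_results(self, results):
        return Finished(self.value)

@dataclass(frozen=True)
class Fail:
    # Always fails with a fixed value
    value: Any
    def handle_results(self, results):
        return Error(self.value)

@dataclass(frozen=True)
class Mapped:
    query: Any
    s_transformation: Callable[[Any], Any]
    e_transformation: Callable[[Any], Any]
    def handle_results(self, results):
        outcome = self.query.handle_results(results)
        if isinstance(outcome, Finished):
            return Finished(self.s_transformation(outcome.value))
        return Error(self.e_transformation(outcome.value))

calls: List[str] = []

def describe_count(n: int) -> str:
    calls.append('success')
    return f'{n} sources'

def describe_failure(denied: bool) -> str:
    calls.append('error')
    return 'permission denied' if denied else 'not found'

ok = Mapped(Const(3), describe_count, describe_failure).handle_results(None)
calls_after_ok = list(calls)  # only the success transformation has run so far
bad = Mapped(Fail(True), describe_count, describe_failure).handle_results(None)
```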
"""
query: Query[Q1, E1]
s_transformation: Callable[[Q1], Q2]
e_transformation: Callable[[E1], E2]
......@@ -131,6 +413,19 @@ class MappedQuery(Generic[Q1, Q2, E1, E2]):
## Practical Queries
class DeleteSourceIfOwned(NamedTuple):
"""
A :class:`Query` which removes a data source iff the provided user owns the project
The constructor to `DeleteSourceIfOwned` takes two values: A `source_id`
corresponding to the primary key of the data source to be removed, and a `user_id`.
If the named source belongs to a project of which the named user is the owner, then
that source will be deleted from the project, and the query will have
:class:`Finished` successfully, returning the `source_id`.
If the user does not own the project to which the source belongs, or if the
`source_id` or `user_id` doesn't exist in the database, then the Query produces an
error containing the `source_id`.
"""
source_id: int
user_id: UserID
def get_query(self) -> QueryRequest:
......@@ -155,6 +450,19 @@ class DeleteSourceIfOwned(NamedTuple):
raise Exception(f"The DeleteSourceIfOwned query should not produce an integrity error, but we got {results}")
class ValidateProjectOwned(NamedTuple):
"""
Check if a user has permission to modify a given project
If the user and project both exist and the user has permission for the project, then
the query succeeds with the validated ID of the project. Otherwise, a nondescript
error is returned indicating that at least one of the following is true:
- The user does not exist, or has never logged in
- The project does not exist or has been deleted
- The user does not have, or no longer has, permission to modify the project
If you need more details about the project than just its existence, try using
:class:`RetreiveProjectInfoIfOwned`
"""
project_id: UncheckedPID
user_id: UserID
def get_query(self) -> QueryRequest:
......@@ -175,12 +483,45 @@ class ValidateProjectOwned(NamedTuple):
raise Exception(f"The ValidateProjectOwned query should not produce an integrity error, but we got {results}")
class ProjectInfo(NamedTuple):
"""
All of the information that can be efficiently discovered about a project
This can be retrieved from :class:`RetreiveProjectInfoIfOwned`
"""
project_id: ProjectID
"""
The numeric ID of the project
"""
slug: str
"""
A slug for the project
Used to form the URLs that refer to the project
"""
display_name: str
"""
A pretty display name for the project
"""
description: str
"""
A long (multi-paragraph) description of the project provided by the user
"""
draft_owner: Optional[UserID]
"""
Who does the draft of this project belong to, if it is indeed a draft
For all standard projects, this will be `None`. However, for projects which are
drafts, this will be set to the :class:`UserID` of the owner the draft belongs to, as
drafts do not have normal owners listed in the database
"""
class RetreiveProjectInfoIfOwned(NamedTuple):
"""
Retrieve details about a project iff the named user has permission to modify it
Identical to :class:`ValidateProjectOwned` except that it returns far more detail
about the project, at the cost of being a slightly more expensive query to run (1
indexed `JOIN`)
"""
project_id: UncheckedPID
user_id: UserID
def get_query(self) -> QueryRequest:
......@@ -205,12 +546,37 @@ class RetreiveProjectInfoIfOwned(NamedTuple):
raise Exception(f"The RetreiveProjectInfoIfOwned query should not produce an integrity error, but we got {results}")
class AddSourceError(Enum):
"""
Errors that can occur during :class:`AddSourceToProject`
"""
NonexistantProject = auto()
"""
The named project did not exist, despite validation
"""
SourceAlreadyPresent = auto()
"""
The source that was being added was already present in the database.
"""
class AddSourceToProject(NamedTuple):
"""
Adds a data source to a project
If this succeeds, then the ID of the new source is produced. Otherwise, one of the
:class:`AddSourceError`s will be produced. The conditions which produce these errors
are described in the documentation for :class:`AddSourceError`.
"""
project_id: ProjectID
"""
The ID of the project that the source is to be added to
"""
source_type: str
"""
The type of the source, e.g. "github", "rss", or "git"
"""
source_url: str
"""
The URL of the repository of the source
"""
def get_query(self) -> QueryRequest:
return QueryRequest('''
......@@ -235,12 +601,37 @@ class AddSourceToProject(NamedTuple):
return Finished(results.last_row_id)
class AddOwnerError(Enum):
"""
An error that can occur during :class:`AddOwnerByUsername`
"""
NonexistantProject = auto()
"""
The named project does not exist, despite validation
"""
DuplicateOwner = auto()
"""
The owner that was being added is already listed as an owner
"""
UsernameDNE = auto()
"""
The username provided does not correspond to any user
"""
class AddOwnerByUsername(NamedTuple):
"""
A :class:`Query` which promotes a user with a given username to an owner of a project
Produces a nondescript success when successful, and an :class:`AddOwnerError` if there
were any problems. You can learn the circumstances which give rise to possible errors
in the docs for :class:`AddOwnerError`
"""
username: str
"""
The username of the owner-to-be
"""
project: ProjectID
"""
The project to which the owner is being added
"""
def get_query(self) -> QueryRequest:
return QueryRequest("""
INSERT INTO owners(user_id, project_id)
......@@ -263,13 +654,51 @@ class AddOwnerByUsername(NamedTuple):
return Finished(None)
class UpdateProjectError(Enum):
"""
An error which can occur during :class:`UpdateProjectProperties`
"""
DuplicateSlug = auto()
"""
Attempted to change the project's slug, but the target slug was already taken
"""
NonexistantProjectOrNoChange = auto()
"""
Either the project doesn't exist, or no change was made
Using a validated project ID (:class:`ProjectID` instead of :class:`UncheckedPID`)
should guarantee that the project exists, meaning that this can be expected to mean just
NoChange for most cases.
"""
class UpdateProjectProperties(NamedTuple):
"""
A :class:`Query` to attempt to change some of the properties associated with a project
A successful change produces a boolean value. This value is set to `True` if this
query would have or did change this project from a draft to a finalized project. This
flag will be set even if the project was already a finalized project. You can use
it to determine when a project is no longer a draft by comparing it to the original
status of the project as a draft. If the project was a draft and the flag is true, it
was finalized.
An error produces an :class:`UpdateProjectError`. See the respective docs for more
details on what conditions give rise to such an error.
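## Example
A short sketch of how a caller might interpret the flag. The `all(...)` expression
mirrors the value this query returns on success; `finalizing_flag` and
`became_finalized` are hypothetical helper names, and the caller is assumed to already
know whether the project was a draft beforehand.

```python
from typing import Optional

def finalizing_flag(
    slug: Optional[str],
    display_name: Optional[str],
    description: Optional[str],
) -> bool:
    # True only when every property was supplied with a non-empty value
    return all((slug, display_name, description))

def became_finalized(was_draft: bool, flag: bool) -> bool:
    # The flag alone is not enough: an already-finalized project can also
    # produce True, so compare against the status from before the update
    return was_draft and flag

full_update = finalizing_flag('my-project', 'My Project', 'A longer description')
partial_update = finalizing_flag(None, 'My Project', None)
```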
"""
project_id: ProjectID
"""
The ID of the project whose properties are being updated
"""
slug: Optional[str]
"""
The new slug for the project, or `None` to leave it unchanged
"""
display_name: Optional[str]
"""
The display name for the project, or `None` to leave it unchanged
"""
description: Optional[str]
"""
The description for the project, or `None` to leave it unchanged
"""
def get_query(self) -> QueryRequest:
values: Tuple[ProjectID|str, ...]
query_parts, values = zip(*[
......@@ -300,8 +729,24 @@ class UpdateProjectProperties(NamedTuple):
return Finished(all((self.slug, self.display_name, self.description)))
class RemoveOwner(NamedTuple):
"""
:class:`Query` to remove a specific user's permissions for a given project
A successful execution produces a nondescript :class:`Finished`, while an
:class:`Error` indicates that this owner does not exist, the project does not exist,
or the owner does not own the project.
Given that ProjectIDs and UserIDs are typically validated beforehand, it's generally
fairly safe to assume that this means that the user wasn't an owner of the project
"""
project_id: ProjectID
"""
The ID of the project to affect
"""
user: UserID
"""
The ID of the owner to be removed
"""
def get_query(self) -> QueryRequest:
return QueryRequest('''
DELETE FROM owners WHERE user_id = %s AND project_id = %s;
......@@ -315,7 +760,19 @@ class RemoveOwner(NamedTuple):
return Finished(None)
class DeleteProject(NamedTuple):
"""
:class:`Query` to delete all information about a project
This includes its list of owners, all data sources associated with it, and specific
details.
When successful, this produces a nondescript :class:`Finished`, or, when the project
did not exist, a nondescript :class:`Error`
"""
project_id: ProjectID
"""
The ID of the project to delete
"""
def get_query(self) -> QueryRequest:
return QueryRequest('''
DELETE FROM projects WHERE project_id = %s;
......@@ -329,7 +786,19 @@ class DeleteProject(NamedTuple):
return Finished(None)
class GetSources(NamedTuple):
"""
A :class:`Query` to produce a list of all of the data sources linked to this project
On success, produces a collection of :class:`Source`s. If the project did not exist,
this is simply an empty list. This `Query` cannot produce an error.
"""
project_id: UncheckedPID
"""
The ID of the project to get sources for
**Note:** Unlike many other queries, this accepts an unvalidated :class:`UncheckedPID`
instead of a validated :class:`ProjectID`. Keep this in mind when using this query.
"""
def get_query(self) -> QueryRequest:
return QueryRequest('''
SELECT source_id, data_type, data_url, flagged
......