New Upstream Release - sqlparse
Ready changes
Summary
Merged new upstream version: 0.4.4 (was: 0.4.3).
Diff
diff --git a/AUTHORS b/AUTHORS
index 2e31ae0..1717adf 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -8,6 +8,7 @@ project: https://bitbucket.org/gutworth/six.
Alphabetical list of contributors:
* Adam Greenhall <agreenhall@lyft.com>
+* Aki Ariga <chezou+github@gmail.com>
* Alexander Beedie <ayembee@gmail.com>
* Alexey Malyshev <nostrict@gmail.com>
* ali-tny <aliteeney@googlemail.com>
@@ -16,20 +17,24 @@ Alphabetical list of contributors:
* atronah <atronah.ds@gmail.com>
* casey <casey@cloudera.com>
* Cauê Beloni <cbeloni@gmail.com>
+* Christian Clauss <cclauss@me.com>
* circld <circld1@gmail.com>
* Corey Zumar <corey.zumar@databricks.com>
* Cristian Orellana <cristiano@groupon.com>
* Dag Wieers <dag@wieers.com>
+* Daniel Harding <dharding@living180.net>
* Darik Gamble <darik.gamble@gmail.com>
* Demetrio92 <Demetrio.Rodriguez.T@gmail.com>
* Dennis Taylor <dennis.taylor@clio.com>
* Dvořák Václav <Vaclav.Dvorak@ysoft.com>
+* Erik Cederstrand <erik@adamatics.com>
* Florian Bauer <florian.bauer@zmdi.com>
* Fredy Wijaya <fredy.wijaya@gmail.com>
* Gavin Wahl <gwahl@fusionbox.com>
* hurcy <cinyoung.hur@gmail.com>
* Ian Robertson <ian.robertson@capitalone.com>
* JacekPliszka <Jacek.Pliszka@gmail.com>
+* JavierPan <PeterSandwich@users.noreply.github.com>
* Jean-Martin Archer <jm@jmartin.ca>
* Jesús Leganés Combarro "Piranna" <piranna@gmail.com>
* Johannes Hoff <johshoff@gmail.com>
@@ -39,11 +44,13 @@ Alphabetical list of contributors:
* Kevin Jing Qiu <kevin.jing.qiu@gmail.com>
* koljonen <koljonen@outlook.com>
* Likai Liu <liulk@likai.org>
+* Long Le Xich <codenamelxl@users.noreply.github.com>
* mathilde.oustlant <mathilde.oustlant@ext.cdiscount.com>
* Michael Schuller <chick@mschuller.net>
* Mike Amy <cocoade@googlemail.com>
* mulos <daniel.strackbein@gmail.com>
* Oleg Broytman <phd@phdru.name>
+* osmnv <80402144+osmnv@users.noreply.github.com>
* Patrick Schemitz <patrick.schemitz@digitalbriefkasten.de>
* Pi Delport <pjdelport@gmail.com>
* Prudhvi Vatala <pvatala@gmail.com>
@@ -55,6 +62,7 @@ Alphabetical list of contributors:
* Ryan Wooden <rygwdn@gmail.com>
* saaj <id@saaj.me>
* Shen Longxing <shenlongxing2012@gmail.com>
+* Simon Heisterkamp <she@delegate.dk>
* Sjoerd Job Postmus
* Soloman Weng <soloman1124@gmail.com>
* spigwitmer <itgpmc@gmail.com>
diff --git a/CHANGELOG b/CHANGELOG
index 65e03fc..a42577e 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,3 +1,53 @@
+Release 0.4.4 (Apr 18, 2023)
+----------------------------
+
+Notable Changes
+
+* IMPORTANT: This release fixes a security vulnerability in the
+ parser where a regular expression vulnerable to ReDOS (Regular
+ Expression Denial of Service) was used. See the security advisory
+ for details: https://github.com/andialbrecht/sqlparse/security/advisories/GHSA-rrm6-wvj7-cwh2
+ The vulnerability was discovered by @erik-krogh from GitHub
+ Security Lab (GHSL). Thanks for reporting!
+
+Bug Fixes
+
+* Revert a change from 0.4.0 that changed IN to be a comparison (issue694).
+ The primary expectation is that IN is treated as a keyword and not as a
+ comparison operator. That also follows the definition of reserved keywords
+ for the major SQL syntax definitions.
+* Fix regular expressions for string parsing.
+
+Other
+
+* sqlparse now uses pyproject.toml instead of setup.cfg (issue685).
+
+
+Release 0.4.3 (Sep 23, 2022)
+----------------------------
+
+Enhancements
+
+* Add support for DIV operator (pr664, by chezou).
+* Add support for additional SPARK keywords (pr643, by mrmasterplan).
+* Avoid tokens copy (pr622, by living180).
+* Add REGEXP as a comparison (pr647, by PeterSandwich).
+* Add DISTINCTROW keyword for MS Access (issue677).
+* Improve parsing of CREATE TABLE AS SELECT (pr662, by chezou).
+
+Bug Fixes
+
+* Fix spelling of INDICATOR keyword (pr653, by ptld).
+* Fix formatting error in EXTRACT function (issue562, issue670, pr676, by ecederstrand).
+* Fix bad parsing of create table statements that use lower case (issue217, pr642, by mrmasterplan).
+* Handle backtick as valid quote char (issue628, pr629, by codenamelxl).
+* Allow any unicode character as valid identifier name (issue641).
+
+Other
+
+* Update github actions to test on Python 3.10 as well (pr661, by cclaus).
+
+
Release 0.4.2 (Sep 10, 2021)
----------------------------
@@ -78,7 +128,7 @@ Bug Fixes
* Remove support for parsing double slash comments introduced in
0.3.0 (issue456) as it had some side-effects with other dialects and
doesn't seem to be widely used (issue476).
-* Restrict detection of alias names to objects that acutally could
+* Restrict detection of alias names to objects that actually could
have an alias (issue455, adopted some parts of pr509 by john-bodley).
* Fix parsing of date/time literals (issue438, by vashek).
* Fix initialization of TokenList (issue499, pr505 by john-bodley).
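The ReDoS class of vulnerability mentioned in the 0.4.4 notes comes from regular expressions with nested quantifiers that trigger catastrophic backtracking on non-matching input. The sketch below is a generic stdlib illustration of the failure mode; the pattern here is made up for demonstration and is not the actual sqlparse pattern (see the linked GHSA advisory for that):

```python
import re

# Hypothetical pattern with nested quantifiers: (a+)+ can partition a
# run of "a"s in exponentially many ways, and every partition is tried
# before the match against a trailing non-"a" character fails.
evil = re.compile(r'(a+)+$')

# A short non-matching input already forces exponential backtracking;
# each extra "a" roughly doubles the work.
subject = 'a' * 18 + 'b'

result = evil.match(subject)
print(result)  # None, but only after exhaustive backtracking
```

The fix for this class of bug is to rewrite the pattern so only one way exists to consume each character (e.g. `a+$` here), which is the shape of the change in the advisory.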
diff --git a/MANIFEST.in b/MANIFEST.in
deleted file mode 100644
index 8043b35..0000000
--- a/MANIFEST.in
+++ /dev/null
@@ -1,11 +0,0 @@
-recursive-include docs source/*
-include docs/sqlformat.1
-include docs/Makefile
-recursive-include tests *.py *.sql
-include LICENSE
-include TODO
-include AUTHORS
-include CHANGELOG
-include Makefile
-include setup.cfg
-include tox.ini
diff --git a/Makefile b/Makefile
index ee35e54..1657822 100644
--- a/Makefile
+++ b/Makefile
@@ -22,5 +22,5 @@ clean:
release:
@rm -rf dist/
- python setup.py sdist bdist_wheel
+ python -m build
twine upload --sign --identity E0B84F81 dist/*
diff --git a/PKG-INFO b/PKG-INFO
index 556e02c..b19308d 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,92 +1,10 @@
-Metadata-Version: 1.2
+Metadata-Version: 2.1
Name: sqlparse
-Version: 0.4.2
+Version: 0.4.4
Summary: A non-validating SQL parser.
-Home-page: https://github.com/andialbrecht/sqlparse
-Author: Andi Albrecht
-Author-email: albrecht.andi@gmail.com
-License: BSD-3-Clause
-Project-URL: Documentation, https://sqlparse.readthedocs.io/
-Project-URL: Release Notes, https://sqlparse.readthedocs.io/en/latest/changes/
-Project-URL: Source, https://github.com/andialbrecht/sqlparse
-Project-URL: Tracker, https://github.com/andialbrecht/sqlparse/issues
-Description: python-sqlparse - Parse SQL statements
- ======================================
-
- |buildstatus|_
- |coverage|_
- |docs|_
-
- .. docincludebegin
-
- sqlparse is a non-validating SQL parser for Python.
- It provides support for parsing, splitting and formatting SQL statements.
-
- The module is compatible with Python 3.5+ and released under the terms of the
- `New BSD license <https://opensource.org/licenses/BSD-3-Clause>`_.
-
- Visit the project page at https://github.com/andialbrecht/sqlparse for
- further information about this project.
-
-
- Quick Start
- -----------
-
- .. code-block:: sh
-
- $ pip install sqlparse
-
- .. code-block:: python
-
- >>> import sqlparse
-
- >>> # Split a string containing two SQL statements:
- >>> raw = 'select * from foo; select * from bar;'
- >>> statements = sqlparse.split(raw)
- >>> statements
- ['select * from foo;', 'select * from bar;']
-
- >>> # Format the first statement and print it out:
- >>> first = statements[0]
- >>> print(sqlparse.format(first, reindent=True, keyword_case='upper'))
- SELECT *
- FROM foo;
-
- >>> # Parsing a SQL statement:
- >>> parsed = sqlparse.parse('select * from foo')[0]
- >>> parsed.tokens
- [<DML 'select' at 0x7f22c5e15368>, <Whitespace ' ' at 0x7f22c5e153b0>, <Wildcard '*' … ]
- >>>
-
- Links
- -----
-
- Project page
- https://github.com/andialbrecht/sqlparse
-
- Bug tracker
- https://github.com/andialbrecht/sqlparse/issues
-
- Documentation
- https://sqlparse.readthedocs.io/
-
- Online Demo
- https://sqlformat.org/
-
-
- sqlparse is licensed under the BSD license.
-
- Parts of the code are based on pygments written by Georg Brandl and others.
- pygments-Homepage: http://pygments.org/
-
- .. |buildstatus| image:: https://secure.travis-ci.org/andialbrecht/sqlparse.png?branch=master
- .. _buildstatus: https://travis-ci.org/#!/andialbrecht/sqlparse
- .. |coverage| image:: https://codecov.io/gh/andialbrecht/sqlparse/branch/master/graph/badge.svg
- .. _coverage: https://codecov.io/gh/andialbrecht/sqlparse
- .. |docs| image:: https://readthedocs.org/projects/sqlparse/badge/?version=latest
- .. _docs: https://sqlparse.readthedocs.io/en/latest/?badge=latest
-
-Platform: UNKNOWN
+Author-email: Andi Albrecht <albrecht.andi@gmail.com>
+Requires-Python: >=3.5
+Description-Content-Type: text/x-rst
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
@@ -99,8 +17,101 @@ Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Database
Classifier: Topic :: Software Development
-Requires-Python: >=3.5
+Requires-Dist: flake8 ; extra == "dev"
+Requires-Dist: build ; extra == "dev"
+Requires-Dist: sphinx ; extra == "doc"
+Requires-Dist: pytest ; extra == "test"
+Requires-Dist: pytest-cov ; extra == "test"
+Project-URL: Documentation, https://sqlparse.readthedocs.io/
+Project-URL: Home, https://github.com/andialbrecht/sqlparse
+Project-URL: Release Notes, https://sqlparse.readthedocs.io/en/latest/changes/
+Project-URL: Source, https://github.com/andialbrecht/sqlparse
+Project-URL: Tracker, https://github.com/andialbrecht/sqlparse/issues
+Provides-Extra: dev
+Provides-Extra: doc
+Provides-Extra: test
+
+python-sqlparse - Parse SQL statements
+======================================
+
+|buildstatus|_
+|coverage|_
+|docs|_
+|packageversion|_
+
+.. docincludebegin
+
+sqlparse is a non-validating SQL parser for Python.
+It provides support for parsing, splitting and formatting SQL statements.
+
+The module is compatible with Python 3.5+ and released under the terms of the
+`New BSD license <https://opensource.org/licenses/BSD-3-Clause>`_.
+
+Visit the project page at https://github.com/andialbrecht/sqlparse for
+further information about this project.
+
+
+Quick Start
+-----------
+
+.. code-block:: sh
+
+ $ pip install sqlparse
+
+.. code-block:: python
+
+ >>> import sqlparse
+
+ >>> # Split a string containing two SQL statements:
+ >>> raw = 'select * from foo; select * from bar;'
+ >>> statements = sqlparse.split(raw)
+ >>> statements
+ ['select * from foo;', 'select * from bar;']
+
+ >>> # Format the first statement and print it out:
+ >>> first = statements[0]
+ >>> print(sqlparse.format(first, reindent=True, keyword_case='upper'))
+ SELECT *
+ FROM foo;
+
+ >>> # Parsing a SQL statement:
+ >>> parsed = sqlparse.parse('select * from foo')[0]
+ >>> parsed.tokens
+ [<DML 'select' at 0x7f22c5e15368>, <Whitespace ' ' at 0x7f22c5e153b0>, <Wildcard '*' … ]
+ >>>
+
+Links
+-----
+
+Project page
+ https://github.com/andialbrecht/sqlparse
+
+Bug tracker
+ https://github.com/andialbrecht/sqlparse/issues
+
+Documentation
+ https://sqlparse.readthedocs.io/
+
+Online Demo
+ https://sqlformat.org/
+
+
+sqlparse is licensed under the BSD license.
+
+Parts of the code are based on pygments written by Georg Brandl and others.
+pygments-Homepage: http://pygments.org/
+
+.. |buildstatus| image:: https://github.com/andialbrecht/sqlparse/actions/workflows/python-app.yml/badge.svg
+.. _buildstatus: https://github.com/andialbrecht/sqlparse/actions/workflows/python-app.yml
+.. |coverage| image:: https://codecov.io/gh/andialbrecht/sqlparse/branch/master/graph/badge.svg
+.. _coverage: https://codecov.io/gh/andialbrecht/sqlparse
+.. |docs| image:: https://readthedocs.org/projects/sqlparse/badge/?version=latest
+.. _docs: https://sqlparse.readthedocs.io/en/latest/?badge=latest
+.. |packageversion| image:: https://img.shields.io/pypi/v/sqlparse?color=%2334D058&label=pypi%20package
+.. _packageversion: https://pypi.org/project/sqlparse
+
diff --git a/README.rst b/README.rst
index 92e15c1..df4e7e3 100644
--- a/README.rst
+++ b/README.rst
@@ -4,6 +4,7 @@ python-sqlparse - Parse SQL statements
|buildstatus|_
|coverage|_
|docs|_
+|packageversion|_
.. docincludebegin
@@ -59,7 +60,7 @@ Documentation
https://sqlparse.readthedocs.io/
Online Demo
- https://sqlformat.org/
+ https://sqlformat.org/
sqlparse is licensed under the BSD license.
@@ -67,9 +68,11 @@ sqlparse is licensed under the BSD license.
Parts of the code are based on pygments written by Georg Brandl and others.
pygments-Homepage: http://pygments.org/
-.. |buildstatus| image:: https://secure.travis-ci.org/andialbrecht/sqlparse.png?branch=master
-.. _buildstatus: https://travis-ci.org/#!/andialbrecht/sqlparse
+.. |buildstatus| image:: https://github.com/andialbrecht/sqlparse/actions/workflows/python-app.yml/badge.svg
+.. _buildstatus: https://github.com/andialbrecht/sqlparse/actions/workflows/python-app.yml
.. |coverage| image:: https://codecov.io/gh/andialbrecht/sqlparse/branch/master/graph/badge.svg
.. _coverage: https://codecov.io/gh/andialbrecht/sqlparse
.. |docs| image:: https://readthedocs.org/projects/sqlparse/badge/?version=latest
.. _docs: https://sqlparse.readthedocs.io/en/latest/?badge=latest
+.. |packageversion| image:: https://img.shields.io/pypi/v/sqlparse?color=%2334D058&label=pypi%20package
+.. _packageversion: https://pypi.org/project/sqlparse
diff --git a/debian/changelog b/debian/changelog
index 78ed9c2..232b45c 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,10 @@
+sqlparse (0.4.4-1) UNRELEASED; urgency=low
+
+ * New upstream release.
+ * New upstream release.
+
+ -- Debian Janitor <janitor@jelmer.uk> Fri, 19 May 2023 06:34:03 -0000
+
sqlparse (0.4.2-1) unstable; urgency=medium
* Team upload.
diff --git a/docs/source/extending.rst b/docs/source/extending.rst
new file mode 100644
index 0000000..0c10924
--- /dev/null
+++ b/docs/source/extending.rst
@@ -0,0 +1,76 @@
+Extending :mod:`sqlparse`
+=========================
+
+.. module:: sqlparse
+ :synopsis: Extending parsing capability of sqlparse.
+
+The :mod:`sqlparse` module uses an SQL grammar that was tuned through usage and numerous
+PRs to fit a broad range of SQL syntaxes, but it cannot cater to every case since
+some SQL dialects have adopted conflicting meanings of certain keywords. Sqlparse
+therefore exposes a mechanism to configure the fundamental keywords and regular
+expressions that parse the language, as described below.
+
+If you find an adaptation that works for your specific use case, please consider
+contributing it back to the community by opening a PR on
+`GitHub <https://github.com/andialbrecht/sqlparse>`_.
+
+Configuring the Lexer
+---------------------
+
+The lexer is a singleton class that breaks down the stream of characters into language
+tokens. It does this by using a sequence of regular expressions and keywords that are
+listed in the module ``sqlparse.keywords``. Instead of applying these fixed grammar
+definitions directly, the lexer is initialized by default in its
+``default_initialization()`` method. As an API user, you can adapt the Lexer configuration by
+applying your own configuration logic. To do so, start out by clearing previous
+configurations with ``.clear()``, then apply the SQL list with
+``.set_SQL_REGEX(SQL_REGEX)``, and apply keyword lists with ``.add_keywords(KEYWORDS)``.
+
+You can do so by re-using the expressions in ``sqlparse.keywords`` (see example below),
+leaving parts out, or by making up your own master list.
+
+See the expected types of the arguments by inspecting their structure in
+``sqlparse.keywords``.
+(For compatibility with python 3.4, this library does not use type-hints.)
+
+The following example adds support for the expression ``ZORDER BY``, and adds ``BAR`` as
+a keyword to the lexer:
+
+.. code-block:: python
+
+ import re
+
+ import sqlparse
+ from sqlparse import keywords
+ from sqlparse.lexer import Lexer
+
+ # get the lexer singleton object to configure it
+ lex = Lexer.get_default_instance()
+
+ # Clear the default configurations.
+ # After this call, reg-exps and keyword dictionaries need to be loaded
+ # to make the lexer functional again.
+ lex.clear()
+
+ my_regex = (r"ZORDER\s+BY\b", sqlparse.tokens.Keyword)
+
+ # slice the default SQL_REGEX to inject the custom object
+ lex.set_SQL_REGEX(
+ keywords.SQL_REGEX[:38]
+ + [my_regex]
+ + keywords.SQL_REGEX[38:]
+ )
+
+ # add the default keyword dictionaries
+ lex.add_keywords(keywords.KEYWORDS_COMMON)
+ lex.add_keywords(keywords.KEYWORDS_ORACLE)
+ lex.add_keywords(keywords.KEYWORDS_PLPGSQL)
+ lex.add_keywords(keywords.KEYWORDS_HQL)
+ lex.add_keywords(keywords.KEYWORDS_MSACCESS)
+ lex.add_keywords(keywords.KEYWORDS)
+
+ # add a custom keyword dictionary
+ lex.add_keywords({'BAR': sqlparse.tokens.Keyword})
+
+ # no configuration is passed here. The lexer is used as a singleton.
+ sqlparse.parse("select * from foo zorder by bar;")
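The keyword dictionaries passed to ``add_keywords`` map uppercase keyword names to token types, consulted after the regular expressions have run. A minimal stdlib sketch of that lookup (names here are illustrative, not the sqlparse API):

```python
# Simplified stand-ins for sqlparse token types.
NAME = 'Name'
KEYWORD = 'Keyword'

# Merged view of the keyword dictionaries registered on the lexer.
keywords = {'SELECT': KEYWORD, 'ZORDER': KEYWORD, 'BAR': KEYWORD}

def classify(word):
    # Keyword matching is case-insensitive: normalize before the lookup,
    # falling back to a plain Name token for unknown words.
    return keywords.get(word.upper(), NAME)

print(classify('bar'))  # Keyword
print(classify('baz'))  # Name
```

This is why custom keywords (like ``BAR`` above) must be registered with uppercase keys: the lexer uppercases the input word before the dictionary lookup.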
diff --git a/docs/source/index.rst b/docs/source/index.rst
index cba3314..e18d2b3 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -20,6 +20,7 @@ Contents
api
analyzing
ui
+ extending
changes
license
indices
diff --git a/pyproject.toml b/pyproject.toml
new file mode 100644
index 0000000..338a53c
--- /dev/null
+++ b/pyproject.toml
@@ -0,0 +1,70 @@
+[build-system]
+requires = ["flit_core >=3.2,<4"]
+build-backend = "flit_core.buildapi"
+
+[project]
+name = "sqlparse"
+description = "A non-validating SQL parser."
+authors = [{name = "Andi Albrecht", email = "albrecht.andi@gmail.com"}]
+readme = "README.rst"
+dynamic = ["version"]
+classifiers = [
+ "Development Status :: 5 - Production/Stable",
+ "Intended Audience :: Developers",
+ "License :: OSI Approved :: BSD License",
+ "Operating System :: OS Independent",
+ "Programming Language :: Python",
+ "Programming Language :: Python :: 3",
+ "Programming Language :: Python :: 3 :: Only",
+ "Programming Language :: Python :: 3.5",
+ "Programming Language :: Python :: 3.6",
+ "Programming Language :: Python :: 3.7",
+ "Programming Language :: Python :: 3.8",
+ "Programming Language :: Python :: 3.9",
+ "Programming Language :: Python :: 3.10",
+ "Programming Language :: Python :: Implementation :: CPython",
+ "Programming Language :: Python :: Implementation :: PyPy",
+ "Topic :: Database",
+ "Topic :: Software Development",
+]
+requires-python = ">=3.5"
+
+[project.urls]
+Home = "https://github.com/andialbrecht/sqlparse"
+Documentation = "https://sqlparse.readthedocs.io/"
+"Release Notes" = "https://sqlparse.readthedocs.io/en/latest/changes/"
+Source = "https://github.com/andialbrecht/sqlparse"
+Tracker = "https://github.com/andialbrecht/sqlparse/issues"
+
+[project.scripts]
+sqlformat = "sqlparse.__main__:main"
+
+[project.optional-dependencies]
+dev = [
+ "flake8",
+ "build",
+]
+test = [
+ "pytest",
+ "pytest-cov",
+]
+doc = [
+ "sphinx",
+]
+
+[tool.flit.sdist]
+include = [
+ "docs/source/",
+ "docs/sqlformat.1",
+ "docs/Makefile",
+ "tests/*.py", "tests/files/*.sql",
+ "LICENSE",
+ "TODO",
+ "AUTHORS",
+ "CHANGELOG",
+ "Makefile",
+ "tox.ini",
+]
+
+[tool.coverage.run]
+omit = ["sqlparse/__main__.py"]
diff --git a/setup.cfg b/setup.cfg
deleted file mode 100644
index cfef89c..0000000
--- a/setup.cfg
+++ /dev/null
@@ -1,59 +0,0 @@
-[metadata]
-name = sqlparse
-version = attr: sqlparse.__version__
-url = https://github.com/andialbrecht/sqlparse
-author = Andi Albrecht
-author_email = albrecht.andi@gmail.com
-description = A non-validating SQL parser.
-long_description = file: README.rst
-license = BSD-3-Clause
-classifiers =
- Development Status :: 5 - Production/Stable
- Intended Audience :: Developers
- License :: OSI Approved :: BSD License
- Operating System :: OS Independent
- Programming Language :: Python
- Programming Language :: Python :: 3
- Programming Language :: Python :: 3 :: Only
- Programming Language :: Python :: 3.5
- Programming Language :: Python :: 3.6
- Programming Language :: Python :: 3.7
- Programming Language :: Python :: 3.8
- Programming Language :: Python :: 3.9
- Programming Language :: Python :: Implementation :: CPython
- Programming Language :: Python :: Implementation :: PyPy
- Topic :: Database
- Topic :: Software Development
-project_urls =
- Documentation = https://sqlparse.readthedocs.io/
- Release Notes = https://sqlparse.readthedocs.io/en/latest/changes/
- Source = https://github.com/andialbrecht/sqlparse
- Tracker = https://github.com/andialbrecht/sqlparse/issues
-
-[options]
-python_requires = >=3.5
-packages = find:
-
-[options.packages.find]
-exclude = tests
-
-[options.entry_points]
-console_scripts =
- sqlformat = sqlparse.__main__:main
-
-[tool:pytest]
-xfail_strict = True
-
-[flake8]
-extend-ignore =
- E731
-
-[coverage:run]
-branch = False
-omit =
- sqlparse/__main__.py
-
-[egg_info]
-tag_build =
-tag_date = 0
-
diff --git a/setup.py b/setup.py
deleted file mode 100644
index ede0aff..0000000
--- a/setup.py
+++ /dev/null
@@ -1,12 +0,0 @@
-#!/usr/bin/env python
-#
-# Copyright (C) 2009-2020 the sqlparse authors and contributors
-# <see AUTHORS file>
-#
-# This setup script is part of python-sqlparse and is released under
-# the BSD License: https://opensource.org/licenses/BSD-3-Clause
-
-from setuptools import setup
-
-
-setup()
diff --git a/sqlparse.egg-info/PKG-INFO b/sqlparse.egg-info/PKG-INFO
deleted file mode 100644
index 556e02c..0000000
--- a/sqlparse.egg-info/PKG-INFO
+++ /dev/null
@@ -1,106 +0,0 @@
-Metadata-Version: 1.2
-Name: sqlparse
-Version: 0.4.2
-Summary: A non-validating SQL parser.
-Home-page: https://github.com/andialbrecht/sqlparse
-Author: Andi Albrecht
-Author-email: albrecht.andi@gmail.com
-License: BSD-3-Clause
-Project-URL: Documentation, https://sqlparse.readthedocs.io/
-Project-URL: Release Notes, https://sqlparse.readthedocs.io/en/latest/changes/
-Project-URL: Source, https://github.com/andialbrecht/sqlparse
-Project-URL: Tracker, https://github.com/andialbrecht/sqlparse/issues
-Description: python-sqlparse - Parse SQL statements
- ======================================
-
- |buildstatus|_
- |coverage|_
- |docs|_
-
- .. docincludebegin
-
- sqlparse is a non-validating SQL parser for Python.
- It provides support for parsing, splitting and formatting SQL statements.
-
- The module is compatible with Python 3.5+ and released under the terms of the
- `New BSD license <https://opensource.org/licenses/BSD-3-Clause>`_.
-
- Visit the project page at https://github.com/andialbrecht/sqlparse for
- further information about this project.
-
-
- Quick Start
- -----------
-
- .. code-block:: sh
-
- $ pip install sqlparse
-
- .. code-block:: python
-
- >>> import sqlparse
-
- >>> # Split a string containing two SQL statements:
- >>> raw = 'select * from foo; select * from bar;'
- >>> statements = sqlparse.split(raw)
- >>> statements
- ['select * from foo;', 'select * from bar;']
-
- >>> # Format the first statement and print it out:
- >>> first = statements[0]
- >>> print(sqlparse.format(first, reindent=True, keyword_case='upper'))
- SELECT *
- FROM foo;
-
- >>> # Parsing a SQL statement:
- >>> parsed = sqlparse.parse('select * from foo')[0]
- >>> parsed.tokens
- [<DML 'select' at 0x7f22c5e15368>, <Whitespace ' ' at 0x7f22c5e153b0>, <Wildcard '*' … ]
- >>>
-
- Links
- -----
-
- Project page
- https://github.com/andialbrecht/sqlparse
-
- Bug tracker
- https://github.com/andialbrecht/sqlparse/issues
-
- Documentation
- https://sqlparse.readthedocs.io/
-
- Online Demo
- https://sqlformat.org/
-
-
- sqlparse is licensed under the BSD license.
-
- Parts of the code are based on pygments written by Georg Brandl and others.
- pygments-Homepage: http://pygments.org/
-
- .. |buildstatus| image:: https://secure.travis-ci.org/andialbrecht/sqlparse.png?branch=master
- .. _buildstatus: https://travis-ci.org/#!/andialbrecht/sqlparse
- .. |coverage| image:: https://codecov.io/gh/andialbrecht/sqlparse/branch/master/graph/badge.svg
- .. _coverage: https://codecov.io/gh/andialbrecht/sqlparse
- .. |docs| image:: https://readthedocs.org/projects/sqlparse/badge/?version=latest
- .. _docs: https://sqlparse.readthedocs.io/en/latest/?badge=latest
-
-Platform: UNKNOWN
-Classifier: Development Status :: 5 - Production/Stable
-Classifier: Intended Audience :: Developers
-Classifier: License :: OSI Approved :: BSD License
-Classifier: Operating System :: OS Independent
-Classifier: Programming Language :: Python
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3 :: Only
-Classifier: Programming Language :: Python :: 3.5
-Classifier: Programming Language :: Python :: 3.6
-Classifier: Programming Language :: Python :: 3.7
-Classifier: Programming Language :: Python :: 3.8
-Classifier: Programming Language :: Python :: 3.9
-Classifier: Programming Language :: Python :: Implementation :: CPython
-Classifier: Programming Language :: Python :: Implementation :: PyPy
-Classifier: Topic :: Database
-Classifier: Topic :: Software Development
-Requires-Python: >=3.5
diff --git a/sqlparse.egg-info/SOURCES.txt b/sqlparse.egg-info/SOURCES.txt
deleted file mode 100644
index 6f51c22..0000000
--- a/sqlparse.egg-info/SOURCES.txt
+++ /dev/null
@@ -1,73 +0,0 @@
-AUTHORS
-CHANGELOG
-LICENSE
-MANIFEST.in
-Makefile
-README.rst
-TODO
-setup.cfg
-setup.py
-tox.ini
-docs/Makefile
-docs/sqlformat.1
-docs/source/analyzing.rst
-docs/source/api.rst
-docs/source/changes.rst
-docs/source/conf.py
-docs/source/index.rst
-docs/source/indices.rst
-docs/source/intro.rst
-docs/source/license.rst
-docs/source/ui.rst
-sqlparse/__init__.py
-sqlparse/__main__.py
-sqlparse/cli.py
-sqlparse/exceptions.py
-sqlparse/formatter.py
-sqlparse/keywords.py
-sqlparse/lexer.py
-sqlparse/sql.py
-sqlparse/tokens.py
-sqlparse/utils.py
-sqlparse.egg-info/PKG-INFO
-sqlparse.egg-info/SOURCES.txt
-sqlparse.egg-info/dependency_links.txt
-sqlparse.egg-info/entry_points.txt
-sqlparse.egg-info/top_level.txt
-sqlparse/engine/__init__.py
-sqlparse/engine/filter_stack.py
-sqlparse/engine/grouping.py
-sqlparse/engine/statement_splitter.py
-sqlparse/filters/__init__.py
-sqlparse/filters/aligned_indent.py
-sqlparse/filters/others.py
-sqlparse/filters/output.py
-sqlparse/filters/reindent.py
-sqlparse/filters/right_margin.py
-sqlparse/filters/tokens.py
-tests/__init__.py
-tests/conftest.py
-tests/test_cli.py
-tests/test_format.py
-tests/test_grouping.py
-tests/test_keywords.py
-tests/test_parse.py
-tests/test_regressions.py
-tests/test_split.py
-tests/test_tokenize.py
-tests/files/_Make_DirEntry.sql
-tests/files/begintag.sql
-tests/files/begintag_2.sql
-tests/files/casewhen_procedure.sql
-tests/files/dashcomment.sql
-tests/files/encoding_gbk.sql
-tests/files/encoding_utf8.sql
-tests/files/function.sql
-tests/files/function_psql.sql
-tests/files/function_psql2.sql
-tests/files/function_psql3.sql
-tests/files/function_psql4.sql
-tests/files/huge_select.sql
-tests/files/mysql_handler.sql
-tests/files/stream.sql
-tests/files/test_cp1251.sql
\ No newline at end of file
diff --git a/sqlparse.egg-info/dependency_links.txt b/sqlparse.egg-info/dependency_links.txt
deleted file mode 100644
index 8b13789..0000000
--- a/sqlparse.egg-info/dependency_links.txt
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/sqlparse.egg-info/entry_points.txt b/sqlparse.egg-info/entry_points.txt
deleted file mode 100644
index 09d1990..0000000
--- a/sqlparse.egg-info/entry_points.txt
+++ /dev/null
@@ -1,3 +0,0 @@
-[console_scripts]
-sqlformat = sqlparse.__main__:main
-
diff --git a/sqlparse.egg-info/top_level.txt b/sqlparse.egg-info/top_level.txt
deleted file mode 100644
index dbd4a47..0000000
--- a/sqlparse.egg-info/top_level.txt
+++ /dev/null
@@ -1 +0,0 @@
-sqlparse
diff --git a/sqlparse/__init__.py b/sqlparse/__init__.py
index 9cab9d2..122595b 100644
--- a/sqlparse/__init__.py
+++ b/sqlparse/__init__.py
@@ -16,7 +16,7 @@ from sqlparse import filters
from sqlparse import formatter
-__version__ = '0.4.2'
+__version__ = '0.4.4'
__all__ = ['engine', 'filters', 'formatter', 'sql', 'tokens', 'cli']
diff --git a/sqlparse/engine/grouping.py b/sqlparse/engine/grouping.py
index 175ae8e..86d8fc6 100644
--- a/sqlparse/engine/grouping.py
+++ b/sqlparse/engine/grouping.py
@@ -91,13 +91,20 @@ def group_tzcasts(tlist):
def match(token):
return token.ttype == T.Keyword.TZCast
- def valid(token):
+ def valid_prev(token):
return token is not None
+ def valid_next(token):
+ return token is not None and (
+ token.is_whitespace
+ or token.match(T.Keyword, 'AS')
+ or token.match(*sql.TypedLiteral.M_CLOSE)
+ )
+
def post(tlist, pidx, tidx, nidx):
return pidx, nidx
- _group(tlist, sql.Identifier, match, valid, valid, post)
+ _group(tlist, sql.Identifier, match, valid_prev, valid_next, post)
def group_typed_literal(tlist):
@@ -334,12 +341,15 @@ def group_aliased(tlist):
def group_functions(tlist):
has_create = False
has_table = False
+ has_as = False
for tmp_token in tlist.tokens:
- if tmp_token.value == 'CREATE':
+ if tmp_token.value.upper() == 'CREATE':
has_create = True
- if tmp_token.value == 'TABLE':
+ if tmp_token.value.upper() == 'TABLE':
has_table = True
- if has_create and has_table:
+ if tmp_token.value == 'AS':
+ has_as = True
+ if has_create and has_table and not has_as:
return
tidx, token = tlist.token_next_by(t=T.Name)
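The ``group_functions`` hunk above makes two behavioral changes: the CREATE/TABLE checks become case-insensitive, and a new AS flag keeps function grouping enabled for ``CREATE TABLE ... AS SELECT``. A minimal standalone sketch of that flag logic (the helper name is made up; it operates on plain token values rather than sqlparse tokens):

```python
def should_skip_function_grouping(token_values):
    """Mirror the early-return condition in group_functions."""
    has_create = has_table = has_as = False
    for value in token_values:
        # CREATE and TABLE are matched case-insensitively (the 0.4.3 fix
        # for lower-case CREATE TABLE statements).
        if value.upper() == 'CREATE':
            has_create = True
        if value.upper() == 'TABLE':
            has_table = True
        # AS is checked exactly as written in the hunk (case-sensitive).
        if value == 'AS':
            has_as = True
    # Skip only for a plain CREATE TABLE; CREATE TABLE ... AS SELECT
    # still needs function grouping for the SELECT part.
    return has_create and has_table and not has_as

print(should_skip_function_grouping(['create', 'table', 'foo']))      # True
print(should_skip_function_grouping(['CREATE', 'TABLE', 'x', 'AS']))  # False
```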
diff --git a/sqlparse/keywords.py b/sqlparse/keywords.py
index 6850628..b45f3e0 100644
--- a/sqlparse/keywords.py
+++ b/sqlparse/keywords.py
@@ -5,99 +5,92 @@
# This module is part of python-sqlparse and is released under
# the BSD License: https://opensource.org/licenses/BSD-3-Clause
-import re
-
from sqlparse import tokens
-
-def is_keyword(value):
- val = value.upper()
- return (KEYWORDS_COMMON.get(val)
- or KEYWORDS_ORACLE.get(val)
- or KEYWORDS_PLPGSQL.get(val)
- or KEYWORDS_HQL.get(val)
- or KEYWORDS.get(val, tokens.Name)), value
-
-
-SQL_REGEX = {
- 'root': [
- (r'(--|# )\+.*?(\r\n|\r|\n|$)', tokens.Comment.Single.Hint),
- (r'/\*\+[\s\S]*?\*/', tokens.Comment.Multiline.Hint),
-
- (r'(--|# ).*?(\r\n|\r|\n|$)', tokens.Comment.Single),
- (r'/\*[\s\S]*?\*/', tokens.Comment.Multiline),
-
- (r'(\r\n|\r|\n)', tokens.Newline),
- (r'\s+?', tokens.Whitespace),
-
- (r':=', tokens.Assignment),
- (r'::', tokens.Punctuation),
-
- (r'\*', tokens.Wildcard),
-
- (r"`(``|[^`])*`", tokens.Name),
- (r"´(´´|[^´])*´", tokens.Name),
- (r'((?<!\S)\$(?:[_A-ZÀ-Ü]\w*)?\$)[\s\S]*?\1', tokens.Literal),
-
- (r'\?', tokens.Name.Placeholder),
- (r'%(\(\w+\))?s', tokens.Name.Placeholder),
- (r'(?<!\w)[$:?]\w+', tokens.Name.Placeholder),
-
- (r'\\\w+', tokens.Command),
- (r'(NOT\s+)?(IN)\b', tokens.Operator.Comparison),
- # FIXME(andi): VALUES shouldn't be listed here
- # see https://github.com/andialbrecht/sqlparse/pull/64
- # AS and IN are special, it may be followed by a parenthesis, but
- # are never functions, see issue183 and issue507
- (r'(CASE|IN|VALUES|USING|FROM|AS)\b', tokens.Keyword),
-
- (r'(@|##|#)[A-ZÀ-Ü]\w+', tokens.Name),
-
- # see issue #39
- # Spaces around period `schema . name` are valid identifier
- # TODO: Spaces before period not implemented
- (r'[A-ZÀ-Ü]\w*(?=\s*\.)', tokens.Name), # 'Name' .
- # FIXME(atronah): never match,
- # because `re.match` doesn't work with look-behind regexp feature
- (r'(?<=\.)[A-ZÀ-Ü]\w*', tokens.Name), # .'Name'
- (r'[A-ZÀ-Ü]\w*(?=\()', tokens.Name), # side effect: change kw to func
- (r'-?0x[\dA-F]+', tokens.Number.Hexadecimal),
- (r'-?\d+(\.\d+)?E-?\d+', tokens.Number.Float),
- (r'(?![_A-ZÀ-Ü])-?(\d+(\.\d*)|\.\d+)(?![_A-ZÀ-Ü])',
- tokens.Number.Float),
- (r'(?![_A-ZÀ-Ü])-?\d+(?![_A-ZÀ-Ü])', tokens.Number.Integer),
- (r"'(''|\\\\|\\'|[^'])*'", tokens.String.Single),
- # not a real string literal in ANSI SQL:
- (r'"(""|\\\\|\\"|[^"])*"', tokens.String.Symbol),
- (r'(""|".*?[^\\]")', tokens.String.Symbol),
- # sqlite names can be escaped with [square brackets]. left bracket
- # cannot be preceded by word character or a right bracket --
- # otherwise it's probably an array index
- (r'(?<![\w\])])(\[[^\]\[]+\])', tokens.Name),
- (r'((LEFT\s+|RIGHT\s+|FULL\s+)?(INNER\s+|OUTER\s+|STRAIGHT\s+)?'
- r'|(CROSS\s+|NATURAL\s+)?)?JOIN\b', tokens.Keyword),
- (r'END(\s+IF|\s+LOOP|\s+WHILE)?\b', tokens.Keyword),
- (r'NOT\s+NULL\b', tokens.Keyword),
- (r'NULLS\s+(FIRST|LAST)\b', tokens.Keyword),
- (r'UNION\s+ALL\b', tokens.Keyword),
- (r'CREATE(\s+OR\s+REPLACE)?\b', tokens.Keyword.DDL),
- (r'DOUBLE\s+PRECISION\b', tokens.Name.Builtin),
- (r'GROUP\s+BY\b', tokens.Keyword),
- (r'ORDER\s+BY\b', tokens.Keyword),
- (r'HANDLER\s+FOR\b', tokens.Keyword),
- (r'(LATERAL\s+VIEW\s+)'
- r'(EXPLODE|INLINE|PARSE_URL_TUPLE|POSEXPLODE|STACK)\b',
- tokens.Keyword),
- (r"(AT|WITH')\s+TIME\s+ZONE\s+'[^']+'", tokens.Keyword.TZCast),
- (r'(NOT\s+)?(LIKE|ILIKE|RLIKE)\b', tokens.Operator.Comparison),
- (r'[0-9_A-ZÀ-Ü][_$#\w]*', is_keyword),
- (r'[;:()\[\],\.]', tokens.Punctuation),
- (r'[<>=~!]+', tokens.Operator.Comparison),
- (r'[+/@#%^&|^-]+', tokens.Operator),
- ]}
-
-FLAGS = re.IGNORECASE | re.UNICODE
-SQL_REGEX = [(re.compile(rx, FLAGS).match, tt) for rx, tt in SQL_REGEX['root']]
+# object() only supports "is" and is useful as a marker
+# use this marker to specify that the given regex in SQL_REGEX
+# shall be processed further through a lookup in the KEYWORDS dictionaries
+PROCESS_AS_KEYWORD = object()
+
+
+SQL_REGEX = [
+ (r'(--|# )\+.*?(\r\n|\r|\n|$)', tokens.Comment.Single.Hint),
+ (r'/\*\+[\s\S]*?\*/', tokens.Comment.Multiline.Hint),
+
+ (r'(--|# ).*?(\r\n|\r|\n|$)', tokens.Comment.Single),
+ (r'/\*[\s\S]*?\*/', tokens.Comment.Multiline),
+
+ (r'(\r\n|\r|\n)', tokens.Newline),
+ (r'\s+?', tokens.Whitespace),
+
+ (r':=', tokens.Assignment),
+ (r'::', tokens.Punctuation),
+
+ (r'\*', tokens.Wildcard),
+
+ (r"`(``|[^`])*`", tokens.Name),
+ (r"´(´´|[^´])*´", tokens.Name),
+ (r'((?<!\S)\$(?:[_A-ZÀ-Ü]\w*)?\$)[\s\S]*?\1', tokens.Literal),
+
+ (r'\?', tokens.Name.Placeholder),
+ (r'%(\(\w+\))?s', tokens.Name.Placeholder),
+ (r'(?<!\w)[$:?]\w+', tokens.Name.Placeholder),
+
+ (r'\\\w+', tokens.Command),
+
+ # FIXME(andi): VALUES shouldn't be listed here
+ # see https://github.com/andialbrecht/sqlparse/pull/64
+ # AS and IN are special: they may be followed by a parenthesis, but
+ # are never functions, see issue183 and issue507
+ (r'(CASE|IN|VALUES|USING|FROM|AS)\b', tokens.Keyword),
+
+ (r'(@|##|#)[A-ZÀ-Ü]\w+', tokens.Name),
+
+ # see issue #39
+ # Spaces around period `schema . name` are valid identifier
+ # TODO: Spaces before period not implemented
+ (r'[A-ZÀ-Ü]\w*(?=\s*\.)', tokens.Name), # 'Name'.
+ # FIXME(atronah): never match,
+ # because `re.match` doesn't work with look-behind regexp feature
+ (r'(?<=\.)[A-ZÀ-Ü]\w*', tokens.Name), # .'Name'
+ (r'[A-ZÀ-Ü]\w*(?=\()', tokens.Name), # side effect: change kw to func
+ (r'-?0x[\dA-F]+', tokens.Number.Hexadecimal),
+ (r'-?\d+(\.\d+)?E-?\d+', tokens.Number.Float),
+ (r'(?![_A-ZÀ-Ü])-?(\d+(\.\d*)|\.\d+)(?![_A-ZÀ-Ü])',
+ tokens.Number.Float),
+ (r'(?![_A-ZÀ-Ü])-?\d+(?![_A-ZÀ-Ü])', tokens.Number.Integer),
+ (r"'(''|\\'|[^'])*'", tokens.String.Single),
+ # not a real string literal in ANSI SQL:
+ (r'"(""|\\"|[^"])*"', tokens.String.Symbol),
+ (r'(""|".*?[^\\]")', tokens.String.Symbol),
+ # sqlite names can be escaped with [square brackets]. left bracket
+ # cannot be preceded by word character or a right bracket --
+ # otherwise it's probably an array index
+ (r'(?<![\w\])])(\[[^\]\[]+\])', tokens.Name),
+ (r'((LEFT\s+|RIGHT\s+|FULL\s+)?(INNER\s+|OUTER\s+|STRAIGHT\s+)?'
+ r'|(CROSS\s+|NATURAL\s+)?)?JOIN\b', tokens.Keyword),
+ (r'END(\s+IF|\s+LOOP|\s+WHILE)?\b', tokens.Keyword),
+ (r'NOT\s+NULL\b', tokens.Keyword),
+ (r'NULLS\s+(FIRST|LAST)\b', tokens.Keyword),
+ (r'UNION\s+ALL\b', tokens.Keyword),
+ (r'CREATE(\s+OR\s+REPLACE)?\b', tokens.Keyword.DDL),
+ (r'DOUBLE\s+PRECISION\b', tokens.Name.Builtin),
+ (r'GROUP\s+BY\b', tokens.Keyword),
+ (r'ORDER\s+BY\b', tokens.Keyword),
+ (r'HANDLER\s+FOR\b', tokens.Keyword),
+ (r'(LATERAL\s+VIEW\s+)'
+ r'(EXPLODE|INLINE|PARSE_URL_TUPLE|POSEXPLODE|STACK)\b',
+ tokens.Keyword),
+ (r"(AT|WITH')\s+TIME\s+ZONE\s+'[^']+'", tokens.Keyword.TZCast),
+ (r'(NOT\s+)?(LIKE|ILIKE|RLIKE)\b', tokens.Operator.Comparison),
+ (r'(NOT\s+)?(REGEXP)\b', tokens.Operator.Comparison),
+ # Check for keywords, also returns tokens.Name if regex matches
+ # but the match isn't a keyword.
+ (r'\w[$#\w]*', PROCESS_AS_KEYWORD),
+ (r'[;:()\[\],\.]', tokens.Punctuation),
+ (r'[<>=~!]+', tokens.Operator.Comparison),
+ (r'[+/@#%^&|^-]+', tokens.Operator),
+]
KEYWORDS = {
'ABORT': tokens.Keyword,
@@ -241,6 +234,7 @@ KEYWORDS = {
'DISABLE': tokens.Keyword,
'DISCONNECT': tokens.Keyword,
'DISPATCH': tokens.Keyword,
+ 'DIV': tokens.Operator,
'DO': tokens.Keyword,
'DOMAIN': tokens.Keyword,
'DYNAMIC': tokens.Keyword,
@@ -314,7 +308,7 @@ KEYWORDS = {
'INCREMENT': tokens.Keyword,
'INDEX': tokens.Keyword,
- 'INDITCATOR': tokens.Keyword,
+ 'INDICATOR': tokens.Keyword,
'INFIX': tokens.Keyword,
'INHERITS': tokens.Keyword,
'INITIAL': tokens.Keyword,
@@ -907,6 +901,7 @@ KEYWORDS_HQL = {
'INLINE': tokens.Keyword,
'INSTR': tokens.Keyword,
'LEN': tokens.Keyword,
+ 'MAP': tokens.Name.Builtin,
'MAXELEMENT': tokens.Keyword,
'MAXINDEX': tokens.Keyword,
'MAX_PART_DATE': tokens.Keyword,
@@ -938,9 +933,12 @@ KEYWORDS_HQL = {
'SQRT': tokens.Keyword,
'STACK': tokens.Keyword,
'STR': tokens.Keyword,
+ 'STRING': tokens.Name.Builtin,
+ 'STRUCT': tokens.Name.Builtin,
'SUBSTR': tokens.Keyword,
'SUMMARY': tokens.Keyword,
'TBLPROPERTIES': tokens.Keyword,
+ 'TIMESTAMP': tokens.Name.Builtin,
'TIMESTAMP_ISO': tokens.Keyword,
'TO_CHAR': tokens.Keyword,
'TO_DATE': tokens.Keyword,
@@ -956,3 +954,8 @@ KEYWORDS_HQL = {
'BREAK': tokens.Keyword,
'LEAVE': tokens.Keyword,
}
+
+
+KEYWORDS_MSACCESS = {
+ 'DISTINCTROW': tokens.Keyword,
+}
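The `PROCESS_AS_KEYWORD` marker introduced above relies on a common Python sentinel idiom: a bare `object()` compared by identity. A minimal standalone sketch of how the lexer can use it (the rule table, `resolve` function, and string token names here are illustrative stand-ins, not sqlparse API):

```python
# A bare object() is a unique sentinel: it is only ever "is"-identical
# to itself, so it cannot collide with a token type or any other value
# that might appear as the action in a lexer rule.
PROCESS_AS_KEYWORD = object()

# Hypothetical rule table mixing direct token types with the marker.
RULES = [
    (r'\*', 'Wildcard'),
    (r'\w[$#\w]*', PROCESS_AS_KEYWORD),
]

def resolve(action, value, keyword_table):
    """Return a token type: either directly, or via keyword lookup."""
    if action is PROCESS_AS_KEYWORD:   # identity check, never equality
        return keyword_table.get(value.upper(), 'Name')
    return action

print(resolve(RULES[1][1], 'from', {'FROM': 'Keyword'}))   # -> Keyword
print(resolve(RULES[1][1], 'foo', {'FROM': 'Keyword'}))    # -> Name
```

Because the marker is matched with `is`, no regular token type or callable can accidentally trigger the keyword-lookup path, which is exactly the property the upstream comment calls out.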
diff --git a/sqlparse/lexer.py b/sqlparse/lexer.py
index 4397f18..9d25c9e 100644
--- a/sqlparse/lexer.py
+++ b/sqlparse/lexer.py
@@ -6,6 +6,7 @@
# the BSD License: https://opensource.org/licenses/BSD-3-Clause
"""SQL Lexer"""
+import re
# This code is based on the SqlLexer in pygments.
# http://pygments.org/
@@ -14,18 +15,90 @@
from io import TextIOBase
-from sqlparse import tokens
-from sqlparse.keywords import SQL_REGEX
+from sqlparse import tokens, keywords
from sqlparse.utils import consume
class Lexer:
- """Lexer
- Empty class. Leaving for backwards-compatibility
- """
+ """The Lexer supports configurable syntax.
+ To add support for additional keywords, use the `add_keywords` method."""
+
+ _default_intance = None
+
+ # Development notes:
+ # - This class is prepared to be able to support additional SQL dialects
+ # in the future by adding additional functions that take the place of
+ # the function default_initialization()
+ # - The lexer class uses an explicit singleton behavior with the
+ # instance-getter method get_default_instance(). This mechanism has
+ # the advantage that the call signature of the entry-points to the
+ # sqlparse library are not affected. Also, usage of sqlparse in third
+ # party code does not need to be adapted. On the other hand, singleton
+ # behavior is not thread safe, and the current implementation does not
+ # easily allow for multiple SQL dialects to be parsed in the same
+ # process. Such behavior can be supported in the future by passing a
+ # suitably initialized lexer object as an additional parameter to the
+ # entry-point functions (such as `parse`). Code will need to be written
+ # to pass down and utilize such an object. The current implementation
+ # is prepared to support this thread safe approach without the
+ # default_instance part needing to change interface.
+
+ @classmethod
+ def get_default_instance(cls):
+ """Returns the lexer instance used internally
+ by the sqlparse core functions."""
+ if cls._default_intance is None:
+ cls._default_intance = cls()
+ cls._default_intance.default_initialization()
+ return cls._default_intance
+
+ def default_initialization(self):
+ """Initialize the lexer with default dictionaries.
+ Useful if you need to revert custom syntax settings."""
+ self.clear()
+ self.set_SQL_REGEX(keywords.SQL_REGEX)
+ self.add_keywords(keywords.KEYWORDS_COMMON)
+ self.add_keywords(keywords.KEYWORDS_ORACLE)
+ self.add_keywords(keywords.KEYWORDS_PLPGSQL)
+ self.add_keywords(keywords.KEYWORDS_HQL)
+ self.add_keywords(keywords.KEYWORDS_MSACCESS)
+ self.add_keywords(keywords.KEYWORDS)
+
+ def clear(self):
+ """Clear all syntax configurations.
+ Useful if you want to load a reduced set of syntax configurations.
+ After this call, regexps and keyword dictionaries need to be loaded
+ to make the lexer functional again."""
+ self._SQL_REGEX = []
+ self._keywords = []
+
+ def set_SQL_REGEX(self, SQL_REGEX):
+ """Set the list of regex that will parse the SQL."""
+ FLAGS = re.IGNORECASE | re.UNICODE
+ self._SQL_REGEX = [
+ (re.compile(rx, FLAGS).match, tt)
+ for rx, tt in SQL_REGEX
+ ]
+
+ def add_keywords(self, keywords):
+ """Add keyword dictionaries. Keywords are looked up in the same order
+ that dictionaries were added."""
+ self._keywords.append(keywords)
+
+ def is_keyword(self, value):
+ """Checks for a keyword.
+
+ If the given value is in one of the KEYWORDS_* dictionary
+ it's considered a keyword. Otherwise, tokens.Name is returned.
+ """
+ val = value.upper()
+ for kwdict in self._keywords:
+ if val in kwdict:
+ return kwdict[val], value
+ else:
+ return tokens.Name, value
- @staticmethod
- def get_tokens(text, encoding=None):
+ def get_tokens(self, text, encoding=None):
"""
Return an iterable of (tokentype, value) pairs generated from
`text`. If `unfiltered` is set to `True`, the filtering mechanism
@@ -57,15 +130,15 @@ class Lexer:
iterable = enumerate(text)
for pos, char in iterable:
- for rexmatch, action in SQL_REGEX:
+ for rexmatch, action in self._SQL_REGEX:
m = rexmatch(text, pos)
if not m:
continue
elif isinstance(action, tokens._TokenType):
yield action, m.group()
- elif callable(action):
- yield action(m.group())
+ elif action is keywords.PROCESS_AS_KEYWORD:
+ yield self.is_keyword(m.group())
consume(iterable, m.end() - pos - 1)
break
@@ -79,4 +152,4 @@ def tokenize(sql, encoding=None):
Tokenize *sql* using the :class:`Lexer` and return a 2-tuple stream
of ``(token type, value)`` items.
"""
- return Lexer().get_tokens(sql, encoding)
+ return Lexer.get_default_instance().get_tokens(sql, encoding)
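The new `add_keywords`/`is_keyword` machinery boils down to an ordered list of dictionaries consulted first-hit-wins. A toy reconstruction of that lookup order (the `MiniLexer` class and the string token names are illustrative stand-ins, not the sqlparse API):

```python
# Sketch of the Lexer.is_keyword lookup added in this diff, with plain
# dicts standing in for the KEYWORDS_* tables. Dictionaries are
# consulted in the order they were added; the first hit wins.
class MiniLexer:
    def __init__(self):
        self._keywords = []

    def add_keywords(self, table):
        self._keywords.append(table)

    def is_keyword(self, value):
        val = value.upper()
        for kwdict in self._keywords:
            if val in kwdict:
                return kwdict[val], value
        return 'Name', value

lex = MiniLexer()
lex.add_keywords({'SELECT': 'Keyword.DML'})
lex.add_keywords({'SELECT': 'Keyword', 'MAP': 'Name.Builtin'})

print(lex.is_keyword('select'))   # -> ('Keyword.DML', 'select')
print(lex.is_keyword('zorder'))   # -> ('Name', 'zorder')
```

This ordering is why `default_initialization()` registers the dialect-specific tables (Oracle, PL/pgSQL, HQL, MS Access) before the generic `KEYWORDS` dictionary: a dialect can override the generic classification of a word.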
diff --git a/sqlparse/sql.py b/sqlparse/sql.py
index 6a32c26..1ccfbdb 100644
--- a/sqlparse/sql.py
+++ b/sqlparse/sql.py
@@ -234,16 +234,16 @@ class TokenList(Token):
if reverse:
assert end is None
- for idx in range(start - 2, -1, -1):
- token = self.tokens[idx]
- for func in funcs:
- if func(token):
- return idx, token
+ indexes = range(start - 2, -1, -1)
else:
- for idx, token in enumerate(self.tokens[start:end], start=start):
- for func in funcs:
- if func(token):
- return idx, token
+ if end is None:
+ end = len(self.tokens)
+ indexes = range(start, end)
+ for idx in indexes:
+ token = self.tokens[idx]
+ for func in funcs:
+ if func(token):
+ return idx, token
return None, None
def token_first(self, skip_ws=True, skip_cm=False):
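The `token_matching` rewrite above merges the forward and reverse scans into a single loop over a `range` object. A standalone sketch against a plain list (the free-function signature is adapted for illustration; upstream this is a method on `TokenList`):

```python
def token_matching(tokens_list, funcs, start=0, end=None, reverse=False):
    """Return (index, token) of the first token accepted by any func."""
    if reverse:
        # Mirrors the upstream convention: a reverse scan begins two
        # positions before `start` and walks back to index 0.
        indexes = range(start - 2, -1, -1)
    else:
        if end is None:
            end = len(tokens_list)
        indexes = range(start, end)
    # One shared loop body replaces the two duplicated loops.
    for idx in indexes:
        token = tokens_list[idx]
        for func in funcs:
            if func(token):
                return idx, token
    return None, None

print(token_matching(['a', 'b', 'c'], [lambda t: t == 'b']))  # -> (1, 'b')
```

The refactor changes no behavior; it only removes the duplicated inner loop by computing the index sequence up front.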
@@ -413,27 +413,28 @@ class Statement(TokenList):
Whitespaces and comments at the beginning of the statement
are ignored.
"""
- first_token = self.token_first(skip_cm=True)
- if first_token is None:
+ token = self.token_first(skip_cm=True)
+ if token is None:
# An "empty" statement that either has no tokens at all
# or only whitespace tokens.
return 'UNKNOWN'
- elif first_token.ttype in (T.Keyword.DML, T.Keyword.DDL):
- return first_token.normalized
+ elif token.ttype in (T.Keyword.DML, T.Keyword.DDL):
+ return token.normalized
- elif first_token.ttype == T.Keyword.CTE:
+ elif token.ttype == T.Keyword.CTE:
# The WITH keyword should be followed by either an Identifier or
# an IdentifierList containing the CTE definitions; the actual
# DML keyword (e.g. SELECT, INSERT) will follow next.
- fidx = self.token_index(first_token)
- tidx, token = self.token_next(fidx, skip_ws=True)
- if isinstance(token, (Identifier, IdentifierList)):
- _, dml_keyword = self.token_next(tidx, skip_ws=True)
-
- if dml_keyword is not None \
- and dml_keyword.ttype == T.Keyword.DML:
- return dml_keyword.normalized
+ tidx = self.token_index(token)
+ while tidx is not None:
+ tidx, token = self.token_next(tidx, skip_ws=True)
+ if isinstance(token, (Identifier, IdentifierList)):
+ tidx, token = self.token_next(tidx, skip_ws=True)
+
+ if token is not None \
+ and token.ttype == T.Keyword.DML:
+ return token.normalized
# Hmm, probably invalid syntax, so return unknown.
return 'UNKNOWN'
diff --git a/sqlparse/utils.py b/sqlparse/utils.py
index 299a84c..512f038 100644
--- a/sqlparse/utils.py
+++ b/sqlparse/utils.py
@@ -55,7 +55,7 @@ def remove_quotes(val):
"""Helper that removes surrounding quotes from strings."""
if val is None:
return
- if val[0] in ('"', "'") and val[0] == val[-1]:
+ if val[0] in ('"', "'", '`') and val[0] == val[-1]:
val = val[1:-1]
return val
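The one-line change to `remove_quotes` extends quote stripping to backticks (MySQL-style quoted identifiers, see the `user1@`myhost`` regression test below). The patched helper, copied so it can be tried standalone (like upstream, it assumes a non-empty string or `None`):

```python
def remove_quotes(val):
    """Helper that removes surrounding quotes from strings."""
    if val is None:
        return
    # Strip one matching pair of double, single, or (new) backtick quotes.
    if val[0] in ('"', "'", '`') and val[0] == val[-1]:
        val = val[1:-1]
    return val

print(remove_quotes('`myhost`'))   # -> myhost
print(remove_quotes("'foo'"))      # -> foo
print(remove_quotes('unquoted'))   # -> unquoted
```

Only a single surrounding pair is removed, and mismatched quote characters (e.g. `"foo'`) are left untouched.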
diff --git a/tests/test_grouping.py b/tests/test_grouping.py
index cf629e9..03d16c5 100644
--- a/tests/test_grouping.py
+++ b/tests/test_grouping.py
@@ -324,6 +324,11 @@ def test_grouping_alias_case():
assert p.tokens[0].get_alias() == 'foo'
+def test_grouping_alias_ctas():
+ p = sqlparse.parse('CREATE TABLE tbl1 AS SELECT coalesce(t1.col1, 0) AS col1 FROM t1')[0]
+ assert p.tokens[10].get_alias() == 'col1'
+ assert isinstance(p.tokens[10].tokens[0], sql.Function)
+
def test_grouping_subquery_no_parens():
# Not totally sure if this is the right approach...
# When a THEN clause contains a subquery w/o parenthesis around it *and*
@@ -371,20 +376,10 @@ def test_grouping_function_not_in():
# issue183
p = sqlparse.parse('in(1, 2)')[0]
assert len(p.tokens) == 2
- assert p.tokens[0].ttype == T.Comparison
+ assert p.tokens[0].ttype == T.Keyword
assert isinstance(p.tokens[1], sql.Parenthesis)
-def test_in_comparison():
- # issue566
- p = sqlparse.parse('a in (1, 2)')[0]
- assert len(p.tokens) == 1
- assert isinstance(p.tokens[0], sql.Comparison)
- assert len(p.tokens[0].tokens) == 5
- assert p.tokens[0].left.value == 'a'
- assert p.tokens[0].right.value == '(1, 2)'
-
-
def test_grouping_varchar():
p = sqlparse.parse('"text" Varchar(50) NOT NULL')[0]
assert isinstance(p.tokens[2], sql.Function)
@@ -655,3 +650,7 @@ def test_grouping_as_cte():
assert p[0].get_alias() is None
assert p[2].value == 'AS'
assert p[4].value == 'WITH'
+
+def test_grouping_create_table():
+ p = sqlparse.parse("create table db.tbl (a string)")[0].tokens
+ assert p[4].value == "db.tbl"
diff --git a/tests/test_keywords.py b/tests/test_keywords.py
index d4ded4b..b26e9b4 100644
--- a/tests/test_keywords.py
+++ b/tests/test_keywords.py
@@ -1,7 +1,7 @@
import pytest
from sqlparse import tokens
-from sqlparse.keywords import SQL_REGEX
+from sqlparse.lexer import Lexer
class TestSQLREGEX:
@@ -9,5 +9,5 @@ class TestSQLREGEX:
'1.', '-1.',
'.1', '-.1'])
def test_float_numbers(self, number):
- ttype = next(tt for action, tt in SQL_REGEX if action(number))
+ ttype = next(tt for action, tt in Lexer.get_default_instance()._SQL_REGEX if action(number))
assert tokens.Number.Float == ttype
diff --git a/tests/test_parse.py b/tests/test_parse.py
index 513b4be..5feef5a 100644
--- a/tests/test_parse.py
+++ b/tests/test_parse.py
@@ -4,7 +4,8 @@ from io import StringIO
import pytest
import sqlparse
-from sqlparse import sql, tokens as T
+from sqlparse import sql, tokens as T, keywords
+from sqlparse.lexer import Lexer
def test_parse_tokenize():
@@ -132,6 +133,12 @@ def test_parse_nested_function():
assert type(t[0]) is sql.Function
+def test_parse_div_operator():
+ p = sqlparse.parse('col1 DIV 5 AS div_col1')[0].tokens
+ assert p[0].tokens[0].tokens[2].ttype is T.Operator
+ assert p[0].get_alias() == 'div_col1'
+
+
def test_quoted_identifier():
t = sqlparse.parse('select x.y as "z" from foo')[0].tokens
assert isinstance(t[2], sql.Identifier)
@@ -142,6 +149,7 @@ def test_quoted_identifier():
@pytest.mark.parametrize('name', [
'foo', '_foo', # issue175
'1_data', # valid MySQL table name, see issue337
+ '業者名稱', # valid at least for SQLite3, see issue641
])
def test_valid_identifier_names(name):
t = sqlparse.parse(name)[0].tokens
@@ -482,3 +490,79 @@ def test_parenthesis():
T.Newline,
T.Newline,
T.Punctuation]
+
+
+def test_configurable_keywords():
+ sql = """select * from foo BACON SPAM EGGS;"""
+ tokens = sqlparse.parse(sql)[0]
+
+ assert list(
+ (t.ttype, t.value)
+ for t in tokens
+ if t.ttype not in sqlparse.tokens.Whitespace
+ ) == [
+ (sqlparse.tokens.Keyword.DML, "select"),
+ (sqlparse.tokens.Wildcard, "*"),
+ (sqlparse.tokens.Keyword, "from"),
+ (None, "foo BACON"),
+ (None, "SPAM EGGS"),
+ (sqlparse.tokens.Punctuation, ";"),
+ ]
+
+ Lexer.get_default_instance().add_keywords(
+ {
+ "BACON": sqlparse.tokens.Name.Builtin,
+ "SPAM": sqlparse.tokens.Keyword,
+ "EGGS": sqlparse.tokens.Keyword,
+ }
+ )
+
+ tokens = sqlparse.parse(sql)[0]
+
+ # reset the syntax for later tests.
+ Lexer.get_default_instance().default_initialization()
+
+ assert list(
+ (t.ttype, t.value)
+ for t in tokens
+ if t.ttype not in sqlparse.tokens.Whitespace
+ ) == [
+ (sqlparse.tokens.Keyword.DML, "select"),
+ (sqlparse.tokens.Wildcard, "*"),
+ (sqlparse.tokens.Keyword, "from"),
+ (None, "foo"),
+ (sqlparse.tokens.Name.Builtin, "BACON"),
+ (sqlparse.tokens.Keyword, "SPAM"),
+ (sqlparse.tokens.Keyword, "EGGS"),
+ (sqlparse.tokens.Punctuation, ";"),
+ ]
+
+
+def test_configurable_regex():
+ lex = Lexer.get_default_instance()
+ lex.clear()
+
+ my_regex = (r"ZORDER\s+BY\b", sqlparse.tokens.Keyword)
+
+ lex.set_SQL_REGEX(
+ keywords.SQL_REGEX[:38]
+ + [my_regex]
+ + keywords.SQL_REGEX[38:]
+ )
+ lex.add_keywords(keywords.KEYWORDS_COMMON)
+ lex.add_keywords(keywords.KEYWORDS_ORACLE)
+ lex.add_keywords(keywords.KEYWORDS_PLPGSQL)
+ lex.add_keywords(keywords.KEYWORDS_HQL)
+ lex.add_keywords(keywords.KEYWORDS_MSACCESS)
+ lex.add_keywords(keywords.KEYWORDS)
+
+ tokens = sqlparse.parse("select * from foo zorder by bar;")[0]
+
+ # reset the syntax for later tests.
+ Lexer.get_default_instance().default_initialization()
+
+ assert list(
+ (t.ttype, t.value)
+ for t in tokens
+ if t.ttype not in sqlparse.tokens.Whitespace
+ )[4] == (sqlparse.tokens.Keyword, "zorder by")
diff --git a/tests/test_regressions.py b/tests/test_regressions.py
index 38d1840..bc8b7dd 100644
--- a/tests/test_regressions.py
+++ b/tests/test_regressions.py
@@ -401,6 +401,15 @@ def test_issue489_tzcasts():
assert p.tokens[-1].get_alias() == 'foo'
+def test_issue562_tzcasts():
+ # Test that whitespace between 'from' and 'bar' is retained
+ formatted = sqlparse.format(
+ 'SELECT f(HOUR from bar AT TIME ZONE \'UTC\') from foo', reindent=True
+ )
+ assert formatted == \
+ 'SELECT f(HOUR\n from bar AT TIME ZONE \'UTC\')\nfrom foo'
+
+
def test_as_in_parentheses_indents():
# did raise NoneType has no attribute is_group in _process_parentheses
formatted = sqlparse.format('(as foo)', reindent=True)
@@ -418,3 +427,12 @@ def test_splitting_at_and_backticks_issue588():
'grant foo to user1@`myhost`; grant bar to user1@`myhost`;')
assert len(splitted) == 2
assert splitted[-1] == 'grant bar to user1@`myhost`;'
+
+
+def test_comment_between_cte_clauses_issue632():
+ p, = sqlparse.parse("""
+ WITH foo AS (),
+ -- A comment before baz subquery
+ baz AS ()
+ SELECT * FROM baz;""")
+ assert p.get_type() == "SELECT"
diff --git a/tests/test_split.py b/tests/test_split.py
index a9d7576..e79750e 100644
--- a/tests/test_split.py
+++ b/tests/test_split.py
@@ -18,8 +18,8 @@ def test_split_semicolon():
def test_split_backslash():
- stmts = sqlparse.parse(r"select '\\'; select '\''; select '\\\'';")
- assert len(stmts) == 3
+ stmts = sqlparse.parse("select '\'; select '\'';")
+ assert len(stmts) == 2
@pytest.mark.parametrize('fn', ['function.sql',
diff --git a/tests/test_utils.py b/tests/test_utils.py
new file mode 100644
index 0000000..d020f3f
--- /dev/null
+++ b/tests/test_utils.py
@@ -0,0 +1,12 @@
+import pytest
+
+from sqlparse import utils
+
+
+@pytest.mark.parametrize('value, expected', (
+ [None, None],
+ ['\'foo\'', 'foo'],
+ ['"foo"', 'foo'],
+ ['`foo`', 'foo']))
+def test_remove_quotes(value, expected):
+ assert utils.remove_quotes(value) == expected