New Upstream Release - python-clickhouse-driver
Ready changes
Summary
Merged new upstream version: 0.2.6 (was: 0.2.5).
Resulting package
Built on 2023-05-24T06:10 (took 6m27s)
The resulting binary packages can be installed (if you have the apt repository enabled) by running one of:
apt install -t fresh-releases python3-clickhouse-driver-dbgsym
apt install -t fresh-releases python3-clickhouse-driver-doc
apt install -t fresh-releases python3-clickhouse-driver
Lintian Result
- python-clickhouse-driver_0.2.6-1~jan+nur1.dsc
- python-clickhouse-driver_0.2.6-1~jan+nur1_amd64.buildinfo
- python3-clickhouse-driver-dbgsym_0.2.6-1~jan+nur1_amd64.deb
- python3-clickhouse-driver-doc_0.2.6-1~jan+nur1_all.deb
- python3-clickhouse-driver_0.2.6-1~jan+nur1_amd64.deb
- python-clickhouse-driver_0.2.6-1~jan+nur1_amd64.changes
Diff
diff --git a/.flake8 b/.flake8
index bf09a38..1125795 100644
--- a/.flake8
+++ b/.flake8
@@ -1,5 +1,6 @@
[flake8]
filename = *.py, *.pyx
+exclude = perf
per-file-ignores =
clickhouse_driver/columns/largeint.pyx: E225, E226, E227, E999
clickhouse_driver/bufferedreader.pyx: E225, E226, E227, E999
diff --git a/.github/FUNDING.yml b/.github/FUNDING.yml
new file mode 100644
index 0000000..fbd41c7
--- /dev/null
+++ b/.github/FUNDING.yml
@@ -0,0 +1,2 @@
+patreon: xzkostyan
+
diff --git a/.github/workflows/actions.yml b/.github/workflows/actions.yml
index f27136a..0d359a7 100644
--- a/.github/workflows/actions.yml
+++ b/.github/workflows/actions.yml
@@ -2,13 +2,12 @@ on: [push, pull_request]
name: build
jobs:
tests:
- runs-on: ubuntu-20.04
+ runs-on: ubuntu-22.04
strategy:
matrix:
use-numpy:
- 0
python-version:
- - "3.6"
- "3.7"
- "3.8"
- "3.9"
@@ -17,10 +16,11 @@ jobs:
- "pypy-3.6"
- "pypy-3.7"
clickhouse-version:
+ - 23.2.6.34
+ - 22.9.5.25
- 22.2.3.5
- 21.12.3.32
- 21.9.3.30
- - 21.9.3.30
- 21.4.6.55
- 21.3.10.1
- 21.2.10.48
@@ -118,7 +118,7 @@ jobs:
valgrind:
name: Valgrind check
needs: tests
- runs-on: ubuntu-20.04
+ runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v2
- name: Set up Python
@@ -166,7 +166,7 @@ jobs:
wheels-linux:
name: Wheels for Linux
needs: valgrind
- runs-on: ubuntu-20.04
+ runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v2
- name: Set up Python
@@ -279,7 +279,7 @@ jobs:
wheels-linux-non-x86:
name: Wheels for Linux non-x86
needs: valgrind
- runs-on: ubuntu-20.04
+ runs-on: ubuntu-22.04
strategy:
matrix:
arch:
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
new file mode 100644
index 0000000..ebe9c4a
--- /dev/null
+++ b/.github/workflows/docs.yml
@@ -0,0 +1,21 @@
+on: [push, pull_request]
+name: build-docs
+jobs:
+ tests:
+ runs-on: ubuntu-22.04
+ name: Build docs
+ steps:
+ - uses: actions/checkout@v2
+ - name: Set up Python
+ uses: actions/setup-python@v2
+ with:
+ python-version: 3.11
+ architecture: x64
+ - name: Update tools
+ run: pip install --upgrade pip setuptools wheel
+ - name: Install sphinx
+ run: pip install sphinx
+ - name: Install package
+ run: pip install -e .
+ - name: Build docs
+ run: cd docs && make html
diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..9e00340
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,92 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*,cover
+.hypothesis/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# IPython Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# dotenv
+.env
+
+# virtualenv
+venv/
+ENV/
+
+# Spyder project settings
+.spyderproject
+
+# Rope project settings
+.ropeproject
+
+# PyCharm project settings
+.idea/
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 51fb53d..ee1f46f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,29 @@
## Unreleased
+## [0.2.6] - 2023-05-02
+### Added
+- JSON type. Solves issue [#320](https://github.com/mymarilyn/clickhouse-driver/issues/300).
+- On demand client revision downgrading.
+- Server-side query templating.
+- Sparse data type deserialization.
+- TCP keepalive.
+- [NumPy] Optional dataframe column names replacing. Pull request [#361](https://github.com/mymarilyn/clickhouse-driver/pull/361) by [notsovitalik](https://github.com/notsovitalik).
+- Substitution for parameters with time type. Solves issue [#359](https://github.com/mymarilyn/clickhouse-driver/issues/359). Pull request [#360](https://github.com/mymarilyn/clickhouse-driver/pull/360) by [ghazi-git](https://github.com/ghazi-git).
+
+### Fixed
+- Client construction with round_robin. Solves issue [#373](https://github.com/mymarilyn/clickhouse-driver/issues/373).
+- Column state prefixes reading and writing. Solves issue [#372](https://github.com/mymarilyn/clickhouse-driver/issues/372).
+- Inserts to a nullable LowCardinality columns. Solves issue [#363](https://github.com/mymarilyn/clickhouse-driver/issues/363). Pull request [#366](https://github.com/mymarilyn/clickhouse-driver/pull/366) by [Dmitry-k42](https://github.com/Dmitry-k42).
+- [NumPy] Avoid unnecessary timezone conversion. Solves issue [#354](https://github.com/mymarilyn/clickhouse-driver/issues/354). Pull request [#355](https://github.com/mymarilyn/clickhouse-driver/pull/355) by [edwinwzhe](https://github.com/edwinwzhe).
+
+### Changed
+- Protocol version bumped to 54459.
+- [NumPy] Speed-up reading Datetime64. Pull request [#365](https://github.com/mymarilyn/clickhouse-driver/pull/365) by [joelgibson](https://github.com/joelgibson).
+
+### Removed
+- Python 3.6 support.
+
## [0.2.5] - 2022-11-27
### Added
- [NumPy] More readable exception for less columns in frame. Solves issue [#320](https://github.com/mymarilyn/clickhouse-driver/issues/320).
@@ -420,7 +443,8 @@
- Date/DateTime types.
- String types.
-[Unreleased]: https://github.com/mymarilyn/clickhouse-driver/compare/0.2.5...HEAD
+[Unreleased]: https://github.com/mymarilyn/clickhouse-driver/compare/0.2.6...HEAD
+[0.2.6]: https://github.com/mymarilyn/clickhouse-driver/compare/0.2.5...0.2.6
[0.2.5]: https://github.com/mymarilyn/clickhouse-driver/compare/0.2.4...0.2.5
[0.2.4]: https://github.com/mymarilyn/clickhouse-driver/compare/0.2.3...0.2.4
[0.2.3]: https://github.com/mymarilyn/clickhouse-driver/compare/0.2.2...0.2.3
diff --git a/clickhouse_driver/__init__.py b/clickhouse_driver/__init__.py
index 38bd539..0f31ea2 100644
--- a/clickhouse_driver/__init__.py
+++ b/clickhouse_driver/__init__.py
@@ -3,7 +3,7 @@ from .client import Client
from .dbapi import connect
-VERSION = (0, 2, 5)
+VERSION = (0, 2, 6)
__version__ = '.'.join(str(x) for x in VERSION)
__all__ = ['Client', 'connect']
diff --git a/clickhouse_driver/block.py b/clickhouse_driver/block.py
index 5c63291..15589e9 100644
--- a/clickhouse_driver/block.py
+++ b/clickhouse_driver/block.py
@@ -1,7 +1,7 @@
+from .columns.util import get_inner_spec, get_inner_columns_with_types
from .reader import read_varint, read_binary_uint8, read_binary_int32
from .varint import write_varint
from .writer import write_binary_uint8, write_binary_int32
-from .columns import nestedcolumn
class BlockInfo(object):
@@ -172,7 +172,8 @@ class RowOrientedBlock(BaseBlock):
for name, type_ in columns_with_types:
cwt = None
if type_.startswith('Nested'):
- cwt = nestedcolumn.get_columns_with_types(type_)
+ inner_spec = get_inner_spec('Nested', type_)
+ cwt = get_inner_columns_with_types(inner_spec)
columns_with_cwt.append((name, cwt))
for i, row in enumerate(data):
diff --git a/clickhouse_driver/client.py b/clickhouse_driver/client.py
index a4d055b..ec6cf63 100644
--- a/clickhouse_driver/client.py
+++ b/clickhouse_driver/client.py
@@ -57,6 +57,12 @@ class Client(object):
* ``round_robin`` -- If ``alt_hosts`` are provided the query will be
executed on host picked with round-robin algorithm.
New in version *0.2.5*.
+ * ``namedtuple_as_json`` -- Controls named tuple and nested types
+ deserialization. To interpret these column alongside
+ with ``allow_experimental_object_type=1`` as Python
+ tuple set ``namedtuple_as_json`` to ``False``.
+ Default: True.
+ New in version *0.2.6*.
"""
available_client_settings = (
@@ -67,7 +73,8 @@ class Client(object):
'opentelemetry_traceparent',
'opentelemetry_tracestate',
'quota_key',
- 'input_format_null_as_default'
+ 'input_format_null_as_default',
+ 'namedtuple_as_json'
)
def __init__(self, *args, **kwargs):
@@ -97,6 +104,9 @@ class Client(object):
),
'input_format_null_as_default': self.settings.pop(
'input_format_null_as_default', False
+ ),
+ 'namedtuple_as_json': self.settings.pop(
+ 'namedtuple_as_json', True
)
}
@@ -125,13 +135,19 @@ class Client(object):
url = urlparse('clickhouse://' + host)
connection_kwargs = kwargs.copy()
- if len(args) > 2:
- # port as positional argument
+ num_args = len(args)
+ if num_args >= 2:
+ # host and port as positional arguments
connection_args = (url.hostname, url.port) + args[2:]
- else:
- # port as keyword argument
+ elif num_args >= 1:
+ # host as positional and port as keyword argument
connection_args = (url.hostname, ) + args[1:]
connection_kwargs['port'] = url.port
+ else:
+ # host and port as keyword arguments
+ connection_args = tuple()
+ connection_kwargs['host'] = url.hostname
+ connection_kwargs['port'] = url.port
connection = Connection(*connection_args, **connection_kwargs)
self.connections.append(connection)
@@ -438,7 +454,7 @@ class Client(object):
def query_dataframe(
self, query, params=None, external_tables=None, query_id=None,
- settings=None):
+ settings=None, replace_nonwords=True):
"""
*New in version 0.2.0.*
@@ -453,6 +469,8 @@ class Client(object):
ClickHouse server will generate it.
:param settings: dictionary of query settings.
Defaults to ``None`` (no additional settings).
+ :param replace_nonwords: boolean to replace non-words in column names
+ to underscores. Defaults to ``True``.
:return: pandas DataFrame.
"""
@@ -467,7 +485,10 @@ class Client(object):
settings=settings
)
- columns = [re.sub(r'\W', '_', name) for name, type_ in columns]
+ columns = [name for name, type_ in columns]
+ if replace_nonwords:
+ columns = [re.sub(r'\W', '_', x) for x in columns]
+
return pd.DataFrame(
{col: d for d, col in zip(data, columns)}, columns=columns
)
@@ -529,7 +550,7 @@ class Client(object):
query, params, self.connection.context
)
- self.connection.send_query(query, query_id=query_id)
+ self.connection.send_query(query, query_id=query_id, params=params)
self.connection.send_external_tables(external_tables,
types_check=types_check)
return self.receive_result(with_column_types=with_column_types,
@@ -544,8 +565,7 @@ class Client(object):
query = self.substitute_params(
query, params, self.connection.context
)
-
- self.connection.send_query(query, query_id=query_id)
+ self.connection.send_query(query, query_id=query_id, params=params)
self.connection.send_external_tables(external_tables,
types_check=types_check)
return self.receive_result(with_column_types=with_column_types,
@@ -561,7 +581,7 @@ class Client(object):
query, params, self.connection.context
)
- self.connection.send_query(query, query_id=query_id)
+ self.connection.send_query(query, query_id=query_id, params=params)
self.connection.send_external_tables(external_tables,
types_check=types_check)
return self.iter_receive_result(with_column_types=with_column_types)
@@ -773,6 +793,17 @@ class Client(object):
elif name == 'settings_is_important':
kwargs[name] = asbool(value)
+ elif name == 'tcp_keepalive':
+ try:
+ kwargs[name] = asbool(value)
+ except ValueError:
+ parts = value.split(',')
+ kwargs[name] = (
+ float(parts[0]), float(parts[1]), int(parts[2])
+ )
+ elif name == 'client_revision':
+ kwargs[name] = int(value)
+
# ssl
elif name == 'verify':
kwargs[name] = asbool(value)
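The client.py changes above wire up several new user-facing options: the namedtuple_as_json client setting, a replace_nonwords flag on query_dataframe(), forwarding of params to send_query(), and tcp_keepalive / client_revision parsing in from_url(). A minimal sketch of how these might be used, assuming a local server (hostname and column alias are placeholders):

    from clickhouse_driver import Client

    # namedtuple_as_json is a client setting: together with the server setting
    # allow_experimental_object_type=1, named Tuple / Object('json') columns are
    # returned as dicts by default; set it to False to get plain tuples instead.
    client = Client('localhost', settings={
        'allow_experimental_object_type': 1,
        'namedtuple_as_json': False,
    })

    # query_dataframe() gained replace_nonwords: pass False to keep the original
    # column names instead of replacing non-word characters with underscores.
    df = client.query_dataframe('SELECT 1 AS "a b"', replace_nonwords=False)

    # from_url() now understands tcp_keepalive (a boolean or
    # "idle_sec,interval_sec,probes") and client_revision (an integer protocol
    # revision) as URL query parameters.
    client = Client.from_url(
        'clickhouse://localhost?tcp_keepalive=60,5,3&client_revision=54449'
    )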
diff --git a/clickhouse_driver/clientinfo.py b/clickhouse_driver/clientinfo.py
index da327f0..3deabe9 100644
--- a/clickhouse_driver/clientinfo.py
+++ b/clickhouse_driver/clientinfo.py
@@ -28,14 +28,13 @@ class ClientInfo(object):
client_version_major = defines.CLIENT_VERSION_MAJOR
client_version_minor = defines.CLIENT_VERSION_MINOR
client_version_patch = defines.CLIENT_VERSION_PATCH
- client_revision = defines.CLIENT_REVISION
interface = Interface.TCP
initial_user = ''
initial_query_id = ''
initial_address = '0.0.0.0:0'
- def __init__(self, client_name, context):
+ def __init__(self, client_name, context, client_revision):
self.query_kind = ClientInfo.QueryKind.NO_QUERY
try:
@@ -44,6 +43,7 @@ class ClientInfo(object):
self.os_user = ''
self.client_hostname = socket.gethostname()
self.client_name = client_name
+ self.client_revision = client_revision
self.client_trace_context = OpenTelemetryTraceContext(
context.client_settings['opentelemetry_traceparent'],
diff --git a/clickhouse_driver/columns/arraycolumn.py b/clickhouse_driver/columns/arraycolumn.py
index 9553b76..0ce336e 100644
--- a/clickhouse_driver/columns/arraycolumn.py
+++ b/clickhouse_driver/columns/arraycolumn.py
@@ -28,7 +28,8 @@ class ArrayColumn(Column):
py_types = (list, tuple)
def __init__(self, nested_column, **kwargs):
- self.size_column = UInt64Column()
+ self.init_kwargs = kwargs
+ self.size_column = UInt64Column(**kwargs)
self.nested_column = nested_column
self._write_depth_0_size = True
super(ArrayColumn, self).__init__(**kwargs)
@@ -37,17 +38,21 @@ class ArrayColumn(Column):
def write_data(self, data, buf):
# Column of Array(T) is stored in "compact" format and passed to server
# wrapped into another Array without size of wrapper array.
- self.nested_column = ArrayColumn(self.nested_column)
+ self.nested_column = ArrayColumn(
+ self.nested_column, **self.init_kwargs
+ )
self.nested_column.nullable = self.nullable
self.nullable = False
self._write_depth_0_size = False
self._write(data, buf)
- def read_data(self, rows, buf):
- self.nested_column = ArrayColumn(self.nested_column)
+ def read_data(self, n_rows, buf):
+ self.nested_column = ArrayColumn(
+ self.nested_column, **self.init_kwargs
+ )
self.nested_column.nullable = self.nullable
self.nullable = False
- return self._read(rows, buf)[0]
+ return self._read(n_rows, buf)[0]
def _write_sizes(self, value, buf):
nulls_map = []
@@ -106,9 +111,13 @@ class ArrayColumn(Column):
self._write_data(value, buf)
def read_state_prefix(self, buf):
- return self.nested_column.read_state_prefix(buf)
+ super(ArrayColumn, self).read_state_prefix(buf)
+
+ self.nested_column.read_state_prefix(buf)
def write_state_prefix(self, buf):
+ super(ArrayColumn, self).write_state_prefix(buf)
+
self.nested_column.write_state_prefix(buf)
def _read(self, size, buf):
diff --git a/clickhouse_driver/columns/base.py b/clickhouse_driver/columns/base.py
index c5d0888..fba3fd2 100644
--- a/clickhouse_driver/columns/base.py
+++ b/clickhouse_driver/columns/base.py
@@ -1,6 +1,61 @@
from struct import Struct, error as struct_error
from . import exceptions
+from ..varint import read_varint
+
+
+class CommonSerialization(object):
+ def __init__(self, column):
+ self.column = column
+ super(CommonSerialization, self).__init__()
+
+ def read_sparse(self, n_items, buf):
+ return n_items
+
+ def apply_sparse(self, items):
+ return items
+
+
+class SparseSerialization(CommonSerialization):
+
+ def __init__(self, *args, **kwargs):
+ self.sparse_indexes = []
+ self.items_total = None
+ super(SparseSerialization, self).__init__(*args, **kwargs)
+
+ def read_sparse(self, n_items, buf):
+ sparse_indexes = []
+ items_total = 0
+ non_default_items = 0
+
+ END_OF_GRANULE_FLAG = 1 << 62
+ end_of_granule = False
+
+ while not end_of_granule:
+ group_size = read_varint(buf)
+ end_of_granule = group_size & END_OF_GRANULE_FLAG
+ group_size &= ~END_OF_GRANULE_FLAG
+
+ items_total += group_size + 1
+ if not end_of_granule:
+ non_default_items += 1
+ sparse_indexes.append(items_total)
+
+ self.sparse_indexes = sparse_indexes
+ self.items_total = items_total
+
+ return non_default_items
+
+ def apply_sparse(self, items):
+ default = self.column.null_value
+ if self.column.after_read_items:
+ default = self.column.after_read_items([default])[0]
+
+ rv = [default] * (self.items_total - 1)
+ for item_number, i in enumerate(self.sparse_indexes):
+ rv[i - 1] = items[item_number]
+
+ return rv
class Column(object):
@@ -15,14 +70,17 @@ class Column(object):
null_value = 0
- def __init__(self, types_check=False, **kwargs):
+ def __init__(self, types_check=False, has_custom_serialization=False,
+ **kwargs):
self.nullable = False
self.types_check_enabled = types_check
+ self.has_custom_serialization = has_custom_serialization
+ self.serialization = CommonSerialization(self)
self.input_null_as_default = False
- if 'context' in kwargs:
- settings = kwargs['context'].client_settings
- self.input_null_as_default = settings \
- .get('input_format_null_as_default', False)
+
+ self.context = kwargs['context']
+ self.input_null_as_default = self.context.client_settings \
+ .get('input_format_null_as_default', False)
super(Column, self).__init__()
@@ -94,12 +152,15 @@ class Column(object):
raise NotImplementedError
def read_data(self, n_items, buf):
+ n_items = self.serialization.read_sparse(n_items, buf)
+
if self.nullable:
nulls_map = self._read_nulls_map(n_items, buf)
else:
nulls_map = None
- return self._read_data(n_items, buf, nulls_map=nulls_map)
+ items = self._read_data(n_items, buf, nulls_map=nulls_map)
+ return self.serialization.apply_sparse(items)
def _read_data(self, n_items, buf, nulls_map=None):
items = self.read_items(n_items, buf)
@@ -117,7 +178,10 @@ class Column(object):
raise NotImplementedError
def read_state_prefix(self, buf):
- pass
+ if self.has_custom_serialization:
+ use_custom_serialization = read_varint(buf)
+ if use_custom_serialization:
+ self.serialization = SparseSerialization(self)
def write_state_prefix(self, buf):
pass
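The new SparseSerialization decodes ClickHouse's sparse column encoding: the stream carries varint group sizes, each counting the default values that precede the next explicitly stored value, with bit 62 marking the last group of the granule; only the non-default values are read and then scattered back over a column of defaults. A hand-worked sketch of that bookkeeping (plain Python, not driver API):

    # Pretend read_varint() produced these two group sizes for one granule.
    END_OF_GRANULE_FLAG = 1 << 62
    groups = [2, 3 | END_OF_GRANULE_FLAG]

    sparse_indexes, items_total, non_default = [], 0, 0
    for group_size in groups:
        end_of_granule = group_size & END_OF_GRANULE_FLAG
        group_size &= ~END_OF_GRANULE_FLAG
        items_total += group_size + 1
        if not end_of_granule:
            non_default += 1
            sparse_indexes.append(items_total)

    # Only `non_default` real values are read from the buffer afterwards...
    items = [42]                       # stand-in for read_data()'s result
    assert non_default == len(items)

    # ...and apply_sparse() scatters them over defaults (null_value == 0 here).
    column = [0] * (items_total - 1)
    for n, i in enumerate(sparse_indexes):
        column[i - 1] = items[n]

    print(column)                      # [0, 0, 42, 0, 0, 0]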
diff --git a/clickhouse_driver/columns/jsoncolumn.py b/clickhouse_driver/columns/jsoncolumn.py
new file mode 100644
index 0000000..080b407
--- /dev/null
+++ b/clickhouse_driver/columns/jsoncolumn.py
@@ -0,0 +1,37 @@
+from .base import Column
+from .stringcolumn import String
+from ..reader import read_binary_uint8, read_binary_str
+from ..util.compat import json
+from ..writer import write_binary_uint8
+
+
+class JsonColumn(Column):
+ py_types = (dict, )
+
+ # No NULL value actually
+ null_value = {}
+
+ def __init__(self, column_by_spec_getter, **kwargs):
+ self.column_by_spec_getter = column_by_spec_getter
+ self.string_column = String(**kwargs)
+ super(JsonColumn, self).__init__(**kwargs)
+
+ def write_state_prefix(self, buf):
+ # Read in binary format.
+ # Write in text format.
+ write_binary_uint8(1, buf)
+
+ def read_items(self, n_items, buf):
+ read_binary_uint8(buf)
+ spec = read_binary_str(buf)
+ col = self.column_by_spec_getter(spec)
+ col.read_state_prefix(buf)
+ return col.read_data(n_items, buf)
+
+ def write_items(self, items, buf):
+ items = [x if isinstance(x, str) else json.dumps(x) for x in items]
+ self.string_column.write_items(items, buf)
+
+
+def create_json_column(spec, column_by_spec_getter, column_options):
+ return JsonColumn(column_by_spec_getter, **column_options)
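The new jsoncolumn.py backs the experimental Object('json') type: on read it delegates to whatever concrete column spec the server announces for the block, on write it serializes rows as JSON text (dicts via json.dumps, strings passed through). A rough usage sketch, assuming a server that supports the type; the table name is made up:

    from clickhouse_driver import Client

    client = Client('localhost', settings={'allow_experimental_object_type': 1})

    client.execute('CREATE TABLE test_json (obj JSON) ENGINE = Memory')

    # Rows may carry dicts or ready-made JSON strings; both are written as text.
    client.execute(
        'INSERT INTO test_json (obj) VALUES',
        [{'obj': {'a': 1, 'b': ['x', 'y']}}, {'obj': '{"a": 2}'}],
    )

    # With the default namedtuple_as_json=True the column comes back as dicts.
    print(client.execute('SELECT obj FROM test_json'))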
diff --git a/clickhouse_driver/columns/lowcardinalitycolumn.py b/clickhouse_driver/columns/lowcardinalitycolumn.py
index 4e131be..43134f7 100644
--- a/clickhouse_driver/columns/lowcardinalitycolumn.py
+++ b/clickhouse_driver/columns/lowcardinalitycolumn.py
@@ -35,25 +35,32 @@ class LowCardinalityColumn(Column):
serialization_type = has_additional_keys_bit | need_update_dictionary
def __init__(self, nested_column, **kwargs):
+ self.init_kwargs = kwargs
self.nested_column = nested_column
super(LowCardinalityColumn, self).__init__(**kwargs)
def read_state_prefix(self, buf):
- return read_binary_uint64(buf)
+ super(LowCardinalityColumn, self).read_state_prefix(buf)
+
+ read_binary_uint64(buf)
def write_state_prefix(self, buf):
+ super(LowCardinalityColumn, self).write_state_prefix(buf)
+
# KeysSerializationVersion. See ClickHouse docs.
write_binary_int64(1, buf)
def _write_data(self, items, buf):
index, keys = [], []
key_by_index_element = {}
+ nested_is_nullable = False
if self.nested_column.nullable:
# First element represents NULL if column is nullable.
index.append(self.nested_column.null_value)
# Prevent null map writing. Reset nested column nullable flag.
self.nested_column.nullable = False
+ nested_is_nullable = True
for x in items:
if x is None:
@@ -87,14 +94,26 @@ class LowCardinalityColumn(Column):
return
int_type = int(log(len(index), 2) / 8)
- int_column = self.int_types[int_type]()
+ int_column = self.int_types[int_type](**self.init_kwargs)
serialization_type = self.serialization_type | int_type
write_binary_int64(serialization_type, buf)
write_binary_int64(len(index), buf)
- self.nested_column.write_data(index, buf)
+ if nested_is_nullable:
+ # Given we reset nested column nullable flag above,
+ # we need to write null map manually. If to invoke
+ # write_data method, it will cause an exception,
+ # because `prepare_data` may not be able to handle
+ # null value correctly.
+ self.nested_column.write_items(
+ [self.nested_column.null_value], buf)
+ # Remove null map from index, because it is already written.
+ index_to_write = index[1:]
+ self.nested_column.write_data(index_to_write, buf)
+ else:
+ self.nested_column.write_data(index, buf)
write_binary_int64(len(items), buf)
int_column.write_items(keys, buf)
@@ -106,7 +125,7 @@ class LowCardinalityColumn(Column):
# Lowest byte contains info about key type.
key_type = serialization_type & 0xf
- keys_column = self.int_types[key_type]()
+ keys_column = self.int_types[key_type](**self.init_kwargs)
nullable = self.nested_column.nullable
# Prevent null map reading. Reset nested column nullable flag.
diff --git a/clickhouse_driver/columns/mapcolumn.py b/clickhouse_driver/columns/mapcolumn.py
index f23f5d7..30f69bd 100644
--- a/clickhouse_driver/columns/mapcolumn.py
+++ b/clickhouse_driver/columns/mapcolumn.py
@@ -13,16 +13,20 @@ class MapColumn(Column):
null_value = {}
def __init__(self, key_column, value_column, **kwargs):
- self.offset_column = UInt64Column()
+ self.offset_column = UInt64Column(**kwargs)
self.key_column = key_column
self.value_column = value_column
super(MapColumn, self).__init__(**kwargs)
def read_state_prefix(self, buf):
+ super(MapColumn, self).read_state_prefix(buf)
+
self.key_column.read_state_prefix(buf)
self.value_column.read_state_prefix(buf)
def write_state_prefix(self, buf):
+ super(MapColumn, self).write_state_prefix(buf)
+
self.key_column.write_state_prefix(buf)
self.value_column.write_state_prefix(buf)
@@ -57,7 +61,7 @@ class MapColumn(Column):
def create_map_column(spec, column_by_spec_getter, column_options):
- # Match commas outside of parentheses so we don't match the comma in
+ # Match commas outside of parentheses, so we don't match the comma in
# Decimal types.
key, value = comma_re.split(spec[4:-1])
key_column = column_by_spec_getter(key.strip())
diff --git a/clickhouse_driver/columns/nestedcolumn.py b/clickhouse_driver/columns/nestedcolumn.py
index 809e62a..069db3f 100644
--- a/clickhouse_driver/columns/nestedcolumn.py
+++ b/clickhouse_driver/columns/nestedcolumn.py
@@ -1,21 +1,10 @@
from .arraycolumn import create_array_column
-from .util import get_inner_spec, get_inner_columns, \
- get_inner_columns_with_types
+from .util import get_inner_spec
def create_nested_column(spec, column_by_spec_getter, column_options):
return create_array_column(
- 'Array(Tuple({}))'.format(','.join(get_nested_columns(spec))),
+ 'Array(Tuple({}))'.format(get_inner_spec('Nested', spec)),
column_by_spec_getter, column_options
)
-
-
-def get_nested_columns(spec):
- inner_spec = get_inner_spec('Nested', spec)
- return get_inner_columns(inner_spec)
-
-
-def get_columns_with_types(spec):
- inner_spec = get_inner_spec('Nested', spec)
- return get_inner_columns_with_types(inner_spec)
diff --git a/clickhouse_driver/columns/numpy/datetimecolumn.py b/clickhouse_driver/columns/numpy/datetimecolumn.py
index 5fcae67..b6f14fe 100644
--- a/clickhouse_driver/columns/numpy/datetimecolumn.py
+++ b/clickhouse_driver/columns/numpy/datetimecolumn.py
@@ -21,12 +21,12 @@ class NumpyDateTimeColumnBase(NumpyColumn):
def apply_timezones_after_read(self, dt):
timezone = self.timezone if self.timezone else self.local_timezone
- ts = pd.to_datetime(dt, utc=True).tz_convert(timezone)
-
- if self.offset_naive:
+ if self.offset_naive and timezone.zone != 'UTC':
+ ts = pd.to_datetime(dt, utc=True).tz_convert(timezone)
ts = ts.tz_localize(None)
+ return ts.to_numpy(self.datetime_dtype)
- return ts.to_numpy(self.datetime_dtype)
+ return dt
def apply_timezones_before_write(self, items):
if isinstance(items, pd.DatetimeIndex):
@@ -65,7 +65,7 @@ class NumpyDateTimeColumn(NumpyDateTimeColumnBase):
class NumpyDateTime64Column(NumpyDateTimeColumnBase):
- dtype = np.dtype(np.uint64)
+ dtype = np.dtype(np.int64)
datetime_dtype = 'datetime64[ns]'
max_scale = 9
@@ -75,15 +75,15 @@ class NumpyDateTime64Column(NumpyDateTimeColumnBase):
super(NumpyDateTime64Column, self).__init__(**kwargs)
def read_items(self, n_items, buf):
- scale = 10 ** self.scale
- frac_scale = 10 ** (self.max_scale - self.scale)
-
+ # Clickhouse: t seconds is represented as t * 10^scale.
+ # datetime64[ns]: t seconds is represented as t * 10^9.
+ # Since 0 <= scale <= 9, multiply by the integer 10^(9 - scale).
items = super(NumpyDateTime64Column, self).read_items(n_items, buf)
- seconds = (items // scale).astype('datetime64[s]')
- microseconds = ((items % scale) * frac_scale).astype('timedelta64[ns]')
+ tmp = np.copy(items)
+ tmp *= 10 ** (9 - self.scale)
+ dt = tmp.view(dtype='datetime64[ns]')
- dt = seconds + microseconds
return self.apply_timezones_after_read(dt)
def write_items(self, items, buf):
@@ -120,12 +120,12 @@ def create_numpy_datetime_column(spec, column_options):
context = column_options['context']
- tz_name = timezone = None
+ tz_name = None
offset_naive = True
# As Numpy do not use local timezone for converting timestamp to
# datetime we need always detect local timezone for manual converting.
- local_timezone = get_localzone_name_compat()
+ local_tz_name = get_localzone_name_compat()
# Use column's timezone if it's specified.
if spec and spec[-1] == ')':
@@ -133,11 +133,11 @@ def create_numpy_datetime_column(spec, column_options):
offset_naive = False
else:
if not context.settings.get('use_client_time_zone', False):
- if local_timezone != context.server_info.timezone:
+ if local_tz_name != context.server_info.timezone:
tz_name = context.server_info.timezone
- if tz_name:
- timezone = get_timezone(tz_name)
+ timezone = get_timezone(tz_name) if tz_name else None
+ local_timezone = get_timezone(local_tz_name) if local_tz_name else None
return cls(timezone=timezone, offset_naive=offset_naive,
local_timezone=local_timezone, **column_options)
diff --git a/clickhouse_driver/columns/numpy/lowcardinalitycolumn.py b/clickhouse_driver/columns/numpy/lowcardinalitycolumn.py
index 781372a..7d06fae 100644
--- a/clickhouse_driver/columns/numpy/lowcardinalitycolumn.py
+++ b/clickhouse_driver/columns/numpy/lowcardinalitycolumn.py
@@ -37,7 +37,7 @@ class NumpyLowCardinalityColumn(LowCardinalityColumn):
c = pd.Categorical(items)
int_type = int(log(len(c.codes), 2) / 8)
- int_column = self.int_types[int_type]()
+ int_column = self.int_types[int_type](**self.init_kwargs)
serialization_type = self.serialization_type | int_type
@@ -66,7 +66,7 @@ class NumpyLowCardinalityColumn(LowCardinalityColumn):
# Lowest byte contains info about key type.
key_type = serialization_type & 0xf
- keys_column = self.int_types[key_type]()
+ keys_column = self.int_types[key_type](**self.init_kwargs)
nullable = self.nested_column.nullable
# Prevent null map reading. Reset nested column nullable flag.
diff --git a/clickhouse_driver/columns/service.py b/clickhouse_driver/columns/service.py
index 6541dc9..5d8b78e 100644
--- a/clickhouse_driver/columns/service.py
+++ b/clickhouse_driver/columns/service.py
@@ -15,6 +15,7 @@ from .intcolumn import (
UInt8Column, UInt16Column, UInt32Column, UInt64Column
)
from .lowcardinalitycolumn import create_low_cardinality_column
+from .jsoncolumn import create_json_column
from .mapcolumn import create_map_column
from .nothingcolumn import NothingColumn
from .nullcolumn import NullColumn
@@ -122,6 +123,11 @@ def get_column_by_spec(spec, column_options, use_numpy=None):
spec, create_column_with_options, column_options
)
+ elif spec.startswith("Object('json')"):
+ return create_json_column(
+ spec, create_column_with_options, column_options
+ )
+
else:
for alias, primitive in aliases:
if spec.startswith(alias):
@@ -137,8 +143,12 @@ def get_column_by_spec(spec, column_options, use_numpy=None):
raise errors.UnknownTypeError('Unknown type {}'.format(spec))
-def read_column(context, column_spec, n_items, buf, use_numpy=None):
- column_options = {'context': context}
+def read_column(context, column_spec, n_items, buf, use_numpy=None,
+ has_custom_serialization=False):
+ column_options = {
+ 'context': context,
+ 'has_custom_serialization': has_custom_serialization
+ }
col = get_column_by_spec(column_spec, column_options, use_numpy=use_numpy)
col.read_state_prefix(buf)
return col.read_data(n_items, buf)
diff --git a/clickhouse_driver/columns/tuplecolumn.py b/clickhouse_driver/columns/tuplecolumn.py
index d48ced4..5ff4636 100644
--- a/clickhouse_driver/columns/tuplecolumn.py
+++ b/clickhouse_driver/columns/tuplecolumn.py
@@ -1,13 +1,21 @@
from .base import Column
-from .util import get_inner_spec, get_inner_columns
+from .util import get_inner_spec, get_inner_columns_with_types
class TupleColumn(Column):
py_types = (list, tuple)
- def __init__(self, nested_columns, **kwargs):
+ def __init__(self, names, nested_columns, **kwargs):
+ self.names = names
self.nested_columns = nested_columns
+ client_settings = kwargs['context'].client_settings
+ settings = kwargs['context'].settings
+ self.namedtuple_as_json = (
+ settings.get('allow_experimental_object_type', False) and
+ client_settings.get('namedtuple_as_json', True)
+ )
+
super(TupleColumn, self).__init__(**kwargs)
self.null_value = tuple(x.null_value for x in nested_columns)
@@ -23,15 +31,33 @@ class TupleColumn(Column):
def read_data(self, n_items, buf):
rv = [x.read_data(n_items, buf) for x in self.nested_columns]
- return list(zip(*rv))
+ rv = list(zip(*rv))
+
+ if self.names[0] and self.namedtuple_as_json:
+ return [dict(zip(self.names, x)) for x in rv]
+ else:
+ return rv
def read_items(self, n_items, buf):
return self.read_data(n_items, buf)
+ def read_state_prefix(self, buf):
+ super(TupleColumn, self).read_state_prefix(buf)
+
+ for x in self.nested_columns:
+ x.read_state_prefix(buf)
+
+ def write_state_prefix(self, buf):
+ super(TupleColumn, self).write_state_prefix(buf)
+
+ for x in self.nested_columns:
+ x.write_state_prefix(buf)
+
def create_tuple_column(spec, column_by_spec_getter, column_options):
inner_spec = get_inner_spec('Tuple', spec)
- columns = get_inner_columns(inner_spec)
+ columns_with_types = get_inner_columns_with_types(inner_spec)
+ names, types = zip(*columns_with_types)
- return TupleColumn([column_by_spec_getter(x) for x in columns],
+ return TupleColumn(names, [column_by_spec_getter(x) for x in types],
**column_options)
diff --git a/clickhouse_driver/connection.py b/clickhouse_driver/connection.py
index 16c1321..4d8d347 100644
--- a/clickhouse_driver/connection.py
+++ b/clickhouse_driver/connection.py
@@ -3,6 +3,7 @@ import socket
import ssl
from collections import deque
from contextlib import contextmanager
+from sys import platform
from time import time
from urllib.parse import urlparse
@@ -21,9 +22,10 @@ from .protocol import Compression, ClientPacketTypes, ServerPacketTypes
from .queryprocessingstage import QueryProcessingStage
from .reader import read_binary_str
from .readhelpers import read_exception
-from .settings.writer import write_settings
+from .settings.writer import write_settings, SettingsFlags
from .streams.native import BlockInputStream, BlockOutputStream
from .util.compat import threading
+from .util.escape import escape_params
from .varint import write_varint, read_varint
from .writer import write_binary_str
@@ -44,7 +46,7 @@ class Packet(object):
class ServerInfo(object):
def __init__(self, name, version_major, version_minor, version_patch,
- revision, timezone, display_name):
+ revision, timezone, display_name, used_revision):
self.name = name
self.version_major = version_major
self.version_minor = version_minor
@@ -52,6 +54,7 @@ class ServerInfo(object):
self.revision = revision
self.timezone = timezone
self.display_name = display_name
+ self.used_revision = used_revision
super(ServerInfo, self).__init__()
@@ -66,6 +69,7 @@ class ServerInfo(object):
('name', self.name),
('version', version),
('revision', self.revision),
+ ('used revision', self.used_revision),
('timezone', self.timezone),
('display_name', self.display_name)
]
@@ -124,6 +128,15 @@ class Connection(object):
ignored, ``True`` means that the query will
fail with UNKNOWN_SETTING error.
Defaults to ``False``.
+ :param tcp_keepalive: enables `TCP keepalive <https://tldp.org/HOWTO/
+ TCP-Keepalive-HOWTO/overview.html>`_ on established
+ connection. If is set to ``True``` system keepalive
+ settings are used. You can also specify custom
+ keepalive setting with tuple:
+ ``(idle_time_sec, interval_sec, probes)``.
+ Defaults to ``False``.
+ :param client_revision: can be used for client version downgrading.
+ Defaults to ``None``.
"""
def __init__(
@@ -143,6 +156,8 @@ class Connection(object):
server_hostname=None,
alt_hosts=None,
settings_is_important=False,
+ tcp_keepalive=False,
+ client_revision=None
):
if secure:
default_port = defines.DEFAULT_SECURE_PORT
@@ -164,6 +179,10 @@ class Connection(object):
self.send_receive_timeout = send_receive_timeout
self.sync_request_timeout = sync_request_timeout
self.settings_is_important = settings_is_important
+ self.tcp_keepalive = tcp_keepalive
+ self.client_revision = min(
+ client_revision or defines.CLIENT_REVISION, defines.CLIENT_REVISION
+ )
self.secure_socket = secure
self.verify_cert = verify
@@ -310,6 +329,8 @@ class Connection(object):
# performance tweak
self.socket.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
+ if self.tcp_keepalive:
+ self._set_keepalive()
self.fin = BufferedSocketReader(self.socket, defines.BUFFER_SIZE)
self.fout = BufferedSocketWriter(self.socket, defines.BUFFER_SIZE)
@@ -317,10 +338,42 @@ class Connection(object):
self.send_hello()
self.receive_hello()
+ revision = self.server_info.used_revision
+ if revision >= defines.DBMS_MIN_PROTOCOL_VERSION_WITH_ADDENDUM:
+ self.send_addendum()
+
self.block_in = self.get_block_in_stream()
self.block_in_raw = BlockInputStream(self.fin, self.context)
self.block_out = self.get_block_out_stream()
+ def _set_keepalive(self):
+ self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
+
+ if not isinstance(self.tcp_keepalive, tuple):
+ return
+
+ idle_time_sec, interval_sec, probes = self.tcp_keepalive
+
+ if platform == 'linux' or platform == 'win32':
+ # This should also work for Windows
+ # starting with Windows 10, version 1709.
+ self.socket.setsockopt(
+ socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_time_sec
+ )
+ self.socket.setsockopt(
+ socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_sec
+ )
+ self.socket.setsockopt(
+ socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes
+ )
+
+ elif platform == 'darwin':
+ TCP_KEEPALIVE = 0x10
+ # Only interval is available in mac os.
+ self.socket.setsockopt(
+ socket.IPPROTO_TCP, TCP_KEEPALIVE, interval_sec
+ )
+
def _format_connection_error(self, e, host, port):
err = (e.strerror + ' ') if e.strerror else ''
return err + '({}:{})'.format(host, port)
@@ -410,7 +463,7 @@ class Connection(object):
write_varint(defines.CLIENT_VERSION_MINOR, self.fout)
# NOTE For backward compatibility of the protocol,
# client cannot send its version_patch.
- write_varint(defines.CLIENT_REVISION, self.fout)
+ write_varint(self.client_revision, self.fout)
write_binary_str(self.database, self.fout)
write_binary_str(self.user, self.fout)
write_binary_str(self.password, self.fout)
@@ -426,25 +479,27 @@ class Connection(object):
server_version_minor = read_varint(self.fin)
server_revision = read_varint(self.fin)
+ used_revision = min(self.client_revision, server_revision)
+
server_timezone = None
- if server_revision >= \
+ if used_revision >= \
defines.DBMS_MIN_REVISION_WITH_SERVER_TIMEZONE:
server_timezone = read_binary_str(self.fin)
server_display_name = ''
- if server_revision >= \
+ if used_revision >= \
defines.DBMS_MIN_REVISION_WITH_SERVER_DISPLAY_NAME:
server_display_name = read_binary_str(self.fin)
server_version_patch = server_revision
- if server_revision >= \
+ if used_revision >= \
defines.DBMS_MIN_REVISION_WITH_VERSION_PATCH:
server_version_patch = read_varint(self.fin)
self.server_info = ServerInfo(
server_name, server_version_major, server_version_minor,
server_version_patch, server_revision,
- server_timezone, server_display_name
+ server_timezone, server_display_name, used_revision
)
self.context.server_info = self.server_info
@@ -463,6 +518,14 @@ class Connection(object):
self.disconnect()
raise errors.UnexpectedPacketFromServerError(message)
+ def send_addendum(self):
+ revision = self.server_info.used_revision
+
+ if revision >= defines.DBMS_MIN_PROTOCOL_VERSION_WITH_QUOTA_KEY:
+ write_binary_str(
+ self.context.client_settings['quota_key'], self.fout
+ )
+
def ping(self):
timeout = self.sync_request_timeout
@@ -567,7 +630,7 @@ class Connection(object):
return BlockOutputStream(self.fout, self.context)
def receive_data(self, may_be_compressed=True, may_be_use_numpy=False):
- revision = self.server_info.revision
+ revision = self.server_info.used_revision
if revision >= defines.DBMS_MIN_REVISION_WITH_TEMPORARY_TABLES:
read_binary_str(self.fin)
@@ -581,7 +644,7 @@ class Connection(object):
def receive_progress(self):
progress = Progress()
- progress.read(self.server_info.revision, self.fin)
+ progress.read(self.server_info, self.fin)
return progress
def receive_profile_info(self):
@@ -597,14 +660,14 @@ class Connection(object):
start = time()
write_varint(ClientPacketTypes.DATA, self.fout)
- revision = self.server_info.revision
+ revision = self.server_info.used_revision
if revision >= defines.DBMS_MIN_REVISION_WITH_TEMPORARY_TABLES:
write_binary_str(table_name, self.fout)
self.block_out.write(block)
logger.debug('Block "%s" send time: %f', table_name, time() - start)
- def send_query(self, query, query_id=None):
+ def send_query(self, query, query_id=None, params=None):
if not self.connected:
self.connect()
@@ -612,9 +675,10 @@ class Connection(object):
write_binary_str(query_id or '', self.fout)
- revision = self.server_info.revision
+ revision = self.server_info.used_revision
if revision >= defines.DBMS_MIN_REVISION_WITH_CLIENT_INFO:
- client_info = ClientInfo(self.client_name, self.context)
+ client_info = ClientInfo(self.client_name, self.context,
+ client_revision=self.client_revision)
client_info.query_kind = ClientInfo.QueryKind.INITIAL_QUERY
client_info.write(revision, self.fout)
@@ -623,8 +687,11 @@ class Connection(object):
revision >= defines
.DBMS_MIN_REVISION_WITH_SETTINGS_SERIALIZED_AS_STRINGS
)
+ settings_flags = 0
+ if self.settings_is_important:
+ settings_flags |= SettingsFlags.IMPORTANT
write_settings(self.context.settings, self.fout, settings_as_strings,
- self.settings_is_important)
+ settings_flags)
if revision >= defines.DBMS_MIN_REVISION_WITH_INTERSERVER_SECRET:
write_binary_str('', self.fout)
@@ -634,6 +701,13 @@ class Connection(object):
write_binary_str(query, self.fout)
+ if revision >= defines.DBMS_MIN_PROTOCOL_VERSION_WITH_PARAMETERS:
+ # Always settings_as_strings = True
+ escaped = escape_params(
+ params or {}, self.context, for_server=True
+ )
+ write_settings(escaped, self.fout, True, SettingsFlags.CUSTOM)
+
logger.debug('Query: %s', query)
self.fout.flush()
diff --git a/clickhouse_driver/defines.py b/clickhouse_driver/defines.py
index a499ab5..3f72771 100644
--- a/clickhouse_driver/defines.py
+++ b/clickhouse_driver/defines.py
@@ -25,6 +25,10 @@ DBMS_MIN_PROTOCOL_VERSION_WITH_DISTRIBUTED_DEPTH = 54448
DBMS_MIN_PROTOCOL_VERSION_WITH_INITIAL_QUERY_START_TIME = 54449
DBMS_MIN_PROTOCOL_VERSION_WITH_INCREMENTAL_PROFILE_EVENTS = 54451
DBMS_MIN_REVISION_WITH_PARALLEL_REPLICAS = 54453
+DBMS_MIN_REVISION_WITH_CUSTOM_SERIALIZATION = 54454
+DBMS_MIN_PROTOCOL_VERSION_WITH_ADDENDUM = 54458
+DBMS_MIN_PROTOCOL_VERSION_WITH_QUOTA_KEY = 54458
+DBMS_MIN_PROTOCOL_VERSION_WITH_PARAMETERS = 54459
# Timeouts
DBMS_DEFAULT_CONNECT_TIMEOUT_SEC = 10
@@ -40,7 +44,7 @@ CLIENT_NAME = 'python-driver'
CLIENT_VERSION_MAJOR = 20
CLIENT_VERSION_MINOR = 10
CLIENT_VERSION_PATCH = 2
-CLIENT_REVISION = 54453
+CLIENT_REVISION = DBMS_MIN_PROTOCOL_VERSION_WITH_PARAMETERS
BUFFER_SIZE = 1048576
diff --git a/clickhouse_driver/progress.py b/clickhouse_driver/progress.py
index bbafc2f..571d1a4 100644
--- a/clickhouse_driver/progress.py
+++ b/clickhouse_driver/progress.py
@@ -12,11 +12,11 @@ class Progress(object):
super(Progress, self).__init__()
- def read(self, server_revision, fin):
+ def read(self, server_info, fin):
self.rows = read_varint(fin)
self.bytes = read_varint(fin)
- revision = server_revision
+ revision = server_info.used_revision
if revision >= defines.DBMS_MIN_REVISION_WITH_TOTAL_ROWS_IN_PROGRESS:
self.total_rows = read_varint(fin)
diff --git a/clickhouse_driver/settings/writer.py b/clickhouse_driver/settings/writer.py
index 6261784..4413c9e 100644
--- a/clickhouse_driver/settings/writer.py
+++ b/clickhouse_driver/settings/writer.py
@@ -7,13 +7,18 @@ from .available import settings as available_settings
logger = logging.getLogger(__name__)
-def write_settings(settings, buf, settings_as_strings, is_important=False):
+class SettingsFlags:
+ IMPORTANT = 0x1
+ CUSTOM = 0x2
+
+
+def write_settings(settings, buf, settings_as_strings, flags):
for setting, value in (settings or {}).items():
# If the server support settings as string we do not need to know
# anything about them, so we can write any setting.
if settings_as_strings:
write_binary_str(setting, buf)
- write_binary_uint8(int(is_important), buf)
+ write_binary_uint8(flags, buf)
write_binary_str(str(value), buf)
else:
diff --git a/clickhouse_driver/streams/native.py b/clickhouse_driver/streams/native.py
index bfd9a2c..7ad4f39 100644
--- a/clickhouse_driver/streams/native.py
+++ b/clickhouse_driver/streams/native.py
@@ -1,8 +1,8 @@
from ..block import ColumnOrientedBlock, BlockInfo
from ..columns.service import read_column, write_column
-from ..reader import read_binary_str
+from ..reader import read_binary_str, read_binary_uint8
from ..varint import write_varint, read_varint
-from ..writer import write_binary_str
+from ..writer import write_binary_str, write_binary_uint8
from .. import defines
@@ -14,7 +14,7 @@ class BlockOutputStream(object):
super(BlockOutputStream, self).__init__()
def write(self, block):
- revision = self.context.server_info.revision
+ revision = self.context.server_info.used_revision
if revision >= defines.DBMS_MIN_REVISION_WITH_BLOCK_INFO:
block.info.write(self.fout)
@@ -35,6 +35,11 @@ class BlockOutputStream(object):
except IndexError:
raise ValueError('Different rows length')
+ if revision >= \
+ defines.DBMS_MIN_REVISION_WITH_CUSTOM_SERIALIZATION:
+ # We write always sparse data without custom serialization.
+ write_binary_uint8(0, self.fout)
+
write_column(self.context, col_name, col_type, items,
self.fout, types_check=block.types_check)
@@ -54,7 +59,7 @@ class BlockInputStream(object):
def read(self, use_numpy=None):
info = BlockInfo()
- revision = self.context.server_info.revision
+ revision = self.context.server_info.used_revision
if revision >= defines.DBMS_MIN_REVISION_WITH_BLOCK_INFO:
info.read(self.fin)
@@ -70,9 +75,16 @@ class BlockInputStream(object):
names.append(column_name)
types.append(column_type)
+ has_custom_serialization = False
+ if revision >= defines.DBMS_MIN_REVISION_WITH_CUSTOM_SERIALIZATION:
+ has_custom_serialization = bool(read_binary_uint8(self.fin))
+
if n_rows:
- column = read_column(self.context, column_type, n_rows,
- self.fin, use_numpy=use_numpy)
+ column = read_column(
+ self.context, column_type, n_rows,
+ self.fin, use_numpy=use_numpy,
+ has_custom_serialization=has_custom_serialization
+ )
data.append(column)
if self.context.client_settings['use_numpy']:
diff --git a/clickhouse_driver/util/compat.py b/clickhouse_driver/util/compat.py
index bdcf3d7..08c2091 100644
--- a/clickhouse_driver/util/compat.py
+++ b/clickhouse_driver/util/compat.py
@@ -5,6 +5,18 @@ try:
except ImportError:
import dummy_threading as threading # noqa: F401
+import json # noqa: F401
+try:
+ import orjson as json # noqa: F811
+except ImportError:
+ pass
+
+try:
+ import ujson as json # noqa: F811,F401
+except ImportError:
+ pass
+
+
try:
# since tzlocal 4.0+
# this will avoid warning for get_localzone().key
diff --git a/clickhouse_driver/util/escape.py b/clickhouse_driver/util/escape.py
index 630e02b..16a1b93 100644
--- a/clickhouse_driver/util/escape.py
+++ b/clickhouse_driver/util/escape.py
@@ -1,5 +1,6 @@
-from datetime import date, datetime
+from datetime import date, datetime, time
from enum import Enum
+from functools import wraps
from uuid import UUID
from pytz import timezone
@@ -28,7 +29,24 @@ def escape_datetime(item, context):
return "'%s'" % item.strftime('%Y-%m-%d %H:%M:%S')
-def escape_param(item, context):
+def maybe_enquote_for_server(f):
+ @wraps(f)
+ def wrapper(*args, **kwargs):
+ rv = f(*args, **kwargs)
+
+ if kwargs.get('for_server'):
+ is_str = isinstance(rv, str)
+
+ if not is_str or (is_str and not rv.startswith("'")):
+ rv = "'%s'" % rv
+
+ return rv
+
+ return wrapper
+
+
+@maybe_enquote_for_server
+def escape_param(item, context, for_server=False):
if item is None:
return 'NULL'
@@ -38,17 +56,27 @@ def escape_param(item, context):
elif isinstance(item, date):
return "'%s'" % item.strftime('%Y-%m-%d')
+ elif isinstance(item, time):
+ return "'%s'" % item.strftime('%H:%M:%S')
+
elif isinstance(item, str):
+ # We need double escaping for server-side parameters.
+ if for_server:
+ item = ''.join(escape_chars_map.get(c, c) for c in item)
return "'%s'" % ''.join(escape_chars_map.get(c, c) for c in item)
elif isinstance(item, list):
- return "[%s]" % ', '.join(str(escape_param(x, context)) for x in item)
+ return "[%s]" % ', '.join(
+ str(escape_param(x, context, for_server=for_server)) for x in item
+ )
elif isinstance(item, tuple):
- return "(%s)" % ', '.join(str(escape_param(x, context)) for x in item)
+ return "(%s)" % ', '.join(
+ str(escape_param(x, context, for_server=for_server)) for x in item
+ )
elif isinstance(item, Enum):
- return escape_param(item.value, context)
+ return escape_param(item.value, context, for_server=for_server)
elif isinstance(item, UUID):
return "'%s'" % str(item)
@@ -57,10 +85,10 @@ def escape_param(item, context):
return item
-def escape_params(params, context):
+def escape_params(params, context, for_server=False):
escaped = {}
for key, value in params.items():
- escaped[key] = escape_param(value, context)
+ escaped[key] = escape_param(value, context, for_server=for_server)
return escaped
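escape_param() / escape_params() gain a for_server mode, used by send_query() once the negotiated revision reaches DBMS_MIN_PROTOCOL_VERSION_WITH_PARAMETERS: the params dict is quoted, escaped and shipped to the server as CUSTOM settings, which is what enables server-side query templating. A hedged sketch of the client-side view, assuming ClickHouse's {name:Type} placeholder syntax:

    from clickhouse_driver import Client

    client = Client('localhost')

    # The placeholders are resolved by the server; the driver only escapes the
    # values (with for_server=True) and sends them alongside the query.
    rows = client.execute(
        'SELECT {v:Int32} + 1, {s:String}',
        {'v': 41, 's': "it's"},
    )
    print(rows)  # e.g. [(42, "it's")]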
diff --git a/debian/changelog b/debian/changelog
index 0c60e03..b765868 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+python-clickhouse-driver (0.2.6-1) UNRELEASED; urgency=low
+
+ * New upstream release.
+
+ -- Debian Janitor <janitor@jelmer.uk> Wed, 24 May 2023 06:04:51 -0000
+
python-clickhouse-driver (0.2.5-1) unstable; urgency=medium
* New upstream release
diff --git a/debian/patches/insert-doc-build-path.patch b/debian/patches/insert-doc-build-path.patch
index 523a6f2..b8e68be 100644
--- a/debian/patches/insert-doc-build-path.patch
+++ b/debian/patches/insert-doc-build-path.patch
@@ -2,8 +2,10 @@ Author: Federico Ceratto <federico@debian.org>
Description: add Python import paths to conf.py
Forwarded: not-needed
---- a/docs/conf.py
-+++ b/docs/conf.py
+Index: python-clickhouse-driver.git/docs/conf.py
+===================================================================
+--- python-clickhouse-driver.git.orig/docs/conf.py
++++ python-clickhouse-driver.git/docs/conf.py
@@ -16,6 +16,11 @@
# import sys
# sys.path.insert(0, os.path.abspath('.'))
diff --git a/debian/patches/remove-sphinx-github-iframe.patch b/debian/patches/remove-sphinx-github-iframe.patch
index 0f00aa5..cf52d28 100644
--- a/debian/patches/remove-sphinx-github-iframe.patch
+++ b/debian/patches/remove-sphinx-github-iframe.patch
@@ -2,9 +2,11 @@ Author: Federico Ceratto <federico@debian.org>
Description: remove Sign In button from main HTML page
Forwarded: not-needed
---- a/docs/conf.py
-+++ b/docs/conf.py
-@@ -103,8 +103,6 @@
+Index: python-clickhouse-driver.git/docs/conf.py
+===================================================================
+--- python-clickhouse-driver.git.orig/docs/conf.py
++++ python-clickhouse-driver.git/docs/conf.py
+@@ -103,8 +103,6 @@ html_logo = ""
html_static_path = ['_static']
html_theme_options = {
"description": "Python driver for ClickHouse",
diff --git a/docs/Makefile b/docs/Makefile
index 298ea9e..22b28bd 100644
--- a/docs/Makefile
+++ b/docs/Makefile
@@ -2,7 +2,7 @@
#
# You can set these variables from the command line.
-SPHINXOPTS =
+SPHINXOPTS = -W
SPHINXBUILD = sphinx-build
SOURCEDIR = .
BUILDDIR = _build
diff --git a/docs/conf.py b/docs/conf.py
index d4cc8b3..6360b93 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -66,7 +66,7 @@ master_doc = 'index'
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
-language = None
+language = 'en'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
diff --git a/docs/development.rst b/docs/development.rst
index 6aff38b..eb00605 100644
--- a/docs/development.rst
+++ b/docs/development.rst
@@ -45,13 +45,13 @@ Create container desired version of ``clickhouse-server``:
.. code-block:: bash
- docker run --rm -e "TZ=Europe/Moscow" -p 127.0.0.1:9000:9000 --name test-clickhouse-server yandex/clickhouse-server:$VERSION
+ docker run --rm -e "TZ=Europe/Moscow" -p 127.0.0.1:9000:9000 --name test-clickhouse-server clickhouse/clickhouse-server:$VERSION
Create container with the same version of ``clickhouse-client``:
.. code-block:: bash
- docker run --rm --entrypoint "/bin/sh" --name test-clickhouse-client --link test-clickhouse-server:clickhouse-server yandex/clickhouse-client:$VERSION -c 'while :; do sleep 1; done'
+ docker run --rm --entrypoint "/bin/sh" --name test-clickhouse-client --link test-clickhouse-server:clickhouse-server clickhouse/clickhouse-client:$VERSION -c 'while :; do sleep 1; done'
Create ``clickhouse-client`` script on your host machine:
diff --git a/docs/features.rst b/docs/features.rst
index e1abe23..0154c67 100644
--- a/docs/features.rst
+++ b/docs/features.rst
@@ -279,6 +279,8 @@ of hosts if previous host is unavailable:
All queries within established connection will be sent to the same host.
+*New in version 0.2.5.*
+
You can specify `round_robin` parameter alongside with `alt_hosts`. The host
for query execution will be picked with round-robin algorithm.
@@ -544,3 +546,43 @@ Each Client instance can be used as a context manager:
Upon exit, any established connection to the ClickHouse server will be closed
automatically.
+
+
+TCP keepalive
+-------------
+
+*New in version 0.2.6.*
+
+You can enable `TCP keepalive
+<https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html>`_ on connection with
+ClickHouse server. This setting is disabled by default. When parameter
+``tcp_keepalive`` is set to ``True`` system TCP keepalive settings are used.
+
+ .. code-block:: python
+
+ >>> client = Client('localhost', tcp_keepalive=True)
+
+
+For Linux default TCP keepalive settings can be found in:
+
+ .. code-block:: bash
+
+ idle_time_sec - /proc/sys/net/ipv4/tcp_keepalive_time
+ interval_sec - /proc/sys/net/ipv4/tcp_keepalive_intvl
+ probes - /proc/sys/net/ipv4/tcp_keepalive_probes
+
+You can also specify custom keepalive settings with tuple
+``(idle_time_sec, interval_sec, probes)``:
+
+ .. code-block:: python
+
+ >>> client = Client('localhost', tcp_keepalive=(60.5, 5.1, 2))
+
+ .. note::
+
+ For Linux and Windows all parameters: idle time, interval and probes
+ can be changed for socket.
+
+ For Mac OS only the second parameter ``interval_sec`` can be changed
+ for socket. ``idle_time_sec``, ``probes`` are not used, but should be
+ specified for uniformity.
diff --git a/docs/installation.rst b/docs/installation.rst
index f54ff52..4b6b659 100644
--- a/docs/installation.rst
+++ b/docs/installation.rst
@@ -6,7 +6,7 @@ Installation
Python Version
--------------
-Clickhouse-driver supports Python 3.4 and newer and PyPy.
+Clickhouse-driver supports Python 3.7 and newer and PyPy.
Build Dependencies
------------------
@@ -21,9 +21,9 @@ Example for `python:alpine` docker image:
By default there are wheels for Linux, Mac OS X and Windows.
-Packages for Linux and Mac OS X are available for python: 3.6 -- 3.11.
+Packages for Linux and Mac OS X are available for python: 3.7 -- 3.11.
-Packages for Windows are available for python: 3.6 -- 3.11.
+Packages for Windows are available for python: 3.7 -- 3.11.
Starting from version *0.2.3* there are wheels for musl-based Linux distributions.
diff --git a/docs/misc.rst b/docs/misc.rst
index 8edccb9..8545295 100644
--- a/docs/misc.rst
+++ b/docs/misc.rst
@@ -139,3 +139,19 @@ enable it if you want cast ``None`` value into default value for current type:
>>> settings = {'input_format_null_as_default': True}
>>> client = Client('localhost', settings=settings)
+
+
+Client revision downgrading
+---------------------------
+
+*New in version 0.2.6.*
+
+For various purposes client can be downgraded with ``client_revision``
+parameter.
+
+ .. code-block:: python
+
+ >>> from clickhouse_driver import Client, defines
+ >>>
+ >>> client = Client('localhost', client_revision=defines.DBMS_MIN_PROTOCOL_VERSION_WITH_INITIAL_QUERY_START_TIME)
+ >>> client.execute('SELECT version()')
diff --git a/docs/performance.rst b/docs/performance.rst
index 93ce694..be240e4 100644
--- a/docs/performance.rst
+++ b/docs/performance.rst
@@ -164,7 +164,7 @@ Required packages
.. code-block:: bash
- pip install clickhouse-driver requests
+ pip install clickhouse-driver requests clickhouse-connect
For fast json parsing we'll use ``ujson`` package:
@@ -175,22 +175,33 @@ For fast json parsing we'll use ``ujson`` package:
Installed packages: ::
$ pip freeze
- certifi==2020.4.5.1
- chardet==3.0.4
- clickhouse-driver==0.1.3
- idna==2.9
- pytz==2019.3
- requests==2.23.0
- tzlocal==2.0.0
- ujson==2.0.3
- urllib3==1.25.9
+ backports.zoneinfo==0.2.1
+ certifi==2022.12.7
+ charset-normalizer==3.0.1
+ clickhouse-connect==0.5.0
+ clickhouse-driver==0.2.5
+ idna==3.4
+ lz4==4.3.2
+ pytz==2022.7.1
+ pytz-deprecation-shim==0.1.0.post0
+ requests==2.28.2
+ tzdata==2022.7
+ tzlocal==4.2
+ ujson==5.7.0
+ urllib3==1.26.14
+ zstandard==0.19.0
+
+For ``clickhouse-connect`` we need to turn off compression with
+``compress=False`` for elimination decompression overhead. This package also
+adds ``LIMIT`` clause to the query by default.
+Let's disable it off with ``query_limit=None``.
Versions
--------
-Machine: Linux ThinkPad-T460 4.4.0-177-generic #207-Ubuntu SMP Mon Mar 16 01:16:10 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
+Machine: Linux klebedev-ThinkPad-T460 5.15.0-57-generic #63-Ubuntu SMP Thu Nov 24 13:43:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
-Python: CPython 3.6.5 (default, May 30 2019, 14:48:31) [GCC 5.4.0 20160609]
+Python: Python 3.8.12 (default, Apr 13 2022, 21:16:23) [GCC 11.2.0]
Benchmarking
@@ -247,13 +258,8 @@ Let's take get plain text response from ClickHouse server as baseline.
Fetching not parsed data with pure requests (1)
-.. code-block:: python
-
- import sys
- import requests
-
- query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT {}".format(sys.argv[1], sys.argv[2])
- data = requests.get('http://localhost:8123/', params={'query': query})
+.. literalinclude:: ../perf/script_01.py
+ :language: python
Parsed rows
@@ -261,77 +267,28 @@ Parsed rows
Line split into elements will be consider as "parsed" for TSV format (2)
-.. code-block:: python
-
- import sys
- import requests
-
- query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(sys.argv[1])
- resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
-
- data = [line.decode('utf-8').split('\t') for line in resp.iter_lines(chunk_size=10000)]
-
+.. literalinclude:: ../perf/script_02.py
+ :language: python
Now we cast each element to its data type (2.5)
-.. code-block:: python
-
- from datetime import date
- import sys
- import requests
-
-
- def get_python_type(ch_type):
- if ch_type.startswith('Int') or ch_type.startswith('UInt'):
- return int
-
- elif ch_type == 'String' or ch_type.startswith('FixedString'):
- return None
-
- elif ch_type == 'Date':
- return lambda value: date(*[int(x) for x in value.split('-')])
-
- raise ValueError(f'Unsupported type: "{ch_type}"')
-
-
- resp = requests.get('http://localhost:8123', params={'query': 'describe table perftest.ontime FORMAT TSV'})
- ch_types = [x.split('\t')[1] for x in resp.text.split('\n') if x]
- python_types = [get_python_type(x) for x in ch_types]
-
- query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(sys.argv[1])
- resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
-
- data = []
-
- for line in resp.iter_lines(chunk_size=10000):
- data.append([cls(x) if cls else x for x, cls in zip(line.decode('utf-8').split('\t'), python_types)])
-
+.. literalinclude:: ../perf/script_02_5.py
+ :language: python
JSONEachRow format can be loaded with json loads (3)
-.. code-block:: python
-
- import sys
- import requests
- from ujson import loads
-
- query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT JSONEachRow".format(sys.argv[1])
- resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
-
- data = [list(loads(line).values()) for line in resp.iter_lines(chunk_size=10000)]
-
+.. literalinclude:: ../perf/script_03.py
+ :language: python
Get fully parsed rows with ``clickhouse-driver`` in Native format (4)
-.. code-block:: python
+.. literalinclude:: ../perf/script_04.py
+ :language: python
- import sys
- from clickhouse_driver import Client
+Get fully parsed rows with ``clickhouse-connect`` (14)
- query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}'".format(sys.argv[1])
- client = Client.from_url('clickhouse://localhost')
-
- data = client.execute(query)
+.. literalinclude:: ../perf/script_14.py
+ :language: python
Iteration over rows
@@ -339,78 +296,28 @@ Iteration over rows
Iteration over TSV (5)
-.. code-block:: python
-
- import sys
- import requests
-
- query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(sys.argv[1])
- resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
-
- for line in resp.iter_lines(chunk_size=10000):
- line = line.decode('utf-8').split('\t')
-
+.. literalinclude:: ../perf/script_05.py
+ :language: python
Now we cast each element to its data type (5.5)
-.. code-block:: python
-
- from datetime import date
- import sys
- import requests
-
-
- def get_python_type(ch_type):
- if ch_type.startswith('Int') or ch_type.startswith('UInt'):
- return int
-
- elif ch_type == 'String' or ch_type.startswith('FixedString'):
- return None
-
- elif ch_type == 'Date':
- return lambda value: date(*[int(x) for x in value.split('-')])
-
- raise ValueError(f'Unsupported type: "{ch_type}"')
-
-
- resp = requests.get('http://localhost:8123', params={'query': 'describe table perftest.ontime FORMAT TSV'})
- ch_types = [x.split('\t')[1] for x in resp.text.split('\n') if x]
- python_types = [get_python_type(x) for x in ch_types]
-
- query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(sys.argv[1])
- resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
-
- for line in resp.iter_lines(chunk_size=10000):
- line = [cls(x) if cls else x for x, cls in zip(line.decode('utf-8').split('\t'), python_types)]
-
+.. literalinclude:: ../perf/script_05_5.py
+ :language: python
Iteration over JSONEachRow (6)
-.. code-block:: python
-
- import sys
- import requests
- from ujson import loads
-
- query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT JSONEachRow".format(sys.argv[1])
- resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
-
- for line in resp.iter_lines(chunk_size=10000):
- line = list(loads(line).values())
-
+.. literalinclude:: ../perf/script_06.py
+ :language: python
Iteration over rows with ``clickhouse-driver`` in Native format (7)
-.. code-block:: python
-
- import sys
- from clickhouse_driver import Client
+.. literalinclude:: ../perf/script_07.py
+ :language: python
- query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}'".format(sys.argv[1])
- client = Client.from_url('clickhouse://localhost')
+Iteration over rows with ``clickhouse-connect`` (17)
- for row in client.execute_iter(query):
- pass
+.. literalinclude:: ../perf/script_17.py
+ :language: python
Iteration over string rows
@@ -420,84 +327,23 @@ OK, but what if we need only string columns?
Iteration over TSV (8)
-.. code-block:: python
-
- import sys
- import requests
-
- cols = [
- 'UniqueCarrier', 'Carrier', 'TailNum', 'FlightNum', 'Origin', 'OriginCityName', 'OriginState',
- 'OriginStateFips', 'OriginStateName', 'Dest', 'DestCityName', 'DestState', 'DestStateFips',
- 'DestStateName', 'DepartureDelayGroups', 'DepTimeBlk', 'ArrTimeBlk', 'CancellationCode',
- 'FirstDepTime', 'TotalAddGTime', 'LongestAddGTime', 'DivAirportLandings', 'DivReachedDest',
- 'DivActualElapsedTime', 'DivArrDelay', 'DivDistance', 'Div1Airport', 'Div1WheelsOn', 'Div1TotalGTime',
- 'Div1LongestGTime', 'Div1WheelsOff', 'Div1TailNum', 'Div2Airport', 'Div2WheelsOn', 'Div2TotalGTime',
- 'Div2LongestGTime', 'Div2WheelsOff', 'Div2TailNum', 'Div3Airport', 'Div3WheelsOn', 'Div3TotalGTime',
- 'Div3LongestGTime', 'Div3WheelsOff', 'Div3TailNum', 'Div4Airport', 'Div4WheelsOn', 'Div4TotalGTime',
- 'Div4LongestGTime', 'Div4WheelsOff', 'Div4TailNum', 'Div5Airport', 'Div5WheelsOn', 'Div5TotalGTime',
- 'Div5LongestGTime', 'Div5WheelsOff', 'Div5TailNum'
- ]
-
- query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(', '.join(cols), sys.argv[1])
- resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
-
- for line in resp.iter_lines(chunk_size=10000):
- line = line.decode('utf-8').split('\t')
-
+.. literalinclude:: ../perf/script_08.py
+ :language: python
Iteration over JSONEachRow (9)
-.. code-block:: python
-
- import sys
- import requests
- from ujson import loads
-
- cols = [
- 'UniqueCarrier', 'Carrier', 'TailNum', 'FlightNum', 'Origin', 'OriginCityName', 'OriginState',
- 'OriginStateFips', 'OriginStateName', 'Dest', 'DestCityName', 'DestState', 'DestStateFips',
- 'DestStateName', 'DepartureDelayGroups', 'DepTimeBlk', 'ArrTimeBlk', 'CancellationCode',
- 'FirstDepTime', 'TotalAddGTime', 'LongestAddGTime', 'DivAirportLandings', 'DivReachedDest',
- 'DivActualElapsedTime', 'DivArrDelay', 'DivDistance', 'Div1Airport', 'Div1WheelsOn', 'Div1TotalGTime',
- 'Div1LongestGTime', 'Div1WheelsOff', 'Div1TailNum', 'Div2Airport', 'Div2WheelsOn', 'Div2TotalGTime',
- 'Div2LongestGTime', 'Div2WheelsOff', 'Div2TailNum', 'Div3Airport', 'Div3WheelsOn', 'Div3TotalGTime',
- 'Div3LongestGTime', 'Div3WheelsOff', 'Div3TailNum', 'Div4Airport', 'Div4WheelsOn', 'Div4TotalGTime',
- 'Div4LongestGTime', 'Div4WheelsOff', 'Div4TailNum', 'Div5Airport', 'Div5WheelsOn', 'Div5TotalGTime',
- 'Div5LongestGTime', 'Div5WheelsOff', 'Div5TailNum'
- ]
-
- query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}' FORMAT JSONEachRow".format(', '.join(cols), sys.argv[1])
- resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
-
- for line in resp.iter_lines(chunk_size=10000):
- line = list(loads(line).values())
-
+.. literalinclude:: ../perf/script_09.py
+ :language: python
Iteration over string rows with ``clickhouse-driver`` in Native format (10)
-.. code-block:: python
+.. literalinclude:: ../perf/script_10.py
+ :language: python
- import sys
- from clickhouse_driver import Client
+Iteration over string rows with ``clickhouse-connect`` (15)
- cols = [
- 'UniqueCarrier', 'Carrier', 'TailNum', 'FlightNum', 'Origin', 'OriginCityName', 'OriginState',
- 'OriginStateFips', 'OriginStateName', 'Dest', 'DestCityName', 'DestState', 'DestStateFips',
- 'DestStateName', 'DepartureDelayGroups', 'DepTimeBlk', 'ArrTimeBlk', 'CancellationCode',
- 'FirstDepTime', 'TotalAddGTime', 'LongestAddGTime', 'DivAirportLandings', 'DivReachedDest',
- 'DivActualElapsedTime', 'DivArrDelay', 'DivDistance', 'Div1Airport', 'Div1WheelsOn', 'Div1TotalGTime',
- 'Div1LongestGTime', 'Div1WheelsOff', 'Div1TailNum', 'Div2Airport', 'Div2WheelsOn', 'Div2TotalGTime',
- 'Div2LongestGTime', 'Div2WheelsOff', 'Div2TailNum', 'Div3Airport', 'Div3WheelsOn', 'Div3TotalGTime',
- 'Div3LongestGTime', 'Div3WheelsOff', 'Div3TailNum', 'Div4Airport', 'Div4WheelsOn', 'Div4TotalGTime',
- 'Div4LongestGTime', 'Div4WheelsOff', 'Div4TailNum', 'Div5Airport', 'Div5WheelsOn', 'Div5TotalGTime',
- 'Div5LongestGTime', 'Div5WheelsOff', 'Div5TailNum'
- ]
-
- query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}'".format(', '.join(cols), sys.argv[1])
- client = Client.from_url('clickhouse://localhost')
-
- for row in client.execute_iter(query):
- pass
+.. literalinclude:: ../perf/script_15.py
+ :language: python
Iteration over int rows
@@ -505,78 +351,23 @@ Iteration over int rows
Iteration over TSV (11)
-.. code-block:: python
-
- import sys
- import requests
-
- cols = [
- 'Year', 'Quarter', 'Month', 'DayofMonth', 'DayOfWeek', 'AirlineID', 'OriginAirportID', 'OriginAirportSeqID',
- 'OriginCityMarketID', 'OriginWac', 'DestAirportID', 'DestAirportSeqID', 'DestCityMarketID', 'DestWac',
- 'CRSDepTime', 'DepTime', 'DepDelay', 'DepDelayMinutes', 'DepDel15', 'TaxiOut', 'WheelsOff', 'WheelsOn',
- 'TaxiIn', 'CRSArrTime', 'ArrTime', 'ArrDelay', 'ArrDelayMinutes', 'ArrDel15', 'ArrivalDelayGroups',
- 'Cancelled', 'Diverted', 'CRSElapsedTime', 'ActualElapsedTime', 'AirTime', 'Flights', 'Distance',
- 'DistanceGroup', 'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay',
- 'Div1AirportID', 'Div1AirportSeqID', 'Div2AirportID', 'Div2AirportSeqID', 'Div3AirportID',
- 'Div3AirportSeqID', 'Div4AirportID', 'Div4AirportSeqID', 'Div5AirportID', 'Div5AirportSeqID'
- ]
-
- query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(', '.join(cols), sys.argv[1])
- resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
-
- for line in resp.iter_lines(chunk_size=10000):
- line = [int(x) for x in line.split(b'\t')]
-
+.. literalinclude:: ../perf/script_11.py
+ :language: python
Iteration over JSONEachRow (12)
-.. code-block:: python
-
- import sys
- import requests
- from ujson import loads
-
- cols = [
- 'Year', 'Quarter', 'Month', 'DayofMonth', 'DayOfWeek', 'AirlineID', 'OriginAirportID', 'OriginAirportSeqID',
- 'OriginCityMarketID', 'OriginWac', 'DestAirportID', 'DestAirportSeqID', 'DestCityMarketID', 'DestWac',
- 'CRSDepTime', 'DepTime', 'DepDelay', 'DepDelayMinutes', 'DepDel15', 'TaxiOut', 'WheelsOff', 'WheelsOn',
- 'TaxiIn', 'CRSArrTime', 'ArrTime', 'ArrDelay', 'ArrDelayMinutes', 'ArrDel15', 'ArrivalDelayGroups',
- 'Cancelled', 'Diverted', 'CRSElapsedTime', 'ActualElapsedTime', 'AirTime', 'Flights', 'Distance',
- 'DistanceGroup', 'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay',
- 'Div1AirportID', 'Div1AirportSeqID', 'Div2AirportID', 'Div2AirportSeqID', 'Div3AirportID',
- 'Div3AirportSeqID', 'Div4AirportID', 'Div4AirportSeqID', 'Div5AirportID', 'Div5AirportSeqID'
- ]
-
- query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}' FORMAT JSONEachRow".format(', '.join(cols), sys.argv[1])
- resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
-
- for line in resp.iter_lines(chunk_size=10000):
- line = list(loads(line).values())
-
+.. literalinclude:: ../perf/script_12.py
+ :language: python
Iteration over int rows with ``clickhouse-driver`` in Native format (13)
-.. code-block:: python
-
- import sys
- from clickhouse_driver import Client
+.. literalinclude:: ../perf/script_13.py
+ :language: python
- cols = [
- 'Year', 'Quarter', 'Month', 'DayofMonth', 'DayOfWeek', 'AirlineID', 'OriginAirportID', 'OriginAirportSeqID',
- 'OriginCityMarketID', 'OriginWac', 'DestAirportID', 'DestAirportSeqID', 'DestCityMarketID', 'DestWac',
- 'CRSDepTime', 'DepTime', 'DepDelay', 'DepDelayMinutes', 'DepDel15', 'TaxiOut', 'WheelsOff', 'WheelsOn',
- 'TaxiIn', 'CRSArrTime', 'ArrTime', 'ArrDelay', 'ArrDelayMinutes', 'ArrDel15', 'ArrivalDelayGroups',
- 'Cancelled', 'Diverted', 'CRSElapsedTime', 'ActualElapsedTime', 'AirTime', 'Flights', 'Distance',
- 'DistanceGroup', 'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay',
- 'Div1AirportID', 'Div1AirportSeqID', 'Div2AirportID', 'Div2AirportSeqID', 'Div3AirportID',
- 'Div3AirportSeqID', 'Div4AirportID', 'Div4AirportSeqID', 'Div5AirportID', 'Div5AirportSeqID'
- ]
+Iteration over int rows with ``clickhouse-connect`` (16)
- query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}'".format(', '.join(cols), sys.argv[1])
- client = Client.from_url('clickhouse://localhost')
-
- for row in client.execute_iter(query):
- pass
+.. literalinclude:: ../perf/script_16.py
+ :language: python
Results
@@ -595,87 +386,103 @@ JSON in table is shorthand for JSONEachRow.
+==================================+===========+===========+===========+===========+===========+
|**Plain text without parsing: timing** |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|Naive requests.get TSV (1) | 0.40 s | 0.67 s | 0.95 s | 1.67 s | 2.52 s |
+|Naive requests.get TSV (1) | 0.35 s | 0.56 s | 0.83 s | 1.15 s | 1.72 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|Naive requests.get JSON (1) | 0.61 s | 1.23 s | 2.09 s | 3.52 s | 5.20 s |
+|Naive requests.get JSON (1) | 0.99 s | 1.80 s | 2.77 s | 5.15 s | 7.80 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
|**Plain text without parsing: memory** |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|Naive requests.get TSV (1) | 49 MB | 107 MB | 165 MB | 322 MB | 488 MB |
+|Naive requests.get TSV (1) | 52 MB | 110 MB | 167 MB | 323 MB | 489 MB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|Naive requests.get JSON (1) | 206 MB | 564 MB | 916 MB | 1.83 GB | 2.83 GB |
+|Naive requests.get JSON (1) | 263 MB | 726 MB | 1.88 GB | 2.42 GB | 3.75 GB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
|**Parsed rows: timing** |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get TSV (2) | 0.81 s | 1.81 s | 3.09 s | 7.22 s | 11.87 s |
+|requests.get TSV (2) | 0.83 s | 1.97 s | 3.32 s | 7.90 s | 13.13 s |
++----------------------------------+-----------+-----------+-----------+-----------+-----------+
+|requests.get TSV with cast (2.5) | 1.59 s | 4.31 s | 6.99 s | 15.60 s | 25.89 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get TSV with cast (2.5) | 1.78 s | 4.58 s | 7.42 s | 16.12 s | 25.52 s |
+|requests.get JSON (3) | 2.78 s | 5.55 s | 9.23 s | 21.45 s | 31.50 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get JSON (3) | 2.14 s | 5.65 s | 9.20 s | 20.43 s | 31.72 s |
+|clickhouse-driver Native (4) | 0.87 s | 1.50 s | 2.21 s | 4.20 s | 6.32 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|clickhouse-driver Native (4) | 0.73 s | 1.40 s | 2.08 s | 4.03 s | 6.20 s |
+|clickhouse-connect (14) | 0.89 s | 1.72 s | 2.46 s | 4.85 s | 7.19 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
|**Parsed rows: memory** |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get TSV (2) | 171 MB | 462 MB | 753 MB | 1.51 GB | 2.33 GB |
+|requests.get TSV (2) | 182 MB | 487 MB | 794 MB | 1.63 GB | 2.51 GB |
++----------------------------------+-----------+-----------+-----------+-----------+-----------+
+|requests.get TSV with cast (2.5) | 138 MB | 359 MB | 579 MB | 1.18 GB | 1.82 GB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get TSV with cast (2.5) | 135 MB | 356 MB | 576 MB | 1.15 GB | 1.78 GB |
+|requests.get JSON (3) | 136 MB | 351 MB | 565 MB | 1.15 GB | 1.77 GB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get JSON (3) | 139 MB | 366 MB | 591 MB | 1.18 GB | 1.82 GB |
+|clickhouse-driver Native (4) | 155 MB | 343 MB | 530 MB | 1.04 GB | 1.58 GB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|clickhouse-driver Native (4) | 135 MB | 337 MB | 535 MB | 1.05 GB | 1.62 GB |
+|clickhouse-connect (14) | 139 MB | 333 MB | 524 MB | 1.05 GB | 1.61 GB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
|**Iteration over rows: timing** |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get TSV (5) | 0.49 s | 0.99 s | 1.34 s | 2.58 s | 4.00 s |
+|requests.get TSV (5) | 0.48 s | 0.91 s | 1.28 s | 2.57 s | 3.72 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get TSV with cast (5.5) | 1.38 s | 3.38 s | 5.40 s | 10.89 s | 16.59 s |
+|requests.get TSV with cast (5.5) | 1.25 s | 3.05 s | 4.77 s | 9.67 s | 15.04 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get JSON (6) | 1.89 s | 4.73 s | 7.63 s | 15.63 s | 24.60 s |
+|requests.get JSON (6) | 1.80 s | 4.48 s | 7.10 s | 14.45 s | 22.17 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|clickhouse-driver Native (7) | 0.62 s | 1.28 s | 1.93 s | 3.68 s | 5.54 s |
+|clickhouse-driver Native (7) | 0.72 s | 1.38 s | 2.01 s | 3.65 s | 5.45 s |
++----------------------------------+-----------+-----------+-----------+-----------+-----------+
+|clickhouse-connect (17) | 0.85 s | 1.62 s | 2.12 s | 4.12 s | 6.05 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
|**Iteration over rows: memory** |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get TSV (5) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
+|requests.get TSV (5) | 22 MB | 22 MB | 22 MB | 22 MB | 22 MB |
++----------------------------------+-----------+-----------+-----------+-----------+-----------+
+|requests.get TSV with cast (5.5) | 22 MB | 22 MB | 22 MB | 22 MB | 22 MB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get TSV with cast (5.5) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
+|requests.get JSON (6) | 24 MB | 24 MB | 24 MB | 24 MB | 24 MB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get JSON (6) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
+|clickhouse-driver Native (7) | 91 MB | 93 MB | 93 MB | 94 MB | 94 MB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|clickhouse-driver Native (7) | 56 MB | 70 MB | 71 MB | 71 MB | 71 MB |
+|clickhouse-connect (17) | 68 MB | 68 MB | 68 MB | 68 MB | 68 MB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
|**Iteration over string rows: timing** |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get TSV (8) | 0.40 s | 0.67 s | 0.80 s | 1.55 s | 2.18 s |
+|requests.get TSV (8) | 0.44 s | 0.57 s | 0.77 s | 1.40 s | 1.94 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get JSON (9) | 1.14 s | 2.64 s | 4.22 s | 8.48 s | 12.96 s |
+|requests.get JSON (9) | 1.03 s | 2.46 s | 3.87 s | 7.76 s | 11.96 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|clickhouse-driver Native (10) | 0.46 s | 0.91 s | 1.35 s | 2.49 s | 3.67 s |
+|clickhouse-driver Native (10) | 0.63 s | 1.06 s | 1.44 s | 2.45 s | 3.57 s |
++----------------------------------+-----------+-----------+-----------+-----------+-----------+
+|clickhouse-connect (15) | 0.62 s | 1.13 s | 1.53 s | 2.84 s | 4.00 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
|**Iteration over string rows: memory** |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get TSV (8) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
+|requests.get TSV (8) | 22 MB | 22 MB | 22 MB | 22 MB | 22 MB |
++----------------------------------+-----------+-----------+-----------+-----------+-----------+
+|requests.get JSON (9) | 24 MB | 24 MB | 24 MB | 24 MB | 24 MB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get JSON (9) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
+|clickhouse-driver Native (10) | 77 MB | 79 MB | 79 MB | 79 MB | 79 MB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|clickhouse-driver Native (10) | 46 MB | 56 MB | 57 MB | 57 MB | 57 MB |
+|clickhouse-connect (15) | 60 MB | 60 MB | 60 MB | 60 MB | 60 MB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
|**Iteration over int rows: timing** |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get TSV (11) | 0.84 s | 2.06 s | 3.22 s | 6.27 s | 10.06 s |
+|requests.get TSV (11) | 0.81 s | 1.66 s | 2.61 s | 5.08 s | 7.91 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get JSON (12) | 0.95 s | 2.15 s | 3.55 s | 6.93 s | 10.82 s |
+|requests.get JSON (12) | 0.97 s | 2.02 s | 3.29 s | 6.50 s | 10.00 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|clickhouse-driver Native (13) | 0.43 s | 0.61 s | 0.86 s | 1.53 s | 2.27 s |
+|clickhouse-driver Native (13) | 0.55 s | 0.78 s | 1.02 s | 1.73 s | 2.44 s |
++----------------------------------+-----------+-----------+-----------+-----------+-----------+
+|clickhouse-connect (16) | 0.54 s | 0.79 s | 1.01 s | 1.68 s | 2.20 s |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
|**Iteration over int rows: memory** |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get TSV (11) | 19 MB | 19 MB | 19 MB | 19 MB | 19 MB |
+|requests.get TSV (11) | 22 MB | 22 MB | 22 MB | 22 MB | 22 MB |
++----------------------------------+-----------+-----------+-----------+-----------+-----------+
+|requests.get JSON (12) | 24 MB | 24 MB | 24 MB | 24 MB | 24 MB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|requests.get JSON (12) | 20 MB | 20 MB | 20 MB | 20 MB | 20 MB |
+|clickhouse-driver Native (13) | 71 MB | 72 MB | 72 MB | 73 MB | 73 MB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
-|clickhouse-driver Native (13) | 41 MB | 48 MB | 48 MB | 48 MB | 49 MB |
+|clickhouse-connect (16) | 41 MB | 41 MB | 41 MB | 41 MB | 41 MB |
+----------------------------------+-----------+-----------+-----------+-----------+-----------+
diff --git a/docs/quickstart.rst b/docs/quickstart.rst
index 4f8b289..a0dcb1a 100644
--- a/docs/quickstart.rst
+++ b/docs/quickstart.rst
@@ -69,6 +69,16 @@ with ``%`` symbol and ``%(myvar)s`` parameters.
NOTE: formatting queries using Python's f-strings or concatenation can lead to SQL injections.
Use ``%(myvar)s`` parameters instead.
+Server-side parameters are supported since version 0.2.6.
+You can mix client-side and server-side formatting in one query:
+
+ .. code-block:: python
+
+ >>> client.execute(
+ ... "SELECT 'test' like '%%es%%', %(myvar)s, {myvar:Int32}",
+ ... {'myvar': 1}
+ ... )
+
Customisation of ``SELECT`` output with the ``FORMAT`` clause is not supported.
.. _execute-with-progress:
diff --git a/docs/types.rst b/docs/types.rst
index 84bf268..28850da 100644
--- a/docs/types.rst
+++ b/docs/types.rst
@@ -56,9 +56,9 @@ Due to Python's current limitations minimal DateTime64 resolution is one microse
String/FixedString(N)
---------------------
-INSERT types: :class:`str`/:func:`basestring <basestring>`, :class:`bytes`. See note below.
+INSERT types: :class:`str`, :class:`bytes`. See note below.
-SELECT type: :class:`str`/:func:`basestring <basestring>`, :class:`bytes`. See note below.
+SELECT type: :class:`str`, :class:`bytes`. See note below.
String column is encoded/decoded with encoding specified by ``strings_encoding`` setting. Default encoding is UTF-8.
@@ -97,9 +97,9 @@ a ``UnicodeEncodeError`` will be raised.
Enum8/16
--------
-INSERT types: :class:`~enum.Enum`, :class:`int`, :class:`long`, :class:`str`/:func:`basestring <basestring>`.
+INSERT types: :class:`~enum.Enum`, :class:`int`, :class:`long`, :class:`str`.
-SELECT type: :class:`str`/:func:`basestring <basestring>`.
+SELECT type: :class:`str`.
.. code-block:: python
@@ -180,7 +180,7 @@ SELECT type: :class:`bool`.
UUID
----
-INSERT types: :class:`str`/:func:`basestring <basestring>`, :class:`~uuid.UUID`.
+INSERT types: :class:`str`, :class:`~uuid.UUID`.
SELECT type: :class:`~uuid.UUID`.
@@ -207,7 +207,7 @@ IPv4/IPv6
*New in version 0.0.19.*
-INSERT types: :class:`~ipaddress.IPv4Address`/:class:`~ipaddress.IPv6Address`, :class:`int`, :class:`long`, :class:`str`/:func:`basestring <basestring>`.
+INSERT types: :class:`~ipaddress.IPv4Address`/:class:`~ipaddress.IPv6Address`, :class:`int`, :class:`long`, :class:`str`.
SELECT type: :class:`~ipaddress.IPv4Address`/:class:`~ipaddress.IPv6Address`.
@@ -282,6 +282,52 @@ INSERT types: :class:`list`, :class:`tuple`.
SELECT type: :class:`tuple`.
+.. note::
+
+ Currently, for ClickHouse server 23.3.1, a JSON column ``Object('json')``
+ and a **namedtuple** column ``Tuple(b Int8)`` have the same binary
+ representation. There is no way to distinguish one column from the other
+ without additional inspection such as a ``DESCRIBE TABLE`` `query
+ <https://github.com/ClickHouse/ClickHouse/issues/48822>`_, but this will
+ not work for complicated queries with joins.
+
+ To interpret a ClickHouse namedtuple column as a Python tuple when
+ ``allow_experimental_object_type=1`` is set, set the ``namedtuple_as_json``
+ setting to ``False``.
+
+ .. code-block:: python
+
+ client.execute(..., settings={'namedtuple_as_json': False})
+
+ .. code-block:: sql
+
+ CREATE TABLE test (
+ a Tuple(b Int8),
+ c Object('json')
+ ) ENGINE = Memory
+
+ INSERT INTO test VALUES ((1), '{"x": 2}');
+
+ .. code-block:: python
+
+ >>> client.execute('SELECT * FROM test')
+ [((1,), (2,))]
+
+ >>> client.execute(
+ ... 'SELECT * FROM test',
+ ... settings={'allow_experimental_object_type': 1}
+ ... )
+ [({'b': 1}, {'x': 2})]
+
+ >>> client.execute(
+ ... 'SELECT * FROM test',
+ ... settings={
+ ... 'allow_experimental_object_type': 1,
+ ... 'namedtuple_as_json': False
+ ... }
+ ... )
+ [((1,), (2,))]
+
Nested(flatten_nested=1, default)
---------------------------------
@@ -422,3 +468,18 @@ These types are just aliases:
* Ring: Array(Point)
* Polygon: Array(Ring)
* MultiPolygon: Array(Polygon)
+
+
+Object('json')
+--------------
+
+*New in version 0.2.6.*
+
+INSERT types: :class:`dict`.
+
+SELECT type: :class:`dict`, :class:`str`.
+
+``orjson`` and ``ujson`` implementations are supported for dumping data into
+JSON during ``INSERT``.
+
+Set ``allow_experimental_object_type=1`` to enable JSON support.
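A minimal usage sketch, based on the JSON column tests added in this release
(``tests/columns/test_json.py``) and the note above; the table name, host and
inserted values are purely illustrative:

.. code-block:: python

    from clickhouse_driver import Client

    # allow_experimental_object_type is required for Object('json') columns
    client = Client('localhost',
                    settings={'allow_experimental_object_type': True})

    client.execute('CREATE TABLE test (a JSON) ENGINE = Memory')
    # dicts (or pre-dumped JSON strings) are serialized to JSON on INSERT
    client.execute('INSERT INTO test (a) VALUES', [({'key1': 1},)])
    client.execute('SELECT * FROM test')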
diff --git a/perf/script_01.py b/perf/script_01.py
new file mode 100644
index 0000000..edc0889
--- /dev/null
+++ b/perf/script_01.py
@@ -0,0 +1,5 @@
+import sys
+import requests
+
+query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT {}".format(sys.argv[1], sys.argv[2])
+data = requests.get('http://localhost:8123/', params={'query': query})
diff --git a/perf/script_02.py b/perf/script_02.py
new file mode 100644
index 0000000..b74d0a1
--- /dev/null
+++ b/perf/script_02.py
@@ -0,0 +1,7 @@
+import sys
+import requests
+
+query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(sys.argv[1])
+resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
+
+data = [line.decode('utf-8').split('\t') for line in resp.iter_lines(chunk_size=10000)]
diff --git a/perf/script_02_5.py b/perf/script_02_5.py
new file mode 100644
index 0000000..5a53065
--- /dev/null
+++ b/perf/script_02_5.py
@@ -0,0 +1,29 @@
+from datetime import date
+import sys
+import requests
+
+
+def get_python_type(ch_type):
+ if ch_type.startswith('Int') or ch_type.startswith('UInt'):
+ return int
+
+ elif ch_type == 'String' or ch_type.startswith('FixedString'):
+ return None
+
+ elif ch_type == 'Date':
+ return lambda value: date(*[int(x) for x in value.split('-')])
+
+ raise ValueError(f'Unsupported type: "{ch_type}"')
+
+
+resp = requests.get('http://localhost:8123', params={'query': 'describe table perftest.ontime FORMAT TSV'})
+ch_types = [x.split('\t')[1] for x in resp.text.split('\n') if x]
+python_types = [get_python_type(x) for x in ch_types]
+
+query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(sys.argv[1])
+resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
+
+data = []
+
+for line in resp.iter_lines(chunk_size=10000):
+ data.append([cls(x) if cls else x for x, cls in zip(line.decode('utf-8').split('\t'), python_types)])
diff --git a/perf/script_03.py b/perf/script_03.py
new file mode 100644
index 0000000..51ef885
--- /dev/null
+++ b/perf/script_03.py
@@ -0,0 +1,8 @@
+import sys
+import requests
+from ujson import loads
+
+query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT JSONEachRow".format(sys.argv[1])
+resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
+
+data = [list(loads(line).values()) for line in resp.iter_lines(chunk_size=10000)]
diff --git a/perf/script_04.py b/perf/script_04.py
new file mode 100644
index 0000000..e1b779a
--- /dev/null
+++ b/perf/script_04.py
@@ -0,0 +1,7 @@
+import sys
+from clickhouse_driver import Client
+
+query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}'".format(sys.argv[1])
+client = Client.from_url('clickhouse://localhost')
+
+data = client.execute(query)
diff --git a/perf/script_05.py b/perf/script_05.py
new file mode 100644
index 0000000..6ed6b9c
--- /dev/null
+++ b/perf/script_05.py
@@ -0,0 +1,8 @@
+import sys
+import requests
+
+query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(sys.argv[1])
+resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
+
+for line in resp.iter_lines(chunk_size=10000):
+ line = line.decode('utf-8').split('\t')
diff --git a/perf/script_05_5.py b/perf/script_05_5.py
new file mode 100644
index 0000000..e20cbbe
--- /dev/null
+++ b/perf/script_05_5.py
@@ -0,0 +1,27 @@
+from datetime import date
+import sys
+import requests
+
+
+def get_python_type(ch_type):
+ if ch_type.startswith('Int') or ch_type.startswith('UInt'):
+ return int
+
+ elif ch_type == 'String' or ch_type.startswith('FixedString'):
+ return None
+
+ elif ch_type == 'Date':
+ return lambda value: date(*[int(x) for x in value.split('-')])
+
+ raise ValueError(f'Unsupported type: "{ch_type}"')
+
+
+resp = requests.get('http://localhost:8123', params={'query': 'describe table perftest.ontime FORMAT TSV'})
+ch_types = [x.split('\t')[1] for x in resp.text.split('\n') if x]
+python_types = [get_python_type(x) for x in ch_types]
+
+query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(sys.argv[1])
+resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
+
+for line in resp.iter_lines(chunk_size=10000):
+ line = [cls(x) if cls else x for x, cls in zip(line.decode('utf-8').split('\t'), python_types)]
diff --git a/perf/script_06.py b/perf/script_06.py
new file mode 100644
index 0000000..e628f99
--- /dev/null
+++ b/perf/script_06.py
@@ -0,0 +1,9 @@
+import sys
+import requests
+from ujson import loads
+
+query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}' FORMAT JSONEachRow".format(sys.argv[1])
+resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
+
+for line in resp.iter_lines(chunk_size=10000):
+ line = list(loads(line).values())
diff --git a/perf/script_07.py b/perf/script_07.py
new file mode 100644
index 0000000..87b048c
--- /dev/null
+++ b/perf/script_07.py
@@ -0,0 +1,8 @@
+import sys
+from clickhouse_driver import Client
+
+query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}'".format(sys.argv[1])
+client = Client.from_url('clickhouse://localhost')
+
+for row in client.execute_iter(query):
+ pass
diff --git a/perf/script_08.py b/perf/script_08.py
new file mode 100644
index 0000000..11878c5
--- /dev/null
+++ b/perf/script_08.py
@@ -0,0 +1,21 @@
+import sys
+import requests
+
+cols = [
+ 'UniqueCarrier', 'Carrier', 'TailNum', 'FlightNum', 'Origin', 'OriginCityName', 'OriginState',
+ 'OriginStateFips', 'OriginStateName', 'Dest', 'DestCityName', 'DestState', 'DestStateFips',
+ 'DestStateName', 'DepartureDelayGroups', 'DepTimeBlk', 'ArrTimeBlk', 'CancellationCode',
+ 'FirstDepTime', 'TotalAddGTime', 'LongestAddGTime', 'DivAirportLandings', 'DivReachedDest',
+ 'DivActualElapsedTime', 'DivArrDelay', 'DivDistance', 'Div1Airport', 'Div1WheelsOn', 'Div1TotalGTime',
+ 'Div1LongestGTime', 'Div1WheelsOff', 'Div1TailNum', 'Div2Airport', 'Div2WheelsOn', 'Div2TotalGTime',
+ 'Div2LongestGTime', 'Div2WheelsOff', 'Div2TailNum', 'Div3Airport', 'Div3WheelsOn', 'Div3TotalGTime',
+ 'Div3LongestGTime', 'Div3WheelsOff', 'Div3TailNum', 'Div4Airport', 'Div4WheelsOn', 'Div4TotalGTime',
+ 'Div4LongestGTime', 'Div4WheelsOff', 'Div4TailNum', 'Div5Airport', 'Div5WheelsOn', 'Div5TotalGTime',
+ 'Div5LongestGTime', 'Div5WheelsOff', 'Div5TailNum'
+]
+
+query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(', '.join(cols), sys.argv[1])
+resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
+
+for line in resp.iter_lines(chunk_size=10000):
+ line = line.decode('utf-8').split('\t')
diff --git a/perf/script_09.py b/perf/script_09.py
new file mode 100644
index 0000000..957edec
--- /dev/null
+++ b/perf/script_09.py
@@ -0,0 +1,22 @@
+import sys
+import requests
+from ujson import loads
+
+cols = [
+ 'UniqueCarrier', 'Carrier', 'TailNum', 'FlightNum', 'Origin', 'OriginCityName', 'OriginState',
+ 'OriginStateFips', 'OriginStateName', 'Dest', 'DestCityName', 'DestState', 'DestStateFips',
+ 'DestStateName', 'DepartureDelayGroups', 'DepTimeBlk', 'ArrTimeBlk', 'CancellationCode',
+ 'FirstDepTime', 'TotalAddGTime', 'LongestAddGTime', 'DivAirportLandings', 'DivReachedDest',
+ 'DivActualElapsedTime', 'DivArrDelay', 'DivDistance', 'Div1Airport', 'Div1WheelsOn', 'Div1TotalGTime',
+ 'Div1LongestGTime', 'Div1WheelsOff', 'Div1TailNum', 'Div2Airport', 'Div2WheelsOn', 'Div2TotalGTime',
+ 'Div2LongestGTime', 'Div2WheelsOff', 'Div2TailNum', 'Div3Airport', 'Div3WheelsOn', 'Div3TotalGTime',
+ 'Div3LongestGTime', 'Div3WheelsOff', 'Div3TailNum', 'Div4Airport', 'Div4WheelsOn', 'Div4TotalGTime',
+ 'Div4LongestGTime', 'Div4WheelsOff', 'Div4TailNum', 'Div5Airport', 'Div5WheelsOn', 'Div5TotalGTime',
+ 'Div5LongestGTime', 'Div5WheelsOff', 'Div5TailNum'
+]
+
+query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}' FORMAT JSONEachRow".format(', '.join(cols), sys.argv[1])
+resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
+
+for line in resp.iter_lines(chunk_size=10000):
+ line = list(loads(line).values())
diff --git a/perf/script_10.py b/perf/script_10.py
new file mode 100644
index 0000000..bdef787
--- /dev/null
+++ b/perf/script_10.py
@@ -0,0 +1,21 @@
+import sys
+from clickhouse_driver import Client
+
+cols = [
+ 'UniqueCarrier', 'Carrier', 'TailNum', 'FlightNum', 'Origin', 'OriginCityName', 'OriginState',
+ 'OriginStateFips', 'OriginStateName', 'Dest', 'DestCityName', 'DestState', 'DestStateFips',
+ 'DestStateName', 'DepartureDelayGroups', 'DepTimeBlk', 'ArrTimeBlk', 'CancellationCode',
+ 'FirstDepTime', 'TotalAddGTime', 'LongestAddGTime', 'DivAirportLandings', 'DivReachedDest',
+ 'DivActualElapsedTime', 'DivArrDelay', 'DivDistance', 'Div1Airport', 'Div1WheelsOn', 'Div1TotalGTime',
+ 'Div1LongestGTime', 'Div1WheelsOff', 'Div1TailNum', 'Div2Airport', 'Div2WheelsOn', 'Div2TotalGTime',
+ 'Div2LongestGTime', 'Div2WheelsOff', 'Div2TailNum', 'Div3Airport', 'Div3WheelsOn', 'Div3TotalGTime',
+ 'Div3LongestGTime', 'Div3WheelsOff', 'Div3TailNum', 'Div4Airport', 'Div4WheelsOn', 'Div4TotalGTime',
+ 'Div4LongestGTime', 'Div4WheelsOff', 'Div4TailNum', 'Div5Airport', 'Div5WheelsOn', 'Div5TotalGTime',
+ 'Div5LongestGTime', 'Div5WheelsOff', 'Div5TailNum'
+]
+
+query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}'".format(', '.join(cols), sys.argv[1])
+client = Client.from_url('clickhouse://localhost')
+
+for row in client.execute_iter(query):
+ pass
diff --git a/perf/script_11.py b/perf/script_11.py
new file mode 100644
index 0000000..4ebee0c
--- /dev/null
+++ b/perf/script_11.py
@@ -0,0 +1,19 @@
+import sys
+import requests
+
+cols = [
+ 'Year', 'Quarter', 'Month', 'DayofMonth', 'DayOfWeek', 'AirlineID', 'OriginAirportID', 'OriginAirportSeqID',
+ 'OriginCityMarketID', 'OriginWac', 'DestAirportID', 'DestAirportSeqID', 'DestCityMarketID', 'DestWac',
+ 'CRSDepTime', 'DepTime', 'DepDelay', 'DepDelayMinutes', 'DepDel15', 'TaxiOut', 'WheelsOff', 'WheelsOn',
+ 'TaxiIn', 'CRSArrTime', 'ArrTime', 'ArrDelay', 'ArrDelayMinutes', 'ArrDel15', 'ArrivalDelayGroups',
+ 'Cancelled', 'Diverted', 'CRSElapsedTime', 'ActualElapsedTime', 'AirTime', 'Flights', 'Distance',
+ 'DistanceGroup', 'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay',
+ 'Div1AirportID', 'Div1AirportSeqID', 'Div2AirportID', 'Div2AirportSeqID', 'Div3AirportID',
+ 'Div3AirportSeqID', 'Div4AirportID', 'Div4AirportSeqID', 'Div5AirportID', 'Div5AirportSeqID'
+]
+
+query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}' FORMAT TSV".format(', '.join(cols), sys.argv[1])
+resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
+
+for line in resp.iter_lines(chunk_size=10000):
+ line = [int(x) for x in line.split(b'\t')]
diff --git a/perf/script_12.py b/perf/script_12.py
new file mode 100644
index 0000000..f54a37d
--- /dev/null
+++ b/perf/script_12.py
@@ -0,0 +1,20 @@
+import sys
+import requests
+from ujson import loads
+
+cols = [
+ 'Year', 'Quarter', 'Month', 'DayofMonth', 'DayOfWeek', 'AirlineID', 'OriginAirportID', 'OriginAirportSeqID',
+ 'OriginCityMarketID', 'OriginWac', 'DestAirportID', 'DestAirportSeqID', 'DestCityMarketID', 'DestWac',
+ 'CRSDepTime', 'DepTime', 'DepDelay', 'DepDelayMinutes', 'DepDel15', 'TaxiOut', 'WheelsOff', 'WheelsOn',
+ 'TaxiIn', 'CRSArrTime', 'ArrTime', 'ArrDelay', 'ArrDelayMinutes', 'ArrDel15', 'ArrivalDelayGroups',
+ 'Cancelled', 'Diverted', 'CRSElapsedTime', 'ActualElapsedTime', 'AirTime', 'Flights', 'Distance',
+ 'DistanceGroup', 'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay',
+ 'Div1AirportID', 'Div1AirportSeqID', 'Div2AirportID', 'Div2AirportSeqID', 'Div3AirportID',
+ 'Div3AirportSeqID', 'Div4AirportID', 'Div4AirportSeqID', 'Div5AirportID', 'Div5AirportSeqID'
+]
+
+query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}' FORMAT JSONEachRow".format(', '.join(cols), sys.argv[1])
+resp = requests.get('http://localhost:8123/', stream=True, params={'query': query})
+
+for line in resp.iter_lines(chunk_size=10000):
+ line = list(loads(line).values())
diff --git a/perf/script_13.py b/perf/script_13.py
new file mode 100644
index 0000000..7c02384
--- /dev/null
+++ b/perf/script_13.py
@@ -0,0 +1,19 @@
+import sys
+from clickhouse_driver import Client
+
+cols = [
+ 'Year', 'Quarter', 'Month', 'DayofMonth', 'DayOfWeek', 'AirlineID', 'OriginAirportID', 'OriginAirportSeqID',
+ 'OriginCityMarketID', 'OriginWac', 'DestAirportID', 'DestAirportSeqID', 'DestCityMarketID', 'DestWac',
+ 'CRSDepTime', 'DepTime', 'DepDelay', 'DepDelayMinutes', 'DepDel15', 'TaxiOut', 'WheelsOff', 'WheelsOn',
+ 'TaxiIn', 'CRSArrTime', 'ArrTime', 'ArrDelay', 'ArrDelayMinutes', 'ArrDel15', 'ArrivalDelayGroups',
+ 'Cancelled', 'Diverted', 'CRSElapsedTime', 'ActualElapsedTime', 'AirTime', 'Flights', 'Distance',
+ 'DistanceGroup', 'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay',
+ 'Div1AirportID', 'Div1AirportSeqID', 'Div2AirportID', 'Div2AirportSeqID', 'Div3AirportID',
+ 'Div3AirportSeqID', 'Div4AirportID', 'Div4AirportSeqID', 'Div5AirportID', 'Div5AirportSeqID'
+]
+
+query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}'".format(', '.join(cols), sys.argv[1])
+client = Client.from_url('clickhouse://localhost')
+
+for row in client.execute_iter(query):
+ pass
diff --git a/perf/script_14.py b/perf/script_14.py
new file mode 100644
index 0000000..66e1fab
--- /dev/null
+++ b/perf/script_14.py
@@ -0,0 +1,7 @@
+import sys
+import clickhouse_connect
+
+query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}'".format(sys.argv[1])
+client = clickhouse_connect.get_client(host='localhost', query_limit=None, compress=False)
+
+data = client.query(query).result_rows
diff --git a/perf/script_15.py b/perf/script_15.py
new file mode 100644
index 0000000..650c7f0
--- /dev/null
+++ b/perf/script_15.py
@@ -0,0 +1,23 @@
+import sys
+import clickhouse_connect
+
+cols = [
+ 'UniqueCarrier', 'Carrier', 'TailNum', 'FlightNum', 'Origin', 'OriginCityName', 'OriginState',
+ 'OriginStateFips', 'OriginStateName', 'Dest', 'DestCityName', 'DestState', 'DestStateFips',
+ 'DestStateName', 'DepartureDelayGroups', 'DepTimeBlk', 'ArrTimeBlk', 'CancellationCode',
+ 'FirstDepTime', 'TotalAddGTime', 'LongestAddGTime', 'DivAirportLandings', 'DivReachedDest',
+ 'DivActualElapsedTime', 'DivArrDelay', 'DivDistance', 'Div1Airport', 'Div1WheelsOn', 'Div1TotalGTime',
+ 'Div1LongestGTime', 'Div1WheelsOff', 'Div1TailNum', 'Div2Airport', 'Div2WheelsOn', 'Div2TotalGTime',
+ 'Div2LongestGTime', 'Div2WheelsOff', 'Div2TailNum', 'Div3Airport', 'Div3WheelsOn', 'Div3TotalGTime',
+ 'Div3LongestGTime', 'Div3WheelsOff', 'Div3TailNum', 'Div4Airport', 'Div4WheelsOn', 'Div4TotalGTime',
+ 'Div4LongestGTime', 'Div4WheelsOff', 'Div4TailNum', 'Div5Airport', 'Div5WheelsOn', 'Div5TotalGTime',
+ 'Div5LongestGTime', 'Div5WheelsOff', 'Div5TailNum'
+]
+
+query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}'".format(', '.join(cols), sys.argv[1])
+client = clickhouse_connect.get_client(host='localhost', query_limit=None, compress=False)
+
+rv = client.query(query)
+with rv:
+ for row in rv.stream_rows():
+ pass
diff --git a/perf/script_16.py b/perf/script_16.py
new file mode 100644
index 0000000..339c9bd
--- /dev/null
+++ b/perf/script_16.py
@@ -0,0 +1,21 @@
+import sys
+import clickhouse_connect
+
+cols = [
+ 'Year', 'Quarter', 'Month', 'DayofMonth', 'DayOfWeek', 'AirlineID', 'OriginAirportID', 'OriginAirportSeqID',
+ 'OriginCityMarketID', 'OriginWac', 'DestAirportID', 'DestAirportSeqID', 'DestCityMarketID', 'DestWac',
+ 'CRSDepTime', 'DepTime', 'DepDelay', 'DepDelayMinutes', 'DepDel15', 'TaxiOut', 'WheelsOff', 'WheelsOn',
+ 'TaxiIn', 'CRSArrTime', 'ArrTime', 'ArrDelay', 'ArrDelayMinutes', 'ArrDel15', 'ArrivalDelayGroups',
+ 'Cancelled', 'Diverted', 'CRSElapsedTime', 'ActualElapsedTime', 'AirTime', 'Flights', 'Distance',
+ 'DistanceGroup', 'CarrierDelay', 'WeatherDelay', 'NASDelay', 'SecurityDelay', 'LateAircraftDelay',
+ 'Div1AirportID', 'Div1AirportSeqID', 'Div2AirportID', 'Div2AirportSeqID', 'Div3AirportID',
+ 'Div3AirportSeqID', 'Div4AirportID', 'Div4AirportSeqID', 'Div5AirportID', 'Div5AirportSeqID'
+]
+
+query = "SELECT {} FROM perftest.ontime WHERE FlightDate < '{}'".format(', '.join(cols), sys.argv[1])
+client = clickhouse_connect.get_client(host='localhost', query_limit=None, compress=False)
+
+rv = client.query(query)
+with rv:
+ for row in rv.stream_rows():
+ pass
diff --git a/perf/script_17.py b/perf/script_17.py
new file mode 100644
index 0000000..999b7b0
--- /dev/null
+++ b/perf/script_17.py
@@ -0,0 +1,10 @@
+import sys
+import clickhouse_connect
+
+query = "SELECT * FROM perftest.ontime WHERE FlightDate < '{}'".format(sys.argv[1])
+client = clickhouse_connect.get_client(host='localhost', query_limit=None, compress=False)
+
+rv = client.query(query)
+with rv:
+ for row in rv.stream_rows():
+ pass
diff --git a/setup.py b/setup.py
index 15c49bf..d892848 100644
--- a/setup.py
+++ b/setup.py
@@ -97,7 +97,6 @@ setup(
'Programming Language :: SQL',
'Programming Language :: Python :: 3',
- 'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
@@ -120,7 +119,7 @@ setup(
'Changes': github_url + '/blob/master/CHANGELOG.md'
},
packages=find_packages('.', exclude=['tests*']),
- python_requires='>=3.6, <4',
+ python_requires='>=3.7, <4',
install_requires=[
'pytz',
'tzlocal'
diff --git a/tests/columns/test_json.py b/tests/columns/test_json.py
new file mode 100644
index 0000000..03ea0ff
--- /dev/null
+++ b/tests/columns/test_json.py
@@ -0,0 +1,80 @@
+import json
+
+from tests.testcase import BaseTestCase
+
+
+class JSONTestCase(BaseTestCase):
+ required_server_version = (22, 3, 2)
+
+ def client_kwargs(self, version):
+ return {'settings': {'allow_experimental_object_type': True}}
+
+ def cli_client_kwargs(self):
+ return {'allow_experimental_object_type': 1}
+
+ def test_simple(self):
+ rv = self.client.execute("SELECT '{\"bb\": {\"cc\": [255, 1]}}'::JSON")
+ self.assertEqual(rv, [({'bb': {'cc': [255, 1]}},)])
+
+ def test_from_table(self):
+ with self.create_table('a JSON'):
+ data = [
+ ({},),
+ ({'key1': 1}, ),
+ ({'key1': 2.1, 'key2': {'nested': 'key'}}, ),
+ ({'key1': 3, 'key3': ['test'], 'key4': [10, 20]}, )
+ ]
+ self.client.execute('INSERT INTO test (a) VALUES', data)
+ query = 'SELECT * FROM test'
+ inserted = self.emit_cli(query)
+ self.assertEqual(
+ inserted,
+ "(0,(''),[],[])\n"
+ "(1,(''),[],[])\n"
+ "(2.1,('key'),[],[])\n"
+ "(3,(''),['test'],[10,20])\n"
+ )
+ inserted = self.client.execute(query)
+ data_with_all_keys = [
+ ({'key1': 0, 'key2': {'nested': ''}, 'key3': [], 'key4': []},),
+ ({'key1': 1, 'key2': {'nested': ''}, 'key3': [], 'key4': []},),
+ ({'key1': 2.1, 'key2': {'nested': 'key'}, 'key3': [],
+ 'key4': []},),
+ ({'key1': 3, 'key2': {'nested': ''}, 'key3': ['test'],
+ 'key4': [10, 20]},)
+ ]
+ self.assertEqual(inserted, data_with_all_keys)
+
+ def test_insert_json_strings(self):
+ with self.create_table('a JSON'):
+ data = [
+ (json.dumps({'i-am': 'dumped json'}),),
+ ]
+ self.client.execute('INSERT INTO test (a) VALUES', data)
+ query = 'SELECT * FROM test'
+ inserted = self.emit_cli(query)
+ self.assertEqual(
+ inserted,
+ "('dumped json')\n"
+ )
+ inserted = self.client.execute(query)
+ data_with_all_keys = [
+ ({'`i-am`': 'dumped json'},)
+ ]
+ self.assertEqual(inserted, data_with_all_keys)
+
+ def test_json_as_named_tuple(self):
+ settings = {'namedtuple_as_json': False}
+ query = 'SELECT * FROM test'
+
+ with self.create_table('a JSON'):
+ data = [
+ ({'key': 'value'}, ),
+ ]
+ self.client.execute('INSERT INTO test (a) VALUES', data)
+ inserted = self.client.execute(query)
+ self.assertEqual(inserted, data)
+
+ with self.created_client(settings=settings) as client:
+ inserted = client.execute(query)
+ self.assertEqual(inserted, [(('value',),)])
diff --git a/tests/columns/test_low_cardinality.py b/tests/columns/test_low_cardinality.py
index d61c2d1..f7d1ac1 100644
--- a/tests/columns/test_low_cardinality.py
+++ b/tests/columns/test_low_cardinality.py
@@ -1,7 +1,9 @@
from datetime import date, timedelta
from decimal import Decimal
+from uuid import UUID
from tests.testcase import BaseTestCase
+from tests.util import require_server_version
class LowCardinalityTestCase(BaseTestCase):
@@ -65,6 +67,25 @@ class LowCardinalityTestCase(BaseTestCase):
inserted = self.client.execute(query)
self.assertEqual(inserted, data)
+ def test_nullable_date(self):
+ with self.create_table('a LowCardinality(Nullable(Date))'):
+ data = [(date(2023, 4, 1), ), (None, ), (date(1970, 1, 1), )]
+ self.client.execute('INSERT INTO test (a) VALUES', data)
+
+ query = 'SELECT * FROM test'
+ inserted = self.client.execute(query)
+ self.assertEqual(inserted, data)
+
+ @require_server_version(21, 6)
+ def test_nullable_uuid(self):
+ with self.create_table('a LowCardinality(Nullable(UUID))'):
+ data = [(UUID('2efcead4-ff55-4db5-bdb4-6b36a308d8e0'), ), (None, )]
+ self.client.execute('INSERT INTO test (a) VALUES', data)
+
+ query = 'SELECT * FROM test'
+ inserted = self.client.execute(query)
+ self.assertEqual(inserted, data)
+
def test_float(self):
with self.create_table('a LowCardinality(Float)'):
data = [(float(x),) for x in range(300)]
diff --git a/tests/columns/test_nested.py b/tests/columns/test_nested.py
index 25e8c3d..9c9f390 100644
--- a/tests/columns/test_nested.py
+++ b/tests/columns/test_nested.py
@@ -1,15 +1,14 @@
from tests.testcase import BaseTestCase
-from tests.util import require_server_version
-from clickhouse_driver.columns import nestedcolumn
class NestedTestCase(BaseTestCase):
+ required_server_version = (21, 3, 13)
+
def entuple(self, lst):
return tuple(
self.entuple(x) if isinstance(x, list) else x for x in lst
)
- @require_server_version(21, 3, 13)
def test_simple(self):
columns = 'n Nested(i Int32, s String)'
@@ -40,7 +39,6 @@ class NestedTestCase(BaseTestCase):
[(['a', 'b'],)]
)
- @require_server_version(21, 3, 13)
def test_multiple_rows(self):
columns = 'n Nested(i Int32, s String)'
@@ -61,7 +59,6 @@ class NestedTestCase(BaseTestCase):
inserted = self.client.execute(query)
self.assertEqual(inserted, data)
- @require_server_version(21, 3, 13)
def test_dict(self):
columns = 'n Nested(i Int32, s String)'
@@ -88,18 +85,22 @@ class NestedTestCase(BaseTestCase):
[([(0, 'a'), (1, 'b')],), ([(3, 'd'), (4, 'e')],)]
)
- def test_get_nested_columns(self):
- self.assertEqual(
- nestedcolumn.get_nested_columns(
- 'Nested(a Tuple(Array(Int8)),\n b Nullable(String))',
- ),
- ['Tuple(Array(Int8))', 'Nullable(String)']
- )
+ def test_nested_side_effect_as_json(self):
+ client_settings = {
+ 'allow_experimental_object_type': True
+ }
+ columns = 'n Nested(i Int32, s String)'
- def test_get_columns_with_types(self):
- self.assertEqual(
- nestedcolumn.get_columns_with_types(
- 'Nested(a Tuple(Array(Int8)),\n b Nullable(String))',
- ),
- [('a', 'Tuple(Array(Int8))'), ('b', 'Nullable(String)')]
- )
+ data = [([(0, 'a'), (1, 'b')],)]
+
+ with self.create_table(columns, flatten_nested=0):
+ with self.created_client(settings=client_settings) as client:
+ client.execute(
+ 'INSERT INTO test (n) VALUES', data
+ )
+
+ inserted = client.execute('SELECT * FROM test')
+ self.assertEqual(
+ inserted,
+ [([{'i': 0, 's': 'a'}, {'i': 1, 's': 'b'}],)]
+ )
diff --git a/tests/columns/test_sparse.py b/tests/columns/test_sparse.py
new file mode 100644
index 0000000..e7ae861
--- /dev/null
+++ b/tests/columns/test_sparse.py
@@ -0,0 +1,126 @@
+from datetime import date
+
+from tests.testcase import BaseTestCase
+from clickhouse_driver import errors
+
+ErrorCodes = errors.ErrorCodes
+
+
+class SparseTestCase(BaseTestCase):
+ required_server_version = (22, 1)
+
+ create_table_template = (
+ 'CREATE TABLE test ({}) '
+ 'ENGINE = MergeTree '
+ 'ORDER BY tuple() '
+ 'SETTINGS ratio_of_defaults_for_sparse_serialization = 0.5'
+ )
+
+ def test_int_all_defaults(self):
+ columns = 'a Int32'
+
+ data = [(0, ), (0, ), (0, )]
+ with self.create_table(columns):
+ self.client.execute(
+ 'INSERT INTO test (a) VALUES', data
+ )
+
+ query = 'SELECT * FROM test'
+ inserted = self.emit_cli(query)
+ self.assertEqual(inserted, '0\n0\n0\n')
+
+ inserted = self.client.execute(query)
+ self.assertEqual(inserted, data)
+
+ data = [(0,)]
+ with self.create_table(columns):
+ self.client.execute(
+ 'INSERT INTO test (a) VALUES', data
+ )
+
+ query = 'SELECT * FROM test'
+ inserted = self.emit_cli(query)
+ self.assertEqual(inserted, '0\n')
+
+ inserted = self.client.execute(query)
+ self.assertEqual(inserted, data)
+
+ def test_int_borders_cases(self):
+ columns = 'a Int32'
+
+ data = [(1, ), (0, ), (0, ), (1, ), (0, ), (0, ), (1, )]
+ with self.create_table(columns):
+ self.client.execute(
+ 'INSERT INTO test (a) VALUES', data
+ )
+
+ query = 'SELECT * FROM test'
+ inserted = self.emit_cli(query)
+ self.assertEqual(inserted, '1\n0\n0\n1\n0\n0\n1\n')
+
+ inserted = self.client.execute(query)
+ self.assertEqual(inserted, data)
+
+ def test_int_default_last(self):
+ columns = 'a Int32'
+
+ data = [(1, ), (0, ), (0, )]
+ with self.create_table(columns):
+ self.client.execute(
+ 'INSERT INTO test (a) VALUES', data
+ )
+
+ query = 'SELECT * FROM test'
+ inserted = self.emit_cli(query)
+ self.assertEqual(inserted, '1\n0\n0\n')
+
+ inserted = self.client.execute(query)
+ self.assertEqual(inserted, data)
+
+ def test_sparse_tuples(self):
+ columns = 'a Int32, b Tuple(Int32, Tuple(Int32, Int32))'
+
+ data = [
+ (1, (1, (1, 0))),
+ (0, (0, (0, 0))),
+ (0, (0, (0, 0)))
+ ]
+ with self.create_table(columns):
+ self.client.execute(
+ 'INSERT INTO test VALUES', data
+ )
+
+ query = 'SELECT * FROM test'
+ inserted = self.emit_cli(query)
+ self.assertEqual(
+ inserted,
+ '1\t(1,(1,0))\n'
+ '0\t(0,(0,0))\n'
+ '0\t(0,(0,0))\n'
+ )
+
+ inserted = self.client.execute(query)
+ self.assertEqual(inserted, data)
+
+ def test_sparse_dates(self):
+ columns = 'a Date32'
+
+ data = [
+ (date(1970, 1, 1), ),
+ (date(1970, 1, 1), ),
+ ]
+ with self.create_table(columns):
+ self.client.execute(
+ 'INSERT INTO test VALUES', data
+ )
+
+ query = 'SELECT * FROM test'
+ inserted = self.emit_cli(query)
+ self.assertEqual(
+ inserted,
+ '1970-01-01\n'
+ '1970-01-01\n'
+ )
+
+ inserted = self.client.execute(query)
+ self.assertEqual(inserted, data)
diff --git a/tests/columns/test_tuple.py b/tests/columns/test_tuple.py
index 45d19fd..46fd014 100644
--- a/tests/columns/test_tuple.py
+++ b/tests/columns/test_tuple.py
@@ -178,3 +178,22 @@ class TupleTestCase(BaseTestCase):
with self.create_table(columns):
with self.assertRaises(errors.TypeMismatchError):
self.client.execute('INSERT INTO test (a) VALUES', data)
+
+ def test_tuple_of_low_cardinality(self):
+ data = [((1, 2), )]
+ columns = 'a Tuple(LowCardinality(Int32), LowCardinality(Int32))'
+
+ with self.create_table(columns):
+ self.client.execute(
+ 'INSERT INTO test (a) VALUES', data
+ )
+
+ query = 'SELECT * FROM test'
+ inserted = self.emit_cli(query)
+ self.assertEqual(
+ inserted,
+ '(1,2)\n'
+ )
+
+ inserted = self.client.execute(query)
+ self.assertEqual(inserted, data)
diff --git a/tests/numpy/columns/test_datetime.py b/tests/numpy/columns/test_datetime.py
index 46f4ee8..2d3c593 100644
--- a/tests/numpy/columns/test_datetime.py
+++ b/tests/numpy/columns/test_datetime.py
@@ -226,6 +226,19 @@ class DateTimeTestCase(BaseDateTimeTestCase):
inserted = self.emit_cli(query)
self.assertEqual(inserted, '0\n1\n1500000000\n4294967295\n')
+ @require_server_version(20, 1, 2)
+ def test_negative_timestamps(self):
+ with self.create_table("a DateTime64(3, 'UTC')"):
+ times = np.array(['1900-01-01 00:00'], dtype='datetime64[ns]')
+ self.client.execute(
+ 'INSERT INTO test(a) VALUES',
+ [times],
+ columnar=True,
+ )
+
+ inserted = self.client.execute('SELECT * FROM test', columnar=True)
+ self.assertArraysEqual(inserted[0], times)
+
class DateTimeTimezonesTestCase(BaseDateTimeTestCase):
dt_type = 'DateTime'
@@ -570,6 +583,78 @@ class DateTimeTimezonesTestCase(BaseDateTimeTestCase):
inserted[0], self.make_tz_numpy_array(dt, self.col_tz_name)
)
+ @require_server_version(1, 1, 54337)
+ def test_read_tz_aware_column(self):
+ # read data from tz aware column Asia/Novosibirsk
+ # offset_naive is False -> tz convert not needed
+
+ with self.create_table(self.table_columns(with_tz=True)):
+ with patch.object(
+ pd, 'to_datetime', wraps=pd.to_datetime
+ ) as to_datetime_spy:
+ self.client.execute(
+ 'INSERT INTO test (a) VALUES', [self.dt_arr], columnar=True
+ )
+
+ self.emit_cli(
+ "INSERT INTO test (a) VALUES ('2017-07-14 05:40:00')",
+ )
+
+ to_datetime_calls_before_read = to_datetime_spy.call_count
+
+ query = 'SELECT * FROM test'
+ inserted = self.client.execute(query, columnar=True)
+
+ self.assertEqual(
+ to_datetime_calls_before_read,
+ to_datetime_spy.call_count
+ )
+
+ self.assertArraysEqual(
+ inserted[0],
+ self.make_tz_numpy_array(self.dt, self.col_tz_name)
+ )
+
+ @require_server_version(1, 1, 54337)
+ def test_read_tz_naive_column_with_client_timezone(self):
+ # read data from column without timezone
+ # client timezone = Asia/Novosibirsk
+ # offset_naive is True and timezone is not UTC -> tz convert needed
+
+ settings = {'use_client_time_zone': True}
+
+ with patch_env_tz('Asia/Novosibirsk'):
+ with self.create_table(self.table_columns()):
+ with patch.object(
+ pd, 'to_datetime', wraps=pd.to_datetime
+ ) as to_datetime_spy:
+ self.client.execute(
+ 'INSERT INTO test (a) VALUES', [self.dt_arr],
+ settings=settings, columnar=True
+ )
+
+ self.emit_cli(
+ "INSERT INTO test (a) VALUES ('2017-07-14 05:40:00')",
+ use_client_time_zone=1
+ )
+
+ to_datetime_calls_before_read = to_datetime_spy.call_count
+
+ query = 'SELECT * FROM test'
+ inserted = self.client.execute(
+ query, settings=settings, columnar=True
+ )
+
+ self.assertEqual(
+ to_datetime_calls_before_read + 2,
+ to_datetime_spy.call_count
+ )
+
+ self.assertArraysEqual(
+ inserted[0],
+ self.make_numpy_d64ns([self.dt_str] * 2)
+ )
+
class DateTime64TimezonesTestCase(DateTimeTimezonesTestCase):
dt_type = 'DateTime64'
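The new numpy tests above cover pre-epoch DateTime64 values and timezone-aware column reads. A minimal sketch of the pre-epoch round trip with the numpy-enabled client, assuming a local server; the table name is a placeholder:

import numpy as np
from clickhouse_driver import Client

client = Client('localhost', settings={'use_numpy': True})
client.execute("CREATE TABLE dt64_demo (a DateTime64(3, 'UTC')) ENGINE = Memory")

# Values before the Unix epoch now survive a columnar insert/select cycle.
times = np.array(['1900-01-01 00:00'], dtype='datetime64[ns]')
client.execute('INSERT INTO dt64_demo (a) VALUES', [times], columnar=True)
print(client.execute('SELECT * FROM dt64_demo', columnar=True))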
diff --git a/tests/numpy/test_columns_names.py b/tests/numpy/test_columns_names.py
new file mode 100644
index 0000000..ae5af7a
--- /dev/null
+++ b/tests/numpy/test_columns_names.py
@@ -0,0 +1,22 @@
+from tests.numpy.testcase import NumpyBaseTestCase
+
+
+class ColumnsNamesTestCase(NumpyBaseTestCase):
+
+ def test_columns_names_replace_nonwords(self):
+ columns = (
+ 'regular Int64, '
+ 'CamelCase Int64, '
+ 'With_Underscores Int64, '
+ '`Any%different.Column?` Int64'
+ )
+
+ expected_columns = [
+ 'regular', 'CamelCase', 'With_Underscores', 'Any%different.Column?'
+ ]
+
+ with self.create_table(columns):
+ df = self.client.query_dataframe(
+ 'SELECT * FROM test', replace_nonwords=False
+ )
+ self.assertEqual(expected_columns, list(df.columns))
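The new test above exercises the replace_nonwords flag of query_dataframe, which controls whether non-word characters in column names are sanitized. A minimal sketch, assuming a local server with numpy/pandas support enabled:

from clickhouse_driver import Client

client = Client('localhost', settings={'use_numpy': True})

# With replace_nonwords=False the original column name is kept as-is.
df = client.query_dataframe(
    'SELECT 1 AS `Any%different.Column?`', replace_nonwords=False
)
print(list(df.columns))  # ['Any%different.Column?']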
diff --git a/tests/test_client.py b/tests/test_client.py
index d8d5c7c..1a8787b 100644
--- a/tests/test_client.py
+++ b/tests/test_client.py
@@ -275,3 +275,16 @@ class ClientFromUrlTestCase(TestCase):
'clickhouse://host?round_robin=true&alt_hosts=host2'
)
self.assertEqual(len(c.connections), 1)
+
+ def test_tcp_keepalive(self):
+ c = Client.from_url('clickhouse://host?tcp_keepalive=true')
+ self.assertTrue(c.connection.tcp_keepalive)
+
+ c = Client.from_url('clickhouse://host?tcp_keepalive=10.5,2.5,3')
+ self.assertEqual(
+ c.connection.tcp_keepalive, (10.5, 2.5, 3)
+ )
+
+ def test_client_revision(self):
+ c = Client.from_url('clickhouse://host?client_revision=54032')
+ self.assertEqual(c.connection.client_revision, 54032)
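The test_client.py additions above show two new URL options understood by Client.from_url: tcp_keepalive (a boolean or comma-separated timing values) and client_revision. A minimal sketch, assuming a reachable server; the hostname is a placeholder:

from clickhouse_driver import Client

# Keepalive as a simple flag, plus pinning the advertised client revision.
client = Client.from_url(
    'clickhouse://localhost?tcp_keepalive=true&client_revision=54032'
)
print(client.execute('SELECT 1'))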
diff --git a/tests/test_connect.py b/tests/test_connect.py
index b45f2a0..5986db6 100644
--- a/tests/test_connect.py
+++ b/tests/test_connect.py
@@ -264,6 +264,48 @@ class ConnectTestCase(BaseTestCase):
self.assertFalse(client.connection.connected)
self.assertFalse(list(client.connections)[0].connected)
+ def test_round_robin_client_construction(self):
+ # host and port as keyword args
+ Client(
+ host='host',
+ port=9000,
+ round_robin=True,
+ alt_hosts='host2'
+ )
+
+ # host as positional and port as keyword arg
+ Client(
+ 'host',
+ port=9000,
+ round_robin=True,
+ alt_hosts='host2'
+ )
+
+ # host and port as positional args
+ Client(
+ 'host',
+ 9000,
+ round_robin=True,
+ alt_hosts='host2'
+ )
+
+ def test_tcp_keepalive(self):
+ self.assertFalse(self.client.connection.tcp_keepalive)
+
+ with self.created_client(tcp_keepalive=True) as client:
+ self.assertTrue(client.connection.tcp_keepalive)
+
+ client.execute('SELECT 1')
+
+ with self.created_client(tcp_keepalive=(100, 20, 2)) as client:
+ self.assertEqual(client.connection.tcp_keepalive, (100, 20, 2))
+
+ client.execute('SELECT 1')
+
+ def test_client_revision(self):
+ with self.created_client(client_revision=54032) as client:
+ client.execute('SELECT 1')
+
class FakeBufferedReader(BufferedReader):
def __init__(self, inputs, bufsize=128):
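The test_connect.py additions above exercise the same options as keyword arguments to Client, including the tuple form of tcp_keepalive. A minimal sketch, assuming reachable hosts (hostnames are placeholders; per the tests the tuple carries three keepalive timing values):

from clickhouse_driver import Client

# Round-robin between the main host and an alternative host.
client = Client('host1', 9000, round_robin=True, alt_hosts='host2')

# Keepalive as a flag, or as a tuple of three timing values (see tests above).
client = Client('host1', tcp_keepalive=True)
client = Client('host1', tcp_keepalive=(100, 20, 2))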
diff --git a/tests/test_insert.py b/tests/test_insert.py
index 795fb71..703c28c 100644
--- a/tests/test_insert.py
+++ b/tests/test_insert.py
@@ -155,7 +155,7 @@ class InsertTestCase(BaseTestCase):
data = [{'a': 1}]
self.client.execute(
"INSERT INTO test (a) "
- "SELECT a from input ('a Int8')",
+ "SELECT a FROM input ('a Int8') FORMAT Native",
data
)
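The test_insert.py change above spells out the source format when inserting through the input() table function. A minimal sketch of the same pattern, assuming a local server and an existing table test with a column a Int8:

from clickhouse_driver import Client

client = Client('localhost')

# Rows are streamed to the server through input() in Native format.
client.execute(
    "INSERT INTO test (a) SELECT a FROM input('a Int8') FORMAT Native",
    [{'a': 1}],
)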
diff --git a/tests/test_substitution.py b/tests/test_substitution.py
index 2268bb4..34dc563 100644
--- a/tests/test_substitution.py
+++ b/tests/test_substitution.py
@@ -1,7 +1,7 @@
# coding=utf-8
from __future__ import unicode_literals
-from datetime import date, datetime
+from datetime import date, datetime, time
from decimal import Decimal
from unittest.mock import Mock
from uuid import UUID
@@ -53,6 +53,15 @@ class ParametersSubstitutionTestCase(BaseTestCase):
rv = self.client.execute(tpl, params)
self.assertEqual(rv, [(d, )])
+ def test_time(self):
+ t = time(8, 20, 15)
+ params = {'x': t}
+
+ self.assert_subst(self.single_tpl, params, "SELECT '08:20:15'")
+
+ rv = self.client.execute(self.single_tpl, params)
+ self.assertEqual(rv, [('08:20:15', )])
+
def test_datetime(self):
dt = datetime(2017, 10, 16, 0, 18, 50)
params = {'x': dt}
@@ -231,3 +240,31 @@ class ParametersSubstitutionTestCase(BaseTestCase):
self.assertEqual(e.exception.args[0],
'Parameters are expected in dict form')
+
+
+class ServerSideParametersSubstitutionTestCase(BaseTestCase):
+ required_server_version = (22, 8)
+
+ def test_int(self):
+ rv = self.client.execute('SELECT {x:Int32}', {'x': 123})
+ self.assertEqual(rv, [(123, )])
+
+ def test_str(self):
+ rv = self.client.execute('SELECT {x:Int32}', {'x': '123'})
+ self.assertEqual(rv, [(123, )])
+
+ def test_escaped_str(self):
+ rv = self.client.execute(
+ 'SELECT {x:String}, length({x:String})', {'x': '\t'}
+ )
+ self.assertEqual(rv, [('\t', 1)])
+
+ rv = self.client.execute(
+ 'SELECT {x:String}, length({x:String})', {'x': '\\'}
+ )
+ self.assertEqual(rv, [('\\', 1)])
+
+ rv = self.client.execute(
+ 'SELECT {x:String}, length({x:String})', {'x': "'"}
+ )
+ self.assertEqual(rv, [("'", 1)])
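The new ServerSideParametersSubstitutionTestCase above relies on ClickHouse query parameters: placeholders of the form {name:Type} are bound on the server rather than substituted client-side, and need a server recent enough to support them (22.8+ per the test). A minimal sketch, assuming a local server:

from clickhouse_driver import Client

client = Client('localhost')

# The value is sent as a query parameter; the server casts it to Int32.
print(client.execute('SELECT {x:Int32} + 1', {'x': 41}))  # [(42,)]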
diff --git a/tests/testcase.py b/tests/testcase.py
index 597ecf8..40b79b0 100644
--- a/tests/testcase.py
+++ b/tests/testcase.py
@@ -30,6 +30,8 @@ class BaseTestCase(TestCase):
client_kwargs = None
cli_client_kwargs = None
+ create_table_template = 'CREATE TABLE test ({}) ENGINE = Memory'
+
@classmethod
def emit_cli(cls, statement, database=None, encoding='utf-8', **kwargs):
if database is None:
@@ -109,7 +111,7 @@ class BaseTestCase(TestCase):
super(BaseTestCase, cls).tearDownClass()
@contextmanager
- def create_table(self, columns, **kwargs):
+ def create_table(self, columns, template=None, **kwargs):
if self.cli_client_kwargs:
if callable(self.cli_client_kwargs):
cli_client_kwargs = self.cli_client_kwargs()
@@ -118,10 +120,8 @@ class BaseTestCase(TestCase):
else:
kwargs.update(self.cli_client_kwargs)
- self.emit_cli(
- 'CREATE TABLE test ({}) ''ENGINE = Memory'.format(columns),
- **kwargs
- )
+ template = template or self.create_table_template
+ self.emit_cli(template.format(columns), **kwargs)
try:
yield
except Exception:
Debdiff
[The following lists of changes regard files as different if they have different names, permissions or owners.]
Files in second set of .debs but not in first
-rw-r--r-- root/root /usr/lib/python3/dist-packages/clickhouse_driver-0.2.6.egg-info/PKG-INFO
-rw-r--r-- root/root /usr/lib/python3/dist-packages/clickhouse_driver-0.2.6.egg-info/dependency_links.txt
-rw-r--r-- root/root /usr/lib/python3/dist-packages/clickhouse_driver-0.2.6.egg-info/requires.txt
-rw-r--r-- root/root /usr/lib/python3/dist-packages/clickhouse_driver-0.2.6.egg-info/top_level.txt
-rw-r--r-- root/root /usr/lib/python3/dist-packages/clickhouse_driver/columns/jsoncolumn.py
Files in first set of .debs but not in second
-rw-r--r-- root/root /usr/lib/python3/dist-packages/clickhouse_driver-0.2.5.egg-info/PKG-INFO
-rw-r--r-- root/root /usr/lib/python3/dist-packages/clickhouse_driver-0.2.5.egg-info/dependency_links.txt
-rw-r--r-- root/root /usr/lib/python3/dist-packages/clickhouse_driver-0.2.5.egg-info/requires.txt
-rw-r--r-- root/root /usr/lib/python3/dist-packages/clickhouse_driver-0.2.5.egg-info/top_level.txt
No differences were encountered between the control files of package python3-clickhouse-driver
No differences were encountered between the control files of package python3-clickhouse-driver-dbgsym
No differences were encountered between the control files of package python3-clickhouse-driver-doc