New Upstream Release - python-rdata
Ready changes
Summary
Merged new upstream version: 0.9 (was: 0.5).
Resulting package
Built on 2022-10-22T12:22 (took 2m50s)
The resulting binary package can be installed (if you have the apt repository enabled) by running:
apt install -t fresh-releases python3-rdata
Diff
diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 9a16941..82f1468 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -30,5 +30,9 @@ jobs:
pip3 install .
coverage run --source=rdata/ --omit=rdata/tests/ setup.py test;
+ - name: Generate coverage XML
+ run: |
+ coverage xml
+
- name: Upload coverage to Codecov
- uses: codecov/codecov-action@v1
+ uses: codecov/codecov-action@v2
diff --git a/.github/workflows/python-publish.yml b/.github/workflows/python-publish.yml
new file mode 100644
index 0000000..ec70354
--- /dev/null
+++ b/.github/workflows/python-publish.yml
@@ -0,0 +1,39 @@
+# This workflow will upload a Python Package using Twine when a release is created
+# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
+
+# This workflow uses actions that are not certified by GitHub.
+# They are provided by a third-party and are governed by
+# separate terms of service, privacy policy, and support
+# documentation.
+
+name: Upload Python Package
+
+on:
+ release:
+ types: [published]
+
+permissions:
+ contents: read
+
+jobs:
+ deploy:
+
+ runs-on: ubuntu-latest
+
+ steps:
+ - uses: actions/checkout@v3
+ - name: Set up Python
+ uses: actions/setup-python@v3
+ with:
+ python-version: '3.x'
+ - name: Install dependencies
+ run: |
+ python -m pip install --upgrade pip
+ pip install build
+ - name: Build package
+ run: python -m build
+ - name: Publish package
+ uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29
+ with:
+ user: __token__
+ password: ${{ secrets.PYPI_API_TOKEN }}
diff --git a/CITATION.cff b/CITATION.cff
new file mode 100644
index 0000000..b54e80d
--- /dev/null
+++ b/CITATION.cff
@@ -0,0 +1,26 @@
+cff-version: 1.2.0
+message: "If you use this software, please cite it as below."
+authors:
+ - family-names: "Ramos-Carreño"
+ given-names: "Carlos"
+ orcid: "https://orcid.org/0000-0003-2566-7058"
+ affiliation: "Universidad Autónoma de Madrid"
+ email: vnmabus@gmail.com
+title: "rdata: Read R datasets from Python"
+date-released: 2022-03-24
+doi: 10.5281/zenodo.6382237
+url: "https://github.com/vnmabus/rdata"
+license: MIT
+keywords:
+ - rdata
+ - Python
+ - R
+ - parser
+ - conversion
+identifiers:
+ - description: "This is the collection of archived snapshots of all versions of rdata"
+ type: doi
+ value: 10.5281/zenodo.6382237
+ - description: "This is the archived snapshot of version 0.7 of rdata"
+ type: doi
+ value: 10.5281/zenodo.6382238
\ No newline at end of file
diff --git a/MANIFEST.in b/MANIFEST.in
index 56e0267..4e06f8e 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -1,5 +1,5 @@
include MANIFEST.in
-include VERSION
+include rdata/VERSION
include LICENSE
include rdata/py.typed
include *.txt
\ No newline at end of file
diff --git a/README.rst b/README.rst
index 98a4a44..1dd572e 100644
--- a/README.rst
+++ b/README.rst
@@ -1,7 +1,7 @@
rdata
=====
-|build-status| |docs| |coverage| |landscape| |pypi|
+|build-status| |docs| |coverage| |landscape| |pypi| |zenodo|
Read R datasets from Python.
@@ -103,9 +103,9 @@ Pandas `Categorical` objects:
>>> converted = rdata.conversion.convert(parsed, new_dict)
>>> converted
{'test_dataframe': class value
- 0 b'a' 1
- 1 b'b' 2
- 2 b'b' 3}
+ 1 b'a' 1
+ 2 b'b' 2
+ 3 b'b' 3}
.. |build-status| image:: https://github.com/vnmabus/rdata/actions/workflows/main.yml/badge.svg?branch=master
@@ -130,4 +130,9 @@ Pandas `Categorical` objects:
.. |pypi| image:: https://badge.fury.io/py/rdata.svg
:alt: Pypi version
:scale: 100%
- :target: https://pypi.python.org/pypi/rdata/
\ No newline at end of file
+ :target: https://pypi.python.org/pypi/rdata/
+
+.. |zenodo| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.6382237.svg
+ :alt: Zenodo DOI
+ :scale: 100%
+ :target: https://doi.org/10.5281/zenodo.6382237
\ No newline at end of file
diff --git a/VERSION b/VERSION
deleted file mode 100644
index ea2303b..0000000
--- a/VERSION
+++ /dev/null
@@ -1 +0,0 @@
-0.5
\ No newline at end of file
diff --git a/debian/changelog b/debian/changelog
index 016c39c..cc299f3 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+python-rdata (0.9-1) UNRELEASED; urgency=low
+
+ * New upstream release.
+
+ -- Debian Janitor <janitor@jelmer.uk> Sat, 22 Oct 2022 12:20:08 -0000
+
python-rdata (0.5-3) unstable; urgency=medium
[ Debian Janitor ]
diff --git a/docs/conf.py b/docs/conf.py
index 335ac7b..04d31bb 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -22,7 +22,9 @@
# sys.path.insert(0, '/home/carlos/git/rdata/rdata')
import sys
+
import pkg_resources
+
try:
release = pkg_resources.get_distribution('rdata').version
except pkg_resources.DistributionNotFound:
@@ -208,3 +210,6 @@ epub_exclude_files = ['search.html']
intersphinx_mapping = {'python': ('https://docs.python.org/3', None),
'pandas': ('http://pandas.pydata.org/pandas-docs/dev', None)}
+
+autodoc_preserve_defaults = True
+autodoc_typehints = "description"
diff --git a/docs/simpleusage.rst b/docs/simpleusage.rst
index 4ecf266..968cc25 100644
--- a/docs/simpleusage.rst
+++ b/docs/simpleusage.rst
@@ -70,6 +70,6 @@ Pandas :class:`~pandas.Categorical` objects:
>>> converted = rdata.conversion.convert(parsed, new_dict)
>>> converted
{'test_dataframe': class value
- 0 b'a' 1
- 1 b'b' 2
- 2 b'b' 3}
+ 1 b'a' 1
+ 2 b'b' 2
+ 3 b'b' 3}
diff --git a/rdata/VERSION b/rdata/VERSION
new file mode 100644
index 0000000..9a7d84f
--- /dev/null
+++ b/rdata/VERSION
@@ -0,0 +1 @@
+0.9
\ No newline at end of file
diff --git a/rdata/__init__.py b/rdata/__init__.py
index 90d9fa6..c83f931 100644
--- a/rdata/__init__.py
+++ b/rdata/__init__.py
@@ -1,3 +1,5 @@
+"""rdata: Read R datasets from Python."""
+import errno as _errno
import os as _os
import pathlib as _pathlib
@@ -13,3 +15,15 @@ TESTDATA_PATH = _get_test_data_path()
Path of the test data.
"""
+
+try:
+ with open(
+ _pathlib.Path(_os.path.dirname(__file__)) / 'VERSION',
+ 'r',
+ ) as version_file:
+ __version__ = version_file.read().strip()
+except IOError as e:
+ if e.errno != _errno.ENOENT:
+ raise
+
+ __version__ = "0.0"
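The hunk above makes the package read its version from a `VERSION` file shipped inside `rdata/`, falling back to `"0.0"` when the file is absent. A minimal sketch of that pattern follows. The helper name `read_version` and its `fallback` parameter are hypothetical and are not part of rdata; only the try/except logic follows the diff.

```python
import errno
import pathlib


def read_version(package_dir, fallback="0.0"):
    """Return the stripped contents of <package_dir>/VERSION.

    Hypothetical helper: it mirrors the try/except logic in the hunk above,
    but the function itself does not exist in rdata.
    """
    try:
        with open(pathlib.Path(package_dir) / "VERSION", "r") as version_file:
            return version_file.read().strip()
    except IOError as e:
        # Only a missing file is tolerated; any other I/O error propagates.
        if e.errno != errno.ENOENT:
            raise
        return fallback
```

Keeping the file inside the package directory (hence the `MANIFEST.in` change to `include rdata/VERSION`) is what lets the installed package locate it relative to `__file__`.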
diff --git a/rdata/conversion/__init__.py b/rdata/conversion/__init__.py
index 9d9e1cb..8f8926c 100644
--- a/rdata/conversion/__init__.py
+++ b/rdata/conversion/__init__.py
@@ -1,8 +1,20 @@
-from ._conversion import (RExpression, RLanguage,
- convert_list, convert_attrs, convert_vector,
- convert_char, convert_symbol, convert_array,
- Converter, SimpleConverter,
- dataframe_constructor,
- factor_constructor,
- ts_constructor,
- DEFAULT_CLASS_MAP, convert)
+from ._conversion import (
+ DEFAULT_CLASS_MAP as DEFAULT_CLASS_MAP,
+ Converter as Converter,
+ RBuiltin as RBuiltin,
+ RBytecode as RBytecode,
+ RExpression as RExpression,
+ RFunction as RFunction,
+ RLanguage as RLanguage,
+ SimpleConverter as SimpleConverter,
+ convert as convert,
+ convert_array as convert_array,
+ convert_attrs as convert_attrs,
+ convert_char as convert_char,
+ convert_list as convert_list,
+ convert_symbol as convert_symbol,
+ convert_vector as convert_vector,
+ dataframe_constructor as dataframe_constructor,
+ factor_constructor as factor_constructor,
+ ts_constructor as ts_constructor,
+)
diff --git a/rdata/conversion/_conversion.py b/rdata/conversion/_conversion.py
index ed86853..3374acb 100644
--- a/rdata/conversion/_conversion.py
+++ b/rdata/conversion/_conversion.py
@@ -1,17 +1,20 @@
+from __future__ import annotations
+
import abc
import warnings
+from dataclasses import dataclass
from fractions import Fraction
from types import MappingProxyType, SimpleNamespace
from typing import (
Any,
Callable,
ChainMap,
- Hashable,
List,
Mapping,
MutableMapping,
NamedTuple,
Optional,
+ Sequence,
Union,
cast,
)
@@ -23,27 +26,77 @@ import xarray
from .. import parser
from ..parser import RObject
+ConversionFunction = Callable[[Union[parser.RData, parser.RObject]], Any]
+StrMap = Mapping[Union[str, bytes], Any]
+
class RLanguage(NamedTuple):
- """
- R language construct.
- """
+ """R language construct."""
+
elements: List[Any]
+ attributes: Mapping[str, Any]
class RExpression(NamedTuple):
- """
- R expression.
- """
+ """R expression."""
+
elements: List[RLanguage]
+@dataclass
+class RBuiltin:
+ """R builtin."""
+
+ name: str
+
+
+@dataclass
+class RFunction:
+ """R function."""
+
+ environment: Mapping[str, Any]
+ formals: Optional[Mapping[str, Any]]
+ body: RLanguage
+ attributes: StrMap
+
+ @property
+ def source(self) -> str:
+ return "\n".join(self.attributes["srcref"].srcfile.lines)
+
+
+@dataclass
+class RExternalPointer:
+ """R external pointer."""
+
+ protected: Any
+ tag: Any
+
+
+@dataclass
+class RBytecode:
+ """R bytecode."""
+
+ code: xarray.DataArray
+ constants: Sequence[Any]
+ attributes: StrMap
+
+
+class REnvironment(ChainMap[Union[str, bytes], Any]):
+ """R environment."""
+
+ def __init__(
+ self,
+ *maps: MutableMapping[str | bytes, Any],
+ frame: StrMap | None = None,
+ ) -> None:
+ super().__init__(*maps)
+ self.frame = frame
+
+
def convert_list(
r_list: parser.RObject,
- conversion_function: Callable[
- [Union[parser.RData, parser.RObject]
- ], Any]=lambda x: x
-) -> Union[Mapping[Union[str, bytes], Any], List[Any]]:
+ conversion_function: ConversionFunction,
+) -> Union[StrMap, List[Any]]:
"""
Expand a tagged R pairlist to a Python dictionary.
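The `REnvironment` class added in this hunk is a `ChainMap` that also carries a `frame` attribute, so lookups fall through to enclosing environments while the environment's own frame remains reachable. The sketch below illustrates the idea with the standard library only. The class name `Environment` and the sample data are assumptions for illustration, and it uses `collections.ChainMap` rather than the `typing.ChainMap` alias imported in the diff.

```python
from collections import ChainMap


class Environment(ChainMap):
    """ChainMap with an extra ``frame`` attribute (illustrative stand-in)."""

    def __init__(self, *maps, frame=None):
        super().__init__(*maps)
        # The frame is stored alongside the chained maps, not inside them.
        self.frame = frame


# An enclosing (global-like) environment and a child that chains to it.
global_env = Environment({"x": 1})
local_env = Environment(
    {"y": 2},
    global_env,
    frame={"filename": ["script.R"]},  # made-up frame contents
)

# Lookups search the first map, then fall through to the enclosing one.
value_from_parent = local_env["x"]
filename = local_env.frame["filename"][0]
```

The `srcfile` constructors added later in the same file read `obj.frame["filename"][0]`, which is why the frame has to survive conversion instead of being merged into the chained dictionaries.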
@@ -68,8 +121,10 @@ def convert_list(
"""
if r_list.info.type is parser.RObjectType.NILVALUE:
return {}
- elif r_list.info.type not in [parser.RObjectType.LIST,
- parser.RObjectType.LANG]:
+ elif r_list.info.type not in {
+ parser.RObjectType.LIST,
+ parser.RObjectType.LANG,
+ }:
raise TypeError("Must receive a LIST, LANG or NILVALUE object")
if r_list.tag is None:
@@ -84,20 +139,18 @@ def convert_list(
cdr = {}
return {tag: conversion_function(r_list.value[0]), **cdr}
- else:
- if cdr is None:
- cdr = []
- return [conversion_function(r_list.value[0]), *cdr]
+ if cdr is None:
+ cdr = []
+
+ return [conversion_function(r_list.value[0]), *cdr]
def convert_env(
r_env: parser.RObject,
- conversion_function: Callable[
- [Union[parser.RData, parser.RObject]
- ], Any]=lambda x: x
-) -> ChainMap[Union[str, bytes], Any]:
-
+ conversion_function: ConversionFunction,
+) -> REnvironment:
+ """Convert environment objects."""
if r_env.info.type is not parser.RObjectType.ENV:
raise TypeError("Must receive a ENV object")
@@ -106,19 +159,18 @@ def convert_env(
hash_table = conversion_function(r_env.value.hash_table)
dictionary = {}
- for d in hash_table:
- if d is not None:
- dictionary.update(d)
+ if hash_table is not None:
+ for d in hash_table:
+ if d is not None:
+ dictionary.update(d)
- return ChainMap(dictionary, enclosure)
+ return REnvironment(dictionary, enclosure, frame=frame)
def convert_attrs(
r_obj: parser.RObject,
- conversion_function: Callable[
- [Union[parser.RData, parser.RObject]
- ], Any]=lambda x: x
-) -> Mapping[Union[str, bytes], Any]:
+ conversion_function: ConversionFunction,
+) -> StrMap:
"""
Return the attributes of an object as a Python dictionary.
@@ -143,7 +195,7 @@ def convert_attrs(
"""
if r_obj.attributes:
attrs = cast(
- Mapping[Union[str, bytes], Any],
+ StrMap,
conversion_function(r_obj.attributes),
)
else:
@@ -153,10 +205,9 @@ def convert_attrs(
def convert_vector(
r_vec: parser.RObject,
- conversion_function: Callable[
- [Union[parser.RData, parser.RObject]], Any]=lambda x: x,
- attrs: Optional[Mapping[Union[str, bytes], Any]] = None,
-) -> Union[List[Any], Mapping[Union[str, bytes], Any]]:
+ conversion_function: ConversionFunction,
+ attrs: Optional[StrMap] = None,
+) -> Union[List[Any], StrMap]:
"""
Convert a R vector to a Python list or dictionary.
@@ -186,11 +237,13 @@ def convert_vector(
if attrs is None:
attrs = {}
- if r_vec.info.type not in [parser.RObjectType.VEC,
- parser.RObjectType.EXPR]:
+ if r_vec.info.type not in {
+ parser.RObjectType.VEC,
+ parser.RObjectType.EXPR,
+ }:
raise TypeError("Must receive a VEC or EXPR object")
- value: Union[List[Any], Mapping[Union[str, bytes], Any]] = [
+ value: Union[List[Any], StrMap] = [
conversion_function(o) for o in r_vec.value
]
@@ -203,9 +256,7 @@ def convert_vector(
def safe_decode(byte_str: bytes, encoding: str) -> Union[str, bytes]:
- """
- Decode a (possibly malformed) string.
- """
+ """Decode a (possibly malformed) string."""
try:
return byte_str.decode(encoding)
except UnicodeDecodeError as e:
@@ -250,29 +301,37 @@ def convert_char(
assert isinstance(r_char.value, bytes)
+ encoding = None
+
if not force_default_encoding:
if r_char.info.gp & parser.CharFlags.UTF8:
- return safe_decode(r_char.value, "utf_8")
+ encoding = "utf_8"
elif r_char.info.gp & parser.CharFlags.LATIN1:
- return safe_decode(r_char.value, "latin_1")
+ encoding = "latin_1"
elif r_char.info.gp & parser.CharFlags.ASCII:
- return safe_decode(r_char.value, "ascii")
+ encoding = "ascii"
elif r_char.info.gp & parser.CharFlags.BYTES:
- return r_char.value
+ encoding = "bytes"
- if default_encoding:
- return safe_decode(r_char.value, default_encoding)
- else:
- # Assume ASCII if no encoding is marked
- warnings.warn(f"Unknown encoding. Assumed ASCII.")
- return safe_decode(r_char.value, "ascii")
+ if encoding is None:
+ if default_encoding:
+ encoding = default_encoding
+ else:
+ # Assume ASCII if no encoding is marked
+ warnings.warn("Unknown encoding. Assumed ASCII.")
+ encoding = "ascii"
+ return (
+ r_char.value
+ if encoding == "bytes"
+ else safe_decode(r_char.value, encoding)
+ )
-def convert_symbol(r_symbol: parser.RObject,
- conversion_function: Callable[
- [Union[parser.RData, parser.RObject]],
- Any]=lambda x: x
- ) -> Union[str, bytes]:
+
+def convert_symbol(
+ r_symbol: parser.RObject,
+ conversion_function: ConversionFunction,
+) -> Union[str, bytes]:
"""
Decode a R symbol to a Python string or bytes.
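The rewritten `convert_char` in the hunk above first picks an encoding name and only then decodes, using the sentinel `"bytes"` to mean "return the raw value unchanged". A self-contained sketch of that two-step flow follows. The function names `pick_encoding` and `decode_char` are hypothetical. The `UTF8` and `ASCII` flag values are assumptions taken from R's internal `gp` bits, since the diff only shows `HAS_HASH`, `BYTES` and `LATIN1`. The sketch also calls plain `bytes.decode` where the real code uses `safe_decode`.

```python
import enum
import warnings
from typing import Optional, Union


class CharFlags(enum.IntFlag):
    """Stand-in for rdata's CharFlags; UTF8 and ASCII values are assumed."""

    HAS_HASH = 1
    BYTES = 1 << 1
    LATIN1 = 1 << 2
    UTF8 = 1 << 3   # assumption: not visible in the diff
    ASCII = 1 << 6  # assumption: not visible in the diff


def pick_encoding(
    gp: int,
    default_encoding: Optional[str] = None,
    force_default_encoding: bool = False,
) -> str:
    """Choose an encoding name from the flag bits, as in the hunk above."""
    encoding = None
    if not force_default_encoding:
        if gp & CharFlags.UTF8:
            encoding = "utf_8"
        elif gp & CharFlags.LATIN1:
            encoding = "latin_1"
        elif gp & CharFlags.ASCII:
            encoding = "ascii"
        elif gp & CharFlags.BYTES:
            encoding = "bytes"  # sentinel: keep the raw bytes

    if encoding is None:
        if default_encoding:
            encoding = default_encoding
        else:
            # Assume ASCII if no encoding is marked
            warnings.warn("Unknown encoding. Assumed ASCII.")
            encoding = "ascii"
    return encoding


def decode_char(raw: bytes, gp: int, **kwargs) -> Union[str, bytes]:
    """Decode raw bytes, or return them untouched for the "bytes" sentinel."""
    encoding = pick_encoding(gp, **kwargs)
    return raw if encoding == "bytes" else raw.decode(encoding)
```

Compared with the early returns it replaces, the single exit point means every string goes through the same decoding path, whichever flag selected the encoding.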
@@ -298,16 +357,14 @@ def convert_symbol(r_symbol: parser.RObject,
symbol = conversion_function(r_symbol.value)
assert isinstance(symbol, (str, bytes))
return symbol
- else:
- raise TypeError("Must receive a SYM object")
+
+ raise TypeError("Must receive a SYM object")
def convert_array(
r_array: RObject,
- conversion_function: Callable[
- [Union[parser.RData, parser.RObject]
- ], Any]=lambda x: x,
- attrs: Optional[Mapping[Union[str, bytes], Any]] = None,
+ conversion_function: ConversionFunction,
+ attrs: Optional[StrMap] = None,
) -> Union[np.ndarray, xarray.DataArray]:
"""
Convert a R array to a Numpy ndarray or a Xarray DataArray.
@@ -336,10 +393,12 @@ def convert_array(
if attrs is None:
attrs = {}
- if r_array.info.type not in {parser.RObjectType.LGL,
- parser.RObjectType.INT,
- parser.RObjectType.REAL,
- parser.RObjectType.CPLX}:
+ if r_array.info.type not in {
+ parser.RObjectType.LGL,
+ parser.RObjectType.INT,
+ parser.RObjectType.REAL,
+ parser.RObjectType.CPLX,
+ }:
raise TypeError("Must receive an array object")
value = r_array.value
@@ -349,28 +408,52 @@ def convert_array(
# R matrix order is like FORTRAN
value = np.reshape(value, shape, order='F')
+ dimension_names = None
+ coords = None
+
dimnames = attrs.get('dimnames')
if dimnames:
- dimension_names = ["dim_" + str(i) for i, _ in enumerate(dimnames)]
- coords: Mapping[Hashable, Any] = {
- dimension_names[i]: d
- for i, d in enumerate(dimnames) if d is not None}
-
- value = xarray.DataArray(value, dims=dimension_names, coords=coords)
+ if isinstance(dimnames, Mapping):
+ dimension_names = list(dimnames.keys())
+ coords = dimnames
+ else:
+ dimension_names = [f"dim_{i}" for i, _ in enumerate(dimnames)]
+ coords = {
+ dimension_names[i]: d
+ for i, d in enumerate(dimnames)
+ if d is not None
+ }
+
+ value = xarray.DataArray(
+ value,
+ dims=dimension_names,
+ coords=coords,
+ )
return value
def dataframe_constructor(
obj: Any,
- attrs: Mapping[Union[str, bytes], Any],
+ attrs: StrMap,
) -> pandas.DataFrame:
- return pandas.DataFrame(obj, columns=obj)
+
+ row_names = attrs["row.names"]
+
+ # Default row names are stored as [INT_MIN, -len]
+ INT_MIN = -2**31 # noqa: WPS432
+ index = (
+ pandas.RangeIndex(1, abs(row_names[1]) + 1)
+ if len(row_names) == 2 and row_names[0] == INT_MIN
+ else tuple(row_names)
+ )
+
+ return pandas.DataFrame(obj, columns=obj, index=index)
def _factor_constructor_internal(
obj: Any,
- attrs: Mapping[Union[str, bytes], Any],
+ attrs: StrMap,
ordered: bool,
) -> pandas.Categorical:
values = [attrs['levels'][i - 1] if i >= 0 else None for i in obj]
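The new `dataframe_constructor` in the hunk above builds the index from the `row.names` attribute, which is why the documented example output now starts at 1 instead of 0. Per the comment in the diff, default row names are stored compactly as `[INT_MIN, -len]`. The sketch below decodes that form with plain Python. The function name `decode_row_names` is hypothetical, and it returns a `range` where the real code builds a `pandas.RangeIndex`, so that the example has no third-party dependencies.

```python
from typing import Sequence, Union

# R marks compact row names with NA_integer_, i.e. the minimum 32-bit int.
INT_MIN = -2**31


def decode_row_names(row_names: Sequence) -> Union[range, tuple]:
    """Expand compact row names to 1..n, or return explicit names as a tuple."""
    if len(row_names) == 2 and row_names[0] == INT_MIN:
        # abs() covers both the negative and the positive stored length.
        return range(1, abs(row_names[1]) + 1)
    return tuple(row_names)


default_index = list(decode_row_names([INT_MIN, -3]))
named_index = decode_row_names(["a", "b"])
```

One caveat of this heuristic, shared with the code in the diff: a data frame with exactly two explicit integer row names whose first value equals `INT_MIN` would be misread as the compact form.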
@@ -380,23 +463,25 @@ def _factor_constructor_internal(
def factor_constructor(
obj: Any,
- attrs: Mapping[Union[str, bytes], Any],
+ attrs: StrMap,
) -> pandas.Categorical:
+ """Construct a factor object."""
return _factor_constructor_internal(obj, attrs, ordered=False)
def ordered_constructor(
obj: Any,
- attrs: Mapping[Union[str, bytes], Any],
+ attrs: StrMap,
) -> pandas.Categorical:
+ """Construct an ordered factor."""
return _factor_constructor_internal(obj, attrs, ordered=True)
def ts_constructor(
obj: Any,
- attrs: Mapping[Union[str, bytes], Any],
+ attrs: StrMap,
) -> pandas.Series:
-
+ """Construct a time series object."""
start, end, frequency = attrs['tsp']
frequency = int(frequency)
@@ -404,8 +489,11 @@ def ts_constructor(
real_start = Fraction(int(round(start * frequency)), frequency)
real_end = Fraction(int(round(end * frequency)), frequency)
- index = np.arange(real_start, real_end + Fraction(1, frequency),
- Fraction(1, frequency))
+ index = np.arange(
+ real_start,
+ real_end + Fraction(1, frequency),
+ Fraction(1, frequency),
+ )
if frequency == 1:
index = index.astype(int)
@@ -413,13 +501,86 @@ def ts_constructor(
return pandas.Series(obj, index=index)
+@dataclass
+class SrcRef:
+ first_line: int
+ first_byte: int
+ last_line: int
+ last_byte: int
+ first_column: int
+ last_column: int
+ first_parsed: int
+ last_parsed: int
+ srcfile: SrcFile
+
+
+def srcref_constructor(
+ obj: Any,
+ attrs: StrMap,
+) -> SrcRef:
+ return SrcRef(*obj, srcfile=attrs["srcfile"])
+
+
+@dataclass
+class SrcFile:
+ filename: str
+ file_encoding: str | None
+ string_encoding: str | None
+
+
+def srcfile_constructor(
+ obj: Any,
+ attrs: StrMap,
+) -> SrcFile:
+
+ filename = obj.frame["filename"][0]
+ file_encoding = obj.frame.get("encoding")
+ string_encoding = obj.frame.get("Enc")
+
+ return SrcFile(
+ filename=filename,
+ file_encoding=file_encoding,
+ string_encoding=string_encoding,
+ )
+
+
+@dataclass
+class SrcFileCopy(SrcFile):
+ lines: Sequence[str]
+
+
+def srcfilecopy_constructor(
+ obj: Any,
+ attrs: StrMap,
+) -> SrcFile:
+
+ filename = obj.frame["filename"][0]
+ file_encoding = obj.frame.get("encoding", (None,))[0]
+ string_encoding = obj.frame.get("Enc", (None,))[0]
+ lines = obj.frame["lines"]
+
+ return SrcFileCopy(
+ filename=filename,
+ file_encoding=file_encoding,
+ string_encoding=string_encoding,
+ lines=lines,
+ )
+
+
Constructor = Callable[[Any, Mapping], Any]
+ConstructorDict = Mapping[
+ Union[str, bytes],
+ Constructor,
+]
default_class_map_dict: Mapping[Union[str, bytes], Constructor] = {
"data.frame": dataframe_constructor,
"factor": factor_constructor,
"ordered": ordered_constructor,
"ts": ts_constructor,
+ "srcref": srcref_constructor,
+ "srcfile": srcfile_constructor,
+ "srcfilecopy": srcfilecopy_constructor,
}
DEFAULT_CLASS_MAP = MappingProxyType(default_class_map_dict)
@@ -440,15 +601,11 @@ It has support for converting several commonly used R classes:
class Converter(abc.ABC):
- """
- Interface of a class converting R objects in Python objects.
- """
+ """Interface of a class converting R objects in Python objects."""
@abc.abstractmethod
def convert(self, data: Union[parser.RData, parser.RObject]) -> Any:
- """
- Convert a R object to a Python one.
- """
+ """Convert a R object to a Python one."""
pass
@@ -480,23 +637,20 @@ class SimpleConverter(Converter):
def __init__(
self,
- constructor_dict: Mapping[
- Union[str, bytes],
- Constructor,
- ] = DEFAULT_CLASS_MAP,
+ constructor_dict: ConstructorDict = DEFAULT_CLASS_MAP,
default_encoding: Optional[str] = None,
force_default_encoding: bool = False,
- global_environment: Optional[Mapping[Union[str, bytes], Any]] = None,
+ global_environment: MutableMapping[str | bytes, Any] | None = None,
) -> None:
self.constructor_dict = constructor_dict
self.default_encoding = default_encoding
self.force_default_encoding = force_default_encoding
- self.global_environment = ChainMap(
+ self.global_environment = REnvironment(
{} if global_environment is None
- else global_environment
+ else global_environment,
)
- self.empty_environment: Mapping[Union[str, bytes], Any] = ChainMap({})
+ self.empty_environment: StrMap = REnvironment({})
self._reset()
@@ -504,15 +658,15 @@ class SimpleConverter(Converter):
self.references: MutableMapping[int, Any] = {}
self.default_encoding_used = self.default_encoding
- def convert(self, data: Union[parser.RData, parser.RObject]) -> Any:
+ def convert( # noqa: D102
+ self,
+ data: Union[parser.RData, parser.RObject],
+ ) -> Any:
self._reset()
return self._convert_next(data)
def _convert_next(self, data: Union[parser.RData, parser.RObject]) -> Any:
- """
- Convert a R object to a Python one.
- """
-
+ """Convert a R object to a Python one."""
obj: RObject
if isinstance(data, parser.RData):
obj = data.object
@@ -540,6 +694,20 @@ class SimpleConverter(Converter):
# Expand the list and process the elements
value = convert_list(obj, self._convert_next)
+ elif obj.info.type == parser.RObjectType.CLO:
+ assert obj.tag is not None
+ environment = self._convert_next(obj.tag)
+ formals = self._convert_next(obj.value[0])
+ body = self._convert_next(obj.value[1])
+ attributes = self._convert_next(obj.attributes)
+
+ value = RFunction(
+ environment=environment,
+ formals=formals,
+ body=body,
+ attributes=attributes,
+ )
+
elif obj.info.type == parser.RObjectType.ENV:
# Return a ChainMap of the environments
@@ -551,8 +719,15 @@ class SimpleConverter(Converter):
# special object
rlanguage_list = convert_list(obj, self._convert_next)
assert isinstance(rlanguage_list, list)
+ attributes = self._convert_next(
+ obj.attributes,
+ ) if obj.attributes else {}
+
+ value = RLanguage(rlanguage_list, attributes)
- value = RLanguage(rlanguage_list)
+ elif obj.info.type in {parser.RObjectType.SPECIAL, parser.RObjectType.BUILTIN}:
+
+ value = RBuiltin(name=obj.value.decode("ascii"))
elif obj.info.type == parser.RObjectType.CHAR:
@@ -563,10 +738,12 @@ class SimpleConverter(Converter):
force_default_encoding=self.force_default_encoding,
)
- elif obj.info.type in {parser.RObjectType.LGL,
- parser.RObjectType.INT,
- parser.RObjectType.REAL,
- parser.RObjectType.CPLX}:
+ elif obj.info.type in {
+ parser.RObjectType.LGL,
+ parser.RObjectType.INT,
+ parser.RObjectType.REAL,
+ parser.RObjectType.CPLX,
+ }:
# Return the internal array
value = convert_array(obj, self._convert_next, attrs=attrs)
@@ -583,18 +760,39 @@ class SimpleConverter(Converter):
elif obj.info.type == parser.RObjectType.EXPR:
rexpression_list = convert_vector(
- obj, self._convert_next, attrs=attrs)
+ obj,
+ self._convert_next,
+ attrs=attrs,
+ )
assert isinstance(rexpression_list, list)
# Convert the internal objects returning a special object
value = RExpression(rexpression_list)
+ elif obj.info.type == parser.RObjectType.BCODE:
+
+ value = RBytecode(
+ code=self._convert_next(obj.value[0]),
+ constants=[self._convert_next(c) for c in obj.value[1]],
+ attributes=attrs,
+ )
+
+ elif obj.info.type == parser.RObjectType.EXTPTR:
+
+ value = RExternalPointer(
+ protected=self._convert_next(obj.value[0]),
+ tag=self._convert_next(obj.value[1]),
+ )
+
elif obj.info.type == parser.RObjectType.S4:
value = SimpleNamespace(**attrs)
elif obj.info.type == parser.RObjectType.EMPTYENV:
value = self.empty_environment
+ elif obj.info.type == parser.RObjectType.MISSINGARG:
+ value = NotImplemented
+
elif obj.info.type == parser.RObjectType.GLOBALENV:
value = self.global_environment
@@ -602,7 +800,6 @@ class SimpleConverter(Converter):
# Return the referenced value
value = self.references.get(id(obj.referenced_object))
- # value = self.references[id(obj.referenced_object)]
if value is None:
reference_id = id(obj.referenced_object)
assert obj.referenced_object is not None
@@ -615,8 +812,8 @@ class SimpleConverter(Converter):
else:
raise NotImplementedError(f"Type {obj.info.type} not implemented")
- if obj.info.object:
- classname = attrs["class"]
+ if obj.info.object and attrs is not None:
+ classname = attrs.get("class", ())
for i, c in enumerate(classname):
constructor = self.constructor_dict.get(c, None)
@@ -627,20 +824,26 @@ class SimpleConverter(Converter):
new_value = NotImplemented
if new_value is NotImplemented:
- missing_msg = (f"Missing constructor for R class "
- f"\"{c}\". ")
+ missing_msg = (
+ f"Missing constructor for R class \"{c}\". "
+ )
if len(classname) > (i + 1):
- solution_msg = (f"The constructor for class "
- f"\"{classname[i+1]}\" will be "
- f"used instead."
- )
+ solution_msg = (
+ f"The constructor for class "
+ f"\"{classname[i+1]}\" will be "
+ f"used instead."
+ )
else:
- solution_msg = ("The underlying R object is "
- "returned instead.")
-
- warnings.warn(missing_msg + solution_msg,
- stacklevel=1)
+ solution_msg = (
+ "The underlying R object is "
+ "returned instead."
+ )
+
+ warnings.warn(
+ missing_msg + solution_msg,
+ stacklevel=1,
+ )
else:
value = new_value
break
@@ -656,10 +859,9 @@ def convert(
**kwargs: Any,
) -> Any:
"""
- Uses the default converter (:func:`SimpleConverter`) to convert the data.
+ Use the default converter (:func:`SimpleConverter`) to convert the data.
Examples:
-
Parse one of the included examples, containing a vector
>>> import rdata
@@ -679,9 +881,9 @@ def convert(
>>> converted = rdata.conversion.convert(parsed)
>>> converted
{'test_dataframe': class value
- 0 a 1
- 1 b 2
- 2 b 3}
+ 1 a 1
+ 2 b 2
+ 3 b 3}
"""
return SimpleConverter(*args, **kwargs).convert(data)
diff --git a/rdata/parser/__init__.py b/rdata/parser/__init__.py
index 720979e..8af47f3 100644
--- a/rdata/parser/__init__.py
+++ b/rdata/parser/__init__.py
@@ -1,10 +1,12 @@
+"""Utilities for parsing a rdata file."""
+
from ._parser import (
- DEFAULT_ALTREP_MAP,
- CharFlags,
- RData,
- RObject,
- RObjectInfo,
- RObjectType,
- parse_data,
- parse_file,
+ DEFAULT_ALTREP_MAP as DEFAULT_ALTREP_MAP,
+ CharFlags as CharFlags,
+ RData as RData,
+ RObject as RObject,
+ RObjectInfo as RObjectInfo,
+ RObjectType as RObjectType,
+ parse_data as parse_data,
+ parse_file as parse_file,
)
diff --git a/rdata/parser/_parser.py b/rdata/parser/_parser.py
index df2009c..dec98b4 100644
--- a/rdata/parser/_parser.py
+++ b/rdata/parser/_parser.py
@@ -12,12 +12,14 @@ import xdrlib
from dataclasses import dataclass
from types import MappingProxyType
from typing import (
+ TYPE_CHECKING,
Any,
BinaryIO,
Callable,
List,
Mapping,
Optional,
+ Sequence,
Set,
TextIO,
Tuple,
@@ -28,9 +30,8 @@ import numpy as np
class FileTypes(enum.Enum):
- """
- Type of file containing a R file.
- """
+ """Type of file containing a R file."""
+
bzip2 = "bz2"
gzip = "gzip"
xz = "xz"
@@ -43,15 +44,12 @@ magic_dict = {
FileTypes.gzip: b"\x1f\x8b",
FileTypes.xz: b"\xFD7zXZ\x00",
FileTypes.rdata_binary_v2: b"RDX2\n",
- FileTypes.rdata_binary_v3: b"RDX3\n"
+ FileTypes.rdata_binary_v3: b"RDX3\n",
}
def file_type(data: memoryview) -> Optional[FileTypes]:
- """
- Returns the type of the file.
- """
-
+ """Return the type of the file."""
for filetype, magic in magic_dict.items():
if data[:len(magic)] == magic:
return filetype
@@ -59,9 +57,8 @@ def file_type(data: memoryview) -> Optional[FileTypes]:
class RdataFormats(enum.Enum):
- """
- Format of a R file.
- """
+ """Format of a R file."""
+
XDR = "XDR"
ASCII = "ASCII"
binary = "binary"
@@ -75,10 +72,7 @@ format_dict = {
def rdata_format(data: memoryview) -> Optional[RdataFormats]:
- """
- Returns the format of the data.
- """
-
+ """Return the format of the data."""
for format_type, magic in format_dict.items():
if data[:len(magic)] == magic:
return format_type
@@ -86,9 +80,8 @@ def rdata_format(data: memoryview) -> Optional[RdataFormats]:
class RObjectType(enum.Enum):
- """
- Type of a R object.
- """
+ """Type of a R object."""
+
NIL = 0 # NULL
SYM = 1 # symbols
LIST = 2 # pairlists
@@ -114,13 +107,31 @@ class RObjectType(enum.Enum):
RAW = 24 # raw vector
S4 = 25 # S4 classes not of simple type
ALTREP = 238 # Alternative representations
+ ATTRLIST = 239 # Bytecode attribute
+ ATTRLANG = 240 # Bytecode attribute
EMPTYENV = 242 # Empty environment
+ BCREPREF = 243 # Bytecode repetition reference
+ BCREPDEF = 244 # Bytecode repetition definition
+ MISSINGARG = 251 # Missing argument
GLOBALENV = 253 # Global environment
NILVALUE = 254 # NIL value
REF = 255 # Reference
+BYTECODE_SPECIAL_SET = {
+ RObjectType.BCODE,
+ RObjectType.BCREPREF,
+ RObjectType.BCREPDEF,
+ RObjectType.LANG,
+ RObjectType.LIST,
+ RObjectType.ATTRLANG,
+ RObjectType.ATTRLIST,
+}
+
+
class CharFlags(enum.IntFlag):
+ """Flags for R objects of type char."""
+
HAS_HASH = 1
BYTES = 1 << 1
LATIN1 = 1 << 2
@@ -131,10 +142,9 @@ class CharFlags(enum.IntFlag):
@dataclass
class RVersions():
- """
- R versions.
- """
- format: int
+ """R versions."""
+
+ format: int # noqa: E701
serialized: int
minimum: int
@@ -145,15 +155,16 @@ class RExtraInfo():
Extra information.
Contains the default encoding (only in version 3).
+
"""
+
encoding: Optional[str] = None
@dataclass
class RObjectInfo():
- """
- Internal attributes of a R object.
- """
+ """Internal attributes of a R object."""
+
type: RObjectType
object: bool
attributes: bool
@@ -162,90 +173,134 @@ class RObjectInfo():
reference: int
+def _str_internal(
+ obj: RObject | Sequence[RObject],
+ indent: int = 0,
+ used_references: Optional[Set[int]] = None,
+) -> str:
+
+ if used_references is None:
+ used_references = set()
+
+ small_indent = indent + 2
+ big_indent = indent + 4
+
+ indent_spaces = ' ' * indent
+ small_indent_spaces = ' ' * small_indent
+ big_indent_spaces = ' ' * big_indent
+
+ string = ""
+
+ if isinstance(obj, Sequence):
+ string += f"{indent_spaces}[\n"
+ for elem in obj:
+ string += _str_internal(
+ elem,
+ big_indent,
+ used_references.copy(),
+ )
+ string += f"{indent_spaces}]\n"
+
+ return string
+
+ string += f"{indent_spaces}{obj.info.type}\n"
+
+ if obj.tag:
+ tag_string = _str_internal(
+ obj.tag,
+ big_indent,
+ used_references.copy(),
+ )
+ string += f"{small_indent_spaces}tag:\n{tag_string}\n"
+
+ if obj.info.reference:
+ assert obj.referenced_object
+ reference_string = (
+ f"{big_indent_spaces}..."
+ if obj.info.reference in used_references
+ else _str_internal(
+ obj.referenced_object,
+ indent + 4, used_references.copy())
+ )
+ string += (
+ f"{small_indent_spaces}reference: "
+ f"{obj.info.reference}\n{reference_string}\n"
+ )
+
+ string += f"{small_indent_spaces}value:\n"
+
+ if isinstance(obj.value, RObject):
+ string += _str_internal(
+ obj.value,
+ big_indent,
+ used_references.copy(),
+ )
+ elif isinstance(obj.value, (tuple, list)):
+ for elem in obj.value:
+ string += _str_internal(
+ elem,
+ big_indent,
+ used_references.copy(),
+ )
+ elif isinstance(obj.value, np.ndarray):
+ string += big_indent_spaces
+ if len(obj.value) > 4:
+ string += (
+ f"[{obj.value[0]}, {obj.value[1]} ... "
+ f"{obj.value[-2]}, {obj.value[-1]}]\n"
+ )
+ else:
+ string += f"{obj.value}\n"
+ else:
+ string += f"{big_indent_spaces}{obj.value}\n"
+
+ if obj.attributes:
+ attr_string = _str_internal(
+ obj.attributes,
+ big_indent,
+ used_references.copy(),
+ )
+ string += f"{small_indent_spaces}attributes:\n{attr_string}\n"
+
+ return string
+
+
@dataclass
class RObject():
- """
- Representation of a R object.
- """
+ """Representation of a R object."""
+
info: RObjectInfo
value: Any
attributes: Optional[RObject]
tag: Optional[RObject] = None
referenced_object: Optional[RObject] = None
- def _str_internal(
- self,
- indent: int = 0,
- used_references: Optional[Set[int]] = None
- ) -> str:
-
- if used_references is None:
- used_references = set()
-
- string = ""
-
- string += f"{' ' * indent}{self.info.type}\n"
-
- if self.tag:
- tag_string = self.tag._str_internal(indent + 4,
- used_references.copy())
- string += f"{' ' * (indent + 2)}tag:\n{tag_string}\n"
-
- if self.info.reference:
- assert self.referenced_object
- reference_string = (f"{' ' * (indent + 4)}..."
- if self.info.reference in used_references
- else self.referenced_object._str_internal(
- indent + 4, used_references.copy()))
- string += (f"{' ' * (indent + 2)}reference: "
- f"{self.info.reference}\n{reference_string}\n")
-
- string += f"{' ' * (indent + 2)}value:\n"
-
- if isinstance(self.value, RObject):
- string += self.value._str_internal(indent + 4,
- used_references.copy())
- elif isinstance(self.value, tuple) or isinstance(self.value, list):
- for elem in self.value:
- string += elem._str_internal(indent + 4,
- used_references.copy())
- elif isinstance(self.value, np.ndarray):
- string += " " * (indent + 4)
- if len(self.value) > 4:
- string += (f"[{self.value[0]}, {self.value[1]} ... "
- f"{self.value[-2]}, {self.value[-1]}]\n")
- else:
- string += f"{self.value}\n"
- else:
- string += f"{' ' * (indent + 4)}{self.value}\n"
-
- if(self.attributes):
- attr_string = self.attributes._str_internal(
- indent + 4,
- used_references.copy())
- string += f"{' ' * (indent + 2)}attributes:\n{attr_string}\n"
-
- return string
-
def __str__(self) -> str:
- return self._str_internal()
+ return _str_internal(self)
@dataclass
class RData():
- """
- Data contained in a R file.
- """
+ """Data contained in a R file."""
+
versions: RVersions
extra: RExtraInfo
object: RObject
+ def __str__(self) -> str:
+ return (
+ "RData(\n"
+ f" versions: {self.versions}\n"
+ f" extra: {self.extra}\n"
+ f" object: \n{_str_internal(self.object, indent=4)}\n"
+ ")\n"
+ )
+
@dataclass
class EnvironmentValue():
- """
- Value of an environment.
- """
+ """Value of an environment."""
+
locked: bool
enclosure: RObject
frame: RObject
@@ -260,11 +315,12 @@ AltRepConstructorMap = Mapping[bytes, AltRepConstructor]
def format_float_with_scipen(number: float, scipen: int) -> bytes:
+ """Format a floating point value as in R."""
fixed = np.format_float_positional(number, trim="-")
scientific = np.format_float_scientific(number, trim="-")
- assert(isinstance(fixed, str))
- assert(isinstance(scientific, str))
+ assert isinstance(fixed, str)
+ assert isinstance(scientific, str)
return (
scientific if len(fixed) - len(scientific) > scipen
@@ -275,7 +331,7 @@ def format_float_with_scipen(number: float, scipen: int) -> bytes:
def deferred_string_constructor(
state: RObject,
) -> Tuple[RObjectInfo, Any]:
-
+ """Expand a deferred string ALTREP."""
new_info = RObjectInfo(
type=RObjectType.STR,
object=False,
@@ -312,9 +368,9 @@ def deferred_string_constructor(
def compact_seq_constructor(
state: RObject,
*,
- is_int: bool = False
+ is_int: bool = False,
) -> Tuple[RObjectInfo, Any]:
-
+ """Expand a compact_seq ALTREP."""
new_info = RObjectInfo(
type=RObjectType.INT if is_int else RObjectType.REAL,
object=False,
@@ -341,19 +397,21 @@ def compact_seq_constructor(
def compact_intseq_constructor(
state: RObject,
) -> Tuple[RObjectInfo, Any]:
+ """Expand a compact_intseq ALTREP."""
return compact_seq_constructor(state, is_int=True)
def compact_realseq_constructor(
state: RObject,
) -> Tuple[RObjectInfo, Any]:
+ """Expand a compact_realseq ALTREP."""
return compact_seq_constructor(state, is_int=False)
def wrap_constructor(
state: RObject,
) -> Tuple[RObjectInfo, Any]:
-
+ """Expand any wrap_* ALTREP."""
new_info = RObjectInfo(
type=state.value[0].info.type,
object=False,
@@ -384,9 +442,7 @@ DEFAULT_ALTREP_MAP = MappingProxyType(default_altrep_map_dict)
class Parser(abc.ABC):
- """
- Parser interface for a R file.
- """
+ """Parser interface for a R file."""
def __init__(
self,
@@ -398,43 +454,30 @@ class Parser(abc.ABC):
self.altrep_constructor_dict = altrep_constructor_dict
def parse_bool(self) -> bool:
- """
- Parse a boolean.
- """
+ """Parse a boolean."""
return bool(self.parse_int())
@abc.abstractmethod
def parse_int(self) -> int:
- """
- Parse an integer.
- """
+ """Parse an integer."""
pass
@abc.abstractmethod
def parse_double(self) -> float:
- """
- Parse a double.
- """
+ """Parse a double."""
pass
def parse_complex(self) -> complex:
- """
- Parse a complex number.
- """
+ """Parse a complex number."""
return complex(self.parse_double(), self.parse_double())
@abc.abstractmethod
def parse_string(self, length: int) -> bytes:
- """
- Parse a string.
- """
+ """Parse a string."""
pass
def parse_all(self) -> RData:
- """
- Parse all the file.
- """
-
+ """Parse all the file."""
versions = self.parse_versions()
extra_info = self.parse_extra_info(versions)
obj = self.parse_R_object()
@@ -442,15 +485,12 @@ class Parser(abc.ABC):
return RData(versions, extra_info, obj)
def parse_versions(self) -> RVersions:
- """
- Parse the versions header.
- """
-
+ """Parse the versions header."""
format_version = self.parse_int()
r_version = self.parse_int()
minimum_r_version = self.parse_int()
- if format_version not in [2, 3]:
+ if format_version not in {2, 3}:
raise NotImplementedError(
f"Format version {format_version} unsupported",
)
@@ -459,18 +499,18 @@ class Parser(abc.ABC):
def parse_extra_info(self, versions: RVersions) -> RExtraInfo:
"""
- Parse the versions header.
- """
+ Parse the extra info.
+
+ Parses the encoding in version 3 format.
+ """
encoding = None
if versions.format >= 3:
encoding_len = self.parse_int()
encoding = self.parse_string(encoding_len).decode("ASCII")
- extra_info = RExtraInfo(encoding)
-
- return extra_info
+ return RExtraInfo(encoding)
def expand_altrep_to_object(
self,
@@ -478,7 +518,6 @@ class Parser(abc.ABC):
state: RObject,
) -> Tuple[RObjectInfo, Any]:
"""Expand alternative representation to normal object."""
-
assert info.info.type == RObjectType.LIST
class_sym = info.value[0]
@@ -494,26 +533,75 @@ class Parser(abc.ABC):
constructor = self.altrep_constructor_dict[altrep_name]
return constructor(state)
- def parse_R_object(
+ def _parse_bytecode_constant(
self,
- reference_list: Optional[List[RObject]] = None
+ reference_list: Optional[List[RObject]],
+ bytecode_rep_list: List[RObject | None] | None = None,
) -> RObject:
- """
- Parse a R object.
- """
+ obj_type = self.parse_int()
+
+ return self.parse_R_object(
+ reference_list,
+ bytecode_rep_list,
+ info_int=obj_type,
+ )
+
+ def _parse_bytecode(
+ self,
+ reference_list: Optional[List[RObject]],
+ bytecode_rep_list: List[RObject | None] | None = None,
+ ) -> Tuple[RObject, Sequence[RObject]]:
+ """Parse R bytecode."""
+ if bytecode_rep_list is None:
+ n_repeated = self.parse_int()
+
+ code = self.parse_R_object(reference_list, bytecode_rep_list)
+
+ if bytecode_rep_list is None:
+ bytecode_rep_list = [None] * n_repeated
+
+ n_constants = self.parse_int()
+ constants = [
+ self._parse_bytecode_constant(
+ reference_list,
+ bytecode_rep_list,
+ )
+ for _ in range(n_constants)
+ ]
+
+ return (code, constants)
+
+ def parse_R_object(
+ self,
+ reference_list: List[RObject] | None = None,
+ bytecode_rep_list: List[RObject | None] | None = None,
+ info_int: int | None = None,
+ ) -> RObject:
+ """Parse a R object."""
if reference_list is None:
# Index is 1-based, so we insert a dummy object
reference_list = []
- info_int = self.parse_int()
-
- info = parse_r_object_info(info_int)
+ original_info_int = info_int
+ if (
+ info_int is not None
+ and RObjectType(info_int) in BYTECODE_SPECIAL_SET
+ ):
+ info = parse_r_object_info(info_int)
+ info.tag = info.type not in {
+ RObjectType.BCREPREF,
+ RObjectType.BCODE,
+ }
+ else:
+ info_int = self.parse_int()
+ info = parse_r_object_info(info_int)
tag = None
attributes = None
referenced_object = None
+ bytecode_rep_position = -1
tag_read = False
attributes_read = False
add_reference = False
@@ -522,30 +610,66 @@ class Parser(abc.ABC):
value: Any
+ if info.type == RObjectType.BCREPDEF:
+ assert bytecode_rep_list
+ bytecode_rep_position = self.parse_int()
+ info.type = RObjectType(self.parse_int())
+
if info.type == RObjectType.NIL:
value = None
elif info.type == RObjectType.SYM:
# Read Char
- value = self.parse_R_object(reference_list)
+ value = self.parse_R_object(reference_list, bytecode_rep_list)
# Symbols can be referenced
add_reference = True
- elif info.type in [RObjectType.LIST, RObjectType.LANG]:
+ elif info.type in {
+ RObjectType.LIST,
+ RObjectType.LANG,
+ RObjectType.CLO,
+ RObjectType.PROM,
+ RObjectType.DOT,
+ RObjectType.ATTRLANG,
+ }:
+ if info.type is RObjectType.ATTRLANG:
+ info.type = RObjectType.LANG
+ info.attributes = True
+
tag = None
if info.attributes:
- attributes = self.parse_R_object(reference_list)
+ attributes = self.parse_R_object(
+ reference_list,
+ bytecode_rep_list,
+ )
attributes_read = True
- elif info.tag:
- tag = self.parse_R_object(reference_list)
+
+ if info.tag:
+ tag = self.parse_R_object(reference_list, bytecode_rep_list)
tag_read = True
# Read CAR and CDR
- car = self.parse_R_object(reference_list)
- cdr = self.parse_R_object(reference_list)
+ car = self.parse_R_object(
+ reference_list,
+ bytecode_rep_list,
+ info_int=(
+ None if original_info_int is None
+ else self.parse_int()
+ ),
+ )
+ cdr = self.parse_R_object(
+ reference_list,
+ bytecode_rep_list,
+ info_int=(
+ None if original_info_int is None
+ else self.parse_int()
+ ),
+ )
value = (car, cdr)
elif info.type == RObjectType.ENV:
+ info.object = True
+
result = RObject(
info=info,
tag=tag,
@@ -557,10 +681,10 @@ class Parser(abc.ABC):
reference_list.append(result)
locked = self.parse_bool()
- enclosure = self.parse_R_object(reference_list)
- frame = self.parse_R_object(reference_list)
- hash_table = self.parse_R_object(reference_list)
- attributes = self.parse_R_object(reference_list)
+ enclosure = self.parse_R_object(reference_list, bytecode_rep_list)
+ frame = self.parse_R_object(reference_list, bytecode_rep_list)
+ hash_table = self.parse_R_object(reference_list, bytecode_rep_list)
+ attributes = self.parse_R_object(reference_list, bytecode_rep_list)
value = EnvironmentValue(
locked=locked,
@@ -569,6 +693,11 @@ class Parser(abc.ABC):
hash_table=hash_table,
)
+ elif info.type in {RObjectType.SPECIAL, RObjectType.BUILTIN}:
+ length = self.parse_int()
+ if length > 0:
+ value = self.parse_string(length=length)
+
elif info.type == RObjectType.CHAR:
length = self.parse_int()
if length > 0:
@@ -579,7 +708,8 @@ class Parser(abc.ABC):
value = None
else:
raise NotImplementedError(
- f"Length of CHAR cannot be {length}")
+ f"Length of CHAR cannot be {length}",
+ )
elif info.type == RObjectType.LGL:
length = self.parse_int()
@@ -613,22 +743,61 @@ class Parser(abc.ABC):
for i in range(length):
value[i] = self.parse_complex()
- elif info.type in [RObjectType.STR,
- RObjectType.VEC, RObjectType.EXPR]:
+ elif info.type in {
+ RObjectType.STR,
+ RObjectType.VEC,
+ RObjectType.EXPR,
+ }:
length = self.parse_int()
value = [None] * length
for i in range(length):
- value[i] = self.parse_R_object(reference_list)
+ value[i] = self.parse_R_object(
+ reference_list, bytecode_rep_list)
+
+ elif info.type == RObjectType.BCODE:
+ value = self._parse_bytecode(reference_list, bytecode_rep_list)
+ tag_read = True
+
+ elif info.type == RObjectType.EXTPTR:
+
+ result = RObject(
+ info=info,
+ tag=tag,
+ attributes=attributes,
+ value=None,
+ referenced_object=referenced_object,
+ )
+
+ reference_list.append(result)
+ protected = self.parse_R_object(
+ reference_list,
+ bytecode_rep_list,
+ )
+ extptr_tag = self.parse_R_object(
+ reference_list,
+ bytecode_rep_list,
+ )
+
+ value = (protected, extptr_tag)
elif info.type == RObjectType.S4:
value = None
elif info.type == RObjectType.ALTREP:
- altrep_info = self.parse_R_object(reference_list)
- altrep_state = self.parse_R_object(reference_list)
- altrep_attr = self.parse_R_object(reference_list)
+ altrep_info = self.parse_R_object(
+ reference_list,
+ bytecode_rep_list,
+ )
+ altrep_state = self.parse_R_object(
+ reference_list,
+ bytecode_rep_list,
+ )
+ altrep_attr = self.parse_R_object(
+ reference_list,
+ bytecode_rep_list,
+ )
if self.expand_altrep:
info, value = self.expand_altrep_to_object(
@@ -642,6 +811,16 @@ class Parser(abc.ABC):
elif info.type == RObjectType.EMPTYENV:
value = None
+ elif info.type == RObjectType.BCREPREF:
+ assert bytecode_rep_list
+ position = self.parse_int()
+ result = bytecode_rep_list[position]
+ assert result
+ return result
+
+ elif info.type == RObjectType.MISSINGARG:
+ value = None
+
elif info.type == RObjectType.GLOBALENV:
value = None
@@ -657,10 +836,12 @@ class Parser(abc.ABC):
raise NotImplementedError(f"Type {info.type} not implemented")
if info.tag and not tag_read:
- warnings.warn(f"Tag not implemented for type {info.type} "
- "and ignored")
+ warnings.warn(
+ f"Tag not implemented for type {info.type} "
+ "and ignored",
+ )
if info.attributes and not attributes_read:
- attributes = self.parse_R_object(reference_list)
+ attributes = self.parse_R_object(reference_list, bytecode_rep_list)
if result is None:
result = RObject(
@@ -679,13 +860,15 @@ class Parser(abc.ABC):
if add_reference:
reference_list.append(result)
+ if bytecode_rep_position >= 0:
+ assert bytecode_rep_list
+ bytecode_rep_list[bytecode_rep_position] = result
+
return result
class ParserXDR(Parser):
- """
- Parser used when the integers and doubles are in XDR format.
- """
+ """Parser used when the integers and doubles are in XDR format."""
def __init__(
self,
@@ -703,50 +886,55 @@ class ParserXDR(Parser):
self.position = position
self.xdr_parser = xdrlib.Unpacker(data)
- def parse_int(self) -> int:
+ def parse_int(self) -> int: # noqa: D102
self.xdr_parser.set_position(self.position)
result = self.xdr_parser.unpack_int()
self.position = self.xdr_parser.get_position()
return result
- def parse_double(self) -> float:
+ def parse_double(self) -> float: # noqa: D102
self.xdr_parser.set_position(self.position)
result = self.xdr_parser.unpack_double()
self.position = self.xdr_parser.get_position()
return result
- def parse_string(self, length: int) -> bytes:
+ def parse_string(self, length: int) -> bytes: # noqa: D102
result = self.data[self.position:(self.position + length)]
self.position += length
return bytes(result)
+ def parse_all(self) -> RData:
+ rdata = super().parse_all()
+ assert self.position == len(self.data)
+ return rdata
+
def parse_file(
file_or_path: Union[BinaryIO, TextIO, 'os.PathLike[Any]', str],
*,
expand_altrep: bool = True,
altrep_constructor_dict: AltRepConstructorMap = DEFAULT_ALTREP_MAP,
+ extension: str | None = None,
) -> RData:
"""
Parse a R file (.rda or .rdata).
Parameters:
- file_or_path (file-like, str, bytes or path-like): File
- in the R serialization format.
- expand_altrep (bool): Wether to translate ALTREPs to normal objects.
+ file_or_path: File in the R serialization format.
+ expand_altrep: Whether to translate ALTREPs to normal objects.
altrep_constructor_dict: Dictionary mapping each ALTREP to
its constructor.
+ extension: Extension of the file.
Returns:
- RData: Data contained in the file (versions and object).
+ Data contained in the file (versions and object).
See Also:
:func:`parse_data`: Similar function that receives the data directly.
Examples:
-
Parse one of the included examples, containing a vector
>>> import rdata
@@ -809,6 +997,8 @@ def parse_file(
"""
if isinstance(file_or_path, (os.PathLike, str)):
path = pathlib.Path(file_or_path)
+ if extension is None:
+ extension = path.suffix
data = path.read_bytes()
else:
# file is a pre-opened file
@@ -823,6 +1013,7 @@ def parse_file(
data,
expand_altrep=expand_altrep,
altrep_constructor_dict=altrep_constructor_dict,
+ extension=extension,
)
@@ -831,24 +1022,25 @@ def parse_data(
*,
expand_altrep: bool = True,
altrep_constructor_dict: AltRepConstructorMap = DEFAULT_ALTREP_MAP,
+ extension: str | None = None,
) -> RData:
"""
Parse the data of a R file, received as a sequence of bytes.
Parameters:
- data (bytes): Data extracted of a R file.
- expand_altrep (bool): Wether to translate ALTREPs to normal objects.
+ data: Data extracted from a R file.
+ expand_altrep: Whether to translate ALTREPs to normal objects.
altrep_constructor_dict: Dictionary mapping each ALTREP to
its constructor.
+ extension: Extension of the file.
Returns:
- RData: Data contained in the file (versions and object).
+ Data contained in the file (versions and object).
See Also:
:func:`parse_file`: Similar function that parses a file directly.
Examples:
-
Parse one of the included examples, containing a vector
>>> import rdata
@@ -919,6 +1111,7 @@ def parse_data(
if filetype in {
FileTypes.rdata_binary_v2,
FileTypes.rdata_binary_v3,
+ None,
} else parse_data
)
@@ -929,15 +1122,23 @@ def parse_data(
elif filetype is FileTypes.xz:
new_data = lzma.decompress(data)
elif filetype in {FileTypes.rdata_binary_v2, FileTypes.rdata_binary_v3}:
+ if extension == ".rds":
+ warnings.warn(
+ f"Wrong extension {extension} for file in RDATA format",
+ )
+
view = view[len(magic_dict[filetype]):]
new_data = view
else:
- raise NotImplementedError("Unknown file type")
+ new_data = view
+ if extension != ".rds":
+ warnings.warn("Unknown file type: assumed RDS")
return parse_function(
new_data, # type: ignore
expand_altrep=expand_altrep,
altrep_constructor_dict=altrep_constructor_dict,
+ extension=extension,
)
@@ -945,10 +1146,9 @@ def parse_rdata_binary(
data: memoryview,
expand_altrep: bool = True,
altrep_constructor_dict: AltRepConstructorMap = DEFAULT_ALTREP_MAP,
+ extension: str | None = None,
) -> RData:
- """
- Select the appropiate parser and parse all the info.
- """
+ """Select the appropriate parser and parse all the info."""
format_type = rdata_format(data)
if format_type:
@@ -961,14 +1161,12 @@ def parse_rdata_binary(
altrep_constructor_dict=altrep_constructor_dict,
)
return parser.parse_all()
- else:
- raise NotImplementedError("Unknown file format")
+
+ raise NotImplementedError("Unknown file format")
def bits(data: int, start: int, stop: int) -> int:
- """
- Read bits [start, stop) of an integer.
- """
+ """Read bits [start, stop) of an integer."""
count = stop - start
mask = ((1 << count) - 1) << start
@@ -977,17 +1175,15 @@ def bits(data: int, start: int, stop: int) -> int:
def is_special_r_object_type(r_object_type: RObjectType) -> bool:
- """
- Check if a R type has a different serialization than the usual one.
- """
- return (r_object_type is RObjectType.NILVALUE
- or r_object_type is RObjectType.REF)
+ """Check if a R type has a different serialization than the usual one."""
+ return (
+ r_object_type is RObjectType.NILVALUE
+ or r_object_type is RObjectType.REF
+ )
def parse_r_object_info(info_int: int) -> RObjectInfo:
- """
- Parse the internal information of an object.
- """
+ """Parse the internal information of an object."""
type_exp = RObjectType(bits(info_int, 0, 8))
reference = 0
@@ -1000,11 +1196,11 @@ def parse_r_object_info(info_int: int) -> RObjectInfo:
else:
object_flag = bool(bits(info_int, 8, 9))
attributes = bool(bits(info_int, 9, 10))
- tag = bool(bits(info_int, 10, 11))
- gp = bits(info_int, 12, 28)
+ tag = bool(bits(info_int, 10, 11)) # noqa: WPS432
+ gp = bits(info_int, 12, 28) # noqa: WPS432
if type_exp == RObjectType.REF:
- reference = bits(info_int, 8, 32)
+ reference = bits(info_int, 8, 32) # noqa: WPS432
return RObjectInfo(
type=type_exp,
@@ -1012,5 +1208,5 @@ def parse_r_object_info(info_int: int) -> RObjectInfo:
attributes=attributes,
tag=tag,
gp=gp,
- reference=reference
+ reference=reference,
)
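
For readers following the `bits` and `parse_r_object_info` hunks above, the bit-field decoding can be sketched in isolation. This is a minimal standalone sketch, not the library's code: the final mask-and-shift line of `bits` is not visible in this diff and is assumed here, and `decode_flags` is a hypothetical helper that only mirrors the bit ranges shown in the hunk.

```python
def bits(data: int, start: int, stop: int) -> int:
    """Read bits [start, stop) of an integer."""
    count = stop - start
    mask = ((1 << count) - 1) << start
    # Assumed final step (elided in the diff): isolate the field,
    # then shift it down so it starts at bit 0.
    return (data & mask) >> start


def decode_flags(info_int: int) -> dict:
    """Hypothetical helper using the same bit ranges as the diff."""
    return {
        "type": bits(info_int, 0, 8),
        "object": bool(bits(info_int, 8, 9)),
        "attributes": bool(bits(info_int, 9, 10)),
        "tag": bool(bits(info_int, 10, 11)),
        "gp": bits(info_int, 12, 28),
    }


# Type code 14 with only the attributes flag (bit 9) set.
example = decode_flags(14 | (1 << 9))
```

The `# noqa: WPS432` markers added in the hunk only silence a magic-number lint on these bit offsets; they do not change the decoding.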
diff --git a/rdata/tests/data/test_builtin.rda b/rdata/tests/data/test_builtin.rda
new file mode 100644
index 0000000..48279c6
Binary files /dev/null and b/rdata/tests/data/test_builtin.rda differ
diff --git a/rdata/tests/data/test_dataframe.rds b/rdata/tests/data/test_dataframe.rds
new file mode 100644
index 0000000..b5f2382
Binary files /dev/null and b/rdata/tests/data/test_dataframe.rds differ
diff --git a/rdata/tests/data/test_dataframe_rownames.rda b/rdata/tests/data/test_dataframe_rownames.rda
new file mode 100644
index 0000000..4c791e2
Binary files /dev/null and b/rdata/tests/data/test_dataframe_rownames.rda differ
diff --git a/rdata/tests/data/test_dataframe_v3.rds b/rdata/tests/data/test_dataframe_v3.rds
new file mode 100644
index 0000000..6c2ada7
Binary files /dev/null and b/rdata/tests/data/test_dataframe_v3.rds differ
diff --git a/rdata/tests/data/test_empty_function.rda b/rdata/tests/data/test_empty_function.rda
new file mode 100644
index 0000000..d8dd79f
Binary files /dev/null and b/rdata/tests/data/test_empty_function.rda differ
diff --git a/rdata/tests/data/test_empty_function_uncompiled.rda b/rdata/tests/data/test_empty_function_uncompiled.rda
new file mode 100644
index 0000000..205628f
Binary files /dev/null and b/rdata/tests/data/test_empty_function_uncompiled.rda differ
diff --git a/rdata/tests/data/test_file.rda b/rdata/tests/data/test_file.rda
new file mode 100644
index 0000000..5cee314
Binary files /dev/null and b/rdata/tests/data/test_file.rda differ
diff --git a/rdata/tests/data/test_full_named_matrix.rda b/rdata/tests/data/test_full_named_matrix.rda
new file mode 100644
index 0000000..1b20735
Binary files /dev/null and b/rdata/tests/data/test_full_named_matrix.rda differ
diff --git a/rdata/tests/data/test_full_named_matrix.rds b/rdata/tests/data/test_full_named_matrix.rds
new file mode 100644
index 0000000..0dd6a3a
Binary files /dev/null and b/rdata/tests/data/test_full_named_matrix.rds differ
diff --git a/rdata/tests/data/test_function.rda b/rdata/tests/data/test_function.rda
new file mode 100644
index 0000000..3e0940f
Binary files /dev/null and b/rdata/tests/data/test_function.rda differ
diff --git a/rdata/tests/data/test_function_arg.rda b/rdata/tests/data/test_function_arg.rda
new file mode 100644
index 0000000..c97c3ce
Binary files /dev/null and b/rdata/tests/data/test_function_arg.rda differ
diff --git a/rdata/tests/data/test_half_named_matrix.rda b/rdata/tests/data/test_half_named_matrix.rda
new file mode 100644
index 0000000..557a765
Binary files /dev/null and b/rdata/tests/data/test_half_named_matrix.rda differ
diff --git a/rdata/tests/data/test_minimal_function.rda b/rdata/tests/data/test_minimal_function.rda
new file mode 100644
index 0000000..0c39c80
Binary files /dev/null and b/rdata/tests/data/test_minimal_function.rda differ
diff --git a/rdata/tests/data/test_minimal_function_uncompiled.rda b/rdata/tests/data/test_minimal_function_uncompiled.rda
new file mode 100644
index 0000000..df8d2a6
Binary files /dev/null and b/rdata/tests/data/test_minimal_function_uncompiled.rda differ
diff --git a/rdata/tests/data/test_named_matrix.rda b/rdata/tests/data/test_named_matrix.rda
new file mode 100644
index 0000000..401391e
Binary files /dev/null and b/rdata/tests/data/test_named_matrix.rda differ
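
The `.rds` fixtures added above presumably exercise the new fallback in `parse_data`, where data without a recognised RDATA magic number is treated as RDS instead of raising `NotImplementedError`. The warning logic from that hunk can be sketched on its own as follows. `check_extension` and its boolean `has_rdata_magic` argument are hypothetical names introduced for illustration; the real function also handles compressed inputs (gzip, bzip2, xz), which this sketch leaves out.

```python
import warnings
from typing import Optional


def check_extension(has_rdata_magic: bool, extension: Optional[str]) -> str:
    """Return the assumed format, warning when the extension disagrees."""
    if has_rdata_magic:
        # RDATA magic found, but the caller said the file is an .rds file.
        if extension == ".rds":
            warnings.warn(
                f"Wrong extension {extension} for file in RDATA format",
            )
        return "rdata"

    # No known magic number: fall back to RDS, warning unless the
    # extension already says so.
    if extension != ".rds":
        warnings.warn("Unknown file type: assumed RDS")
    return "rds"
```

Under this sketch, a file named `data.rds` without a magic number is accepted silently, while the same bytes passed with `extension=None` produce the "assumed RDS" warning.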
diff --git a/rdata/tests/test_rdata.py b/rdata/tests/test_rdata.py
index 9c3333f..cb35604 100644
--- a/rdata/tests/test_rdata.py
+++ b/rdata/tests/test_rdata.py
@@ -1,3 +1,5 @@
+"""Tests of parsing and conversion."""
+
import unittest
from collections import ChainMap
from fractions import Fraction
@@ -6,6 +8,7 @@ from typing import Any, Dict
import numpy as np
import pandas as pd
+import xarray
import rdata
@@ -13,99 +16,399 @@ TESTDATA_PATH = rdata.TESTDATA_PATH
class SimpleTests(unittest.TestCase):
+ """Collection of simple test cases."""
def test_opened_file(self) -> None:
- parsed = rdata.parser.parse_file(open(TESTDATA_PATH /
- "test_vector.rda"))
- converted = rdata.conversion.convert(parsed)
+ """Test that an opened file can be passed to parse_file."""
+ with open(TESTDATA_PATH / "test_vector.rda") as f:
+ parsed = rdata.parser.parse_file(f)
+ converted = rdata.conversion.convert(parsed)
- self.assertIsInstance(converted, dict)
+ self.assertIsInstance(converted, dict)
def test_opened_string(self) -> None:
- parsed = rdata.parser.parse_file(str(TESTDATA_PATH /
- "test_vector.rda"))
+ """Test that a string can be passed to parse_file."""
+ parsed = rdata.parser.parse_file(
+ str(TESTDATA_PATH / "test_vector.rda"),
+ )
converted = rdata.conversion.convert(parsed)
self.assertIsInstance(converted, dict)
def test_logical(self) -> None:
+ """Test parsing of logical vectors."""
parsed = rdata.parser.parse_file(TESTDATA_PATH / "test_logical.rda")
converted = rdata.conversion.convert(parsed)
np.testing.assert_equal(converted, {
- "test_logical": np.array([True, True, False, True, False])
+ "test_logical": np.array([True, True, False, True, False]),
})
def test_vector(self) -> None:
+ """Test parsing of numerical vectors."""
parsed = rdata.parser.parse_file(TESTDATA_PATH / "test_vector.rda")
converted = rdata.conversion.convert(parsed)
np.testing.assert_equal(converted, {
- "test_vector": np.array([1., 2., 3.])
+ "test_vector": np.array([1.0, 2.0, 3.0]),
})
def test_empty_string(self) -> None:
+ """Test that the empty string is parsed correctly."""
parsed = rdata.parser.parse_file(TESTDATA_PATH / "test_empty_str.rda")
converted = rdata.conversion.convert(parsed)
np.testing.assert_equal(converted, {
- "test_empty_str": [""]
+ "test_empty_str": [""],
})
def test_na_string(self) -> None:
+ """Test that the NA string is parsed correctly."""
parsed = rdata.parser.parse_file(
- TESTDATA_PATH / "test_na_string.rda")
+ TESTDATA_PATH / "test_na_string.rda",
+ )
converted = rdata.conversion.convert(parsed)
np.testing.assert_equal(converted, {
- "test_na_string": [None]
+ "test_na_string": [None],
})
def test_complex(self) -> None:
+ """Test that complex numbers can be parsed."""
parsed = rdata.parser.parse_file(TESTDATA_PATH / "test_complex.rda")
converted = rdata.conversion.convert(parsed)
np.testing.assert_equal(converted, {
- "test_complex": np.array([1 + 2j, 2, 0, 1 + 3j, -1j])
+ "test_complex": np.array([1 + 2j, 2, 0, 1 + 3j, -1j]),
})
def test_matrix(self) -> None:
+ """Test that a matrix can be parsed."""
parsed = rdata.parser.parse_file(TESTDATA_PATH / "test_matrix.rda")
converted = rdata.conversion.convert(parsed)
np.testing.assert_equal(converted, {
- "test_matrix": np.array([[1., 2., 3.],
- [4., 5., 6.]])
+ "test_matrix": np.array([
+ [1.0, 2.0, 3.0],
+ [4.0, 5.0, 6.0],
+ ]),
})
+ def test_named_matrix(self) -> None:
+ """Test that a named matrix can be parsed."""
+ parsed = rdata.parser.parse_file(
+ TESTDATA_PATH / "test_named_matrix.rda",
+ )
+ converted = rdata.conversion.convert(parsed)
+ reference = xarray.DataArray(
+ [
+ [1.0, 2.0, 3.0],
+ [4.0, 5.0, 6.0],
+ ],
+ dims=["dim_0", "dim_1"],
+ coords={
+ "dim_0": ["dim0_0", "dim0_1"],
+ "dim_1": ["dim1_0", "dim1_1", "dim1_2"],
+ },
+ )
+
+ xarray.testing.assert_identical(
+ converted["test_named_matrix"],
+ reference,
+ )
+
+ def test_half_named_matrix(self) -> None:
+ """Test that a named matrix with no name for a dim can be parsed."""
+ parsed = rdata.parser.parse_file(
+ TESTDATA_PATH / "test_half_named_matrix.rda",
+ )
+ converted = rdata.conversion.convert(parsed)
+ reference = xarray.DataArray(
+ [
+ [1.0, 2.0, 3.0],
+ [4.0, 5.0, 6.0],
+ ],
+ dims=["dim_0", "dim_1"],
+ coords={
+ "dim_0": ["dim0_0", "dim0_1"],
+ },
+ )
+
+ xarray.testing.assert_identical(
+ converted["test_half_named_matrix"],
+ reference,
+ )
+
+ def test_full_named_matrix(self) -> None:
+ """Test that a named matrix with dim names can be parsed."""
+ parsed = rdata.parser.parse_file(
+ TESTDATA_PATH / "test_full_named_matrix.rda",
+ )
+ converted = rdata.conversion.convert(parsed)
+ reference = xarray.DataArray(
+ [
+ [1.0, 2.0, 3.0],
+ [4.0, 5.0, 6.0],
+ ],
+ dims=["my_dim_0", "my_dim_1"],
+ coords={
+ "my_dim_0": ["dim0_0", "dim0_1"],
+ "my_dim_1": ["dim1_0", "dim1_1", "dim1_2"],
+ },
+ )
+
+ xarray.testing.assert_identical(
+ converted["test_full_named_matrix"],
+ reference,
+ )
+
+ def test_full_named_matrix_rds(self) -> None:
+ """Test that a named matrix with dim names can be parsed."""
+ parsed = rdata.parser.parse_file(
+ TESTDATA_PATH / "test_full_named_matrix.rds",
+ )
+ converted = rdata.conversion.convert(parsed)
+ reference = xarray.DataArray(
+ [
+ [1.0, 2.0, 3.0],
+ [4.0, 5.0, 6.0],
+ ],
+ dims=["my_dim_0", "my_dim_1"],
+ coords={
+ "my_dim_0": ["dim0_0", "dim0_1"],
+ "my_dim_1": ["dim1_0", "dim1_1", "dim1_2"],
+ },
+ )
+
+ xarray.testing.assert_identical(
+ converted,
+ reference,
+ )
+
def test_list(self) -> None:
+ """Test that list can be parsed."""
parsed = rdata.parser.parse_file(TESTDATA_PATH / "test_list.rda")
converted = rdata.conversion.convert(parsed)
np.testing.assert_equal(converted, {
"test_list":
[
- np.array([1.]),
+ np.array([1.0]),
['a', 'b', 'c'],
- np.array([2., 3.]),
- ['hi']
- ]
+ np.array([2.0, 3.0]),
+ ['hi'],
+ ],
+ })
+
+ def test_file(self) -> None:
+ """Test that external pointers can be parsed."""
+ parsed = rdata.parser.parse_file(TESTDATA_PATH / "test_file.rda")
+ converted = rdata.conversion.convert(parsed)
+
+ np.testing.assert_equal(converted, {
+ "test_file": [5],
})
def test_expression(self) -> None:
+ """Test that expressions can be parsed."""
parsed = rdata.parser.parse_file(TESTDATA_PATH / "test_expression.rda")
converted = rdata.conversion.convert(parsed)
np.testing.assert_equal(converted, {
"test_expression": rdata.conversion.RExpression([
- rdata.conversion.RLanguage(['^', 'base', 'exponent'])])
+ rdata.conversion.RLanguage(
+ ['^', 'base', 'exponent'],
+ attributes={},
+ ),
+ ]),
})
- def test_encodings(self) -> None:
+ def test_builtin(self) -> None:
+ """Test that builtin functions can be parsed."""
+ parsed = rdata.parser.parse_file(TESTDATA_PATH / "test_builtin.rda")
+ converted = rdata.conversion.convert(parsed)
+
+ np.testing.assert_equal(converted, {
+ "test_builtin": rdata.conversion.RBuiltin(name="abs"),
+ })
+
+ def test_minimal_function_uncompiled(self) -> None:
+ """Test that a minimal function can be parsed."""
+ parsed = rdata.parser.parse_file(
+ TESTDATA_PATH / "test_minimal_function_uncompiled.rda")
+ converted = rdata.conversion.convert(parsed)
+
+ converted_fun = converted["test_minimal_function_uncompiled"]
+
+ self.assertIsInstance(
+ converted_fun,
+ rdata.conversion.RFunction,
+ )
+
+ np.testing.assert_equal(converted_fun.environment, ChainMap({}))
+ np.testing.assert_equal(converted_fun.formals, None)
+ np.testing.assert_equal(converted_fun.body, None)
+ np.testing.assert_equal(
+ converted_fun.source,
+ "test_minimal_function_uncompiled <- function() NULL\n",
+ )
+
+ def test_minimal_function(self) -> None:
+ """Test that a minimal function (compiled) can be parsed."""
+ parsed = rdata.parser.parse_file(
+ TESTDATA_PATH / "test_minimal_function.rda")
+ converted = rdata.conversion.convert(parsed)
+
+ converted_fun = converted["test_minimal_function"]
+
+ self.assertIsInstance(
+ converted_fun,
+ rdata.conversion.RFunction,
+ )
+
+ np.testing.assert_equal(converted_fun.environment, ChainMap({}))
+ np.testing.assert_equal(converted_fun.formals, None)
+
+ converted_body = converted_fun.body
+
+ self.assertIsInstance(
+ converted_body,
+ rdata.conversion.RBytecode,
+ )
+
+ np.testing.assert_equal(converted_body.code, np.array([12, 17, 1]))
+ np.testing.assert_equal(converted_body.attributes, {})
+
+ np.testing.assert_equal(
+ converted_fun.source,
+ "test_minimal_function <- function() NULL\n",
+ )
+
+ def test_empty_function_uncompiled(self) -> None:
+ """Test that a simple function can be parsed."""
+ parsed = rdata.parser.parse_file(
+ TESTDATA_PATH / "test_empty_function_uncompiled.rda")
+ converted = rdata.conversion.convert(parsed)
+
+ converted_fun = converted["test_empty_function_uncompiled"]
+
+ self.assertIsInstance(
+ converted_fun,
+ rdata.conversion.RFunction,
+ )
+
+ np.testing.assert_equal(converted_fun.environment, ChainMap({}))
+ np.testing.assert_equal(converted_fun.formals, None)
+ self.assertIsInstance(converted_fun.body, rdata.conversion.RLanguage)
+ np.testing.assert_equal(
+ converted_fun.source,
+ "test_empty_function_uncompiled <- function() {}\n",
+ )
+
+ def test_empty_function(self) -> None:
+ """Test that a simple function (compiled) can be parsed."""
+ parsed = rdata.parser.parse_file(
+ TESTDATA_PATH / "test_empty_function.rda")
+ converted = rdata.conversion.convert(parsed)
+
+ converted_fun = converted["test_empty_function"]
+
+ self.assertIsInstance(
+ converted_fun,
+ rdata.conversion.RFunction,
+ )
+
+ np.testing.assert_equal(converted_fun.environment, ChainMap({}))
+ np.testing.assert_equal(converted_fun.formals, None)
+
+ converted_body = converted_fun.body
+
+ self.assertIsInstance(
+ converted_body,
+ rdata.conversion.RBytecode,
+ )
+
+ np.testing.assert_equal(converted_body.code, np.array([12, 17, 1]))
+ np.testing.assert_equal(converted_body.attributes, {})
+ np.testing.assert_equal(
+ converted_fun.source,
+ "test_empty_function <- function() {}\n",
+ )
+
+ def test_function(self) -> None:
+ """Test that functions can be parsed."""
+ parsed = rdata.parser.parse_file(
+ TESTDATA_PATH / "test_function.rda")
+ converted = rdata.conversion.convert(parsed)
+
+ converted_fun = converted["test_function"]
+
+ self.assertIsInstance(
+ converted_fun,
+ rdata.conversion.RFunction,
+ )
+
+ np.testing.assert_equal(converted_fun.environment, ChainMap({}))
+ np.testing.assert_equal(converted_fun.formals, None)
+
+ converted_body = converted_fun.body
+
+ self.assertIsInstance(
+ converted_body,
+ rdata.conversion.RBytecode,
+ )
+
+ np.testing.assert_equal(
+ converted_body.code,
+ np.array([12, 23, 1, 34, 4, 38, 2, 1]),
+ )
+ np.testing.assert_equal(converted_body.attributes, {})
+
+ np.testing.assert_equal(
+ converted_fun.source,
+ "test_function <- function() {print(\"Hello\")}\n",
+ )
+
+ def test_function_arg(self) -> None:
+ """Test that functions can be parsed."""
+ parsed = rdata.parser.parse_file(
+ TESTDATA_PATH / "test_function_arg.rda")
+ converted = rdata.conversion.convert(parsed)
+
+ converted_fun = converted["test_function_arg"]
+
+ self.assertIsInstance(
+ converted_fun,
+ rdata.conversion.RFunction,
+ )
+
+ np.testing.assert_equal(converted_fun.environment, ChainMap({}))
+ np.testing.assert_equal(converted_fun.formals, {"a": NotImplemented})
+
+ converted_body = converted_fun.body
+
+ self.assertIsInstance(
+ converted_body,
+ rdata.conversion.RBytecode,
+ )
+
+ np.testing.assert_equal(
+ converted_body.code,
+ np.array([12, 23, 1, 29, 4, 38, 2, 1]),
+ )
+ np.testing.assert_equal(converted_body.attributes, {})
+
+ np.testing.assert_equal(
+ converted_fun.source,
+ "test_function_arg <- function(a) {print(a)}\n",
+ )
+
+ def test_encodings(self) -> None:
+ """Test of differents encodings."""
with self.assertWarns(
UserWarning,
- msg="Unknown encoding. Assumed ASCII."
+ msg="Unknown encoding. Assumed ASCII.",
):
parsed = rdata.parser.parse_file(
TESTDATA_PATH / "test_encodings.rda",
@@ -120,7 +423,7 @@ class SimpleTests(unittest.TestCase):
})
def test_encodings_v3(self) -> None:
-
+ """Test encodings in version 3 format."""
parsed = rdata.parser.parse_file(
TESTDATA_PATH / "test_encodings_v3.rda",
)
@@ -134,8 +437,8 @@ class SimpleTests(unittest.TestCase):
})
def test_dataframe(self) -> None:
-
- for f in {"test_dataframe.rda", "test_dataframe_v3.rda"}:
+ """Test dataframe conversion."""
+ for f in ("test_dataframe.rda", "test_dataframe_v3.rda"):
with self.subTest(file=f):
parsed = rdata.parser.parse_file(
TESTDATA_PATH / f,
@@ -144,25 +447,75 @@ class SimpleTests(unittest.TestCase):
pd.testing.assert_frame_equal(
converted["test_dataframe"],
- pd.DataFrame({
- "class": pd.Categorical(
- ["a", "b", "b"]),
- "value": [1, 2, 3],
- })
+ pd.DataFrame(
+ {
+ "class": pd.Categorical(
+ ["a", "b", "b"],
+ ),
+ "value": [1, 2, 3],
+ },
+ index=pd.RangeIndex(start=1, stop=4),
+ ),
)
+ def test_dataframe_rds(self) -> None:
+ """Test dataframe conversion."""
+ for f in ("test_dataframe.rds", "test_dataframe_v3.rds"):
+ with self.subTest(file=f):
+ parsed = rdata.parser.parse_file(
+ TESTDATA_PATH / f,
+ )
+ converted = rdata.conversion.convert(parsed)
+
+ pd.testing.assert_frame_equal(
+ converted,
+ pd.DataFrame(
+ {
+ "class": pd.Categorical(
+ ["a", "b", "b"],
+ ),
+ "value": [1, 2, 3],
+ },
+ index=pd.RangeIndex(start=1, stop=4),
+ ),
+ )
+
+ def test_dataframe_rownames(self) -> None:
+ """Test dataframe conversion."""
+ parsed = rdata.parser.parse_file(
+ TESTDATA_PATH / "test_dataframe_rownames.rda",
+ )
+ converted = rdata.conversion.convert(parsed)
+
+ pd.testing.assert_frame_equal(
+ converted["test_dataframe_rownames"],
+ pd.DataFrame(
+ {
+ "class": pd.Categorical(
+ ["a", "b", "b"],
+ ),
+ "value": [1, 2, 3],
+ },
+ index=('Madrid', 'Frankfurt', 'Herzberg am Harz'),
+ ),
+ )
+
def test_ts(self) -> None:
+ """Test time series conversion."""
parsed = rdata.parser.parse_file(TESTDATA_PATH / "test_ts.rda")
converted = rdata.conversion.convert(parsed)
- pd.testing.assert_series_equal(converted["test_ts"],
- pd.Series({
- 2000 + Fraction(2, 12): 1.,
- 2000 + Fraction(3, 12): 2.,
- 2000 + Fraction(4, 12): 3.,
- }))
+ pd.testing.assert_series_equal(
+ converted["test_ts"],
+ pd.Series({
+ 2000 + Fraction(2, 12): 1.0,
+ 2000 + Fraction(3, 12): 2.0,
+ 2000 + Fraction(4, 12): 3.0,
+ }),
+ )
def test_s4(self) -> None:
+ """Test parsing of S4 classes."""
parsed = rdata.parser.parse_file(TESTDATA_PATH / "test_s4.rda")
converted = rdata.conversion.convert(parsed)
@@ -170,20 +523,22 @@ class SimpleTests(unittest.TestCase):
"test_s4": SimpleNamespace(
age=np.array(28),
name=["Carlos"],
- **{'class': ["Person"]}
- )
+ **{'class': ["Person"]}, # noqa: WPS517
+ ),
})
def test_environment(self) -> None:
+ """Test parsing of environments."""
parsed = rdata.parser.parse_file(
- TESTDATA_PATH / "test_environment.rda")
+ TESTDATA_PATH / "test_environment.rda",
+ )
converted = rdata.conversion.convert(parsed)
dict_env = {'string': ['test']}
empty_global_env: Dict[str, Any] = {}
np.testing.assert_equal(converted, {
- "test_environment": ChainMap(dict_env, ChainMap(empty_global_env))
+ "test_environment": ChainMap(dict_env, ChainMap(empty_global_env)),
})
global_env = {"global": "test"}
@@ -194,24 +549,27 @@ class SimpleTests(unittest.TestCase):
)
np.testing.assert_equal(converted_global, {
- "test_environment": ChainMap(dict_env, ChainMap(global_env))
+ "test_environment": ChainMap(dict_env, ChainMap(global_env)),
})
def test_emptyenv(self) -> None:
+ """Test parsing the empty environment."""
parsed = rdata.parser.parse_file(
- TESTDATA_PATH / "test_emptyenv.rda")
+ TESTDATA_PATH / "test_emptyenv.rda",
+ )
converted = rdata.conversion.convert(parsed)
- np.testing.assert_equal(converted, {
- "test_emptyenv": ChainMap({})
+ self.assertEqual(converted, {
+ "test_emptyenv": ChainMap({}),
})
def test_list_attrs(self) -> None:
+ """Test that lists accept attributes."""
parsed = rdata.parser.parse_file(TESTDATA_PATH / "test_list_attrs.rda")
converted = rdata.conversion.convert(parsed)
np.testing.assert_equal(converted, {
- "test_list_attrs": [['list'], [5]]
+ "test_list_attrs": [['list'], [5]],
})
def test_altrep_compact_intseq(self) -> None:
@@ -244,7 +602,7 @@ class SimpleTests(unittest.TestCase):
converted = rdata.conversion.convert(parsed)
np.testing.assert_equal(converted, {
- "test_altrep_deferred_string": [
+ "test_altrep_deferred_string": [ # noqa: WPS317
"1", "2.3", "10000",
"1e+05", "-10000", "-1e+05",
"0.001", "1e-04", "1e-05",
@@ -286,5 +644,4 @@ class SimpleTests(unittest.TestCase):
if __name__ == "__main__":
- # import sys;sys.argv = ['', 'Test.testName']
unittest.main()
diff --git a/setup.cfg b/setup.cfg
index 46e0513..a53dbc5 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -11,6 +11,127 @@ include_trailing_comma = true
use_parentheses = true
combine_as_imports = 1
+[flake8]
+ignore =
+ # No docstring for magic methods
+ D105,
+ # No docstrings in __init__
+ D107,
+ # Ignore until https://github.com/terrencepreilly/darglint/issues/54 is closed
+ DAR202,
+ # Ignore until https://github.com/terrencepreilly/darglint/issues/144 is closed
+ DAR401,
+ # Non-explicit exceptions may be documented in raises
+ DAR402,
+ # Uppercase arguments like X are common in scikit-learn
+ N803,
+ # Uppercase variables like X are common in scikit-learn
+ N806,
+ # There are no bad quotes
+ Q000,
+ # Google Python style is not RST until after processed by Napoleon
+ # See https://github.com/peterjc/flake8-rst-docstrings/issues/17
+ RST201, RST203, RST301,
+ # assert is used by pytest tests
+ S101,
+ # Line break occurred before a binary operator (antipattern)
+ W503,
+ # Utils is used as a module name
+ WPS100,
+ # Short names like X or y are common in scikit-learn
+ WPS111,
+ # We do not like this underscored numbers convention
+ WPS114,
+ # Attributes in uppercase are used in enums
+ WPS115,
+ # Trailing underscores are a scikit-learn convention
+ WPS120,
+ # Cognitive complexity cannot be avoided at some modules
+ WPS232,
+ # The number of imported things may be large, especially for typing
+ WPS235,
+ # We like local imports, thanks
+ WPS300,
+ # Dotted imports are ok
+ WPS301,
+ # We love f-strings
+ WPS305,
+ # Implicit string concatenation is useful for exception messages
+ WPS306,
+ # No base class needed
+ WPS326,
+ # We allow multiline conditions
+ WPS337,
+ # We order methods differently
+ WPS338,
+ # We need multine loops
+ WPS352,
+ # Assign to a subcript slice is normal behaviour in numpy
+ WPS362,
+ # All keywords are beautiful
+ WPS420,
+ # We use nested imports sometimes, and it is not THAT bad
+ WPS433,
+ # We use list multiplication to allocate list with immutable values (None or numbers)
+ WPS435,
+ # Our private modules are fine to import
+ # (check https://github.com/wemake-services/wemake-python-styleguide/issues/1441)
+ WPS436,
+ # Our private objects are fine to import
+ WPS450,
+ # Numpy mixes bitwise and comparison operators
+ WPS465,
+ # Explicit len compare is better than implicit
+ WPS507,
+ # Comparison with not is not the same as with equality
+ WPS520,
+
+per-file-ignores =
+ __init__.py:
+ # Unused modules are allowed in `__init__.py`, to reduce imports
+ F401,
+ # Explicit re-exports allowed in __init__
+ WPS113,
+ # Import multiple names is allowed in `__init__.py`
+ WPS235,
+ # Logic is allowed in `__init__.py`
+ WPS412
+
+ # Tests benefit from overused expressions, magic numbers and fixtures
+ test_*.py: WPS204, WPS432, WPS442
+
+rst-directives =
+ # These are sorted alphabetically - but that does not matter
+ autosummary,data,currentmodule,deprecated,
+ glossary,moduleauthor,plot,testcode,
+ versionadded,versionchanged,
+
+rst-roles =
+ attr,class,func,meth,mod,obj,ref,term,
+
+allowed-domain-names = data, info, obj, result, results, val, value, values, var
+
+# Needs to be tuned
+max-arguments = 10
+max-attributes = 10
+max-cognitive-score = 30
+max-expressions = 15
+max-imports = 20
+max-line-complexity = 30
+max-local-variables = 15
+max-methods = 30
+max-module-expressions = 15
+max-module-members = 15
+max-string-usages = 10
+
+ignore-decorators = (property)|(overload)
+
+strictness = long
+
+# Beautify output and make it more informative
+format = wemake
+show-source = true
+
[mypy]
strict = True
strict_equality = True
diff --git a/setup.py b/setup.py
index ff0d6d5..2e8bfe4 100644
--- a/setup.py
+++ b/setup.py
@@ -7,6 +7,7 @@ This package parses .rda datasets used in R. It does not depend on the R
language or its libraries, and thus it is released under a MIT license.
"""
import os
+import pathlib
import sys
from setuptools import find_packages, setup
@@ -16,44 +17,51 @@ pytest_runner = ['pytest-runner'] if needs_pytest else []
DOCLINES = (__doc__ or '').split("\n")
-with open(os.path.join(os.path.dirname(__file__),
- 'VERSION'), 'r') as version_file:
+with open(
+ pathlib.Path(os.path.dirname(__file__)) / 'rdata' / 'VERSION',
+ 'r',
+) as version_file:
version = version_file.read().strip()
-setup(name='rdata',
- version=version,
- description=DOCLINES[1],
- long_description="\n".join(DOCLINES[3:]),
- url='https://github.com/vnmabus/rdata',
- author='Carlos Ramos Carreño',
- author_email='vnmabus@gmail.com',
- include_package_data=True,
- platforms=['any'],
- license='MIT',
- packages=find_packages(),
- python_requires='>=3.7, <4',
- classifiers=[
- 'Development Status :: 4 - Beta',
- 'Intended Audience :: Developers',
- 'Intended Audience :: Science/Research',
- 'License :: OSI Approved :: MIT License',
- 'Natural Language :: English',
- 'Operating System :: OS Independent',
- 'Programming Language :: Python :: 3',
- 'Programming Language :: Python :: 3.6',
- 'Programming Language :: Python :: 3.7',
- 'Programming Language :: Python :: 3.8',
- 'Topic :: Scientific/Engineering :: Mathematics',
- 'Topic :: Software Development :: Libraries :: Python Modules',
- 'Typing :: Typed',
- ],
- keywords=['rdata', 'r', 'dataset'],
- install_requires=['numpy',
- 'xarray',
- 'pandas'],
- setup_requires=pytest_runner,
- tests_require=['pytest-cov',
- 'numpy>=1.14' # The printing format for numpy changes
- ],
- test_suite='rdata.tests',
- zip_safe=False)
+setup(
+ name='rdata',
+ version=version,
+ description=DOCLINES[1],
+ long_description="\n".join(DOCLINES[3:]),
+ url='https://github.com/vnmabus/rdata',
+ author='Carlos Ramos Carreño',
+ author_email='vnmabus@gmail.com',
+ include_package_data=True,
+ platforms=['any'],
+ license='MIT',
+ packages=find_packages(),
+ python_requires='>=3.7, <4',
+ classifiers=[
+ 'Development Status :: 4 - Beta',
+ 'Intended Audience :: Developers',
+ 'Intended Audience :: Science/Research',
+ 'License :: OSI Approved :: MIT License',
+ 'Natural Language :: English',
+ 'Operating System :: OS Independent',
+ 'Programming Language :: Python :: 3',
+ 'Programming Language :: Python :: 3.6',
+ 'Programming Language :: Python :: 3.7',
+ 'Programming Language :: Python :: 3.8',
+ 'Topic :: Scientific/Engineering :: Mathematics',
+ 'Topic :: Software Development :: Libraries :: Python Modules',
+ 'Typing :: Typed',
+ ],
+ keywords=['rdata', 'r', 'dataset'],
+ install_requires=[
+ 'numpy',
+ 'xarray',
+ 'pandas',
+ ],
+ setup_requires=pytest_runner,
+ tests_require=[
+ 'pytest-cov',
+ 'numpy>=1.14', # The printing format for numpy changes
+ ],
+ test_suite='rdata.tests',
+ zip_safe=False,
+)
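The environment tests in the diff above expect a converted R environment to be a `ChainMap` whose parent map is the enclosing (global) environment. As a rough illustration using only the standard library, the sketch below reuses the values from `test_environment` and `test_emptyenv`; it makes no call to rdata itself, and the names `local_value`, `inherited_value` and `empty_env_matches` are introduced here purely for demonstration.

```python
from collections import ChainMap

# Values copied from test_environment in the diff; rdata is not invoked here.
dict_env = {"string": ["test"]}
global_env = {"global": "test"}

# Inner environment first, enclosing environment as a nested ChainMap.
env = ChainMap(dict_env, ChainMap(global_env))

local_value = env["string"]      # found in the inner map
inherited_value = env["global"]  # falls through to the parent map

# An empty environment compares equal to ChainMap({}), as in test_emptyenv.
empty_env_matches = ChainMap() == ChainMap({})
```

Lookups are tried map by map, so a name defined in the inner environment shadows the same name in the parent, which is roughly how scoping works for nested R environments.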
Debdiff
[The following lists of changes regard files as different if they have different names, permissions or owners.]
Files in second set of .debs but not in first
-rw-r--r-- root/root /usr/lib/python3/dist-packages/rdata/VERSION
No differences were encountered in the control files
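The new `rdata/VERSION` file in the binary package matches the `setup.py` change in the diff, which now reads the version from `rdata/VERSION` instead of a top-level `VERSION` file. A minimal sketch of that lookup follows, assuming a hypothetical helper `read_version` and a temporary directory standing in for the package; the string `0.9` is taken from the summary at the top of this page, not from the file itself, whose contents are not shown here.

```python
import pathlib
import tempfile


def read_version(package_dir: pathlib.Path) -> str:
    """Return the stripped contents of package_dir / 'VERSION'.

    Hypothetical helper for illustration; setup.py in the diff inlines
    the equivalent open()/read()/strip() calls.
    """
    return (package_dir / "VERSION").read_text().strip()


# Demonstration with a throwaway directory standing in for the rdata package.
with tempfile.TemporaryDirectory() as tmp:
    package_dir = pathlib.Path(tmp) / "rdata"
    package_dir.mkdir()
    (package_dir / "VERSION").write_text("0.9\n")  # assumed content
    demo_version = read_version(package_dir)
```

Keeping the file inside the package directory means it is shipped with the installed module, which is consistent with the extra file listed in the debdiff.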