|build-status| |docs| |coverage| |landscape| |pypi| |zenodo|

Read R datasets from Python.

..
	GitHub does not support include in README for dubious security reasons, so
	we copy-paste instead. Also GitHub does not understand Sphinx directives.
	.. include:: docs/simpleusage.rst

rdata is on PyPI and can be installed using :code:`pip`:

.. code::

   pip install rdata

It is also available for :code:`conda` using the :code:`conda-forge` channel:

.. code::

   conda install -c conda-forge rdata


The documentation of rdata is in
`ReadTheDocs <>`_.

Simple usage
============

Read an R dataset
-----------------

The usual way to read an R dataset is as follows:

>>> import rdata

>>> parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH / "test_vector.rda")
>>> converted = rdata.conversion.convert(parsed)
>>> converted
{'test_vector': array([1., 2., 3.])}

This consists of two steps:

#. First, the file is parsed using the function
   `parse_file`. This provides a literal description of the
   file contents as a hierarchy of Python objects representing the basic R
   objects. This step is unambiguous and always the same.
#. Then, each object must be converted to an appropriate Python object. In this
   step there are several choices as to which Python type best represents a
   given R object. Thus, we provide a default `convert` routine, which tries to
   select Python objects that preserve most of the information in the original
   R object. For custom R classes, it is also possible to specify conversion
   routines to Python objects.

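
The two steps above can also be wrapped in a small helper. The following is a
minimal sketch, not part of rdata itself: `load_r_file` and `describe` are
hypothetical names introduced here for illustration, and only `parse_file` and
`convert` come from the library.

.. code:: python

   from pathlib import Path
   from typing import Any, Dict


   def load_r_file(path: Path) -> Dict[str, Any]:
       """Parse an R data file and convert it with the default routine."""
       # Imported lazily so that `describe` below works without rdata.
       import rdata

       # Step 1: literal description of the file contents.
       parsed = rdata.parser.parse_file(path)
       # Step 2: default conversion to Python objects.
       return rdata.conversion.convert(parsed)


   def describe(converted: Dict[str, Any]) -> Dict[str, str]:
       """Map each variable name to the name of its Python type."""
       return {name: type(value).__name__ for name, value in converted.items()}

Calling `describe(load_r_file(some_path))` gives a quick overview of which
Python types the default conversion selected for each variable in the file.
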

Convert custom R classes
------------------------

The basic `convert` routine only constructs a
`SimpleConverter` object and calls its
`convert` method. All arguments of
`convert` are passed directly to the
`SimpleConverter` initialization method.

It is possible, although not trivial, to make a custom
`Converter` object to change the way in which the
basic R objects are transformed to Python objects. However, a more common
situation is that one does not want to change how basic R objects are
converted, but instead wants to provide conversions for specific R classes.
This can be done by passing a dictionary to the
`SimpleConverter` initialization method, with the names of R classes as keys
and, as values, callables that convert an R object of that class to a Python
object. By default, the dictionary used is `DEFAULT_CLASS_MAP`, which can
convert commonly used R classes such as `data.frame` and `factor`.
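
The dictionary lookup described above can be pictured with a small,
self-contained sketch. It is an illustration only, not rdata's implementation:
`apply_class_map`, `upper_constructor` and the `shouty` class are hypothetical,
and the sketch assumes that the R class names are available as a list under the
`class` key of the attributes. Constructors keep the `(obj, attrs)` signature
used in this document.

.. code:: python

   from typing import Any, Callable, Mapping

   # A constructor receives the basic converted object and its R attributes.
   Constructor = Callable[[Any, Mapping[str, Any]], Any]


   def apply_class_map(
       obj: Any,
       attrs: Mapping[str, Any],
       class_map: Mapping[str, Constructor],
   ) -> Any:
       """Convert `obj` with the first R class found in `class_map`."""
       for r_class in attrs.get("class", []):
           constructor = class_map.get(r_class)
           if constructor is not None:
               return constructor(obj, attrs)
       # No registered class matched: keep the basic conversion.
       return obj


   def upper_constructor(obj, attrs):
       """Hypothetical constructor that upper-cases a list of strings."""
       return [s.upper() for s in obj]


   class_map = {"shouty": upper_constructor}
   result = apply_class_map(["a", "b"], {"class": ["shouty"]}, class_map)
   untouched = apply_class_map(["a", "b"], {}, class_map)

Here `result` has been built by `upper_constructor`, while `untouched` is
returned as-is because no registered class applies to it.
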

As an example, here is how we would implement a conversion routine for the
factor class to `bytes` objects, instead of the default conversion to
Pandas `Categorical` objects:

>>> import rdata

>>> def factor_constructor(obj, attrs):
...     values = [bytes(attrs['levels'][i - 1], 'utf8')
...               if i > 0 else None for i in obj]
...     return values

>>> new_dict = {
...         **rdata.conversion.DEFAULT_CLASS_MAP,
...         "factor": factor_constructor
...         }

>>> parsed = rdata.parser.parse_file(rdata.TESTDATA_PATH
...                                  / "test_dataframe.rda")
>>> converted = rdata.conversion.convert(parsed, new_dict)
>>> converted
{'test_dataframe':   class  value
    1     b'a'      1
    2     b'b'      2
    3     b'b'      3}
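
Because a constructor is a plain callable, it can be exercised without an R
file. The sketch below is a standalone variant of the factor conversion above,
run on hand-written data. The name `factor_to_bytes` is hypothetical, and the
sketch assumes that a missing value reaches the constructor as a non-positive
integer code (R itself stores an integer `NA` as the smallest 32-bit integer).

.. code:: python

   def factor_to_bytes(codes, attrs):
       """Turn 1-based R factor codes into `bytes` labels, None if missing."""
       levels = attrs["levels"]
       return [
           levels[code - 1].encode("utf8") if code > 0 else None
           for code in codes
       ]


   # R's integer NA; assumed here to be passed through as a plain integer.
   NA_INTEGER = -2**31

   labels = factor_to_bytes([1, 2, 2, NA_INTEGER], {"levels": ["a", "b"]})

With these inputs, the three valid codes map to their level labels and the
missing code maps to `None`.
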

.. |build-status| image::
    :alt: build status
    :scale: 100%

.. |docs| image::
    :alt: Documentation Status
    :scale: 100%

.. |coverage| image::
    :alt: Coverage Status
    :scale: 100%

.. |landscape| image::
    :alt: Code Health

.. |pypi| image::
    :alt: Pypi version
    :scale: 100%

.. |zenodo| image::
    :alt: Zenodo DOI
    :scale: 100%