Metadata-Version: 2.1
Name: apertium-streamparser
Version: 5.0.2
Summary: Python library to parse Apertium stream format
Author: Sushain K. Cherivirala
License: GPLv3+
Keywords: apertium parsing linguistics
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Topic :: Text Processing :: Linguistic
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Python: >=3.4
Description-Content-Type: text/markdown; charset=UTF-8
License-File: LICENSE

# Apertium Streamparser

Python 3 library to parse the [Apertium stream format][1] into `LexicalUnit`s.

## Installation

Streamparser is available on [PyPI][2]:

    $ pip install apertium-streamparser
    $ apertium-streamparser
    [[SReading(baseform='vino', tags=['n', 'm', 'sg'])], [SReading(baseform='venir', tags=['vblex', 'ifi', 'p3', 'sg'])]]

Installing from PyPI also installs the `streamparser` Python module.

## Usage

### As a library

#### With string input

    >>> from streamparser import parse
    >>> lexical_units = parse(r'^hypercholesterolemia/*hypercholesterolemia$\[\]\^\$[^ignoreme/yesreally$]^a\/s/a\/s<n><nt>$^vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>$.eefe^dímelo/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><nt>/decir<vblex><imp><p2><sg>+me<prn><enc><p1><mf><sg>+lo<prn><enc><p3><m><sg>$')
    >>> for lexical_unit in lexical_units:
    ...     print('%s (%s) → %s' % (lexical_unit.wordform, lexical_unit.knownness, lexical_unit.readings))
    ...
    hypercholesterolemia (<class 'streamparser.unknown'>) → [[SReading(baseform='*hypercholesterolemia', tags=[])]]
    a\/s (<class 'streamparser.known'>) → [[SReading(baseform='a\\/s', tags=['n', 'nt'])]]
    vino (<class 'streamparser.known'>) → [[SReading(baseform='vino', tags=['n', 'm', 'sg'])], [SReading(baseform='venir', tags=['vblex', 'ifi', 'p3', 'sg'])]]
    dímelo (<class 'streamparser.known'>) → [[SReading(baseform='decir', tags=['vblex', 'imp', 'p2', 'sg']), SReading(baseform='me', tags=['prn', 'enc', 'p1', 'mf', 'sg']), SReading(baseform='lo', tags=['prn', 'enc', 'p3', 'nt'])], [SReading(baseform='decir', tags=['vblex', 'imp', 'p2', 'sg']), SReading(baseform='me', tags=['prn', 'enc', 'p1', 'mf', 'sg']), SReading(baseform='lo', tags=['prn', 'enc', 'p3', 'm', 'sg'])]]
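Each lexical unit has the shape `^wordform/analysis/analysis$`, and a single analysis may join several subreadings with `+` (as in *dímelo* above). The sketch below splits the body of one unit along those separators while keeping backslash escapes such as `\/` intact. It is a hypothetical illustration of the format, not streamparser's own implementation: `parse_unit` and `split_unescaped` are invented names, `SReading` here is a plain `namedtuple` that only mimics the repr shown above, and the sketch assumes baseforms contain no literal `<`.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a subreading: a baseform plus its list of tags.
SReading = namedtuple('SReading', ['baseform', 'tags'])

TAG = re.compile(r'<([^<>]*)>')


def split_unescaped(text, sep):
    """Split text on sep, ignoring separators escaped with a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        char = text[i]
        if char == '\\' and i + 1 < len(text):
            # Keep the escape sequence verbatim (e.g. '\/' stays '\/').
            current.append(text[i:i + 2])
            i += 2
            continue
        if char == sep:
            parts.append(''.join(current))
            current = []
        else:
            current.append(char)
        i += 1
    parts.append(''.join(current))
    return parts


def parse_unit(body):
    """Parse the body of one ^...$ unit into (wordform, readings).

    Each reading is a list of SReading subreadings joined by '+'.
    Assumes the baseform ends at the first '<' (no literal '<' in baseforms).
    """
    wordform, *analyses = split_unescaped(body, '/')
    readings = []
    for analysis in analyses:
        subreadings = []
        for sub in split_unescaped(analysis, '+'):
            first_tag = sub.find('<')
            baseform = sub if first_tag == -1 else sub[:first_tag]
            subreadings.append(SReading(baseform, TAG.findall(sub)))
        readings.append(subreadings)
    return wordform, readings


if __name__ == '__main__':
    print(parse_unit('vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>'))
```

An unknown word such as `hypercholesterolemia/*hypercholesterolemia` comes back as a single reading whose baseform keeps the leading `*` and has no tags, which matches the shape of the library output above.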

#### With file input

    >>> import os
    >>> from streamparser import parse_file
    >>> with open(os.path.expanduser('~/Downloads/analyzed.txt')) as analyzed:
    ...     for lexical_unit in parse_file(analyzed):
    ...         print('%s (%s) → %s' % (lexical_unit.wordform, lexical_unit.knownness, lexical_unit.readings))
    ...
    Høgre (<class 'streamparser.known'>) → [[SReading(baseform='Høgre', tags=['np'])], [SReading(baseform='høgre', tags=['n', 'nt', 'sp'])], [SReading(baseform='høg', tags=['un', 'sint', 'sp', 'comp', 'adj'])], [SReading(baseform='høgre', tags=['f', 'n', 'ind', 'sg'])], [SReading(baseform='høgre', tags=['f', 'n', 'ind', 'sg'])], [SReading(baseform='høgre', tags=['sg', 'nt', 'ind', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['mf', 'sg', 'ind', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['un', 'ind', 'pl', 'posi', 'adj'])], [SReading(baseform='høgre', tags=['un', 'def', 'sp', 'posi', 'adj'])]]
    kolonne (<class 'streamparser.known'>) → [[SReading(baseform='kolonne', tags=['m', 'n', 'ind', 'sg'])], [SReading(baseform='kolonne', tags=['m', 'n', 'ind', 'sg'])]]
    Grunnprinsipp (<class 'streamparser.known'>) → [[SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'ind', 'sg'])], [SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'pl', 'ind'])], [SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'ind', 'sg'])], [SReading(baseform='grunnprinsipp', tags=['n', 'nt', 'pl', 'ind'])]]
    7 (<class 'streamparser.known'>) → [[SReading(baseform='7', tags=['qnt', 'pl', 'det'])]]
    px (<class 'streamparser.unknown'>) → []
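In a real stream, units are interleaved with blank text, escaped characters and `[...]` superblanks; in the string example, the `\[\]\^\$`, `[^ignoreme/yesreally$]` and `.eefe` pieces produce no lexical units. A minimal sketch of locating units in any text stream, such as an open file, follows. `iter_units` is a hypothetical name, and the scanner reflects an assumption about the format rather than the library's actual code.

```python
import io


def iter_units(stream):
    """Yield the body of every ^...$ lexical unit in a text stream.

    Blank text, escaped characters and [superblanks] outside units are
    skipped; an unterminated unit at end of input is discarded.
    """
    in_unit = in_superblank = False
    unit = []
    while True:
        char = stream.read(1)
        if not char:
            break
        if char == '\\':
            # Consume the escaped character so it cannot open or close anything.
            escaped = char + stream.read(1)
            if in_unit:
                unit.append(escaped)  # keep escapes inside a unit verbatim
            continue
        if in_unit:
            if char == '$':
                yield ''.join(unit)
                unit, in_unit = [], False
            else:
                unit.append(char)
        elif in_superblank:
            # Everything up to the closing ']' belongs to the superblank.
            in_superblank = char != ']'
        elif char == '[':
            in_superblank = True
        elif char == '^':
            in_unit = True


if __name__ == '__main__':
    sample = io.StringIO(r'^vino/vino<n><m><sg>$ [^skip/me$] \^ ^a\/s/a\/s<n><nt>$')
    for body in iter_units(sample):
        print(body)
```

Reading one character at a time avoids loading a whole file into memory, at some cost in speed; each yielded body could then be split into readings along the `/` and `+` separators.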

### From the terminal

#### With standard input

    $ bzcat ~/corpora/nnclean2.txt.bz2 | apertium-deshtml | lt-proc -we /usr/share/apertium/apertium-nno/nno.automorf.bin | apertium-streamparser
    [[SReading(baseform='Høgre', tags=['np'])],
     [SReading(baseform='høgre', tags=['n', 'sp', 'nt'])],
     [SReading(baseform='høg', tags=['un', 'sp', 'adj', 'comp', 'sint'])],
     [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
     [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
     [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'nt', 'sg'])],
     [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'mf', 'sg'])],
     [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'un', 'pl'])],
     [SReading(baseform='høgre', tags=['posi', 'def', 'sp', 'adj', 'un'])]]
    [[SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])],
     [SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])]]

#### With file input in the terminal

    $ bzcat ~/corpora/nnclean2.txt.bz2 | apertium-deshtml | lt-proc -we /usr/share/apertium/apertium-nno/nno.automorf.bin > analyzed.txt
    $ apertium-streamparser analyzed.txt
    [[SReading(baseform='Høgre', tags=['np'])],
     [SReading(baseform='høgre', tags=['n', 'sp', 'nt'])],
     [SReading(baseform='høg', tags=['un', 'sp', 'adj', 'comp', 'sint'])],
     [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
     [SReading(baseform='høgre', tags=['n', 'f', 'ind', 'sg'])],
     [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'nt', 'sg'])],
     [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'mf', 'sg'])],
     [SReading(baseform='høgre', tags=['posi', 'ind', 'adj', 'un', 'pl'])],
     [SReading(baseform='høgre', tags=['posi', 'def', 'sp', 'adj', 'un'])]]
    [[SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])],
     [SReading(baseform='kolonne', tags=['n', 'm', 'ind', 'sg'])]]

## Contributing

Streamparser uses [Travis CI][3] for continuous integration. Locally, run
`make test` to perform the same checks. Run `pipenv install --dev` to
install the development dependencies, e.g. linters.
