New Upstream Release - hachoir
Ready changes
Summary
Merged new upstream version: 3.2.0+dfsg (was: 3.1.0+dfsg).
Resulting package
Built on 2023-01-31T19:01 (took 4m25s)
The resulting binary packages can be installed (if you have the apt repository enabled) by running:
apt install -t fresh-releases hachoir
Lintian Result
Diff
diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
new file mode 100644
index 00000000..89b133ff
--- /dev/null
+++ b/.github/workflows/build.yml
@@ -0,0 +1,28 @@
+name: Build
+
+on:
+ push:
+ branches: [main]
+ pull_request:
+ branches: [main]
+
+jobs:
+ build:
+ runs-on: ${{ matrix.os }}
+ strategy:
+ matrix:
+ os: [ubuntu-latest]
+ python: ['3.11']
+
+ steps:
+ # https://github.com/actions/checkout
+ - uses: actions/checkout@v3
+ - name: Setup Python
+ # https://github.com/actions/setup-python
+ uses: actions/setup-python@v4
+ with:
+ python-version: ${{ matrix.python }}
+ - name: Install Tox and any other packages
+ run: pip install tox
+ - name: Run Tox
+ run: tox
diff --git a/.travis.yml b/.travis.yml
deleted file mode 100644
index b04fbdd5..00000000
--- a/.travis.yml
+++ /dev/null
@@ -1,7 +0,0 @@
-language: python
-env:
- - TOXENV=py36
- - TOXENV=doc
- - TOXENV=pep8
-install: pip install -U tox
-script: tox
diff --git a/README.rst b/README.rst
index fb2e8844..da6d8f33 100644
--- a/README.rst
+++ b/README.rst
@@ -6,9 +6,9 @@ Hachoir
:alt: Latest release on the Python Cheeseshop (PyPI)
:target: https://pypi.python.org/pypi/hachoir
-.. image:: https://travis-ci.org/vstinner/hachoir.svg?branch=master
- :alt: Build status of hachoir on Travis CI
- :target: https://travis-ci.org/vstinner/hachoir
+.. image:: https://github.com/vstinner/hachoir/actions/workflows/build.yml/badge.svg
+ :alt: Build status of hachoir on GitHub Actions
+ :target: https://github.com/vstinner/hachoir/actions
.. image:: http://unmaintained.tech/badge.svg
:target: http://unmaintained.tech/
diff --git a/debian/changelog b/debian/changelog
index f3d2b500..76cb8220 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,8 +1,9 @@
-hachoir (3.1.0+dfsg-6) UNRELEASED; urgency=medium
+hachoir (3.2.0+dfsg-1) UNRELEASED; urgency=medium
* Update standards version to 4.6.1, no changes needed.
+ * New upstream release.
- -- Debian Janitor <janitor@jelmer.uk> Mon, 07 Nov 2022 17:56:56 -0000
+ -- Debian Janitor <janitor@jelmer.uk> Tue, 31 Jan 2023 18:58:22 -0000
hachoir (3.1.0+dfsg-5) unstable; urgency=medium
diff --git a/doc/changelog.rst b/doc/changelog.rst
index d5468daa..0773e901 100644
--- a/doc/changelog.rst
+++ b/doc/changelog.rst
@@ -2,6 +2,38 @@
Changelog
+++++++++
+hachoir 3.2.0 (2022-11-27)
+==========================
+
+* Fix hachoir-grep command line parsing.
+* PYC parser supports Python 3.12.
+
+hachoir 3.1.3 (2022-04-04)
+==========================
+
+* The development branch ``master`` was renamed to ``main``.
+ See https://sfconservancy.org/news/2020/jun/23/gitbranchname/ for the
+ rationale.
+* Replace Travis CI with GitHub Actions.
+* ttf: Support OpenType magic number (OTTO).
+* hachoir-wx: Load darkdetect and test once, fallback if not found.
+* Add hachoir-wx docs.
+* jpeg: Set the size of a JpegImageData with no terminator to the
+ remaining length in the stream to avoid parsing subfields of the JpegImageData
+ if possible.
+* fit: Add parser of Garmin fit files.
+* lzx: Fix LZX decompression.
+
+hachoir 3.1.2 (2020-02-15)
+==========================
+
+* Fix a SyntaxWarning in the PDF parser.
+
+hachoir 3.1.1 (2020-01-06)
+==========================
+
+* Fix hachoir-wx
+
hachoir 3.1.0 (2019-10-28)
==========================
diff --git a/doc/conf.py b/doc/conf.py
index 808c86de..d0150450 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -55,7 +55,7 @@ copyright = u'2014, Victor Stinner'
#
# The short X.Y version.
# The full version, including alpha/beta/rc tags.
-version = release = '3.0a6'
+version = release = '3.2.0'
# The language for content autogenerated by Sphinx. Refer to documentation
diff --git a/doc/images/wx.png b/doc/images/wx.png
new file mode 100644
index 00000000..07782b33
Binary files /dev/null and b/doc/images/wx.png differ
diff --git a/doc/index.rst b/doc/index.rst
index 693c6fd7..2fd59257 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -16,6 +16,7 @@ Command line tools using Hachoir parsers:
* :ref:`hachoir-metadata <metadata>`: get metadata from binary files
* :ref:`hachoir-urwid <urwid>`: display the content of a binary file in text mode
+* :ref:`hachoir-wx <wx>`: display the content of a binary file in GUI mode
* :ref:`hachoir-grep <grep>`: find a text pattern in a binary file
* :ref:`hachoir-strip <strip>`: modify a file to remove metadata
@@ -32,6 +33,7 @@ User Guide
install
metadata
urwid
+ wx
subfile
grep
strip
diff --git a/doc/install.rst b/doc/install.rst
index 2a81b368..308b0fc9 100644
--- a/doc/install.rst
+++ b/doc/install.rst
@@ -7,7 +7,7 @@ To install Hachoir, type::
python3 -m pip install -U hachoir
To use hachoir-urwid, you will also need to install `urwid library
-<http://excess.org/urwid/>`_::
+<http://urwid.org/>`_::
python3 -m pip install -U urwid
diff --git a/doc/metadata.rst b/doc/metadata.rst
index 20e349e4..ffdb9126 100644
--- a/doc/metadata.rst
+++ b/doc/metadata.rst
@@ -140,25 +140,6 @@ Video
Command line options
====================
-Modes --mime and --type
-=======================
-
-Option --mime ask to just display file MIME type (works like UNIX
-"file --mime" program)::
-
- $ hachoir-metadata --mime logo-Kubuntu.png sheep_on_drugs.mp3 wormux_32x32_16c.ico
- logo-Kubuntu.png: image/png
- sheep_on_drugs.mp3: audio/mpeg
- wormux_32x32_16c.ico: image/x-ico
-
-Option --file display short description of file type (works like
-UNIX "file" program)::
-
- $ hachoir-metadata --type logo-Kubuntu.png sheep_on_drugs.mp3 wormux_32x32_16c.ico
- logo-Kubuntu.png: PNG picture: 331x90x8 (alpha layer)
- sheep_on_drugs.mp3: MPEG v1 layer III, 128.0 Kbit/sec, 44.1 KHz, Joint stereo
- wormux_32x32_16c.ico: Microsoft Windows icon: 16x16x32
-
Modes --mime and --type
-----------------------
@@ -171,7 +152,7 @@ Option ``--mime`` ask to just display file MIME type::
(it works like UNIX "file --mime" program)
-Option ``--file`` display short description of file type::
+Option ``--type`` display short description of file type::
$ hachoir-metadata --type logo-Kubuntu.png sheep_on_drugs.mp3 wormux_32x32_16c.ico
logo-Kubuntu.png: PNG picture: 331x90x8 (alpha layer)
diff --git a/doc/wx.rst b/doc/wx.rst
new file mode 100644
index 00000000..429ade8e
--- /dev/null
+++ b/doc/wx.rst
@@ -0,0 +1,25 @@
+.. _wx:
+
+++++++++++++++++++
+hachoir-wx program
+++++++++++++++++++
+
+hachoir-wx is a graphical binary file explorer and hex viewer, which uses the
+Hachoir library to parse the files and the WxPython library to create the user
+interface.
+
+Before use, make sure to install the required dependencies with ``pip install
+hachoir[wx]``. On Mac OS and Windows, this will install WxPython. On Linux, you
+may need to install a version of WxPython using your distribution's package manager
+or from the `WxPython Download page <https://www.wxpython.org/pages/downloads/>`_.
+
+.. image:: images/wx.png
+ :alt: hachoir-wx screenshot (MP3 audio file)
+
+Command line options
+====================
+
+* ``--preload=10``: Load 10 fields when loading a new field set
+* ``--path="/header/bpp"``: Open the specified path and focus on the field
+* ``--parser=PARSERID``: Force a parser (and skip parser validation)
+* ``--help``: Show all command line options
diff --git a/hachoir/__init__.py b/hachoir/__init__.py
index d34f5e55..4e2da0af 100644
--- a/hachoir/__init__.py
+++ b/hachoir/__init__.py
@@ -1,2 +1,2 @@
-VERSION = (3, 1, 0)
+VERSION = (3, 2, 0)
__version__ = ".".join(map(str, VERSION))
diff --git a/hachoir/core/dict.py b/hachoir/core/dict.py
index 053bd4b2..c55e3f4c 100644
--- a/hachoir/core/dict.py
+++ b/hachoir/core/dict.py
@@ -168,7 +168,7 @@ class Dict(object):
_index = index
if index < 0:
index += len(self._value_list)
- if not(0 <= index <= len(self._value_list)):
+ if not (0 <= index <= len(self._value_list)):
raise IndexError("Insert error: index '%s' is invalid" % _index)
for item_key, item_index in self._index.items():
if item_index >= index:
diff --git a/hachoir/core/tools.py b/hachoir/core/tools.py
index 7655c0e8..43575f22 100644
--- a/hachoir/core/tools.py
+++ b/hachoir/core/tools.py
@@ -493,7 +493,7 @@ def timestampUNIX(value):
"""
if not isinstance(value, (float, int)):
raise TypeError("timestampUNIX(): an integer or float is required")
- if not(0 <= value <= 2147483647):
+ if not (0 <= value <= 2147483647):
raise ValueError("timestampUNIX(): value have to be in 0..2147483647")
return UNIX_TIMESTAMP_T0 + timedelta(seconds=value)
@@ -514,7 +514,7 @@ def timestampMac32(value):
"""
if not isinstance(value, (float, int)):
raise TypeError("an integer or float is required")
- if not(0 <= value <= 4294967295):
+ if not (0 <= value <= 4294967295):
return "invalid Mac timestamp (%s)" % value
return MAC_TIMESTAMP_T0 + timedelta(seconds=value)
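Beyond the whitespace fix, the helpers touched here are small epoch converters. A minimal standalone sketch of the Mac 32-bit conversion, assuming `MAC_TIMESTAMP_T0` is the classic Mac epoch (1904-01-01):

```python
from datetime import datetime, timedelta

# Classic Mac OS counts seconds from 1904-01-01 (assumption: this matches
# hachoir's MAC_TIMESTAMP_T0 constant).
MAC_TIMESTAMP_T0 = datetime(1904, 1, 1)


def timestamp_mac32(value):
    """Convert a 32-bit Mac timestamp to a datetime."""
    if not isinstance(value, (float, int)):
        raise TypeError("an integer or float is required")
    if not (0 <= value <= 4294967295):  # must fit in an unsigned 32-bit field
        return "invalid Mac timestamp (%s)" % value
    return MAC_TIMESTAMP_T0 + timedelta(seconds=value)
```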
diff --git a/hachoir/editor/field.py b/hachoir/editor/field.py
index 0724c230..f3d43c0d 100644
--- a/hachoir/editor/field.py
+++ b/hachoir/editor/field.py
@@ -63,7 +63,7 @@ class FakeField(object):
addr = self._parent._getFieldInputAddress(self._name)
input = self._parent.input
stream = input.stream
- if size % 8:
+ if size % 8 or addr % 8:
output.copyBitsFrom(stream, addr, size, input.endian)
else:
output.copyBytesFrom(stream, addr, size // 8)
diff --git a/hachoir/editor/typed_field.py b/hachoir/editor/typed_field.py
index 38d63792..fe646962 100644
--- a/hachoir/editor/typed_field.py
+++ b/hachoir/editor/typed_field.py
@@ -101,7 +101,7 @@ class EditableBits(EditableFixedField):
self._is_altered = True
def _setValue(self, value):
- if not(0 <= value < (1 << self._size)):
+ if not (0 <= value < (1 << self._size)):
raise ValueError("Invalid value, must be in range %s..%s"
% (0, (1 << self._size) - 1))
self._value = value
@@ -248,7 +248,7 @@ class EditableInteger(EditableFixedField):
else:
valid = self.VALID_VALUE_UNSIGNED
minval, maxval = valid[self._size]
- if not(minval <= value <= maxval):
+ if not (minval <= value <= maxval):
raise ValueError("Invalid value, must be in range %s..%s"
% (minval, maxval))
self._value = value
@@ -274,7 +274,7 @@ class EditableTimestampMac32(EditableFixedField):
EditableFixedField.__init__(self, parent, name, value, 32)
def _setValue(self, value):
- if not(self.minval <= value <= self.maxval):
+ if not (self.minval <= value <= self.maxval):
raise ValueError("Invalid value, must be in range %s..%s"
% (self.minval, self.maxval))
self._value = value
diff --git a/hachoir/field/__init__.py b/hachoir/field/__init__.py
index aea37526..cbc2d937 100644
--- a/hachoir/field/__init__.py
+++ b/hachoir/field/__init__.py
@@ -34,7 +34,8 @@ from hachoir.field.vector import GenericVector, UserVector # noqa
# Complex types
from hachoir.field.float import Float32, Float64, Float80 # noqa
-from hachoir.field.timestamp import (GenericTimestamp, # noqa
+from hachoir.field.timestamp import ( # noqa
+ GenericTimestamp,
TimestampUnix32, TimestampUnix64, TimestampMac32, TimestampUUID60,
TimestampWin64, TimedeltaMillisWin64,
DateTimeMSDOS32, TimeDateMSDOS32, TimedeltaWin64)
diff --git a/hachoir/field/byte_field.py b/hachoir/field/byte_field.py
index c372ad83..e0bdb083 100644
--- a/hachoir/field/byte_field.py
+++ b/hachoir/field/byte_field.py
@@ -20,7 +20,7 @@ class RawBytes(Field):
def __init__(self, parent, name, length, description="Raw data"):
assert issubclass(parent.__class__, Field)
- if not(0 < length <= MAX_LENGTH):
+ if not (0 < length <= MAX_LENGTH):
raise FieldError("Invalid RawBytes length (%s)!" % length)
Field.__init__(self, parent, name, length * 8, description)
self._display = None
diff --git a/hachoir/field/generic_field_set.py b/hachoir/field/generic_field_set.py
index e67e4b56..74d8898f 100644
--- a/hachoir/field/generic_field_set.py
+++ b/hachoir/field/generic_field_set.py
@@ -117,7 +117,7 @@ class GenericFieldSet(BasicFieldSet):
_getSize, doc="Size in bits, may create all fields to get size")
def _getCurrentSize(self):
- assert not(self.done)
+ assert not (self.done)
return self._current_size
current_size = property(_getCurrentSize)
diff --git a/hachoir/field/padding.py b/hachoir/field/padding.py
index 80b082dc..4c7265c8 100644
--- a/hachoir/field/padding.py
+++ b/hachoir/field/padding.py
@@ -23,7 +23,7 @@ class PaddingBits(Bits):
self._display_pattern = self.checkPattern()
def checkPattern(self):
- if not(config.check_padding_pattern):
+ if not (config.check_padding_pattern):
return False
if self.pattern != 0:
return False
@@ -72,7 +72,7 @@ class PaddingBytes(Bytes):
self._display_pattern = self.checkPattern()
def checkPattern(self):
- if not(config.check_padding_pattern):
+ if not (config.check_padding_pattern):
return False
if self.pattern is None:
return False
diff --git a/hachoir/field/string_field.py b/hachoir/field/string_field.py
index 41e47d28..742634d2 100644
--- a/hachoir/field/string_field.py
+++ b/hachoir/field/string_field.py
@@ -244,7 +244,7 @@ class GenericString(Bytes):
and err.end == len(text) \
and self._charset == "UTF-16-LE":
try:
- text = str(text + "\0", self._charset, "strict")
+ text = str(text + b"\0", self._charset, "strict")
self.warning(
"Fix truncated %s string: add missing nul byte" % self._charset)
return text
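The `b"\0"` change above fixes a Python 3 type bug: the recovery path appends a nul *byte* (not a text character) to the raw buffer before decoding. The idea can be sketched independently of hachoir:

```python
# A UTF-16-LE string truncated mid-code-unit fails strict decoding;
# appending the missing nul byte recovers it (mirrors the parser's fix).
raw = "abc".encode("utf-16-le")[:-1]  # drop the final nul byte

try:
    text = raw.decode("utf-16-le", "strict")
except UnicodeDecodeError:
    text = (raw + b"\0").decode("utf-16-le", "strict")

print(text)  # "abc"
```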
diff --git a/hachoir/field/timestamp.py b/hachoir/field/timestamp.py
index 0f7d3a56..e45b8320 100644
--- a/hachoir/field/timestamp.py
+++ b/hachoir/field/timestamp.py
@@ -61,7 +61,7 @@ class TimeDateMSDOS32(FieldSet):
def createValue(self):
return datetime(
- 1980 + self["year"].value, self["month"].value, self["day"].value,
+ 1980 + self["year"].value, self["month"].value or 1, self["day"].value or 1,
self["hour"].value, self["minute"].value, 2 * self["second"].value)
def createDisplay(self):
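The `or 1` fallback above keeps `datetime()` from raising on an all-zero MS-DOS date field, since month and day are 1-based. A standalone sketch of the packing; the bit layout here is the conventional FAT one and is an assumption, not copied from hachoir:

```python
from datetime import datetime


def decode_msdos32(value):
    """Decode a packed 32-bit MS-DOS date/time (date assumed in the high word)."""
    date, time = value >> 16, value & 0xFFFF
    year = 1980 + (date >> 9)          # 7 bits, offset from 1980
    month = (date >> 5) & 0xF or 1     # 4 bits; clamp 0 to 1, as in the fix
    day = date & 0x1F or 1             # 5 bits; clamp 0 to 1
    hour = time >> 11                  # 5 bits
    minute = (time >> 5) & 0x3F        # 6 bits
    second = 2 * (time & 0x1F)         # 5 bits, 2-second resolution
    return datetime(year, month, day, hour, minute, second)


print(decode_msdos32(0))  # 1980-01-01 00:00:00 instead of a ValueError
```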
diff --git a/hachoir/field/vector.py b/hachoir/field/vector.py
index 8b5474e6..fabb70e1 100644
--- a/hachoir/field/vector.py
+++ b/hachoir/field/vector.py
@@ -7,7 +7,7 @@ class GenericVector(FieldSet):
# Sanity checks
assert issubclass(item_class, Field)
assert isinstance(item_class.static_size, int)
- if not(0 < nb_items):
+ if not (0 < nb_items):
raise ParserError('Unable to create empty vector "%s" in %s'
% (name, parent.path))
size = nb_items * item_class.static_size
diff --git a/hachoir/grep.py b/hachoir/grep.py
index 4a84d93f..b1c46cc2 100644
--- a/hachoir/grep.py
+++ b/hachoir/grep.py
@@ -63,7 +63,7 @@ def parseOptions():
if len(arguments) < 2:
parser.print_help()
sys.exit(1)
- pattern = str(arguments[0], "ascii")
+ pattern = arguments[0]
filenames = arguments[1:]
return values, pattern, filenames
@@ -169,11 +169,11 @@ class ConsoleGrep(Grep):
def runGrep(values, pattern, filenames):
grep = ConsoleGrep()
grep.display_filename = (1 < len(filenames))
- grep.display_address = not(values.no_addr)
+ grep.display_address = not values.no_addr
grep.display_path = values.path
- grep.display_value = not(values.no_value)
+ grep.display_value = not values.no_value
grep.display_percent = values.percent
- grep.display = not(values.bench)
+ grep.display = not values.bench
for filename in filenames:
grep.searchFile(filename, pattern, case_sensitive=values.case)
diff --git a/hachoir/metadata/main.py b/hachoir/metadata/main.py
index b652f9ec..7f2e9873 100644
--- a/hachoir/metadata/main.py
+++ b/hachoir/metadata/main.py
@@ -85,7 +85,7 @@ def processFile(values, filename,
with parser:
# Extract metadata
- extract_metadata = not(values.mime or values.type)
+ extract_metadata = not (values.mime or values.type)
if extract_metadata:
try:
metadata = extractMetadata(parser, values.quality)
@@ -124,7 +124,7 @@ def processFile(values, filename,
def processFiles(values, filenames, display=True):
- human = not(values.raw)
+ human = not values.raw
ok = True
priority = int(values.level) * 100 + 99
display_filename = (1 < len(filenames))
diff --git a/hachoir/parser/archive/bzip2_parser.py b/hachoir/parser/archive/bzip2_parser.py
index 9c2b9211..2e91d690 100644
--- a/hachoir/parser/archive/bzip2_parser.py
+++ b/hachoir/parser/archive/bzip2_parser.py
@@ -57,8 +57,8 @@ class ZeroTerminatedNumber(Field):
return self._value
-def move_to_front(l, c):
- l[:] = l[c:c + 1] + l[0:c] + l[c + 1:]
+def move_to_front(seq, index):
+ seq[:] = seq[index:index + 1] + seq[0:index] + seq[index + 1:]
class Bzip2Bitmap(FieldSet):
@@ -218,7 +218,7 @@ class Bzip2Parser(Parser):
def validate(self):
if self.stream.readBytes(0, 3) != b'BZh':
return "Wrong file signature"
- if not("1" <= self["blocksize"].value <= "9"):
+ if not ("1" <= self["blocksize"].value <= "9"):
return "Wrong blocksize"
return True
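The renamed `move_to_front` helper implements the move-to-front transform used in bzip2's decoding pipeline: the symbol at `index` is rotated to the head of the alphabet, so recently seen symbols get small indices. A quick illustration:

```python
def move_to_front(seq, index):
    # Rotate the element at `index` to the front, preserving the rest.
    seq[:] = seq[index:index + 1] + seq[0:index] + seq[index + 1:]


alphabet = list("abcd")
move_to_front(alphabet, 2)
print(alphabet)  # ['c', 'a', 'b', 'd']
```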
diff --git a/hachoir/parser/archive/lzx.py b/hachoir/parser/archive/lzx.py
index 8db16f00..9d6baf6e 100644
--- a/hachoir/parser/archive/lzx.py
+++ b/hachoir/parser/archive/lzx.py
@@ -13,6 +13,7 @@ from hachoir.field import (FieldSet,
from hachoir.core.endian import MIDDLE_ENDIAN, LITTLE_ENDIAN
from hachoir.core.tools import paddingSize
from hachoir.parser.archive.zlib import build_tree, HuffmanCode, extend_data
+import struct
class LZXPreTreeEncodedTree(FieldSet):
@@ -146,6 +147,8 @@ class LZXBlock(FieldSet):
self.window_size = self.WINDOW_SIZE[self.compression_level]
self.block_type = self["block_type"].value
curlen = len(self.parent.uncompressed_data)
+ intel_started = False # Do we perform Intel jump fixups on this block?
+
if self.block_type in (1, 2): # Verbatim or aligned offset block
if self.block_type == 2:
for i in range(8):
@@ -156,6 +159,8 @@ class LZXBlock(FieldSet):
yield LZXPreTreeEncodedTree(self, "main_tree_rest", self.window_size * 8)
main_tree = build_tree(
self["main_tree_start"].lengths + self["main_tree_rest"].lengths)
+ if self["main_tree_start"].lengths[0xE8]:
+ intel_started = True
yield LZXPreTreeEncodedTree(self, "length_tree", 249)
length_tree = build_tree(self["length_tree"].lengths)
current_decoded_size = 0
@@ -169,7 +174,7 @@ class LZXBlock(FieldSet):
field._description = "Literal value %r" % chr(
field.realvalue)
current_decoded_size += 1
- self.parent.uncompressed_data += chr(field.realvalue)
+ self.parent.uncompressed_data.append(field.realvalue)
yield field
continue
position_header, length_header = divmod(
@@ -243,8 +248,7 @@ class LZXBlock(FieldSet):
self.parent.r2 = self.parent.r1
self.parent.r1 = self.parent.r0
self.parent.r0 = position
- self.parent.uncompressed_data = extend_data(
- self.parent.uncompressed_data, length, position)
+ extend_data(self.parent.uncompressed_data, length, position)
current_decoded_size += length
elif self.block_type == 3: # Uncompressed block
padding = paddingSize(self.address + self.current_size, 16)
@@ -253,6 +257,7 @@ class LZXBlock(FieldSet):
else:
yield PaddingBits(self, "padding[]", 16)
self.endian = LITTLE_ENDIAN
+ intel_started = True # apparently intel fixup may be needed on uncompressed blocks?
yield UInt32(self, "r[]", "New value of R0")
yield UInt32(self, "r[]", "New value of R1")
yield UInt32(self, "r[]", "New value of R2")
@@ -266,12 +271,40 @@ class LZXBlock(FieldSet):
else:
raise ParserError("Unknown block type %d!" % self.block_type)
+ # Fixup Intel jumps if necessary
+ if (
+ intel_started
+ and self.parent["filesize_indicator"].value
+ and self.parent["filesize"].value > 0
+ ):
+ # Note that we're decoding a block-at-a-time instead of a frame-at-a-time,
+ # so we need to handle the frame boundaries carefully.
+ filesize = self.parent["filesize"].value
+ start_pos = max(0, curlen - 10) # We may need to correct something from the last block
+ end_pos = len(self.parent.uncompressed_data) - 10
+ while 1:
+ jmp_pos = self.parent.uncompressed_data.find(b"\xE8", start_pos, end_pos)
+ if jmp_pos == -1:
+ break
+ if (jmp_pos % 32768) >= (32768 - 10):
+ # jumps at the end of frames are not fixed up
+ start_pos = jmp_pos + 1
+ continue
+ abs_off, = struct.unpack("<i", self.parent.uncompressed_data[jmp_pos + 1:jmp_pos + 5])
+ if -jmp_pos <= abs_off < filesize:
+ if abs_off < 0:
+ rel_off = abs_off + filesize
+ else:
+ rel_off = abs_off - jmp_pos
+ self.parent.uncompressed_data[jmp_pos + 1:jmp_pos + 5] = struct.pack("<i", rel_off)
+ start_pos = jmp_pos + 5
+
class LZXStream(Parser):
endian = MIDDLE_ENDIAN
def createFields(self):
- self.uncompressed_data = ""
+ self.uncompressed_data = bytearray()
self.r0 = 1
self.r1 = 1
self.r2 = 1
@@ -291,6 +324,6 @@ class LZXStream(Parser):
def lzx_decompress(stream, window_bits):
data = LZXStream(stream)
data.compr_level = window_bits
- for unused in data:
+ for _ in data:
pass
return data.uncompressed_data
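The new block at the end of `LZXBlock` undoes LZX's "Intel E8" preprocessing: encoders rewrite the 4-byte target of each `0xE8` CALL opcode from a relative to an absolute offset to improve compression, so the decoder must translate the targets back. A standalone sketch of that reverse translation (the function name is illustrative, not hachoir's):

```python
import struct


def undo_e8_fixup(data: bytearray, filesize: int, start: int = 0):
    """Convert absolute CALL targets back to relative ones, in place."""
    end = len(data) - 10          # the last 10 bytes are never fixed up
    pos = start
    while True:
        pos = data.find(b"\xE8", pos, end)
        if pos == -1:
            break
        if (pos % 32768) >= (32768 - 10):
            pos += 1              # opcodes at a frame's tail are skipped
            continue
        (abs_off,) = struct.unpack("<i", data[pos + 1:pos + 5])
        if -pos <= abs_off < filesize:
            rel = abs_off + filesize if abs_off < 0 else abs_off - pos
            data[pos + 1:pos + 5] = struct.pack("<i", rel)
        pos += 5


data = bytearray(b"\x90\x90\xE8" + struct.pack("<i", -1) + b"\x00" * 20)
undo_e8_fixup(data, filesize=1000)
```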
diff --git a/hachoir/parser/archive/mar.py b/hachoir/parser/archive/mar.py
index be71607b..a6efb381 100644
--- a/hachoir/parser/archive/mar.py
+++ b/hachoir/parser/archive/mar.py
@@ -44,7 +44,7 @@ class MarFile(Parser):
return "Invalid magic"
if self["version"].value != 3:
return "Invalid version"
- if not(1 <= self["nb_file"].value <= MAX_NB_FILE):
+ if not (1 <= self["nb_file"].value <= MAX_NB_FILE):
return "Invalid number of file"
return True
diff --git a/hachoir/parser/archive/zlib.py b/hachoir/parser/archive/zlib.py
index ef55ca0b..4c6e0d28 100644
--- a/hachoir/parser/archive/zlib.py
+++ b/hachoir/parser/archive/zlib.py
@@ -14,13 +14,13 @@ from hachoir.core.text_handler import textHandler, hexadecimal
from hachoir.core.tools import paddingSize, alignValue
-def extend_data(data, length, offset):
- """Extend data using a length and an offset."""
+def extend_data(data: bytearray, length, offset):
+ """Extend data using a length and an offset, LZ-style."""
if length >= offset:
new_data = data[-offset:] * (alignValue(length, offset) // offset)
- return data + new_data[:length]
+ data += new_data[:length]
else:
- return data + data[-offset:-offset + length]
+ data += data[-offset:-offset + length]
def build_tree(lengths):
@@ -136,9 +136,9 @@ class DeflateBlock(FieldSet):
CODE_LENGTH_ORDER = [16, 17, 18, 0, 8, 7, 9,
6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]
- def __init__(self, parent, name, uncomp_data="", *args, **kwargs):
+ def __init__(self, parent, name, uncomp_data=b"", *args, **kwargs):
FieldSet.__init__(self, parent, name, *args, **kwargs)
- self.uncomp_data = uncomp_data
+ self.uncomp_data = bytearray(uncomp_data)
def createFields(self):
yield Bit(self, "final", "Is this the final block?") # BFINAL
@@ -227,7 +227,7 @@ class DeflateBlock(FieldSet):
field._description = "Literal Code %r (Huffman Code %i)" % (
chr(value), field.value)
yield field
- self.uncomp_data += chr(value)
+ self.uncomp_data.append(value)
if value == 256:
field._description = "Block Terminator Code (256) (Huffman Code %i)" % field.value
yield field
@@ -267,15 +267,14 @@ class DeflateBlock(FieldSet):
extrafield._description = "Distance Extra Bits (%i), total length %i" % (
extrafield.value, distance)
yield extrafield
- self.uncomp_data = extend_data(
- self.uncomp_data, length, distance)
+ extend_data(self.uncomp_data, length, distance)
class DeflateData(GenericFieldSet):
endian = LITTLE_ENDIAN
def createFields(self):
- uncomp_data = ""
+ uncomp_data = bytearray()
blk = DeflateBlock(self, "compressed_block[]", uncomp_data)
yield blk
uncomp_data = blk.uncomp_data
@@ -326,11 +325,11 @@ class ZlibData(Parser):
yield textHandler(UInt32(self, "data_checksum", "ADLER32 checksum of compressed data"), hexadecimal)
-def zlib_inflate(stream, wbits=None, prevdata=""):
+def zlib_inflate(stream, wbits=None):
if wbits is None or wbits >= 0:
return ZlibData(stream)["data"].uncompressed_data
else:
data = DeflateData(None, "root", stream, "", stream.askSize(None))
- for unused in data:
+ for _ in data:
pass
return data.uncompressed_data
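`extend_data` is the LZ back-reference copy shared by the DEFLATE and LZX decoders; it now mutates a `bytearray` in place instead of rebuilding a string on every match. The overlapping-copy case (`length >= offset`) is the subtle one, since the source region repeats itself:

```python
def extend_data(data: bytearray, length: int, offset: int):
    """Append `length` bytes copied from `offset` bytes back, in place."""
    if length >= offset:
        # Overlapping copy: tile the last `offset` bytes until `length` is met.
        chunk = data[-offset:]
        reps = -(-length // offset)          # ceil(length / offset)
        data += (chunk * reps)[:length]
    else:
        data += data[-offset:-offset + length]


buf = bytearray(b"ab")
extend_data(buf, 5, 1)      # copy distance 1, length 5: repeats "b"
print(bytes(buf))           # b'abbbbbb'
```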
diff --git a/hachoir/parser/audio/id3.py b/hachoir/parser/audio/id3.py
index 99195935..e6f11312 100644
--- a/hachoir/parser/audio/id3.py
+++ b/hachoir/parser/audio/id3.py
@@ -451,7 +451,7 @@ class ID3_Chunk(FieldSet):
if size:
cls = None
- if not(is_compressed):
+ if not is_compressed:
tag = self["tag"].value
if tag in ID3_Chunk.handler:
cls = ID3_Chunk.handler[tag]
diff --git a/hachoir/parser/audio/itunesdb.py b/hachoir/parser/audio/itunesdb.py
index 92871e8e..095679dc 100644
--- a/hachoir/parser/audio/itunesdb.py
+++ b/hachoir/parser/audio/itunesdb.py
@@ -128,7 +128,7 @@ class DataObject(FieldSet):
yield padding
for i in range(self["entry_count"].value):
yield UInt32(self, "index[" + str(i) + "]", "Index of the " + str(i) + "nth mhit")
- elif(self["type"].value < 15) or (self["type"].value > 17) or (self["type"].value >= 200):
+ elif (self["type"].value < 15) or (self["type"].value > 17) or (self["type"].value >= 200):
yield UInt32(self, "unknown[]")
yield UInt32(self, "unknown[]")
yield UInt32(self, "position", "Position")
diff --git a/hachoir/parser/audio/midi.py b/hachoir/parser/audio/midi.py
index 03f93ec6..b9ed1338 100644
--- a/hachoir/parser/audio/midi.py
+++ b/hachoir/parser/audio/midi.py
@@ -29,7 +29,7 @@ class Integer(Bits):
while True:
bits = stream.readBits(addr, 8, parent.endian)
value = (value << 7) + (bits & 127)
- if not(bits & 128):
+ if not (bits & 128):
break
addr += 8
self._size += 8
diff --git a/hachoir/parser/file_system/ext2.py b/hachoir/parser/file_system/ext2.py
index 3a1f973a..7f0e5443 100644
--- a/hachoir/parser/file_system/ext2.py
+++ b/hachoir/parser/file_system/ext2.py
@@ -747,7 +747,7 @@ class EXT2_FS(HachoirParser, RootSeekableFieldSet):
def validate(self):
if self.stream.readBytes((1024 + 56) * 8, 2) != b"\x53\xEF":
return "Invalid magic number"
- if not(0 <= self["superblock/log_block_size"].value <= 2):
+ if not (0 <= self["superblock/log_block_size"].value <= 2):
return "Invalid (log) block size"
if self["superblock/inode_size"].value not in (0, 128):
return "Unsupported inode size"
diff --git a/hachoir/parser/guess.py b/hachoir/parser/guess.py
index 0ec323f6..97536088 100644
--- a/hachoir/parser/guess.py
+++ b/hachoir/parser/guess.py
@@ -127,10 +127,14 @@ def createParser(filename, real_filename=None, tags=None):
Create a parser from a file or returns None on error.
Options:
- - filename (unicode): Input file name ;
- - real_filename (str|unicode): Real file name.
+ - file (str|io.IOBase): Input file name or
+ a byte io.IOBase stream ;
+ - real_filename (str): Real file name.
"""
if not tags:
tags = []
stream = FileInputStream(filename, real_filename, tags=tags)
- return guessParser(stream)
+ guess = guessParser(stream)
+ if guess is None:
+ stream.close()
+ return guess
diff --git a/hachoir/parser/image/gif.py b/hachoir/parser/image/gif.py
index e97a7fbc..a33b1283 100644
--- a/hachoir/parser/image/gif.py
+++ b/hachoir/parser/image/gif.py
@@ -27,7 +27,7 @@ MAX_HEIGHT = MAX_WIDTH
MAX_FILE_SIZE = 100 * 1024 * 1024
-def rle_repr(l):
+def rle_repr(chain):
"""Run-length encode a list into an "eval"-able form
Example:
@@ -46,7 +46,7 @@ def rle_repr(l):
result[-1] = '[%s, %s]' % (result[-1][1:-1], previous)
else:
result.append('[%s]' % previous)
- iterable = iter(l)
+ iterable = iter(chain)
runlen = 1
result = []
try:
diff --git a/hachoir/parser/image/jpeg.py b/hachoir/parser/image/jpeg.py
index 64d50c10..419d6f48 100644
--- a/hachoir/parser/image/jpeg.py
+++ b/hachoir/parser/image/jpeg.py
@@ -205,7 +205,7 @@ class SOSComponent(FieldSet):
def createFields(self):
comp_id = UInt8(self, "component_id")
yield comp_id
- if not(1 <= comp_id.value <= self["../nr_components"].value):
+ if not (1 <= comp_id.value <= self["../nr_components"].value):
raise ParserError("JPEG error: Invalid component-id")
yield Bits(self, "dc_coding_table", 4, "DC entropy coding table destination selector")
yield Bits(self, "ac_coding_table", 4, "AC entropy coding table destination selector")
@@ -387,7 +387,10 @@ class JpegImageData(FieldSet):
end = self.stream.searchBytes(b"\xff", start, MAX_FILESIZE * 8)
if end is None:
# this is a bad sign, since it means there is no terminator
- # we ignore this; it likely means a truncated image
+ # this likely means a truncated image:
+ # set the size to the remaining length of the stream
+ # to avoid being forced to parse subfields to calculate size
+ self._size = self.stream._size - self.absolute_address
break
if self.stream.readBytes(end, 2) == b'\xff\x00':
# padding: false alarm
diff --git a/hachoir/parser/image/wmf.py b/hachoir/parser/image/wmf.py
index 7541c1c0..2a60951a 100644
--- a/hachoir/parser/image/wmf.py
+++ b/hachoir/parser/image/wmf.py
@@ -597,7 +597,7 @@ class WMF_File(Parser):
yield UInt32(self, "max_record_size", "The size of largest record in 16-bit words")
yield UInt16(self, "nb_params", "Not Used (always 0)")
- while not(self.eof):
+ while not self.eof:
yield Function(self, "func[]")
def isEMF(self):
diff --git a/hachoir/parser/misc/__init__.py b/hachoir/parser/misc/__init__.py
index ccb72fb2..208ffe06 100644
--- a/hachoir/parser/misc/__init__.py
+++ b/hachoir/parser/misc/__init__.py
@@ -16,3 +16,4 @@ from hachoir.parser.misc.word_doc import WordDocumentParser # noqa
from hachoir.parser.misc.word_2 import Word2DocumentParser # noqa
from hachoir.parser.misc.mstask import MSTaskFile # noqa
from hachoir.parser.misc.mapsforge_map import MapsforgeMapFile # noqa
+from hachoir.parser.misc.fit import FITFile # noqa
diff --git a/hachoir/parser/misc/fit.py b/hachoir/parser/misc/fit.py
new file mode 100644
index 00000000..8e51f877
--- /dev/null
+++ b/hachoir/parser/misc/fit.py
@@ -0,0 +1,173 @@
+"""
+Garmin fit file Format parser.
+
+Author: Sebastien Ponce <sebastien.ponce@cern.ch>
+"""
+
+from hachoir.parser import Parser
+from hachoir.field import FieldSet, Int8, UInt8, Int16, UInt16, Int32, UInt32, Int64, UInt64, RawBytes, Bit, Bits, Bytes, String, Float32, Float64
+from hachoir.core.endian import BIG_ENDIAN, LITTLE_ENDIAN
+
+field_types = {
+ 0: UInt8, # enum
+ 1: Int8, # signed int of 8 bits
+ 2: UInt8, # unsigned int of 8 bits
+ 131: Int16, # signed int of 16 bits
+ 132: UInt16, # unsigned int of 16 bits
+ 133: Int32, # signed int of 32 bits
+ 134: UInt32, # unsigned int of 32 bits
+ 7: String, # string
+ 136: Float32, # float
+ 137: Float64, # double
+ 10: UInt8, # unsigned int of 8 bits with 0 as invalid value
+ 139: UInt16, # unsigned int of 16 bits with 0 as invalid value
+ 140: UInt32, # unsigned int of 32 bits with 0 as invalid value
+ 13: Bytes, # bytes
+ 142: Int64, # signed int of 64 bits
+ 143: UInt64, # unsigned int of 64 bits
+ 144: UInt64 # unsigned int of 64 bits with 0 as invalid value
+}
+
+
+class Header(FieldSet):
+ endian = LITTLE_ENDIAN
+
+ def createFields(self):
+ yield UInt8(self, "size", "Header size")
+ yield UInt8(self, "protocol", "Protocol version")
+ yield UInt16(self, "profile", "Profile version")
+ yield UInt32(self, "datasize", "Data size")
+ yield RawBytes(self, "datatype", 4)
+ yield UInt16(self, "crc", "CRC of first 11 bytes or 0x0")
+
+ def createDescription(self):
+ return "Header of fit file. Data size is %d" % (self["datasize"].value)
+
+
+class NormalRecordHeader(FieldSet):
+
+ def createFields(self):
+ yield Bit(self, "normal", "Normal header (0)")
+ yield Bit(self, "type", "Message type (0 data, 1 definition")
+ yield Bit(self, "typespecific", "0")
+ yield Bit(self, "reserved", "0")
+ yield Bits(self, "msgType", 4, description="Message type")
+
+ def createDescription(self):
+ return "Record header, this is a %s message" % ("definition" if self["type"].value else "data")
+
+
+class FieldDefinition(FieldSet):
+
+ def createFields(self):
+ yield UInt8(self, "number", "Field definition number")
+ yield UInt8(self, "size", "Size in bytes")
+ yield UInt8(self, "type", "Base type")
+
+ def createDescription(self):
+ return "Field Definition. Number %d, Size %d" % (self["number"].value, self["size"].value)
+
+
+class DefinitionMessage(FieldSet):
+
+ def createFields(self):
+ yield NormalRecordHeader(self, "RecordHeader")
+ yield UInt8(self, "reserved", "Reserved (0)")
+ yield UInt8(self, "architecture", "Architecture (0 little, 1 big endian")
+ self.endian = BIG_ENDIAN if self["architecture"].value else LITTLE_ENDIAN
+ yield UInt16(self, "msgNumber", "Message Number")
+ yield UInt8(self, "nbFields", "Number of fields")
+ for n in range(self["nbFields"].value):
+ yield FieldDefinition(self, "fieldDefinition[]")
+
+ def createDescription(self):
+ return "Definition Message. Contains %d fields" % (self["nbFields"].value)
+
+
+class DataMessage(FieldSet):
+
+ def createFields(self):
+ hdr = NormalRecordHeader(self, "RecordHeader")
+ yield hdr
+ msgType = self["RecordHeader"]["msgType"].value
+ msgDef = self.parent.msgDefs[msgType]
+ for n in range(msgDef["nbFields"].value):
+ desc = msgDef["fieldDefinition[%d]" % n]
+ typ = field_types[desc["type"].value]
+ self.endian = BIG_ENDIAN if msgDef["architecture"].value else LITTLE_ENDIAN
+ if typ == String or typ == Bytes:
+ yield typ(self, "field%d" % n, desc["size"].value)
+ else:
+ if typ.static_size // 8 == desc["size"].value:
+ yield typ(self, "field%d" % n, desc["size"].value)
+ else:
+ for p in range(desc["size"].value * 8 // typ.static_size):
+ yield typ(self, "field%d[]" % n)
+
+ def createDescription(self):
+ return "Data Message"
+
+
+class TimeStamp(FieldSet):
+
+ def createFields(self):
+ yield Bit(self, "timestamp", "TimeStamp (1)")
+        yield Bits(self, "msgType", 2, description="Local message type")
+        yield Bits(self, "time", 5, description="Time offset (seconds)")
+
+ def createDescription(self):
+ return "TimeStamp"
+
+
+class CRC(FieldSet):
+
+ def createFields(self):
+ yield UInt16(self, "crc", "CRC")
+
+ def createDescription(self):
+ return "CRC"
+
+
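The `crc` fields above hold FIT's 16-bit checksum. As a reference for checking those values, here is a sketch of the checksum routine described in the FIT SDK (the 4-bit table-driven variant, equivalent to CRC-16/ARC); this helper is illustrative and not part of the parser:

```python
# FIT CRC-16 (equivalent to CRC-16/ARC), 4-bit table-driven variant
# described in the FIT SDK documentation.
CRC_TABLE = [
    0x0000, 0xCC01, 0xD801, 0x1400, 0xF001, 0x3C00, 0x2800, 0xE401,
    0xA001, 0x6C00, 0x7800, 0xB401, 0x5000, 0x9C01, 0x8801, 0x4400,
]


def fit_crc16(data: bytes) -> int:
    crc = 0
    for byte in data:
        for nibble in (byte & 0x0F, byte >> 4):  # low nibble first
            tmp = CRC_TABLE[crc & 0x0F]
            crc = (crc >> 4) & 0x0FFF
            crc = crc ^ tmp ^ CRC_TABLE[nibble]
    return crc
```

A file is intact when the CRC of everything before the trailing 2-byte CRC matches that trailer (equivalently, the CRC over the whole file including the trailer is 0).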
+class FITFile(Parser):
+ endian = BIG_ENDIAN
+ PARSER_TAGS = {
+ "id": "fit",
+ "category": "misc",
+ "file_ext": ("fit",),
+ "mime": ("application/fit",),
+ "min_size": 14 * 8,
+        "description": "Garmin binary FIT format"
+ }
+
+ def __init__(self, *args, **kwargs):
+ Parser.__init__(self, *args, **kwargs)
+ self.msgDefs = {}
+
+ def validate(self):
+ s = self.stream.readBytes(0, 12)
+ if s[8:12] != b'.FIT':
+ return "Invalid header %d %d %d %d" % tuple([int(b) for b in s[8:12]])
+ return True
+
+ def createFields(self):
+ yield Header(self, "header")
+ while self.current_size < self["header"]["datasize"].value * 8:
+ b = self.stream.readBits(self.absolute_address + self.current_size, 2, self.endian)
+ if b == 1:
+ defMsg = DefinitionMessage(self, "definition[]")
+ msgType = defMsg["RecordHeader"]["msgType"].value
+ sizes = ''
+ ts = 0
+ for n in range(defMsg["nbFields"].value):
+ fname = "fieldDefinition[%d]" % n
+ size = defMsg[fname]["size"].value
+ ts += size
+ sizes += "%d/" % size
+ sizes += "%d" % ts
+ self.msgDefs[msgType] = defMsg
+ yield defMsg
+ elif b == 0:
+ yield DataMessage(self, "data[]")
+ else:
+ yield TimeStamp(self, "timestamp[]")
+ yield CRC(self, "crc")
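Outside hachoir, the same 12/14-byte header layout can be unpacked directly with `struct`; a minimal sketch whose keys mirror the `Header` class above:

```python
import struct


def parse_fit_header(data: bytes) -> dict:
    # size(1) protocol(1) profile(2) datasize(4) ".FIT"(4), little endian,
    # optionally followed by a 2-byte header CRC when size >= 14
    size, protocol, profile, datasize, magic = struct.unpack_from("<BBHI4s", data, 0)
    if magic != b".FIT":
        raise ValueError("not a FIT file")
    header = {"size": size, "protocol": protocol,
              "profile": profile, "datasize": datasize}
    if size >= 14:
        header["crc"] = struct.unpack_from("<H", data, 12)[0]
    return header
```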
diff --git a/hachoir/parser/misc/mapsforge_map.py b/hachoir/parser/misc/mapsforge_map.py
index 906e95c1..a393942b 100644
--- a/hachoir/parser/misc/mapsforge_map.py
+++ b/hachoir/parser/misc/mapsforge_map.py
@@ -41,7 +41,7 @@ class UIntVbe(Field):
size += 1
assert size < 100, "UIntVBE is too large"
- if not(haveMoreData):
+ if not haveMoreData:
break
self._size = size * 8
@@ -71,7 +71,7 @@ class IntVbe(Field):
size += 1
assert size < 100, "IntVBE is too large"
- if not(haveMoreData):
+ if not haveMoreData:
break
if isNegative:
@@ -142,7 +142,7 @@ class TileHeader(FieldSet):
def createFields(self):
numLevels = int(self.zoomIntervalCfg[
"max_zoom_level"].value - self.zoomIntervalCfg["min_zoom_level"].value) + 1
- assert(numLevels < 50)
+ assert (numLevels < 50)
for i in range(numLevels):
yield TileZoomTable(self, "zoom_table_entry[]")
yield UIntVbe(self, "first_way_offset")
diff --git a/hachoir/parser/misc/ole2.py b/hachoir/parser/misc/ole2.py
index 74b2168e..bfc1f7d8 100644
--- a/hachoir/parser/misc/ole2.py
+++ b/hachoir/parser/misc/ole2.py
@@ -211,7 +211,7 @@ class OLE2_File(HachoirParser, RootSeekableFieldSet):
return "Unknown major version (%s)" % self["header/ver_maj"].value
if self["header/endian"].value not in (b"\xFF\xFE", b"\xFE\xFF"):
return "Unknown endian (%s)" % self["header/endian"].raw_display
- if not(MIN_BIG_BLOCK_LOG2 <= self["header/bb_shift"].value <= MAX_BIG_BLOCK_LOG2):
+ if not (MIN_BIG_BLOCK_LOG2 <= self["header/bb_shift"].value <= MAX_BIG_BLOCK_LOG2):
return "Invalid (log 2 of) big block size (%s)" % self["header/bb_shift"].value
if self["header/bb_shift"].value < self["header/sb_shift"].value:
return "Small block size (log2=%s) is bigger than big block size (log2=%s)!" \
diff --git a/hachoir/parser/misc/pdf.py b/hachoir/parser/misc/pdf.py
index dc934bfe..2fccc6a1 100644
--- a/hachoir/parser/misc/pdf.py
+++ b/hachoir/parser/misc/pdf.py
@@ -392,7 +392,7 @@ class CrossReferenceTable(FieldSet):
FieldSet.__init__(self, parent, name, description=desc)
pos = self.stream.searchBytesLength(Trailer.MAGIC, False)
if pos is None:
- raise ParserError("Can't find '%s' starting at %u"
+ raise ParserError("Can't find '%s' starting at %u" %
(Trailer.MAGIC, self.absolute_address // 8))
self._size = 8 * pos - self.absolute_address
diff --git a/hachoir/parser/misc/ttf.py b/hachoir/parser/misc/ttf.py
index ac374658..ca5e7c49 100644
--- a/hachoir/parser/misc/ttf.py
+++ b/hachoir/parser/misc/ttf.py
@@ -2,6 +2,8 @@
TrueType Font parser.
Documents:
+ - "The OpenType Specification"
+ https://docs.microsoft.com/en-us/typography/opentype/spec/
- "An Introduction to TrueType Fonts: A look inside the TTF format"
written by "NRSI: Computers & Writing Systems"
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-Chapter08
@@ -11,11 +13,26 @@ Creation date: 2007-02-08
"""
from hachoir.parser import Parser
-from hachoir.field import (FieldSet, ParserError,
- UInt16, UInt32, Bit, Bits,
- PaddingBits, NullBytes,
- String, RawBytes, Bytes, Enum,
- TimestampMac32)
+from hachoir.field import (
+ FieldSet,
+ ParserError,
+ UInt8,
+ UInt16,
+ UInt24,
+ UInt32,
+ Int16,
+ Bit,
+ Bits,
+ PaddingBits,
+ NullBytes,
+ String,
+ RawBytes,
+ Bytes,
+ Enum,
+ TimestampMac32,
+ GenericVector,
+ PascalString8,
+)
from hachoir.core.endian import BIG_ENDIAN
from hachoir.core.text_handler import textHandler, hexadecimal, filesizeHandler
@@ -69,11 +86,65 @@ CHARSET_MAP = {
3: {1: "UTF-16-BE"},
}
+PERMISSIONS = {
+ 0: "Installable embedding",
+ 2: "Restricted License embedding",
+ 4: "Preview & Print embedding",
+ 8: "Editable embedding",
+}
-class TableHeader(FieldSet):
+FWORD = Int16
+UFWORD = UInt16
+
+
+class Tag(String):
+ def __init__(self, parent, name, description=None):
+ String.__init__(self, parent, name, 4, description)
+
+
+class Version16Dot16(FieldSet):
+ static_size = 32
def createFields(self):
- yield String(self, "tag", 4)
+ yield UInt16(self, "major")
+ yield UInt16(self, "minor")
+
+ def createValue(self):
+ return float("%u.%x" % (self["major"].value, self["minor"].value))
+
+
+class Fixed(FieldSet):
+ def createFields(self):
+ yield UInt16(self, "int_part")
+ yield UInt16(self, "float_part")
+
+ def createValue(self):
+ return self["int_part"].value + float(self["float_part"].value) / 65536
+
+
+class Tuple(FieldSet):
+ def __init__(self, parent, name, axisCount):
+ super().__init__(parent, name, description="Tuple Record")
+ self.axisCount = axisCount
+
+ def createFields(self):
+ for _ in range(self.axisCount):
+ yield (Fixed(self, "coordinate[]"))
+
+
+class F2DOT14(FieldSet):
+ static_size = 16
+
+ def createFields(self):
+ yield Int16(self, "int_part")
+
+ def createValue(self):
+ return self["int_part"].value / 16384
+
+
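The three field sets above decode OpenType's fixed-point formats. The same conversions on raw big-endian integers, as a standalone sketch (the helper names are mine):

```python
def version16dot16(raw: int) -> float:
    # Major in the high word; minor in the top nibble of the low word,
    # printed as hex (so 0x00005000 -> 0.5), matching createValue() above.
    return float("%u.%x" % (raw >> 16, raw & 0xFFFF))


def fixed(raw: int) -> float:
    # 16.16 signed fixed point
    if raw >= 0x80000000:
        raw -= 0x100000000
    return raw / 65536


def f2dot14(raw: int) -> float:
    # 2.14 signed fixed point, range [-2.0, 2.0)
    if raw >= 0x8000:
        raw -= 0x10000
    return raw / 16384
```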
+class TableHeader(FieldSet):
+ def createFields(self):
+ yield Tag(self, "tag")
yield textHandler(UInt32(self, "checksum"), hexadecimal)
yield UInt32(self, "offset")
yield filesizeHandler(UInt32(self, "size"))
@@ -83,7 +154,6 @@ class TableHeader(FieldSet):
class NameHeader(FieldSet):
-
def createFields(self):
yield Enum(UInt16(self, "platformID"), PLATFORM_NAME)
yield UInt16(self, "encodingID")
@@ -135,7 +205,7 @@ def parseFontHeader(self):
yield Bits(self, "adobe", 2, "(used by Adobe)")
yield UInt16(self, "unit_per_em", "Units per em")
- if not(16 <= self["unit_per_em"].value <= 16384):
+ if not (16 <= self["unit_per_em"].value <= 16384):
raise ParserError("TTF: Invalid unit/em value")
yield UInt32(self, "created_high")
yield TimestampMac32(self, "created")
@@ -162,17 +232,273 @@ def parseFontHeader(self):
yield UInt16(self, "glyph_format", "(=0)")
+class AxisValueMap(FieldSet):
+ static_size = 32
+
+ def createFields(self):
+ yield F2DOT14(self, "fromCoordinate")
+ yield F2DOT14(self, "toCoordinate")
+
+
+class SegmentMaps(FieldSet):
+ def createFields(self):
+ yield UInt16(
+ self, "positionMapCount", "The number of correspondence pairs for this axis"
+ )
+ for _ in range(self["positionMapCount"].value):
+ yield (AxisValueMap(self, "axisValueMaps[]"))
+
+
+def parseAvar(self):
+ yield UInt16(self, "majorVersion", "Major version")
+ yield UInt16(self, "minorVersion", "Minor version")
+ yield PaddingBits(self, "reserved[]", 16)
+ yield UInt16(self, "axisCount", "The number of variation axes for this font")
+ for _ in range(self["axisCount"].value):
+ yield (SegmentMaps(self, "segmentMaps[]"))
+
+
+class VariationAxisRecord(FieldSet):
+ def createFields(self):
+ yield Tag(self, "axisTag", "Tag identifying the design variation for the axis")
+ yield Fixed(self, "minValue", "The minimum coordinate value for the axis")
+ yield Fixed(self, "defaultValue", "The default coordinate value for the axis")
+ yield Fixed(self, "maxValue", "The maximum coordinate value for the axis")
+ yield PaddingBits(self, "reservedFlags", 15)
+ yield Bit(
+ self, "hidden", "The axis should not be exposed directly in user interfaces"
+ )
+ yield UInt16(
+ self,
+ "axisNameID",
+ "The name ID for entries in the 'name' table that provide a display name for this axis",
+ )
+
+
+class InstanceRecord(FieldSet):
+ def __init__(self, parent, name, axisCount, hasPSNameID=False):
+ super().__init__(parent, name, description="Instance record")
+ self.axisCount = axisCount
+ self.hasPSNameID = hasPSNameID
+
+ def createFields(self):
+ yield UInt16(
+ self, "subfamilyNameID", "Name ID for subfamily names for this instance"
+ )
+ yield PaddingBits(self, "reservedFlags", 16)
+ yield Tuple(self, "coordinates", axisCount=self.axisCount)
+ if self.hasPSNameID:
+ yield UInt16(
+ self,
+ "postScriptNameID",
+ "Name ID for PostScript names for this instance",
+ )
+
+
+def parseFvar(self):
+ yield UInt16(self, "majorVersion", "Major version")
+ yield UInt16(self, "minorVersion", "Minor version")
+ yield UInt16(
+ self, "axisArrayOffset", "Offset to the start of the VariationAxisRecord array."
+ )
+ yield PaddingBits(self, "reserved[]", 16)
+ yield UInt16(self, "axisCount", "The number of variation axes for this font")
+ yield UInt16(self, "axisSize", "The size in bytes of each VariationAxisRecord")
+ yield UInt16(self, "instanceCount", "The number of named instances for this font")
+ yield UInt16(self, "instanceSize", "The size in bytes of each InstanceRecord")
+ if self["axisArrayOffset"].value > 16:
+ yield PaddingBits(self, "padding", 8 * (self["axisArrayOffset"].value - 16))
+ for _ in range(self["axisCount"].value):
+ yield (VariationAxisRecord(self, "axes[]"))
+ for _ in range(self["instanceCount"].value):
+ yield (
+ InstanceRecord(
+ self,
+ "instances[]",
+ axisCount=self["axisCount"].value,
+ hasPSNameID=(
+                    self["instanceSize"].value == (4 * self["axisCount"].value + 6)
+ ),
+ )
+ )
+
+
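The `hasPSNameID` test in `parseFvar` relies on the size arithmetic of an instance record: subfamilyNameID (2 bytes), flags (2 bytes) and one 4-byte `Fixed` coordinate per axis, plus an optional trailing postScriptNameID (2 bytes). A quick sketch of that arithmetic (the helper name is mine):

```python
def fvar_instance_size(axis_count: int, has_postscript_name_id: bool) -> int:
    # subfamilyNameID (2) + flags (2) + axisCount * Fixed coordinate (4),
    # optionally followed by postScriptNameID (2)
    size = 2 + 2 + 4 * axis_count
    if has_postscript_name_id:
        size += 2
    return size
```

So an instance record of a two-axis font is 12 bytes without the PostScript name ID and 14 bytes with it.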
+class EncodingRecord(FieldSet):
+ static_size = 64
+
+ def createFields(self):
+ yield Enum(UInt16(self, "platformID"), PLATFORM_NAME)
+ yield UInt16(self, "encodingID")
+ self.offset = UInt32(self, "subtableOffset")
+ yield self.offset
+
+
+class CmapTable0(FieldSet):
+ def createFields(self):
+ yield UInt16(self, "format", "Table format")
+ yield UInt16(self, "length", "Length in bytes")
+ yield UInt16(self, "language", "Language ID")
+ yield GenericVector(self, "mapping", 256, UInt8)
+
+
+class CmapTable4(FieldSet):
+ def createFields(self):
+ yield UInt16(self, "format", "Table format")
+ yield UInt16(self, "length", "Length in bytes")
+ yield UInt16(self, "language", "Language ID")
+ yield UInt16(self, "segCountX2", "Twice the number of segments")
+ segments = self["segCountX2"].value // 2
+ yield UInt16(self, "searchRange")
+ yield UInt16(self, "entrySelector")
+ yield UInt16(self, "rangeShift")
+ yield GenericVector(self, "endCode", segments, UInt16)
+ yield PaddingBits(self, "reserved[]", 16)
+ yield GenericVector(self, "startCode", segments, UInt16)
+ yield GenericVector(self, "idDelta", segments, Int16)
+ yield GenericVector(self, "idRangeOffsets", segments, UInt16)
+        remainder = (self["length"].value - (self.current_size // 8)) // 2
+ if remainder:
+ yield GenericVector(self, "glyphIdArray", remainder, UInt16)
+
+
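`CmapTable4` only parses the four parallel segment arrays; how they map a character code to a glyph ID may be easier to see in a standalone sketch (simplified from the OpenType spec; the helper name is mine):

```python
def cmap4_lookup(char_code, end_codes, start_codes, id_deltas,
                 id_range_offsets, glyph_id_array):
    # Segments are sorted by endCode; pick the first one covering char_code.
    for i, end in enumerate(end_codes):
        if char_code <= end:
            if char_code < start_codes[i]:
                return 0  # .notdef
            if id_range_offsets[i] == 0:
                return (char_code + id_deltas[i]) & 0xFFFF
            # idRangeOffset is a byte offset from its own slot into
            # glyphIdArray; converted here to an index in UInt16 units.
            index = (i + id_range_offsets[i] // 2
                     + (char_code - start_codes[i])
                     - len(id_range_offsets))
            glyph = glyph_id_array[index]
            return (glyph + id_deltas[i]) & 0xFFFF if glyph else 0
    return 0
```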
+class CmapTable6(FieldSet):
+ def createFields(self):
+ yield UInt16(self, "format", "Table format")
+ yield UInt16(self, "length", "Length in bytes")
+ yield UInt16(self, "language", "Language ID")
+ yield UInt16(self, "firstCode", "First character code of subrange")
+ yield UInt16(self, "entryCount", "Number of character codes in subrange")
+ yield GenericVector(self, "glyphIdArray", self["entryCount"].value, UInt16)
+
+
+class SequentialMapGroup(FieldSet):
+ def createFields(self):
+ yield UInt32(self, "startCharCode", "First character code in this group")
+        yield UInt32(self, "endCharCode", "Last character code in this group")
+ yield UInt32(
+ self,
+ "startGlyphID",
+ "Glyph index corresponding to the starting character code",
+ )
+
+
+class CmapTable12(FieldSet):
+ def createFields(self):
+ yield UInt16(self, "format", "Table format")
+ yield PaddingBits(self, "reserved[]", 16)
+ yield UInt32(self, "length", "Length in bytes")
+ yield UInt32(self, "language", "Language ID")
+ yield UInt32(self, "numGroups", "Number of groupings which follow")
+ for i in range(self["numGroups"].value):
+ yield SequentialMapGroup(self, "mapgroup[]")
+
+
+class VariationSelector(FieldSet):
+ def createFields(self):
+ yield UInt24(self, "varSelector", "Variation selector")
+ yield UInt32(self, "defaultUVSOffset", "Offset to default UVS table")
+ yield UInt32(self, "nonDefaultUVSOffset", "Offset to non-default UVS table")
+
+
+class CmapTable14(FieldSet):
+ def createFields(self):
+ yield UInt16(self, "format", "Table format")
+ yield UInt32(self, "length", "Length in bytes")
+ yield UInt32(
+ self, "numVarSelectorRecords", "Number of variation selector records"
+ )
+ for i in range(self["numVarSelectorRecords"].value):
+ yield VariationSelector(self, "variationSelector[]")
+
+
+def parseCmap(self):
+ yield UInt16(self, "version")
+ numTables = UInt16(self, "numTables", "Number of encoding tables")
+ yield numTables
+ encodingRecords = []
+ for index in range(numTables.value):
+ entry = EncodingRecord(self, "encodingRecords[]")
+ yield entry
+ encodingRecords.append(entry)
+ encodingRecords.sort(key=lambda field: field["subtableOffset"].value)
+ last = None
+ for er in encodingRecords:
+ offset = er["subtableOffset"].value
+ if last and last == offset:
+ continue
+ last = offset
+
+ # Add padding if any
+ padding = self.seekByte(offset, relative=True, null=False)
+ if padding:
+ yield padding
+ format = UInt16(self, "format").value
+ if format == 0:
+ yield CmapTable0(self, "cmap table format 0")
+ elif format == 4:
+ yield CmapTable4(self, "cmap table format 4")
+ elif format == 6:
+ yield CmapTable6(self, "cmap table format 6")
+ elif format == 12:
+ yield CmapTable12(self, "cmap table format 12")
+ elif format == 14:
+ yield CmapTable14(self, "cmap table format 14")
+
+
+class SignatureRecord(FieldSet):
+ def createFields(self):
+ yield UInt16(self, "format", "Table format")
+ yield UInt16(self, "length", "Length of signature")
+ yield UInt16(self, "signatureBlockOffset", "Offset to signature block")
+
+
+class SignatureBlock(FieldSet):
+ def createFields(self):
+ yield PaddingBits(self, "reserved[]", 32)
+ yield UInt32(
+ self,
+ "length",
+ "Length (in bytes) of the PKCS#7 packet in the signature field",
+ )
+ yield String(self, "signature", self["length"].value, "Signature block")
+
+
+def parseDSIG(self):
+ yield UInt32(self, "version")
+ yield UInt16(self, "numSignatures", "Number of signatures in the table")
+    yield PaddingBits(self, "reserved[]", 15)
+    yield Bit(self, "flag", "Cannot be resigned")
+ entries = []
+ for i in range(self["numSignatures"].value):
+ record = SignatureRecord(self, "signatureRecords[]")
+ entries.append(record)
+ yield record
+ entries.sort(key=lambda field: field["signatureBlockOffset"].value)
+ last = None
+ for entry in entries:
+ offset = entry["signatureBlockOffset"].value
+ if last and last == offset:
+ continue
+ last = offset
+ # Add padding if any
+ padding = self.seekByte(offset, relative=True, null=False)
+ if padding:
+ yield padding
+
+ padding = (self.size - self.current_size) // 8
+ if padding:
+ yield NullBytes(self, "padding_end", padding)
+
+
def parseNames(self):
# Read header
yield UInt16(self, "format")
if self["format"].value != 0:
- raise ParserError("TTF (names): Invalid format (%u)" %
- self["format"].value)
+ raise ParserError("TTF (names): Invalid format (%u)" % self["format"].value)
yield UInt16(self, "count")
yield UInt16(self, "offset")
if MAX_NAME_COUNT < self["count"].value:
- raise ParserError("Invalid number of names (%s)"
- % self["count"].value)
+ raise ParserError("Invalid number of names (%s)" % self["count"].value)
# Read name index
entries = []
@@ -208,17 +534,210 @@ def parseNames(self):
# Read value
size = entry["length"].value
if size:
- yield String(self, "value[]", size, entry.description, charset=entry.getCharset())
+ yield String(
+ self, "value[]", size, entry.description, charset=entry.getCharset()
+ )
padding = (self.size - self.current_size) // 8
if padding:
yield NullBytes(self, "padding_end", padding)
+def parseMaxp(self):
+ # Read header
+ yield Version16Dot16(self, "format", "format version")
+ yield UInt16(self, "numGlyphs", "Number of glyphs")
+ if self["format"].value >= 1:
+ yield UInt16(self, "maxPoints", "Maximum points in a non-composite glyph")
+ yield UInt16(self, "maxContours", "Maximum contours in a non-composite glyph")
+ yield UInt16(self, "maxCompositePoints", "Maximum points in a composite glyph")
+ yield UInt16(
+ self, "maxCompositeContours", "Maximum contours in a composite glyph"
+ )
+ yield UInt16(self, "maxZones", "Do instructions use the twilight zone?")
+ yield UInt16(self, "maxTwilightPoints", "Maximum points used in Z0")
+ yield UInt16(self, "maxStorage", "Number of Storage Area locations")
+ yield UInt16(self, "maxFunctionDefs", "Number of function definitions")
+ yield UInt16(self, "maxInstructionDefs", "Number of instruction definitions")
+ yield UInt16(self, "maxStackElements", "Maximum stack depth")
+ yield UInt16(
+ self, "maxSizeOfInstructions", "Maximum byte count for glyph instructions"
+ )
+ yield UInt16(
+ self,
+ "maxComponentElements",
+ "Maximum number of components at glyph top level",
+ )
+ yield UInt16(self, "maxComponentDepth", "Maximum level of recursion")
+
+
+def parseHhea(self):
+ yield UInt16(self, "majorVersion", "Major version")
+ yield UInt16(self, "minorVersion", "Minor version")
+ yield FWORD(self, "ascender", "Typographic ascent")
+ yield FWORD(self, "descender", "Typographic descent")
+ yield FWORD(self, "lineGap", "Typographic linegap")
+ yield UFWORD(self, "advanceWidthMax", "Maximum advance width")
+ yield FWORD(self, "minLeftSideBearing", "Minimum left sidebearing value")
+ yield FWORD(self, "minRightSideBearing", "Minimum right sidebearing value")
+ yield FWORD(self, "xMaxExtent", "Maximum X extent")
+ yield Int16(self, "caretSlopeRise", "Caret slope rise")
+ yield Int16(self, "caretSlopeRun", "Caret slope run")
+ yield Int16(self, "caretOffset", "Caret offset")
+ yield GenericVector(self, "reserved", 4, Int16)
+ yield Int16(self, "metricDataFormat", "Metric data format")
+ yield UInt16(self, "numberOfHMetrics", "Number of horizontal metrics")
+
+
+class fsType(FieldSet):
+ def createFields(self):
+        # fsType bit 0 is the least significant bit; reading MSB-first,
+        # the high reserved bits come before bit 9 (bitmap embedding only)
+        # and bit 8 (no subsetting), and the permission bits 0-3 come last
+        yield PaddingBits(self, "reserved[]", 6)
+        yield Bit(
+            self,
+            "bitmap_embedding",
+            "Only bitmaps contained in the font may be embedded",
+        )
+        yield Bit(self, "no_subsetting", "Font may not be subsetted prior to embedding")
+        yield PaddingBits(self, "reserved[]", 4)
+        yield Enum(Bits(self, "usage_permissions", 4), PERMISSIONS)
+
+
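Independent of how the bits are split into fields, the raw 16-bit fsType value can be decoded per the OpenType OS/2 specification; a hedged sketch (helper name and dict are mine, the permission names mirror the PERMISSIONS enum above):

```python
EMBEDDING_PERMISSIONS = {
    0: "Installable embedding",
    2: "Restricted License embedding",
    4: "Preview & Print embedding",
    8: "Editable embedding",
}


def decode_fstype(value: int) -> dict:
    # Per the OpenType OS/2 spec: bits 0-3 hold the usage permission,
    # bit 8 is NO_SUBSETTING, bit 9 is BITMAP_EMBEDDING_ONLY.
    return {
        "usage": EMBEDDING_PERMISSIONS.get(value & 0x000F, "Reserved"),
        "no_subsetting": bool(value & 0x0100),
        "bitmap_embedding_only": bool(value & 0x0200),
    }
```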
+def parseOS2(self):
+ yield UInt16(self, "version", "Table version")
+ yield Int16(self, "xAvgCharWidth")
+ yield UInt16(self, "usWeightClass")
+ yield UInt16(self, "usWidthClass")
+ yield fsType(self, "fsType")
+ yield Int16(self, "ySubscriptXSize")
+ yield Int16(self, "ySubscriptYSize")
+ yield Int16(self, "ySubscriptXOffset")
+ yield Int16(self, "ySubscriptYOffset")
+ yield Int16(self, "ySuperscriptXSize")
+ yield Int16(self, "ySuperscriptYSize")
+ yield Int16(self, "ySuperscriptXOffset")
+ yield Int16(self, "ySuperscriptYOffset")
+ yield Int16(self, "yStrikeoutSize")
+ yield Int16(self, "yStrikeoutPosition")
+ yield Int16(self, "sFamilyClass")
+ yield GenericVector(self, "panose", 10, UInt8)
+ yield UInt32(self, "ulUnicodeRange1")
+ yield UInt32(self, "ulUnicodeRange2")
+ yield UInt32(self, "ulUnicodeRange3")
+ yield UInt32(self, "ulUnicodeRange4")
+ yield Tag(self, "achVendID", "Vendor ID")
+ yield UInt16(self, "fsSelection")
+ yield UInt16(self, "usFirstCharIndex")
+ yield UInt16(self, "usLastCharIndex")
+ yield Int16(self, "sTypoAscender")
+ yield Int16(self, "sTypoDescender")
+ yield Int16(self, "sTypoLineGap")
+ yield UInt16(self, "usWinAscent")
+ yield UInt16(self, "usWinDescent")
+ if self["version"].value >= 1:
+ yield UInt32(self, "ulCodePageRange1")
+ yield UInt32(self, "ulCodePageRange2")
+ if self["version"].value >= 2:
+ yield Int16(self, "sxHeight")
+ yield Int16(self, "sCapHeight")
+ yield UInt16(self, "usDefaultChar")
+ yield UInt16(self, "usBreakChar")
+ yield UInt16(self, "usMaxContext")
+ if self["version"].value >= 5:
+ yield UInt16(self, "usLowerOpticalPointSize")
+ yield UInt16(self, "usUpperOpticalPointSize")
+
+
+def parsePost(self):
+ yield Version16Dot16(self, "version", "Table version")
+ yield Fixed(
+ self,
+ "italicAngle",
+ "Italic angle in counter-clockwise degrees from the vertical.",
+ )
+ yield FWORD(self, "underlinePosition", "Top of underline to baseline")
+ yield FWORD(self, "underlineThickness", "Suggested underline thickness")
+ yield UInt32(self, "isFixedPitch", "Is the font fixed pitch?")
+ yield UInt32(self, "minMemType42", "Minimum memory usage (OpenType)")
+ yield UInt32(self, "maxMemType42", "Maximum memory usage (OpenType)")
+ yield UInt32(self, "minMemType1", "Minimum memory usage (Type 1)")
+ yield UInt32(self, "maxMemType1", "Maximum memory usage (Type 1)")
+ if self["version"].value == 2.0:
+ yield UInt16(self, "numGlyphs")
+ indices = GenericVector(
+ self,
+ "Array of indices into the string data",
+ self["numGlyphs"].value,
+ UInt16,
+ "glyphNameIndex",
+ )
+ yield indices
+ for gid, index in enumerate(indices):
+ if index.value >= 258:
+ yield PascalString8(self, "glyphname[%i]" % gid)
+    elif self["version"].value == 2.5:
+ yield UInt16(self, "numGlyphs")
+ indices = GenericVector(
+ self,
+            "Difference between glyph index and standard order of glyph",
+ self["numGlyphs"].value,
+ UInt16,
+ "offset",
+ )
+ yield indices
+
+
+# This is work-in-progress until I work out good ways to do random-access on offsets
+parseScriptList = (
+ parseFeatureList
+) = parseLookupList = parseFeatureVariationsTable = lambda x: None
+
+
+def parseGSUB(self):
+ yield UInt16(self, "majorVersion", "Major version")
+ yield UInt16(self, "minorVersion", "Minor version")
+ SUBTABLES = [
+ ("script list", parseScriptList),
+ ("feature list", parseFeatureList),
+ ("lookup list", parseLookupList),
+ ]
+ offsets = []
+ for description, parser in SUBTABLES:
+ name = description.title().replace(" ", "")
+ offset = UInt16(
+ self, name[0].lower() + name[1:], "Offset to %s table" % description
+ )
+ yield offset
+ offsets.append((offset.value, parser))
+    if self["minorVersion"].value == 1:
+ offset = UInt32(
+ self, "featureVariationsOffset", "Offset to feature variations table"
+ )
+ offsets.append((offset.value, parseFeatureVariationsTable))
+
+ offsets.sort(key=lambda field: field[0])
+ padding = self.seekByte(offsets[0][0], null=True)
+ if padding:
+ yield padding
+ lastOffset, first_parser = offsets[0]
+ for offset, parser in offsets[1:]:
+ # yield parser(self)
+        yield RawBytes(self, "content[]", offset - lastOffset)
+ lastOffset = offset
+
+
class Table(FieldSet):
TAG_INFO = {
+ "DSIG": ("DSIG", "Digital Signature", parseDSIG),
+ "GSUB": ("GSUB", "Glyph Substitutions", parseGSUB),
+ "avar": ("avar", "Axis variation table", parseAvar),
+ "cmap": ("cmap", "Character to Glyph Index Mapping", parseCmap),
+ "fvar": ("fvar", "Font variations table", parseFvar),
"head": ("header", "Font header", parseFontHeader),
+ "hhea": ("hhea", "Horizontal Header", parseHhea),
+ "maxp": ("maxp", "Maximum Profile", parseMaxp),
"name": ("names", "Names", parseNames),
+ "OS/2": ("OS_2", "OS/2 and Windows Metrics", parseOS2),
+ "post": ("post", "PostScript", parsePost),
}
def __init__(self, parent, name, table, **kw):
@@ -251,10 +770,15 @@ class TrueTypeFontFile(Parser):
}
def validate(self):
- if self["maj_ver"].value != 1:
- return "Invalid major version (%u)" % self["maj_ver"].value
- if self["min_ver"].value != 0:
- return "Invalid minor version (%u)" % self["min_ver"].value
+ if self["maj_ver"].value == 1 and self["min_ver"].value == 0:
+ pass
+ elif self["maj_ver"].value == 0x4F54 and self["min_ver"].value == 0x544F:
+ pass
+ else:
+ return "Invalid version (%u.%u)" % (
+ self["maj_ver"].value,
+ self["min_ver"].value,
+ )
if not (MIN_NB_TABLE <= self["nb_table"].value <= MAX_NB_TABLE):
return "Invalid number of table (%u)" % self["nb_table"].value
return True
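The widened `validate()` above accepts two sfntVersion values: 0x00010000 (TrueType outlines) and the tag `OTTO` (0x4F54544F, i.e. maj_ver 0x4F54 / min_ver 0x544F, used by CFF-flavoured OpenType). The same check outside hachoir:

```python
def sfnt_flavor(data: bytes):
    # First four bytes of an OpenType font file: the sfntVersion
    tag = data[:4]
    if tag == b"\x00\x01\x00\x00":
        return "truetype"  # TrueType outlines (version 1.0)
    if tag == b"OTTO":
        return "cff"  # CFF-flavoured OpenType
    return None  # not a plain sfnt font (could be WOFF, TTC, ...)
```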
diff --git a/hachoir/parser/parser.py b/hachoir/parser/parser.py
index 1ec1b5e8..a00bf76f 100644
--- a/hachoir/parser/parser.py
+++ b/hachoir/parser/parser.py
@@ -13,7 +13,7 @@ class HachoirParser(object):
"""
A parser is the root of all other fields. It create first level of fields
and have special attributes and methods:
- - tags: dictionnary with keys:
+ - tags: dictionary with keys:
- "file_ext": classical file extensions (string or tuple of strings) ;
- "mime": MIME type(s) (string or tuple of strings) ;
- "description": String describing the parser.
diff --git a/hachoir/parser/program/python.py b/hachoir/parser/program/python.py
index f2d3127c..bd2d905f 100644
--- a/hachoir/parser/program/python.py
+++ b/hachoir/parser/program/python.py
@@ -10,10 +10,12 @@ Creation: 25 march 2005
"""
from hachoir.parser import Parser
-from hachoir.field import (FieldSet, UInt8,
- UInt16, Int32, UInt32, Int64, ParserError, Float64,
- Character, RawBytes, PascalString8, TimestampUnix32,
- Bit, String)
+from hachoir.field import (
+ FieldSet, UInt8,
+ UInt16, Int32, UInt32, Int64, UInt64,
+ ParserError, Float64,
+ Character, RawBytes, PascalString8, TimestampUnix32,
+ Bit, String, NullBits)
from hachoir.core.endian import LITTLE_ENDIAN
from hachoir.core.bits import long2raw
from hachoir.core.text_handler import textHandler, hexadecimal
@@ -152,13 +154,17 @@ def parseShortASCII(parent):
def parseCode(parent):
- if 0x3000000 <= parent.root.getVersion():
+ version = parent.root.getVersion()
+ if 0x3000000 <= version:
yield UInt32(parent, "arg_count", "Argument count")
+ if 0x3080000 <= version:
+ yield UInt32(parent, "posonlyargcount", "Positional only argument count")
yield UInt32(parent, "kwonlyargcount", "Keyword only argument count")
- yield UInt32(parent, "nb_locals", "Number of local variables")
+ if version < 0x30B0000:
+ yield UInt32(parent, "nb_locals", "Number of local variables")
yield UInt32(parent, "stack_size", "Stack size")
yield UInt32(parent, "flags")
- elif 0x2030000 <= parent.root.getVersion():
+ elif 0x2030000 <= version:
yield UInt32(parent, "arg_count", "Argument count")
yield UInt32(parent, "nb_locals", "Number of local variables")
yield UInt32(parent, "stack_size", "Stack size")
@@ -168,20 +174,34 @@ def parseCode(parent):
yield UInt16(parent, "nb_locals", "Number of local variables")
yield UInt16(parent, "stack_size", "Stack size")
yield UInt16(parent, "flags")
+
yield Object(parent, "compiled_code")
yield Object(parent, "consts")
yield Object(parent, "names")
- yield Object(parent, "varnames")
- if 0x2000000 <= parent.root.getVersion():
- yield Object(parent, "freevars")
- yield Object(parent, "cellvars")
+ if 0x30B0000 <= version:
+ yield Object(parent, "co_localsplusnames")
+ yield Object(parent, "co_localspluskinds")
+ else:
+ yield Object(parent, "varnames")
+ if 0x2000000 <= version:
+ yield Object(parent, "freevars")
+ yield Object(parent, "cellvars")
+
yield Object(parent, "filename")
yield Object(parent, "name")
- if 0x2030000 <= parent.root.getVersion():
+ if 0x30B0000 <= version:
+ yield Object(parent, "qualname")
+
+ if 0x2030000 <= version:
yield UInt32(parent, "firstlineno", "First line number")
else:
yield UInt16(parent, "firstlineno", "First line number")
- yield Object(parent, "lnotab")
+ if 0x30A0000 <= version:
+ yield Object(parent, "linetable")
+ if 0x30B0000 <= version:
+ yield Object(parent, "exceptiontable")
+ else:
+ yield Object(parent, "lnotab")
class Object(FieldSet):
@@ -301,6 +321,16 @@ class BytecodeChar(Character):
static_size = 7
+PY_RELEASE_LEVEL_ALPHA = 0xA
+PY_RELEASE_LEVEL_FINAL = 0xF
+
+
+def VERSION(major, minor, release_level=PY_RELEASE_LEVEL_FINAL, serial=0):
+ micro = 0
+ return ((major << 24) + (minor << 16) + (micro << 8)
+ + (release_level << 4) + (serial << 0))
+
+
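`VERSION()` mirrors CPython's PY_VERSION_HEX layout: major << 24 | minor << 16 | micro << 8 | release_level << 4 | serial. Decoding it back, as a quick sanity check (the helper name is mine):

```python
def decode_hexversion(v: int) -> str:
    # Inverse of the VERSION() packing above (micro is always 0 there)
    major = (v >> 24) & 0xFF
    minor = (v >> 16) & 0xFF
    micro = (v >> 8) & 0xFF
    level = {0xA: "a", 0xB: "b", 0xC: "rc", 0xF: ""}[(v >> 4) & 0xF]
    serial = v & 0xF
    suffix = "%s%d" % (level, serial) if level else ""
    return "%d.%d.%d%s" % (major, minor, micro, suffix)
```

So VERSION(3, 11) == 0x030B00F0 decodes back to "3.11.0", and the hand-written alpha constants such as 0x30700A1 decode to "3.7.0a1".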
class PythonCompiledFile(Parser):
PARSER_TAGS = {
"id": "python",
@@ -394,7 +424,90 @@ class PythonCompiledFile(Parser):
3377: ("Python 3.6b1 ", 0x3060000),
3378: ("Python 3.6b2 ", 0x3060000),
3379: ("Python 3.6rc1", 0x3060000),
- 3390: ("Python 3.7a0 ", 0x3070000),
+ 3390: ("Python 3.7a1", 0x30700A1),
+ 3391: ("Python 3.7a2", 0x30700A2),
+ 3392: ("Python 3.7a4", 0x30700A4),
+ 3393: ("Python 3.7b1", 0x30700B1),
+ 3394: ("Python 3.7b5", 0x30700B5),
+ 3400: ("Python 3.8a1", VERSION(3, 8)),
+ 3401: ("Python 3.8a1", VERSION(3, 8)),
+ 3410: ("Python 3.8a1", VERSION(3, 8)),
+ 3411: ("Python 3.8b2", VERSION(3, 8)),
+ 3412: ("Python 3.8b2", VERSION(3, 8)),
+ 3413: ("Python 3.8b4", VERSION(3, 8)),
+ 3420: ("Python 3.9a0", VERSION(3, 9)),
+ 3421: ("Python 3.9a0", VERSION(3, 9)),
+ 3422: ("Python 3.9a0", VERSION(3, 9)),
+ 3423: ("Python 3.9a2", VERSION(3, 9)),
+ 3424: ("Python 3.9a2", VERSION(3, 9)),
+ 3425: ("Python 3.9a2", VERSION(3, 9)),
+ 3430: ("Python 3.10a1", VERSION(3, 10)),
+ 3431: ("Python 3.10a1", VERSION(3, 10)),
+ 3432: ("Python 3.10a2", VERSION(3, 10)),
+ 3433: ("Python 3.10a2", VERSION(3, 10)),
+ 3434: ("Python 3.10a6", VERSION(3, 10)),
+ 3435: ("Python 3.10a7", VERSION(3, 10)),
+ 3436: ("Python 3.10b1", VERSION(3, 10)),
+ 3437: ("Python 3.10b1", VERSION(3, 10)),
+ 3438: ("Python 3.10b1", VERSION(3, 10)),
+ 3439: ("Python 3.10b1", VERSION(3, 10)),
+ 3450: ("Python 3.11a1", VERSION(3, 11)),
+ 3451: ("Python 3.11a1", VERSION(3, 11)),
+ 3452: ("Python 3.11a1", VERSION(3, 11)),
+ 3453: ("Python 3.11a1", VERSION(3, 11)),
+ 3454: ("Python 3.11a1", VERSION(3, 11)),
+ 3455: ("Python 3.11a1", VERSION(3, 11)),
+ 3456: ("Python 3.11a1", VERSION(3, 11)),
+ 3457: ("Python 3.11a1", VERSION(3, 11)),
+ 3458: ("Python 3.11a1", VERSION(3, 11)),
+ 3459: ("Python 3.11a1", VERSION(3, 11)),
+ 3460: ("Python 3.11a1", VERSION(3, 11)),
+ 3461: ("Python 3.11a1", VERSION(3, 11)),
+ 3462: ("Python 3.11a2", VERSION(3, 11)),
+ 3463: ("Python 3.11a3", VERSION(3, 11)),
+ 3464: ("Python 3.11a3", VERSION(3, 11)),
+ 3465: ("Python 3.11a3", VERSION(3, 11)),
+ 3466: ("Python 3.11a4", VERSION(3, 11)),
+ 3467: ("Python 3.11a4", VERSION(3, 11)),
+ 3468: ("Python 3.11a4", VERSION(3, 11)),
+ 3469: ("Python 3.11a4", VERSION(3, 11)),
+ 3470: ("Python 3.11a4", VERSION(3, 11)),
+ 3471: ("Python 3.11a4", VERSION(3, 11)),
+ 3472: ("Python 3.11a4", VERSION(3, 11)),
+ 3473: ("Python 3.11a4", VERSION(3, 11)),
+ 3474: ("Python 3.11a4", VERSION(3, 11)),
+ 3475: ("Python 3.11a5", VERSION(3, 11)),
+ 3476: ("Python 3.11a5", VERSION(3, 11)),
+ 3477: ("Python 3.11a5", VERSION(3, 11)),
+ 3478: ("Python 3.11a5", VERSION(3, 11)),
+ 3479: ("Python 3.11a5", VERSION(3, 11)),
+ 3480: ("Python 3.11a5", VERSION(3, 11)),
+ 3481: ("Python 3.11a5", VERSION(3, 11)),
+ 3482: ("Python 3.11a5", VERSION(3, 11)),
+ 3483: ("Python 3.11a5", VERSION(3, 11)),
+ 3484: ("Python 3.11a5", VERSION(3, 11)),
+ 3485: ("Python 3.11a5", VERSION(3, 11)),
+ 3486: ("Python 3.11a6", VERSION(3, 11)),
+ 3487: ("Python 3.11a6", VERSION(3, 11)),
+ 3488: ("Python 3.11a6", VERSION(3, 11)),
+ 3489: ("Python 3.11a6", VERSION(3, 11)),
+ 3490: ("Python 3.11a6", VERSION(3, 11)),
+ 3491: ("Python 3.11a6", VERSION(3, 11)),
+ 3492: ("Python 3.11a7", VERSION(3, 11)),
+ 3493: ("Python 3.11a7", VERSION(3, 11)),
+ 3494: ("Python 3.11a7", VERSION(3, 11)),
+ 3500: ("Python 3.12a1", VERSION(3, 12)),
+ 3501: ("Python 3.12a1", VERSION(3, 12)),
+ 3502: ("Python 3.12a1", VERSION(3, 12)),
+ 3503: ("Python 3.12a1", VERSION(3, 12)),
+ 3504: ("Python 3.12a1", VERSION(3, 12)),
+ 3505: ("Python 3.12a1", VERSION(3, 12)),
+ 3506: ("Python 3.12a1", VERSION(3, 12)),
+ 3507: ("Python 3.12a1", VERSION(3, 12)),
+ 3508: ("Python 3.12a1", VERSION(3, 12)),
+ 3509: ("Python 3.12a1", VERSION(3, 12)),
+ 3510: ("Python 3.12a1", VERSION(3, 12)),
+ 3511: ("Python 3.12a1", VERSION(3, 12)),
}
# Dictionary which associates the pyc signature (4-byte long string)
@@ -411,13 +524,7 @@ class PythonCompiledFile(Parser):
if self["magic_string"].value != "\r\n":
return r"Wrong magic string (\r\n)"
- version = self.getVersion()
- if version >= 0x3030000 and self['magic_number'].value >= 3200:
- offset = 12
- else:
- offset = 8
- value = self.stream.readBits(offset * 8, 7, self.endian)
- if value != ord(b'c'):
+ if self["content/bytecode"].value != "c":
return "First object bytecode is not code"
return True
@@ -430,8 +537,22 @@ class PythonCompiledFile(Parser):
def createFields(self):
yield UInt16(self, "magic_number", "Magic number")
yield String(self, "magic_string", 2, r"Magic string \r\n", charset="ASCII")
- yield TimestampUnix32(self, "timestamp", "Timestamp")
version = self.getVersion()
- if version >= 0x3030000 and self['magic_number'].value >= 3200:
- yield UInt32(self, "filesize", "Size of the Python source file (.py) modulo 2**32")
+
+ # PEP 552: Deterministic pycs #31650 (Python 3.7a4); magic=3392
+ if version >= 0x30700A4:
+ yield Bit(self, "use_hash", "Is hash based?")
+ yield Bit(self, "checked")
+ yield NullBits(self, "reserved", 30)
+ use_hash = self['use_hash'].value
+ else:
+ use_hash = False
+
+ if use_hash:
+ yield UInt64(self, "hash")
+ else:
+ yield TimestampUnix32(self, "timestamp", "Timestamp modulo 2**32")
+ if version >= 0x3030000 and self['magic_number'].value >= 3200:
+ yield UInt32(self, "filesize", "Size of the Python source file (.py) modulo 2**32")
+
yield Object(self, "content")
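The hunk above implements PEP 552 hash-based pyc headers: since CPython 3.7 the header carries a 4-byte flags word after the magic, with bit 0 selecting a source hash (8 bytes) instead of the classic timestamp + source size pair. A rough standalone sketch of that layout, independent of hachoir (the function and dict keys here are illustrative, not hachoir API):

```python
import struct

def parse_pyc_header(data: bytes) -> dict:
    """Decode the 16-byte header of a Python >= 3.7 .pyc file (PEP 552)."""
    magic_number, = struct.unpack_from("<H", data, 0)  # e.g. 3495 for 3.11
    assert data[2:4] == b"\r\n", "not a pyc header"
    flags, = struct.unpack_from("<I", data, 4)
    use_hash = bool(flags & 0x01)   # bit 0: hash-based pyc
    checked = bool(flags & 0x02)    # bit 1: re-check source hash at import
    if use_hash:
        source_hash, = struct.unpack_from("<Q", data, 8)
        return {"magic": magic_number, "hash": source_hash, "checked": checked}
    timestamp, filesize = struct.unpack_from("<II", data, 8)
    return {"magic": magic_number, "timestamp": timestamp, "filesize": filesize}
```

This mirrors the parser change: the `use_hash` bit decides whether the next 8 bytes are a hash or a timestamp plus source-file size.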
diff --git a/hachoir/parser/video/asf.py b/hachoir/parser/video/asf.py
index 8da2d1ac..fc41624b 100644
--- a/hachoir/parser/video/asf.py
+++ b/hachoir/parser/video/asf.py
@@ -355,7 +355,7 @@ class AsfFile(Parser):
if self.stream.readBytes(0, len(magic)) != magic:
return "Invalid magic"
header = self[0]
- if not(30 <= header["size"].value <= MAX_HEADER_SIZE):
+ if not (30 <= header["size"].value <= MAX_HEADER_SIZE):
return "Invalid header size (%u)" % header["size"].value
return True
diff --git a/hachoir/parser/video/mpeg_ts.py b/hachoir/parser/video/mpeg_ts.py
index 8e4e8701..e626e70c 100644
--- a/hachoir/parser/video/mpeg_ts.py
+++ b/hachoir/parser/video/mpeg_ts.py
@@ -134,7 +134,7 @@ class MPEG_TS(Parser):
# FIXME: detect using file content, not file name
# maybe detect sync at offset+4 bytes?
source = self.stream.source
- if not(source and source.startswith("file:")):
+ if not (source and source.startswith("file:")):
return True
filename = source[5:].lower()
return filename.endswith((".m2ts", ".mts"))
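Several hunks in this release normalize `not(expr)` to `not (expr)` or `not expr`: `not` is a keyword, not a function, so the call-like spelling only obscures that fact (newer pycodestyle releases report it as E275, missing whitespace after keyword). A minimal illustration:

```python
quiet = False

# `not(quiet)` is not a function call: `not` is a keyword and the
# parentheses merely wrap the operand. Both spellings are equivalent:
assert (not(quiet)) == (not quiet)

verbose = not quiet  # preferred form, as adopted throughout this diff
assert verbose is True
```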
diff --git a/hachoir/parser/video/mpeg_video.py b/hachoir/parser/video/mpeg_video.py
index 4ddc37f0..d77d758c 100644
--- a/hachoir/parser/video/mpeg_video.py
+++ b/hachoir/parser/video/mpeg_video.py
@@ -244,7 +244,7 @@ class PacketElement(FieldSet):
yield Bits(self, "sync[]", 4) # =2, or 3 if has_dts=True
yield Timestamp(self, "pts")
if self["has_dts"].value:
- if not(self["has_pts"].value):
+ if not self["has_pts"].value:
raise ParserError("Invalid PTS/DTS values")
yield Bits(self, "sync[]", 4) # =1
yield Timestamp(self, "dts")
diff --git a/hachoir/regex/parser.py b/hachoir/regex/parser.py
index f381459a..234c935f 100644
--- a/hachoir/regex/parser.py
+++ b/hachoir/regex/parser.py
@@ -164,7 +164,7 @@ def _parse(text, start=0, until=None):
if char == 'b':
new_regex = RegexWord()
else:
- if not(char in REGEX_COMMAND_CHARACTERS or char in " '"):
+ if not (char in REGEX_COMMAND_CHARACTERS or char in " '"):
raise SyntaxError(
"Operator '\\%s' is not supported" % char)
new_regex = RegexString(char)
diff --git a/hachoir/stream/input_helper.py b/hachoir/stream/input_helper.py
index 132dd670..ed9263e9 100644
--- a/hachoir/stream/input_helper.py
+++ b/hachoir/stream/input_helper.py
@@ -4,18 +4,23 @@ from hachoir.stream import InputIOStream, InputSubStream, InputStreamError
def FileInputStream(filename, real_filename=None, **args):
"""
- Create an input stream of a file. filename must be unicode.
+ Create an input stream of a file. filename must be unicode or a file
+ object.
real_filename is an optional argument used to specify the real filename,
its type can be 'str' or 'unicode'. Use real_filename when you are
not able to convert filename to real unicode string (ie. you have to
use unicode(name, 'replace') or unicode(name, 'ignore')).
"""
- assert isinstance(filename, str)
if not real_filename:
- real_filename = filename
+ real_filename = (filename if isinstance(filename, str)
+ else getattr(filename, 'name', ''))
try:
- inputio = open(real_filename, 'rb')
+ if isinstance(filename, str):
+ inputio = open(real_filename, 'rb')
+ else:
+ inputio = filename
+ filename = getattr(filename, 'name', '')
except IOError as err:
errmsg = str(err)
raise InputStreamError(
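The `FileInputStream` change above lets callers pass an already-open binary file object instead of a path string. A sketch of the dispatch pattern it introduces (the helper name here is illustrative, not hachoir API):

```python
import io

def open_source(source):
    """Accept either a filename (str) or an open binary file object.

    Strings are opened; anything else is treated as file-like, with a
    display name recovered from its .name attribute when present.
    """
    if isinstance(source, str):
        return open(source, "rb"), source
    return source, getattr(source, "name", "")

# An in-memory stream now works where only paths were accepted before:
fileobj, name = open_source(io.BytesIO(b"\x00\x01"))
assert name == ""  # BytesIO carries no .name attribute
```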
diff --git a/hachoir/stream/output.py b/hachoir/stream/output.py
index 6f62671c..4a9e1514 100644
--- a/hachoir/stream/output.py
+++ b/hachoir/stream/output.py
@@ -2,6 +2,7 @@ from io import StringIO
from hachoir.core.endian import BIG_ENDIAN, LITTLE_ENDIAN
from hachoir.core.bits import long2raw
from hachoir.stream import StreamError
+from hachoir.core import config
from errno import EBADF
MAX_READ_NBYTES = 2 ** 16
@@ -111,12 +112,12 @@ class OutputStream(object):
self.writeBytes(raw)
def copyBitsFrom(self, input, address, nb_bits, endian):
- if (nb_bits % 8) == 0:
+ if (nb_bits % 8) == 0 and (address % 8) == 0 and (self._bit_pos % 8) == 0:
self.copyBytesFrom(input, address, nb_bits // 8)
else:
# Arbitrary limit (because we should use a buffer, like copyBytesFrom(),
# but with an endianness problem)
- assert nb_bits <= 128
+ assert nb_bits <= config.max_bit_length
data = input.readBits(address, nb_bits, endian)
self.writeBits(nb_bits, data, endian)
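The `copyBitsFrom` fix above is what the new `tests/test_editor.py` exercises: copying a whole number of bytes is not enough to take the byte-wise fast path — the source address and the destination's current bit position must also fall on byte boundaries. The alignment condition in isolation:

```python
def can_copy_bytes(nb_bits: int, src_addr: int, dst_bit_pos: int) -> bool:
    """Byte-wise fast copy is only safe when the bit count, the source
    address and the destination bit position are all byte-aligned."""
    return nb_bits % 8 == 0 and src_addr % 8 == 0 and dst_bit_pos % 8 == 0

# 16 bits is a whole number of bytes, but starting at bit 5 (as in the
# new editor test's 5/16/3/8-bit field layout) still needs the slow path:
assert not can_copy_bytes(16, 5, 5)
assert can_copy_bytes(16, 8, 0)
```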
diff --git a/hachoir/strip.py b/hachoir/strip.py
index 9b33cdeb..5db2868f 100644
--- a/hachoir/strip.py
+++ b/hachoir/strip.py
@@ -278,7 +278,7 @@ def main():
if parser:
editor = createEditor(parser)
ok &= stripEditor(editor, filename + ".new",
- level, not(values.quiet))
+ level, not values.quiet)
else:
ok = False
if ok:
diff --git a/hachoir/subfile/main.py b/hachoir/subfile/main.py
index a4c477e5..fe895819 100644
--- a/hachoir/subfile/main.py
+++ b/hachoir/subfile/main.py
@@ -85,7 +85,7 @@ def main():
stream = FileInputStream(filename)
with stream:
subfile = SearchSubfile(stream, values.offset, values.size)
- subfile.verbose = not(values.quiet)
+ subfile.verbose = not values.quiet
subfile.debug = values.debug
if output:
subfile.setOutput(output)
diff --git a/hachoir/subfile/search.py b/hachoir/subfile/search.py
index 9cb9ad98..f7ae929d 100644
--- a/hachoir/subfile/search.py
+++ b/hachoir/subfile/search.py
@@ -95,7 +95,7 @@ class SearchSubfile:
print("[!] Memory error!", file=stderr)
self.mainFooter()
self.stream.close()
- return not(main_error)
+ return (not main_error)
def mainHeader(self):
# Fix slice size if needed
@@ -149,7 +149,7 @@ class SearchSubfile:
if parser.content_size is not None:
text += " size=%s (%s)" % (parser.content_size //
8, humanFilesize(parser.content_size // 8))
- if not(parser.content_size) or parser.content_size // 8 < FILE_MAX_SIZE:
+ if not parser.content_size or parser.content_size // 8 < FILE_MAX_SIZE:
text += ": " + parser.description
else:
text += ": " + parser.__class__.__name__
diff --git a/hachoir/urwid.py b/hachoir/urwid.py
index 7839031c..7be693ed 100644
--- a/hachoir/urwid.py
+++ b/hachoir/urwid.py
@@ -295,7 +295,7 @@ class Walker(ListWalker):
text += "= %s" % display
if node.field.description and self.flags & self.display_description:
description = node.field.description
- if not(self.flags & self.human_size):
+ if not (self.flags & self.human_size):
description = makePrintable(description, "ASCII")
text += ": %s" % description
if self.flags & self.display_size and node.field.size or self.flags & self.display_type:
diff --git a/hachoir/wx/app.py b/hachoir/wx/app.py
index 1bd62d92..06b4b079 100644
--- a/hachoir/wx/app.py
+++ b/hachoir/wx/app.py
@@ -8,12 +8,12 @@ from hachoir.wx.dispatcher import dispatcher_t
from hachoir.wx import frame_view, field_view, hex_view, tree_view
from hachoir.wx.dialogs import file_open_dialog
from hachoir.wx.unicode import force_unicode
-from hachoir.version import VERSION
+from hachoir import __version__
class app_t(App):
def __init__(self, filename=None):
- print("[+] Run hachoir-wx version %s" % VERSION)
+ print("[+] Run hachoir-wx version %s" % __version__)
self.filename = filename
App.__init__(self, False)
diff --git a/hachoir/wx/field_view/stubs.py b/hachoir/wx/field_view/stubs.py
index fae03182..1f9e9e82 100644
--- a/hachoir/wx/field_view/stubs.py
+++ b/hachoir/wx/field_view/stubs.py
@@ -32,7 +32,7 @@ def field_type_name(field):
def convert_size(from_field, to_type):
- if not(('Byte' in field_type_name(from_field)) ^ ('Byte' in to_type.__name__)):
+ if not (('Byte' in field_type_name(from_field)) ^ ('Byte' in to_type.__name__)):
return from_field.size
elif 'Byte' in field_type_name(from_field):
return from_field.size * 8
diff --git a/hachoir/wx/hex_view/hex_view.py b/hachoir/wx/hex_view/hex_view.py
index 69d06c7f..7cfde491 100644
--- a/hachoir/wx/hex_view/hex_view.py
+++ b/hachoir/wx/hex_view/hex_view.py
@@ -1,6 +1,12 @@
import wx
from .file_cache import FileCache
+try:
+ import darkdetect
+ darkmode = darkdetect.isDark()
+except ImportError:
+ darkmode = False
+
textchars = set('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ ')
text_view_transtable = bytes([c if chr(c) in textchars else ord('.') for c in range(256)])
@@ -168,7 +174,10 @@ class hex_view_t(wx.ScrolledWindow):
# Draw "textbox" rects under the hex and text views
dc.SetPen(wx.NullPen)
- dc.SetBrush(wx.WHITE_BRUSH)
+ if darkmode:
+ dc.SetBrush(wx.BLACK_BRUSH)
+ else:
+ dc.SetBrush(wx.WHITE_BRUSH)
dc.DrawRectangle(lo.boxstart('hex'), 0, lo.boxwidth('hex'), h)
dc.DrawRectangle(lo.boxstart('text'), 0, lo.boxwidth('text'), h)
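The hex view now picks its background via `darkdetect`, degrading gracefully when that optional dependency (declared in the new `wx` extra) is missing. The same guard works for any optional module; this sketch coerces the result to a bool, a small liberty over the diff, which stores `isDark()` directly:

```python
try:
    import darkdetect  # optional; pulled in by the new "wx" extra
    darkmode = bool(darkdetect.isDark())
except ImportError:
    darkmode = False  # no darkdetect: fall back to a light background

background = "black" if darkmode else "white"
```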
diff --git a/hachoir/wx/main.py b/hachoir/wx/main.py
index 8b39519b..b902023a 100755
--- a/hachoir/wx/main.py
+++ b/hachoir/wx/main.py
@@ -1,7 +1,7 @@
#!/usr/bin/env python3
from hachoir.wx.app import app_t
-from hachoir.version import PACKAGE, VERSION, WEBSITE
+from hachoir import __version__
from hachoir.core.cmd_line import getHachoirOptions, configureHachoir
from optparse import OptionParser
import sys
@@ -24,8 +24,7 @@ def parseOptions():
def main():
- print("%s version %s" % (PACKAGE, VERSION))
- print(WEBSITE)
+ print("hachoir version %s" % __version__)
print()
values, filename = parseOptions()
configureHachoir(values)
diff --git a/setup.py b/setup.py
index 3f181148..1f5ec9f2 100755
--- a/setup.py
+++ b/setup.py
@@ -2,26 +2,26 @@
#
# Prepare a release:
#
-# - check version: hachoir/version.py and doc/conf.py
+# - check version: hachoir/__init__.py and doc/conf.py
# - set the release date: edit doc/changelog.rst
# - run: git commit -a
# - Remove untracked files/dirs: git clean -fdx
-# - run tests: tox
+# - run tests: tox --parallel auto
# - run: git push
-# - check Travis CI status:
-# https://travis-ci.org/vstinner/hachoir
-# - run: git tag x.y.z
-# - Remove untracked files/dirs: git clean -fdx
-# - run: python3 setup.py sdist bdist_wheel
+# - check GitHub Actions status:
+# https://github.com/vstinner/hachoir/actions
#
# Release a new version:
#
+# - git tag x.y.z
+# - git clean -fdx # Remove untracked files/dirs
+# - python3 setup.py sdist bdist_wheel
# - git push --tags
# - twine upload dist/*
#
# After the release:
#
-# - set version to N+1: hachoir/version.py and doc/conf.py
+# - set version to N+1: hachoir/__init__.py and doc/conf.py
ENTRY_POINTS = {
'console_scripts': [
@@ -72,6 +72,9 @@ def main():
"name": "hachoir",
"version": hachoir.__version__,
"url": 'http://hachoir.readthedocs.io/',
+ "project_urls": {
+ "Source": "https://github.com/vstinner/hachoir",
+ },
"author": "Hachoir team (see AUTHORS file)",
"description": "Package of Hachoir parsers used to open binary files",
"long_description": long_description,
@@ -83,6 +86,10 @@ def main():
"extras_require": {
"urwid": [
"urwid==1.3.1"
+ ],
+ "wx": [
+ "darkdetect",
+ "wxPython==4.*"
]
},
"zip_safe": True,
diff --git a/tests/test_editor.py b/tests/test_editor.py
new file mode 100644
index 00000000..1a00eb59
--- /dev/null
+++ b/tests/test_editor.py
@@ -0,0 +1,44 @@
+import unittest
+from io import BytesIO
+from hachoir.core.endian import BIG_ENDIAN
+from hachoir.editor import createEditor
+from hachoir.field import Parser, Bits
+from hachoir.stream import StringInputStream, OutputStream
+from hachoir.test import setup_tests
+
+
+class TestEditor(unittest.TestCase):
+ def test_bit_alignment(self):
+ data = bytes([255, 255, 255, 254])
+ stream = StringInputStream(data)
+ parser = TestParser(stream)
+ editor = createEditor(parser)
+
+ # Cause a change in a non-byte-aligned field
+ editor['flags[2]'].value -= 1
+
+ # Generate output and verify operation
+ output_io = BytesIO()
+ output_stream = OutputStream(output_io)
+
+ editor.writeInto(output_stream)
+ output_bits = "{0:b}".format(int.from_bytes(output_io.getvalue(), 'big'))
+
+ # X is the modified bit
+ # .....,,,,,,,,,,,,,,,,..X,,,,,,,,
+ self.assertEqual(output_bits, "11111111111111111111111011111110")
+
+
+class TestParser(Parser):
+ endian = BIG_ENDIAN
+
+ def createFields(self):
+ yield Bits(self, 'flags[]', 5)
+ yield Bits(self, 'flags[]', 16)
+ yield Bits(self, 'flags[]', 3)
+ yield Bits(self, 'flags[]', 8)
+
+
+if __name__ == "__main__":
+ setup_tests()
+ unittest.main()
diff --git a/tests/test_metadata.py b/tests/test_metadata.py
index 8677ba13..345aa5aa 100755
--- a/tests/test_metadata.py
+++ b/tests/test_metadata.py
@@ -126,7 +126,7 @@ class TestMetadata(unittest.TestCase):
# Check type
if type(read) != type(value) \
- and not(isinstance(value, int) and isinstance(value, int)):
+ and not (isinstance(value, int) and isinstance(value, int)):
if self.verbose:
sys.stdout.write("wrong type (%s instead of %s)!\n"
% (type(read).__name__, type(value).__name__))
diff --git a/tests/test_parser.py b/tests/test_parser.py
index fc10f342..958935fb 100755
--- a/tests/test_parser.py
+++ b/tests/test_parser.py
@@ -236,6 +236,12 @@ class TestParsers(unittest.TestCase):
parser = self.parse("python.cpython-37.pyc.bin")
self.checkValue(parser, "/content/consts/item[0]/name/text", "f")
+ def check_pyc_312(self, parser):
+ parser = self.parse("python.cpython-312.pyc.bin")
+ self.checkValue(parser, "/content/consts/item[0]/value", 1)
+ self.checkValue(parser, "/content/names/item[0]/text", "x")
+ self.checkValue(parser, "/content/name/text", "<module>")
+
def test_java(self):
parser = self.parse("ReferenceMap.class")
self.checkValue(parser, "/minor_version", 3)
@@ -836,6 +842,23 @@ class TestParsers(unittest.TestCase):
self.checkValue(parser, "/bpp", 32)
self.checkDisplay(parser, "/codec", "True-color RLE")
+ def test_ttf(self):
+ parser = self.parse("deja_vu_serif-2.7.ttf")
+ self.checkValue(parser, "/hhea/ascender", 1901)
+ self.checkValue(parser, "/maxp/maxCompositePoints", 101)
+ self.checkValue(parser, "/cmap/encodingRecords[1]/platformID", 1)
+ self.checkValue(parser, "/OS_2/achVendID", "Deja")
+
+ def test_fit(self):
+ parser = self.parse("test_file.fit")
+ self.checkValue(parser, "/header/datasize", 8148)
+ self.checkValue(parser, "/definition[18]/msgNumber", 325)
+ self.checkValue(parser, "/definition[18]/fieldDefinition[6]/number", 5)
+ self.checkValue(parser, "/definition[18]/RecordHeader/msgType", 2)
+ self.checkValue(parser, "/data[50]/field0", 1000231166)
+ self.checkValue(parser, "/data[50]/field1", 111)
+ self.checkValue(parser, "/data[50]/field2", 96)
+
class TestParserRandomStream(unittest.TestCase):
diff --git a/tox.ini b/tox.ini
index 190a2c13..32abdbc9 100644
--- a/tox.ini
+++ b/tox.ini
@@ -13,10 +13,14 @@ commands =
sh tools/flake8.sh
[flake8]
+# E121 continuation line under-indented for hanging indent
+# hachoir/parser/network/ouid.py
+# E131 continuation line unaligned for hanging indent
+# parser/container/mp4.py
# E501 line too long (88 > 79 characters)
# W503 line break before binary operator
# W504 line break after binary operator
-ignore = E501,W503,W504
+ignore = E121,E131,E501,W503,W504
[testenv:doc]
deps=