coming in the next release:
6.0.8 - 22 June 2021
* Fix the name and link to the chardet module in the documentation. (#280)
No code changed in this hotfix, only documentation.
6.0.7 - 21 June 2021
* Catch ``urllib.error.URLError`` to prevent crashes. (#239)
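A minimal sketch of the defensive pattern behind the 6.0.7 fix, not feedparser's actual code: catching ``urllib.error.URLError`` around a fetch so that a failed request becomes an error value instead of a traceback. The ``fetch`` helper and the ``foo://`` URL are hypothetical and exist only for illustration.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Return (body, error); exactly one of the two is None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read(), None
    except urllib.error.URLError as exc:
        # HTTPError is a subclass of URLError, so it is caught here too.
        return None, exc


# An unsupported scheme raises URLError without touching the network.
body, error = fetch("foo://example.invalid/feed.xml")
```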
6.0.6 - 15 June 2021
* Prevent an AttributeError that occurs when a server returns HTTP 3xx
but doesn't include a Location header as well. (#267)
6.0.5 - 14 June 2021
* Prevent a TypeError crash that may occur when including a
username and password in the feed URL. (#276)
6.0.4 - 13 June 2021
* Prevent a UnicodeDecodeError crash that may occur when
the title element's type attribute exists but is empty. (#277)
* Prevent a UnicodeEncodeError crash that may occur if
the URL contains Unicode characters in the path. (#273)
6.0.3 - 12 June 2021
* Fix an issue with the HTTP request status on Python >= 3.9.
6.0.2 - 25 October 2020
* Stop building Python wheels with ``universal=1`` set. (#251)
This was causing pip to find and install the feedparser 6.x wheels
on Python 2 even though Python 2 is no longer supported.
* Fix a bug that put a trailing quote in the documentation version. (#232)
* Update the documentation URL to point to ReadTheDocs.
6.0.1 - 15 September 2020 [YANKED]
* Remove all Python 2 compatibility code (#228)
* Add *python_requires* to ``setup.py`` (#231)
6.0.0 - 12 September 2020 [YANKED]
* Support Python 3.6, 3.7, 3.8 and 3.9
* Drop support for Python 2.4 through 2.7, and Python 3.0 through 3.5 (#169)
* Convert feedparser from a monolithic file to a package
* ``feedparser.parse(sanitize_html=bool)`` argument replaces the ``feedparser.SANITIZE_HTML`` global
* ``feedparser.parse(resolve_relative_uris=bool)`` replaces the ``feedparser.RESOLVE_RELATIVE_URIS`` global
* Unify the codebase so that 2to3 conversion is no longer required
* Remove references to iconv_codecs
* Update the Creative Commons namespace URIs
* Update the default User-Agent name and URL
* Support Middle European (Summer) Time timezones (#20)
* Pass ``data`` to ``lazy_chardet_encoding()`` (#50)
* Document that datetimes are returned in UTC (#51)
* Remove cjkpython references in the documentation (#57)
* Resolve ResourceWarnings thrown during unit tests (#170)
* Fix tox build failures (#213)
* Use ``base64.decodebytes()`` directly to support Python 3.9 (#201)
* Fix Python 3.8 ``urllib.parse.splittype()`` deprecation warning (#211)
* Support parsing colons in RFC822 timezones (#144)
* Add `chardet` as an optional tox environment dependency
* Fix the Big5 unit test that fails when chardet is installed (#184)
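For context on #201 in the 6.0.0 list: ``base64.decodestring()`` was removed in Python 3.9, and ``base64.decodebytes()`` is the standard library replacement. A minimal round trip, with an illustrative payload:

```python
import base64

payload = b"feedparser"

# encodebytes() and decodebytes() are the bytes-only names that
# superseded the removed encodestring() and decodestring() aliases.
encoded = base64.encodebytes(payload)
decoded = base64.decodebytes(encoded)
```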
5.2.1 - July 23, 2015
* Fix #22 (pip package keeps upgrading all the time)
5.2.0 - April 16, 2015
* Support PyPy
* Remove the HTTP Status 9001 test that caused unit test tracebacks
* Remove the completely-untested HTML tidy code
* Remove BeautifulSoup as a dependency
* Remove the XFN microformat parsing code
* Remove the rel_enclosure microformat parsing code
* Remove the rel_hcard microformat parsing code
* Remove the rel_tag microformat parsing code
* Replace the regex-based RFC 822 date parser with a procedural one
* Replace the Python-licensed W3DTF date parser
* Support HTML5 audio/source/video element relative URLs
* Remove the unparsed itunes_keywords key from the result dictionary
* Fix issue 321 just a little more (yet another code path was missed)
* Issue 62 (support georss and gml namespaces)
* Issue 296 (GUIDs are always treated like relative URIs)
* Issue 334 (media:restriction element content is not returned)
* Issue 335 (sub-elements of media:group are not parsed and returned)
* Issue 342 (support multiple dc:creator elements)
* Issue 357 (loose parser breaks ampersands in link element URLs)
* Issue 374 (support the Podlove Simple Chapters namespace)
* Issue 380 (support media:rating element)
* Issue 384 (fix chardet support in Python 3)
* Issue 389 (elements in unknown uppercase namespaces are ignored)
* Issue 392 (tags element subverts 'tags' key in result dictionary)
* Issue 396 (Podlove Simple Chapters version 1.0 causes a KeyError)
* Issue 399 (docs call `request_headers` parameter `extra_headers`)
* Issue 401 (support additional dcterms and media namespaces elements)
* Issue 404 (support asctime datetime strings with timezone information)
* Issue 407 (decode forward slashes encoded as character entities)
* Issue 421 (delay chardet invocation as long as possible)
* Issue 422 (add return types docstrings)
* Issue 433 (update the list of allowed MathML elements and attributes)
5.1.3 - December 9, 2012
* Consolidated and simplified the character encoding detection code
* Issue 346 (the gb2312 encoding isn't always upgraded to gb18030)
* Issue 350 (HTTP Last-Modified example is incorrect in documentation)
* Issue 352 (importing lxml.etree changes what exceptions libxml2 throws)
* Issue 356 (add support for the HTML5 attributes `poster` and `preload`)
* Issue 364 (enclosure-sniffing microformat code can throw ValueError)
* Issue 373 (support RFC822-ish dates with swapped days and months)
* Issue 376 (uppercase 'X' in hex character references cause ValueError)
* Issue 382 (don't strip inline user:password credentials from FTP URLs)
5.1.2 - May 3, 2012
* Minor changes to the documentation
* Strip potentially dangerous ENTITY declarations in encoded feeds
* feedparser will now try to continue parsing despite compression errors
* Fix issue 321 a little more (the initial fix missed a code path)
* Issue 337 (`_parse_date_rfc822()` returns None on single-digit days)
* Issue 343 (add magnet links to the ACCEPTABLE_URI_SCHEMES)
* Issue 344 (handle deflated data with no headers nor checksums)
* Issue 347 (support `itunes:image` elements with a `url` attribute)
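Issue 344 in the 5.1.2 list concerns servers that send raw deflate data without the zlib header and checksum. One common way to cope, sketched here under the assumption that a failed normal decompression signals a raw stream (this is not necessarily how feedparser does it), is to retry with a negative window size. The ``inflate`` helper is hypothetical.

```python
import zlib


def inflate(data):
    """Decompress zlib-wrapped data, falling back to raw deflate."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        # A negative wbits value tells zlib to expect a raw deflate
        # stream with no header and no trailing checksum.
        return zlib.decompress(data, -zlib.MAX_WBITS)


payload = b"<rss version='2.0'></rss>"

# Build a raw deflate stream, as a misbehaving server might send it.
compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
raw = compressor.compress(payload) + compressor.flush()
```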
5.1.1 - March 20, 2012
* Fix mistakes, typos, and bugs in the unit test code
* Fix crash in Python 2.4 and 2.5 if the feed has a UTF-32 byte order mark
* Replace the RFC822 date parser for more extensibility
* Issue 304 (handle RFC822 dates with timezones like GMT+00:00)
* Issue 309 (itunes:keywords should be split by commas, not whitespace)
* Issue 310 (pubDate should map to `published`, not `updated`)
* Issue 313 (include the compression test files in MANIFEST.in)
* Issue 314 (far-flung RFC822 dates don't throw OverflowError on x64)
* Issue 315 (HTTP server for unit tests runs on 0.0.0.0)
* Issue 321 (malformed URIs can cause ValueError to be thrown)
* Issue 322 (HTTP redirect to HTTP 304 causes SAXParseException)
* Issue 323 (installing chardet causes 11 unit test failures)
* Issue 325 (map `description_detail` to `summary_detail`)
* Issue 326 (Unicode filename causes UnicodeEncodeError if locale is ASCII)
* Issue 327 (handle RFC822 dates with extraneous commas)
* Issue 328 (temporarily map `updated` to `published` due to issue 310)
* Issue 329 (escape backslashes in Windows path in docs/introduction.rst)
* Issue 331 (don't escape backslashes that are in raw strings in the docs)
5.1 - December 2, 2011
* Extensive, extensive unit test refactoring
* Convert the Docbook documentation to ReST
* Include the documentation in the source distribution
* Consolidate the disparate README files into one
* Support Jython somewhat (almost all unit tests pass)
* Support Python 3.2
* Fix Python 3 issues exposed by improved unit tests
* Fix international domain name issues exposed by improved unit tests
* Issue 148 (loose parser doesn't always return unicode strings)
* Issue 204 (FeedParserDict behavior should not be controlled by `assert`)
* Issue 247 (mssql date parser uses hardcoded Tokyo timezone)
* Issue 249 (KeyboardInterrupt and SystemExit exceptions being caught)
* Issue 250 (`updated` can be a 9-tuple or a string, depending on context)
* Issue 252 (running setup.py in Python 3 fails due to missing sgmllib)
* Issue 253 (document that text/plain content isn't sanitized)
* Issue 260 (Python 3 doesn't decompress gzip'ed or deflate'd content)
* Issue 261 (popping from empty tag list)
* Issue 262 (docs are missing from distribution files)
* Issue 264 (vcard parser crashes on non-ascii characters)
* Issue 265 (http header comparisons are case sensitive)
* Issue 271 (monkey-patching sgmllib breaks other libraries)
* Issue 272 (can't pass bytes or str to `parse()` in Python 3)
* Issue 275 (`_parse_date()` doesn't catch OverflowError)
* Issue 276 (mutable types used as default values in `parse()`)
* Issue 277 (`python3 setup.py install` fails)
* Issue 281 (`_parse_date()` doesn't catch ValueError)
* Issue 282 (`_parse_date()` crashes when passed `None`)
* Issue 285 (crash on empty xmlns attribute)
* Issue 286 ('apos' character entity not handled properly)
* Issue 289 (add an option to disable microformat parsing)
* Issue 290 (Blogger's invalid img tags are unparseable)
* Issue 292 (atom id element not explicitly supported)
* Issue 294 ('categories' key exists but raises KeyError)
* Issue 297 (unresolvable external doctype causes crash)
* Issue 298 (nested nodes clobber actual values)
* Issue 300 (performance improvements)
* Issue 303 (unicode characters cause crash during relative uri resolution)
* Remove "Hot RSS" support since the format doesn't actually exist
* Remove the old feedparser.org website files from the source
* Remove the feedparser command line interface
* Remove the Zope interoperability hack
* Remove extraneous whitespace
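Issue 276 in the 5.1 list refers to a general Python pitfall: a mutable default argument is created once and shared by every call. The sketch below uses hypothetical ``parse_buggy`` and ``parse_fixed`` functions, not feedparser's real ``parse()``, to show the problem and the usual ``None`` sentinel fix.

```python
def parse_buggy(url, request_headers={}):
    # The same dict object is reused on every call that omits the argument.
    request_headers.setdefault("User-Agent", "example-agent")
    return request_headers


def parse_fixed(url, request_headers=None):
    # Copy the caller's mapping (or start fresh) so nothing is shared.
    headers = dict(request_headers or {})
    headers.setdefault("User-Agent", "example-agent")
    return headers


first = parse_buggy("http://example.invalid/a")
first["X-Debug"] = "1"
leaked = "X-Debug" in parse_buggy("http://example.invalid/b")

clean = parse_fixed("http://example.invalid/a")
clean["X-Debug"] = "1"
isolated = "X-Debug" not in parse_fixed("http://example.invalid/b")
```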
5.0.1 - February 20, 2011
* Fix issue 91 (invalid text in XML declaration causes sanitizer to crash)
* Fix issue 254 (sanitization can be bypassed by malformed XML comments)
* Fix issue 255 (sanitizer doesn't strip unsafe URI schemes)
5.0 - January 25, 2011
* Improved MathML support
* Support microformats (rel-tag, rel-enclosure, xfn, hcard)
* Support IRIs
* Allow safe CSS through sanitization
* Allow safe HTML5 through sanitization
* Support SVG
* Support inline XML entity declarations
* Support unescaped quotes and angle brackets in attributes
* Support additional date formats
* Added the `request_headers` argument to parse()
* Added the `response_headers` argument to parse()
* Support multiple entry, feed, and source authors
* Officially make Python 2.4 the earliest supported version
* Support Python 3
* Bug fixes, bug fixes, bug fixes
===============================================================================
1.0 - 9/27/2002 - MAP - fixed namespace processing on prefixed RSS 2.0 elements,
added Simon Fell's test suite
1.1 - 9/29/2002 - MAP - fixed infinite loop on incomplete CDATA sections
2.0 - 10/19/2002
JD - use inchannel to watch out for image and textinput elements which can
also contain title, link, and description elements
JD - check for isPermaLink='false' attribute on guid elements
JD - replaced openAnything with open_resource supporting ETag and
If-Modified-Since request headers
JD - parse now accepts etag, modified, agent, and referrer optional
arguments
JD - modified parse to return a dictionary instead of a tuple so that any
etag or modified information can be returned and cached by the caller
2.0.1 - 10/21/2002 - MAP - changed parse() so that if we don't get anything
because of etag/modified, return the old etag/modified to the caller to
indicate why nothing is being returned
2.0.2 - 10/21/2002 - JB - added the inchannel to the if statement, otherwise it's
useless. Fixes the problem JD was addressing by adding it.
2.1 - 11/14/2002 - MAP - added gzip support
2.2 - 1/27/2003 - MAP - added attribute support, admin:generatorAgent.
start_admingeneratoragent is an example of how to handle elements with
only attributes, no content.
2.3 - 6/11/2003 - MAP - added USER_AGENT for default (if caller doesn't specify);
also, make sure we send the User-Agent even if urllib2 isn't available.
Match any variation of backend.userland.com/rss namespace.
2.3.1 - 6/12/2003 - MAP - if item has both link and guid, return both as-is.
2.4 - 7/9/2003 - MAP - added preliminary Pie/Atom/Echo support based on Sam Ruby's
snapshot of July 1 <http://www.intertwingly.net/blog/1506.html>; changed
project name
2.5 - 7/25/2003 - MAP - changed to Python license (all contributors agree);
removed unnecessary urllib code -- urllib2 should always be available anyway;
return actual url, status, and full HTTP headers (as result['url'],
result['status'], and result['headers']) if parsing a remote feed over HTTP --
this should pass all the HTTP tests at <http://diveintomark.org/tests/client/http/>;
added the latest namespace-of-the-week for RSS 2.0
2.5.1 - 7/26/2003 - RMK - clear opener.addheaders so we only send our custom
User-Agent (otherwise urllib2 sends two, which confuses some servers)
2.5.2 - 7/28/2003 - MAP - entity-decode inline xml properly; added support for
inline <xhtml:body> and <xhtml:div> as used in some RSS 2.0 feeds
2.5.3 - 8/6/2003 - TvdV - patch to track whether we're inside an image or
textInput, and also to return the character encoding (if specified)
2.6 - 1/1/2004 - MAP - dc:author support (MarekK); fixed bug tracking
nested divs within content (JohnD); fixed missing sys import (JohanS);
fixed regular expression to capture XML character encoding (Andrei);
added support for Atom 0.3-style links; fixed bug with textInput tracking;
added support for cloud (MartijnP); added support for multiple
category/dc:subject (MartijnP); normalize content model: 'description' gets
description (which can come from description, summary, or full content if no
description), 'content' gets dict of base/language/type/value (which can come
from content:encoded, xhtml:body, content, or fullitem);
fixed bug matching arbitrary Userland namespaces; added xml:base and xml:lang
tracking; fixed bug tracking unknown tags; fixed bug tracking content when
<content> element is not in default namespace (like Pocketsoap feed);
resolve relative URLs in link, guid, docs, url, comments, wfw:comment,
wfw:commentRSS; resolve relative URLs within embedded HTML markup in
description, xhtml:body, content, content:encoded, title, subtitle,
summary, info, tagline, and copyright; added support for pingback and
trackback namespaces
2.7 - 1/5/2004 - MAP - really added support for trackback and pingback
namespaces, as opposed to 2.6 when I said I did but didn't really;
sanitize HTML markup within some elements; added mxTidy support (if
installed) to tidy HTML markup within some elements; fixed indentation
bug in _parse_date (FazalM); use socket.setdefaulttimeout if available
(FazalM); universal date parsing and normalization (FazalM): 'created', 'modified',
'issued' are parsed into 9-tuple date format and stored in 'created_parsed',
'modified_parsed', and 'issued_parsed'; 'date' is duplicated in 'modified'
and vice-versa; 'date_parsed' is duplicated in 'modified_parsed' and vice-versa
2.7.1 - 1/9/2004 - MAP - fixed bug handling &quot; and &apos;. fixed memory
leak not closing url opener (JohnD); added dc:publisher support (MarekK);
added admin:errorReportsTo support (MarekK); Python 2.1 dict support (MarekK)
2.7.4 - 1/14/2004 - MAP - added workaround for improperly formed <br/> tags in
encoded HTML (skadz); fixed unicode handling in normalize_attrs (ChrisL);
fixed relative URI processing for guid (skadz); added ICBM support; added
base64 support
2.7.5 - 1/15/2004 - MAP - added workaround for malformed DOCTYPE (seen on many
blogspot.com sites); added _debug variable
2.7.6 - 1/16/2004 - MAP - fixed bug with StringIO importing
3.0b3 - 1/23/2004 - MAP - parse entire feed with real XML parser (if available);
added several new supported namespaces; fixed bug tracking naked markup in
description; added support for enclosure; added support for source; re-added
support for cloud which got dropped somehow; added support for expirationDate
3.0b4 - 1/26/2004 - MAP - fixed xml:lang inheritance; fixed multiple bugs tracking
xml:base URI, one for documents that don't define one explicitly and one for
documents that define an outer and an inner xml:base that goes out of scope
before the end of the document
3.0b5 - 1/26/2004 - MAP - fixed bug parsing multiple links at feed level
3.0b6 - 1/27/2004 - MAP - added feed type and version detection, result['version']
will be one of SUPPORTED_VERSIONS.keys() or empty string if unrecognized;
added support for creativeCommons:license and cc:license; added support for
full Atom content model in title, tagline, info, copyright, summary; fixed bug
with gzip encoding (not always telling server we support it when we do)
3.0b7 - 1/28/2004 - MAP - support Atom-style author element in author_detail
(dictionary of 'name', 'url', 'email'); map author to author_detail if author
contains name + email address
3.0b8 - 1/28/2004 - MAP - added support for contributor
3.0b9 - 1/29/2004 - MAP - fixed check for presence of dict function; added
support for summary
3.0b10 - 1/31/2004 - MAP - incorporated ISO-8601 date parsing routines from
xml.util.iso8601
3.0b11 - 2/2/2004 - MAP - added 'rights' to list of elements that can contain
dangerous markup; fiddled with decodeEntities (not right); liberalized
date parsing even further
3.0b12 - 2/6/2004 - MAP - fiddled with decodeEntities (still not right);
added support for Atom 0.2 subtitle; added support for Atom content model
in copyright; better sanitizing of dangerous HTML elements with end tags
(script, frameset)
3.0b13 - 2/8/2004 - MAP - better handling of empty HTML tags (br, hr, img,
etc.) in embedded markup, in either HTML or XHTML form (<br>, <br/>, <br />)
3.0b14 - 2/8/2004 - MAP - fixed CDATA handling in non-wellformed feeds under
Python 2.1
3.0b15 - 2/11/2004 - MAP - fixed bug resolving relative links in wfw:commentRSS;
fixed bug capturing author and contributor URL; fixed bug resolving relative
links in author and contributor URL; fixed bug resolving relative links in
generator URL; added support for recognizing RSS 1.0; passed Simon Fell's
namespace tests, and included them permanently in the test suite with his
permission; fixed namespace handling under Python 2.1
3.0b16 - 2/12/2004 - MAP - fixed support for RSS 0.90 (broken in b15)
3.0b17 - 2/13/2004 - MAP - determine character encoding as per RFC 3023
3.0b18 - 2/17/2004 - MAP - always map description to summary_detail (Andrei);
use libxml2 (if available)
3.0b19 - 3/15/2004 - MAP - fixed bug exploding author information when author
name was in parentheses; removed ultra-problematic mxTidy support; patch to
work around crash in PyXML/expat when encountering invalid entities
(MarkMoraes); support for textinput/textInput
3.0b20 - 4/7/2004 - MAP - added CDF support
3.0b21 - 4/14/2004 - MAP - added Hot RSS support
3.0b22 - 4/19/2004 - MAP - changed 'channel' to 'feed', 'item' to 'entries' in
results dict; changed results dict to allow getting values with results.key
as well as results[key]; work around embedded illformed HTML with half
a DOCTYPE; work around malformed Content-Type header; if character encoding
is wrong, try several common ones before falling back to regexes (if this
works, bozo_exception is set to CharacterEncodingOverride); fixed character
encoding issues in BaseHTMLProcessor by tracking encoding and converting
from Unicode to raw strings before feeding data to sgmllib.SGMLParser;
convert each value in results to Unicode (if possible), even if using
regex-based parsing
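The 3.0b22 entry describes a results dict that answers both ``results.key`` and ``results[key]``. A minimal sketch of that idea, assuming a plain ``dict`` subclass (the ``AttrDict`` name is hypothetical and this is not the real ``FeedParserDict``, which also handles mapped keys):

```python
class AttrDict(dict):
    """Dictionary whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so real dict
        # methods such as keys() still take priority over stored keys.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


results = AttrDict(feed=AttrDict(title="Example"), entries=[])
title_by_attribute = results.feed.title
title_by_key = results["feed"]["title"]
```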
3.0b23 - 4/21/2004 - MAP - fixed UnicodeDecodeError for feeds that contain
high-bit characters in attributes in embedded HTML in description (thanks
Thijs van de Vossen); moved guid, date, and date_parsed to mapped keys in
FeedParserDict; tweaked FeedParserDict.has_key to return True if asking
about a mapped key
3.0fc1 - 4/23/2004 - MAP - made results.entries[0].links[0] and
results.entries[0].enclosures[0] into FeedParserDict; fixed typo that could
cause the same encoding to be tried twice (even if it failed the first time);
fixed DOCTYPE stripping when DOCTYPE contained entity declarations;
better textinput and image tracking in illformed RSS 1.0 feeds
3.0fc2 - 5/10/2004 - MAP - added and passed Sam's amp tests; added and passed
my blink tag tests
3.0fc3 - 6/18/2004 - MAP - fixed bug in _changeEncodingDeclaration that
failed to parse utf-16 encoded feeds; made source into a FeedParserDict;
duplicate admin:generatorAgent/@rdf:resource in generator_detail.url;
added support for image; refactored parse() fallback logic to try other
encodings if SAX parsing fails (previously it would only try other encodings
if re-encoding failed); remove unichr madness in normalize_attrs now that
we're properly tracking encoding in and out of BaseHTMLProcessor; set
feed.language from root-level xml:lang; set entry.id from rdf:about;
send Accept header
3.0 - 6/21/2004 - MAP - don't try iso-8859-1 (can't distinguish between
iso-8859-1 and windows-1252 anyway, and most incorrectly marked feeds are
windows-1252); fixed regression that could cause the same encoding to be
tried twice (even if it failed the first time)
3.0.1 - 6/22/2004 - MAP - default to us-ascii for all text/* content types;
recover from malformed content-type header parameter with no equals sign
('text/xml; charset:iso-8859-1')
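The 3.0.1 recovery from a content-type parameter with no equals sign could look roughly like the sketch below. The ``charset_from_content_type`` helper is hypothetical and only illustrates accepting ``:`` where ``=`` was expected; it is not the library's implementation.

```python
def charset_from_content_type(header):
    """Return the lowercased charset parameter, or '' if none is found."""
    for param in header.split(";")[1:]:
        # Try the standard "=" first, then the malformed ":" separator.
        for separator in ("=", ":"):
            name, found, value = param.partition(separator)
            if found and name.strip().lower() == "charset":
                return value.strip().strip("'\"").lower()
    return ""


malformed = charset_from_content_type("text/xml; charset:iso-8859-1")
wellformed = charset_from_content_type("text/xml; charset=UTF-8")
```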
3.1 - 6/28/2004 - MAP - added and passed tests for converting HTML entities
to Unicode equivalents in illformed feeds (aaronsw); added and
passed tests for converting character entities to Unicode equivalents
in illformed feeds (aaronsw); test for valid parsers when setting
XML_AVAILABLE; make version and encoding available when server returns
a 304; add handlers parameter to pass arbitrary urllib2 handlers (like
digest auth or proxy support); add code to parse username/password
out of url and send as basic authentication; expose downloading-related
exceptions in bozo_exception (aaronsw); added __contains__ method to
FeedParserDict (aaronsw); added publisher_detail (aaronsw)
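The 3.1 entry mentions parsing a username and password out of the URL and sending them as basic authentication. A minimal sketch with today's standard library, assuming a hypothetical ``split_credentials`` helper (the original code targeted urllib2 and differs in detail):

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url):
    """Return (url_without_credentials, authorization_header_or_None)."""
    parts = urlsplit(url)
    if parts.username is None:
        return url, None
    # Simplification: hostname is lowercased and IPv6 brackets are not restored.
    host = parts.hostname
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    userinfo = f"{unquote(parts.username)}:{unquote(parts.password or '')}"
    token = base64.b64encode(userinfo.encode("utf-8")).decode("ascii")
    return clean, f"Basic {token}"


clean_url, authorization = split_credentials(
    "http://user:secret@example.com:8080/feed.xml?x=1"
)
```

The returned value would be sent as the ``Authorization`` request header alongside the cleaned URL.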
3.2 - 7/3/2004 - MAP - use cjkcodecs and iconv_codec if available; always
convert feed to UTF-8 before passing to XML parser; completely revamped
logic for determining character encoding and attempting XML parsing
(much faster); increased default timeout to 20 seconds; test for presence
of Location header on redirects; added tests for many alternate character
encodings; support various EBCDIC encodings; support UTF-16BE and
UTF-16LE with or without a BOM; support UTF-8 with a BOM; support
UTF-32BE and UTF-32LE with or without a BOM; fixed crashing bug if no
XML parsers are available; added support for 'Content-encoding: deflate';
send blank 'Accept-encoding: ' header if neither gzip nor zlib modules
are available
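The BOM handling listed under 3.2 can be illustrated with a small sniffing routine. This is a sketch covering only the "with a BOM" cases, using the constants from the ``codecs`` module; the ``sniff_bom`` helper is hypothetical and the real detection logic is more involved.

```python
import codecs

# UTF-32 must be checked before UTF-16: the UTF-32LE BOM begins with
# the same two bytes as the UTF-16LE BOM.
_BOMS = (
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
)


def sniff_bom(data):
    """Return (encoding, data_without_bom), or (None, data) if no BOM."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return None, data


encoding, body = sniff_bom(codecs.BOM_UTF32_LE + "<rss/>".encode("utf-32le"))
text = body.decode(encoding)
```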
3.3 - 7/15/2004 - MAP - optimize EBCDIC to ASCII conversion; fix obscure
problem tracking xml:base and xml:lang if element declares it, child
doesn't, first grandchild redeclares it, and second grandchild doesn't;
refactored date parsing; defined public registerDateHandler so callers
can add support for additional date formats at runtime; added support
for OnBlog, Nate, MSSQL, Greek, and Hungarian dates (ytrewq1); added
zopeCompatibilityHack() which turns FeedParserDict into a regular
dictionary, required for Zope compatibility, and also makes command-
line debugging easier because pprint module formats real dictionaries
better than dictionary-like objects; added NonXMLContentType exception,
which is stored in bozo_exception when a feed is served with a non-XML
media type such as 'text/plain'; respect Content-Language as default
language if no xml:lang is present; cloud dict is now FeedParserDict;
generator dict is now FeedParserDict; better tracking of xml:lang,
including support for xml:lang='' to unset the current language;
recognize RSS 1.0 feeds even when RSS 1.0 namespace is not the default
namespace; don't overwrite final status on redirects (scenarios:
redirecting to a URL that returns 304, redirecting to a URL that
redirects to another URL with a different type of redirect); add
support for HTTP 303 redirects
4.0 - MAP - support for relative URIs in xml:base attribute; fixed
encoding issue with mxTidy (phopkins); preliminary support for RFC 3229;
support for Atom 1.0; support for iTunes extensions; new 'tags' for
categories/keywords/etc. as array of dict
{'term': term, 'scheme': scheme, 'label': label} to match Atom 1.0
terminology; parse RFC 822-style dates with no time; lots of other
bug fixes
4.1 - MAP - removed socket timeout; added support for chardet library