New Upstream Release - ruby-rails-html-sanitizer

Maintainer email: pkg-ruby-extras-maintainers@lists.alioth.debian.org
Automatic publish policy: main: push-derived, pristine-tar: push-derived, upstream: push-derived
Last processed: 2023-06-27T08:13 (took 4m34s)
Branch URL: https://salsa.debian.org/ruby-team/ruby-rails-html-sanitizer.git -b master (taken from version 1.4.4-1)
Queue position: 132591 (a wait of 56 weeks, 0 days)

Ready changes

Summary

Merged new upstream version: 1.6.0 (was: 1.4.4).

Resulting package

Built on 2023-06-27T08:13 (took 4m34s)

The resulting binary packages can be installed (if you have the apt repository enabled) by running:

apt install -t fresh-releases ruby-rails-html-sanitizer

Lintian Result

ruby-rails-html-sanitizer_1.6.0-1~jan+nur1.dsc

ruby-rails-html-sanitizer_1.6.0-1~jan+nur1_all.deb

ruby-rails-html-sanitizer_1.6.0-1~jan+nur1_amd64.buildinfo

ruby-rails-html-sanitizer_1.6.0-1~jan+nur1_amd64.changes

Diff

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e18051c..fc3e49c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,75 @@
+## 1.6.0 / 2023-05-26
+
+* Dependencies have been updated:
+
+  - Loofah `~>2.21` and Nokogiri `~>1.14` for HTML5 parser support
+  - As a result, required Ruby version is now `>= 2.7.0`
+
+  Security updates will continue to be made on the `1.5.x` release branch as long as Rails 6.1
+  (which supports Ruby 2.5) is still in security support.
+
+  *Mike Dalessio*
+
+* HTML5 standards-compliant sanitizers are now available on platforms supported by
+  Nokogiri::HTML5. These are available as:
+
+  - `Rails::HTML5::FullSanitizer`
+  - `Rails::HTML5::LinkSanitizer`
+  - `Rails::HTML5::SafeListSanitizer`
+
+  And a new "vendor" is provided at `Rails::HTML5::Sanitizer` that can be used in a future version
+  of Rails.
+
+  Note that for symmetry `Rails::HTML4::Sanitizer` is also added, though its behavior is identical
+  to the vendor class methods on `Rails::HTML::Sanitizer`.
+
+  Users may call `Rails::HTML::Sanitizer.best_supported_vendor` to get back the HTML5 vendor if it's
+  supported, else the legacy HTML4 vendor.
+
+  *Mike Dalessio*
+
+* Module namespaces have changed, but backwards compatibility is provided by aliases.
+
+  The library defines three additional modules:
+
+  - `Rails::HTML` for general functionality (replacing `Rails::Html`)
+  - `Rails::HTML4` containing sanitizers that parse content as HTML4
+  - `Rails::HTML5` containing sanitizers that parse content as HTML5
+
+  The following aliases are maintained for backwards compatibility:
+
+  - `Rails::Html` points to `Rails::HTML`
+  - `Rails::HTML::FullSanitizer` points to `Rails::HTML4::FullSanitizer`
+  - `Rails::HTML::LinkSanitizer` points to `Rails::HTML4::LinkSanitizer`
+  - `Rails::HTML::SafeListSanitizer` points to `Rails::HTML4::SafeListSanitizer`
+
+  *Mike Dalessio*
+
+* `LinkSanitizer` always returns UTF-8 encoded strings. `SafeListSanitizer` and `FullSanitizer`
+  already ensured this encoding.
+
+  *Mike Dalessio*
+
+* `SafeListSanitizer` allows `time` tag and `lang` attribute by default.
+
+  *Mike Dalessio*
+
+* The constant `Rails::Html::XPATHS_TO_REMOVE` has been removed. It's not necessary with the
+  existing sanitizers, and should have been a private constant all along anyway.
+
+  *Mike Dalessio*
+
+
+## 1.5.0 / 2023-01-20
+
+* `SafeListSanitizer`, `PermitScrubber`, and `TargetScrubber` now all support pruning of unsafe tags.
+
+  By default, unsafe tags are still stripped, but this behavior can be changed to prune the element
+  and its children from the document by passing `prune: true` to any of these classes' constructors.
+
+  *seyerian*
+
+
 ## 1.4.4 / 2022-12-13
 
 * Address inefficient regular expression complexity with certain configurations of Rails::Html::Sanitizer.
@@ -52,6 +124,7 @@
 
   *Mike Dalessio*
 
+
 ## 1.4.1 / 2021-08-18
 
 * Fix regression in v1.4.0 that did not pass comment nodes to the scrubber.
@@ -64,6 +137,7 @@
 
   *Mike Dalessio*
 
+
 ## 1.4.0 / 2021-08-18
 
 * Processing Instructions are no longer allowed by Rails::Html::PermitScrubber
@@ -76,12 +150,14 @@
 
   *Mike Dalessio*
 
+
 ## 1.3.0
 
 * Address deprecations in Loofah 2.3.0.
 
   *Josh Goodall*
 
+
 ## 1.2.0
 
 * Remove needless `white_list_sanitizer` deprecation.
@@ -96,6 +172,7 @@
 
   *Kasper Timm Hansen*
 
+
 ## 1.1.0
 
 * Add `safe_list_sanitizer` and deprecate `white_list_sanitizer` to be removed
@@ -113,10 +190,12 @@
 
   *Kasper Timm Hansen*
 
+
 ## 1.0.1
 
 * Added support for Rails 4.2.0.beta2 and above
 
+
 ## 1.0.0
 
 * First release.
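The 1.5.0 entry above distinguishes stripping an unsafe tag (the tag is removed, its inner content stays) from pruning it (the element and all of its children are removed). The following is a minimal, self-contained Ruby sketch of that difference, not the library's implementation: `Element`, `scrub` and `to_html` are hypothetical names operating on a toy tree, and no Loofah or rails-html-sanitizer code is called. It reproduces the `<a><span>text</span></a>` case from the README changes further down, with only `a` permitted.

```ruby
# Hypothetical toy tree: an element has a tag name and children;
# a child is either an Element or a plain String (text node).
Element = Struct.new(:name, :children)

# Serialize the toy tree (no attributes, no escaping).
def to_html(node)
  return node if node.is_a?(String)

  "<#{node.name}>#{node.children.map { |c| to_html(c) }.join}</#{node.name}>"
end

# Scrub a list of nodes against an allow-list of tag names.
#   prune: false ("strip") drops a disallowed tag but keeps its scrubbed children
#   prune: true  ("prune") drops a disallowed element and everything inside it
def scrub(nodes, allowed, prune: false)
  nodes.flat_map do |node|
    if node.is_a?(String)
      [node]
    elsif allowed.include?(node.name)
      [Element.new(node.name, scrub(node.children, allowed, prune: prune))]
    elsif prune
      []
    else
      scrub(node.children, allowed, prune: prune)
    end
  end
end

fragment = [Element.new("a", [Element.new("span", ["text"])])]

stripped = scrub(fragment, %w[a]).map { |n| to_html(n) }.join
pruned   = scrub(fragment, %w[a], prune: true).map { |n| to_html(n) }.join

puts stripped # => <a>text</a>
puts pruned   # => <a></a>
```

Real scrubbers also deal with attributes, comments and escaping; the sketch only shows where the two modes diverge.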
diff --git a/MIT-LICENSE b/MIT-LICENSE
index 330b78b..c56f78e 100644
--- a/MIT-LICENSE
+++ b/MIT-LICENSE
@@ -1,4 +1,4 @@
-Copyright (c) 2013-2015 Rafael Mendonça França, Kasper Timm Hansen
+Copyright (c) 2013-2023 Rafael Mendonça França, Kasper Timm Hansen, Mike Dalessio
 
 MIT License
 
diff --git a/README.md b/README.md
index 7b160b5..8cde5c1 100644
--- a/README.md
+++ b/README.md
@@ -1,61 +1,76 @@
-# Rails Html Sanitizers
+# Rails HTML Sanitizers
 
-In Rails 4.2 and above this gem will be responsible for sanitizing HTML fragments in Rails
-applications, i.e. in the `sanitize`, `sanitize_css`, `strip_tags` and `strip_links` methods.
+This gem is responsible for sanitizing HTML fragments in Rails applications. Specifically, this is the set of sanitizers used to implement the Action View `SanitizerHelper` methods `sanitize`, `sanitize_css`, `strip_tags` and `strip_links`.
 
-Rails Html Sanitizer is only intended to be used with Rails applications. If you need similar functionality in non Rails apps consider using [Loofah](https://github.com/flavorjones/loofah) directly (that's what handles sanitization under the hood).
+Rails HTML Sanitizer is only intended to be used with Rails applications. If you need similar functionality but aren't using Rails, consider using the underlying sanitization library [Loofah](https://github.com/flavorjones/loofah) directly.
 
-## Installation
-
-Add this line to your application's Gemfile:
 
-    gem 'rails-html-sanitizer'
-
-And then execute:
+## Usage
 
-    $ bundle
+### Sanitizers
 
-Or install it yourself as:
+All sanitizers respond to `sanitize`, and are available in variants that use either HTML4 or HTML5 parsing, under the `Rails::HTML4` and `Rails::HTML5` namespaces, respectively.
 
-    $ gem install rails-html-sanitizer
+NOTE: The HTML5 sanitizers are not supported on JRuby. Users may programmatically check for support by calling `Rails::HTML::Sanitizer.html5_support?`.
 
-## Usage
 
-### Sanitizers
+#### FullSanitizer
 
-All sanitizers respond to `sanitize`.
+```ruby
+full_sanitizer = Rails::HTML5::FullSanitizer.new
+full_sanitizer.sanitize("<b>Bold</b> no more!  <a href='more.html'>See more here</a>...")
+# => Bold no more!  See more here...
+```
 
-#### FullSanitizer
+or, if you insist on parsing the content as HTML4:
 
 ```ruby
-full_sanitizer = Rails::Html::FullSanitizer.new
+full_sanitizer = Rails::HTML4::FullSanitizer.new
 full_sanitizer.sanitize("<b>Bold</b> no more!  <a href='more.html'>See more here</a>...")
 # => Bold no more!  See more here...
 ```
 
+HTML5 version:
+
+
+
 #### LinkSanitizer
 
 ```ruby
-link_sanitizer = Rails::Html::LinkSanitizer.new
+link_sanitizer = Rails::HTML5::LinkSanitizer.new
+link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
+# => Only the link text will be kept.
+```
+
+or, if you insist on parsing the content as HTML4:
+
+```ruby
+link_sanitizer = Rails::HTML4::LinkSanitizer.new
 link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
 # => Only the link text will be kept.
 ```
 
+
 #### SafeListSanitizer
 
+This sanitizer is also available as an HTML4 variant, but for simplicity we'll document only the HTML5 variant below.
+
 ```ruby
-safe_list_sanitizer = Rails::Html::SafeListSanitizer.new
+safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new
 
 # sanitize via an extensive safe list of allowed elements
 safe_list_sanitizer.sanitize(@article.body)
 
-# safe list only the supplied tags and attributes
+# sanitize only the supplied tags and attributes
 safe_list_sanitizer.sanitize(@article.body, tags: %w(table tr td), attributes: %w(id class style))
 
-# safe list via a custom scrubber
+# sanitize via a custom scrubber
 safe_list_sanitizer.sanitize(@article.body, scrubber: ArticleScrubber.new)
 
-# safe list sanitizer can also sanitize css
+# prune nodes from the tree instead of stripping tags and leaving inner content
+safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new(prune: true)
+
+# the sanitizer can also sanitize css
 safe_list_sanitizer.sanitize_css('background-color: #000;')
 ```
 
@@ -63,14 +78,14 @@ safe_list_sanitizer.sanitize_css('background-color: #000;')
 
 Scrubbers are objects responsible for removing nodes or attributes you don't want in your HTML document.
 
-This gem includes two scrubbers `Rails::Html::PermitScrubber` and `Rails::Html::TargetScrubber`.
+This gem includes two scrubbers `Rails::HTML::PermitScrubber` and `Rails::HTML::TargetScrubber`.
 
-#### `Rails::Html::PermitScrubber`
+#### `Rails::HTML::PermitScrubber`
 
 This scrubber allows you to permit only the tags and attributes you want.
 
 ```ruby
-scrubber = Rails::Html::PermitScrubber.new
+scrubber = Rails::HTML::PermitScrubber.new
 scrubber.tags = ['a']
 
 html_fragment = Loofah.fragment('<a><img/ ></a>')
@@ -78,16 +93,34 @@ html_fragment.scrub!(scrubber)
 html_fragment.to_s # => "<a></a>"
 ```
 
-#### `Rails::Html::TargetScrubber`
+By default, inner content is left, but it can be removed as well.
+
+```ruby
+scrubber = Rails::HTML::PermitScrubber.new
+scrubber.tags = ['a']
+
+html_fragment = Loofah.fragment('<a><span>text</span></a>')
+html_fragment.scrub!(scrubber)
+html_fragment.to_s # => "<a>text</a>"
+
+scrubber = Rails::HTML::PermitScrubber.new(prune: true)
+scrubber.tags = ['a']
+
+html_fragment = Loofah.fragment('<a><span>text</span></a>')
+html_fragment.scrub!(scrubber)
+html_fragment.to_s # => "<a></a>"
+```
+
+#### `Rails::HTML::TargetScrubber`
 
 Where `PermitScrubber` picks out tags and attributes to permit in sanitization,
-`Rails::Html::TargetScrubber` targets them for removal. See https://github.com/flavorjones/loofah/blob/main/lib/loofah/html5/safelist.rb for the tag list.
+`Rails::HTML::TargetScrubber` targets them for removal. See https://github.com/flavorjones/loofah/blob/main/lib/loofah/html5/safelist.rb for the tag list.
 
 **Note:** by default, it will scrub anything that is not part of the permitted tags from
 loofah `HTML5::Scrub.allowed_element?`.
 
 ```ruby
-scrubber = Rails::Html::TargetScrubber.new
+scrubber = Rails::HTML::TargetScrubber.new
 scrubber.tags = ['img']
 
 html_fragment = Loofah.fragment('<a><img/ ></a>')
@@ -95,12 +128,30 @@ html_fragment.scrub!(scrubber)
 html_fragment.to_s # => "<a></a>"
 ```
 
+Similarly to `PermitScrubber`, nodes can be fully pruned.
+
+```ruby
+scrubber = Rails::HTML::TargetScrubber.new
+scrubber.tags = ['span']
+
+html_fragment = Loofah.fragment('<a><span>text</span></a>')
+html_fragment.scrub!(scrubber)
+html_fragment.to_s # => "<a>text</a>"
+
+scrubber = Rails::HTML::TargetScrubber.new(prune: true)
+scrubber.tags = ['span']
+
+html_fragment = Loofah.fragment('<a><span>text</span></a>')
+html_fragment.scrub!(scrubber)
+html_fragment.to_s # => "<a></a>"
+```
+
 #### Custom Scrubbers
 
 You can also create custom scrubbers in your application if you want to.
 
 ```ruby
-class CommentScrubber < Rails::Html::PermitScrubber
+class CommentScrubber < Rails::HTML::PermitScrubber
   def initialize
     super
     self.tags = %w( form script comment blockquote )
@@ -113,7 +164,7 @@ class CommentScrubber < Rails::Html::PermitScrubber
 end
 ```
 
-See `Rails::Html::PermitScrubber` documentation to learn more about which methods can be overridden.
+See `Rails::HTML::PermitScrubber` documentation to learn more about which methods can be overridden.
 
 #### Custom Scrubber in a Rails app
 
@@ -123,20 +174,98 @@ Using the `CommentScrubber` from above, you can use this in a Rails view like so
 <%= sanitize @comment, scrubber: CommentScrubber.new %>
 ```
 
+### A note on HTML entities
+
+__Rails HTML sanitizers are intended to be used by the view layer, at page-render time. They are *not* intended to sanitize persisted strings that will be sanitized *again* at page-render time.__
+
+Proper HTML sanitization will replace some characters with HTML entities. For example, text containing a `<` character will be updated to contain `&lt;` to ensure that the markup is well-formed.
+
+This is important to keep in mind because __HTML entities will render improperly if they are sanitized twice.__
+
+
+#### A concrete example showing the problem that can arise
+
+Imagine the user is asked to enter their employer's name, which will appear on their public profile page. Then imagine they enter `JPMorgan Chase & Co.`.
+
+If you sanitize this before persisting it in the database, the stored string will be `JPMorgan Chase &amp; Co.`
+
+When the page is rendered, if this string is sanitized a second time by the view layer, the HTML will contain `JPMorgan Chase &amp;amp; Co.` which will render as "JPMorgan Chase &amp;amp; Co.".
+
+Another problem that can arise is rendering the sanitized string in a non-HTML context (for example, if it ends up being part of an SMS message). In this case, it may contain inappropriate HTML entities.
+
+
+#### Suggested alternatives
+
+You might simply choose to persist the untrusted string as-is (the raw input), and then ensure that the string will be properly sanitized by the view layer.
+
+That raw string, if rendered in an non-HTML context (like SMS), must also be sanitized by a method appropriate for that context. You may wish to look into using [Loofah](https://github.com/flavorjones/loofah) or [Sanitize](https://github.com/rgrove/sanitize) to customize how this sanitization works, including omitting HTML entities in the final string.
+
+If you really want to sanitize the string that's stored in your database, you may wish to look into  [Loofah::ActiveRecord](https://github.com/flavorjones/loofah-activerecord) rather than use the Rails HTML sanitizers.
+
+
+### A note on module names
+
+In versions < 1.6, the only module defined by this library was `Rails::Html`. Starting in 1.6, we define three additional modules:
+
+- `Rails::HTML` for general functionality (replacing `Rails::Html`)
+- `Rails::HTML4` containing sanitizers that parse content as HTML4
+- `Rails::HTML5` containing sanitizers that parse content as HTML5 (if supported)
+
+The following aliases are maintained for backwards compatibility:
+
+- `Rails::Html` points to `Rails::HTML`
+- `Rails::HTML::FullSanitizer` points to `Rails::HTML4::FullSanitizer`
+- `Rails::HTML::LinkSanitizer` points to `Rails::HTML4::LinkSanitizer`
+- `Rails::HTML::SafeListSanitizer` points to `Rails::HTML4::SafeListSanitizer`
+
+
+## Installation
+
+Add this line to your application's Gemfile:
+
+    gem 'rails-html-sanitizer'
+
+And then execute:
+
+    $ bundle
+
+Or install it yourself as:
+
+    $ gem install rails-html-sanitizer
+
+
+## Support matrix
+
+| branch | ruby support | actively maintained | security support                       |
+|--------|--------------|---------------------|----------------------------------------|
+| 1.6.x  | >= 2.7       | yes                 | yes                                    |
+| 1.5.x  | >= 2.5       | no                  | while Rails 6.1 is in security support |
+| 1.4.x  | >= 1.8.7     | no                  | no                                     |
+
+
 ## Read more
 
 Loofah is what underlies the sanitizers and scrubbers of rails-html-sanitizer.
+
 - [Loofah and Loofah Scrubbers](https://github.com/flavorjones/loofah)
 
 The `node` argument passed to some methods in a custom scrubber is an instance of `Nokogiri::XML::Node`.
+
 - [`Nokogiri::XML::Node`](https://nokogiri.org/rdoc/Nokogiri/XML/Node.html)
 - [Nokogiri](http://nokogiri.org)
 
-## Contributing to Rails Html Sanitizers
 
-Rails Html Sanitizers is work of many contributors. You're encouraged to submit pull requests, propose features and discuss issues.
+## Contributing to Rails HTML Sanitizers
+
+Rails HTML Sanitizers is work of many contributors. You're encouraged to submit pull requests, propose features and discuss issues.
 
 See [CONTRIBUTING](CONTRIBUTING.md).
 
+### Security reports
+
+Trying to report a possible security vulnerability in this project? Please check out the [Rails project's security policy](https://rubyonrails.org/security) for instructions.
+
+
 ## License
-Rails Html Sanitizers is released under the [MIT License](MIT-LICENSE).
+
+Rails HTML Sanitizers is released under the [MIT License](MIT-LICENSE).
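The README's note on HTML entities can be reproduced outside Rails. This sketch assumes Ruby's standard-library `ERB::Util.html_escape` as a stand-in for the entity encoding a sanitizer performs; it does not call the Rails sanitizers, whose exact output may differ for other inputs.

```ruby
require "erb"

# Assumption: ERB::Util.html_escape stands in for sanitizer entity encoding.
raw = "JPMorgan Chase & Co."

# Encoded once before the string is persisted...
stored = ERB::Util.html_escape(raw)
# ...and encoded a second time by the view layer at render time.
rendered = ERB::Util.html_escape(stored)

puts stored   # => JPMorgan Chase &amp; Co.
puts rendered # => JPMorgan Chase &amp;amp; Co.
```

A browser decodes entities only once, so the doubly encoded string is displayed with a literal `&amp;`; storing `raw` and encoding it only at render time avoids the extra layer.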
diff --git a/debian/changelog b/debian/changelog
index 044701b..ae67d71 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+ruby-rails-html-sanitizer (1.6.0-1) UNRELEASED; urgency=low
+
+  * New upstream release.
+
+ -- Debian Janitor <janitor@jelmer.uk>  Tue, 27 Jun 2023 08:09:39 -0000
+
 ruby-rails-html-sanitizer (1.4.4-1) unstable; urgency=medium
 
   * Team upload
diff --git a/lib/rails-html-sanitizer.rb b/lib/rails-html-sanitizer.rb
index 59ed70d..0c48f7f 100644
--- a/lib/rails-html-sanitizer.rb
+++ b/lib/rails-html-sanitizer.rb
@@ -1,30 +1,14 @@
-require "rails/html/sanitizer/version"
-require "loofah"
-require "rails/html/scrubbers"
-require "rails/html/sanitizer"
+# frozen_string_literal: true
 
-module Rails
-  module Html
-    class Sanitizer
-      class << self
-        def full_sanitizer
-          Html::FullSanitizer
-        end
+require_relative "rails/html/sanitizer/version"
 
-        def link_sanitizer
-          Html::LinkSanitizer
-        end
+require "loofah"
 
-        def safe_list_sanitizer
-          Html::SafeListSanitizer
-        end
+require_relative "rails/html/scrubbers"
+require_relative "rails/html/sanitizer"
 
-        def white_list_sanitizer
-          safe_list_sanitizer
-        end
-      end
-    end
-  end
+module Rails
+  Html = HTML # :nodoc:
 end
 
 module ActionView
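The `Html = HTML` line in the diff above is a plain constant alias, so the old and new spellings name the same module object rather than a copy. Below is a sketch of that backwards-compatibility pattern, assuming a hypothetical `Demo` namespace in place of `Rails` so it runs without the gem; the `FullSanitizer` alias models the changelog's description of `Rails::HTML::FullSanitizer` pointing to `Rails::HTML4::FullSanitizer`, not code shown in this diff.

```ruby
# Hypothetical "Demo" namespace stands in for "Rails".
module Demo
  module HTML
    class Sanitizer; end
  end

  module HTML4
    class FullSanitizer < HTML::Sanitizer; end
  end

  module HTML
    # Backwards-compatible alias inside the new namespace
    FullSanitizer = HTML4::FullSanitizer
  end

  # Old spelling of the namespace, kept as a constant alias
  Html = HTML
end

# Both spellings resolve to the very same objects.
puts Demo::Html.equal?(Demo::HTML)                                # => true
puts Demo::Html::FullSanitizer.equal?(Demo::HTML4::FullSanitizer) # => true
puts Demo::Html::FullSanitizer.name                               # => Demo::HTML4::FullSanitizer
```

Because both names refer to one object, `is_a?` checks and `Module#name` give the same answer under either spelling.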
diff --git a/lib/rails/html/sanitizer.rb b/lib/rails/html/sanitizer.rb
index 5633ca1..b3712a7 100644
--- a/lib/rails/html/sanitizer.rb
+++ b/lib/rails/html/sanitizer.rb
@@ -1,155 +1,422 @@
+# frozen_string_literal: true
+
 module Rails
-  module Html
-    XPATHS_TO_REMOVE = %w{.//script .//form comment()}
+  module HTML
+    class Sanitizer
+      class << self
+        def html5_support?
+          return @html5_support if defined?(@html5_support)
+
+          @html5_support = Loofah.respond_to?(:html5_support?) && Loofah.html5_support?
+        end
+
+        def best_supported_vendor
+          html5_support? ? Rails::HTML5::Sanitizer : Rails::HTML4::Sanitizer
+        end
+      end
 
-    class Sanitizer # :nodoc:
       def sanitize(html, options = {})
         raise NotImplementedError, "subclasses must implement sanitize method."
       end
 
       private
+        def remove_xpaths(node, xpaths)
+          node.xpath(*xpaths).remove
+          node
+        end
+
+        def properly_encode(fragment, options)
+          fragment.xml? ? fragment.to_xml(options) : fragment.to_html(options)
+        end
+    end
+
+    module Concern
+      module ComposedSanitize
+        def sanitize(html, options = {})
+          return unless html
+          return html if html.empty?
+
+          serialize(scrub(parse_fragment(html), options))
+        end
+      end
+
+      module Parser
+        module HTML4
+          def parse_fragment(html)
+            Loofah.html4_fragment(html)
+          end
+        end
+
+        module HTML5
+          def parse_fragment(html)
+            Loofah.html5_fragment(html)
+          end
+        end if Rails::HTML::Sanitizer.html5_support?
+      end
+
+      module Scrubber
+        module Full
+          def scrub(fragment, options = {})
+            fragment.scrub!(TextOnlyScrubber.new)
+          end
+        end
+
+        module Link
+          def initialize
+            super
+            @link_scrubber = TargetScrubber.new
+            @link_scrubber.tags = %w(a)
+            @link_scrubber.attributes = %w(href)
+          end
+
+          def scrub(fragment, options = {})
+            fragment.scrub!(@link_scrubber)
+          end
+        end
+
+        module SafeList
+          # The default safe list for tags
+          DEFAULT_ALLOWED_TAGS = Set.new([
+                                           "a",
+                                           "abbr",
+                                           "acronym",
+                                           "address",
+                                           "b",
+                                           "big",
+                                           "blockquote",
+                                           "br",
+                                           "cite",
+                                           "code",
+                                           "dd",
+                                           "del",
+                                           "dfn",
+                                           "div",
+                                           "dl",
+                                           "dt",
+                                           "em",
+                                           "h1",
+                                           "h2",
+                                           "h3",
+                                           "h4",
+                                           "h5",
+                                           "h6",
+                                           "hr",
+                                           "i",
+                                           "img",
+                                           "ins",
+                                           "kbd",
+                                           "li",
+                                           "ol",
+                                           "p",
+                                           "pre",
+                                           "samp",
+                                           "small",
+                                           "span",
+                                           "strong",
+                                           "sub",
+                                           "sup",
+                                           "time",
+                                           "tt",
+                                           "ul",
+                                           "var",
+                                         ]).freeze
+
+          # The default safe list for attributes
+          DEFAULT_ALLOWED_ATTRIBUTES = Set.new([
+                                                 "abbr",
+                                                 "alt",
+                                                 "cite",
+                                                 "class",
+                                                 "datetime",
+                                                 "height",
+                                                 "href",
+                                                 "lang",
+                                                 "name",
+                                                 "src",
+                                                 "title",
+                                                 "width",
+                                                 "xml:lang",
+                                               ]).freeze
 
-      def remove_xpaths(node, xpaths)
-        node.xpath(*xpaths).remove
-        node
+          def self.included(klass)
+            class << klass
+              attr_accessor :allowed_tags
+              attr_accessor :allowed_attributes
+            end
+
+            klass.allowed_tags = DEFAULT_ALLOWED_TAGS.dup
+            klass.allowed_attributes = DEFAULT_ALLOWED_ATTRIBUTES.dup
+          end
+
+          def initialize(prune: false)
+            @permit_scrubber = PermitScrubber.new(prune: prune)
+          end
+
+          def scrub(fragment, options = {})
+            if scrubber = options[:scrubber]
+              # No duck typing, Loofah ensures subclass of Loofah::Scrubber
+              fragment.scrub!(scrubber)
+            elsif allowed_tags(options) || allowed_attributes(options)
+              @permit_scrubber.tags = allowed_tags(options)
+              @permit_scrubber.attributes = allowed_attributes(options)
+              fragment.scrub!(@permit_scrubber)
+            else
+              fragment.scrub!(:strip)
+            end
+          end
+
+          def sanitize_css(style_string)
+            Loofah::HTML5::Scrub.scrub_css(style_string)
+          end
+
+          private
+            def allowed_tags(options)
+              options[:tags] || self.class.allowed_tags
+            end
+
+            def allowed_attributes(options)
+              options[:attributes] || self.class.allowed_attributes
+            end
+        end
       end
 
-      def properly_encode(fragment, options)
-        fragment.xml? ? fragment.to_xml(options) : fragment.to_html(options)
+      module Serializer
+        module UTF8Encode
+          def serialize(fragment)
+            properly_encode(fragment, encoding: "UTF-8")
+          end
+        end
       end
     end
+  end
 
-    # === Rails::Html::FullSanitizer
-    # Removes all tags but strips out scripts, forms and comments.
-    #
-    # full_sanitizer = Rails::Html::FullSanitizer.new
-    # full_sanitizer.sanitize("<b>Bold</b> no more!  <a href='more.html'>See more here</a>...")
-    # # => Bold no more!  See more here...
-    class FullSanitizer < Sanitizer
-      def sanitize(html, options = {})
-        return unless html
-        return html if html.empty?
+  module HTML4
+    module Sanitizer
+      module VendorMethods
+        def full_sanitizer
+          Rails::HTML4::FullSanitizer
+        end
 
-        loofah_fragment = Loofah.fragment(html)
+        def link_sanitizer
+          Rails::HTML4::LinkSanitizer
+        end
 
-        remove_xpaths(loofah_fragment, XPATHS_TO_REMOVE)
-        loofah_fragment.scrub!(TextOnlyScrubber.new)
+        def safe_list_sanitizer
+          Rails::HTML4::SafeListSanitizer
+        end
 
-        properly_encode(loofah_fragment, encoding: 'UTF-8')
+        def white_list_sanitizer # :nodoc:
+          safe_list_sanitizer
+        end
       end
+
+      extend VendorMethods
     end
 
-    # === Rails::Html::LinkSanitizer
-    # Removes +a+ tags and +href+ attributes leaving only the link text.
+    # == Rails::HTML4::FullSanitizer
     #
-    #  link_sanitizer = Rails::Html::LinkSanitizer.new
-    #  link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
+    # Removes all tags from HTML4 but strips out scripts, forms and comments.
     #
-    #  => 'Only the link text will be kept.'
-    class LinkSanitizer < Sanitizer
-      def initialize
-        @link_scrubber = TargetScrubber.new
-        @link_scrubber.tags = %w(a)
-        @link_scrubber.attributes = %w(href)
-      end
+    #   full_sanitizer = Rails::HTML4::FullSanitizer.new
+    #   full_sanitizer.sanitize("<b>Bold</b> no more!  <a href='more.html'>See more here</a>...")
+    #   # => "Bold no more!  See more here..."
+    #
+    class FullSanitizer < Rails::HTML::Sanitizer
+      include HTML::Concern::ComposedSanitize
+      include HTML::Concern::Parser::HTML4
+      include HTML::Concern::Scrubber::Full
+      include HTML::Concern::Serializer::UTF8Encode
+    end
 
-      def sanitize(html, options = {})
-        Loofah.scrub_fragment(html, @link_scrubber).to_s
-      end
+    # == Rails::HTML4::LinkSanitizer
+    #
+    # Removes +a+ tags and +href+ attributes from HTML4 leaving only the link text.
+    #
+    #   link_sanitizer = Rails::HTML4::LinkSanitizer.new
+    #   link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
+    #   # => "Only the link text will be kept."
+    #
+    class LinkSanitizer < Rails::HTML::Sanitizer
+      include HTML::Concern::ComposedSanitize
+      include HTML::Concern::Parser::HTML4
+      include HTML::Concern::Scrubber::Link
+      include HTML::Concern::Serializer::UTF8Encode
     end
 
-    # === Rails::Html::SafeListSanitizer
-    # Sanitizes html and css from an extensive safe list (see link further down).
+    # == Rails::HTML4::SafeListSanitizer
+    #
+    # Sanitizes HTML4 and CSS from an extensive safe list.
     #
     # === Whitespace
-    # We can't make any guarantees about whitespace being kept or stripped.
-    # Loofah uses Nokogiri, which wraps either a C or Java parser for the
-    # respective Ruby implementation.
-    # Those two parsers determine how whitespace is ultimately handled.
     #
-    # When the stripped markup will be rendered the users browser won't take
-    # whitespace into account anyway. It might be better to suggest your users
-    # wrap their whitespace sensitive content in pre tags or that you do
-    # so automatically.
+    # We can't make any guarantees about whitespace being kept or stripped.  Loofah uses Nokogiri,
+    # which wraps either a C or Java parser for the respective Ruby implementation.  Those two
+    # parsers determine how whitespace is ultimately handled.
+    #
+    # When the stripped markup will be rendered the users browser won't take whitespace into account
+    # anyway. It might be better to suggest your users wrap their whitespace sensitive content in
+    # pre tags or that you do so automatically.
     #
     # === Options
-    # Sanitizes both html and css via the safe lists found here:
-    # https://github.com/flavorjones/loofah/blob/master/lib/loofah/html5/safelist.rb
     #
-    # SafeListSanitizer also accepts options to configure
-    # the safe list used when sanitizing html.
+    # Sanitizes both html and css via the safe lists found in
+    # Rails::HTML::Concern::Scrubber::SafeList
+    #
+    # SafeListSanitizer also accepts options to configure the safe list used when sanitizing html.
     # There's a class level option:
-    # Rails::Html::SafeListSanitizer.allowed_tags = %w(table tr td)
-    # Rails::Html::SafeListSanitizer.allowed_attributes = %w(id class style)
     #
-    # Tags and attributes can also be passed to +sanitize+.
-    # Passed options take precedence over the class level options.
+    #   Rails::HTML4::SafeListSanitizer.allowed_tags = %w(table tr td)
+    #   Rails::HTML4::SafeListSanitizer.allowed_attributes = %w(id class style)
+    #
+    # Tags and attributes can also be passed to +sanitize+.  Passed options take precedence over the
+    # class level options.
     #
     # === Examples
-    # safe_list_sanitizer = Rails::Html::SafeListSanitizer.new
     #
-    # Sanitize css doesn't take options
-    # safe_list_sanitizer.sanitize_css('background-color: #000;')
+    #   safe_list_sanitizer = Rails::HTML4::SafeListSanitizer.new
     #
-    # Default: sanitize via a extensive safe list of allowed elements
-    # safe_list_sanitizer.sanitize(@article.body)
+    #   # default: sanitize via a extensive safe list of allowed elements
+    #   safe_list_sanitizer.sanitize(@article.body)
     #
-    # Safe list via the supplied tags and attributes
-    # safe_list_sanitizer.sanitize(@article.body, tags: %w(table tr td),
-    # attributes: %w(id class style))
+    #   # sanitize via the supplied tags and attributes
+    #   safe_list_sanitizer.sanitize(
+    #     @article.body,
+    #     tags: %w(table tr td),
+    #     attributes: %w(id class style),
+    #   )
     #
-    # Safe list via a custom scrubber
-    # safe_list_sanitizer.sanitize(@article.body, scrubber: ArticleScrubber.new)
-    class SafeListSanitizer < Sanitizer
-      class << self
-        attr_accessor :allowed_tags
-        attr_accessor :allowed_attributes
-      end
-      self.allowed_tags = Set.new(%w(strong em b i p code pre tt samp kbd var sub
-        sup dfn cite big small address hr br div span h1 h2 h3 h4 h5 h6 ul ol li dl dt dd abbr
-        acronym a img blockquote del ins))
-      self.allowed_attributes = Set.new(%w(href src width height alt cite datetime title class name xml:lang abbr))
-
-      def initialize
-        @permit_scrubber = PermitScrubber.new
-      end
-
-      def sanitize(html, options = {})
-        return unless html
-        return html if html.empty?
+    #   # sanitize via a custom Loofah scrubber
+    #   safe_list_sanitizer.sanitize(@article.body, scrubber: ArticleScrubber.new)
+    #
+    #   # prune nodes from the tree instead of stripping tags and leaving inner content
+    #   safe_list_sanitizer = Rails::HTML4::SafeListSanitizer.new(prune: true)
+    #
+    #   # the sanitizer can also sanitize CSS
+    #   safe_list_sanitizer.sanitize_css('background-color: #000;')
+    #
+    class SafeListSanitizer < Rails::HTML::Sanitizer
+      include HTML::Concern::ComposedSanitize
+      include HTML::Concern::Parser::HTML4
+      include HTML::Concern::Scrubber::SafeList
+      include HTML::Concern::Serializer::UTF8Encode
+    end
+  end
 
-        loofah_fragment = Loofah.fragment(html)
+  module HTML5
+    class Sanitizer
+      class << self
+        def full_sanitizer
+          Rails::HTML5::FullSanitizer
+        end
 
-        if scrubber = options[:scrubber]
-          # No duck typing, Loofah ensures subclass of Loofah::Scrubber
-          loofah_fragment.scrub!(scrubber)
-        elsif allowed_tags(options) || allowed_attributes(options)
-          @permit_scrubber.tags = allowed_tags(options)
-          @permit_scrubber.attributes = allowed_attributes(options)
-          loofah_fragment.scrub!(@permit_scrubber)
-        else
-          remove_xpaths(loofah_fragment, XPATHS_TO_REMOVE)
-          loofah_fragment.scrub!(:strip)
+        def link_sanitizer
+          Rails::HTML5::LinkSanitizer
         end
 
-        properly_encode(loofah_fragment, encoding: 'UTF-8')
-      end
+        def safe_list_sanitizer
+          Rails::HTML5::SafeListSanitizer
+        end
 
-      def sanitize_css(style_string)
-        Loofah::HTML5::Scrub.scrub_css(style_string)
+        def white_list_sanitizer # :nodoc:
+          safe_list_sanitizer
+        end
       end
+    end
 
-      private
+    # == Rails::HTML5::FullSanitizer
+    #
+    # Removes all tags from HTML5 but strips out scripts, forms and comments.
+    #
+    #   full_sanitizer = Rails::HTML5::FullSanitizer.new
+    #   full_sanitizer.sanitize("<b>Bold</b> no more!  <a href='more.html'>See more here</a>...")
+    #   # => "Bold no more!  See more here..."
+    #
+    class FullSanitizer < Rails::HTML::Sanitizer
+      include HTML::Concern::ComposedSanitize
+      include HTML::Concern::Parser::HTML5
+      include HTML::Concern::Scrubber::Full
+      include HTML::Concern::Serializer::UTF8Encode
+    end
 
-      def allowed_tags(options)
-        options[:tags] || self.class.allowed_tags
-      end
+    # == Rails::HTML5::LinkSanitizer
+    #
+    # Removes +a+ tags and +href+ attributes from HTML5 leaving only the link text.
+    #
+    #   link_sanitizer = Rails::HTML5::LinkSanitizer.new
+    #   link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
+    #   # => "Only the link text will be kept."
+    #
+    class LinkSanitizer < Rails::HTML::Sanitizer
+      include HTML::Concern::ComposedSanitize
+      include HTML::Concern::Parser::HTML5
+      include HTML::Concern::Scrubber::Link
+      include HTML::Concern::Serializer::UTF8Encode
+    end
 
-      def allowed_attributes(options)
-        options[:attributes] || self.class.allowed_attributes
-      end
+    # == Rails::HTML5::SafeListSanitizer
+    #
+    # Sanitizes HTML5 and CSS from an extensive safe list.
+    #
+    # === Whitespace
+    #
+    # We can't make any guarantees about whitespace being kept or stripped.  Loofah uses Nokogiri,
+    # which wraps either a C or Java parser for the respective Ruby implementation.  Those two
+    # parsers determine how whitespace is ultimately handled.
+    #
+    # When the stripped markup is rendered, the user's browser won't take whitespace into account
+    # anyway. It might be better to suggest that your users wrap their whitespace-sensitive content
+    # in pre tags, or to do so automatically.
+    #
+    # === Options
+    #
+    # Sanitizes both html and css via the safe lists found in
+    # Rails::HTML::Concern::Scrubber::SafeList
+    #
+    # SafeListSanitizer also accepts options to configure the safe list used when sanitizing html.
+    # There's a class level option:
+    #
+    #   Rails::HTML5::SafeListSanitizer.allowed_tags = %w(table tr td)
+    #   Rails::HTML5::SafeListSanitizer.allowed_attributes = %w(id class style)
+    #
+    # Tags and attributes can also be passed to +sanitize+.  Passed options take precedence over the
+    # class level options.
+    #
+    # === Examples
+    #
+    #   safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new
+    #
+    #   # default: sanitize via an extensive safe list of allowed elements
+    #   safe_list_sanitizer.sanitize(@article.body)
+    #
+    #   # sanitize via the supplied tags and attributes
+    #   safe_list_sanitizer.sanitize(
+    #     @article.body,
+    #     tags: %w(table tr td),
+    #     attributes: %w(id class style),
+    #   )
+    #
+    #   # sanitize via a custom Loofah scrubber
+    #   safe_list_sanitizer.sanitize(@article.body, scrubber: ArticleScrubber.new)
+    #
+    #   # prune nodes from the tree instead of stripping tags and leaving inner content
+    #   safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new(prune: true)
+    #
+    #   # the sanitizer can also sanitize CSS
+    #   safe_list_sanitizer.sanitize_css('background-color: #000;')
+    #
+    class SafeListSanitizer < Rails::HTML::Sanitizer
+      include HTML::Concern::ComposedSanitize
+      include HTML::Concern::Parser::HTML5
+      include HTML::Concern::Scrubber::SafeList
+      include HTML::Concern::Serializer::UTF8Encode
     end
+  end if Rails::HTML::Sanitizer.html5_support?
 
-    WhiteListSanitizer = SafeListSanitizer
+  module HTML
+    Sanitizer.extend(HTML4::Sanitizer::VendorMethods) # :nodoc:
+    FullSanitizer = HTML4::FullSanitizer # :nodoc:
+    LinkSanitizer = HTML4::LinkSanitizer # :nodoc:
+    SafeListSanitizer = HTML4::SafeListSanitizer # :nodoc:
+    WhiteListSanitizer = SafeListSanitizer # :nodoc:
   end
 end
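The reworked sanitizer.rb organizes the classes into "vendors":

- Rails::HTML4::Sanitizer and Rails::HTML5::Sanitizer expose class methods (full_sanitizer, link_sanitizer, safe_list_sanitizer, white_list_sanitizer) that return that vendor's sanitizer classes.
- The old Rails::HTML names become aliases of the HTML4 classes.

The following is a minimal pure-Ruby sketch of that layout, not the gem's code. Several pieces are assumptions made for illustration:

- The SketchRails namespace and the vendor_namespace hook are hypothetical.
- html5_support? is hard-wired instead of asking Loofah/Nokogiri.
- The sanitizer classes are empty placeholders that do no sanitizing.

```ruby
# Hypothetical sketch of the vendor layout in this release. Names mirror the
# diff, but this is not the gem's code and these classes sanitize nothing.
module SketchRails
  # Shared class-level API. Each vendor names the namespace holding its
  # sanitizer classes via a hypothetical +vendor_namespace+ hook.
  module VendorMethods
    def full_sanitizer
      vendor_namespace.const_get(:FullSanitizer)
    end

    def link_sanitizer
      vendor_namespace.const_get(:LinkSanitizer)
    end

    def safe_list_sanitizer
      vendor_namespace.const_get(:SafeListSanitizer)
    end

    def white_list_sanitizer # legacy name, kept for compatibility
      safe_list_sanitizer
    end
  end

  module HTML4
    class FullSanitizer; end
    class LinkSanitizer; end
    class SafeListSanitizer; end

    class Sanitizer
      extend VendorMethods

      def self.vendor_namespace
        HTML4
      end
    end
  end

  # Upstream defines its HTML5 module only when HTML5 parsing is available.
  module HTML5
    class FullSanitizer; end
    class LinkSanitizer; end
    class SafeListSanitizer; end

    class Sanitizer
      extend VendorMethods

      def self.vendor_namespace
        HTML5
      end
    end
  end

  module HTML
    class Sanitizer
      extend VendorMethods # the old entry point keeps returning HTML4 classes

      def self.vendor_namespace
        HTML4
      end

      # Stand-in for the real runtime check; hard-wired to true in this sketch.
      def self.html5_support?
        true
      end

      # Prefer the HTML5 vendor when the platform supports it.
      def self.best_supported_vendor
        html5_support? ? HTML5::Sanitizer : HTML4::Sanitizer
      end
    end

    # Old constant names become aliases of the HTML4 classes.
    FullSanitizer = HTML4::FullSanitizer
    SafeListSanitizer = HTML4::SafeListSanitizer
  end
end

SketchRails::HTML::Sanitizer.best_supported_vendor # => SketchRails::HTML5::Sanitizer
SketchRails::HTML::Sanitizer.full_sanitizer        # => SketchRails::HTML4::FullSanitizer
SketchRails::HTML::FullSanitizer.name              # => "SketchRails::HTML4::FullSanitizer"
```

An alias constant does not rename a class. This is why the upstream tests expect Rails::HTML::FullSanitizer.name to be "Rails::HTML4::FullSanitizer".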
diff --git a/lib/rails/html/sanitizer/version.rb b/lib/rails/html/sanitizer/version.rb
index 3ceb4c8..e478448 100644
--- a/lib/rails/html/sanitizer/version.rb
+++ b/lib/rails/html/sanitizer/version.rb
@@ -1,7 +1,9 @@
+# frozen_string_literal: true
+
 module Rails
-  module Html
+  module HTML
     class Sanitizer
-      VERSION = "1.4.4"
+      VERSION = "1.6.0"
     end
   end
 end
diff --git a/lib/rails/html/scrubbers.rb b/lib/rails/html/scrubbers.rb
index 674d1c4..af53db4 100644
--- a/lib/rails/html/scrubbers.rb
+++ b/lib/rails/html/scrubbers.rb
@@ -1,10 +1,12 @@
+# frozen_string_literal: true
+
 module Rails
-  module Html
-    # === Rails::Html::PermitScrubber
+  module HTML
+    # === Rails::HTML::PermitScrubber
     #
-    # +Rails::Html::PermitScrubber+ allows you to permit only your own tags and/or attributes.
+    # +Rails::HTML::PermitScrubber+ allows you to permit only your own tags and/or attributes.
     #
-    # +Rails::Html::PermitScrubber+ can be subclassed to determine:
+    # +Rails::HTML::PermitScrubber+ can be subclassed to determine:
     # - When a node should be skipped via +skip_node?+.
     # - When a node is allowed via +allowed_node?+.
     # - When an attribute should be scrubbed via +scrub_attribute?+.
@@ -27,7 +29,7 @@ module Rails
     # If set, attributes excluded will be removed.
     # If not, attributes are removed based on Loofahs +HTML5::Scrub.scrub_attributes+.
     #
-    #  class CommentScrubber < Html::PermitScrubber
+    #  class CommentScrubber < Rails::HTML::PermitScrubber
     #    def initialize
     #      super
     #      self.tags = %w(form script comment blockquote)
@@ -45,10 +47,11 @@ module Rails
     # See the documentation for +Nokogiri::XML::Node+ to understand what's possible
     # with nodes: https://nokogiri.org/rdoc/Nokogiri/XML/Node.html
     class PermitScrubber < Loofah::Scrubber
-      attr_reader :tags, :attributes
+      attr_reader :tags, :attributes, :prune
 
-      def initialize
-        @direction = :bottom_up
+      def initialize(prune: false)
+        @prune = prune
+        @direction = @prune ? :top_down : :bottom_up
         @tags, @attributes = nil, nil
       end
 
@@ -76,90 +79,89 @@ module Rails
       end
 
       protected
+        def allowed_node?(node)
+          @tags.include?(node.name)
+        end
 
-      def allowed_node?(node)
-        @tags.include?(node.name)
-      end
+        def skip_node?(node)
+          node.text?
+        end
 
-      def skip_node?(node)
-        node.text?
-      end
+        def scrub_attribute?(name)
+          !@attributes.include?(name)
+        end
 
-      def scrub_attribute?(name)
-        !@attributes.include?(name)
-      end
+        def keep_node?(node)
+          if @tags
+            allowed_node?(node)
+          else
+            Loofah::HTML5::Scrub.allowed_element?(node.name)
+          end
+        end
 
-      def keep_node?(node)
-        if @tags
-          allowed_node?(node)
-        else
-          Loofah::HTML5::Scrub.allowed_element?(node.name)
+        def scrub_node(node)
+          node.before(node.children) unless prune # strip
+          node.remove
         end
-      end
 
-      def scrub_node(node)
-        node.before(node.children) # strip
-        node.remove
-      end
+        def scrub_attributes(node)
+          if @attributes
+            node.attribute_nodes.each do |attr|
+              attr.remove if scrub_attribute?(attr.name)
+              scrub_attribute(node, attr)
+            end
 
-      def scrub_attributes(node)
-        if @attributes
-          node.attribute_nodes.each do |attr|
-            attr.remove if scrub_attribute?(attr.name)
-            scrub_attribute(node, attr)
+            scrub_css_attribute(node)
+          else
+            Loofah::HTML5::Scrub.scrub_attributes(node)
           end
-
-          scrub_css_attribute(node)
-        else
-          Loofah::HTML5::Scrub.scrub_attributes(node)
         end
-      end
 
-      def scrub_css_attribute(node)
-        if Loofah::HTML5::Scrub.respond_to?(:scrub_css_attribute)
-          Loofah::HTML5::Scrub.scrub_css_attribute(node)
-        else
-          style = node.attributes['style']
-          style.value = Loofah::HTML5::Scrub.scrub_css(style.value) if style
+        def scrub_css_attribute(node)
+          if Loofah::HTML5::Scrub.respond_to?(:scrub_css_attribute)
+            Loofah::HTML5::Scrub.scrub_css_attribute(node)
+          else
+            style = node.attributes["style"]
+            style.value = Loofah::HTML5::Scrub.scrub_css(style.value) if style
+          end
         end
-      end
 
-      def validate!(var, name)
-        if var && !var.is_a?(Enumerable)
-          raise ArgumentError, "You should pass :#{name} as an Enumerable"
+        def validate!(var, name)
+          if var && !var.is_a?(Enumerable)
+            raise ArgumentError, "You should pass :#{name} as an Enumerable"
+          end
+          var
         end
-        var
-      end
 
-      def scrub_attribute(node, attr_node)
-        attr_name = if attr_node.namespace
-                      "#{attr_node.namespace.prefix}:#{attr_node.node_name}"
-                    else
-                      attr_node.node_name
-                    end
+        def scrub_attribute(node, attr_node)
+          attr_name = if attr_node.namespace
+            "#{attr_node.namespace.prefix}:#{attr_node.node_name}"
+          else
+            attr_node.node_name
+          end
 
-        if Loofah::HTML5::SafeList::ATTR_VAL_IS_URI.include?(attr_name)
-          return if Loofah::HTML5::Scrub.scrub_uri_attribute(attr_node)
-        end
+          if Loofah::HTML5::SafeList::ATTR_VAL_IS_URI.include?(attr_name)
+            return if Loofah::HTML5::Scrub.scrub_uri_attribute(attr_node)
+          end
 
-        if Loofah::HTML5::SafeList::SVG_ATTR_VAL_ALLOWS_REF.include?(attr_name)
-          Loofah::HTML5::Scrub.scrub_attribute_that_allows_local_ref(attr_node)
-        end
+          if Loofah::HTML5::SafeList::SVG_ATTR_VAL_ALLOWS_REF.include?(attr_name)
+            Loofah::HTML5::Scrub.scrub_attribute_that_allows_local_ref(attr_node)
+          end
 
-        if Loofah::HTML5::SafeList::SVG_ALLOW_LOCAL_HREF.include?(node.name) && attr_name == 'xlink:href' && attr_node.value =~ /^\s*[^#\s].*/m
-          attr_node.remove
-        end
+          if Loofah::HTML5::SafeList::SVG_ALLOW_LOCAL_HREF.include?(node.name) && attr_name == "xlink:href" && attr_node.value =~ /^\s*[^#\s].*/m
+            attr_node.remove
+          end
 
-        node.remove_attribute(attr_node.name) if attr_name == 'src' && attr_node.value !~ /[^[:space:]]/
+          node.remove_attribute(attr_node.name) if attr_name == "src" && attr_node.value !~ /[^[:space:]]/
 
-        Loofah::HTML5::Scrub.force_correct_attribute_escaping! node
-      end
+          Loofah::HTML5::Scrub.force_correct_attribute_escaping! node
+        end
     end
 
-    # === Rails::Html::TargetScrubber
+    # === Rails::HTML::TargetScrubber
     #
-    # Where +Rails::Html::PermitScrubber+ picks out tags and attributes to permit in
-    # sanitization, +Rails::Html::TargetScrubber+ targets them for removal.
+    # Where +Rails::HTML::PermitScrubber+ picks out tags and attributes to permit in
+    # sanitization, +Rails::HTML::TargetScrubber+ targets them for removal.
     #
     # +tags=+
     # If set, elements included will be stripped.
@@ -176,9 +178,9 @@ module Rails
       end
     end
 
-    # === Rails::Html::TextOnlyScrubber
+    # === Rails::HTML::TextOnlyScrubber
     #
-    # +Rails::Html::TextOnlyScrubber+ allows you to permit text nodes.
+    # +Rails::HTML::TextOnlyScrubber+ allows you to permit text nodes.
     #
     # Unallowed elements will be stripped, i.e. element is removed but its subtree kept.
     class TextOnlyScrubber < Loofah::Scrubber
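PermitScrubber's new prune: option changes how a disallowed element is handled:

- Without prune, scrub_node first moves the element's children in front of it and then removes the element.
- With prune, the element is removed together with its subtree, and the traversal direction switches from :bottom_up to :top_down.

The sketch below is a toy, pure-Ruby model of that difference, with these assumptions:

- ToyNode, toy_scrub and toy_sanitize are hypothetical names.
- The tree is built by hand rather than parsed by Loofah/Nokogiri.
- Text is not HTML-escaped on output.

It only illustrates the control flow and is not a sanitizer.

```ruby
# Toy element/text node standing in for a Nokogiri node (illustration only;
# text is not HTML-escaped on output, so this is not a real sanitizer).
class ToyNode
  attr_reader :name, :children, :text

  def initialize(name, children = [], text: nil)
    @name = name
    @children = children
    @text = text
  end

  def self.text(str)
    new("#text", text: str)
  end

  def text?
    name == "#text"
  end

  def to_html
    return text if text?

    "<#{name}>#{children.map(&:to_html).join}</#{name}>"
  end
end

# Returns the nodes that replace +node+ after scrubbing against +allowed+.
# prune: false -> a disallowed element is stripped: its scrubbed children are kept.
# prune: true  -> a disallowed element is dropped with its whole subtree, and
#                 (as with a top-down traversal) its descendants are never visited.
def toy_scrub(node, allowed, prune:)
  return [node] if node.text? # text is always kept, like PermitScrubber#skip_node?

  if allowed.include?(node.name)
    kept = node.children.flat_map { |child| toy_scrub(child, allowed, prune: prune) }
    [ToyNode.new(node.name, kept)]
  elsif prune
    []
  else
    node.children.flat_map { |child| toy_scrub(child, allowed, prune: prune) }
  end
end

def toy_sanitize(root, allowed, prune: false)
  toy_scrub(root, allowed, prune: prune).map(&:to_html).join
end

fragment = ToyNode.new("div", [
  ToyNode.new("b", [ToyNode.text("bold")]),
  ToyNode.new("span", [ToyNode.text("keep "), ToyNode.new("em", [ToyNode.text("me")])]),
  ToyNode.text("!"),
])

allowed = %w(div b em)
toy_sanitize(fragment, allowed)              # => "<div><b>bold</b>keep <em>me</em>!</div>"
toy_sanitize(fragment, allowed, prune: true) # => "<div><b>bold</b>!</div>"
```

In the real gem the same choice is made by passing prune: true to a SafeListSanitizer or PermitScrubber constructor, as the doc comments in this diff show.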
diff --git a/rails-html-sanitizer.gemspec b/rails-html-sanitizer.gemspec
index b2a476e..2ff2cf2 100644
--- a/rails-html-sanitizer.gemspec
+++ b/rails-html-sanitizer.gemspec
@@ -2,41 +2,36 @@
 # This file has been automatically generated by gem2tgz #
 #########################################################
 # -*- encoding: utf-8 -*-
-# stub: rails-html-sanitizer 1.4.4 ruby lib
+# stub: rails-html-sanitizer 1.6.0 ruby lib
 
 Gem::Specification.new do |s|
   s.name = "rails-html-sanitizer".freeze
-  s.version = "1.4.4"
+  s.version = "1.6.0"
 
   s.required_rubygems_version = Gem::Requirement.new(">= 0".freeze) if s.respond_to? :required_rubygems_version=
-  s.metadata = { "bug_tracker_uri" => "https://github.com/rails/rails-html-sanitizer/issues", "changelog_uri" => "https://github.com/rails/rails-html-sanitizer/blob/v1.4.4/CHANGELOG.md", "documentation_uri" => "https://www.rubydoc.info/gems/rails-html-sanitizer/1.4.4", "source_code_uri" => "https://github.com/rails/rails-html-sanitizer/tree/v1.4.4" } if s.respond_to? :metadata=
+  s.metadata = { "bug_tracker_uri" => "https://github.com/rails/rails-html-sanitizer/issues", "changelog_uri" => "https://github.com/rails/rails-html-sanitizer/blob/v1.6.0/CHANGELOG.md", "documentation_uri" => "https://www.rubydoc.info/gems/rails-html-sanitizer/1.6.0", "source_code_uri" => "https://github.com/rails/rails-html-sanitizer/tree/v1.6.0" } if s.respond_to? :metadata=
   s.require_paths = ["lib".freeze]
-  s.authors = ["Rafael Mendon\u00E7a Fran\u00E7a".freeze, "Kasper Timm Hansen".freeze]
-  s.date = "2022-12-13"
+  s.authors = ["Rafael Mendon\u00E7a Fran\u00E7a".freeze, "Kasper Timm Hansen".freeze, "Mike Dalessio".freeze]
+  s.date = "2023-05-26"
   s.description = "HTML sanitization for Rails applications".freeze
-  s.email = ["rafaelmfranca@gmail.com".freeze, "kaspth@gmail.com".freeze]
-  s.files = ["CHANGELOG.md".freeze, "MIT-LICENSE".freeze, "README.md".freeze, "lib/rails-html-sanitizer.rb".freeze, "lib/rails/html/sanitizer.rb".freeze, "lib/rails/html/sanitizer/version.rb".freeze, "lib/rails/html/scrubbers.rb".freeze, "test/sanitizer_test.rb".freeze, "test/scrubbers_test.rb".freeze]
+  s.email = ["rafaelmfranca@gmail.com".freeze, "kaspth@gmail.com".freeze, "mike.dalessio@gmail.com".freeze]
+  s.files = ["CHANGELOG.md".freeze, "MIT-LICENSE".freeze, "README.md".freeze, "lib/rails-html-sanitizer.rb".freeze, "lib/rails/html/sanitizer.rb".freeze, "lib/rails/html/sanitizer/version.rb".freeze, "lib/rails/html/scrubbers.rb".freeze, "test/rails_api_test.rb".freeze, "test/sanitizer_test.rb".freeze, "test/scrubbers_test.rb".freeze]
   s.homepage = "https://github.com/rails/rails-html-sanitizer".freeze
   s.licenses = ["MIT".freeze]
-  s.rubygems_version = "3.3.15".freeze
+  s.required_ruby_version = Gem::Requirement.new(">= 2.7.0".freeze)
+  s.rubygems_version = "3.2.5".freeze
   s.summary = "This gem is responsible to sanitize HTML fragments in Rails applications.".freeze
-  s.test_files = ["test/sanitizer_test.rb".freeze, "test/scrubbers_test.rb".freeze]
+  s.test_files = ["test/rails_api_test.rb".freeze, "test/sanitizer_test.rb".freeze, "test/scrubbers_test.rb".freeze]
 
   if s.respond_to? :specification_version then
     s.specification_version = 4
   end
 
   if s.respond_to? :add_runtime_dependency then
-    s.add_development_dependency(%q<bundler>.freeze, [">= 1.3"])
-    s.add_runtime_dependency(%q<loofah>.freeze, ["~> 2.19", ">= 2.19.1"])
-    s.add_development_dependency(%q<minitest>.freeze, [">= 0"])
-    s.add_development_dependency(%q<rails-dom-testing>.freeze, [">= 0"])
-    s.add_development_dependency(%q<rake>.freeze, [">= 0"])
+    s.add_runtime_dependency(%q<loofah>.freeze, ["~> 2.21"])
+    s.add_runtime_dependency(%q<nokogiri>.freeze, ["~> 1.14"])
   else
-    s.add_dependency(%q<bundler>.freeze, [">= 1.3"])
-    s.add_dependency(%q<loofah>.freeze, ["~> 2.19", ">= 2.19.1"])
-    s.add_dependency(%q<minitest>.freeze, [">= 0"])
-    s.add_dependency(%q<rails-dom-testing>.freeze, [">= 0"])
-    s.add_dependency(%q<rake>.freeze, [">= 0"])
+    s.add_dependency(%q<loofah>.freeze, ["~> 2.21"])
+    s.add_dependency(%q<nokogiri>.freeze, ["~> 1.14"])
   end
 end
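The gemspec changes do three things:

- They replace the old Loofah constraint ("~> 2.19", ">= 2.19.1") with "~> 2.21".
- They add a direct "~> 1.14" dependency on Nokogiri.
- They declare required_ruby_version ">= 2.7.0".

The snippet below evaluates those constraints with RubyGems' own Gem::Requirement and Gem::Version classes. The requirement strings are copied from the diff. The version numbers tested against them are arbitrary examples, and satisfies? is a hypothetical helper.

```ruby
require "rubygems" # normally loaded by Ruby already

# Requirements copied from the old and new gemspec in this diff.
OLD_LOOFAH   = Gem::Requirement.new("~> 2.19", ">= 2.19.1")
NEW_LOOFAH   = Gem::Requirement.new("~> 2.21")
NEW_NOKOGIRI = Gem::Requirement.new("~> 1.14")
MIN_RUBY     = Gem::Requirement.new(">= 2.7.0")

# Hypothetical helper: does +version+ satisfy +requirement+?
def satisfies?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

# "~> 2.21" is the pessimistic operator: >= 2.21 and < 3.0.
satisfies?(NEW_LOOFAH, "2.22.0")    # => true
satisfies?(NEW_LOOFAH, "2.20.0")    # => false
satisfies?(NEW_LOOFAH, "3.0.0")     # => false
# "~> 1.14" accepts any 1.x release from 1.14 on.
satisfies?(NEW_NOKOGIRI, "1.15.2")  # => true
satisfies?(NEW_NOKOGIRI, "1.13.10") # => false
# The old pair excluded 2.19.0 through its extra ">= 2.19.1" floor.
satisfies?(OLD_LOOFAH, "2.19.0")    # => false
satisfies?(OLD_LOOFAH, "2.19.1")    # => true
# Ruby 2.6 no longer qualifies; upstream keeps fixing 1.5.x for it.
satisfies?(MIN_RUBY, "2.6.10")      # => false
satisfies?(MIN_RUBY, "2.7.0")       # => true
```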
diff --git a/test/rails_api_test.rb b/test/rails_api_test.rb
new file mode 100644
index 0000000..9bc1107
--- /dev/null
+++ b/test/rails_api_test.rb
@@ -0,0 +1,88 @@
+# frozen_string_literal: true
+
+require "minitest/autorun"
+require "rails-html-sanitizer"
+
+class RailsApiTest < Minitest::Test
+  def test_html_module_name_alias
+    assert_equal(Rails::Html, Rails::HTML)
+    assert_equal("Rails::HTML", Rails::Html.name)
+    assert_equal("Rails::HTML", Rails::HTML.name)
+  end
+
+  def test_html_scrubber_class_names
+    assert(Rails::Html::PermitScrubber)
+    assert(Rails::Html::TargetScrubber)
+    assert(Rails::Html::TextOnlyScrubber)
+    assert(Rails::Html::Sanitizer)
+  end
+
+  def test_best_supported_vendor_when_html5_is_not_supported_returns_html4
+    Rails::HTML::Sanitizer.stub(:html5_support?, false) do
+      assert_equal(Rails::HTML4::Sanitizer, Rails::HTML::Sanitizer.best_supported_vendor)
+    end
+  end
+
+  def test_best_supported_vendor_when_html5_is_supported_returns_html5
+    skip("no HTML5 support on this platform") unless Rails::HTML::Sanitizer.html5_support?
+
+    Rails::HTML::Sanitizer.stub(:html5_support?, true) do
+      assert_equal(Rails::HTML5::Sanitizer, Rails::HTML::Sanitizer.best_supported_vendor)
+    end
+  end
+
+  def test_html4_sanitizer_alias_full
+    assert_equal(Rails::HTML4::FullSanitizer, Rails::HTML::FullSanitizer)
+    assert_equal("Rails::HTML4::FullSanitizer", Rails::HTML::FullSanitizer.name)
+  end
+
+  def test_html4_sanitizer_alias_link
+    assert_equal(Rails::HTML4::LinkSanitizer, Rails::HTML::LinkSanitizer)
+    assert_equal("Rails::HTML4::LinkSanitizer", Rails::HTML::LinkSanitizer.name)
+  end
+
+  def test_html4_sanitizer_alias_safe_list
+    assert_equal(Rails::HTML4::SafeListSanitizer, Rails::HTML::SafeListSanitizer)
+    assert_equal("Rails::HTML4::SafeListSanitizer", Rails::HTML::SafeListSanitizer.name)
+  end
+
+  def test_html4_full_sanitizer
+    assert_equal(Rails::HTML4::FullSanitizer, Rails::HTML::Sanitizer.full_sanitizer)
+    assert_equal(Rails::HTML4::FullSanitizer, Rails::HTML4::Sanitizer.full_sanitizer)
+  end
+
+  def test_html4_link_sanitizer
+    assert_equal(Rails::HTML4::LinkSanitizer, Rails::HTML::Sanitizer.link_sanitizer)
+    assert_equal(Rails::HTML4::LinkSanitizer, Rails::HTML4::Sanitizer.link_sanitizer)
+  end
+
+  def test_html4_safe_list_sanitizer
+    assert_equal(Rails::HTML4::SafeListSanitizer, Rails::HTML::Sanitizer.safe_list_sanitizer)
+    assert_equal(Rails::HTML4::SafeListSanitizer, Rails::HTML4::Sanitizer.safe_list_sanitizer)
+  end
+
+  def test_html4_white_list_sanitizer
+    assert_equal(Rails::HTML4::SafeListSanitizer, Rails::HTML::Sanitizer.white_list_sanitizer)
+    assert_equal(Rails::HTML4::SafeListSanitizer, Rails::HTML4::Sanitizer.white_list_sanitizer)
+  end
+
+  def test_html5_full_sanitizer
+    skip("no HTML5 support on this platform") unless Rails::HTML::Sanitizer.html5_support?
+    assert_equal(Rails::HTML5::FullSanitizer, Rails::HTML5::Sanitizer.full_sanitizer)
+  end
+
+  def test_html5_link_sanitizer
+    skip("no HTML5 support on this platform") unless Rails::HTML::Sanitizer.html5_support?
+    assert_equal(Rails::HTML5::LinkSanitizer, Rails::HTML5::Sanitizer.link_sanitizer)
+  end
+
+  def test_html5_safe_list_sanitizer
+    skip("no HTML5 support on this platform") unless Rails::HTML::Sanitizer.html5_support?
+    assert_equal(Rails::HTML5::SafeListSanitizer, Rails::HTML5::Sanitizer.safe_list_sanitizer)
+  end
+
+  def test_html5_white_list_sanitizer
+    skip("no HTML5 support on this platform") unless Rails::HTML::Sanitizer.html5_support?
+    assert_equal(Rails::HTML5::SafeListSanitizer, Rails::HTML5::Sanitizer.white_list_sanitizer)
+  end
+end
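The new rails_api_test.rb checks two Ruby behaviours:

- Rails::Html is a constant alias of Rails::HTML, so both names refer to the same module and Module#name reports "Rails::HTML".
- best_supported_vendor follows html5_support?, which the tests override temporarily with Minitest's stub.

The sketch below reproduces both ideas without the gem or Minitest, and it is deliberately simplified:

- The Sketch module is hypothetical.
- Its best_supported_vendor returns symbols, not classes.
- with_singleton_stub is a hypothetical helper, not Minitest's implementation.

```ruby
# Hypothetical module with the same shape as the API tested above.
module Sketch
  module HTML
    # Stand-in for a platform capability check; hard-wired for the sketch.
    def self.html5_support?
      false
    end

    def self.best_supported_vendor
      html5_support? ? :html5 : :html4
    end
  end

  # Backwards-compatible alias: both constants refer to the same module object,
  # and Module#name keeps the name of the constant it was first assigned to.
  Html = HTML
end

# Temporarily replaces a singleton method, restoring the original afterwards.
# A simplified stand-in for Minitest's Object#stub, not a drop-in replacement.
def with_singleton_stub(obj, name, value)
  original = obj.method(name)
  obj.singleton_class.send(:define_method, name) { |*| value }
  yield
ensure
  obj.singleton_class.send(:define_method, name, original) if original
end

Sketch::Html.equal?(Sketch::HTML) # => true
Sketch::Html.name                 # => "Sketch::HTML"

Sketch::HTML.best_supported_vendor # => :html4
with_singleton_stub(Sketch::HTML, :html5_support?, true) do
  Sketch::HTML.best_supported_vendor # => :html5
end
Sketch::HTML.best_supported_vendor # => :html4 (original restored)
```

Restoring inside ensure matters: an exception raised in the block must not leave the stubbed method in place for later tests.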
diff --git a/test/sanitizer_test.rb b/test/sanitizer_test.rb
index cd0b046..6af882a 100644
--- a/test/sanitizer_test.rb
+++ b/test/sanitizer_test.rb
@@ -1,771 +1,1087 @@
+# frozen_string_literal: true
+
 require "minitest/autorun"
 require "rails-html-sanitizer"
-require "rails/dom/testing/assertions/dom_assertions"
 
-puts Nokogiri::VERSION_INFO
+puts "nokogiri version info: #{Nokogiri::VERSION_INFO}"
+puts "html5 support: #{Rails::HTML::Sanitizer.html5_support?}"
+
+#
+#  NOTE that many of these tests contain multiple acceptable results.
+#
+#  In some cases, this is because of how the HTML4 parser's recovery behavior changed in libxml2
+#  2.9.14 and 2.10.0. For more details, see:
+#
+#  - https://github.com/sparklemotion/nokogiri/releases/tag/v1.13.5
+#  - https://gitlab.gnome.org/GNOME/libxml2/-/issues/380
+#
+#  In other cases, multiple acceptable results are provided because Nokogiri's vendored libxml2 is
+#  patched to entity-escape server-side includes (aka "SSI", aka `<!-- #directive param=value -->`).
+#
+#  In many other cases, it's because the parser used by Nokogiri on JRuby (xerces+nekohtml) parses
+#  slightly differently than libxml2 in edge cases.
+#
+module SanitizerTests
+  def self.loofah_html5_support?
+    Loofah.respond_to?(:html5_support?) && Loofah.html5_support?
+  end
+
+  class BaseSanitizerTest < Minitest::Test
+    class XpathRemovalTestSanitizer < Rails::HTML::Sanitizer
+      def sanitize(html, options = {})
+        fragment = Loofah.fragment(html)
+        remove_xpaths(fragment, options[:xpaths]).to_s
+      end
+    end
 
-class SanitizersTest < Minitest::Test
-  include Rails::Dom::Testing::Assertions::DomAssertions
+    def test_sanitizer_sanitize_raises_not_implemented_error
+      assert_raises NotImplementedError do
+        Rails::HTML::Sanitizer.new.sanitize("asdf")
+      end
+    end
 
-  def test_sanitizer_sanitize_raises_not_implemented_error
-    assert_raises NotImplementedError do
-      Rails::Html::Sanitizer.new.sanitize('')
+    def test_remove_xpaths_removes_an_xpath
+      html = %(<h1>hello <script>code!</script></h1>)
+      assert_equal %(<h1>hello </h1>), xpath_sanitize(html, xpaths: %w(.//script))
     end
-  end
 
-  def test_sanitize_nested_script
-    assert_equal '&lt;script&gt;alert("XSS");&lt;/script&gt;', safe_list_sanitize('<script><script></script>alert("XSS");<script><</script>/</script><script>script></script>', tags: %w(em))
-  end
+    def test_remove_xpaths_removes_all_occurrences_of_xpath
+      html = %(<section><header><script>code!</script></header><p>hello <script>code!</script></p></section>)
+      assert_equal %(<section><header></header><p>hello </p></section>), xpath_sanitize(html, xpaths: %w(.//script))
+    end
 
-  def test_sanitize_nested_script_in_style
-    assert_equal '&lt;script&gt;alert("XSS");&lt;/script&gt;', safe_list_sanitize('<style><script></style>alert("XSS");<style><</style>/</style><style>script></style>', tags: %w(em))
-  end
+    def test_remove_xpaths_called_with_faulty_xpath
+      assert_raises Nokogiri::XML::XPath::SyntaxError do
+        xpath_sanitize("<h1>hello<h1>", xpaths: %w(..faulty_xpath))
+      end
+    end
 
-  class XpathRemovalTestSanitizer < Rails::Html::Sanitizer
-    def sanitize(html, options = {})
-      fragment = Loofah.fragment(html)
-      remove_xpaths(fragment, options[:xpaths]).to_s
+    def test_remove_xpaths_called_with_xpath_string
+      assert_equal "", xpath_sanitize("<a></a>", xpaths: ".//a")
     end
-  end
 
-  def test_remove_xpaths_removes_an_xpath
-    html = %(<h1>hello <script>code!</script></h1>)
-    assert_equal %(<h1>hello </h1>), xpath_sanitize(html, xpaths: %w(.//script))
-  end
+    def test_remove_xpaths_called_with_enumerable_xpaths
+      assert_equal "", xpath_sanitize("<a><span></span></a>", xpaths: %w(.//a .//span))
+    end
 
-  def test_remove_xpaths_removes_all_occurrences_of_xpath
-    html = %(<section><header><script>code!</script></header><p>hello <script>code!</script></p></section>)
-    assert_equal %(<section><header></header><p>hello </p></section>), xpath_sanitize(html, xpaths: %w(.//script))
+    protected
+      def xpath_sanitize(input, options = {})
+        XpathRemovalTestSanitizer.new.sanitize(input, options)
+      end
   end
 
-  def test_remove_xpaths_called_with_faulty_xpath
-    assert_raises Nokogiri::XML::XPath::SyntaxError do
-      xpath_sanitize('<h1>hello<h1>', xpaths: %w(..faulty_xpath))
+  module ModuleUnderTest
+    def module_under_test
+      self.class.instance_variable_get(:@module_under_test)
     end
   end
 
-  def test_remove_xpaths_called_with_xpath_string
-    assert_equal '', xpath_sanitize('<a></a>', xpaths: './/a')
-  end
+  module FullSanitizerTest
+    include ModuleUnderTest
 
-  def test_remove_xpaths_called_with_enumerable_xpaths
-    assert_equal '', xpath_sanitize('<a><span></span></a>', xpaths: %w(.//a .//span))
-  end
+    def test_strip_tags_with_quote
+      input = '<" <img src="trollface.gif" onload="alert(1)"> hi'
+      result = full_sanitize(input)
+      acceptable_results = [
+        # libxml2 >= 2.9.14 and xerces+neko
+        %{&lt;"  hi},
+        # other libxml2
+        %{ hi},
+      ]
 
-  def test_strip_tags_with_quote
-    input = '<" <img src="trollface.gif" onload="alert(1)"> hi'
-    expected = libxml_2_9_14_recovery_lt? ? %{&lt;"  hi} : %{ hi}
-    assert_equal(expected, full_sanitize(input))
-  end
+      assert_includes(acceptable_results, result)
+    end
 
-  def test_strip_invalid_html
-    assert_equal "&lt;&lt;", full_sanitize("<<<bad html")
-  end
+    def test_strip_invalid_html
+      assert_equal "&lt;&lt;", full_sanitize("<<<bad html")
+    end
 
-  def test_strip_nested_tags
-    expected = "Wei&lt;a onclick='alert(document.cookie);'/&gt;rdos"
-    input = "Wei<<a>a onclick='alert(document.cookie);'</a>/>rdos"
-    assert_equal expected, full_sanitize(input)
-  end
+    def test_strip_nested_tags
+      expected = "Wei&lt;a onclick='alert(document.cookie);'/&gt;rdos"
+      input = "Wei<<a>a onclick='alert(document.cookie);'</a>/>rdos"
+      assert_equal expected, full_sanitize(input)
+    end
 
-  def test_strip_tags_multiline
-    expected = %{This is a test.\n\n\n\nIt no longer contains any HTML.\n}
-    input = %{<title>This is <b>a <a href="" target="_blank">test</a></b>.</title>\n\n<!-- it has a comment -->\n\n<p>It no <b>longer <strong>contains <em>any <strike>HTML</strike></em>.</strong></b></p>\n}
+    def test_strip_tags_multiline
+      expected = %{This is a test.\n\n\n\nIt no longer contains any HTML.\n}
+      input = %{<h1>This is <b>a <a href="" target="_blank">test</a></b>.</h1>\n\n<!-- it has a comment -->\n\n<p>It no <b>longer <strong>contains <em>any <strike>HTML</strike></em>.</strong></b></p>\n}
 
-    assert_equal expected, full_sanitize(input)
-  end
+      assert_equal expected, full_sanitize(input)
+    end
 
-  def test_remove_unclosed_tags
-    input = "This is <-- not\n a comment here."
-    expected = libxml_2_9_14_recovery_lt? ? %{This is &lt;-- not\n a comment here.} : %{This is }
-    assert_equal(expected, full_sanitize(input))
-  end
+    def test_remove_unclosed_tags
+      input = "This is <-- not\n a comment here."
+      result = full_sanitize(input)
+      acceptable_results = [
+        # libxml2 >= 2.9.14 and xerces+neko
+        %{This is &lt;-- not\n a comment here.},
+        # other libxml2
+        %{This is },
+      ]
+
+      assert_includes(acceptable_results, result)
+    end
 
-  def test_strip_cdata
-    input = "This has a <![CDATA[<section>]]> here."
-    expected = libxml_2_9_14_recovery_lt_bang? ? %{This has a &lt;![CDATA[]]&gt; here.} : %{This has a ]]&gt; here.}
-    assert_equal(expected, full_sanitize(input))
-  end
+    def test_strip_cdata
+      input = "This has a <![CDATA[<section>]]> here."
+      result = full_sanitize(input)
+      acceptable_results = [
+        # libxml2 = 2.9.14
+        %{This has a &lt;![CDATA[]]&gt; here.},
+        # other libxml2
+        %{This has a ]]&gt; here.},
+        # xerces+neko
+        %{This has a  here.},
+      ]
+
+      assert_includes(acceptable_results, result)
+    end
 
-  def test_strip_unclosed_cdata
-    input = "This has an unclosed <![CDATA[<section>]] here..."
-    expected = libxml_2_9_14_recovery_lt_bang? ? %{This has an unclosed &lt;![CDATA[]] here...} : %{This has an unclosed ]] here...}
-    assert_equal(expected, full_sanitize(input))
-  end
+    def test_strip_blank_string
+      assert_nil full_sanitize(nil)
+      assert_equal "", full_sanitize("")
+      assert_equal "   ", full_sanitize("   ")
+    end
 
-  def test_strip_blank_string
-    assert_nil full_sanitize(nil)
-    assert_equal "", full_sanitize("")
-    assert_equal "   ", full_sanitize("   ")
-  end
+    def test_strip_tags_with_plaintext
+      assert_equal "Don't touch me", full_sanitize("Don't touch me")
+    end
 
-  def test_strip_tags_with_plaintext
-    assert_equal "Don't touch me", full_sanitize("Don't touch me")
-  end
+    def test_strip_tags_with_tags
+      assert_equal "This is a test.", full_sanitize("<p>This <u>is<u> a <a href='test.html'><strong>test</strong></a>.</p>")
+    end
 
-  def test_strip_tags_with_tags
-    assert_equal "This is a test.", full_sanitize("<p>This <u>is<u> a <a href='test.html'><strong>test</strong></a>.</p>")
-  end
+    def test_escape_tags_with_many_open_quotes
+      assert_equal "&lt;&lt;", full_sanitize("<<<bad html>")
+    end
 
-  def test_escape_tags_with_many_open_quotes
-    assert_equal "&lt;&lt;", full_sanitize("<<<bad html>")
-  end
+    def test_strip_tags_with_sentence
+      assert_equal "This is a test.", full_sanitize("This is a test.")
+    end
 
-  def test_strip_tags_with_sentence
-    assert_equal "This is a test.", full_sanitize("This is a test.")
-  end
+    def test_strip_tags_with_comment
+      assert_equal "This has a  here.", full_sanitize("This has a <!-- comment --> here.")
+    end
 
-  def test_strip_tags_with_comment
-    assert_equal "This has a  here.", full_sanitize("This has a <!-- comment --> here.")
-  end
+    def test_strip_tags_with_frozen_string
+      assert_equal "Frozen string with no tags", full_sanitize("Frozen string with no tags")
+    end
 
-  def test_strip_tags_with_frozen_string
-    assert_equal "Frozen string with no tags", full_sanitize("Frozen string with no tags".freeze)
-  end
+    def test_full_sanitize_respect_html_escaping_of_the_given_string
+      assert_equal 'test\r\nstring', full_sanitize('test\r\nstring')
+      assert_equal "&amp;", full_sanitize("&")
+      assert_equal "&amp;", full_sanitize("&amp;")
+      assert_equal "&amp;amp;", full_sanitize("&amp;amp;")
+      assert_equal "omg &lt;script&gt;BOM&lt;/script&gt;", full_sanitize("omg &lt;script&gt;BOM&lt;/script&gt;")
+    end
 
-  def test_full_sanitize_respect_html_escaping_of_the_given_string
-    assert_equal 'test\r\nstring', full_sanitize('test\r\nstring')
-    assert_equal '&amp;', full_sanitize('&')
-    assert_equal '&amp;', full_sanitize('&amp;')
-    assert_equal '&amp;amp;', full_sanitize('&amp;amp;')
-    assert_equal 'omg &lt;script&gt;BOM&lt;/script&gt;', full_sanitize('omg &lt;script&gt;BOM&lt;/script&gt;')
-  end
+    def test_sanitize_ascii_8bit_string
+      full_sanitize("<div><a>hello</a></div>".encode("ASCII-8BIT")).tap do |sanitized|
+        assert_equal "hello", sanitized
+        assert_equal Encoding::UTF_8, sanitized.encoding
+      end
+    end
 
-  def test_strip_links_with_tags_in_tags
-    expected = "&lt;a href='hello'&gt;all <b>day</b> long&lt;/a&gt;"
-    input = "<<a>a href='hello'>all <b>day</b> long<</A>/a>"
-    assert_equal expected, link_sanitize(input)
+    protected
+      def full_sanitize(input, options = {})
+        module_under_test::FullSanitizer.new.sanitize(input, options)
+      end
   end
 
-  def test_strip_links_with_unclosed_tags
-    assert_equal "", link_sanitize("<a<a")
+  class HTML4FullSanitizerTest < Minitest::Test
+    @module_under_test = Rails::HTML4
+    include FullSanitizerTest
   end
 
-  def test_strip_links_with_plaintext
-    assert_equal "Don't touch me", link_sanitize("Don't touch me")
-  end
+  class HTML5FullSanitizerTest < Minitest::Test
+    @module_under_test = Rails::HTML5
+    include FullSanitizerTest
+  end if loofah_html5_support?
 
-  def test_strip_links_with_line_feed_and_uppercase_tag
-    assert_equal "on my mind\nall day long", link_sanitize("<a href='almost'>on my mind</a>\n<A href='almost'>all day long</A>")
-  end
+  module LinkSanitizerTest
+    include ModuleUnderTest
 
-  def test_strip_links_leaves_nonlink_tags
-    assert_equal "My mind\nall <b>day</b> long", link_sanitize("<a href='almost'>My mind</a>\n<A href='almost'>all <b>day</b> long</A>")
-  end
+    def test_strip_links_with_tags_in_tags
+      expected = "&lt;a href='hello'&gt;all <b>day</b> long&lt;/a&gt;"
+      input = "<<a>a href='hello'>all <b>day</b> long<</A>/a>"
+      assert_equal expected, link_sanitize(input)
+    end
 
-  def test_strip_links_with_links
-    assert_equal "0wn3d", link_sanitize("<a href='http://www.rubyonrails.com/'><a href='http://www.rubyonrails.com/' onlclick='steal()'>0wn3d</a></a>")
-  end
+    def test_strip_links_with_unclosed_tags
+      assert_equal "", link_sanitize("<a<a")
+    end
 
-  def test_strip_links_with_linkception
-    assert_equal "Magic", link_sanitize("<a href='http://www.rubyonrails.com/'>Mag<a href='http://www.ruby-lang.org/'>ic")
-  end
+    def test_strip_links_with_plaintext
+      assert_equal "Don't touch me", link_sanitize("Don't touch me")
+    end
 
-  def test_sanitize_form
-    assert_sanitized "<form action=\"/foo/bar\" method=\"post\"><input></form>", ''
-  end
+    def test_strip_links_with_line_feed_and_uppercase_tag
+      assert_equal "on my mind\nall day long", link_sanitize("<a href='almost'>on my mind</a>\n<A href='almost'>all day long</A>")
+    end
 
-  def test_sanitize_plaintext
-    assert_sanitized "<plaintext><span>foo</span></plaintext>", "<span>foo</span>"
-  end
+    def test_strip_links_leaves_nonlink_tags
+      assert_equal "My mind\nall <b>day</b> long", link_sanitize("<a href='almost'>My mind</a>\n<A href='almost'>all <b>day</b> long</A>")
+    end
 
-  def test_sanitize_script
-    assert_sanitized "a b c<script language=\"Javascript\">blah blah blah</script>d e f", "a b cblah blah blahd e f"
-  end
+    def test_strip_links_with_links
+      assert_equal "0wn3d", link_sanitize("<a href='http://www.rubyonrails.com/'><a href='http://www.rubyonrails.com/' onlclick='steal()'>0wn3d</a></a>")
+    end
 
-  def test_sanitize_js_handlers
-    raw = %{onthis="do that" <a href="#" onclick="hello" name="foo" onbogus="remove me">hello</a>}
-    assert_sanitized raw, %{onthis="do that" <a href="#" name="foo">hello</a>}
-  end
+    def test_strip_links_with_linkception
+      assert_equal "Magic", link_sanitize("<a href='http://www.rubyonrails.com/'>Mag<a href='http://www.ruby-lang.org/'>ic")
+    end
+
+    def test_sanitize_ascii_8bit_string
+      link_sanitize("<div><a>hello</a></div>".encode("ASCII-8BIT")).tap do |sanitized|
+        assert_equal "<div>hello</div>", sanitized
+        assert_equal Encoding::UTF_8, sanitized.encoding
+      end
+    end
 
-  def test_sanitize_javascript_href
-    raw = %{href="javascript:bang" <a href="javascript:bang" name="hello">foo</a>, <span href="javascript:bang">bar</span>}
-    assert_sanitized raw, %{href="javascript:bang" <a name="hello">foo</a>, <span>bar</span>}
+    protected
+      def link_sanitize(input, options = {})
+        module_under_test::LinkSanitizer.new.sanitize(input, options)
+      end
   end
 
-  def test_sanitize_image_src
-    raw = %{src="javascript:bang" <img src="javascript:bang" width="5">foo</img>, <span src="javascript:bang">bar</span>}
-    assert_sanitized raw, %{src="javascript:bang" <img width="5">foo</img>, <span>bar</span>}
+  class HTML4LinkSanitizerTest < Minitest::Test
+    @module_under_test = Rails::HTML4
+    include LinkSanitizerTest
   end
 
-  tags = Loofah::HTML5::SafeList::ALLOWED_ELEMENTS - %w(script form)
-  tags.each do |tag_name|
-    define_method "test_should_allow_#{tag_name}_tag" do
-      scope_allowed_tags(tags) do
-        assert_sanitized "start <#{tag_name} title=\"1\" onclick=\"foo\">foo <bad>bar</bad> baz</#{tag_name}> end", %(start <#{tag_name} title="1">foo bar baz</#{tag_name}> end)
-      end
+  class HTML5LinkSanitizerTest < Minitest::Test
+    @module_under_test = Rails::HTML5
+    include LinkSanitizerTest
+  end if loofah_html5_support?
+
+  module SafeListSanitizerTest
+    include ModuleUnderTest
+
+    def test_sanitize_nested_script
+      assert_equal '&lt;script&gt;alert("XSS");&lt;/script&gt;', safe_list_sanitize('<script><script></script>alert("XSS");<script><</script>/</script><script>script></script>', tags: %w(em))
     end
-  end
 
-  def test_should_allow_anchors
-    assert_sanitized %(<a href="foo" onclick="bar"><script>baz</script></a>), %(<a href=\"foo\">baz</a>)
-  end
+    def test_sanitize_nested_script_in_style
+      input = '<style><script></style>alert("XSS");<style><</style>/</style><style>script></style>'
+      result = safe_list_sanitize(input, tags: %w(em))
+      acceptable_results = [
+        # libxml2
+        %{&lt;script&gt;alert("XSS");&lt;/script&gt;},
+        # xerces+neko. unavoidable double-escaping, see loofah/docs/2022-10-decision-on-cdata-nodes.md
+        %{&amp;lt;script&amp;gt;alert(\"XSS\");&amp;lt;&amp;lt;/style&amp;gt;/script&amp;gt;},
+      ]
+
+      assert_includes(acceptable_results, result)
+    end
 
-  def test_video_poster_sanitization
-    scope_allowed_tags(%w(video)) do
-      scope_allowed_attributes %w(src poster) do
-        assert_sanitized %(<video src="videofile.ogg" autoplay  poster="posterimage.jpg"></video>), %(<video src="videofile.ogg" poster="posterimage.jpg"></video>)
-        assert_sanitized %(<video src="videofile.ogg" poster=javascript:alert(1)></video>), %(<video src="videofile.ogg"></video>)
-      end
+    def test_strip_unclosed_cdata
+      input = "This has an unclosed <![CDATA[<section>]] here..."
+
+      result = safe_list_sanitize(input)
+
+      acceptable_results = [
+        # libxml2 = 2.9.14
+        %{This has an unclosed &lt;![CDATA[]] here...},
+        # other libxml2
+        %{This has an unclosed ]] here...},
+        # xerces+neko
+        %{This has an unclosed }
+      ]
+
+      assert_includes(acceptable_results, result)
     end
-  end
 
-  # RFC 3986, sec 4.2
-  def test_allow_colons_in_path_component
-    assert_sanitized "<a href=\"./this:that\">foo</a>"
-  end
+    def test_sanitize_form
+      assert_sanitized "<form action=\"/foo/bar\" method=\"post\"><input></form>", ""
+    end
 
-  %w(src width height alt).each do |img_attr|
-    define_method "test_should_allow_image_#{img_attr}_attribute" do
-      assert_sanitized %(<img #{img_attr}="foo" onclick="bar" />), %(<img #{img_attr}="foo" />)
+    def test_sanitize_plaintext
+      # note that the `plaintext` tag has been deprecated since HTML 2
+      # https://developer.mozilla.org/en-US/docs/Web/HTML/Element/plaintext
+      input = "<plaintext><span>foo</span></plaintext>"
+      result = safe_list_sanitize(input)
+      acceptable_results = [
+        # libxml2
+        "<span>foo</span>",
+        # xerces+nekohtml-unit
+        "&lt;span&gt;foo&lt;/span&gt;&lt;/plaintext&gt;",
+        # xerces+cyberneko
+        "&lt;span&gt;foo&lt;/span&gt;"
+      ]
+
+      assert_includes(acceptable_results, result)
     end
-  end
 
-  def test_should_handle_non_html
-    assert_sanitized 'abc'
-  end
+    def test_sanitize_script
+      assert_sanitized "a b c<script language=\"Javascript\">blah blah blah</script>d e f", "a b cblah blah blahd e f"
+    end
 
-  def test_should_handle_blank_text
-    [nil, '', '   '].each { |blank| assert_sanitized blank }
-  end
+    def test_sanitize_js_handlers
+      raw = %{onthis="do that" <a href="#" onclick="hello" name="foo" onbogus="remove me">hello</a>}
+      assert_sanitized raw, %{onthis="do that" <a href="#" name="foo">hello</a>}
+    end
 
-  def test_setting_allowed_tags_affects_sanitization
-    scope_allowed_tags %w(u) do |sanitizer|
-      assert_equal '<u></u>', sanitizer.sanitize('<a><u></u></a>')
+    def test_sanitize_javascript_href
+      raw = %{href="javascript:bang" <a href="javascript:bang" name="hello">foo</a>, <span href="javascript:bang">bar</span>}
+      assert_sanitized raw, %{href="javascript:bang" <a name="hello">foo</a>, <span>bar</span>}
     end
-  end
 
-  def test_setting_allowed_attributes_affects_sanitization
-    scope_allowed_attributes %w(foo) do |sanitizer|
-      input = '<a foo="hello" bar="world"></a>'
-      assert_equal '<a foo="hello"></a>', sanitizer.sanitize(input)
+    def test_sanitize_image_src
+      raw = %{src="javascript:bang" <img src="javascript:bang" width="5">foo</img>, <span src="javascript:bang">bar</span>}
+      assert_sanitized raw, %{src="javascript:bang" <img width="5">foo, <span>bar</span>}
     end
-  end
 
-  def test_custom_tags_overrides_allowed_tags
-    scope_allowed_tags %(u) do |sanitizer|
-      input = '<a><u></u></a>'
-      assert_equal '<a></a>', sanitizer.sanitize(input, tags: %w(a))
+    def test_should_allow_anchors
+      assert_sanitized %(<a href="foo" onclick="bar"><script>baz</script></a>), %(<a href=\"foo\">baz</a>)
     end
-  end
 
-  def test_custom_attributes_overrides_allowed_attributes
-    scope_allowed_attributes %(foo) do |sanitizer|
-      input = '<a foo="hello" bar="world"></a>'
-      assert_equal '<a bar="world"></a>', sanitizer.sanitize(input, attributes: %w(bar))
+    def test_video_poster_sanitization
+      scope_allowed_tags(%w(video)) do
+        scope_allowed_attributes %w(src poster) do
+          expected = if RUBY_PLATFORM == "java"
+            # xerces+nekohtml alphabetizes the attributes! FML.
+            %(<video poster="posterimage.jpg" src="videofile.ogg"></video>)
+          else
+            %(<video src="videofile.ogg" poster="posterimage.jpg"></video>)
+          end
+          assert_sanitized(
+            %(<video src="videofile.ogg" autoplay  poster="posterimage.jpg"></video>),
+            expected,
+          )
+          assert_sanitized(
+            %(<video src="videofile.ogg" poster=javascript:alert(1)></video>),
+            %(<video src="videofile.ogg"></video>),
+          )
+        end
+      end
     end
-  end
 
-  def test_should_allow_custom_tags
-    text = "<u>foo</u>"
-    assert_equal text, safe_list_sanitize(text, tags: %w(u))
-  end
+    # RFC 3986, sec 4.2
+    def test_allow_colons_in_path_component
+      assert_sanitized "<a href=\"./this:that\">foo</a>"
+    end
 
-  def test_should_allow_only_custom_tags
-    text = "<u>foo</u> with <i>bar</i>"
-    assert_equal "<u>foo</u> with bar", safe_list_sanitize(text, tags: %w(u))
-  end
+    %w(src width height alt).each do |img_attr|
+      define_method "test_should_allow_image_#{img_attr}_attribute" do
+        assert_sanitized %(<img #{img_attr}="foo" onclick="bar" />), %(<img #{img_attr}="foo">)
+      end
+    end
 
-  def test_should_allow_custom_tags_with_attributes
-    text = %(<blockquote cite="http://example.com/">foo</blockquote>)
-    assert_equal text, safe_list_sanitize(text)
-  end
+    def test_lang_and_xml_lang
+      # https://html.spec.whatwg.org/multipage/dom.html#the-lang-and-xml:lang-attributes
+      #
+      # 3.2.6.2 The lang and xml:lang attributes
+      #
+      # ... Authors must not use the lang attribute in the XML namespace on HTML elements in HTML
+      # documents. To ease migration to and from XML, authors may specify an attribute in no namespace
+      # with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents,
+      # but such attributes must only be specified if a lang attribute in no namespace is also
+      # specified, and both attributes must have the same value when compared in an ASCII
+      # case-insensitive manner.
+      input = expected = "<div lang=\"en\" xml:lang=\"en\">foo</div>"
+      assert_sanitized(input, expected)
+    end
 
-  def test_should_allow_custom_tags_with_custom_attributes
-    text = %(<blockquote foo="bar">Lorem ipsum</blockquote>)
-    assert_equal text, safe_list_sanitize(text, attributes: ['foo'])
-  end
+    def test_should_handle_non_html
+      assert_sanitized "abc"
+    end
 
-  def test_scrub_style_if_style_attribute_option_is_passed
-    input = '<p style="color: #000; background-image: url(http://www.ragingplatypus.com/i/cam-full.jpg);"></p>'
-    actual = safe_list_sanitize(input, attributes: %w(style))
-    assert_includes(['<p style="color: #000;"></p>', '<p style="color:#000;"></p>'], actual)
-  end
+    def test_should_handle_blank_text
+      assert_nil(safe_list_sanitize(nil))
+      assert_equal("", safe_list_sanitize(""))
+      assert_equal("   ", safe_list_sanitize("   "))
+    end
 
-  def test_should_raise_argument_error_if_tags_is_not_enumerable
-    assert_raises ArgumentError do
-      safe_list_sanitize('<a>some html</a>', tags: 'foo')
+    def test_setting_allowed_tags_affects_sanitization
+      scope_allowed_tags %w(u) do |sanitizer|
+        assert_equal "<u></u>", sanitizer.sanitize("<a><u></u></a>")
+      end
     end
-  end
 
-  def test_should_raise_argument_error_if_attributes_is_not_enumerable
-    assert_raises ArgumentError do
-      safe_list_sanitize('<a>some html</a>', attributes: 'foo')
+    def test_setting_allowed_attributes_affects_sanitization
+      scope_allowed_attributes %w(foo) do |sanitizer|
+        input = '<a foo="hello" bar="world"></a>'
+        assert_equal '<a foo="hello"></a>', sanitizer.sanitize(input)
+      end
     end
-  end
 
-  def test_should_not_accept_non_loofah_inheriting_scrubber
-    scrubber = Object.new
-    def scrubber.scrub(node); node.name = 'h1'; end
+    def test_custom_tags_overrides_allowed_tags
+      scope_allowed_tags %(u) do |sanitizer|
+        input = "<a><u></u></a>"
+        assert_equal "<a></a>", sanitizer.sanitize(input, tags: %w(a))
+      end
+    end
 
-    assert_raises Loofah::ScrubberNotFound do
-      safe_list_sanitize('<a>some html</a>', scrubber: scrubber)
+    def test_custom_attributes_overrides_allowed_attributes
+      scope_allowed_attributes %(foo) do |sanitizer|
+        input = '<a foo="hello" bar="world"></a>'
+        assert_equal '<a bar="world"></a>', sanitizer.sanitize(input, attributes: %w(bar))
+      end
     end
-  end
 
-  def test_should_accept_loofah_inheriting_scrubber
-    scrubber = Loofah::Scrubber.new
-    def scrubber.scrub(node); node.name = 'h1'; end
+    def test_should_allow_prune
+      sanitizer = module_under_test::SafeListSanitizer.new(prune: true)
+      text = "<u>leave me <b>now</b></u>"
+      assert_equal "<u>leave me </u>", sanitizer.sanitize(text, tags: %w(u))
+    end
 
-    html = "<script>hello!</script>"
-    assert_equal "<h1>hello!</h1>", safe_list_sanitize(html, scrubber: scrubber)
-  end
+    def test_should_allow_custom_tags
+      text = "<u>foo</u>"
+      assert_equal text, safe_list_sanitize(text, tags: %w(u))
+    end
 
-  def test_should_accept_loofah_scrubber_that_wraps_a_block
-    scrubber = Loofah::Scrubber.new { |node| node.name = 'h1' }
-    html = "<script>hello!</script>"
-    assert_equal "<h1>hello!</h1>", safe_list_sanitize(html, scrubber: scrubber)
-  end
+    def test_should_allow_only_custom_tags
+      text = "<u>foo</u> with <i>bar</i>"
+      assert_equal "<u>foo</u> with bar", safe_list_sanitize(text, tags: %w(u))
+    end
 
-  def test_custom_scrubber_takes_precedence_over_other_options
-    scrubber = Loofah::Scrubber.new { |node| node.name = 'h1' }
-    html = "<script>hello!</script>"
-    assert_equal "<h1>hello!</h1>", safe_list_sanitize(html, scrubber: scrubber, tags: ['foo'])
-  end
+    def test_should_allow_custom_tags_with_attributes
+      text = %(<blockquote cite="http://example.com/">foo</blockquote>)
+      assert_equal text, safe_list_sanitize(text)
+    end
 
-  [%w(img src), %w(a href)].each do |(tag, attr)|
-    define_method "test_should_strip_#{attr}_attribute_in_#{tag}_with_bad_protocols" do
-      assert_sanitized %(<#{tag} #{attr}="javascript:bang" title="1">boo</#{tag}>), %(<#{tag} title="1">boo</#{tag}>)
+    def test_should_allow_custom_tags_with_custom_attributes
+      text = %(<blockquote foo="bar">Lorem ipsum</blockquote>)
+      assert_equal text, safe_list_sanitize(text, attributes: ["foo"])
     end
-  end
 
-  def test_should_block_script_tag
-    assert_sanitized %(<SCRIPT\nSRC=http://ha.ckers.org/xss.js></SCRIPT>), ""
-  end
+    def test_scrub_style_if_style_attribute_option_is_passed
+      input = '<p style="color: #000; background-image: url(http://www.ragingplatypus.com/i/cam-full.jpg);"></p>'
+      actual = safe_list_sanitize(input, attributes: %w(style))
 
-  def test_should_not_fall_for_xss_image_hack_with_uppercase_tags
-    assert_sanitized %(<IMG """><SCRIPT>alert("XSS")</SCRIPT>">), %(<img>alert("XSS")"&gt;)
-  end
+      assert_includes(['<p style="color: #000;"></p>', '<p style="color:#000;"></p>'], actual)
+    end
 
-  [%(<IMG SRC="javascript:alert('XSS');">),
-   %(<IMG SRC=javascript:alert('XSS')>),
-   %(<IMG SRC=JaVaScRiPt:alert('XSS')>),
-   %(<IMG SRC=javascript:alert(&quot;XSS&quot;)>),
-   %(<IMG SRC=javascript:alert(String.fromCharCode(88,83,83))>),
-   %(<IMG SRC=&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#88;&#83;&#83;&#39;&#41;>),
-   %(<IMG SRC=&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041>),
-   %(<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>),
-   %(<IMG SRC="jav\tascript:alert('XSS');">),
-   %(<IMG SRC="jav&#x09;ascript:alert('XSS');">),
-   %(<IMG SRC="jav&#x0A;ascript:alert('XSS');">),
-   %(<IMG SRC="jav&#x0D;ascript:alert('XSS');">),
-   %(<IMG SRC=" &#14;  javascript:alert('XSS');">),
-   %(<IMG SRC="javascript&#x3a;alert('XSS');">),
-   %(<IMG SRC=`javascript:alert("RSnake says, 'XSS'")`>)].each do |img_hack|
-    define_method "test_should_not_fall_for_xss_image_hack_#{img_hack}" do
-      assert_sanitized img_hack, "<img>"
+    def test_should_raise_argument_error_if_tags_is_not_enumerable
+      assert_raises ArgumentError do
+        safe_list_sanitize("<a>some html</a>", tags: "foo")
+      end
     end
-  end
 
-  def test_should_sanitize_tag_broken_up_by_null
-    assert_sanitized %(<SCR\0IPT>alert(\"XSS\")</SCR\0IPT>), ""
-  end
+    def test_should_raise_argument_error_if_attributes_is_not_enumerable
+      assert_raises ArgumentError do
+        safe_list_sanitize("<a>some html</a>", attributes: "foo")
+      end
+    end
 
-  def test_should_sanitize_invalid_script_tag
-    assert_sanitized %(<SCRIPT/XSS SRC="http://ha.ckers.org/xss.js"></SCRIPT>), ""
-  end
+    def test_should_not_accept_non_loofah_inheriting_scrubber
+      scrubber = Object.new
+      def scrubber.scrub(node); node.name = "h1"; end
 
-  def test_should_sanitize_script_tag_with_multiple_open_brackets
-    assert_sanitized %(<<SCRIPT>alert("XSS");//<</SCRIPT>), "&lt;alert(\"XSS\");//&lt;"
-    assert_sanitized %(<iframe src=http://ha.ckers.org/scriptlet.html\n<a), ""
-  end
+      assert_raises Loofah::ScrubberNotFound do
+        safe_list_sanitize("<a>some html</a>", scrubber: scrubber)
+      end
+    end
 
-  def test_should_sanitize_unclosed_script
-    assert_sanitized %(<SCRIPT SRC=http://ha.ckers.org/xss.js?<B>), ""
-  end
+    def test_should_accept_loofah_inheriting_scrubber
+      scrubber = Loofah::Scrubber.new
+      def scrubber.scrub(node); node.replace("<h1>#{node.inner_html}</h1>"); end
 
-  def test_should_sanitize_half_open_scripts
-    assert_sanitized %(<IMG SRC="javascript:alert('XSS')"), "<img>"
-  end
+      html = "<script>hello!</script>"
+      assert_equal "<h1>hello!</h1>", safe_list_sanitize(html, scrubber: scrubber)
+    end
 
-  def test_should_not_fall_for_ridiculous_hack
-    img_hack = %(<IMG\nSRC\n=\n"\nj\na\nv\na\ns\nc\nr\ni\np\nt\n:\na\nl\ne\nr\nt\n(\n'\nX\nS\nS\n'\n)\n"\n>)
-    assert_sanitized img_hack, "<img>"
-  end
+    def test_should_accept_loofah_scrubber_that_wraps_a_block
+      scrubber = Loofah::Scrubber.new { |node| node.replace("<h1>#{node.inner_html}</h1>") }
+      html = "<script>hello!</script>"
+      assert_equal "<h1>hello!</h1>", safe_list_sanitize(html, scrubber: scrubber)
+    end
 
-  def test_should_sanitize_attributes
-    assert_sanitized %(<SPAN title="'><script>alert()</script>">blah</SPAN>), %(<span title="#{CGI.escapeHTML "'><script>alert()</script>"}">blah</span>)
-  end
+    def test_custom_scrubber_takes_precedence_over_other_options
+      scrubber = Loofah::Scrubber.new { |node| node.replace("<h1>#{node.inner_html}</h1>") }
+      html = "<script>hello!</script>"
+      assert_equal "<h1>hello!</h1>", safe_list_sanitize(html, scrubber: scrubber, tags: ["foo"])
+    end
 
-  def test_should_sanitize_illegal_style_properties
-    raw      = %(display:block; position:absolute; left:0; top:0; width:100%; height:100%; z-index:1; background-color:black; background-image:url(http://www.ragingplatypus.com/i/cam-full.jpg); background-x:center; background-y:center; background-repeat:repeat;)
-    expected = %(display:block;width:100%;height:100%;background-color:black;background-x:center;background-y:center;)
-    assert_equal expected, sanitize_css(raw)
-  end
+    def test_should_strip_src_attribute_in_img_with_bad_protocols
+      assert_sanitized %(<img src="javascript:bang" title="1">), %(<img title="1">)
+    end
 
-  def test_should_sanitize_with_trailing_space
-    raw = "display:block; "
-    expected = "display:block;"
-    assert_equal expected, sanitize_css(raw)
-  end
+    def test_should_strip_href_attribute_in_a_with_bad_protocols
+      assert_sanitized %(<a href="javascript:bang" title="1">boo</a>), %(<a title="1">boo</a>)
+    end
 
-  def test_should_sanitize_xul_style_attributes
-    raw = %(-moz-binding:url('http://ha.ckers.org/xssmoz.xml#xss'))
-    assert_equal '', sanitize_css(raw)
-  end
+    def test_should_block_script_tag
+      assert_sanitized %(<SCRIPT\nSRC=http://ha.ckers.org/xss.js></SCRIPT>), ""
+    end
 
-  def test_should_sanitize_invalid_tag_names
-    assert_sanitized(%(a b c<script/XSS src="http://ha.ckers.org/xss.js"></script>d e f), "a b cd e f")
-  end
+    def test_should_not_fall_for_xss_image_hack_with_uppercase_tags
+      assert_sanitized %(<IMG """><SCRIPT>alert("XSS")</SCRIPT>">), %(<img>alert("XSS")"&gt;)
+    end
 
-  def test_should_sanitize_non_alpha_and_non_digit_characters_in_tags
-    assert_sanitized('<a onclick!#$%&()*~+-_.,:;?@[/|\]^`=alert("XSS")>foo</a>', "<a>foo</a>")
-  end
+    [%(<IMG SRC="javascript:alert('XSS');">),
+     %(<IMG SRC=javascript:alert('XSS')>),
+     %(<IMG SRC=JaVaScRiPt:alert('XSS')>),
+     %(<IMG SRC=javascript:alert(&quot;XSS&quot;)>),
+     %(<IMG SRC=javascript:alert(String.fromCharCode(88,83,83))>),
+     %(<IMG SRC=&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#88;&#83;&#83;&#39;&#41;>),
+     %(<IMG SRC=&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041>),
+     %(<IMG SRC=&#x6A&#x61&#x76&#x61&#x73&#x63&#x72&#x69&#x70&#x74&#x3A&#x61&#x6C&#x65&#x72&#x74&#x28&#x27&#x58&#x53&#x53&#x27&#x29>),
+     %(<IMG SRC="jav\tascript:alert('XSS');">),
+     %(<IMG SRC="jav&#x09;ascript:alert('XSS');">),
+     %(<IMG SRC="jav&#x0A;ascript:alert('XSS');">),
+     %(<IMG SRC="jav&#x0D;ascript:alert('XSS');">),
+     %(<IMG SRC=" &#14;  javascript:alert('XSS');">),
+     %(<IMG SRC="javascript&#x3a;alert('XSS');">),
+     %(<IMG SRC=`javascript:alert("RSnake says, 'XSS'")`>)].each do |img_hack|
+      define_method "test_should_not_fall_for_xss_image_hack_#{img_hack}" do
+        assert_sanitized img_hack, "<img>"
+      end
+    end
 
-  def test_should_sanitize_invalid_tag_names_in_single_tags
-    assert_sanitized('<img/src="http://ha.ckers.org/xss.js"/>', "<img />")
-  end
+    def test_should_sanitize_tag_broken_up_by_null
+      input = %(<SCR\0IPT>alert(\"XSS\")</SCR\0IPT>)
+      result = safe_list_sanitize(input)
+      acceptable_results = [
+        # libxml2
+        "",
+        # xerces+neko
+        'alert("XSS")',
+      ]
+
+      assert_includes(acceptable_results, result)
+    end
 
-  def test_should_sanitize_img_dynsrc_lowsrc
-    assert_sanitized(%(<img lowsrc="javascript:alert('XSS')" />), "<img />")
-  end
+    def test_should_sanitize_invalid_script_tag
+      assert_sanitized %(<SCRIPT/XSS SRC="http://ha.ckers.org/xss.js"></SCRIPT>), ""
+    end
 
-  def test_should_sanitize_div_background_image_unicode_encoded
-    [
-      convert_to_css_hex("url(javascript:alert(1))", false),
-      convert_to_css_hex("url(javascript:alert(1))", true),
-      convert_to_css_hex("url(https://example.com)", false),
-      convert_to_css_hex("url(https://example.com)", true),
-    ].each do |propval|
-      raw = "background-image:" + propval
-      assert_empty(sanitize_css(raw))
+    def test_should_sanitize_script_tag_with_multiple_open_brackets
+      assert_sanitized %(<<SCRIPT>alert("XSS");//<</SCRIPT>), "&lt;alert(\"XSS\");//&lt;"
     end
-  end
 
-  def test_should_allow_div_background_image_unicode_encoded_safe_functions
-    [
-      convert_to_css_hex("rgb(255,0,0)", false),
-      convert_to_css_hex("rgb(255,0,0)", true),
-    ].each do |propval|
-      raw = "background-image:" + propval
-      assert_includes(sanitize_css(raw), "background-image")
+    def test_should_sanitize_script_tag_with_multiple_open_brackets_2
+      input = %(<iframe src=http://ha.ckers.org/scriptlet.html\n<a)
+      result = safe_list_sanitize(input)
+      acceptable_results = [
+        # libxml2
+        "",
+        # xerces+neko
+        "&lt;a",
+      ]
+
+      assert_includes(acceptable_results, result)
     end
-  end
 
-  def test_should_sanitize_div_style_expression
-    raw = %(width: expression(alert('XSS'));)
-    assert_equal '', sanitize_css(raw)
-  end
+    def test_should_sanitize_unclosed_script
+      assert_sanitized %(<SCRIPT SRC=http://ha.ckers.org/xss.js?<B>), ""
+    end
 
-  def test_should_sanitize_across_newlines
-    raw = %(\nwidth:\nexpression(alert('XSS'));\n)
-    assert_equal '', sanitize_css(raw)
-  end
+    def test_should_sanitize_half_open_scripts
+      input = %(<IMG SRC="javascript:alert('XSS')")
+      result = safe_list_sanitize(input)
+      acceptable_results = [
+        # libxml2
+        "<img>",
+        # libgumbo
+        "",
+      ]
+
+      assert_includes(acceptable_results, result)
+    end
 
-  def test_should_sanitize_img_vbscript
-    assert_sanitized %(<img src='vbscript:msgbox("XSS")' />), '<img />'
-  end
+    def test_should_not_fall_for_ridiculous_hack
+      img_hack = %(<IMG\nSRC\n=\n"\nj\na\nv\na\ns\nc\nr\ni\np\nt\n:\na\nl\ne\nr\nt\n(\n'\nX\nS\nS\n'\n)\n"\n>)
+      assert_sanitized img_hack, "<img>"
+    end
 
-  def test_should_sanitize_cdata_section
-    input = "<![CDATA[<span>section</span>]]>"
-    expected = libxml_2_9_14_recovery_lt_bang? ? %{&lt;![CDATA[<span>section</span>]]&gt;} : %{section]]&gt;}
-    assert_sanitized(input, expected)
-  end
+    def test_should_sanitize_attributes
+      input = %(<SPAN title="'><script>alert()</script>">blah</SPAN>)
+      result = safe_list_sanitize(input)
+      acceptable_results = [
+        # libxml2
+        %(<span title="'&gt;&lt;script&gt;alert()&lt;/script&gt;">blah</span>),
+        # libgumbo
+        # this looks scary, but it's fine. for a more detailed analysis check out:
+        # https://github.com/discourse/discourse/pull/21522#issuecomment-1545697968
+        %(<span title="'><script>alert()</script>">blah</span>)
+      ]
+
+      assert_includes(acceptable_results, result)
+    end
 
-  def test_should_sanitize_unterminated_cdata_section
-    input = "<![CDATA[<span>neverending..."
-    expected = libxml_2_9_14_recovery_lt_bang? ? %{&lt;![CDATA[<span>neverending...</span>} : %{neverending...}
-    assert_sanitized(input, expected)
-  end
+    def test_should_sanitize_invalid_tag_names
+      assert_sanitized(%(a b c<script/XSS src="http://ha.ckers.org/xss.js"></script>d e f), "a b cd e f")
+    end
 
-  def test_should_not_mangle_urls_with_ampersand
-     assert_sanitized %{<a href=\"http://www.domain.com?var1=1&amp;var2=2\">my link</a>}
-  end
+    def test_should_sanitize_non_alpha_and_non_digit_characters_in_tags
+      assert_sanitized('<a onclick!#$%&()*~+-_.,:;?@[/|\]^`=alert("XSS")>foo</a>', "<a>foo</a>")
+    end
 
-  def test_should_sanitize_neverending_attribute
-    assert_sanitized "<span class=\"\\", "<span class=\"\\\">"
-  end
+    def test_should_sanitize_invalid_tag_names_in_single_tags
+      input = %(<img/src="http://ha.ckers.org/xss.js"/>)
+      result = safe_list_sanitize(input)
+      acceptable_results = [
+        # libxml2
+        "<img>",
+        # libgumbo
+        %(<img src="http://ha.ckers.org/xss.js">),
+      ]
+
+      assert_includes(acceptable_results, result)
+    end
 
-  [
-    %(<a href="javascript&#x3a;alert('XSS');">),
-    %(<a href="javascript&#x003a;alert('XSS');">),
-    %(<a href="javascript&#x3A;alert('XSS');">),
-    %(<a href="javascript&#x003A;alert('XSS');">)
-  ].each_with_index do |enc_hack, i|
-    define_method "test_x03a_handling_#{i+1}" do
-      assert_sanitized enc_hack, "<a>"
+    def test_should_sanitize_img_dynsrc_lowsrc
+      assert_sanitized(%(<img lowsrc="javascript:alert('XSS')" />), "<img>")
     end
-  end
 
-  def test_x03a_legitimate
-    assert_sanitized %(<a href="http&#x3a;//legit">), %(<a href="http://legit">)
-    assert_sanitized %(<a href="http&#x3A;//legit">), %(<a href="http://legit">)
-  end
+    def test_should_sanitize_img_vbscript
+      assert_sanitized %(<img src='vbscript:msgbox("XSS")' />), "<img>"
+    end
 
-  def test_sanitize_ascii_8bit_string
-    safe_list_sanitize('<a>hello</a>'.encode('ASCII-8BIT')).tap do |sanitized|
-      assert_equal '<a>hello</a>', sanitized
-      assert_equal Encoding::UTF_8, sanitized.encoding
+    def test_should_sanitize_cdata_section
+      input = "<![CDATA[<span>section</span>]]>"
+      result = safe_list_sanitize(input)
+      acceptable_results = [
+        # libxml2 = 2.9.14
+        %{&lt;![CDATA[<span>section</span>]]&gt;},
+        # other libxml2
+        %{section]]&gt;},
+        # xerces+neko
+        "",
+      ]
+
+      assert_includes(acceptable_results, result)
     end
-  end
 
-  def test_sanitize_data_attributes
-    assert_sanitized %(<a href="/blah" data-method="post">foo</a>), %(<a href="/blah">foo</a>)
-    assert_sanitized %(<a data-remote="true" data-type="script" data-method="get" data-cross-domain="true" href="attack.js">Launch the missiles</a>), %(<a href="attack.js">Launch the missiles</a>)
-  end
+    def test_should_sanitize_unterminated_cdata_section
+      input = "<![CDATA[<span>neverending..."
+      result = safe_list_sanitize(input)
 
-  def test_allow_data_attribute_if_requested
-    text = %(<a data-foo="foo">foo</a>)
-    assert_equal %(<a data-foo="foo">foo</a>), safe_list_sanitize(text, attributes: ['data-foo'])
-  end
+      acceptable_results = [
+        # libxml2 = 2.9.14
+        %{&lt;![CDATA[<span>neverending...</span>},
+        # other libxml2
+        %{neverending...},
+        # xerces+neko
+        ""
+      ]
 
-  def test_uri_escaping_of_href_attr_in_a_tag_in_safe_list_sanitizer
-    skip if RUBY_VERSION < "2.3"
+      assert_includes(acceptable_results, result)
+    end
 
-    html = %{<a href='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
+    def test_should_not_mangle_urls_with_ampersand
+      assert_sanitized %{<a href=\"http://www.domain.com?var1=1&amp;var2=2\">my link</a>}
+    end
 
-    text = safe_list_sanitize(html)
+    def test_should_sanitize_neverending_attribute
+      # note that assert_dom_equal chokes in this case! so avoid using assert_sanitized
+      assert_equal("<span class=\"\\\"></span>", safe_list_sanitize("<span class=\"\\\">"))
+    end
 
-    acceptable_results = [
-      # nokogiri w/vendored+patched libxml2
-      %{<a href="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
-      # nokogiri w/ system libxml2
-      %{<a href="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
-    ]
-    assert_includes(acceptable_results, text)
-  end
+    [
+      %(<a href="javascript&#x3a;alert('XSS');">),
+      %(<a href="javascript&#x003a;alert('XSS');">),
+      %(<a href="javascript&#x3A;alert('XSS');">),
+      %(<a href="javascript&#x003A;alert('XSS');">)
+    ].each_with_index do |enc_hack, i|
+      define_method "test_x03a_handling_#{i + 1}" do
+        assert_sanitized enc_hack, "<a></a>"
+      end
+    end
 
-  def test_uri_escaping_of_src_attr_in_a_tag_in_safe_list_sanitizer
-    skip if RUBY_VERSION < "2.3"
+    def test_x03a_legitimate
+      assert_sanitized %(<a href="http&#x3a;//legit">asdf</a>), %(<a href="http://legit">asdf</a>)
+      assert_sanitized %(<a href="http&#x3A;//legit">asdf</a>), %(<a href="http://legit">asdf</a>)
+    end
 
-    html = %{<a src='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
+    def test_sanitize_ascii_8bit_string
+      safe_list_sanitize("<div><a>hello</a></div>".encode("ASCII-8BIT")).tap do |sanitized|
+        assert_equal "<div><a>hello</a></div>", sanitized
+        assert_equal Encoding::UTF_8, sanitized.encoding
+      end
+    end
 
-    text = safe_list_sanitize(html)
+    def test_sanitize_data_attributes
+      assert_sanitized %(<a href="/blah" data-method="post">foo</a>), %(<a href="/blah">foo</a>)
+      assert_sanitized %(<a data-remote="true" data-type="script" data-method="get" data-cross-domain="true" href="attack.js">Launch the missiles</a>), %(<a href="attack.js">Launch the missiles</a>)
+    end
 
-    acceptable_results = [
-      # nokogiri w/vendored+patched libxml2
-      %{<a src="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
-      # nokogiri w/system libxml2
-      %{<a src="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
-    ]
-    assert_includes(acceptable_results, text)
-  end
+    def test_allow_data_attribute_if_requested
+      text = %(<a data-foo="foo">foo</a>)
+      assert_equal %(<a data-foo="foo">foo</a>), safe_list_sanitize(text, attributes: ["data-foo"])
+    end
 
-  def test_uri_escaping_of_name_attr_in_a_tag_in_safe_list_sanitizer
-    skip if RUBY_VERSION < "2.3"
+    # https://developer.mozilla.org/en-US/docs/Glossary/Void_element
+    VOID_ELEMENTS = %w[area base br col embed hr img input keygen link meta param source track wbr]
+
+    %w(strong em b i p code pre tt samp kbd var sub
+       sup dfn cite big small address hr br div span h1 h2 h3 h4 h5 h6 ul ol li dl dt dd abbr
+       acronym a img blockquote del ins time).each do |tag_name|
+      define_method "test_default_safelist_should_allow_#{tag_name}" do
+        if VOID_ELEMENTS.include?(tag_name)
+          assert_sanitized("<#{tag_name}>")
+        else
+          assert_sanitized("<#{tag_name}>foo</#{tag_name}>")
+        end
+      end
+    end
 
-    html = %{<a name='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
+    def test_datetime_attribute
+      assert_sanitized("<time datetime=\"2023-01-01\">Today</time>")
+    end
 
-    text = safe_list_sanitize(html)
+    def test_abbr_attribute
+      scope_allowed_tags(%w(table tr th td)) do
+        assert_sanitized(%(<table><tr><td abbr="UK">United Kingdom</td></tr></table>))
+      end
+    end
 
-    acceptable_results = [
-      # nokogiri w/vendored+patched libxml2
-      %{<a name="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
-      # nokogiri w/system libxml2
-      %{<a name="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
-    ]
-    assert_includes(acceptable_results, text)
-  end
+    def test_uri_escaping_of_href_attr_in_a_tag_in_safe_list_sanitizer
+      skip if RUBY_VERSION < "2.3"
 
-  def test_uri_escaping_of_name_action_in_a_tag_in_safe_list_sanitizer
-    skip if RUBY_VERSION < "2.3"
+      html = %{<a href='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
 
-    html = %{<a action='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
+      text = safe_list_sanitize(html)
 
-    text = safe_list_sanitize(html, attributes: ['action'])
+      acceptable_results = [
+        # nokogiri's vendored+patched libxml2 (0002-Update-entities-to-remove-handling-of-ssi.patch)
+        %{<a href="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
+        # system libxml2
+        %{<a href="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
+        # xerces+neko
+        %{<a href="examp&lt;!--%22 unsafeattr=foo()&gt;--&gt;le.com">test</a>}
+      ]
 
-    acceptable_results = [
-      # nokogiri w/vendored+patched libxml2
-      %{<a action="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
-      # nokogiri w/system libxml2
-      %{<a action="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
-    ]
-    assert_includes(acceptable_results, text)
-  end
+      assert_includes(acceptable_results, text)
+    end
 
-  def test_exclude_node_type_processing_instructions
-    assert_equal("<div>text</div><b>text</b>", safe_list_sanitize("<div>text</div><?div content><b>text</b>"))
-  end
+    def test_uri_escaping_of_src_attr_in_a_tag_in_safe_list_sanitizer
+      skip if RUBY_VERSION < "2.3"
 
-  def test_exclude_node_type_comment
-    assert_equal("<div>text</div><b>text</b>", safe_list_sanitize("<div>text</div><!-- comment --><b>text</b>"))
-  end
+      html = %{<a src='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
 
-  %w[text/plain text/css image/png image/gif image/jpeg].each do |mediatype|
-    define_method "test_mediatype_#{mediatype}_allowed" do
-      input = %Q(<img src="data:#{mediatype};base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
-      expected = input
-      actual = safe_list_sanitize(input)
-      assert_equal(expected, actual)
+      text = safe_list_sanitize(html)
 
-      input = %Q(<img src="DATA:#{mediatype};base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
-      expected = input
-      actual = safe_list_sanitize(input)
-      assert_equal(expected, actual)
+      acceptable_results = [
+        # nokogiri's vendored+patched libxml2 (0002-Update-entities-to-remove-handling-of-ssi.patch)
+        %{<a src="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
+        # system libxml2
+        %{<a src="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
+        # xerces+neko
+        %{<a src="examp&lt;!--%22 unsafeattr=foo()&gt;--&gt;le.com">test</a>}
+      ]
+
+      assert_includes(acceptable_results, text)
     end
-  end
 
-  def test_mediatype_text_html_disallowed
-    input = %q(<img src="data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
-    expected = %q(<img>)
-    actual = safe_list_sanitize(input)
-    assert_equal(expected, actual)
+    def test_uri_escaping_of_name_attr_in_a_tag_in_safe_list_sanitizer
+      skip if RUBY_VERSION < "2.3"
 
-    input = %q(<img src="DATA:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
-    expected = %q(<img>)
-    actual = safe_list_sanitize(input)
-    assert_equal(expected, actual)
-  end
+      html = %{<a name='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
 
-  def test_mediatype_image_svg_xml_disallowed
-    input = %q(<img src="data:image/svg+xml;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
-    expected = %q(<img>)
-    actual = safe_list_sanitize(input)
-    assert_equal(expected, actual)
+      text = safe_list_sanitize(html)
 
-    input = %q(<img src="DATA:image/svg+xml;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
-    expected = %q(<img>)
-    actual = safe_list_sanitize(input)
-    assert_equal(expected, actual)
-  end
+      acceptable_results = [
+        # nokogiri's vendored+patched libxml2 (0002-Update-entities-to-remove-handling-of-ssi.patch)
+        %{<a name="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
+        # system libxml2
+        %{<a name="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
+        # xerces+neko
+        %{<a name="examp&lt;!--%22 unsafeattr=foo()&gt;--&gt;le.com">test</a>}
+      ]
 
-  def test_mediatype_other_disallowed
-    input = %q(<a href="data:foo;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">foo</a>)
-    expected = %q(<a>foo</a>)
-    actual = safe_list_sanitize(input)
-    assert_equal(expected, actual)
+      assert_includes(acceptable_results, text)
+    end
 
-    input = %q(<a href="DATA:foo;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">foo</a>)
-    expected = %q(<a>foo</a>)
-    actual = safe_list_sanitize(input)
-    assert_equal(expected, actual)
-  end
+    def test_uri_escaping_of_name_action_in_a_tag_in_safe_list_sanitizer
+      skip if RUBY_VERSION < "2.3"
+
+      html = %{<a action='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
+
+      text = safe_list_sanitize(html, attributes: ["action"])
+
+      acceptable_results = [
+        # nokogiri's vendored+patched libxml2 (0002-Update-entities-to-remove-handling-of-ssi.patch)
+        %{<a action="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
+        # system libxml2
+        %{<a action="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
+        # xerces+neko
+        %{<a action="examp&lt;!--%22 unsafeattr=foo()&gt;--&gt;le.com">test</a>},
+      ]
 
-  def test_scrubbing_svg_attr_values_that_allow_ref
-    input = %Q(<div fill="yellow url(http://bad.com/) #fff">hey</div>)
-    expected = %Q(<div fill="yellow #fff">hey</div>)
-    actual = scope_allowed_attributes %w(fill) do
-      safe_list_sanitize(input)
+      assert_includes(acceptable_results, text)
     end
 
-    assert_equal(expected, actual)
-  end
+    def test_exclude_node_type_processing_instructions
+      input = "<div>text</div><?div content><b>text</b>"
+      result = safe_list_sanitize(input)
+      acceptable_results = [
+        # jruby cyberneko (nokogiri < 1.14.0)
+        "<div>text</div>",
+        # everything else
+        "<div>text</div><b>text</b>",
+      ]
+
+      assert_includes(acceptable_results, result)
+    end
 
-  def test_style_with_css_payload
-    input, tags = "<style>div > span { background: \"red\"; }</style>", ["style"]
-    expected = "<style>div &gt; span { background: \"red\"; }</style>"
-    actual = safe_list_sanitize(input, tags: tags)
+    def test_exclude_node_type_comment
+      assert_equal("<div>text</div><b>text</b>", safe_list_sanitize("<div>text</div><!-- comment --><b>text</b>"))
+    end
 
-    assert_equal(expected, actual)
-  end
+    %w[text/plain text/css image/png image/gif image/jpeg].each do |mediatype|
+      define_method "test_mediatype_#{mediatype}_allowed" do
+        input = %Q(<img src="data:#{mediatype};base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
+        expected = input
+        actual = safe_list_sanitize(input)
+        assert_equal(expected, actual)
+
+        input = %Q(<img src="DATA:#{mediatype};base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
+        expected = input
+        actual = safe_list_sanitize(input)
+        assert_equal(expected, actual)
+      end
+    end
 
-  def test_combination_of_select_and_style_with_css_payload
-    input, tags = "<select><style>div > span { background: \"red\"; }</style></select>", ["select", "style"]
-    expected = "<select><style>div &gt; span { background: \"red\"; }</style></select>"
-    actual = safe_list_sanitize(input, tags: tags)
+    def test_mediatype_text_html_disallowed
+      input = '<img src="data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">'
+      expected = "<img>"
+      actual = safe_list_sanitize(input)
+      assert_equal(expected, actual)
 
-    assert_equal(expected, actual)
-  end
+      input = '<img src="DATA:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">'
+      expected = "<img>"
+      actual = safe_list_sanitize(input)
+      assert_equal(expected, actual)
+    end
 
-  def test_combination_of_select_and_style_with_script_payload
-    input, tags = "<select><style><script>alert(1)</script></style></select>", ["select", "style"]
-    expected = "<select><style>&lt;script&gt;alert(1)&lt;/script&gt;</style></select>"
-    actual = safe_list_sanitize(input, tags: tags)
+    def test_mediatype_image_svg_xml_disallowed
+      input = '<img src="data:image/svg+xml;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">'
+      expected = "<img>"
+      actual = safe_list_sanitize(input)
+      assert_equal(expected, actual)
 
-    assert_equal(expected, actual)
-  end
+      input = '<img src="DATA:image/svg+xml;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">'
+      expected = "<img>"
+      actual = safe_list_sanitize(input)
+      assert_equal(expected, actual)
+    end
 
-  def test_combination_of_svg_and_style_with_script_payload
-    input, tags = "<svg><style><script>alert(1)</script></style></svg>", ["svg", "style"]
-    expected = "<svg><style>&lt;script&gt;alert(1)&lt;/script&gt;</style></svg>"
-    actual = safe_list_sanitize(input, tags: tags)
+    def test_mediatype_other_disallowed
+      input = '<a href="data:foo;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">foo</a>'
+      expected = "<a>foo</a>"
+      actual = safe_list_sanitize(input)
+      assert_equal(expected, actual)
 
-    assert_equal(expected, actual)
-  end
+      input = '<a href="DATA:foo;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">foo</a>'
+      expected = "<a>foo</a>"
+      actual = safe_list_sanitize(input)
+      assert_equal(expected, actual)
+    end
 
-  def test_combination_of_math_and_style_with_img_payload
-    input, tags = "<math><style><img src=x onerror=alert(1)></style></math>", ["math", "style"]
-    expected = "<math><style>&lt;img src=x onerror=alert(1)&gt;</style></math>"
-    actual = safe_list_sanitize(input, tags: tags)
+    def test_scrubbing_svg_attr_values_that_allow_ref
+      input = '<div fill="yellow url(http://bad.com/) #fff">hey</div>'
+      expected = '<div fill="yellow #fff">hey</div>'
+      actual = scope_allowed_attributes %w(fill) do
+        safe_list_sanitize(input)
+      end
 
-    assert_equal(expected, actual)
+      assert_equal(expected, actual)
+    end
 
-    input, tags = "<math><style><img src=x onerror=alert(1)></style></math>", ["math", "style", "img"]
-    expected = "<math><style>&lt;img src=x onerror=alert(1)&gt;</style></math>"
-    actual = safe_list_sanitize(input, tags: tags)
+    def test_style_with_css_payload
+      input, tags = "<style>div > span { background: \"red\"; }</style>", ["style"]
+      actual = safe_list_sanitize(input, tags: tags)
+      acceptable_results = [
+        # libxml2
+        "<style>div &gt; span { background: \"red\"; }</style>",
+        # libgumbo
+        "<style>div > span { background: \"red\"; }</style>",
+      ]
+
+      assert_includes(acceptable_results, actual)
+    end
 
-    assert_equal(expected, actual)
-  end
+    def test_combination_of_select_and_style_with_css_payload
+      input, tags = "<select><style>div > span { background: \"red\"; }</style></select>", ["select", "style"]
+      actual = safe_list_sanitize(input, tags: tags)
+      acceptable_results = [
+        # libxml2
+        "<select><style>div &gt; span { background: \"red\"; }</style></select>",
+        # libgumbo
+        "<select>div &gt; span { background: \"red\"; }</select>",
+      ]
+
+      assert_includes(acceptable_results, actual)
+    end
 
-  def test_combination_of_svg_and_style_with_img_payload
-    input, tags = "<svg><style><img src=x onerror=alert(1)></style></svg>", ["svg", "style"]
-    expected = "<svg><style>&lt;img src=x onerror=alert(1)&gt;</style></svg>"
-    actual = safe_list_sanitize(input, tags: tags)
+    def test_combination_of_select_and_style_with_script_payload
+      input, tags = "<select><style><script>alert(1)</script></style></select>", ["select", "style"]
+      actual = safe_list_sanitize(input, tags: tags)
+      acceptable_results = [
+        # libxml2
+        "<select><style>&lt;script&gt;alert(1)&lt;/script&gt;</style></select>",
+        # libgumbo
+        "<select>alert(1)</select>",
+      ]
+
+      assert_includes(acceptable_results, actual)
+    end
 
-    assert_equal(expected, actual)
+    def test_combination_of_svg_and_style_with_script_payload
+      input, tags = "<svg><style><script>alert(1)</script></style></svg>", ["svg", "style"]
+      actual = safe_list_sanitize(input, tags: tags)
+      acceptable_results = [
+        # libxml2
+        "<svg><style>&lt;script&gt;alert(1)&lt;/script&gt;</style></svg>",
+        # libgumbo
+        "<svg><style>alert(1)</style></svg>"
+      ]
+
+      assert_includes(acceptable_results, actual)
+    end
 
-    input, tags = "<svg><style><img src=x onerror=alert(1)></style></svg>", ["svg", "style", "img"]
-    expected = "<svg><style>&lt;img src=x onerror=alert(1)&gt;</style></svg>"
-    actual = safe_list_sanitize(input, tags: tags)
+    def test_combination_of_math_and_style_with_img_payload
+      input, tags = "<math><style><img src=x onerror=alert(1)></style></math>", ["math", "style"]
+      actual = safe_list_sanitize(input, tags: tags)
+      acceptable_results = [
+        # libxml2
+        "<math><style>&lt;img src=x onerror=alert(1)&gt;</style></math>",
+        # libgumbo
+        "<math><style></style></math>",
+      ]
+
+      assert_includes(acceptable_results, actual)
+    end
 
-    assert_equal(expected, actual)
-  end
+    def test_combination_of_math_and_style_with_img_payload_2
+      input, tags = "<math><style><img src=x onerror=alert(1)></style></math>", ["math", "style", "img"]
+      actual = safe_list_sanitize(input, tags: tags)
+      acceptable_results = [
+        # libxml2
+        "<math><style>&lt;img src=x onerror=alert(1)&gt;</style></math>",
+        # libgumbo
+        "<math><style></style></math><img src=\"x\">",
+      ]
+
+      assert_includes(acceptable_results, actual)
+    end
 
-protected
+    def test_combination_of_svg_and_style_with_img_payload
+      input, tags = "<svg><style><img src=x onerror=alert(1)></style></svg>", ["svg", "style"]
+      actual = safe_list_sanitize(input, tags: tags)
+      acceptable_results = [
+        # libxml2
+        "<svg><style>&lt;img src=x onerror=alert(1)&gt;</style></svg>",
+        # libgumbo
+        "<svg><style></style></svg>",
+      ]
+
+      assert_includes(acceptable_results, actual)
+    end
 
-  def xpath_sanitize(input, options = {})
-    XpathRemovalTestSanitizer.new.sanitize(input, options)
-  end
+    def test_combination_of_svg_and_style_with_img_payload_2
+      input, tags = "<svg><style><img src=x onerror=alert(1)></style></svg>", ["svg", "style", "img"]
+      actual = safe_list_sanitize(input, tags: tags)
+      acceptable_results = [
+        # libxml2
+        "<svg><style>&lt;img src=x onerror=alert(1)&gt;</style></svg>",
+        # libgumbo
+        "<svg><style></style></svg><img src=\"x\">",
+      ]
+
+      assert_includes(acceptable_results, actual)
+    end
 
-  def full_sanitize(input, options = {})
-    Rails::Html::FullSanitizer.new.sanitize(input, options)
-  end
+    def test_should_sanitize_illegal_style_properties
+      raw      = %(display:block; position:absolute; left:0; top:0; width:100%; height:100%; z-index:1; background-color:black; background-image:url(http://www.ragingplatypus.com/i/cam-full.jpg); background-x:center; background-y:center; background-repeat:repeat;)
+      expected = %(display:block;width:100%;height:100%;background-color:black;background-x:center;background-y:center;)
+      assert_equal expected, sanitize_css(raw)
+    end
 
-  def link_sanitize(input, options = {})
-    Rails::Html::LinkSanitizer.new.sanitize(input, options)
-  end
+    def test_should_sanitize_with_trailing_space
+      raw = "display:block; "
+      expected = "display:block;"
+      assert_equal expected, sanitize_css(raw)
+    end
 
-  def safe_list_sanitize(input, options = {})
-    Rails::Html::SafeListSanitizer.new.sanitize(input, options)
-  end
+    def test_should_sanitize_xul_style_attributes
+      raw = %(-moz-binding:url('http://ha.ckers.org/xssmoz.xml#xss'))
+      assert_equal "", sanitize_css(raw)
+    end
 
-  def assert_sanitized(input, expected = nil)
-    if input
-      assert_dom_equal expected || input, safe_list_sanitize(input)
-    else
-      assert_nil safe_list_sanitize(input)
+    def test_should_sanitize_div_background_image_unicode_encoded
+      [
+        convert_to_css_hex("url(javascript:alert(1))", false),
+        convert_to_css_hex("url(javascript:alert(1))", true),
+        convert_to_css_hex("url(https://example.com)", false),
+        convert_to_css_hex("url(https://example.com)", true),
+      ].each do |propval|
+        raw = "background-image:" + propval
+        assert_empty(sanitize_css(raw))
+      end
     end
-  end
 
-  def sanitize_css(input)
-    Rails::Html::SafeListSanitizer.new.sanitize_css(input)
-  end
+    def test_should_allow_div_background_image_unicode_encoded_safe_functions
+      [
+        convert_to_css_hex("rgb(255,0,0)", false),
+        convert_to_css_hex("rgb(255,0,0)", true),
+      ].each do |propval|
+        raw = "background-image:" + propval
 
-  def scope_allowed_tags(tags)
-    old_tags = Rails::Html::SafeListSanitizer.allowed_tags
-    Rails::Html::SafeListSanitizer.allowed_tags = tags
-    yield Rails::Html::SafeListSanitizer.new
-  ensure
-    Rails::Html::SafeListSanitizer.allowed_tags = old_tags
-  end
+        assert_includes(sanitize_css(raw), "background-image")
+      end
+    end
 
-  def scope_allowed_attributes(attributes)
-    old_attributes = Rails::Html::SafeListSanitizer.allowed_attributes
-    Rails::Html::SafeListSanitizer.allowed_attributes = attributes
-    yield Rails::Html::SafeListSanitizer.new
-  ensure
-    Rails::Html::SafeListSanitizer.allowed_attributes = old_attributes
-  end
+    def test_should_sanitize_div_style_expression
+      raw = %(width: expression(alert('XSS'));)
+      assert_equal "", sanitize_css(raw)
+    end
 
-  # note that this is used for testing CSS hex encoding: \\[0-9a-f]{1,6}
-  def convert_to_css_hex(string, escape_parens=false)
-    string.chars.map do |c|
-      if !escape_parens && (c == "(" || c == ")")
-        c
-      else
-        format('\00%02X', c.ord)
+    def test_should_sanitize_across_newlines
+      raw = %(\nwidth:\nexpression(alert('XSS'));\n)
+      assert_equal "", sanitize_css(raw)
+    end
+
+    protected
+      def safe_list_sanitize(input, options = {})
+        module_under_test::SafeListSanitizer.new.sanitize(input, options)
+      end
+
+      def assert_sanitized(input, expected = nil)
+        assert_equal((expected || input), safe_list_sanitize(input))
       end
-    end.join
-  end
 
-  def libxml_2_9_14_recovery_lt?
-    # changed in 2.9.14, see https://github.com/sparklemotion/nokogiri/releases/tag/v1.13.5
-    Nokogiri.method(:uses_libxml?).arity == -1 && Nokogiri.uses_libxml?(">= 2.9.14")
+      def scope_allowed_tags(tags)
+        old_tags = module_under_test::SafeListSanitizer.allowed_tags
+        module_under_test::SafeListSanitizer.allowed_tags = tags
+        yield module_under_test::SafeListSanitizer.new
+      ensure
+        module_under_test::SafeListSanitizer.allowed_tags = old_tags
+      end
+
+      def scope_allowed_attributes(attributes)
+        old_attributes = module_under_test::SafeListSanitizer.allowed_attributes
+        module_under_test::SafeListSanitizer.allowed_attributes = attributes
+        yield module_under_test::SafeListSanitizer.new
+      ensure
+        module_under_test::SafeListSanitizer.allowed_attributes = old_attributes
+      end
+
+      def sanitize_css(input)
+        module_under_test::SafeListSanitizer.new.sanitize_css(input)
+      end
+
+      # note that this is used for testing CSS hex encoding: \\[0-9a-f]{1,6}
+      def convert_to_css_hex(string, escape_parens = false)
+        string.chars.map do |c|
+          if !escape_parens && (c == "(" || c == ")")
+            c
+          else
+            format('\00%02X', c.ord)
+          end
+        end.join
+      end
   end
 
-  def libxml_2_9_14_recovery_lt_bang?
-    # changed in 2.9.14, see https://github.com/sparklemotion/nokogiri/releases/tag/v1.13.5
-    # then reverted in 2.10.0, see https://gitlab.gnome.org/GNOME/libxml2/-/issues/380
-    Nokogiri.method(:uses_libxml?).arity == -1 && Nokogiri.uses_libxml?("= 2.9.14")
+  class HTML4SafeListSanitizerTest < Minitest::Test
+    @module_under_test = Rails::HTML4
+    include SafeListSanitizerTest
   end
+
+  class HTML5SafeListSanitizerTest < Minitest::Test
+    @module_under_test = Rails::HTML5
+    include SafeListSanitizerTest
+  end if loofah_html5_support?
 end
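The refactor in the hunk above turns the former standalone test class into a shared module, `SafeListSanitizerTest`. Two thin classes include it: `HTML4SafeListSanitizerTest` and `HTML5SafeListSanitizerTest`, each setting `@module_under_test` at class level. Parser-specific output is then checked with `assert_includes(acceptable_results, result)` instead of a single expected string.

The diff does not show how `module_under_test` is defined. The following is a minimal sketch of the pattern, assuming the helper simply reads that class-level instance variable. `FakeHTML4`, `FakeHTML5` and `SharedSanitizerTests` are hypothetical stand-ins, not part of rails-html-sanitizer, so the sketch runs with only Minitest installed.

```ruby
require "minitest"

# Hypothetical stand-ins for Rails::HTML4 / Rails::HTML5: each "vendor"
# module exposes a SafeListSanitizer class, mirroring the naming in the diff.
module FakeHTML4
  class SafeListSanitizer
    # Toy behaviour: drop the <script> element together with its content.
    def sanitize(html)
      html.gsub(%r{<script\b.*?</script>}m, "")
    end
  end
end

module FakeHTML5
  class SafeListSanitizer
    # Toy behaviour: drop only the <script> tags and keep their text.
    def sanitize(html)
      html.gsub(%r{</?script\b[^>]*>}, "")
    end
  end
end

# Shared test cases, written once and mixed into one class per vendor.
module SharedSanitizerTests
  # Assumed helper: read the vendor module from the including class's
  # class-level instance variable.
  def module_under_test
    self.class.instance_variable_get(:@module_under_test)
  end

  def test_removes_script_tags
    result = module_under_test::SafeListSanitizer.new.sanitize("a<script>x</script>b")
    acceptable_results = [
      "ab",  # FakeHTML4-style output
      "axb", # FakeHTML5-style output
    ]
    assert_includes(acceptable_results, result)
  end
end

class FakeHTML4SanitizerTest < Minitest::Test
  @module_under_test = FakeHTML4
  include SharedSanitizerTests
end

class FakeHTML5SanitizerTest < Minitest::Test
  @module_under_test = FakeHTML5
  include SharedSanitizerTests
end
```

Each class supplies its own `@module_under_test`, so every test method in the shared module runs once per vendor. The `acceptable_results` list absorbs differences between parsers without branching on which parser is in use.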
diff --git a/test/scrubbers_test.rb b/test/scrubbers_test.rb
index a825404..8db2d85 100644
--- a/test/scrubbers_test.rb
+++ b/test/scrubbers_test.rb
@@ -1,11 +1,16 @@
+# frozen_string_literal: true
+
 require "minitest/autorun"
 require "rails-html-sanitizer"
 
 class ScrubberTest < Minitest::Test
   protected
+    def scrub_fragment(html)
+      Loofah.scrub_fragment(html, @scrubber).to_s
+    end
 
     def assert_scrubbed(html, expected = html)
-      output = Loofah.scrub_fragment(html, @scrubber).to_s
+      output = scrub_fragment(html)
       assert_equal expected, output
     end
 
@@ -28,9 +33,8 @@ class ScrubberTest < Minitest::Test
 end
 
 class PermitScrubberTest < ScrubberTest
-
   def setup
-    @scrubber = Rails::Html::PermitScrubber.new
+    @scrubber = Rails::HTML::PermitScrubber.new
   end
 
   def test_responds_to_scrub
@@ -38,44 +42,60 @@ class PermitScrubberTest < ScrubberTest
   end
 
   def test_default_scrub_behavior
-    assert_scrubbed '<tag>hello</tag>', 'hello'
+    assert_scrubbed "<tag>hello</tag>", "hello"
   end
 
   def test_default_scrub_removes_comments
-    assert_scrubbed('<div>one</div><!-- two --><span>three</span>',
-                    '<div>one</div><span>three</span>')
+    assert_scrubbed("<div>one</div><!-- two --><span>three</span>",
+                    "<div>one</div><span>three</span>")
   end
 
   def test_default_scrub_removes_processing_instructions
-    assert_scrubbed('<div>one</div><?div two><span>three</span>',
-                    '<div>one</div><span>three</span>')
+    input = "<div>one</div><?div two><span>three</span>"
+    result = scrub_fragment(input)
+
+    acceptable_results = [
+      # jruby cyberneko (nokogiri < 1.14.0)
+      "<div>one</div>",
+      # everything else
+      "<div>one</div><span>three</span>",
+    ]
+
+    assert_includes(acceptable_results, result)
   end
 
   def test_default_attributes_removal_behavior
-    assert_scrubbed '<p cooler="hello">hello</p>', '<p>hello</p>'
+    assert_scrubbed '<p cooler="hello">hello</p>', "<p>hello</p>"
   end
 
   def test_leaves_supplied_tags
     @scrubber.tags = %w(a)
-    assert_scrubbed '<a>hello</a>'
+    assert_scrubbed "<a>hello</a>"
   end
 
   def test_leaves_only_supplied_tags
-    html = '<tag>leave me <span>now</span></tag>'
+    html = "<tag>leave me <span>now</span></tag>"
     @scrubber.tags = %w(tag)
-    assert_scrubbed html, '<tag>leave me now</tag>'
+    assert_scrubbed html, "<tag>leave me now</tag>"
+  end
+
+  def test_prunes_tags
+    @scrubber = Rails::HTML::PermitScrubber.new(prune: true)
+    @scrubber.tags = %w(tag)
+    html = "<tag>leave me <span>now</span></tag>"
+    assert_scrubbed html, "<tag>leave me </tag>"
   end
 
   def test_leaves_comments_when_supplied_as_tag
     @scrubber.tags = %w(div comment)
-    assert_scrubbed('<div>one</div><!-- two --><span>three</span>',
-                    '<div>one</div><!-- two -->three')
+    assert_scrubbed("<div>one</div><!-- two --><span>three</span>",
+                    "<div>one</div><!-- two -->three")
   end
 
   def test_leaves_only_supplied_tags_nested
-    html = '<tag>leave <em>me <span>now</span></em></tag>'
+    html = "<tag>leave <em>me <span>now</span></em></tag>"
     @scrubber.tags = %w(tag)
-    assert_scrubbed html, '<tag>leave me now</tag>'
+    assert_scrubbed html, "<tag>leave me now</tag>"
   end
 
   def test_leaves_supplied_attributes
@@ -102,16 +122,16 @@ class PermitScrubberTest < ScrubberTest
   end
 
   def test_leaves_text
-    assert_scrubbed('some text')
+    assert_scrubbed("some text")
   end
 
   def test_skips_text_nodes
-    assert_node_skipped('some text')
+    assert_node_skipped("some text")
   end
 
   def test_tags_accessor_validation
     e = assert_raises(ArgumentError) do
-      @scrubber.tags = 'tag'
+      @scrubber.tags = "tag"
     end
 
     assert_equal "You should pass :tags as an Enumerable", e.message
@@ -120,7 +140,7 @@ class PermitScrubberTest < ScrubberTest
 
   def test_attributes_accessor_validation
     e = assert_raises(ArgumentError) do
-      @scrubber.attributes = 'cooler'
+      @scrubber.attributes = "cooler"
     end
 
     assert_equal "You should pass :attributes as an Enumerable", e.message
@@ -130,19 +150,19 @@ end
 
 class TargetScrubberTest < ScrubberTest
   def setup
-    @scrubber = Rails::Html::TargetScrubber.new
+    @scrubber = Rails::HTML::TargetScrubber.new
   end
 
   def test_targeting_tags_removes_only_them
     @scrubber.tags = %w(a h1)
-    html = '<script></script><a></a><h1></h1>'
-    assert_scrubbed html, '<script></script>'
+    html = "<script></script><a></a><h1></h1>"
+    assert_scrubbed html, "<script></script>"
   end
 
   def test_targeting_tags_removes_only_them_nested
     @scrubber.tags = %w(a)
-    html = '<tag><a><tag><a></a></tag></a></tag>'
-    assert_scrubbed html, '<tag><tag></tag></tag>'
+    html = "<tag><a><tag><a></a></tag></a></tag>"
+    assert_scrubbed html, "<tag><tag></tag></tag>"
   end
 
   def test_targeting_attributes_removes_only_them
@@ -157,24 +177,31 @@ class TargetScrubberTest < ScrubberTest
     html = '<tag remove="" other=""></tag><a remove="" other=""></a>'
     assert_scrubbed html, '<a other=""></a>'
   end
+
+  def test_prunes_tags
+    @scrubber = Rails::HTML::TargetScrubber.new(prune: true)
+    @scrubber.tags = %w(span)
+    html = "<tag>leave me <span>now</span></tag>"
+    assert_scrubbed html, "<tag>leave me </tag>"
+  end
 end
 
 class TextOnlyScrubberTest < ScrubberTest
   def setup
-    @scrubber = Rails::Html::TextOnlyScrubber.new
+    @scrubber = Rails::HTML::TextOnlyScrubber.new
   end
 
   def test_removes_all_tags_and_keep_the_content
-    assert_scrubbed '<tag>hello</tag>', 'hello'
+    assert_scrubbed "<tag>hello</tag>", "hello"
   end
 
   def test_skips_text_nodes
-    assert_node_skipped('some text')
+    assert_node_skipped("some text")
   end
 end
 
 class ReturningStopFromScrubNodeTest < ScrubberTest
-  class ScrubStopper < Rails::Html::PermitScrubber
+  class ScrubStopper < Rails::HTML::PermitScrubber
     def scrub_node(node)
       Loofah::Scrubber::STOP
     end
@@ -185,6 +212,6 @@ class ReturningStopFromScrubNodeTest < ScrubberTest
   end
 
   def test_returns_stop_from_scrub_if_scrub_node_does
-    assert_scrub_stopped '<script>remove me</script>'
+    assert_scrub_stopped "<script>remove me</script>"
   end
 end
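The new `test_prunes_tags` cases above contrast two ways of handling an element the scrubber does not keep. By default it is stripped: the tag goes but its text survives. With `prune: true` the whole subtree is dropped. The sketch below is a toy model of that difference only. `Element`, `scrub` and `render` are hypothetical helpers, not part of rails-html-sanitizer or Loofah. The model works on a tiny hand-built tree rather than a parsed document.

```ruby
# Toy model only: Element, scrub and render are hypothetical helpers, not
# part of rails-html-sanitizer or Loofah. They mimic the behaviour asserted
# by the prune tests on a tiny hand-built tree instead of a parsed document.
Element = Struct.new(:name, :children)

# Returns the list of nodes that replace +node+ after scrubbing.
# Children are scrubbed first (bottom-up); then a disallowed element is
# either stripped (its children take its place) or pruned (the whole
# subtree is dropped).
def scrub(node, allowed:, prune: false)
  return [node] if node.is_a?(String) # text nodes are always kept

  kids = node.children.flat_map { |child| scrub(child, allowed: allowed, prune: prune) }
  return [Element.new(node.name, kids)] if allowed.call(node.name)

  prune ? [] : kids
end

# Serialises nodes back to an HTML-like string (no attributes, no escaping).
def render(nodes)
  nodes.map do |n|
    n.is_a?(String) ? n : "<#{n.name}>#{render(n.children)}</#{n.name}>"
  end.join
end

# <tag>leave me <span>now</span></tag>
doc = Element.new("tag", ["leave me ", Element.new("span", ["now"])])

permit = ->(name) { %w[tag].include?(name) }   # like PermitScrubber with tags = %w(tag)
target = ->(name) { !%w[span].include?(name) } # like TargetScrubber with tags = %w(span)

puts render(scrub(doc, allowed: permit))              # prints <tag>leave me now</tag>
puts render(scrub(doc, allowed: permit, prune: true)) # prints <tag>leave me </tag>
puts render(scrub(doc, allowed: target, prune: true)) # prints <tag>leave me </tag>
```

The only thing that changes between the first two calls is the `prune:` flag. The tests show the real scrubbers exposing the same switch as a constructor keyword, for example `Rails::HTML::PermitScrubber.new(prune: true)`.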

Debdiff

[The following lists of changes regard files as different if they have different names, permissions or owners.]

Files in second set of .debs but not in first

-rw-r--r--  root/root   /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.6.0/lib/rails-html-sanitizer.rb
-rw-r--r--  root/root   /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.6.0/lib/rails/html/sanitizer.rb
-rw-r--r--  root/root   /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.6.0/lib/rails/html/sanitizer/version.rb
-rw-r--r--  root/root   /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.6.0/lib/rails/html/scrubbers.rb
-rw-r--r--  root/root   /usr/share/rubygems-integration/all/specifications/rails-html-sanitizer-1.6.0.gemspec

Files in first set of .debs but not in second

-rw-r--r--  root/root   /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.4.4/lib/rails-html-sanitizer.rb
-rw-r--r--  root/root   /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.4.4/lib/rails/html/sanitizer.rb
-rw-r--r--  root/root   /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.4.4/lib/rails/html/sanitizer/version.rb
-rw-r--r--  root/root   /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.4.4/lib/rails/html/scrubbers.rb
-rw-r--r--  root/root   /usr/share/rubygems-integration/all/specifications/rails-html-sanitizer-1.4.4.gemspec

Control files: lines which differ (wdiff format)

Depends: ruby-loofah (>= [-2.19.1)-] {+2.21), ruby-nokogiri (>= 1.14)+}
