New Upstream Release - ruby-rails-html-sanitizer
Ready changes
Summary
Merged new upstream version: 1.6.0 (was: 1.4.4).
Resulting package
Built on 2023-06-27T08:13 (took 4m34s)
The resulting binary packages can be installed (if you have the apt repository enabled) by running one of:
apt install -t fresh-releases ruby-rails-html-sanitizer
Lintian Result
Diff
diff --git a/CHANGELOG.md b/CHANGELOG.md
index e18051c..fc3e49c 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,75 @@
+## 1.6.0 / 2023-05-26
+
+* Dependencies have been updated:
+
+ - Loofah `~>2.21` and Nokogiri `~>1.14` for HTML5 parser support
+ - As a result, required Ruby version is now `>= 2.7.0`
+
+ Security updates will continue to be made on the `1.5.x` release branch as long as Rails 6.1
+ (which supports Ruby 2.5) is still in security support.
+
+ *Mike Dalessio*
+
+* HTML5 standards-compliant sanitizers are now available on platforms supported by
+ Nokogiri::HTML5. These are available as:
+
+ - `Rails::HTML5::FullSanitizer`
+ - `Rails::HTML5::LinkSanitizer`
+ - `Rails::HTML5::SafeListSanitizer`
+
+ And a new "vendor" is provided at `Rails::HTML5::Sanitizer` that can be used in a future version
+ of Rails.
+
+ Note that for symmetry `Rails::HTML4::Sanitizer` is also added, though its behavior is identical
+ to the vendor class methods on `Rails::HTML::Sanitizer`.
+
+ Users may call `Rails::HTML::Sanitizer.best_supported_vendor` to get back the HTML5 vendor if it's
+ supported, else the legacy HTML4 vendor.
+
+ *Mike Dalessio*
+
+* Module namespaces have changed, but backwards compatibility is provided by aliases.
+
+ The library defines three additional modules:
+
+ - `Rails::HTML` for general functionality (replacing `Rails::Html`)
+ - `Rails::HTML4` containing sanitizers that parse content as HTML4
+ - `Rails::HTML5` containing sanitizers that parse content as HTML5
+
+ The following aliases are maintained for backwards compatibility:
+
+ - `Rails::Html` points to `Rails::HTML`
+ - `Rails::HTML::FullSanitizer` points to `Rails::HTML4::FullSanitizer`
+ - `Rails::HTML::LinkSanitizer` points to `Rails::HTML4::LinkSanitizer`
+ - `Rails::HTML::SafeListSanitizer` points to `Rails::HTML4::SafeListSanitizer`
+
+ *Mike Dalessio*
+
+* `LinkSanitizer` always returns UTF-8 encoded strings. `SafeListSanitizer` and `FullSanitizer`
+ already ensured this encoding.
+
+ *Mike Dalessio*
+
+* `SafeListSanitizer` allows `time` tag and `lang` attribute by default.
+
+ *Mike Dalessio*
+
+* The constant `Rails::Html::XPATHS_TO_REMOVE` has been removed. It's not necessary with the
+ existing sanitizers, and should have been a private constant all along anyway.
+
+ *Mike Dalessio*
+
+
+## 1.5.0 / 2023-01-20
+
+* `SafeListSanitizer`, `PermitScrubber`, and `TargetScrubber` now all support pruning of unsafe tags.
+
+ By default, unsafe tags are still stripped, but this behavior can be changed to prune the element
+ and its children from the document by passing `prune: true` to any of these classes' constructors.
+
+ *seyerian*
+
+
## 1.4.4 / 2022-12-13
* Address inefficient regular expression complexity with certain configurations of Rails::Html::Sanitizer.
@@ -52,6 +124,7 @@
*Mike Dalessio*
+
## 1.4.1 / 2021-08-18
* Fix regression in v1.4.0 that did not pass comment nodes to the scrubber.
@@ -64,6 +137,7 @@
*Mike Dalessio*
+
## 1.4.0 / 2021-08-18
* Processing Instructions are no longer allowed by Rails::Html::PermitScrubber
@@ -76,12 +150,14 @@
*Mike Dalessio*
+
## 1.3.0
* Address deprecations in Loofah 2.3.0.
*Josh Goodall*
+
## 1.2.0
* Remove needless `white_list_sanitizer` deprecation.
@@ -96,6 +172,7 @@
*Kasper Timm Hansen*
+
## 1.1.0
* Add `safe_list_sanitizer` and deprecate `white_list_sanitizer` to be removed
@@ -113,10 +190,12 @@
*Kasper Timm Hansen*
+
## 1.0.1
* Added support for Rails 4.2.0.beta2 and above
+
## 1.0.0
* First release.
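The namespace change described in the 1.6.0 entry keeps old code working through plain Ruby constant aliasing. Assigning an existing module or class to another constant makes both names refer to the same object. Below is a minimal sketch of that pattern. It uses a hypothetical `Demo` namespace in place of `Rails` so it runs without the gem installed:

```ruby
# Hypothetical namespace standing in for Rails; the gem is not required.
module Demo
  module HTML
    class Sanitizer; end
  end

  module HTML4
    class FullSanitizer < Demo::HTML::Sanitizer; end
  end

  # Backwards compatibility: the old spelling points at the new module...
  Html = HTML

  # ...and the old class name points at the HTML4 implementation.
  module HTML
    FullSanitizer = HTML4::FullSanitizer
  end
end

p Demo::Html.equal?(Demo::HTML)                                # => true
p Demo::Html::FullSanitizer.equal?(Demo::HTML4::FullSanitizer) # => true
p Demo::Html::FullSanitizer.name                               # => "Demo::HTML4::FullSanitizer"
```

The aliased constant is the same class object, so `is_a?` checks written against an old name still match instances created through a new one.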
diff --git a/MIT-LICENSE b/MIT-LICENSE
index 330b78b..c56f78e 100644
--- a/MIT-LICENSE
+++ b/MIT-LICENSE
@@ -1,4 +1,4 @@
-Copyright (c) 2013-2015 Rafael Mendonça França, Kasper Timm Hansen
+Copyright (c) 2013-2023 Rafael Mendonça França, Kasper Timm Hansen, Mike Dalessio
MIT License
diff --git a/README.md b/README.md
index 7b160b5..8cde5c1 100644
--- a/README.md
+++ b/README.md
@@ -1,61 +1,76 @@
-# Rails Html Sanitizers
+# Rails HTML Sanitizers
-In Rails 4.2 and above this gem will be responsible for sanitizing HTML fragments in Rails
-applications, i.e. in the `sanitize`, `sanitize_css`, `strip_tags` and `strip_links` methods.
+This gem is responsible for sanitizing HTML fragments in Rails applications. Specifically, this is the set of sanitizers used to implement the Action View `SanitizerHelper` methods `sanitize`, `sanitize_css`, `strip_tags` and `strip_links`.
-Rails Html Sanitizer is only intended to be used with Rails applications. If you need similar functionality in non Rails apps consider using [Loofah](https://github.com/flavorjones/loofah) directly (that's what handles sanitization under the hood).
+Rails HTML Sanitizer is only intended to be used with Rails applications. If you need similar functionality but aren't using Rails, consider using the underlying sanitization library [Loofah](https://github.com/flavorjones/loofah) directly.
-## Installation
-
-Add this line to your application's Gemfile:
- gem 'rails-html-sanitizer'
-
-And then execute:
+## Usage
- $ bundle
+### Sanitizers
-Or install it yourself as:
+All sanitizers respond to `sanitize`, and are available in variants that use either HTML4 or HTML5 parsing, under the `Rails::HTML4` and `Rails::HTML5` namespaces, respectively.
- $ gem install rails-html-sanitizer
+NOTE: The HTML5 sanitizers are not supported on JRuby. Users may programmatically check for support by calling `Rails::HTML::Sanitizer.html5_support?`.
-## Usage
-### Sanitizers
+#### FullSanitizer
-All sanitizers respond to `sanitize`.
+```ruby
+full_sanitizer = Rails::HTML5::FullSanitizer.new
+full_sanitizer.sanitize("<b>Bold</b> no more! <a href='more.html'>See more here</a>...")
+# => Bold no more! See more here...
+```
-#### FullSanitizer
+or, if you insist on parsing the content as HTML4:
```ruby
-full_sanitizer = Rails::Html::FullSanitizer.new
+full_sanitizer = Rails::HTML4::FullSanitizer.new
full_sanitizer.sanitize("<b>Bold</b> no more! <a href='more.html'>See more here</a>...")
# => Bold no more! See more here...
```
+HTML5 version:
+
+
+
#### LinkSanitizer
```ruby
-link_sanitizer = Rails::Html::LinkSanitizer.new
+link_sanitizer = Rails::HTML5::LinkSanitizer.new
+link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
+# => Only the link text will be kept.
+```
+
+or, if you insist on parsing the content as HTML4:
+
+```ruby
+link_sanitizer = Rails::HTML4::LinkSanitizer.new
link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
# => Only the link text will be kept.
```
+
#### SafeListSanitizer
+This sanitizer is also available as an HTML4 variant, but for simplicity we'll document only the HTML5 variant below.
+
```ruby
-safe_list_sanitizer = Rails::Html::SafeListSanitizer.new
+safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new
# sanitize via an extensive safe list of allowed elements
safe_list_sanitizer.sanitize(@article.body)
-# safe list only the supplied tags and attributes
+# sanitize only the supplied tags and attributes
safe_list_sanitizer.sanitize(@article.body, tags: %w(table tr td), attributes: %w(id class style))
-# safe list via a custom scrubber
+# sanitize via a custom scrubber
safe_list_sanitizer.sanitize(@article.body, scrubber: ArticleScrubber.new)
-# safe list sanitizer can also sanitize css
+# prune nodes from the tree instead of stripping tags and leaving inner content
+safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new(prune: true)
+
+# the sanitizer can also sanitize css
safe_list_sanitizer.sanitize_css('background-color: #000;')
```
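The `prune: true` option above changes what happens to a disallowed element. The following toy model in plain Ruby shows the difference between the default stripping and pruning. It mirrors the `<a><span>text</span></a>` examples used for `PermitScrubber` in the same README. The nested-array tree and the helpers `toy_scrub` and `toy_html` are hypothetical, so treat this as a sketch of the behaviour rather than the Loofah scrubbing algorithm:

```ruby
# Toy model only: this is NOT how Loofah or rails-html-sanitizer is implemented.
# An element is a [tag, children] pair; a text node is a plain String.
TOY_ALLOWED_TAGS = %w[a].freeze

# Walk the tree, keeping allowed elements. A disallowed element is either
# stripped (tag removed, children kept) or pruned (whole subtree removed).
def toy_scrub(nodes, prune:)
  nodes.flat_map do |node|
    next [node] if node.is_a?(String) # text nodes are kept as-is

    tag, children = node
    kept = toy_scrub(children, prune: prune)
    if TOY_ALLOWED_TAGS.include?(tag)
      [[tag, kept]] # allowed: keep the element with its scrubbed children
    elsif prune
      []            # prune: drop the element and everything inside it
    else
      kept          # strip: drop only the tag, splice its children in place
    end
  end
end

# Serialize the toy tree back to markup (no entity escaping, for brevity).
def toy_html(nodes)
  nodes.map { |n| n.is_a?(String) ? n : "<#{n[0]}>#{toy_html(n[1])}</#{n[0]}>" }.join
end

doc = [["a", [["span", ["text"]]]]]
puts toy_html(toy_scrub(doc, prune: false)) # => <a>text</a>
puts toy_html(toy_scrub(doc, prune: true))  # => <a></a>
```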
@@ -63,14 +78,14 @@ safe_list_sanitizer.sanitize_css('background-color: #000;')
Scrubbers are objects responsible for removing nodes or attributes you don't want in your HTML document.
-This gem includes two scrubbers `Rails::Html::PermitScrubber` and `Rails::Html::TargetScrubber`.
+This gem includes two scrubbers `Rails::HTML::PermitScrubber` and `Rails::HTML::TargetScrubber`.
-#### `Rails::Html::PermitScrubber`
+#### `Rails::HTML::PermitScrubber`
This scrubber allows you to permit only the tags and attributes you want.
```ruby
-scrubber = Rails::Html::PermitScrubber.new
+scrubber = Rails::HTML::PermitScrubber.new
scrubber.tags = ['a']
html_fragment = Loofah.fragment('<a><img/ ></a>')
@@ -78,16 +93,34 @@ html_fragment.scrub!(scrubber)
html_fragment.to_s # => "<a></a>"
```
-#### `Rails::Html::TargetScrubber`
+By default, inner content is left, but it can be removed as well.
+
+```ruby
+scrubber = Rails::HTML::PermitScrubber.new
+scrubber.tags = ['a']
+
+html_fragment = Loofah.fragment('<a><span>text</span></a>')
+html_fragment.scrub!(scrubber)
+html_fragment.to_s # => "<a>text</a>"
+
+scrubber = Rails::HTML::PermitScrubber.new(prune: true)
+scrubber.tags = ['a']
+
+html_fragment = Loofah.fragment('<a><span>text</span></a>')
+html_fragment.scrub!(scrubber)
+html_fragment.to_s # => "<a></a>"
+```
+
+#### `Rails::HTML::TargetScrubber`
Where `PermitScrubber` picks out tags and attributes to permit in sanitization,
-`Rails::Html::TargetScrubber` targets them for removal. See https://github.com/flavorjones/loofah/blob/main/lib/loofah/html5/safelist.rb for the tag list.
+`Rails::HTML::TargetScrubber` targets them for removal. See https://github.com/flavorjones/loofah/blob/main/lib/loofah/html5/safelist.rb for the tag list.
**Note:** by default, it will scrub anything that is not part of the permitted tags from
loofah `HTML5::Scrub.allowed_element?`.
```ruby
-scrubber = Rails::Html::TargetScrubber.new
+scrubber = Rails::HTML::TargetScrubber.new
scrubber.tags = ['img']
html_fragment = Loofah.fragment('<a><img/ ></a>')
@@ -95,12 +128,30 @@ html_fragment.scrub!(scrubber)
html_fragment.to_s # => "<a></a>"
```
+Similarly to `PermitScrubber`, nodes can be fully pruned.
+
+```ruby
+scrubber = Rails::HTML::TargetScrubber.new
+scrubber.tags = ['span']
+
+html_fragment = Loofah.fragment('<a><span>text</span></a>')
+html_fragment.scrub!(scrubber)
+html_fragment.to_s # => "<a>text</a>"
+
+scrubber = Rails::HTML::TargetScrubber.new(prune: true)
+scrubber.tags = ['span']
+
+html_fragment = Loofah.fragment('<a><span>text</span></a>')
+html_fragment.scrub!(scrubber)
+html_fragment.to_s # => "<a></a>"
+```
+
#### Custom Scrubbers
You can also create custom scrubbers in your application if you want to.
```ruby
-class CommentScrubber < Rails::Html::PermitScrubber
+class CommentScrubber < Rails::HTML::PermitScrubber
def initialize
super
self.tags = %w( form script comment blockquote )
@@ -113,7 +164,7 @@ class CommentScrubber < Rails::Html::PermitScrubber
end
```
-See `Rails::Html::PermitScrubber` documentation to learn more about which methods can be overridden.
+See `Rails::HTML::PermitScrubber` documentation to learn more about which methods can be overridden.
#### Custom Scrubber in a Rails app
@@ -123,20 +174,98 @@ Using the `CommentScrubber` from above, you can use this in a Rails view like so
<%= sanitize @comment, scrubber: CommentScrubber.new %>
```
+### A note on HTML entities
+
+__Rails HTML sanitizers are intended to be used by the view layer, at page-render time. They are *not* intended to sanitize persisted strings that will be sanitized *again* at page-render time.__
+
+Proper HTML sanitization will replace some characters with HTML entities. For example, text containing a `<` character will be updated to contain `&lt;` to ensure that the markup is well-formed.
+
+This is important to keep in mind because __HTML entities will render improperly if they are sanitized twice.__
+
+
+#### A concrete example showing the problem that can arise
+
+Imagine the user is asked to enter their employer's name, which will appear on their public profile page. Then imagine they enter `JPMorgan Chase & Co.`.
+
+If you sanitize this before persisting it in the database, the stored string will be `JPMorgan Chase &amp; Co.`
+
+When the page is rendered, if this string is sanitized a second time by the view layer, the HTML will contain `JPMorgan Chase &amp;amp; Co.` which will render as "JPMorgan Chase &amp; Co.".
+
+Another problem that can arise is rendering the sanitized string in a non-HTML context (for example, if it ends up being part of an SMS message). In this case, it may contain inappropriate HTML entities.
+
+
+#### Suggested alternatives
+
+You might simply choose to persist the untrusted string as-is (the raw input), and then ensure that the string will be properly sanitized by the view layer.
+
+That raw string, if rendered in a non-HTML context (like SMS), must also be sanitized by a method appropriate for that context. You may wish to look into using [Loofah](https://github.com/flavorjones/loofah) or [Sanitize](https://github.com/rgrove/sanitize) to customize how this sanitization works, including omitting HTML entities in the final string.
+
+If you really want to sanitize the string that's stored in your database, you may wish to look into [Loofah::ActiveRecord](https://github.com/flavorjones/loofah-activerecord) rather than use the Rails HTML sanitizers.
+
+
+### A note on module names
+
+In versions < 1.6, the only module defined by this library was `Rails::Html`. Starting in 1.6, we define three additional modules:
+
+- `Rails::HTML` for general functionality (replacing `Rails::Html`)
+- `Rails::HTML4` containing sanitizers that parse content as HTML4
+- `Rails::HTML5` containing sanitizers that parse content as HTML5 (if supported)
+
+The following aliases are maintained for backwards compatibility:
+
+- `Rails::Html` points to `Rails::HTML`
+- `Rails::HTML::FullSanitizer` points to `Rails::HTML4::FullSanitizer`
+- `Rails::HTML::LinkSanitizer` points to `Rails::HTML4::LinkSanitizer`
+- `Rails::HTML::SafeListSanitizer` points to `Rails::HTML4::SafeListSanitizer`
+
+
+## Installation
+
+Add this line to your application's Gemfile:
+
+ gem 'rails-html-sanitizer'
+
+And then execute:
+
+ $ bundle
+
+Or install it yourself as:
+
+ $ gem install rails-html-sanitizer
+
+
+## Support matrix
+
+| branch | ruby support | actively maintained | security support |
+|--------|--------------|---------------------|----------------------------------------|
+| 1.6.x | >= 2.7 | yes | yes |
+| 1.5.x | >= 2.5 | no | while Rails 6.1 is in security support |
+| 1.4.x | >= 1.8.7 | no | no |
+
+
## Read more
Loofah is what underlies the sanitizers and scrubbers of rails-html-sanitizer.
+
- [Loofah and Loofah Scrubbers](https://github.com/flavorjones/loofah)
The `node` argument passed to some methods in a custom scrubber is an instance of `Nokogiri::XML::Node`.
+
- [`Nokogiri::XML::Node`](https://nokogiri.org/rdoc/Nokogiri/XML/Node.html)
- [Nokogiri](http://nokogiri.org)
-## Contributing to Rails Html Sanitizers
-Rails Html Sanitizers is work of many contributors. You're encouraged to submit pull requests, propose features and discuss issues.
+## Contributing to Rails HTML Sanitizers
+
+Rails HTML Sanitizers is the work of many contributors. You're encouraged to submit pull requests, propose features and discuss issues.
See [CONTRIBUTING](CONTRIBUTING.md).
+### Security reports
+
+Trying to report a possible security vulnerability in this project? Please check out the [Rails project's security policy](https://rubyonrails.org/security) for instructions.
+
+
## License
-Rails Html Sanitizers is released under the [MIT License](MIT-LICENSE).
+
+Rails HTML Sanitizers is released under the [MIT License](MIT-LICENSE).
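The README's note on HTML entities can be reproduced without Rails. In the stdlib-only sketch below, the hypothetical helpers `escape_entities` and `decode_entities` stand in for two things: the entity-encoding step of sanitization and a browser's one-level decoding. They are not part of rails-html-sanitizer, and they handle only four characters:

```ruby
# Hypothetical helpers, not part of rails-html-sanitizer.
ENTITIES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Encode special characters as entities (roughly what sanitizing does to text).
def escape_entities(str)
  str.gsub(/[&<>"]/, ENTITIES)
end

# Decode one level of entities (what a browser does when rendering).
def decode_entities(str)
  str.gsub(/&(?:amp|lt|gt|quot);/, ENTITIES.invert)
end

raw = "JPMorgan Chase & Co."

stored   = escape_entities(raw)    # sanitized before persisting
rendered = escape_entities(stored) # sanitized again at page-render time

puts stored                    # => JPMorgan Chase &amp; Co.
puts rendered                  # => JPMorgan Chase &amp;amp; Co.
puts decode_entities(rendered) # what the user sees: JPMorgan Chase &amp; Co.

# Persisting the raw input and encoding once at render time round-trips cleanly.
puts decode_entities(escape_entities(raw)) # => JPMorgan Chase & Co.
```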
diff --git a/debian/changelog b/debian/changelog
index 044701b..ae67d71 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+ruby-rails-html-sanitizer (1.6.0-1) UNRELEASED; urgency=low
+
+ * New upstream release.
+
+ -- Debian Janitor <janitor@jelmer.uk> Tue, 27 Jun 2023 08:09:39 -0000
+
ruby-rails-html-sanitizer (1.4.4-1) unstable; urgency=medium
* Team upload
diff --git a/lib/rails-html-sanitizer.rb b/lib/rails-html-sanitizer.rb
index 59ed70d..0c48f7f 100644
--- a/lib/rails-html-sanitizer.rb
+++ b/lib/rails-html-sanitizer.rb
@@ -1,30 +1,14 @@
-require "rails/html/sanitizer/version"
-require "loofah"
-require "rails/html/scrubbers"
-require "rails/html/sanitizer"
+# frozen_string_literal: true
-module Rails
- module Html
- class Sanitizer
- class << self
- def full_sanitizer
- Html::FullSanitizer
- end
+require_relative "rails/html/sanitizer/version"
- def link_sanitizer
- Html::LinkSanitizer
- end
+require "loofah"
- def safe_list_sanitizer
- Html::SafeListSanitizer
- end
+require_relative "rails/html/scrubbers"
+require_relative "rails/html/sanitizer"
- def white_list_sanitizer
- safe_list_sanitizer
- end
- end
- end
- end
+module Rails
+ Html = HTML # :nodoc:
end
module ActionView
diff --git a/lib/rails/html/sanitizer.rb b/lib/rails/html/sanitizer.rb
index 5633ca1..b3712a7 100644
--- a/lib/rails/html/sanitizer.rb
+++ b/lib/rails/html/sanitizer.rb
@@ -1,155 +1,422 @@
+# frozen_string_literal: true
+
module Rails
- module Html
- XPATHS_TO_REMOVE = %w{.//script .//form comment()}
+ module HTML
+ class Sanitizer
+ class << self
+ def html5_support?
+ return @html5_support if defined?(@html5_support)
+
+ @html5_support = Loofah.respond_to?(:html5_support?) && Loofah.html5_support?
+ end
+
+ def best_supported_vendor
+ html5_support? ? Rails::HTML5::Sanitizer : Rails::HTML4::Sanitizer
+ end
+ end
- class Sanitizer # :nodoc:
def sanitize(html, options = {})
raise NotImplementedError, "subclasses must implement sanitize method."
end
private
+ def remove_xpaths(node, xpaths)
+ node.xpath(*xpaths).remove
+ node
+ end
+
+ def properly_encode(fragment, options)
+ fragment.xml? ? fragment.to_xml(options) : fragment.to_html(options)
+ end
+ end
+
+ module Concern
+ module ComposedSanitize
+ def sanitize(html, options = {})
+ return unless html
+ return html if html.empty?
+
+ serialize(scrub(parse_fragment(html), options))
+ end
+ end
+
+ module Parser
+ module HTML4
+ def parse_fragment(html)
+ Loofah.html4_fragment(html)
+ end
+ end
+
+ module HTML5
+ def parse_fragment(html)
+ Loofah.html5_fragment(html)
+ end
+ end if Rails::HTML::Sanitizer.html5_support?
+ end
+
+ module Scrubber
+ module Full
+ def scrub(fragment, options = {})
+ fragment.scrub!(TextOnlyScrubber.new)
+ end
+ end
+
+ module Link
+ def initialize
+ super
+ @link_scrubber = TargetScrubber.new
+ @link_scrubber.tags = %w(a)
+ @link_scrubber.attributes = %w(href)
+ end
+
+ def scrub(fragment, options = {})
+ fragment.scrub!(@link_scrubber)
+ end
+ end
+
+ module SafeList
+ # The default safe list for tags
+ DEFAULT_ALLOWED_TAGS = Set.new([
+ "a",
+ "abbr",
+ "acronym",
+ "address",
+ "b",
+ "big",
+ "blockquote",
+ "br",
+ "cite",
+ "code",
+ "dd",
+ "del",
+ "dfn",
+ "div",
+ "dl",
+ "dt",
+ "em",
+ "h1",
+ "h2",
+ "h3",
+ "h4",
+ "h5",
+ "h6",
+ "hr",
+ "i",
+ "img",
+ "ins",
+ "kbd",
+ "li",
+ "ol",
+ "p",
+ "pre",
+ "samp",
+ "small",
+ "span",
+ "strong",
+ "sub",
+ "sup",
+ "time",
+ "tt",
+ "ul",
+ "var",
+ ]).freeze
+
+ # The default safe list for attributes
+ DEFAULT_ALLOWED_ATTRIBUTES = Set.new([
+ "abbr",
+ "alt",
+ "cite",
+ "class",
+ "datetime",
+ "height",
+ "href",
+ "lang",
+ "name",
+ "src",
+ "title",
+ "width",
+ "xml:lang",
+ ]).freeze
- def remove_xpaths(node, xpaths)
- node.xpath(*xpaths).remove
- node
+ def self.included(klass)
+ class << klass
+ attr_accessor :allowed_tags
+ attr_accessor :allowed_attributes
+ end
+
+ klass.allowed_tags = DEFAULT_ALLOWED_TAGS.dup
+ klass.allowed_attributes = DEFAULT_ALLOWED_ATTRIBUTES.dup
+ end
+
+ def initialize(prune: false)
+ @permit_scrubber = PermitScrubber.new(prune: prune)
+ end
+
+ def scrub(fragment, options = {})
+ if scrubber = options[:scrubber]
+ # No duck typing, Loofah ensures subclass of Loofah::Scrubber
+ fragment.scrub!(scrubber)
+ elsif allowed_tags(options) || allowed_attributes(options)
+ @permit_scrubber.tags = allowed_tags(options)
+ @permit_scrubber.attributes = allowed_attributes(options)
+ fragment.scrub!(@permit_scrubber)
+ else
+ fragment.scrub!(:strip)
+ end
+ end
+
+ def sanitize_css(style_string)
+ Loofah::HTML5::Scrub.scrub_css(style_string)
+ end
+
+ private
+ def allowed_tags(options)
+ options[:tags] || self.class.allowed_tags
+ end
+
+ def allowed_attributes(options)
+ options[:attributes] || self.class.allowed_attributes
+ end
+ end
end
- def properly_encode(fragment, options)
- fragment.xml? ? fragment.to_xml(options) : fragment.to_html(options)
+ module Serializer
+ module UTF8Encode
+ def serialize(fragment)
+ properly_encode(fragment, encoding: "UTF-8")
+ end
+ end
end
end
+ end
- # === Rails::Html::FullSanitizer
- # Removes all tags but strips out scripts, forms and comments.
- #
- # full_sanitizer = Rails::Html::FullSanitizer.new
- # full_sanitizer.sanitize("<b>Bold</b> no more! <a href='more.html'>See more here</a>...")
- # # => Bold no more! See more here...
- class FullSanitizer < Sanitizer
- def sanitize(html, options = {})
- return unless html
- return html if html.empty?
+ module HTML4
+ module Sanitizer
+ module VendorMethods
+ def full_sanitizer
+ Rails::HTML4::FullSanitizer
+ end
- loofah_fragment = Loofah.fragment(html)
+ def link_sanitizer
+ Rails::HTML4::LinkSanitizer
+ end
- remove_xpaths(loofah_fragment, XPATHS_TO_REMOVE)
- loofah_fragment.scrub!(TextOnlyScrubber.new)
+ def safe_list_sanitizer
+ Rails::HTML4::SafeListSanitizer
+ end
- properly_encode(loofah_fragment, encoding: 'UTF-8')
+ def white_list_sanitizer # :nodoc:
+ safe_list_sanitizer
+ end
end
+
+ extend VendorMethods
end
- # === Rails::Html::LinkSanitizer
- # Removes +a+ tags and +href+ attributes leaving only the link text.
+ # == Rails::HTML4::FullSanitizer
#
- # link_sanitizer = Rails::Html::LinkSanitizer.new
- # link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
+ # Removes all tags from HTML4 but strips out scripts, forms and comments.
#
- # => 'Only the link text will be kept.'
- class LinkSanitizer < Sanitizer
- def initialize
- @link_scrubber = TargetScrubber.new
- @link_scrubber.tags = %w(a)
- @link_scrubber.attributes = %w(href)
- end
+ # full_sanitizer = Rails::HTML4::FullSanitizer.new
+ # full_sanitizer.sanitize("<b>Bold</b> no more! <a href='more.html'>See more here</a>...")
+ # # => "Bold no more! See more here..."
+ #
+ class FullSanitizer < Rails::HTML::Sanitizer
+ include HTML::Concern::ComposedSanitize
+ include HTML::Concern::Parser::HTML4
+ include HTML::Concern::Scrubber::Full
+ include HTML::Concern::Serializer::UTF8Encode
+ end
- def sanitize(html, options = {})
- Loofah.scrub_fragment(html, @link_scrubber).to_s
- end
+ # == Rails::HTML4::LinkSanitizer
+ #
+ # Removes +a+ tags and +href+ attributes from HTML4 leaving only the link text.
+ #
+ # link_sanitizer = Rails::HTML4::LinkSanitizer.new
+ # link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
+ # # => "Only the link text will be kept."
+ #
+ class LinkSanitizer < Rails::HTML::Sanitizer
+ include HTML::Concern::ComposedSanitize
+ include HTML::Concern::Parser::HTML4
+ include HTML::Concern::Scrubber::Link
+ include HTML::Concern::Serializer::UTF8Encode
end
- # === Rails::Html::SafeListSanitizer
- # Sanitizes html and css from an extensive safe list (see link further down).
+ # == Rails::HTML4::SafeListSanitizer
+ #
+ # Sanitizes HTML4 and CSS from an extensive safe list.
#
# === Whitespace
- # We can't make any guarantees about whitespace being kept or stripped.
- # Loofah uses Nokogiri, which wraps either a C or Java parser for the
- # respective Ruby implementation.
- # Those two parsers determine how whitespace is ultimately handled.
#
- # When the stripped markup will be rendered the users browser won't take
- # whitespace into account anyway. It might be better to suggest your users
- # wrap their whitespace sensitive content in pre tags or that you do
- # so automatically.
+ # We can't make any guarantees about whitespace being kept or stripped. Loofah uses Nokogiri,
+ # which wraps either a C or Java parser for the respective Ruby implementation. Those two
+ # parsers determine how whitespace is ultimately handled.
+ #
+ # When the stripped markup will be rendered the user's browser won't take whitespace into account
+ # anyway. It might be better to suggest your users wrap their whitespace sensitive content in
+ # pre tags or that you do so automatically.
#
# === Options
- # Sanitizes both html and css via the safe lists found here:
- # https://github.com/flavorjones/loofah/blob/master/lib/loofah/html5/safelist.rb
#
- # SafeListSanitizer also accepts options to configure
- # the safe list used when sanitizing html.
+ # Sanitizes both html and css via the safe lists found in
+ # Rails::HTML::Concern::Scrubber::SafeList
+ #
+ # SafeListSanitizer also accepts options to configure the safe list used when sanitizing html.
# There's a class level option:
- # Rails::Html::SafeListSanitizer.allowed_tags = %w(table tr td)
- # Rails::Html::SafeListSanitizer.allowed_attributes = %w(id class style)
#
- # Tags and attributes can also be passed to +sanitize+.
- # Passed options take precedence over the class level options.
+ # Rails::HTML4::SafeListSanitizer.allowed_tags = %w(table tr td)
+ # Rails::HTML4::SafeListSanitizer.allowed_attributes = %w(id class style)
+ #
+ # Tags and attributes can also be passed to +sanitize+. Passed options take precedence over the
+ # class level options.
#
# === Examples
- # safe_list_sanitizer = Rails::Html::SafeListSanitizer.new
#
- # Sanitize css doesn't take options
- # safe_list_sanitizer.sanitize_css('background-color: #000;')
+ # safe_list_sanitizer = Rails::HTML4::SafeListSanitizer.new
#
- # Default: sanitize via a extensive safe list of allowed elements
- # safe_list_sanitizer.sanitize(@article.body)
+ # # default: sanitize via an extensive safe list of allowed elements
+ # safe_list_sanitizer.sanitize(@article.body)
#
- # Safe list via the supplied tags and attributes
- # safe_list_sanitizer.sanitize(@article.body, tags: %w(table tr td),
- # attributes: %w(id class style))
+ # # sanitize via the supplied tags and attributes
+ # safe_list_sanitizer.sanitize(
+ # @article.body,
+ # tags: %w(table tr td),
+ # attributes: %w(id class style),
+ # )
#
- # Safe list via a custom scrubber
- # safe_list_sanitizer.sanitize(@article.body, scrubber: ArticleScrubber.new)
- class SafeListSanitizer < Sanitizer
- class << self
- attr_accessor :allowed_tags
- attr_accessor :allowed_attributes
- end
- self.allowed_tags = Set.new(%w(strong em b i p code pre tt samp kbd var sub
- sup dfn cite big small address hr br div span h1 h2 h3 h4 h5 h6 ul ol li dl dt dd abbr
- acronym a img blockquote del ins))
- self.allowed_attributes = Set.new(%w(href src width height alt cite datetime title class name xml:lang abbr))
-
- def initialize
- @permit_scrubber = PermitScrubber.new
- end
-
- def sanitize(html, options = {})
- return unless html
- return html if html.empty?
+ # # sanitize via a custom Loofah scrubber
+ # safe_list_sanitizer.sanitize(@article.body, scrubber: ArticleScrubber.new)
+ #
+ # # prune nodes from the tree instead of stripping tags and leaving inner content
+ # safe_list_sanitizer = Rails::HTML4::SafeListSanitizer.new(prune: true)
+ #
+ # # the sanitizer can also sanitize CSS
+ # safe_list_sanitizer.sanitize_css('background-color: #000;')
+ #
+ class SafeListSanitizer < Rails::HTML::Sanitizer
+ include HTML::Concern::ComposedSanitize
+ include HTML::Concern::Parser::HTML4
+ include HTML::Concern::Scrubber::SafeList
+ include HTML::Concern::Serializer::UTF8Encode
+ end
+ end
- loofah_fragment = Loofah.fragment(html)
+ module HTML5
+ class Sanitizer
+ class << self
+ def full_sanitizer
+ Rails::HTML5::FullSanitizer
+ end
- if scrubber = options[:scrubber]
- # No duck typing, Loofah ensures subclass of Loofah::Scrubber
- loofah_fragment.scrub!(scrubber)
- elsif allowed_tags(options) || allowed_attributes(options)
- @permit_scrubber.tags = allowed_tags(options)
- @permit_scrubber.attributes = allowed_attributes(options)
- loofah_fragment.scrub!(@permit_scrubber)
- else
- remove_xpaths(loofah_fragment, XPATHS_TO_REMOVE)
- loofah_fragment.scrub!(:strip)
+ def link_sanitizer
+ Rails::HTML5::LinkSanitizer
end
- properly_encode(loofah_fragment, encoding: 'UTF-8')
- end
+ def safe_list_sanitizer
+ Rails::HTML5::SafeListSanitizer
+ end
- def sanitize_css(style_string)
- Loofah::HTML5::Scrub.scrub_css(style_string)
+ def white_list_sanitizer # :nodoc:
+ safe_list_sanitizer
+ end
end
+ end
- private
+ # == Rails::HTML5::FullSanitizer
+ #
+ # Removes all tags from HTML5 but strips out scripts, forms and comments.
+ #
+ # full_sanitizer = Rails::HTML5::FullSanitizer.new
+ # full_sanitizer.sanitize("<b>Bold</b> no more! <a href='more.html'>See more here</a>...")
+ # # => "Bold no more! See more here..."
+ #
+ class FullSanitizer < Rails::HTML::Sanitizer
+ include HTML::Concern::ComposedSanitize
+ include HTML::Concern::Parser::HTML5
+ include HTML::Concern::Scrubber::Full
+ include HTML::Concern::Serializer::UTF8Encode
+ end
- def allowed_tags(options)
- options[:tags] || self.class.allowed_tags
- end
+ # == Rails::HTML5::LinkSanitizer
+ #
+ # Removes +a+ tags and +href+ attributes from HTML5 leaving only the link text.
+ #
+ # link_sanitizer = Rails::HTML5::LinkSanitizer.new
+ # link_sanitizer.sanitize('<a href="example.com">Only the link text will be kept.</a>')
+ # # => "Only the link text will be kept."
+ #
+ class LinkSanitizer < Rails::HTML::Sanitizer
+ include HTML::Concern::ComposedSanitize
+ include HTML::Concern::Parser::HTML5
+ include HTML::Concern::Scrubber::Link
+ include HTML::Concern::Serializer::UTF8Encode
+ end
- def allowed_attributes(options)
- options[:attributes] || self.class.allowed_attributes
- end
+ # == Rails::HTML5::SafeListSanitizer
+ #
+ # Sanitizes HTML5 and CSS from an extensive safe list.
+ #
+ # === Whitespace
+ #
+ # We can't make any guarantees about whitespace being kept or stripped. Loofah uses Nokogiri,
+ # which wraps either a C or Java parser for the respective Ruby implementation. Those two
+ # parsers determine how whitespace is ultimately handled.
+ #
+ # When the stripped markup will be rendered the user's browser won't take whitespace into account
+ # anyway. It might be better to suggest your users wrap their whitespace sensitive content in
+ # pre tags or that you do so automatically.
+ #
+ # === Options
+ #
+ # Sanitizes both html and css via the safe lists found in
+ # Rails::HTML::Concern::Scrubber::SafeList
+ #
+ # SafeListSanitizer also accepts options to configure the safe list used when sanitizing html.
+ # There's a class level option:
+ #
+ # Rails::HTML5::SafeListSanitizer.allowed_tags = %w(table tr td)
+ # Rails::HTML5::SafeListSanitizer.allowed_attributes = %w(id class style)
+ #
+ # Tags and attributes can also be passed to +sanitize+. Passed options take precedence over the
+ # class level options.
+ #
+ # === Examples
+ #
+ # safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new
+ #
+ # # default: sanitize via an extensive safe list of allowed elements
+ # safe_list_sanitizer.sanitize(@article.body)
+ #
+ # # sanitize via the supplied tags and attributes
+ # safe_list_sanitizer.sanitize(
+ # @article.body,
+ # tags: %w(table tr td),
+ # attributes: %w(id class style),
+ # )
+ #
+ # # sanitize via a custom Loofah scrubber
+ # safe_list_sanitizer.sanitize(@article.body, scrubber: ArticleScrubber.new)
+ #
+ # # prune nodes from the tree instead of stripping tags and leaving inner content
+ # safe_list_sanitizer = Rails::HTML5::SafeListSanitizer.new(prune: true)
+ #
+ # # the sanitizer can also sanitize CSS
+ # safe_list_sanitizer.sanitize_css('background-color: #000;')
+ #
+ class SafeListSanitizer < Rails::HTML::Sanitizer
+ include HTML::Concern::ComposedSanitize
+ include HTML::Concern::Parser::HTML5
+ include HTML::Concern::Scrubber::SafeList
+ include HTML::Concern::Serializer::UTF8Encode
end
+ end if Rails::HTML::Sanitizer.html5_support?
- WhiteListSanitizer = SafeListSanitizer
+ module HTML
+ Sanitizer.extend(HTML4::Sanitizer::VendorMethods) # :nodoc:
+ FullSanitizer = HTML4::FullSanitizer # :nodoc:
+ LinkSanitizer = HTML4::LinkSanitizer # :nodoc:
+ SafeListSanitizer = HTML4::SafeListSanitizer # :nodoc:
+ WhiteListSanitizer = SafeListSanitizer # :nodoc:
end
end
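The backwards-compatibility block at the end of `sanitizer.rb` relies on plain Ruby constant assignment. The sketch below shows that pattern with a hypothetical `Demo` namespace standing in for `Rails`; none of these names exist in the gem. A second constant that points at an existing module or class refers to the very same object, and `#name` keeps the path where the object was first defined. That is why `Rails::Html.name` reports `"Rails::HTML"` in `test_html_module_name_alias`.

```ruby
# Hypothetical stand-in for the gem's namespace layout; not the real classes.
module Demo
  module HTML4
    class FullSanitizer; end
  end

  module HTML
    # Alias: both constants now refer to the same class object.
    FullSanitizer = HTML4::FullSanitizer
  end

  # Legacy spelling kept as an alias of the new module.
  Html = HTML
end

puts Demo::Html.equal?(Demo::HTML)     # prints true
puts Demo::Html.name                   # prints Demo::HTML
puts Demo::HTML::FullSanitizer.name    # prints Demo::HTML4::FullSanitizer
```

Because the alias does not create a new class, code that checks `is_a?(Demo::HTML::FullSanitizer)` keeps working against instances created through the new `HTML4` name.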
diff --git a/lib/rails/html/sanitizer/version.rb b/lib/rails/html/sanitizer/version.rb
index 3ceb4c8..e478448 100644
--- a/lib/rails/html/sanitizer/version.rb
+++ b/lib/rails/html/sanitizer/version.rb
@@ -1,7 +1,9 @@
+# frozen_string_literal: true
+
module Rails
- module Html
+ module HTML
class Sanitizer
- VERSION = "1.4.4"
+ VERSION = "1.6.0"
end
end
end
diff --git a/lib/rails/html/scrubbers.rb b/lib/rails/html/scrubbers.rb
index 674d1c4..af53db4 100644
--- a/lib/rails/html/scrubbers.rb
+++ b/lib/rails/html/scrubbers.rb
@@ -1,10 +1,12 @@
+# frozen_string_literal: true
+
module Rails
- module Html
- # === Rails::Html::PermitScrubber
+ module HTML
+ # === Rails::HTML::PermitScrubber
#
- # +Rails::Html::PermitScrubber+ allows you to permit only your own tags and/or attributes.
+ # +Rails::HTML::PermitScrubber+ allows you to permit only your own tags and/or attributes.
#
- # +Rails::Html::PermitScrubber+ can be subclassed to determine:
+ # +Rails::HTML::PermitScrubber+ can be subclassed to determine:
# - When a node should be skipped via +skip_node?+.
# - When a node is allowed via +allowed_node?+.
# - When an attribute should be scrubbed via +scrub_attribute?+.
@@ -27,7 +29,7 @@ module Rails
# If set, attributes excluded will be removed.
# If not, attributes are removed based on Loofah's +HTML5::Scrub.scrub_attributes+.
#
- # class CommentScrubber < Html::PermitScrubber
+ # class CommentScrubber < Rails::HTML::PermitScrubber
# def initialize
# super
# self.tags = %w(form script comment blockquote)
@@ -45,10 +47,11 @@ module Rails
# See the documentation for +Nokogiri::XML::Node+ to understand what's possible
# with nodes: https://nokogiri.org/rdoc/Nokogiri/XML/Node.html
class PermitScrubber < Loofah::Scrubber
- attr_reader :tags, :attributes
+ attr_reader :tags, :attributes, :prune
- def initialize
- @direction = :bottom_up
+ def initialize(prune: false)
+ @prune = prune
+ @direction = @prune ? :top_down : :bottom_up
@tags, @attributes = nil, nil
end
@@ -76,90 +79,89 @@ module Rails
end
protected
+ def allowed_node?(node)
+ @tags.include?(node.name)
+ end
- def allowed_node?(node)
- @tags.include?(node.name)
- end
+ def skip_node?(node)
+ node.text?
+ end
- def skip_node?(node)
- node.text?
- end
+ def scrub_attribute?(name)
+ !@attributes.include?(name)
+ end
- def scrub_attribute?(name)
- !@attributes.include?(name)
- end
+ def keep_node?(node)
+ if @tags
+ allowed_node?(node)
+ else
+ Loofah::HTML5::Scrub.allowed_element?(node.name)
+ end
+ end
- def keep_node?(node)
- if @tags
- allowed_node?(node)
- else
- Loofah::HTML5::Scrub.allowed_element?(node.name)
+ def scrub_node(node)
+ node.before(node.children) unless prune # strip
+ node.remove
end
- end
- def scrub_node(node)
- node.before(node.children) # strip
- node.remove
- end
+ def scrub_attributes(node)
+ if @attributes
+ node.attribute_nodes.each do |attr|
+ attr.remove if scrub_attribute?(attr.name)
+ scrub_attribute(node, attr)
+ end
- def scrub_attributes(node)
- if @attributes
- node.attribute_nodes.each do |attr|
- attr.remove if scrub_attribute?(attr.name)
- scrub_attribute(node, attr)
+ scrub_css_attribute(node)
+ else
+ Loofah::HTML5::Scrub.scrub_attributes(node)
end
-
- scrub_css_attribute(node)
- else
- Loofah::HTML5::Scrub.scrub_attributes(node)
end
- end
- def scrub_css_attribute(node)
- if Loofah::HTML5::Scrub.respond_to?(:scrub_css_attribute)
- Loofah::HTML5::Scrub.scrub_css_attribute(node)
- else
- style = node.attributes['style']
- style.value = Loofah::HTML5::Scrub.scrub_css(style.value) if style
+ def scrub_css_attribute(node)
+ if Loofah::HTML5::Scrub.respond_to?(:scrub_css_attribute)
+ Loofah::HTML5::Scrub.scrub_css_attribute(node)
+ else
+ style = node.attributes["style"]
+ style.value = Loofah::HTML5::Scrub.scrub_css(style.value) if style
+ end
end
- end
- def validate!(var, name)
- if var && !var.is_a?(Enumerable)
- raise ArgumentError, "You should pass :#{name} as an Enumerable"
+ def validate!(var, name)
+ if var && !var.is_a?(Enumerable)
+ raise ArgumentError, "You should pass :#{name} as an Enumerable"
+ end
+ var
end
- var
- end
- def scrub_attribute(node, attr_node)
- attr_name = if attr_node.namespace
- "#{attr_node.namespace.prefix}:#{attr_node.node_name}"
- else
- attr_node.node_name
- end
+ def scrub_attribute(node, attr_node)
+ attr_name = if attr_node.namespace
+ "#{attr_node.namespace.prefix}:#{attr_node.node_name}"
+ else
+ attr_node.node_name
+ end
- if Loofah::HTML5::SafeList::ATTR_VAL_IS_URI.include?(attr_name)
- return if Loofah::HTML5::Scrub.scrub_uri_attribute(attr_node)
- end
+ if Loofah::HTML5::SafeList::ATTR_VAL_IS_URI.include?(attr_name)
+ return if Loofah::HTML5::Scrub.scrub_uri_attribute(attr_node)
+ end
- if Loofah::HTML5::SafeList::SVG_ATTR_VAL_ALLOWS_REF.include?(attr_name)
- Loofah::HTML5::Scrub.scrub_attribute_that_allows_local_ref(attr_node)
- end
+ if Loofah::HTML5::SafeList::SVG_ATTR_VAL_ALLOWS_REF.include?(attr_name)
+ Loofah::HTML5::Scrub.scrub_attribute_that_allows_local_ref(attr_node)
+ end
- if Loofah::HTML5::SafeList::SVG_ALLOW_LOCAL_HREF.include?(node.name) && attr_name == 'xlink:href' && attr_node.value =~ /^\s*[^#\s].*/m
- attr_node.remove
- end
+ if Loofah::HTML5::SafeList::SVG_ALLOW_LOCAL_HREF.include?(node.name) && attr_name == "xlink:href" && attr_node.value =~ /^\s*[^#\s].*/m
+ attr_node.remove
+ end
- node.remove_attribute(attr_node.name) if attr_name == 'src' && attr_node.value !~ /[^[:space:]]/
+ node.remove_attribute(attr_node.name) if attr_name == "src" && attr_node.value !~ /[^[:space:]]/
- Loofah::HTML5::Scrub.force_correct_attribute_escaping! node
- end
+ Loofah::HTML5::Scrub.force_correct_attribute_escaping! node
+ end
end
- # === Rails::Html::TargetScrubber
+ # === Rails::HTML::TargetScrubber
#
- # Where +Rails::Html::PermitScrubber+ picks out tags and attributes to permit in
- # sanitization, +Rails::Html::TargetScrubber+ targets them for removal.
+ # Where +Rails::HTML::PermitScrubber+ picks out tags and attributes to permit in
+ # sanitization, +Rails::HTML::TargetScrubber+ targets them for removal.
#
# +tags=+
# If set, elements included will be stripped.
@@ -176,9 +178,9 @@ module Rails
end
end
- # === Rails::Html::TextOnlyScrubber
+ # === Rails::HTML::TextOnlyScrubber
#
- # +Rails::Html::TextOnlyScrubber+ allows you to permit text nodes.
+ # +Rails::HTML::TextOnlyScrubber+ allows you to permit text nodes.
#
# Unallowed elements will be stripped, i.e. element is removed but its subtree kept.
class TextOnlyScrubber < Loofah::Scrubber
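The new `prune:` option on `PermitScrubber` changes what `scrub_node` does with a disallowed element. By default the element is stripped (`node.before(node.children)` keeps its content in place), while `prune: true` removes the element together with its whole subtree; the diff also switches the traversal direction to `:top_down` in that case. The toy model below sketches only the strip-versus-prune difference. It uses plain nested arrays instead of Nokogiri nodes, and `toy_scrub`/`toy_html` are hypothetical helpers, not the gem's implementation.

```ruby
# Toy model (not the gem's code): a node is either a String (text)
# or a [tag, children] pair. Text is always kept, like skip_node? does.
def toy_scrub(nodes, allowed, prune: false)
  nodes.flat_map do |node|
    next [node] if node.is_a?(String)

    tag, children = node
    if allowed.include?(tag)
      [[tag, toy_scrub(children, allowed, prune: prune)]]
    elsif prune
      [] # prune: drop the element and everything inside it
    else
      toy_scrub(children, allowed, prune: prune) # strip: keep the scrubbed children
    end
  end
end

# Serialize the toy tree back to an HTML-like string.
def toy_html(nodes)
  nodes.map { |n| n.is_a?(String) ? n : "<#{n[0]}>#{toy_html(n[1])}</#{n[0]}>" }.join
end

tree = [["p", ["keep ", ["script", ["drop"]]]]]
puts toy_html(toy_scrub(tree, %w(p)))              # prints <p>keep drop</p>
puts toy_html(toy_scrub(tree, %w(p), prune: true)) # prints <p>keep </p>
```

Pruning is the safer choice when the inner content of a disallowed element is itself unwanted, as with the text of a `<script>` element.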
diff --git a/rails-html-sanitizer.gemspec b/rails-html-sanitizer.gemspec
index b2a476e..2ff2cf2 100644
--- a/rails-html-sanitizer.gemspec
+++ b/rails-html-sanitizer.gemspec
@@ -2,41 +2,36 @@
# This file has been automatically generated by gem2tgz #
#########################################################
# -*- encoding: utf-8 -*-
-# stub: rails-html-sanitizer 1.4.4 ruby lib
+# stub: rails-html-sanitizer 1.6.0 ruby lib
Gem::Specification.new do |s|
s.name = "rails-html-sanitizer".freeze
- s.version = "1.4.4"
+ s.version = "1.6.0"
s.required_rubygems_version = Gem::Requirement.new(">= 0".freeze) if s.respond_to? :required_rubygems_version=
- s.metadata = { "bug_tracker_uri" => "https://github.com/rails/rails-html-sanitizer/issues", "changelog_uri" => "https://github.com/rails/rails-html-sanitizer/blob/v1.4.4/CHANGELOG.md", "documentation_uri" => "https://www.rubydoc.info/gems/rails-html-sanitizer/1.4.4", "source_code_uri" => "https://github.com/rails/rails-html-sanitizer/tree/v1.4.4" } if s.respond_to? :metadata=
+ s.metadata = { "bug_tracker_uri" => "https://github.com/rails/rails-html-sanitizer/issues", "changelog_uri" => "https://github.com/rails/rails-html-sanitizer/blob/v1.6.0/CHANGELOG.md", "documentation_uri" => "https://www.rubydoc.info/gems/rails-html-sanitizer/1.6.0", "source_code_uri" => "https://github.com/rails/rails-html-sanitizer/tree/v1.6.0" } if s.respond_to? :metadata=
s.require_paths = ["lib".freeze]
- s.authors = ["Rafael Mendon\u00E7a Fran\u00E7a".freeze, "Kasper Timm Hansen".freeze]
- s.date = "2022-12-13"
+ s.authors = ["Rafael Mendon\u00E7a Fran\u00E7a".freeze, "Kasper Timm Hansen".freeze, "Mike Dalessio".freeze]
+ s.date = "2023-05-26"
s.description = "HTML sanitization for Rails applications".freeze
- s.email = ["rafaelmfranca@gmail.com".freeze, "kaspth@gmail.com".freeze]
- s.files = ["CHANGELOG.md".freeze, "MIT-LICENSE".freeze, "README.md".freeze, "lib/rails-html-sanitizer.rb".freeze, "lib/rails/html/sanitizer.rb".freeze, "lib/rails/html/sanitizer/version.rb".freeze, "lib/rails/html/scrubbers.rb".freeze, "test/sanitizer_test.rb".freeze, "test/scrubbers_test.rb".freeze]
+ s.email = ["rafaelmfranca@gmail.com".freeze, "kaspth@gmail.com".freeze, "mike.dalessio@gmail.com".freeze]
+ s.files = ["CHANGELOG.md".freeze, "MIT-LICENSE".freeze, "README.md".freeze, "lib/rails-html-sanitizer.rb".freeze, "lib/rails/html/sanitizer.rb".freeze, "lib/rails/html/sanitizer/version.rb".freeze, "lib/rails/html/scrubbers.rb".freeze, "test/rails_api_test.rb".freeze, "test/sanitizer_test.rb".freeze, "test/scrubbers_test.rb".freeze]
s.homepage = "https://github.com/rails/rails-html-sanitizer".freeze
s.licenses = ["MIT".freeze]
- s.rubygems_version = "3.3.15".freeze
+ s.required_ruby_version = Gem::Requirement.new(">= 2.7.0".freeze)
+ s.rubygems_version = "3.2.5".freeze
s.summary = "This gem is responsible to sanitize HTML fragments in Rails applications.".freeze
- s.test_files = ["test/sanitizer_test.rb".freeze, "test/scrubbers_test.rb".freeze]
+ s.test_files = ["test/rails_api_test.rb".freeze, "test/sanitizer_test.rb".freeze, "test/scrubbers_test.rb".freeze]
if s.respond_to? :specification_version then
s.specification_version = 4
end
if s.respond_to? :add_runtime_dependency then
- s.add_development_dependency(%q<bundler>.freeze, [">= 1.3"])
- s.add_runtime_dependency(%q<loofah>.freeze, ["~> 2.19", ">= 2.19.1"])
- s.add_development_dependency(%q<minitest>.freeze, [">= 0"])
- s.add_development_dependency(%q<rails-dom-testing>.freeze, [">= 0"])
- s.add_development_dependency(%q<rake>.freeze, [">= 0"])
+ s.add_runtime_dependency(%q<loofah>.freeze, ["~> 2.21"])
+ s.add_runtime_dependency(%q<nokogiri>.freeze, ["~> 1.14"])
else
- s.add_dependency(%q<bundler>.freeze, [">= 1.3"])
- s.add_dependency(%q<loofah>.freeze, ["~> 2.19", ">= 2.19.1"])
- s.add_dependency(%q<minitest>.freeze, [">= 0"])
- s.add_dependency(%q<rails-dom-testing>.freeze, [">= 0"])
- s.add_dependency(%q<rake>.freeze, [">= 0"])
+ s.add_dependency(%q<loofah>.freeze, ["~> 2.21"])
+ s.add_dependency(%q<nokogiri>.freeze, ["~> 1.14"])
end
end
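The gemspec replaces the Loofah constraint `~> 2.19, >= 2.19.1` with `~> 2.21`, adds `nokogiri ~> 1.14` as a direct runtime dependency, and requires Ruby `>= 2.7.0`. RubyGems' own `Gem::Requirement` can be used to check what those constraints accept. The version numbers below are arbitrary examples, and `accepts?` is a hypothetical helper:

```ruby
require "rubygems"

loofah   = Gem::Requirement.new("~> 2.21")   # means >= 2.21 and < 3.0
nokogiri = Gem::Requirement.new("~> 1.14")   # means >= 1.14 and < 2.0
ruby     = Gem::Requirement.new(">= 2.7.0")

# Hypothetical helper: does the requirement accept this version string?
def accepts?(req, version)
  req.satisfied_by?(Gem::Version.new(version))
end

puts accepts?(loofah, "2.21.0")   # prints true
puts accepts?(loofah, "2.19.1")   # prints false: the old 1.4.4 floor is no longer enough
puts accepts?(loofah, "3.0.0")    # prints false: "~>" caps the major version
puts accepts?(nokogiri, "1.15.2") # prints true
puts accepts?(ruby, "2.6.10")     # prints false
```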
diff --git a/test/rails_api_test.rb b/test/rails_api_test.rb
new file mode 100644
index 0000000..9bc1107
--- /dev/null
+++ b/test/rails_api_test.rb
@@ -0,0 +1,88 @@
+# frozen_string_literal: true
+
+require "minitest/autorun"
+require "rails-html-sanitizer"
+
+class RailsApiTest < Minitest::Test
+ def test_html_module_name_alias
+ assert_equal(Rails::Html, Rails::HTML)
+ assert_equal("Rails::HTML", Rails::Html.name)
+ assert_equal("Rails::HTML", Rails::HTML.name)
+ end
+
+ def test_html_scrubber_class_names
+ assert(Rails::Html::PermitScrubber)
+ assert(Rails::Html::TargetScrubber)
+ assert(Rails::Html::TextOnlyScrubber)
+ assert(Rails::Html::Sanitizer)
+ end
+
+ def test_best_supported_vendor_when_html5_is_not_supported_returns_html4
+ Rails::HTML::Sanitizer.stub(:html5_support?, false) do
+ assert_equal(Rails::HTML4::Sanitizer, Rails::HTML::Sanitizer.best_supported_vendor)
+ end
+ end
+
+ def test_best_supported_vendor_when_html5_is_supported_returns_html5
+ skip("no HTML5 support on this platform") unless Rails::HTML::Sanitizer.html5_support?
+
+ Rails::HTML::Sanitizer.stub(:html5_support?, true) do
+ assert_equal(Rails::HTML5::Sanitizer, Rails::HTML::Sanitizer.best_supported_vendor)
+ end
+ end
+
+ def test_html4_sanitizer_alias_full
+ assert_equal(Rails::HTML4::FullSanitizer, Rails::HTML::FullSanitizer)
+ assert_equal("Rails::HTML4::FullSanitizer", Rails::HTML::FullSanitizer.name)
+ end
+
+ def test_html4_sanitizer_alias_link
+ assert_equal(Rails::HTML4::LinkSanitizer, Rails::HTML::LinkSanitizer)
+ assert_equal("Rails::HTML4::LinkSanitizer", Rails::HTML::LinkSanitizer.name)
+ end
+
+ def test_html4_sanitizer_alias_safe_list
+ assert_equal(Rails::HTML4::SafeListSanitizer, Rails::HTML::SafeListSanitizer)
+ assert_equal("Rails::HTML4::SafeListSanitizer", Rails::HTML::SafeListSanitizer.name)
+ end
+
+ def test_html4_full_sanitizer
+ assert_equal(Rails::HTML4::FullSanitizer, Rails::HTML::Sanitizer.full_sanitizer)
+ assert_equal(Rails::HTML4::FullSanitizer, Rails::HTML4::Sanitizer.full_sanitizer)
+ end
+
+ def test_html4_link_sanitizer
+ assert_equal(Rails::HTML4::LinkSanitizer, Rails::HTML::Sanitizer.link_sanitizer)
+ assert_equal(Rails::HTML4::LinkSanitizer, Rails::HTML4::Sanitizer.link_sanitizer)
+ end
+
+ def test_html4_safe_list_sanitizer
+ assert_equal(Rails::HTML4::SafeListSanitizer, Rails::HTML::Sanitizer.safe_list_sanitizer)
+ assert_equal(Rails::HTML4::SafeListSanitizer, Rails::HTML4::Sanitizer.safe_list_sanitizer)
+ end
+
+ def test_html4_white_list_sanitizer
+ assert_equal(Rails::HTML4::SafeListSanitizer, Rails::HTML::Sanitizer.white_list_sanitizer)
+ assert_equal(Rails::HTML4::SafeListSanitizer, Rails::HTML4::Sanitizer.white_list_sanitizer)
+ end
+
+ def test_html5_full_sanitizer
+ skip("no HTML5 support on this platform") unless Rails::HTML::Sanitizer.html5_support?
+ assert_equal(Rails::HTML5::FullSanitizer, Rails::HTML5::Sanitizer.full_sanitizer)
+ end
+
+ def test_html5_link_sanitizer
+ skip("no HTML5 support on this platform") unless Rails::HTML::Sanitizer.html5_support?
+ assert_equal(Rails::HTML5::LinkSanitizer, Rails::HTML5::Sanitizer.link_sanitizer)
+ end
+
+ def test_html5_safe_list_sanitizer
+ skip("no HTML5 support on this platform") unless Rails::HTML::Sanitizer.html5_support?
+ assert_equal(Rails::HTML5::SafeListSanitizer, Rails::HTML5::Sanitizer.safe_list_sanitizer)
+ end
+
+ def test_html5_white_list_sanitizer
+ skip("no HTML5 support on this platform") unless Rails::HTML::Sanitizer.html5_support?
+ assert_equal(Rails::HTML5::SafeListSanitizer, Rails::HTML5::Sanitizer.white_list_sanitizer)
+ end
+end
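The `best_supported_vendor` tests in `rails_api_test.rb` use Minitest's `stub` to replace the return value of `html5_support?` for the duration of a block. As a rough illustration of what that achieves, here is a hand-rolled equivalent. `with_singleton_stub` and `DemoSanitizer` are hypothetical names, and the selector returns symbols instead of the vendor classes (such as `Rails::HTML5::Sanitizer`) that the gem returns; it is a simplification of the behavior described in the changelog, not the gem's source.

```ruby
# Hypothetical helper: temporarily replace a singleton (class) method,
# restoring the original afterwards even if the block raises.
def with_singleton_stub(obj, name, value)
  original = obj.method(name)
  obj.define_singleton_method(name) { |*_args| value }
  begin
    yield
  ensure
    obj.define_singleton_method(name, original)
  end
end

# Hypothetical vendor selector: HTML5 when supported, otherwise legacy HTML4.
class DemoSanitizer
  def self.html5_support?
    false
  end

  def self.best_supported_vendor
    html5_support? ? :html5 : :html4
  end
end

with_singleton_stub(DemoSanitizer, :html5_support?, true) do
  puts DemoSanitizer.best_supported_vendor # prints html5
end
puts DemoSanitizer.best_supported_vendor   # prints html4
```

Stubbing the capability check, rather than the selector itself, lets one test exercise the HTML4 fallback on a platform that does support HTML5.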
diff --git a/test/sanitizer_test.rb b/test/sanitizer_test.rb
index cd0b046..6af882a 100644
--- a/test/sanitizer_test.rb
+++ b/test/sanitizer_test.rb
@@ -1,771 +1,1087 @@
+# frozen_string_literal: true
+
require "minitest/autorun"
require "rails-html-sanitizer"
-require "rails/dom/testing/assertions/dom_assertions"
-puts Nokogiri::VERSION_INFO
+puts "nokogiri version info: #{Nokogiri::VERSION_INFO}"
+puts "html5 support: #{Rails::HTML::Sanitizer.html5_support?}"
+
+#
+# NOTE that many of these tests contain multiple acceptable results.
+#
+# In some cases, this is because of how the HTML4 parser's recovery behavior changed in libxml2
+# 2.9.14 and 2.10.0. For more details, see:
+#
+# - https://github.com/sparklemotion/nokogiri/releases/tag/v1.13.5
+# - https://gitlab.gnome.org/GNOME/libxml2/-/issues/380
+#
+# In other cases, multiple acceptable results are provided because Nokogiri's vendored libxml2 is
+# patched to entity-escape server-side includes (aka "SSI", e.g. `<!-- #directive param=value -->`).
+#
+# In many other cases, it's because the parser used by Nokogiri on JRuby (xerces+nekohtml) parses
+# slightly differently than libxml2 in edge cases.
+#
+module SanitizerTests
+ def self.loofah_html5_support?
+ Loofah.respond_to?(:html5_support?) && Loofah.html5_support?
+ end
+
+ class BaseSanitizerTest < Minitest::Test
+ class XpathRemovalTestSanitizer < Rails::HTML::Sanitizer
+ def sanitize(html, options = {})
+ fragment = Loofah.fragment(html)
+ remove_xpaths(fragment, options[:xpaths]).to_s
+ end
+ end
-class SanitizersTest < Minitest::Test
- include Rails::Dom::Testing::Assertions::DomAssertions
+ def test_sanitizer_sanitize_raises_not_implemented_error
+ assert_raises NotImplementedError do
+ Rails::HTML::Sanitizer.new.sanitize("asdf")
+ end
+ end
- def test_sanitizer_sanitize_raises_not_implemented_error
- assert_raises NotImplementedError do
- Rails::Html::Sanitizer.new.sanitize('')
+ def test_remove_xpaths_removes_an_xpath
+ html = %(<h1>hello <script>code!</script></h1>)
+ assert_equal %(<h1>hello </h1>), xpath_sanitize(html, xpaths: %w(.//script))
end
- end
- def test_sanitize_nested_script
- assert_equal '<script>alert("XSS");</script>', safe_list_sanitize('<script><script></script>alert("XSS");<script><</script>/</script><script>script></script>', tags: %w(em))
- end
+ def test_remove_xpaths_removes_all_occurrences_of_xpath
+ html = %(<section><header><script>code!</script></header><p>hello <script>code!</script></p></section>)
+ assert_equal %(<section><header></header><p>hello </p></section>), xpath_sanitize(html, xpaths: %w(.//script))
+ end
- def test_sanitize_nested_script_in_style
- assert_equal '<script>alert("XSS");</script>', safe_list_sanitize('<style><script></style>alert("XSS");<style><</style>/</style><style>script></style>', tags: %w(em))
- end
+ def test_remove_xpaths_called_with_faulty_xpath
+ assert_raises Nokogiri::XML::XPath::SyntaxError do
+ xpath_sanitize("<h1>hello<h1>", xpaths: %w(..faulty_xpath))
+ end
+ end
- class XpathRemovalTestSanitizer < Rails::Html::Sanitizer
- def sanitize(html, options = {})
- fragment = Loofah.fragment(html)
- remove_xpaths(fragment, options[:xpaths]).to_s
+ def test_remove_xpaths_called_with_xpath_string
+ assert_equal "", xpath_sanitize("<a></a>", xpaths: ".//a")
end
- end
- def test_remove_xpaths_removes_an_xpath
- html = %(<h1>hello <script>code!</script></h1>)
- assert_equal %(<h1>hello </h1>), xpath_sanitize(html, xpaths: %w(.//script))
- end
+ def test_remove_xpaths_called_with_enumerable_xpaths
+ assert_equal "", xpath_sanitize("<a><span></span></a>", xpaths: %w(.//a .//span))
+ end
- def test_remove_xpaths_removes_all_occurrences_of_xpath
- html = %(<section><header><script>code!</script></header><p>hello <script>code!</script></p></section>)
- assert_equal %(<section><header></header><p>hello </p></section>), xpath_sanitize(html, xpaths: %w(.//script))
+ protected
+ def xpath_sanitize(input, options = {})
+ XpathRemovalTestSanitizer.new.sanitize(input, options)
+ end
end
- def test_remove_xpaths_called_with_faulty_xpath
- assert_raises Nokogiri::XML::XPath::SyntaxError do
- xpath_sanitize('<h1>hello<h1>', xpaths: %w(..faulty_xpath))
+ module ModuleUnderTest
+ def module_under_test
+ self.class.instance_variable_get(:@module_under_test)
end
end
- def test_remove_xpaths_called_with_xpath_string
- assert_equal '', xpath_sanitize('<a></a>', xpaths: './/a')
- end
+ module FullSanitizerTest
+ include ModuleUnderTest
- def test_remove_xpaths_called_with_enumerable_xpaths
- assert_equal '', xpath_sanitize('<a><span></span></a>', xpaths: %w(.//a .//span))
- end
+ def test_strip_tags_with_quote
+ input = '<" <img src="trollface.gif" onload="alert(1)"> hi'
+ result = full_sanitize(input)
+ acceptable_results = [
+ # libxml2 >= 2.9.14 and xerces+neko
+ %{<" hi},
+ # other libxml2
+ %{ hi},
+ ]
- def test_strip_tags_with_quote
- input = '<" <img src="trollface.gif" onload="alert(1)"> hi'
- expected = libxml_2_9_14_recovery_lt? ? %{<" hi} : %{ hi}
- assert_equal(expected, full_sanitize(input))
- end
+ assert_includes(acceptable_results, result)
+ end
- def test_strip_invalid_html
- assert_equal "<<", full_sanitize("<<<bad html")
- end
+ def test_strip_invalid_html
+ assert_equal "<<", full_sanitize("<<<bad html")
+ end
- def test_strip_nested_tags
- expected = "Wei<a onclick='alert(document.cookie);'/>rdos"
- input = "Wei<<a>a onclick='alert(document.cookie);'</a>/>rdos"
- assert_equal expected, full_sanitize(input)
- end
+ def test_strip_nested_tags
+ expected = "Wei<a onclick='alert(document.cookie);'/>rdos"
+ input = "Wei<<a>a onclick='alert(document.cookie);'</a>/>rdos"
+ assert_equal expected, full_sanitize(input)
+ end
- def test_strip_tags_multiline
- expected = %{This is a test.\n\n\n\nIt no longer contains any HTML.\n}
- input = %{<title>This is <b>a <a href="" target="_blank">test</a></b>.</title>\n\n<!-- it has a comment -->\n\n<p>It no <b>longer <strong>contains <em>any <strike>HTML</strike></em>.</strong></b></p>\n}
+ def test_strip_tags_multiline
+ expected = %{This is a test.\n\n\n\nIt no longer contains any HTML.\n}
+ input = %{<h1>This is <b>a <a href="" target="_blank">test</a></b>.</h1>\n\n<!-- it has a comment -->\n\n<p>It no <b>longer <strong>contains <em>any <strike>HTML</strike></em>.</strong></b></p>\n}
- assert_equal expected, full_sanitize(input)
- end
+ assert_equal expected, full_sanitize(input)
+ end
- def test_remove_unclosed_tags
- input = "This is <-- not\n a comment here."
- expected = libxml_2_9_14_recovery_lt? ? %{This is <-- not\n a comment here.} : %{This is }
- assert_equal(expected, full_sanitize(input))
- end
+ def test_remove_unclosed_tags
+ input = "This is <-- not\n a comment here."
+ result = full_sanitize(input)
+ acceptable_results = [
+ # libxml2 >= 2.9.14 and xerces+neko
+ %{This is <-- not\n a comment here.},
+ # other libxml2
+ %{This is },
+ ]
+
+ assert_includes(acceptable_results, result)
+ end
- def test_strip_cdata
- input = "This has a <![CDATA[<section>]]> here."
- expected = libxml_2_9_14_recovery_lt_bang? ? %{This has a <![CDATA[]]> here.} : %{This has a ]]> here.}
- assert_equal(expected, full_sanitize(input))
- end
+ def test_strip_cdata
+ input = "This has a <![CDATA[<section>]]> here."
+ result = full_sanitize(input)
+ acceptable_results = [
+ # libxml2 = 2.9.14
+ %{This has a <![CDATA[]]> here.},
+ # other libxml2
+ %{This has a ]]> here.},
+ # xerces+neko
+ %{This has a here.},
+ ]
+
+ assert_includes(acceptable_results, result)
+ end
- def test_strip_unclosed_cdata
- input = "This has an unclosed <![CDATA[<section>]] here..."
- expected = libxml_2_9_14_recovery_lt_bang? ? %{This has an unclosed <![CDATA[]] here...} : %{This has an unclosed ]] here...}
- assert_equal(expected, full_sanitize(input))
- end
+ def test_strip_blank_string
+ assert_nil full_sanitize(nil)
+ assert_equal "", full_sanitize("")
+ assert_equal " ", full_sanitize(" ")
+ end
- def test_strip_blank_string
- assert_nil full_sanitize(nil)
- assert_equal "", full_sanitize("")
- assert_equal " ", full_sanitize(" ")
- end
+ def test_strip_tags_with_plaintext
+ assert_equal "Don't touch me", full_sanitize("Don't touch me")
+ end
- def test_strip_tags_with_plaintext
- assert_equal "Don't touch me", full_sanitize("Don't touch me")
- end
+ def test_strip_tags_with_tags
+ assert_equal "This is a test.", full_sanitize("<p>This <u>is<u> a <a href='test.html'><strong>test</strong></a>.</p>")
+ end
- def test_strip_tags_with_tags
- assert_equal "This is a test.", full_sanitize("<p>This <u>is<u> a <a href='test.html'><strong>test</strong></a>.</p>")
- end
+ def test_escape_tags_with_many_open_quotes
+ assert_equal "<<", full_sanitize("<<<bad html>")
+ end
- def test_escape_tags_with_many_open_quotes
- assert_equal "<<", full_sanitize("<<<bad html>")
- end
+ def test_strip_tags_with_sentence
+ assert_equal "This is a test.", full_sanitize("This is a test.")
+ end
- def test_strip_tags_with_sentence
- assert_equal "This is a test.", full_sanitize("This is a test.")
- end
+ def test_strip_tags_with_comment
+ assert_equal "This has a here.", full_sanitize("This has a <!-- comment --> here.")
+ end
- def test_strip_tags_with_comment
- assert_equal "This has a here.", full_sanitize("This has a <!-- comment --> here.")
- end
+ def test_strip_tags_with_frozen_string
+ assert_equal "Frozen string with no tags", full_sanitize("Frozen string with no tags")
+ end
- def test_strip_tags_with_frozen_string
- assert_equal "Frozen string with no tags", full_sanitize("Frozen string with no tags".freeze)
- end
+ def test_full_sanitize_respect_html_escaping_of_the_given_string
+ assert_equal 'test\r\nstring', full_sanitize('test\r\nstring')
+ assert_equal "&", full_sanitize("&")
+ assert_equal "&", full_sanitize("&")
+ assert_equal "&amp;", full_sanitize("&amp;")
+ assert_equal "omg <script>BOM</script>", full_sanitize("omg <script>BOM</script>")
+ end
- def test_full_sanitize_respect_html_escaping_of_the_given_string
- assert_equal 'test\r\nstring', full_sanitize('test\r\nstring')
- assert_equal '&', full_sanitize('&')
- assert_equal '&', full_sanitize('&')
- assert_equal '&amp;', full_sanitize('&amp;')
- assert_equal 'omg <script>BOM</script>', full_sanitize('omg <script>BOM</script>')
- end
+ def test_sanitize_ascii_8bit_string
+ full_sanitize("<div><a>hello</a></div>".encode("ASCII-8BIT")).tap do |sanitized|
+ assert_equal "hello", sanitized
+ assert_equal Encoding::UTF_8, sanitized.encoding
+ end
+ end
- def test_strip_links_with_tags_in_tags
- expected = "<a href='hello'>all <b>day</b> long</a>"
- input = "<<a>a href='hello'>all <b>day</b> long<</A>/a>"
- assert_equal expected, link_sanitize(input)
+ protected
+ def full_sanitize(input, options = {})
+ module_under_test::FullSanitizer.new.sanitize(input, options)
+ end
end
- def test_strip_links_with_unclosed_tags
- assert_equal "", link_sanitize("<a<a")
+ class HTML4FullSanitizerTest < Minitest::Test
+ @module_under_test = Rails::HTML4
+ include FullSanitizerTest
end
- def test_strip_links_with_plaintext
- assert_equal "Don't touch me", link_sanitize("Don't touch me")
- end
+ class HTML5FullSanitizerTest < Minitest::Test
+ @module_under_test = Rails::HTML5
+ include FullSanitizerTest
+ end if loofah_html5_support?
- def test_strip_links_with_line_feed_and_uppercase_tag
- assert_equal "on my mind\nall day long", link_sanitize("<a href='almost'>on my mind</a>\n<A href='almost'>all day long</A>")
- end
+ module LinkSanitizerTest
+ include ModuleUnderTest
- def test_strip_links_leaves_nonlink_tags
- assert_equal "My mind\nall <b>day</b> long", link_sanitize("<a href='almost'>My mind</a>\n<A href='almost'>all <b>day</b> long</A>")
- end
+ def test_strip_links_with_tags_in_tags
+ expected = "<a href='hello'>all <b>day</b> long</a>"
+ input = "<<a>a href='hello'>all <b>day</b> long<</A>/a>"
+ assert_equal expected, link_sanitize(input)
+ end
- def test_strip_links_with_links
- assert_equal "0wn3d", link_sanitize("<a href='http://www.rubyonrails.com/'><a href='http://www.rubyonrails.com/' onlclick='steal()'>0wn3d</a></a>")
- end
+ def test_strip_links_with_unclosed_tags
+ assert_equal "", link_sanitize("<a<a")
+ end
- def test_strip_links_with_linkception
- assert_equal "Magic", link_sanitize("<a href='http://www.rubyonrails.com/'>Mag<a href='http://www.ruby-lang.org/'>ic")
- end
+ def test_strip_links_with_plaintext
+ assert_equal "Don't touch me", link_sanitize("Don't touch me")
+ end
- def test_sanitize_form
- assert_sanitized "<form action=\"/foo/bar\" method=\"post\"><input></form>", ''
- end
+ def test_strip_links_with_line_feed_and_uppercase_tag
+ assert_equal "on my mind\nall day long", link_sanitize("<a href='almost'>on my mind</a>\n<A href='almost'>all day long</A>")
+ end
- def test_sanitize_plaintext
- assert_sanitized "<plaintext><span>foo</span></plaintext>", "<span>foo</span>"
- end
+ def test_strip_links_leaves_nonlink_tags
+ assert_equal "My mind\nall <b>day</b> long", link_sanitize("<a href='almost'>My mind</a>\n<A href='almost'>all <b>day</b> long</A>")
+ end
- def test_sanitize_script
- assert_sanitized "a b c<script language=\"Javascript\">blah blah blah</script>d e f", "a b cblah blah blahd e f"
- end
+ def test_strip_links_with_links
+ assert_equal "0wn3d", link_sanitize("<a href='http://www.rubyonrails.com/'><a href='http://www.rubyonrails.com/' onlclick='steal()'>0wn3d</a></a>")
+ end
- def test_sanitize_js_handlers
- raw = %{onthis="do that" <a href="#" onclick="hello" name="foo" onbogus="remove me">hello</a>}
- assert_sanitized raw, %{onthis="do that" <a href="#" name="foo">hello</a>}
- end
+ def test_strip_links_with_linkception
+ assert_equal "Magic", link_sanitize("<a href='http://www.rubyonrails.com/'>Mag<a href='http://www.ruby-lang.org/'>ic")
+ end
+
+ def test_sanitize_ascii_8bit_string
+ link_sanitize("<div><a>hello</a></div>".encode("ASCII-8BIT")).tap do |sanitized|
+ assert_equal "<div>hello</div>", sanitized
+ assert_equal Encoding::UTF_8, sanitized.encoding
+ end
+ end
- def test_sanitize_javascript_href
- raw = %{href="javascript:bang" <a href="javascript:bang" name="hello">foo</a>, <span href="javascript:bang">bar</span>}
- assert_sanitized raw, %{href="javascript:bang" <a name="hello">foo</a>, <span>bar</span>}
+ protected
+ def link_sanitize(input, options = {})
+ module_under_test::LinkSanitizer.new.sanitize(input, options)
+ end
end
- def test_sanitize_image_src
- raw = %{src="javascript:bang" <img src="javascript:bang" width="5">foo</img>, <span src="javascript:bang">bar</span>}
- assert_sanitized raw, %{src="javascript:bang" <img width="5">foo</img>, <span>bar</span>}
+ class HTML4LinkSanitizerTest < Minitest::Test
+ @module_under_test = Rails::HTML4
+ include LinkSanitizerTest
end
- tags = Loofah::HTML5::SafeList::ALLOWED_ELEMENTS - %w(script form)
- tags.each do |tag_name|
- define_method "test_should_allow_#{tag_name}_tag" do
- scope_allowed_tags(tags) do
- assert_sanitized "start <#{tag_name} title=\"1\" onclick=\"foo\">foo <bad>bar</bad> baz</#{tag_name}> end", %(start <#{tag_name} title="1">foo bar baz</#{tag_name}> end)
- end
+ class HTML5LinkSanitizerTest < Minitest::Test
+ @module_under_test = Rails::HTML5
+ include LinkSanitizerTest
+ end if loofah_html5_support?
+
+ module SafeListSanitizerTest
+ include ModuleUnderTest
+
+ def test_sanitize_nested_script
+ assert_equal '<script>alert("XSS");</script>', safe_list_sanitize('<script><script></script>alert("XSS");<script><</script>/</script><script>script></script>', tags: %w(em))
end
- end
- def test_should_allow_anchors
- assert_sanitized %(<a href="foo" onclick="bar"><script>baz</script></a>), %(<a href=\"foo\">baz</a>)
- end
+ def test_sanitize_nested_script_in_style
+ input = '<style><script></style>alert("XSS");<style><</style>/</style><style>script></style>'
+ result = safe_list_sanitize(input, tags: %w(em))
+ acceptable_results = [
+ # libxml2
+ %{<script>alert("XSS");</script>},
+ # xerces+neko. unavoidable double-escaping, see loofah/docs/2022-10-decision-on-cdata-nodes.md
+ %{&lt;script&gt;alert(\"XSS\");&lt;&lt;/style&gt;/script&gt;},
+ ]
+
+ assert_includes(acceptable_results, result)
+ end
- def test_video_poster_sanitization
- scope_allowed_tags(%w(video)) do
- scope_allowed_attributes %w(src poster) do
- assert_sanitized %(<video src="videofile.ogg" autoplay poster="posterimage.jpg"></video>), %(<video src="videofile.ogg" poster="posterimage.jpg"></video>)
- assert_sanitized %(<video src="videofile.ogg" poster=javascript:alert(1)></video>), %(<video src="videofile.ogg"></video>)
- end
+ def test_strip_unclosed_cdata
+ input = "This has an unclosed <![CDATA[<section>]] here..."
+
+ result = safe_list_sanitize(input)
+
+ acceptable_results = [
+ # libxml2 = 2.9.14
+ %{This has an unclosed <![CDATA[]] here...},
+ # other libxml2
+ %{This has an unclosed ]] here...},
+ # xerces+neko
+ %{This has an unclosed }
+ ]
+
+ assert_includes(acceptable_results, result)
end
- end
- # RFC 3986, sec 4.2
- def test_allow_colons_in_path_component
- assert_sanitized "<a href=\"./this:that\">foo</a>"
- end
+ def test_sanitize_form
+ assert_sanitized "<form action=\"/foo/bar\" method=\"post\"><input></form>", ""
+ end
- %w(src width height alt).each do |img_attr|
- define_method "test_should_allow_image_#{img_attr}_attribute" do
- assert_sanitized %(<img #{img_attr}="foo" onclick="bar" />), %(<img #{img_attr}="foo" />)
+ def test_sanitize_plaintext
+ # note that the `plaintext` tag has been deprecated since HTML 2
+ # https://developer.mozilla.org/en-US/docs/Web/HTML/Element/plaintext
+ input = "<plaintext><span>foo</span></plaintext>"
+ result = safe_list_sanitize(input)
+ acceptable_results = [
+ # libxml2
+ "<span>foo</span>",
+ # xerces+nekohtml-unit
+ "&lt;span&gt;foo&lt;/span&gt;&lt;/plaintext&gt;",
+ # xerces+cyberneko
+ "&lt;span&gt;foo&lt;/span&gt;"
+ ]
+
+ assert_includes(acceptable_results, result)
end
- end
- def test_should_handle_non_html
- assert_sanitized 'abc'
- end
+ def test_sanitize_script
+ assert_sanitized "a b c<script language=\"Javascript\">blah blah blah</script>d e f", "a b cblah blah blahd e f"
+ end
- def test_should_handle_blank_text
- [nil, '', ' '].each { |blank| assert_sanitized blank }
- end
+ def test_sanitize_js_handlers
+ raw = %{onthis="do that" <a href="#" onclick="hello" name="foo" onbogus="remove me">hello</a>}
+ assert_sanitized raw, %{onthis="do that" <a href="#" name="foo">hello</a>}
+ end
- def test_setting_allowed_tags_affects_sanitization
- scope_allowed_tags %w(u) do |sanitizer|
- assert_equal '<u></u>', sanitizer.sanitize('<a><u></u></a>')
+ def test_sanitize_javascript_href
+ raw = %{href="javascript:bang" <a href="javascript:bang" name="hello">foo</a>, <span href="javascript:bang">bar</span>}
+ assert_sanitized raw, %{href="javascript:bang" <a name="hello">foo</a>, <span>bar</span>}
end
- end
- def test_setting_allowed_attributes_affects_sanitization
- scope_allowed_attributes %w(foo) do |sanitizer|
- input = '<a foo="hello" bar="world"></a>'
- assert_equal '<a foo="hello"></a>', sanitizer.sanitize(input)
+ def test_sanitize_image_src
+ raw = %{src="javascript:bang" <img src="javascript:bang" width="5">foo</img>, <span src="javascript:bang">bar</span>}
+ assert_sanitized raw, %{src="javascript:bang" <img width="5">foo, <span>bar</span>}
end
- end
- def test_custom_tags_overrides_allowed_tags
- scope_allowed_tags %(u) do |sanitizer|
- input = '<a><u></u></a>'
- assert_equal '<a></a>', sanitizer.sanitize(input, tags: %w(a))
+ def test_should_allow_anchors
+ assert_sanitized %(<a href="foo" onclick="bar"><script>baz</script></a>), %(<a href=\"foo\">baz</a>)
end
- end
- def test_custom_attributes_overrides_allowed_attributes
- scope_allowed_attributes %(foo) do |sanitizer|
- input = '<a foo="hello" bar="world"></a>'
- assert_equal '<a bar="world"></a>', sanitizer.sanitize(input, attributes: %w(bar))
+ def test_video_poster_sanitization
+ scope_allowed_tags(%w(video)) do
+ scope_allowed_attributes %w(src poster) do
+ expected = if RUBY_PLATFORM == "java"
+ # xerces+nekohtml alphabetizes the attributes! FML.
+ %(<video poster="posterimage.jpg" src="videofile.ogg"></video>)
+ else
+ %(<video src="videofile.ogg" poster="posterimage.jpg"></video>)
+ end
+ assert_sanitized(
+ %(<video src="videofile.ogg" autoplay poster="posterimage.jpg"></video>),
+ expected,
+ )
+ assert_sanitized(
+ %(<video src="videofile.ogg" poster=javascript:alert(1)></video>),
+ %(<video src="videofile.ogg"></video>),
+ )
+ end
+ end
end
- end
- def test_should_allow_custom_tags
- text = "<u>foo</u>"
- assert_equal text, safe_list_sanitize(text, tags: %w(u))
- end
+ # RFC 3986, sec 4.2
+ def test_allow_colons_in_path_component
+ assert_sanitized "<a href=\"./this:that\">foo</a>"
+ end
- def test_should_allow_only_custom_tags
- text = "<u>foo</u> with <i>bar</i>"
- assert_equal "<u>foo</u> with bar", safe_list_sanitize(text, tags: %w(u))
- end
+ %w(src width height alt).each do |img_attr|
+ define_method "test_should_allow_image_#{img_attr}_attribute" do
+ assert_sanitized %(<img #{img_attr}="foo" onclick="bar" />), %(<img #{img_attr}="foo">)
+ end
+ end
- def test_should_allow_custom_tags_with_attributes
- text = %(<blockquote cite="http://example.com/">foo</blockquote>)
- assert_equal text, safe_list_sanitize(text)
- end
+ def test_lang_and_xml_lang
+ # https://html.spec.whatwg.org/multipage/dom.html#the-lang-and-xml:lang-attributes
+ #
+ # 3.2.6.2 The lang and xml:lang attributes
+ #
+ # ... Authors must not use the lang attribute in the XML namespace on HTML elements in HTML
+ # documents. To ease migration to and from XML, authors may specify an attribute in no namespace
+ # with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents,
+ # but such attributes must only be specified if a lang attribute in no namespace is also
+ # specified, and both attributes must have the same value when compared in an ASCII
+ # case-insensitive manner.
+ input = expected = "<div lang=\"en\" xml:lang=\"en\">foo</div>"
+ assert_sanitized(input, expected)
+ end
- def test_should_allow_custom_tags_with_custom_attributes
- text = %(<blockquote foo="bar">Lorem ipsum</blockquote>)
- assert_equal text, safe_list_sanitize(text, attributes: ['foo'])
- end
+ def test_should_handle_non_html
+ assert_sanitized "abc"
+ end
- def test_scrub_style_if_style_attribute_option_is_passed
- input = '<p style="color: #000; background-image: url(http://www.ragingplatypus.com/i/cam-full.jpg);"></p>'
- actual = safe_list_sanitize(input, attributes: %w(style))
- assert_includes(['<p style="color: #000;"></p>', '<p style="color:#000;"></p>'], actual)
- end
+ def test_should_handle_blank_text
+ assert_nil(safe_list_sanitize(nil))
+ assert_equal("", safe_list_sanitize(""))
+ assert_equal(" ", safe_list_sanitize(" "))
+ end
- def test_should_raise_argument_error_if_tags_is_not_enumerable
- assert_raises ArgumentError do
- safe_list_sanitize('<a>some html</a>', tags: 'foo')
+ def test_setting_allowed_tags_affects_sanitization
+ scope_allowed_tags %w(u) do |sanitizer|
+ assert_equal "<u></u>", sanitizer.sanitize("<a><u></u></a>")
+ end
end
- end
- def test_should_raise_argument_error_if_attributes_is_not_enumerable
- assert_raises ArgumentError do
- safe_list_sanitize('<a>some html</a>', attributes: 'foo')
+ def test_setting_allowed_attributes_affects_sanitization
+ scope_allowed_attributes %w(foo) do |sanitizer|
+ input = '<a foo="hello" bar="world"></a>'
+ assert_equal '<a foo="hello"></a>', sanitizer.sanitize(input)
+ end
end
- end
- def test_should_not_accept_non_loofah_inheriting_scrubber
- scrubber = Object.new
- def scrubber.scrub(node); node.name = 'h1'; end
+ def test_custom_tags_overrides_allowed_tags
+ scope_allowed_tags %(u) do |sanitizer|
+ input = "<a><u></u></a>"
+ assert_equal "<a></a>", sanitizer.sanitize(input, tags: %w(a))
+ end
+ end
- assert_raises Loofah::ScrubberNotFound do
- safe_list_sanitize('<a>some html</a>', scrubber: scrubber)
+ def test_custom_attributes_overrides_allowed_attributes
+ scope_allowed_attributes %(foo) do |sanitizer|
+ input = '<a foo="hello" bar="world"></a>'
+ assert_equal '<a bar="world"></a>', sanitizer.sanitize(input, attributes: %w(bar))
+ end
end
- end
- def test_should_accept_loofah_inheriting_scrubber
- scrubber = Loofah::Scrubber.new
- def scrubber.scrub(node); node.name = 'h1'; end
+ def test_should_allow_prune
+ sanitizer = module_under_test::SafeListSanitizer.new(prune: true)
+ text = "<u>leave me <b>now</b></u>"
+ assert_equal "<u>leave me </u>", sanitizer.sanitize(text, tags: %w(u))
+ end
- html = "<script>hello!</script>"
- assert_equal "<h1>hello!</h1>", safe_list_sanitize(html, scrubber: scrubber)
- end
+ def test_should_allow_custom_tags
+ text = "<u>foo</u>"
+ assert_equal text, safe_list_sanitize(text, tags: %w(u))
+ end
- def test_should_accept_loofah_scrubber_that_wraps_a_block
- scrubber = Loofah::Scrubber.new { |node| node.name = 'h1' }
- html = "<script>hello!</script>"
- assert_equal "<h1>hello!</h1>", safe_list_sanitize(html, scrubber: scrubber)
- end
+ def test_should_allow_only_custom_tags
+ text = "<u>foo</u> with <i>bar</i>"
+ assert_equal "<u>foo</u> with bar", safe_list_sanitize(text, tags: %w(u))
+ end
- def test_custom_scrubber_takes_precedence_over_other_options
- scrubber = Loofah::Scrubber.new { |node| node.name = 'h1' }
- html = "<script>hello!</script>"
- assert_equal "<h1>hello!</h1>", safe_list_sanitize(html, scrubber: scrubber, tags: ['foo'])
- end
+ def test_should_allow_custom_tags_with_attributes
+ text = %(<blockquote cite="http://example.com/">foo</blockquote>)
+ assert_equal text, safe_list_sanitize(text)
+ end
- [%w(img src), %w(a href)].each do |(tag, attr)|
- define_method "test_should_strip_#{attr}_attribute_in_#{tag}_with_bad_protocols" do
- assert_sanitized %(<#{tag} #{attr}="javascript:bang" title="1">boo</#{tag}>), %(<#{tag} title="1">boo</#{tag}>)
+ def test_should_allow_custom_tags_with_custom_attributes
+ text = %(<blockquote foo="bar">Lorem ipsum</blockquote>)
+ assert_equal text, safe_list_sanitize(text, attributes: ["foo"])
end
- end
- def test_should_block_script_tag
- assert_sanitized %(<SCRIPT\nSRC=http://ha.ckers.org/xss.js></SCRIPT>), ""
- end
+ def test_scrub_style_if_style_attribute_option_is_passed
+ input = '<p style="color: #000; background-image: url(http://www.ragingplatypus.com/i/cam-full.jpg);"></p>'
+ actual = safe_list_sanitize(input, attributes: %w(style))
- def test_should_not_fall_for_xss_image_hack_with_uppercase_tags
- assert_sanitized %(<IMG """><SCRIPT>alert("XSS")</SCRIPT>">), %(<img>alert("XSS")"&gt;)
- end
+ assert_includes(['<p style="color: #000;"></p>', '<p style="color:#000;"></p>'], actual)
+ end
- [%(<IMG SRC="javascript:alert('XSS');">),
- %(<IMG SRC=javascript:alert('XSS')>),
- %(<IMG SRC=JaVaScRiPt:alert('XSS')>),
- %(<IMG SRC=javascript:alert("XSS")>),
- %(<IMG SRC=javascript:alert(String.fromCharCode(88,83,83))>),
- %(<IMG SRC=javascript:alert('XSS')>),
- %(<IMG SRC=javascript:alert('XSS')>),
- %(<IMG SRC=javascript:alert('XSS')>),
- %(<IMG SRC="jav\tascript:alert('XSS');">),
- %(<IMG SRC="jav&#x09;ascript:alert('XSS');">),
- %(<IMG SRC="jav&#x0A;ascript:alert('XSS');">),
- %(<IMG SRC="jav&#x0D;ascript:alert('XSS');">),
- %(<IMG SRC="  javascript:alert('XSS');">),
- %(<IMG SRC="javascript:alert('XSS');">),
- %(<IMG SRC=`javascript:alert("RSnake says, 'XSS'")`>)].each do |img_hack|
- define_method "test_should_not_fall_for_xss_image_hack_#{img_hack}" do
- assert_sanitized img_hack, "<img>"
+ def test_should_raise_argument_error_if_tags_is_not_enumerable
+ assert_raises ArgumentError do
+ safe_list_sanitize("<a>some html</a>", tags: "foo")
+ end
end
- end
- def test_should_sanitize_tag_broken_up_by_null
- assert_sanitized %(<SCR\0IPT>alert(\"XSS\")</SCR\0IPT>), ""
- end
+ def test_should_raise_argument_error_if_attributes_is_not_enumerable
+ assert_raises ArgumentError do
+ safe_list_sanitize("<a>some html</a>", attributes: "foo")
+ end
+ end
- def test_should_sanitize_invalid_script_tag
- assert_sanitized %(<SCRIPT/XSS SRC="http://ha.ckers.org/xss.js"></SCRIPT>), ""
- end
+ def test_should_not_accept_non_loofah_inheriting_scrubber
+ scrubber = Object.new
+ def scrubber.scrub(node); node.name = "h1"; end
- def test_should_sanitize_script_tag_with_multiple_open_brackets
- assert_sanitized %(<<SCRIPT>alert("XSS");//<</SCRIPT>), "&lt;alert(\"XSS\");//&lt;"
- assert_sanitized %(<iframe src=http://ha.ckers.org/scriptlet.html\n<a), ""
- end
+ assert_raises Loofah::ScrubberNotFound do
+ safe_list_sanitize("<a>some html</a>", scrubber: scrubber)
+ end
+ end
- def test_should_sanitize_unclosed_script
- assert_sanitized %(<SCRIPT SRC=http://ha.ckers.org/xss.js?<B>), ""
- end
+ def test_should_accept_loofah_inheriting_scrubber
+ scrubber = Loofah::Scrubber.new
+ def scrubber.scrub(node); node.replace("<h1>#{node.inner_html}</h1>"); end
- def test_should_sanitize_half_open_scripts
- assert_sanitized %(<IMG SRC="javascript:alert('XSS')"), "<img>"
- end
+ html = "<script>hello!</script>"
+ assert_equal "<h1>hello!</h1>", safe_list_sanitize(html, scrubber: scrubber)
+ end
- def test_should_not_fall_for_ridiculous_hack
- img_hack = %(<IMG\nSRC\n=\n"\nj\na\nv\na\ns\nc\nr\ni\np\nt\n:\na\nl\ne\nr\nt\n(\n'\nX\nS\nS\n'\n)\n"\n>)
- assert_sanitized img_hack, "<img>"
- end
+ def test_should_accept_loofah_scrubber_that_wraps_a_block
+ scrubber = Loofah::Scrubber.new { |node| node.replace("<h1>#{node.inner_html}</h1>") }
+ html = "<script>hello!</script>"
+ assert_equal "<h1>hello!</h1>", safe_list_sanitize(html, scrubber: scrubber)
+ end
- def test_should_sanitize_attributes
- assert_sanitized %(<SPAN title="'><script>alert()</script>">blah</SPAN>), %(<span title="#{CGI.escapeHTML "'><script>alert()</script>"}">blah</span>)
- end
+ def test_custom_scrubber_takes_precedence_over_other_options
+ scrubber = Loofah::Scrubber.new { |node| node.replace("<h1>#{node.inner_html}</h1>") }
+ html = "<script>hello!</script>"
+ assert_equal "<h1>hello!</h1>", safe_list_sanitize(html, scrubber: scrubber, tags: ["foo"])
+ end
- def test_should_sanitize_illegal_style_properties
- raw = %(display:block; position:absolute; left:0; top:0; width:100%; height:100%; z-index:1; background-color:black; background-image:url(http://www.ragingplatypus.com/i/cam-full.jpg); background-x:center; background-y:center; background-repeat:repeat;)
- expected = %(display:block;width:100%;height:100%;background-color:black;background-x:center;background-y:center;)
- assert_equal expected, sanitize_css(raw)
- end
+ def test_should_strip_src_attribute_in_img_with_bad_protocols
+ assert_sanitized %(<img src="javascript:bang" title="1">), %(<img title="1">)
+ end
- def test_should_sanitize_with_trailing_space
- raw = "display:block; "
- expected = "display:block;"
- assert_equal expected, sanitize_css(raw)
- end
+ def test_should_strip_href_attribute_in_a_with_bad_protocols
+ assert_sanitized %(<a href="javascript:bang" title="1">boo</a>), %(<a title="1">boo</a>)
+ end
- def test_should_sanitize_xul_style_attributes
- raw = %(-moz-binding:url('http://ha.ckers.org/xssmoz.xml#xss'))
- assert_equal '', sanitize_css(raw)
- end
+ def test_should_block_script_tag
+ assert_sanitized %(<SCRIPT\nSRC=http://ha.ckers.org/xss.js></SCRIPT>), ""
+ end
- def test_should_sanitize_invalid_tag_names
- assert_sanitized(%(a b c<script/XSS src="http://ha.ckers.org/xss.js"></script>d e f), "a b cd e f")
- end
+ def test_should_not_fall_for_xss_image_hack_with_uppercase_tags
+ assert_sanitized %(<IMG """><SCRIPT>alert("XSS")</SCRIPT>">), %(<img>alert("XSS")"&gt;)
+ end
- def test_should_sanitize_non_alpha_and_non_digit_characters_in_tags
- assert_sanitized('<a onclick!#$%&()*~+-_.,:;?@[/|\]^`=alert("XSS")>foo</a>', "<a>foo</a>")
- end
+ [%(<IMG SRC="javascript:alert('XSS');">),
+ %(<IMG SRC=javascript:alert('XSS')>),
+ %(<IMG SRC=JaVaScRiPt:alert('XSS')>),
+ %(<IMG SRC=javascript:alert("XSS")>),
+ %(<IMG SRC=javascript:alert(String.fromCharCode(88,83,83))>),
+ %(<IMG SRC=javascript:alert('XSS')>),
+ %(<IMG SRC=javascript:alert('XSS')>),
+ %(<IMG SRC=javascript:alert('XSS')>),
+ %(<IMG SRC="jav\tascript:alert('XSS');">),
+ %(<IMG SRC="jav&#x09;ascript:alert('XSS');">),
+ %(<IMG SRC="jav&#x0A;ascript:alert('XSS');">),
+ %(<IMG SRC="jav&#x0D;ascript:alert('XSS');">),
+ %(<IMG SRC="  javascript:alert('XSS');">),
+ %(<IMG SRC="javascript:alert('XSS');">),
+ %(<IMG SRC=`javascript:alert("RSnake says, 'XSS'")`>)].each do |img_hack|
+ define_method "test_should_not_fall_for_xss_image_hack_#{img_hack}" do
+ assert_sanitized img_hack, "<img>"
+ end
+ end
- def test_should_sanitize_invalid_tag_names_in_single_tags
- assert_sanitized('<img/src="http://ha.ckers.org/xss.js"/>', "<img />")
- end
+ def test_should_sanitize_tag_broken_up_by_null
+ input = %(<SCR\0IPT>alert(\"XSS\")</SCR\0IPT>)
+ result = safe_list_sanitize(input)
+ acceptable_results = [
+ # libxml2
+ "",
+ # xerces+neko
+ 'alert("XSS")',
+ ]
+
+ assert_includes(acceptable_results, result)
+ end
- def test_should_sanitize_img_dynsrc_lowsrc
- assert_sanitized(%(<img lowsrc="javascript:alert('XSS')" />), "<img />")
- end
+ def test_should_sanitize_invalid_script_tag
+ assert_sanitized %(<SCRIPT/XSS SRC="http://ha.ckers.org/xss.js"></SCRIPT>), ""
+ end
- def test_should_sanitize_div_background_image_unicode_encoded
- [
- convert_to_css_hex("url(javascript:alert(1))", false),
- convert_to_css_hex("url(javascript:alert(1))", true),
- convert_to_css_hex("url(https://example.com)", false),
- convert_to_css_hex("url(https://example.com)", true),
- ].each do |propval|
- raw = "background-image:" + propval
- assert_empty(sanitize_css(raw))
+ def test_should_sanitize_script_tag_with_multiple_open_brackets
+ assert_sanitized %(<<SCRIPT>alert("XSS");//<</SCRIPT>), "&lt;alert(\"XSS\");//&lt;"
end
- end
- def test_should_allow_div_background_image_unicode_encoded_safe_functions
- [
- convert_to_css_hex("rgb(255,0,0)", false),
- convert_to_css_hex("rgb(255,0,0)", true),
- ].each do |propval|
- raw = "background-image:" + propval
- assert_includes(sanitize_css(raw), "background-image")
+ def test_should_sanitize_script_tag_with_multiple_open_brackets_2
+ input = %(<iframe src=http://ha.ckers.org/scriptlet.html\n<a)
+ result = safe_list_sanitize(input)
+ acceptable_results = [
+ # libxml2
+ "",
+ # xerces+neko
+ "&lt;a",
+ ]
+
+ assert_includes(acceptable_results, result)
end
- end
- def test_should_sanitize_div_style_expression
- raw = %(width: expression(alert('XSS'));)
- assert_equal '', sanitize_css(raw)
- end
+ def test_should_sanitize_unclosed_script
+ assert_sanitized %(<SCRIPT SRC=http://ha.ckers.org/xss.js?<B>), ""
+ end
- def test_should_sanitize_across_newlines
- raw = %(\nwidth:\nexpression(alert('XSS'));\n)
- assert_equal '', sanitize_css(raw)
- end
+ def test_should_sanitize_half_open_scripts
+ input = %(<IMG SRC="javascript:alert('XSS')")
+ result = safe_list_sanitize(input)
+ acceptable_results = [
+ # libxml2
+ "<img>",
+ # libgumbo
+ "",
+ ]
+
+ assert_includes(acceptable_results, result)
+ end
- def test_should_sanitize_img_vbscript
- assert_sanitized %(<img src='vbscript:msgbox("XSS")' />), '<img />'
- end
+ def test_should_not_fall_for_ridiculous_hack
+ img_hack = %(<IMG\nSRC\n=\n"\nj\na\nv\na\ns\nc\nr\ni\np\nt\n:\na\nl\ne\nr\nt\n(\n'\nX\nS\nS\n'\n)\n"\n>)
+ assert_sanitized img_hack, "<img>"
+ end
- def test_should_sanitize_cdata_section
- input = "<![CDATA[<span>section</span>]]>"
- expected = libxml_2_9_14_recovery_lt_bang? ? %{&lt;![CDATA[<span>section</span>]]&gt;} : %{section]]&gt;}
- assert_sanitized(input, expected)
- end
+ def test_should_sanitize_attributes
+ input = %(<SPAN title="'><script>alert()</script>">blah</SPAN>)
+ result = safe_list_sanitize(input)
+ acceptable_results = [
+ # libxml2
+ %(<span title="'&gt;&lt;script&gt;alert()&lt;/script&gt;">blah</span>),
+ # libgumbo
+ # this looks scary, but it's fine. for a more detailed analysis check out:
+ # https://github.com/discourse/discourse/pull/21522#issuecomment-1545697968
+ %(<span title="'><script>alert()</script>">blah</span>)
+ ]
+
+ assert_includes(acceptable_results, result)
+ end
- def test_should_sanitize_unterminated_cdata_section
- input = "<![CDATA[<span>neverending..."
- expected = libxml_2_9_14_recovery_lt_bang? ? %{&lt;![CDATA[<span>neverending...</span>} : %{neverending...}
- assert_sanitized(input, expected)
- end
+ def test_should_sanitize_invalid_tag_names
+ assert_sanitized(%(a b c<script/XSS src="http://ha.ckers.org/xss.js"></script>d e f), "a b cd e f")
+ end
- def test_should_not_mangle_urls_with_ampersand
- assert_sanitized %{<a href=\"http://www.domain.com?var1=1&var2=2\">my link</a>}
- end
+ def test_should_sanitize_non_alpha_and_non_digit_characters_in_tags
+ assert_sanitized('<a onclick!#$%&()*~+-_.,:;?@[/|\]^`=alert("XSS")>foo</a>', "<a>foo</a>")
+ end
- def test_should_sanitize_neverending_attribute
- assert_sanitized "<span class=\"\\", "<span class=\"\\\">"
- end
+ def test_should_sanitize_invalid_tag_names_in_single_tags
+ input = %(<img/src="http://ha.ckers.org/xss.js"/>)
+ result = safe_list_sanitize(input)
+ acceptable_results = [
+ # libxml2
+ "<img>",
+ # libgumbo
+ %(<img src="http://ha.ckers.org/xss.js">),
+ ]
+
+ assert_includes(acceptable_results, result)
+ end
- [
- %(<a href="javascript&#x3a;alert('XSS');">),
- %(<a href="javascript&#x003a;alert('XSS');">),
- %(<a href="javascript&#x3A;alert('XSS');">),
- %(<a href="javascript&#x003A;alert('XSS');">)
- ].each_with_index do |enc_hack, i|
- define_method "test_x03a_handling_#{i+1}" do
- assert_sanitized enc_hack, "<a>"
+ def test_should_sanitize_img_dynsrc_lowsrc
+ assert_sanitized(%(<img lowsrc="javascript:alert('XSS')" />), "<img>")
end
- end
- def test_x03a_legitimate
- assert_sanitized %(<a href="http&#x3a;//legit">), %(<a href="http://legit">)
- assert_sanitized %(<a href="http&#x3A;//legit">), %(<a href="http://legit">)
- end
+ def test_should_sanitize_img_vbscript
+ assert_sanitized %(<img src='vbscript:msgbox("XSS")' />), "<img>"
+ end
- def test_sanitize_ascii_8bit_string
- safe_list_sanitize('<a>hello</a>'.encode('ASCII-8BIT')).tap do |sanitized|
- assert_equal '<a>hello</a>', sanitized
- assert_equal Encoding::UTF_8, sanitized.encoding
+ def test_should_sanitize_cdata_section
+ input = "<![CDATA[<span>section</span>]]>"
+ result = safe_list_sanitize(input)
+ acceptable_results = [
+ # libxml2 = 2.9.14
+ %{&lt;![CDATA[<span>section</span>]]&gt;},
+ # other libxml2
+ %{section]]&gt;},
+ # xerces+neko
+ "",
+ ]
+
+ assert_includes(acceptable_results, result)
end
- end
- def test_sanitize_data_attributes
- assert_sanitized %(<a href="/blah" data-method="post">foo</a>), %(<a href="/blah">foo</a>)
- assert_sanitized %(<a data-remote="true" data-type="script" data-method="get" data-cross-domain="true" href="attack.js">Launch the missiles</a>), %(<a href="attack.js">Launch the missiles</a>)
- end
+ def test_should_sanitize_unterminated_cdata_section
+ input = "<![CDATA[<span>neverending..."
+ result = safe_list_sanitize(input)
- def test_allow_data_attribute_if_requested
- text = %(<a data-foo="foo">foo</a>)
- assert_equal %(<a data-foo="foo">foo</a>), safe_list_sanitize(text, attributes: ['data-foo'])
- end
+ acceptable_results = [
+ # libxml2 = 2.9.14
+ %{&lt;![CDATA[<span>neverending...</span>},
+ # other libxml2
+ %{neverending...},
+ # xerces+neko
+ ""
+ ]
- def test_uri_escaping_of_href_attr_in_a_tag_in_safe_list_sanitizer
- skip if RUBY_VERSION < "2.3"
+ assert_includes(acceptable_results, result)
+ end
- html = %{<a href='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
+ def test_should_not_mangle_urls_with_ampersand
+ assert_sanitized %{<a href=\"http://www.domain.com?var1=1&var2=2\">my link</a>}
+ end
- text = safe_list_sanitize(html)
+ def test_should_sanitize_neverending_attribute
+ # note that assert_dom_equal chokes in this case! so avoid using assert_sanitized
+ assert_equal("<span class=\"\\\"></span>", safe_list_sanitize("<span class=\"\\\">"))
+ end
- acceptable_results = [
- # nokogiri w/vendored+patched libxml2
- %{<a href="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
- # nokogiri w/ system libxml2
- %{<a href="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
- ]
- assert_includes(acceptable_results, text)
- end
+ [
+ %(<a href="javascript&#x3a;alert('XSS');">),
+ %(<a href="javascript&#x003a;alert('XSS');">),
+ %(<a href="javascript&#x3A;alert('XSS');">),
+ %(<a href="javascript&#x003A;alert('XSS');">)
+ ].each_with_index do |enc_hack, i|
+ define_method "test_x03a_handling_#{i + 1}" do
+ assert_sanitized enc_hack, "<a></a>"
+ end
+ end
- def test_uri_escaping_of_src_attr_in_a_tag_in_safe_list_sanitizer
- skip if RUBY_VERSION < "2.3"
+ def test_x03a_legitimate
+ assert_sanitized %(<a href="http&#x3a;//legit">asdf</a>), %(<a href="http://legit">asdf</a>)
+ assert_sanitized %(<a href="http&#x3A;//legit">asdf</a>), %(<a href="http://legit">asdf</a>)
+ end
- html = %{<a src='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
+ def test_sanitize_ascii_8bit_string
+ safe_list_sanitize("<div><a>hello</a></div>".encode("ASCII-8BIT")).tap do |sanitized|
+ assert_equal "<div><a>hello</a></div>", sanitized
+ assert_equal Encoding::UTF_8, sanitized.encoding
+ end
+ end
- text = safe_list_sanitize(html)
+ def test_sanitize_data_attributes
+ assert_sanitized %(<a href="/blah" data-method="post">foo</a>), %(<a href="/blah">foo</a>)
+ assert_sanitized %(<a data-remote="true" data-type="script" data-method="get" data-cross-domain="true" href="attack.js">Launch the missiles</a>), %(<a href="attack.js">Launch the missiles</a>)
+ end
- acceptable_results = [
- # nokogiri w/vendored+patched libxml2
- %{<a src="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
- # nokogiri w/system libxml2
- %{<a src="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
- ]
- assert_includes(acceptable_results, text)
- end
+ def test_allow_data_attribute_if_requested
+ text = %(<a data-foo="foo">foo</a>)
+ assert_equal %(<a data-foo="foo">foo</a>), safe_list_sanitize(text, attributes: ["data-foo"])
+ end
- def test_uri_escaping_of_name_attr_in_a_tag_in_safe_list_sanitizer
- skip if RUBY_VERSION < "2.3"
+ # https://developer.mozilla.org/en-US/docs/Glossary/Void_element
+ VOID_ELEMENTS = %w[area base br col embed hr img input keygen link meta param source track wbr]
+
+ %w(strong em b i p code pre tt samp kbd var sub
+ sup dfn cite big small address hr br div span h1 h2 h3 h4 h5 h6 ul ol li dl dt dd abbr
+ acronym a img blockquote del ins time).each do |tag_name|
+ define_method "test_default_safelist_should_allow_#{tag_name}" do
+ if VOID_ELEMENTS.include?(tag_name)
+ assert_sanitized("<#{tag_name}>")
+ else
+ assert_sanitized("<#{tag_name}>foo</#{tag_name}>")
+ end
+ end
+ end
- html = %{<a name='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
+ def test_datetime_attribute
+ assert_sanitized("<time datetime=\"2023-01-01\">Today</time>")
+ end
- text = safe_list_sanitize(html)
+ def test_abbr_attribute
+ scope_allowed_tags(%w(table tr th td)) do
+ assert_sanitized(%(<table><tr><td abbr="UK">United Kingdom</td></tr></table>))
+ end
+ end
- acceptable_results = [
- # nokogiri w/vendored+patched libxml2
- %{<a name="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
- # nokogiri w/system libxml2
- %{<a name="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
- ]
- assert_includes(acceptable_results, text)
- end
+ def test_uri_escaping_of_href_attr_in_a_tag_in_safe_list_sanitizer
+ skip if RUBY_VERSION < "2.3"
- def test_uri_escaping_of_name_action_in_a_tag_in_safe_list_sanitizer
- skip if RUBY_VERSION < "2.3"
+ html = %{<a href='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
- html = %{<a action='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
+ text = safe_list_sanitize(html)
- text = safe_list_sanitize(html, attributes: ['action'])
+ acceptable_results = [
+ # nokogiri's vendored+patched libxml2 (0002-Update-entities-to-remove-handling-of-ssi.patch)
+ %{<a href="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
+ # system libxml2
+ %{<a href="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
+ # xerces+neko
+ %{<a href="examp<!--%22 unsafeattr=foo()>-->le.com">test</a>}
+ ]
- acceptable_results = [
- # nokogiri w/vendored+patched libxml2
- %{<a action="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
- # nokogiri w/system libxml2
- %{<a action="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
- ]
- assert_includes(acceptable_results, text)
- end
+ assert_includes(acceptable_results, text)
+ end
- def test_exclude_node_type_processing_instructions
- assert_equal("<div>text</div><b>text</b>", safe_list_sanitize("<div>text</div><?div content><b>text</b>"))
- end
+ def test_uri_escaping_of_src_attr_in_a_tag_in_safe_list_sanitizer
+ skip if RUBY_VERSION < "2.3"
- def test_exclude_node_type_comment
- assert_equal("<div>text</div><b>text</b>", safe_list_sanitize("<div>text</div><!-- comment --><b>text</b>"))
- end
+ html = %{<a src='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
- %w[text/plain text/css image/png image/gif image/jpeg].each do |mediatype|
- define_method "test_mediatype_#{mediatype}_allowed" do
- input = %Q(<img src="data:#{mediatype};base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
- expected = input
- actual = safe_list_sanitize(input)
- assert_equal(expected, actual)
+ text = safe_list_sanitize(html)
- input = %Q(<img src="DATA:#{mediatype};base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
- expected = input
- actual = safe_list_sanitize(input)
- assert_equal(expected, actual)
+ acceptable_results = [
+ # nokogiri's vendored+patched libxml2 (0002-Update-entities-to-remove-handling-of-ssi.patch)
+ %{<a src="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
+ # system libxml2
+ %{<a src="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
+ # xerces+neko
+ %{<a src="examp<!--%22 unsafeattr=foo()>-->le.com">test</a>}
+ ]
+
+ assert_includes(acceptable_results, text)
end
- end
- def test_mediatype_text_html_disallowed
- input = %q(<img src="data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
- expected = %q(<img>)
- actual = safe_list_sanitize(input)
- assert_equal(expected, actual)
+ def test_uri_escaping_of_name_attr_in_a_tag_in_safe_list_sanitizer
+ skip if RUBY_VERSION < "2.3"
- input = %q(<img src="DATA:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
- expected = %q(<img>)
- actual = safe_list_sanitize(input)
- assert_equal(expected, actual)
- end
+ html = %{<a name='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
- def test_mediatype_image_svg_xml_disallowed
- input = %q(<img src="data:image/svg+xml;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
- expected = %q(<img>)
- actual = safe_list_sanitize(input)
- assert_equal(expected, actual)
+ text = safe_list_sanitize(html)
- input = %q(<img src="DATA:image/svg+xml;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
- expected = %q(<img>)
- actual = safe_list_sanitize(input)
- assert_equal(expected, actual)
- end
+ acceptable_results = [
+ # nokogiri's vendored+patched libxml2 (0002-Update-entities-to-remove-handling-of-ssi.patch)
+ %{<a name="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
+ # system libxml2
+ %{<a name="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
+ # xerces+neko
+ %{<a name="examp<!--%22 unsafeattr=foo()>-->le.com">test</a>}
+ ]
- def test_mediatype_other_disallowed
- input = %q(<a href="data:foo;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">foo</a>)
- expected = %q(<a>foo</a>)
- actual = safe_list_sanitize(input)
- assert_equal(expected, actual)
+ assert_includes(acceptable_results, text)
+ end
- input = %q(<a href="DATA:foo;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">foo</a>)
- expected = %q(<a>foo</a>)
- actual = safe_list_sanitize(input)
- assert_equal(expected, actual)
- end
+ def test_uri_escaping_of_name_action_in_a_tag_in_safe_list_sanitizer
+ skip if RUBY_VERSION < "2.3"
+
+ html = %{<a action='examp<!--" unsafeattr=foo()>-->le.com'>test</a>}
+
+ text = safe_list_sanitize(html, attributes: ["action"])
+
+ acceptable_results = [
+ # nokogiri's vendored+patched libxml2 (0002-Update-entities-to-remove-handling-of-ssi.patch)
+ %{<a action="examp&lt;!--%22%20unsafeattr=foo()&gt;--&gt;le.com">test</a>},
+ # system libxml2
+ %{<a action="examp<!--%22%20unsafeattr=foo()>-->le.com">test</a>},
+ # xerces+neko
+ %{<a action="examp<!--%22 unsafeattr=foo()>-->le.com">test</a>},
+ ]
- def test_scrubbing_svg_attr_values_that_allow_ref
- input = %Q(<div fill="yellow url(http://bad.com/) #fff">hey</div>)
- expected = %Q(<div fill="yellow #fff">hey</div>)
- actual = scope_allowed_attributes %w(fill) do
- safe_list_sanitize(input)
+ assert_includes(acceptable_results, text)
end
- assert_equal(expected, actual)
- end
+ def test_exclude_node_type_processing_instructions
+ input = "<div>text</div><?div content><b>text</b>"
+ result = safe_list_sanitize(input)
+ acceptable_results = [
+ # jruby cyberneko (nokogiri < 1.14.0)
+ "<div>text</div>",
+ # everything else
+ "<div>text</div><b>text</b>",
+ ]
+
+ assert_includes(acceptable_results, result)
+ end
- def test_style_with_css_payload
- input, tags = "<style>div > span { background: \"red\"; }</style>", ["style"]
- expected = "<style>div &gt; span { background: \"red\"; }</style>"
- actual = safe_list_sanitize(input, tags: tags)
+ def test_exclude_node_type_comment
+ assert_equal("<div>text</div><b>text</b>", safe_list_sanitize("<div>text</div><!-- comment --><b>text</b>"))
+ end
- assert_equal(expected, actual)
- end
+ %w[text/plain text/css image/png image/gif image/jpeg].each do |mediatype|
+ define_method "test_mediatype_#{mediatype}_allowed" do
+ input = %Q(<img src="data:#{mediatype};base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
+ expected = input
+ actual = safe_list_sanitize(input)
+ assert_equal(expected, actual)
+
+ input = %Q(<img src="DATA:#{mediatype};base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">)
+ expected = input
+ actual = safe_list_sanitize(input)
+ assert_equal(expected, actual)
+ end
+ end
- def test_combination_of_select_and_style_with_css_payload
- input, tags = "<select><style>div > span { background: \"red\"; }</style></select>", ["select", "style"]
- expected = "<select><style>div &gt; span { background: \"red\"; }</style></select>"
- actual = safe_list_sanitize(input, tags: tags)
+ def test_mediatype_text_html_disallowed
+ input = '<img src="data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">'
+ expected = "<img>"
+ actual = safe_list_sanitize(input)
+ assert_equal(expected, actual)
- assert_equal(expected, actual)
- end
+ input = '<img src="DATA:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">'
+ expected = "<img>"
+ actual = safe_list_sanitize(input)
+ assert_equal(expected, actual)
+ end
- def test_combination_of_select_and_style_with_script_payload
- input, tags = "<select><style><script>alert(1)</script></style></select>", ["select", "style"]
- expected = "<select><style>&lt;script&gt;alert(1)&lt;/script&gt;</style></select>"
- actual = safe_list_sanitize(input, tags: tags)
+ def test_mediatype_image_svg_xml_disallowed
+ input = '<img src="data:image/svg+xml;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">'
+ expected = "<img>"
+ actual = safe_list_sanitize(input)
+ assert_equal(expected, actual)
- assert_equal(expected, actual)
- end
+ input = '<img src="DATA:image/svg+xml;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">'
+ expected = "<img>"
+ actual = safe_list_sanitize(input)
+ assert_equal(expected, actual)
+ end
- def test_combination_of_svg_and_style_with_script_payload
- input, tags = "<svg><style><script>alert(1)</script></style></svg>", ["svg", "style"]
- expected = "<svg><style>&lt;script&gt;alert(1)&lt;/script&gt;</style></svg>"
- actual = safe_list_sanitize(input, tags: tags)
+ def test_mediatype_other_disallowed
+ input = '<a href="data:foo;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">foo</a>'
+ expected = "<a>foo</a>"
+ actual = safe_list_sanitize(input)
+ assert_equal(expected, actual)
- assert_equal(expected, actual)
- end
+ input = '<a href="DATA:foo;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=">foo</a>'
+ expected = "<a>foo</a>"
+ actual = safe_list_sanitize(input)
+ assert_equal(expected, actual)
+ end
- def test_combination_of_math_and_style_with_img_payload
- input, tags = "<math><style><img src=x onerror=alert(1)></style></math>", ["math", "style"]
- expected = "<math><style>&lt;img src=x onerror=alert(1)&gt;</style></math>"
- actual = safe_list_sanitize(input, tags: tags)
+ def test_scrubbing_svg_attr_values_that_allow_ref
+ input = '<div fill="yellow url(http://bad.com/) #fff">hey</div>'
+ expected = '<div fill="yellow #fff">hey</div>'
+ actual = scope_allowed_attributes %w(fill) do
+ safe_list_sanitize(input)
+ end
- assert_equal(expected, actual)
+ assert_equal(expected, actual)
+ end
- input, tags = "<math><style><img src=x onerror=alert(1)></style></math>", ["math", "style", "img"]
- expected = "<math><style>&lt;img src=x onerror=alert(1)&gt;</style></math>"
- actual = safe_list_sanitize(input, tags: tags)
+ def test_style_with_css_payload
+ input, tags = "<style>div > span { background: \"red\"; }</style>", ["style"]
+ actual = safe_list_sanitize(input, tags: tags)
+ acceptable_results = [
+ # libxml2
+ "<style>div &gt; span { background: \"red\"; }</style>",
+ # libgumbo
+ "<style>div > span { background: \"red\"; }</style>",
+ ]
+
+ assert_includes(acceptable_results, actual)
+ end
- assert_equal(expected, actual)
- end
+ def test_combination_of_select_and_style_with_css_payload
+ input, tags = "<select><style>div > span { background: \"red\"; }</style></select>", ["select", "style"]
+ actual = safe_list_sanitize(input, tags: tags)
+ acceptable_results = [
+ # libxml2
+ "<select><style>div &gt; span { background: \"red\"; }</style></select>",
+ # libgumbo
+ "<select>div &gt; span { background: \"red\"; }</select>",
+ ]
+
+ assert_includes(acceptable_results, actual)
+ end
- def test_combination_of_svg_and_style_with_img_payload
- input, tags = "<svg><style><img src=x onerror=alert(1)></style></svg>", ["svg", "style"]
- expected = "<svg><style>&lt;img src=x onerror=alert(1)&gt;</style></svg>"
- actual = safe_list_sanitize(input, tags: tags)
+ def test_combination_of_select_and_style_with_script_payload
+ input, tags = "<select><style><script>alert(1)</script></style></select>", ["select", "style"]
+ actual = safe_list_sanitize(input, tags: tags)
+ acceptable_results = [
+ # libxml2
+ "<select><style>&lt;script&gt;alert(1)&lt;/script&gt;</style></select>",
+ # libgumbo
+ "<select>alert(1)</select>",
+ ]
+
+ assert_includes(acceptable_results, actual)
+ end
- assert_equal(expected, actual)
+ def test_combination_of_svg_and_style_with_script_payload
+ input, tags = "<svg><style><script>alert(1)</script></style></svg>", ["svg", "style"]
+ actual = safe_list_sanitize(input, tags: tags)
+ acceptable_results = [
+ # libxml2
+ "<svg><style>&lt;script&gt;alert(1)&lt;/script&gt;</style></svg>",
+ # libgumbo
+ "<svg><style>alert(1)</style></svg>"
+ ]
+
+ assert_includes(acceptable_results, actual)
+ end
- input, tags = "<svg><style><img src=x onerror=alert(1)></style></svg>", ["svg", "style", "img"]
- expected = "<svg><style>&lt;img src=x onerror=alert(1)&gt;</style></svg>"
- actual = safe_list_sanitize(input, tags: tags)
+ def test_combination_of_math_and_style_with_img_payload
+ input, tags = "<math><style><img src=x onerror=alert(1)></style></math>", ["math", "style"]
+ actual = safe_list_sanitize(input, tags: tags)
+ acceptable_results = [
+ # libxml2
+ "<math><style>&lt;img src=x onerror=alert(1)&gt;</style></math>",
+ # libgumbo
+ "<math><style></style></math>",
+ ]
+
+ assert_includes(acceptable_results, actual)
+ end
- assert_equal(expected, actual)
- end
+ def test_combination_of_math_and_style_with_img_payload_2
+ input, tags = "<math><style><img src=x onerror=alert(1)></style></math>", ["math", "style", "img"]
+ actual = safe_list_sanitize(input, tags: tags)
+ acceptable_results = [
+ # libxml2
+ "<math><style>&lt;img src=x onerror=alert(1)&gt;</style></math>",
+ # libgumbo
+ "<math><style></style></math><img src=\"x\">",
+ ]
+
+ assert_includes(acceptable_results, actual)
+ end
-protected
+ def test_combination_of_svg_and_style_with_img_payload
+ input, tags = "<svg><style><img src=x onerror=alert(1)></style></svg>", ["svg", "style"]
+ actual = safe_list_sanitize(input, tags: tags)
+ acceptable_results = [
+ # libxml2
+ "<svg><style>&lt;img src=x onerror=alert(1)&gt;</style></svg>",
+ # libgumbo
+ "<svg><style></style></svg>",
+ ]
+
+ assert_includes(acceptable_results, actual)
+ end
- def xpath_sanitize(input, options = {})
- XpathRemovalTestSanitizer.new.sanitize(input, options)
- end
+ def test_combination_of_svg_and_style_with_img_payload_2
+ input, tags = "<svg><style><img src=x onerror=alert(1)></style></svg>", ["svg", "style", "img"]
+ actual = safe_list_sanitize(input, tags: tags)
+ acceptable_results = [
+ # libxml2
+ "<svg><style>&lt;img src=x onerror=alert(1)&gt;</style></svg>",
+ # libgumbo
+ "<svg><style></style></svg><img src=\"x\">",
+ ]
+
+ assert_includes(acceptable_results, actual)
+ end
- def full_sanitize(input, options = {})
- Rails::Html::FullSanitizer.new.sanitize(input, options)
- end
+ def test_should_sanitize_illegal_style_properties
+ raw = %(display:block; position:absolute; left:0; top:0; width:100%; height:100%; z-index:1; background-color:black; background-image:url(http://www.ragingplatypus.com/i/cam-full.jpg); background-x:center; background-y:center; background-repeat:repeat;)
+ expected = %(display:block;width:100%;height:100%;background-color:black;background-x:center;background-y:center;)
+ assert_equal expected, sanitize_css(raw)
+ end
- def link_sanitize(input, options = {})
- Rails::Html::LinkSanitizer.new.sanitize(input, options)
- end
+ def test_should_sanitize_with_trailing_space
+ raw = "display:block; "
+ expected = "display:block;"
+ assert_equal expected, sanitize_css(raw)
+ end
- def safe_list_sanitize(input, options = {})
- Rails::Html::SafeListSanitizer.new.sanitize(input, options)
- end
+ def test_should_sanitize_xul_style_attributes
+ raw = %(-moz-binding:url('http://ha.ckers.org/xssmoz.xml#xss'))
+ assert_equal "", sanitize_css(raw)
+ end
- def assert_sanitized(input, expected = nil)
- if input
- assert_dom_equal expected || input, safe_list_sanitize(input)
- else
- assert_nil safe_list_sanitize(input)
+ def test_should_sanitize_div_background_image_unicode_encoded
+ [
+ convert_to_css_hex("url(javascript:alert(1))", false),
+ convert_to_css_hex("url(javascript:alert(1))", true),
+ convert_to_css_hex("url(https://example.com)", false),
+ convert_to_css_hex("url(https://example.com)", true),
+ ].each do |propval|
+ raw = "background-image:" + propval
+ assert_empty(sanitize_css(raw))
+ end
end
- end
- def sanitize_css(input)
- Rails::Html::SafeListSanitizer.new.sanitize_css(input)
- end
+ def test_should_allow_div_background_image_unicode_encoded_safe_functions
+ [
+ convert_to_css_hex("rgb(255,0,0)", false),
+ convert_to_css_hex("rgb(255,0,0)", true),
+ ].each do |propval|
+ raw = "background-image:" + propval
- def scope_allowed_tags(tags)
- old_tags = Rails::Html::SafeListSanitizer.allowed_tags
- Rails::Html::SafeListSanitizer.allowed_tags = tags
- yield Rails::Html::SafeListSanitizer.new
- ensure
- Rails::Html::SafeListSanitizer.allowed_tags = old_tags
- end
+ assert_includes(sanitize_css(raw), "background-image")
+ end
+ end
- def scope_allowed_attributes(attributes)
- old_attributes = Rails::Html::SafeListSanitizer.allowed_attributes
- Rails::Html::SafeListSanitizer.allowed_attributes = attributes
- yield Rails::Html::SafeListSanitizer.new
- ensure
- Rails::Html::SafeListSanitizer.allowed_attributes = old_attributes
- end
+ def test_should_sanitize_div_style_expression
+ raw = %(width: expression(alert('XSS'));)
+ assert_equal "", sanitize_css(raw)
+ end
- # note that this is used for testing CSS hex encoding: \\[0-9a-f]{1,6}
- def convert_to_css_hex(string, escape_parens=false)
- string.chars.map do |c|
- if !escape_parens && (c == "(" || c == ")")
- c
- else
- format('\00%02X', c.ord)
+ def test_should_sanitize_across_newlines
+ raw = %(\nwidth:\nexpression(alert('XSS'));\n)
+ assert_equal "", sanitize_css(raw)
+ end
+
+ protected
+ def safe_list_sanitize(input, options = {})
+ module_under_test::SafeListSanitizer.new.sanitize(input, options)
+ end
+
+ def assert_sanitized(input, expected = nil)
+ assert_equal((expected || input), safe_list_sanitize(input))
end
- end.join
- end
- def libxml_2_9_14_recovery_lt?
- # changed in 2.9.14, see https://github.com/sparklemotion/nokogiri/releases/tag/v1.13.5
- Nokogiri.method(:uses_libxml?).arity == -1 && Nokogiri.uses_libxml?(">= 2.9.14")
+ def scope_allowed_tags(tags)
+ old_tags = module_under_test::SafeListSanitizer.allowed_tags
+ module_under_test::SafeListSanitizer.allowed_tags = tags
+ yield module_under_test::SafeListSanitizer.new
+ ensure
+ module_under_test::SafeListSanitizer.allowed_tags = old_tags
+ end
+
+ def scope_allowed_attributes(attributes)
+ old_attributes = module_under_test::SafeListSanitizer.allowed_attributes
+ module_under_test::SafeListSanitizer.allowed_attributes = attributes
+ yield module_under_test::SafeListSanitizer.new
+ ensure
+ module_under_test::SafeListSanitizer.allowed_attributes = old_attributes
+ end
+
+ def sanitize_css(input)
+ module_under_test::SafeListSanitizer.new.sanitize_css(input)
+ end
+
+ # note that this is used for testing CSS hex encoding: \\[0-9a-f]{1,6}
+ def convert_to_css_hex(string, escape_parens = false)
+ string.chars.map do |c|
+ if !escape_parens && (c == "(" || c == ")")
+ c
+ else
+ format('\00%02X', c.ord)
+ end
+ end.join
+ end
end
- def libxml_2_9_14_recovery_lt_bang?
- # changed in 2.9.14, see https://github.com/sparklemotion/nokogiri/releases/tag/v1.13.5
- # then reverted in 2.10.0, see https://gitlab.gnome.org/GNOME/libxml2/-/issues/380
- Nokogiri.method(:uses_libxml?).arity == -1 && Nokogiri.uses_libxml?("= 2.9.14")
+ class HTML4SafeListSanitizerTest < Minitest::Test
+ @module_under_test = Rails::HTML4
+ include SafeListSanitizerTest
end
+
+ class HTML5SafeListSanitizerTest < Minitest::Test
+ @module_under_test = Rails::HTML5
+ include SafeListSanitizerTest
+ end if loofah_html5_support?
end
diff --git a/test/scrubbers_test.rb b/test/scrubbers_test.rb
index a825404..8db2d85 100644
--- a/test/scrubbers_test.rb
+++ b/test/scrubbers_test.rb
@@ -1,11 +1,16 @@
+# frozen_string_literal: true
+
require "minitest/autorun"
require "rails-html-sanitizer"
class ScrubberTest < Minitest::Test
protected
+ def scrub_fragment(html)
+ Loofah.scrub_fragment(html, @scrubber).to_s
+ end
def assert_scrubbed(html, expected = html)
- output = Loofah.scrub_fragment(html, @scrubber).to_s
+ output = scrub_fragment(html)
assert_equal expected, output
end
@@ -28,9 +33,8 @@ class ScrubberTest < Minitest::Test
end
class PermitScrubberTest < ScrubberTest
-
def setup
- @scrubber = Rails::Html::PermitScrubber.new
+ @scrubber = Rails::HTML::PermitScrubber.new
end
def test_responds_to_scrub
@@ -38,44 +42,60 @@ class PermitScrubberTest < ScrubberTest
end
def test_default_scrub_behavior
- assert_scrubbed '<tag>hello</tag>', 'hello'
+ assert_scrubbed "<tag>hello</tag>", "hello"
end
def test_default_scrub_removes_comments
- assert_scrubbed('<div>one</div><!-- two --><span>three</span>',
- '<div>one</div><span>three</span>')
+ assert_scrubbed("<div>one</div><!-- two --><span>three</span>",
+ "<div>one</div><span>three</span>")
end
def test_default_scrub_removes_processing_instructions
- assert_scrubbed('<div>one</div><?div two><span>three</span>',
- '<div>one</div><span>three</span>')
+ input = "<div>one</div><?div two><span>three</span>"
+ result = scrub_fragment(input)
+
+ acceptable_results = [
+ # jruby cyberneko (nokogiri < 1.14.0)
+ "<div>one</div>",
+ # everything else
+ "<div>one</div><span>three</span>",
+ ]
+
+ assert_includes(acceptable_results, result)
end
def test_default_attributes_removal_behavior
- assert_scrubbed '<p cooler="hello">hello</p>', '<p>hello</p>'
+ assert_scrubbed '<p cooler="hello">hello</p>', "<p>hello</p>"
end
def test_leaves_supplied_tags
@scrubber.tags = %w(a)
- assert_scrubbed '<a>hello</a>'
+ assert_scrubbed "<a>hello</a>"
end
def test_leaves_only_supplied_tags
- html = '<tag>leave me <span>now</span></tag>'
+ html = "<tag>leave me <span>now</span></tag>"
@scrubber.tags = %w(tag)
- assert_scrubbed html, '<tag>leave me now</tag>'
+ assert_scrubbed html, "<tag>leave me now</tag>"
+ end
+
+ def test_prunes_tags
+ @scrubber = Rails::HTML::PermitScrubber.new(prune: true)
+ @scrubber.tags = %w(tag)
+ html = "<tag>leave me <span>now</span></tag>"
+ assert_scrubbed html, "<tag>leave me </tag>"
end
def test_leaves_comments_when_supplied_as_tag
@scrubber.tags = %w(div comment)
- assert_scrubbed('<div>one</div><!-- two --><span>three</span>',
- '<div>one</div><!-- two -->three')
+ assert_scrubbed("<div>one</div><!-- two --><span>three</span>",
+ "<div>one</div><!-- two -->three")
end
def test_leaves_only_supplied_tags_nested
- html = '<tag>leave <em>me <span>now</span></em></tag>'
+ html = "<tag>leave <em>me <span>now</span></em></tag>"
@scrubber.tags = %w(tag)
- assert_scrubbed html, '<tag>leave me now</tag>'
+ assert_scrubbed html, "<tag>leave me now</tag>"
end
def test_leaves_supplied_attributes
@@ -102,16 +122,16 @@ class PermitScrubberTest < ScrubberTest
end
def test_leaves_text
- assert_scrubbed('some text')
+ assert_scrubbed("some text")
end
def test_skips_text_nodes
- assert_node_skipped('some text')
+ assert_node_skipped("some text")
end
def test_tags_accessor_validation
e = assert_raises(ArgumentError) do
- @scrubber.tags = 'tag'
+ @scrubber.tags = "tag"
end
assert_equal "You should pass :tags as an Enumerable", e.message
@@ -120,7 +140,7 @@ class PermitScrubberTest < ScrubberTest
def test_attributes_accessor_validation
e = assert_raises(ArgumentError) do
- @scrubber.attributes = 'cooler'
+ @scrubber.attributes = "cooler"
end
assert_equal "You should pass :attributes as an Enumerable", e.message
@@ -130,19 +150,19 @@ end
class TargetScrubberTest < ScrubberTest
def setup
- @scrubber = Rails::Html::TargetScrubber.new
+ @scrubber = Rails::HTML::TargetScrubber.new
end
def test_targeting_tags_removes_only_them
@scrubber.tags = %w(a h1)
- html = '<script></script><a></a><h1></h1>'
- assert_scrubbed html, '<script></script>'
+ html = "<script></script><a></a><h1></h1>"
+ assert_scrubbed html, "<script></script>"
end
def test_targeting_tags_removes_only_them_nested
@scrubber.tags = %w(a)
- html = '<tag><a><tag><a></a></tag></a></tag>'
- assert_scrubbed html, '<tag><tag></tag></tag>'
+ html = "<tag><a><tag><a></a></tag></a></tag>"
+ assert_scrubbed html, "<tag><tag></tag></tag>"
end
def test_targeting_attributes_removes_only_them
@@ -157,24 +177,31 @@ class TargetScrubberTest < ScrubberTest
html = '<tag remove="" other=""></tag><a remove="" other=""></a>'
assert_scrubbed html, '<a other=""></a>'
end
+
+ def test_prunes_tags
+ @scrubber = Rails::HTML::TargetScrubber.new(prune: true)
+ @scrubber.tags = %w(span)
+ html = "<tag>leave me <span>now</span></tag>"
+ assert_scrubbed html, "<tag>leave me </tag>"
+ end
end
class TextOnlyScrubberTest < ScrubberTest
def setup
- @scrubber = Rails::Html::TextOnlyScrubber.new
+ @scrubber = Rails::HTML::TextOnlyScrubber.new
end
def test_removes_all_tags_and_keep_the_content
- assert_scrubbed '<tag>hello</tag>', 'hello'
+ assert_scrubbed "<tag>hello</tag>", "hello"
end
def test_skips_text_nodes
- assert_node_skipped('some text')
+ assert_node_skipped("some text")
end
end
class ReturningStopFromScrubNodeTest < ScrubberTest
- class ScrubStopper < Rails::Html::PermitScrubber
+ class ScrubStopper < Rails::HTML::PermitScrubber
def scrub_node(node)
Loofah::Scrubber::STOP
end
@@ -185,6 +212,6 @@ class ReturningStopFromScrubNodeTest < ScrubberTest
end
def test_returns_stop_from_scrub_if_scrub_node_does
- assert_scrub_stopped '<script>remove me</script>'
+ assert_scrub_stopped "<script>remove me</script>"
end
end
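The refactored `test/safe_list_sanitizer_test.rb` runs one shared module of tests against both parser backends: each concrete `Minitest::Test` subclass sets a class-level `@module_under_test` (`Rails::HTML4` or `Rails::HTML5`) and then includes `SafeListSanitizerTest`, whose helpers resolve `module_under_test::SafeListSanitizer` at call time. The definition of `module_under_test` lies outside this excerpt, so the sketch below assumes it simply reads that class-level instance variable. `FakeHTML4`, `FakeHTML5`, `SharedSanitizerChecks`, `HTML4Checks` and `HTML5Checks` are hypothetical stand-ins so the pattern runs with plain Ruby and no gems.

```ruby
# Hypothetical stand-ins for Rails::HTML4 / Rails::HTML5. Each namespace
# exposes a SafeListSanitizer whose output reveals which one was used.
module FakeHTML4
  class SafeListSanitizer
    def sanitize(html, _options = {})
      "html4:#{html}"
    end
  end
end

module FakeHTML5
  class SafeListSanitizer
    def sanitize(html, _options = {})
      "html5:#{html}"
    end
  end
end

# Shared helpers, analogous to SafeListSanitizerTest: they never name a
# concrete namespace, only module_under_test.
module SharedSanitizerChecks
  # Assumption: module_under_test reads the class-level instance variable
  # assigned in the including class body.
  def module_under_test
    self.class.instance_variable_get(:@module_under_test)
  end

  def safe_list_sanitize(input, options = {})
    module_under_test::SafeListSanitizer.new.sanitize(input, options)
  end
end

class HTML4Checks
  @module_under_test = FakeHTML4
  include SharedSanitizerChecks
end

class HTML5Checks
  @module_under_test = FakeHTML5
  include SharedSanitizerChecks
end

puts HTML4Checks.new.safe_list_sanitize("<b>x</b>") # prints html4:<b>x</b>
puts HTML5Checks.new.safe_list_sanitize("<b>x</b>") # prints html5:<b>x</b>
```

In the diff, the HTML5 class is additionally written as `class ... end if loofah_html5_support?`, so on platforms without `Nokogiri::HTML5` that class is never defined and its copy of the shared tests never runs.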
Debdiff
[The following lists of changes regard files as different if they have different names, permissions or owners.]
Files in second set of .debs but not in first
-rw-r--r-- root/root /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.6.0/lib/rails-html-sanitizer.rb
-rw-r--r-- root/root /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.6.0/lib/rails/html/sanitizer.rb
-rw-r--r-- root/root /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.6.0/lib/rails/html/sanitizer/version.rb
-rw-r--r-- root/root /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.6.0/lib/rails/html/scrubbers.rb
-rw-r--r-- root/root /usr/share/rubygems-integration/all/specifications/rails-html-sanitizer-1.6.0.gemspec
Files in first set of .debs but not in second
-rw-r--r-- root/root /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.4.4/lib/rails-html-sanitizer.rb
-rw-r--r-- root/root /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.4.4/lib/rails/html/sanitizer.rb
-rw-r--r-- root/root /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.4.4/lib/rails/html/sanitizer/version.rb
-rw-r--r-- root/root /usr/share/rubygems-integration/all/gems/rails-html-sanitizer-1.4.4/lib/rails/html/scrubbers.rb
-rw-r--r-- root/root /usr/share/rubygems-integration/all/specifications/rails-html-sanitizer-1.4.4.gemspec
Control files: lines which differ (wdiff format)
Depends: ruby-loofah (>= [-2.19.1)-] {+2.21), ruby-nokogiri (>= 1.14)+}
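The new Depends line requires ruby-loofah (>= 2.21), up from (>= 2.19.1), together with ruby-nokogiri (>= 1.14), in line with the upstream changelog's Loofah `~>2.21` / Nokogiri `~>1.14` requirement. As a rough illustration of which upstream versions qualify, the sketch below uses RubyGems' own `Gem::Version` and `Gem::Requirement`. This is an assumption-laden approximation, not how apt resolves dependencies: Debian version strings (with revisions like a `-1` suffix) follow dpkg comparison rules, so only plain upstream version numbers should be passed here. `DEBIAN_DEPENDS`, `LOOFAH_GEMSPEC` and `satisfies?` are hypothetical names.

```ruby
require "rubygems" # provides Gem::Version and Gem::Requirement

# Lower bounds copied from the Depends line above (upstream versions only).
DEBIAN_DEPENDS = {
  "ruby-loofah"   => Gem::Requirement.new(">= 2.21"),
  "ruby-nokogiri" => Gem::Requirement.new(">= 1.14"),
}.freeze

# The gem-level pessimistic constraint from the changelog: >= 2.21, < 3.0.
LOOFAH_GEMSPEC = Gem::Requirement.new("~> 2.21")

def satisfies?(package, version)
  DEBIAN_DEPENDS.fetch(package).satisfied_by?(Gem::Version.new(version))
end

p satisfies?("ruby-loofah", "2.19.1")   # => false (old floor is no longer enough)
p satisfies?("ruby-loofah", "2.21.3")   # => true
p satisfies?("ruby-nokogiri", "1.13.10") # => false
p LOOFAH_GEMSPEC.satisfied_by?(Gem::Version.new("2.22.0")) # => true
p LOOFAH_GEMSPEC.satisfied_by?(Gem::Version.new("3.0.0"))  # => false
```

Note the difference in shape: the Debian relation `(>= 2.21)` has no upper bound, whereas the gemspec's `~> 2.21` also excludes a future Loofah 3.x.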