New Upstream Release - ruby-csv
Ready changes
Summary
Merged new upstream version: 3.2.6 (was: 3.2.2).
Resulting package
Built on 2022-12-30T03:17 (took 4m5s)
The resulting binary packages can be installed (if you have the apt repository enabled) by running:
apt install -t fresh-releases ruby-csv
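Once the package is installed, a short Ruby script can confirm that the new version is the one actually being loaded. The following is only a sketch under assumptions: the helper names `csv_at_least?` and `generate_lines_smoke_test` are hypothetical and not part of ruby-csv, and the script assumes that `require "csv"` resolves to the packaged library rather than to another copy on the load path. `CSV.generate_lines` is used as the probe because NEWS.md in the diff lists it as added in 3.2.5.

```ruby
require "csv"

# Hypothetical helper (not part of ruby-csv): true when +version+ is at
# least +minimum+. Versions are compared as gem versions, not as plain
# strings, so "3.10.0" sorts after "3.2.6".
def csv_at_least?(version, minimum = "3.2.6")
  Gem::Version.new(version) >= Gem::Version.new(minimum)
end

# Hypothetical smoke test: returns the generated CSV text when the loaded
# library provides CSV.generate_lines (added in 3.2.5 according to NEWS.md),
# and nil when it does not.
def generate_lines_smoke_test
  return nil unless CSV.respond_to?(:generate_lines)

  CSV.generate_lines([["foo", "0"], ["bar", "1"]])
end

puts "csv #{CSV::VERSION} loaded"
puts "at least 3.2.6: #{csv_at_least?(CSV::VERSION)}"
p generate_lines_smoke_test
```

If the script reports an older version, the likely cause is that another copy of csv (for example the one bundled with the Ruby interpreter) takes precedence on the load path.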
Diff
diff --git a/NEWS.md b/NEWS.md
index 51eb456..05f2419 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,5 +1,144 @@
# News
+## 3.2.6 - 2022-12-08
+
+### Improvements
+
+ * `CSV#read` consumes the same lines with other methods like
+ `CSV#shift`.
+ [[GitHub#258](https://github.com/ruby/csv/issues/258)]
+ [Reported by Lhoussaine Ghallou]
+
+ * All `Enumerable` based methods consume the same lines with other
+ methods. This may have a performance penalty.
+ [[GitHub#260](https://github.com/ruby/csv/issues/260)]
+ [Reported by Lhoussaine Ghallou]
+
+ * Simplify some implementations.
+ [[GitHub#262](https://github.com/ruby/csv/pull/262)]
+ [[GitHub#263](https://github.com/ruby/csv/pull/263)]
+ [Patch by Mau Magnaguagno]
+
+### Fixes
+
+ * Fixed `CSV.generate_lines` document.
+ [[GitHub#257](https://github.com/ruby/csv/pull/257)]
+ [Patch by Sampat Badhe]
+
+### Thanks
+
+ * Sampat Badhe
+
+ * Lhoussaine Ghallou
+
+ * Mau Magnaguagno
+
+## 3.2.5 - 2022-08-26
+
+### Improvements
+
+ * Added `CSV.generate_lines`.
+ [[GitHub#255](https://github.com/ruby/csv/issues/255)]
+ [Reported by OKURA Masafumi]
+ [[GitHub#256](https://github.com/ruby/csv/pull/256)]
+ [Patch by Eriko Sugiyama]
+
+### Thanks
+
+ * OKURA Masafumi
+
+ * Eriko Sugiyama
+
+## 3.2.4 - 2022-08-22
+
+### Improvements
+
+ * Cleaned up internal implementations.
+ [[GitHub#249](https://github.com/ruby/csv/pull/249)]
+ [[GitHub#250](https://github.com/ruby/csv/pull/250)]
+ [[GitHub#251](https://github.com/ruby/csv/pull/251)]
+ [Patch by Mau Magnaguagno]
+
+ * Added support for RFC 3339 style time.
+ [[GitHub#248](https://github.com/ruby/csv/pull/248)]
+ [Patch by Thierry Lambert]
+
+ * Added support for transcoding String CSV. Syntax is
+ `from-encoding:to-encoding`.
+ [[GitHub#254](https://github.com/ruby/csv/issues/254)]
+ [Reported by Richard Stueven]
+
+ * Added quoted information to `CSV::FieldInfo`.
+ [[GitHub#254](https://github.com/ruby/csv/pull/253)]
+ [Reported by Hirokazu SUZUKI]
+
+### Fixes
+
+ * Fixed a link in documents.
+ [[GitHub#244](https://github.com/ruby/csv/pull/244)]
+ [Patch by Peter Zhu]
+
+### Thanks
+
+ * Peter Zhu
+
+ * Mau Magnaguagno
+
+ * Thierry Lambert
+
+ * Richard Stueven
+
+ * Hirokazu SUZUKI
+
+## 3.2.3 - 2022-04-09
+
+### Improvements
+
+ * Added contents summary to `CSV::Table#inspect`.
+ [GitHub#229][Patch by Eriko Sugiyama]
+ [GitHub#235][Patch by Sampat Badhe]
+
+ * Suppressed `$INPUT_RECORD_SEPARATOR` deprecation warning by
+ `Warning.warn`.
+ [GitHub#233][Reported by Jean byroot Boussier]
+
+ * Improved error message for liberal parsing with quoted values.
+ [GitHub#231][Patch by Nikolay Rys]
+
+ * Fixed typos in documentation.
+ [GitHub#236][Patch by Sampat Badhe]
+
+ * Added `:max_field_size` option and deprecated `:field_size_limit` option.
+ [GitHub#238][Reported by Dan Buettner]
+
+ * Added `:symbol_raw` to built-in header converters.
+ [GitHub#237][Reported by taki]
+ [GitHub#239][Patch by Eriko Sugiyama]
+
+### Fixes
+
+ * Fixed a bug that some texts may be dropped unexpectedly.
+ [Bug #18245][ruby-core:105587][Reported by Hassan Abdul Rehman]
+
+ * Fixed a bug that `:field_size_limit` doesn't work with not complex row.
+ [GitHub#238][Reported by Dan Buettner]
+
+### Thanks
+
+ * Hassan Abdul Rehman
+
+ * Eriko Sugiyama
+
+ * Jean byroot Boussier
+
+ * Nikolay Rys
+
+ * Sampat Badhe
+
+ * Dan Buettner
+
+ * taki
+
## 3.2.2 - 2021-12-24
### Improvements
@@ -15,9 +154,6 @@
* Fixed a bug that all of `ARGF` contents may not be consumed.
[GitHub#228][Reported by Rafael Navaza]
- * Fixed a bug that some texts may be dropped unexpectedly.
- [Bug #18245][ruby-core:105587][Reported by Hassan Abdul Rehman]
-
### Thanks
* adamroyjones
@@ -26,8 +162,6 @@
* Rafael Navaza
- * Hassan Abdul Rehman
-
## 3.2.1 - 2021-10-23
### Improvements
diff --git a/csv.gemspec b/csv.gemspec
index adc621c..46952d1 100644
--- a/csv.gemspec
+++ b/csv.gemspec
@@ -2,20 +2,20 @@
# This file has been automatically generated by gem2tgz #
#########################################################
# -*- encoding: utf-8 -*-
-# stub: csv 3.2.2 ruby lib
+# stub: csv 3.2.6 ruby lib
Gem::Specification.new do |s|
s.name = "csv".freeze
- s.version = "3.2.2"
+ s.version = "3.2.6"
s.required_rubygems_version = Gem::Requirement.new(">= 0".freeze) if s.respond_to? :required_rubygems_version=
s.require_paths = ["lib".freeze]
s.authors = ["James Edward Gray II".freeze, "Kouhei Sutou".freeze]
- s.date = "2021-12-24"
+ s.date = "2022-12-08"
s.description = "The CSV library provides a complete interface to CSV files and data. It offers tools to enable you to read and write to and from Strings or IO objects, as needed.".freeze
s.email = [nil, "kou@cozmixng.org".freeze]
s.extra_rdoc_files = ["LICENSE.txt".freeze, "NEWS.md".freeze, "README.md".freeze, "doc/csv/recipes/filtering.rdoc".freeze, "doc/csv/recipes/generating.rdoc".freeze, "doc/csv/recipes/parsing.rdoc".freeze, "doc/csv/recipes/recipes.rdoc".freeze]
- s.files = ["LICENSE.txt".freeze, "NEWS.md".freeze, "README.md".freeze, "doc/csv/arguments/io.rdoc".freeze, "doc/csv/options/common/col_sep.rdoc".freeze, "doc/csv/options/common/quote_char.rdoc".freeze, "doc/csv/options/common/row_sep.rdoc".freeze, "doc/csv/options/generating/force_quotes.rdoc".freeze, "doc/csv/options/generating/quote_empty.rdoc".freeze, "doc/csv/options/generating/write_converters.rdoc".freeze, "doc/csv/options/generating/write_empty_value.rdoc".freeze, "doc/csv/options/generating/write_headers.rdoc".freeze, "doc/csv/options/generating/write_nil_value.rdoc".freeze, "doc/csv/options/parsing/converters.rdoc".freeze, "doc/csv/options/parsing/empty_value.rdoc".freeze, "doc/csv/options/parsing/field_size_limit.rdoc".freeze, "doc/csv/options/parsing/header_converters.rdoc".freeze, "doc/csv/options/parsing/headers.rdoc".freeze, "doc/csv/options/parsing/liberal_parsing.rdoc".freeze, "doc/csv/options/parsing/nil_value.rdoc".freeze, "doc/csv/options/parsing/return_headers.rdoc".freeze, "doc/csv/options/parsing/skip_blanks.rdoc".freeze, "doc/csv/options/parsing/skip_lines.rdoc".freeze, "doc/csv/options/parsing/strip.rdoc".freeze, "doc/csv/options/parsing/unconverted_fields.rdoc".freeze, "doc/csv/recipes/filtering.rdoc".freeze, "doc/csv/recipes/generating.rdoc".freeze, "doc/csv/recipes/parsing.rdoc".freeze, "doc/csv/recipes/recipes.rdoc".freeze, "lib/csv.rb".freeze, "lib/csv/core_ext/array.rb".freeze, "lib/csv/core_ext/string.rb".freeze, "lib/csv/delete_suffix.rb".freeze, "lib/csv/fields_converter.rb".freeze, "lib/csv/input_record_separator.rb".freeze, "lib/csv/match_p.rb".freeze, "lib/csv/parser.rb".freeze, "lib/csv/row.rb".freeze, "lib/csv/table.rb".freeze, "lib/csv/version.rb".freeze, "lib/csv/writer.rb".freeze]
+ s.files = ["LICENSE.txt".freeze, "NEWS.md".freeze, "README.md".freeze, "doc/csv/arguments/io.rdoc".freeze, "doc/csv/options/common/col_sep.rdoc".freeze, "doc/csv/options/common/quote_char.rdoc".freeze, "doc/csv/options/common/row_sep.rdoc".freeze, "doc/csv/options/generating/force_quotes.rdoc".freeze, "doc/csv/options/generating/quote_empty.rdoc".freeze, "doc/csv/options/generating/write_converters.rdoc".freeze, "doc/csv/options/generating/write_empty_value.rdoc".freeze, "doc/csv/options/generating/write_headers.rdoc".freeze, "doc/csv/options/generating/write_nil_value.rdoc".freeze, "doc/csv/options/parsing/converters.rdoc".freeze, "doc/csv/options/parsing/empty_value.rdoc".freeze, "doc/csv/options/parsing/field_size_limit.rdoc".freeze, "doc/csv/options/parsing/header_converters.rdoc".freeze, "doc/csv/options/parsing/headers.rdoc".freeze, "doc/csv/options/parsing/liberal_parsing.rdoc".freeze, "doc/csv/options/parsing/nil_value.rdoc".freeze, "doc/csv/options/parsing/return_headers.rdoc".freeze, "doc/csv/options/parsing/skip_blanks.rdoc".freeze, "doc/csv/options/parsing/skip_lines.rdoc".freeze, "doc/csv/options/parsing/strip.rdoc".freeze, "doc/csv/options/parsing/unconverted_fields.rdoc".freeze, "doc/csv/recipes/filtering.rdoc".freeze, "doc/csv/recipes/generating.rdoc".freeze, "doc/csv/recipes/parsing.rdoc".freeze, "doc/csv/recipes/recipes.rdoc".freeze, "lib/csv.rb".freeze, "lib/csv/core_ext/array.rb".freeze, "lib/csv/core_ext/string.rb".freeze, "lib/csv/fields_converter.rb".freeze, "lib/csv/input_record_separator.rb".freeze, "lib/csv/parser.rb".freeze, "lib/csv/row.rb".freeze, "lib/csv/table.rb".freeze, "lib/csv/version.rb".freeze, "lib/csv/writer.rb".freeze]
s.homepage = "https://github.com/ruby/csv".freeze
s.licenses = ["Ruby".freeze, "BSD-2-Clause".freeze]
s.rdoc_options = ["--main".freeze, "README.md".freeze]
diff --git a/debian/changelog b/debian/changelog
index c7207a1..913ab2a 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,9 +1,10 @@
-ruby-csv (3.2.2-2) UNRELEASED; urgency=medium
+ruby-csv (3.2.6-1) UNRELEASED; urgency=medium
* Update standards version to 4.6.0, no changes needed.
* Update standards version to 4.6.1, no changes needed.
+ * New upstream release.
- -- Debian Janitor <janitor@jelmer.uk> Thu, 28 Jul 2022 23:29:10 -0000
+ -- Debian Janitor <janitor@jelmer.uk> Fri, 30 Dec 2022 03:13:41 -0000
ruby-csv (3.2.2-1) unstable; urgency=medium
diff --git a/doc/csv/options/generating/write_headers.rdoc b/doc/csv/options/generating/write_headers.rdoc
index f9faa9d..c56aa48 100644
--- a/doc/csv/options/generating/write_headers.rdoc
+++ b/doc/csv/options/generating/write_headers.rdoc
@@ -19,7 +19,7 @@ Without +write_headers+:
With +write_headers+":
CSV.open(file_path,'w',
- :write_headers=> true,
+ :write_headers => true,
:headers => ['Name','Value']
) do |csv|
csv << ['foo', '0']
diff --git a/doc/csv/recipes/generating.rdoc b/doc/csv/recipes/generating.rdoc
index 6984339..9320d53 100644
--- a/doc/csv/recipes/generating.rdoc
+++ b/doc/csv/recipes/generating.rdoc
@@ -148,7 +148,7 @@ This example defines and uses a custom write converter to strip whitespace from
==== Recipe: Specify Multiple Write Converters
-Use option <tt>:write_converters</tt> and multiple custom coverters
+Use option <tt>:write_converters</tt> and multiple custom converters
to convert field values when generating \CSV.
This example defines and uses two custom write converters to strip and upcase generated fields:
diff --git a/doc/csv/recipes/parsing.rdoc b/doc/csv/recipes/parsing.rdoc
index ad8a57c..fc116fc 100644
--- a/doc/csv/recipes/parsing.rdoc
+++ b/doc/csv/recipes/parsing.rdoc
@@ -83,7 +83,7 @@ Use instance method CSV#each with option +headers+ to read a source \String one
CSV.new(string, headers: true).each do |row|
p row
end
-Ouput:
+Output:
#<CSV::Row "Name":"foo" "Value":"0">
#<CSV::Row "Name":"bar" "Value":"1">
#<CSV::Row "Name":"baz" "Value":"2">
diff --git a/lib/csv.rb b/lib/csv.rb
index 2c47ead..ca87d04 100644
--- a/lib/csv.rb
+++ b/lib/csv.rb
@@ -95,14 +95,11 @@ require "stringio"
require_relative "csv/fields_converter"
require_relative "csv/input_record_separator"
-require_relative "csv/match_p"
require_relative "csv/parser"
require_relative "csv/row"
require_relative "csv/table"
require_relative "csv/writer"
-using CSV::MatchP if CSV.const_defined?(:MatchP)
-
# == \CSV
#
# === In a Hurry?
@@ -357,7 +354,9 @@ using CSV::MatchP if CSV.const_defined?(:MatchP)
# - +row_sep+: Specifies the row separator; used to delimit rows.
# - +col_sep+: Specifies the column separator; used to delimit fields.
# - +quote_char+: Specifies the quote character; used to quote fields.
-# - +field_size_limit+: Specifies the maximum field size allowed.
+# - +field_size_limit+: Specifies the maximum field size + 1 allowed.
+# Deprecated since 3.2.3. Use +max_field_size+ instead.
+# - +max_field_size+: Specifies the maximum field size allowed.
# - +converters+: Specifies the field converters to be used.
# - +unconverted_fields+: Specifies whether unconverted fields are to be available.
# - +headers+: Specifies whether data contains headers,
@@ -864,8 +863,9 @@ class CSV
# <b><tt>index</tt></b>:: The zero-based index of the field in its row.
# <b><tt>line</tt></b>:: The line of the data source this row is from.
# <b><tt>header</tt></b>:: The header for the column, when available.
+ # <b><tt>quoted?</tt></b>:: True or false, whether the original value is quoted or not.
#
- FieldInfo = Struct.new(:index, :line, :header)
+ FieldInfo = Struct.new(:index, :line, :header, :quoted?)
# A Regexp used to find and convert some common Date formats.
DateMatcher = / \A(?: (\w+,?\s+)?\w+\s+\d{1,2},?\s+\d{2,4} |
@@ -873,10 +873,9 @@ class CSV
# A Regexp used to find and convert some common DateTime formats.
DateTimeMatcher =
/ \A(?: (\w+,?\s+)?\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2},?\s+\d{2,4} |
- \d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2} |
- # ISO-8601
+ # ISO-8601 and RFC-3339 (space instead of T) recognized by DateTime.parse
\d{4}-\d{2}-\d{2}
- (?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?(?:[+-]\d{2}(?::\d{2})|Z)?)?)?
+ (?:[T\s]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?(?:[+-]\d{2}(?::\d{2})|Z)?)?)?
)\z /x
# The encoding used by all converters.
@@ -926,7 +925,8 @@ class CSV
symbol: lambda { |h|
h.encode(ConverterEncoding).downcase.gsub(/[^\s\w]+/, "").strip.
gsub(/\s+/, "_").to_sym
- }
+ },
+ symbol_raw: lambda { |h| h.encode(ConverterEncoding).to_sym }
}
# Default values for method options.
@@ -937,6 +937,7 @@ class CSV
quote_char: '"',
# For parsing.
field_size_limit: nil,
+ max_field_size: nil,
converters: nil,
unconverted_fields: nil,
headers: false,
@@ -1004,7 +1005,7 @@ class CSV
def instance(data = $stdout, **options)
# create a _signature_ for this method call, data object and options
sig = [data.object_id] +
- options.values_at(*DEFAULT_OPTIONS.keys.sort_by { |sym| sym.to_s })
+ options.values_at(*DEFAULT_OPTIONS.keys)
# fetch or create the instance for this signature
@@instances ||= Hash.new
@@ -1201,7 +1202,7 @@ class CSV
# parse options for input, output, or both
in_options, out_options = Hash.new, {row_sep: InputRecordSeparator.value}
options.each do |key, value|
- case key.to_s
+ case key
when /\Ain(?:put)?_(.+)\Z/
in_options[$1.to_sym] = value
when /\Aout(?:put)?_(.+)\Z/
@@ -1464,6 +1465,46 @@ class CSV
(new(str, **options) << row).string
end
+ # :call-seq:
+ # CSV.generate_lines(rows)
+ # CSV.generate_lines(rows, **options)
+ #
+ # Returns the \String created by generating \CSV from
+ # using the specified +options+.
+ #
+ # Argument +rows+ must be an \Array of row. Row is \Array of \String or \CSV::Row.
+ #
+ # Special options:
+ # * Option <tt>:row_sep</tt> defaults to <tt>"\n"</tt> on Ruby 3.0 or later
+ # and <tt>$INPUT_RECORD_SEPARATOR</tt> (<tt>$/</tt>) otherwise.:
+ # $INPUT_RECORD_SEPARATOR # => "\n"
+ # * This method accepts an additional option, <tt>:encoding</tt>, which sets the base
+ # Encoding for the output. This method will try to guess your Encoding from
+ # the first non-+nil+ field in +row+, if possible, but you may need to use
+ # this parameter as a backup plan.
+ #
+ # For other +options+,
+ # see {Options for Generating}[#class-CSV-label-Options+for+Generating].
+ #
+ # ---
+ #
+ # Returns the \String generated from an
+ # CSV.generate_lines([['foo', '0'], ['bar', '1'], ['baz', '2']]) # => "foo,0\nbar,1\nbaz,2\n"
+ #
+ # ---
+ #
+ # Raises an exception
+ # # Raises NoMethodError (undefined method `each' for :foo:Symbol)
+ # CSV.generate_lines(:foo)
+ #
+ def generate_lines(rows, **options)
+ self.generate(**options) do |csv|
+ rows.each do |row|
+ csv << row
+ end
+ end
+ end
+
#
# :call-seq:
# open(file_path, mode = "rb", **options ) -> new_csv
@@ -1865,6 +1906,7 @@ class CSV
row_sep: :auto,
quote_char: '"',
field_size_limit: nil,
+ max_field_size: nil,
converters: nil,
unconverted_fields: nil,
headers: false,
@@ -1888,8 +1930,19 @@ class CSV
raise ArgumentError.new("Cannot parse nil as CSV") if data.nil?
if data.is_a?(String)
+ if encoding
+ if encoding.is_a?(String)
+ data_external_encoding, data_internal_encoding = encoding.split(":", 2)
+ if data_internal_encoding
+ data = data.encode(data_internal_encoding, data_external_encoding)
+ else
+ data = data.dup.force_encoding(data_external_encoding)
+ end
+ else
+ data = data.dup.force_encoding(encoding)
+ end
+ end
@io = StringIO.new(data)
- @io.set_encoding(encoding || data.encoding)
else
@io = data
end
@@ -1907,11 +1960,14 @@ class CSV
@initial_header_converters = header_converters
@initial_write_converters = write_converters
+ if max_field_size.nil? and field_size_limit
+ max_field_size = field_size_limit - 1
+ end
@parser_options = {
column_separator: col_sep,
row_separator: row_sep,
quote_character: quote_char,
- field_size_limit: field_size_limit,
+ max_field_size: max_field_size,
unconverted_fields: unconverted_fields,
headers: headers,
return_headers: return_headers,
@@ -1979,10 +2035,24 @@ class CSV
# Returns the limit for field size; used for parsing;
# see {Option +field_size_limit+}[#class-CSV-label-Option+field_size_limit]:
# CSV.new('').field_size_limit # => nil
+ #
+ # Deprecated since 3.2.3. Use +max_field_size+ instead.
def field_size_limit
parser.field_size_limit
end
+ # :call-seq:
+ # csv.max_field_size -> integer or nil
+ #
+ # Returns the limit for field size; used for parsing;
+ # see {Option +max_field_size+}[#class-CSV-label-Option+max_field_size]:
+ # CSV.new('').max_field_size # => nil
+ #
+ # Since 3.2.3.
+ def max_field_size
+ parser.max_field_size
+ end
+
# :call-seq:
# csv.skip_lines -> regexp or nil
#
@@ -2481,7 +2551,13 @@ class CSV
# p row
# end
def each(&block)
- parser_enumerator.each(&block)
+ return to_enum(__method__) unless block_given?
+ begin
+ while true
+ yield(parser_enumerator.next)
+ end
+ rescue StopIteration
+ end
end
# :call-seq:
diff --git a/lib/csv/delete_suffix.rb b/lib/csv/delete_suffix.rb
deleted file mode 100644
index d457718..0000000
--- a/lib/csv/delete_suffix.rb
+++ /dev/null
@@ -1,18 +0,0 @@
-# frozen_string_literal: true
-
-# This provides String#delete_suffix? for Ruby 2.4.
-unless String.method_defined?(:delete_suffix)
- class CSV
- module DeleteSuffix
- refine String do
- def delete_suffix(suffix)
- if end_with?(suffix)
- self[0...-suffix.size]
- else
- self
- end
- end
- end
- end
- end
-end
diff --git a/lib/csv/fields_converter.rb b/lib/csv/fields_converter.rb
index b206118..d15977d 100644
--- a/lib/csv/fields_converter.rb
+++ b/lib/csv/fields_converter.rb
@@ -44,7 +44,7 @@ class CSV
@converters.empty?
end
- def convert(fields, headers, lineno)
+ def convert(fields, headers, lineno, quoted_fields)
return fields unless need_convert?
fields.collect.with_index do |field, index|
@@ -63,7 +63,8 @@ class CSV
else
header = nil
end
- field = converter[field, FieldInfo.new(index, lineno, header)]
+ quoted = quoted_fields[index]
+ field = converter[field, FieldInfo.new(index, lineno, header, quoted)]
end
break unless field.is_a?(String) # short-circuit pipeline for speed
end
diff --git a/lib/csv/input_record_separator.rb b/lib/csv/input_record_separator.rb
index bbf1347..7a99343 100644
--- a/lib/csv/input_record_separator.rb
+++ b/lib/csv/input_record_separator.rb
@@ -4,20 +4,7 @@ require "stringio"
class CSV
module InputRecordSeparator
class << self
- is_input_record_separator_deprecated = false
- verbose, $VERBOSE = $VERBOSE, true
- stderr, $stderr = $stderr, StringIO.new
- input_record_separator = $INPUT_RECORD_SEPARATOR
- begin
- $INPUT_RECORD_SEPARATOR = "\r\n"
- is_input_record_separator_deprecated = (not $stderr.string.empty?)
- ensure
- $INPUT_RECORD_SEPARATOR = input_record_separator
- $stderr = stderr
- $VERBOSE = verbose
- end
-
- if is_input_record_separator_deprecated
+ if RUBY_VERSION >= "3.0.0"
def value
"\n"
end
diff --git a/lib/csv/match_p.rb b/lib/csv/match_p.rb
deleted file mode 100644
index 775559a..0000000
--- a/lib/csv/match_p.rb
+++ /dev/null
@@ -1,20 +0,0 @@
-# frozen_string_literal: true
-
-# This provides String#match? and Regexp#match? for Ruby 2.3.
-unless String.method_defined?(:match?)
- class CSV
- module MatchP
- refine String do
- def match?(pattern)
- self =~ pattern
- end
- end
-
- refine Regexp do
- def match?(string)
- self =~ string
- end
- end
- end
- end
-end
diff --git a/lib/csv/parser.rb b/lib/csv/parser.rb
index 7e943ac..afb3131 100644
--- a/lib/csv/parser.rb
+++ b/lib/csv/parser.rb
@@ -2,15 +2,10 @@
require "strscan"
-require_relative "delete_suffix"
require_relative "input_record_separator"
-require_relative "match_p"
require_relative "row"
require_relative "table"
-using CSV::DeleteSuffix if CSV.const_defined?(:DeleteSuffix)
-using CSV::MatchP if CSV.const_defined?(:MatchP)
-
class CSV
# Note: Don't use this class directly. This is an internal class.
class Parser
@@ -27,6 +22,10 @@ class CSV
class InvalidEncoding < StandardError
end
+ # Raised when unexpected case is happen.
+ class UnexpectedError < StandardError
+ end
+
#
# CSV::Scanner receives a CSV output, scans it and return the content.
# It also controls the life cycle of the object with its methods +keep_start+,
@@ -78,10 +77,10 @@ class CSV
# +keep_end+, +keep_back+, +keep_drop+.
#
# CSV::InputsScanner.scan() tries to match with pattern at the current position.
- # If there's a match, the scanner advances the “scan pointer” and returns the matched string.
+ # If there's a match, the scanner advances the "scan pointer" and returns the matched string.
# Otherwise, the scanner returns nil.
#
- # CSV::InputsScanner.rest() returns the “rest” of the string (i.e. everything after the scan pointer).
+ # CSV::InputsScanner.rest() returns the "rest" of the string (i.e. everything after the scan pointer).
# If there is no more data (eos? = true), it returns "".
#
class InputsScanner
@@ -96,11 +95,13 @@ class CSV
end
def each_line(row_separator)
+ return enum_for(__method__, row_separator) unless block_given?
buffer = nil
input = @scanner.rest
position = @scanner.pos
offset = 0
n_row_separator_chars = row_separator.size
+ # trace(__method__, :start, line, input)
while true
input.each_line(row_separator) do |line|
@scanner.pos += line.bytesize
@@ -140,25 +141,28 @@ class CSV
end
def scan(pattern)
+ # trace(__method__, pattern, :start)
value = @scanner.scan(pattern)
+ # trace(__method__, pattern, :done, :last, value) if @last_scanner
return value if @last_scanner
- if value
- read_chunk if @scanner.eos?
- return value
- else
- nil
- end
+ read_chunk if value and @scanner.eos?
+ # trace(__method__, pattern, :done, value)
+ value
end
def scan_all(pattern)
+ # trace(__method__, pattern, :start)
value = @scanner.scan(pattern)
+ # trace(__method__, pattern, :done, :last, value) if @last_scanner
return value if @last_scanner
return nil if value.nil?
while @scanner.eos? and read_chunk and (sub_value = @scanner.scan(pattern))
+ # trace(__method__, pattern, :sub, sub_value)
value << sub_value
end
+ # trace(__method__, pattern, :done, value)
value
end
@@ -167,68 +171,126 @@ class CSV
end
def keep_start
- @keeps.push([@scanner.pos, nil])
+ # trace(__method__, :start)
+ adjust_last_keep
+ @keeps.push([@scanner, @scanner.pos, nil])
+ # trace(__method__, :done)
end
def keep_end
- start, buffer = @keeps.pop
- keep = @scanner.string.byteslice(start, @scanner.pos - start)
+ # trace(__method__, :start)
+ scanner, start, buffer = @keeps.pop
+ if scanner == @scanner
+ keep = @scanner.string.byteslice(start, @scanner.pos - start)
+ else
+ keep = @scanner.string.byteslice(0, @scanner.pos)
+ end
if buffer
buffer << keep
keep = buffer
end
+ # trace(__method__, :done, keep)
keep
end
def keep_back
- start, buffer = @keeps.pop
+ # trace(__method__, :start)
+ scanner, start, buffer = @keeps.pop
if buffer
+ # trace(__method__, :rescan, start, buffer)
string = @scanner.string
- keep = string.byteslice(start, string.bytesize - start)
+ if scanner == @scanner
+ keep = string.byteslice(start, string.bytesize - start)
+ else
+ keep = string
+ end
if keep and not keep.empty?
@inputs.unshift(StringIO.new(keep))
@last_scanner = false
end
@scanner = StringScanner.new(buffer)
else
+ if @scanner != scanner
+ message = "scanners are different but no buffer: "
+ message += "#{@scanner.inspect}(#{@scanner.object_id}): "
+ message += "#{scanner.inspect}(#{scanner.object_id})"
+ raise UnexpectedError, message
+ end
+ # trace(__method__, :repos, start, buffer)
@scanner.pos = start
end
read_chunk if @scanner.eos?
end
def keep_drop
- @keeps.pop
+ _, _, buffer = @keeps.pop
+ # trace(__method__, :done, :empty) unless buffer
+ return unless buffer
+
+ last_keep = @keeps.last
+ # trace(__method__, :done, :no_last_keep) unless last_keep
+ return unless last_keep
+
+ if last_keep[2]
+ last_keep[2] << buffer
+ else
+ last_keep[2] = buffer
+ end
+ # trace(__method__, :done)
end
def rest
@scanner.rest
end
+ def check(pattern)
+ @scanner.check(pattern)
+ end
+
private
- def read_chunk
- return false if @last_scanner
+ def trace(*args)
+ pp([*args, @scanner, @scanner&.string, @scanner&.pos, @keeps])
+ end
- unless @keeps.empty?
- keep = @keeps.last
- keep_start = keep[0]
- string = @scanner.string
- keep_data = string.byteslice(keep_start, @scanner.pos - keep_start)
- if keep_data
- keep_buffer = keep[1]
- if keep_buffer
- keep_buffer << keep_data
- else
- keep[1] = keep_data.dup
- end
+ def adjust_last_keep
+ # trace(__method__, :start)
+
+ keep = @keeps.last
+ # trace(__method__, :done, :empty) if keep.nil?
+ return if keep.nil?
+
+ scanner, start, buffer = keep
+ string = @scanner.string
+ if @scanner != scanner
+ start = 0
+ end
+ if start == 0 and @scanner.eos?
+ keep_data = string
+ else
+ keep_data = string.byteslice(start, @scanner.pos - start)
+ end
+ if keep_data
+ if buffer
+ buffer << keep_data
+ else
+ keep[2] = keep_data.dup
end
- keep[0] = 0
end
+ # trace(__method__, :done)
+ end
+
+ def read_chunk
+ return false if @last_scanner
+
+ adjust_last_keep
+
input = @inputs.first
case input
when StringIO
string = input.read
raise InvalidEncoding unless string.valid_encoding?
+ # trace(__method__, :stringio, string)
@scanner = StringScanner.new(string)
@inputs.shift
@last_scanner = @inputs.empty?
@@ -237,6 +299,7 @@ class CSV
chunk = input.gets(@row_separator, @chunk_size)
if chunk
raise InvalidEncoding unless chunk.valid_encoding?
+ # trace(__method__, :chunk, chunk)
@scanner = StringScanner.new(chunk)
if input.respond_to?(:eof?) and input.eof?
@inputs.shift
@@ -244,6 +307,7 @@ class CSV
end
true
else
+ # trace(__method__, :no_chunk)
@scanner = StringScanner.new("".encode(@encoding))
@inputs.shift
@last_scanner = @inputs.empty?
@@ -278,7 +342,11 @@ class CSV
end
def field_size_limit
- @field_size_limit
+ @max_field_size&.succ
+ end
+
+ def max_field_size
+ @max_field_size
end
def skip_lines
@@ -346,6 +414,16 @@ class CSV
end
message = "Invalid byte sequence in #{@encoding}"
raise MalformedCSVError.new(message, lineno)
+ rescue UnexpectedError => error
+ if @scanner
+ ignore_broken_line
+ lineno = @lineno
+ else
+ lineno = @lineno + 1
+ end
+ message = "This should not be happen: #{error.message}: "
+ message += "Please report this to https://github.com/ruby/csv/issues"
+ raise MalformedCSVError.new(message, lineno)
end
end
@@ -390,7 +468,7 @@ class CSV
@backslash_quote = false
end
@unconverted_fields = @options[:unconverted_fields]
- @field_size_limit = @options[:field_size_limit]
+ @max_field_size = @options[:max_field_size]
@skip_blanks = @options[:skip_blanks]
@fields_converter = @options[:fields_converter]
@header_fields_converter = @options[:header_fields_converter]
@@ -680,9 +758,10 @@ class CSV
case headers
when Array
@raw_headers = headers
+ quoted_fields = [false] * @raw_headers.size
@use_headers = true
when String
- @raw_headers = parse_headers(headers)
+ @raw_headers, quoted_fields = parse_headers(headers)
@use_headers = true
when nil, false
@raw_headers = nil
@@ -692,21 +771,28 @@ class CSV
@use_headers = true
end
if @raw_headers
- @headers = adjust_headers(@raw_headers)
+ @headers = adjust_headers(@raw_headers, quoted_fields)
else
@headers = nil
end
end
def parse_headers(row)
- CSV.parse_line(row,
- col_sep: @column_separator,
- row_sep: @row_separator,
- quote_char: @quote_character)
+ quoted_fields = []
+ converter = lambda do |field, info|
+ quoted_fields << info.quoted?
+ field
+ end
+ headers = CSV.parse_line(row,
+ col_sep: @column_separator,
+ row_sep: @row_separator,
+ quote_char: @quote_character,
+ converters: [converter])
+ [headers, quoted_fields]
end
- def adjust_headers(headers)
- adjusted_headers = @header_fields_converter.convert(headers, nil, @lineno)
+ def adjust_headers(headers, quoted_fields)
+ adjusted_headers = @header_fields_converter.convert(headers, nil, @lineno, quoted_fields)
adjusted_headers.each {|h| h.freeze if h.is_a? String}
adjusted_headers
end
@@ -729,28 +815,28 @@ class CSV
sample[0, 128].index(@quote_character)
end
- SCANNER_TEST = (ENV["CSV_PARSER_SCANNER_TEST"] == "yes")
- if SCANNER_TEST
- class UnoptimizedStringIO
- def initialize(string)
- @io = StringIO.new(string, "rb:#{string.encoding}")
- end
+ class UnoptimizedStringIO # :nodoc:
+ def initialize(string)
+ @io = StringIO.new(string, "rb:#{string.encoding}")
+ end
- def gets(*args)
- @io.gets(*args)
- end
+ def gets(*args)
+ @io.gets(*args)
+ end
- def each_line(*args, &block)
- @io.each_line(*args, &block)
- end
+ def each_line(*args, &block)
+ @io.each_line(*args, &block)
+ end
- def eof?
- @io.eof?
- end
+ def eof?
+ @io.eof?
end
+ end
- SCANNER_TEST_CHUNK_SIZE =
- Integer((ENV["CSV_PARSER_SCANNER_TEST_CHUNK_SIZE"] || "1"), 10)
+ SCANNER_TEST = (ENV["CSV_PARSER_SCANNER_TEST"] == "yes")
+ if SCANNER_TEST
+ SCANNER_TEST_CHUNK_SIZE_NAME = "CSV_PARSER_SCANNER_TEST_CHUNK_SIZE"
+ SCANNER_TEST_CHUNK_SIZE_VALUE = ENV[SCANNER_TEST_CHUNK_SIZE_NAME]
def build_scanner
inputs = @samples.collect do |sample|
UnoptimizedStringIO.new(sample)
@@ -760,10 +846,17 @@ class CSV
else
inputs << @input
end
+ begin
+ chunk_size_value = ENV[SCANNER_TEST_CHUNK_SIZE_NAME]
+ rescue # Ractor::IsolationError
+ # Ractor on Ruby 3.0 can't read ENV value.
+ chunk_size_value = SCANNER_TEST_CHUNK_SIZE_VALUE
+ end
+ chunk_size = Integer((chunk_size_value || "1"), 10)
InputsScanner.new(inputs,
@encoding,
@row_separator,
- chunk_size: SCANNER_TEST_CHUNK_SIZE)
+ chunk_size: chunk_size)
end
else
def build_scanner
@@ -826,6 +919,14 @@ class CSV
end
end
+ def validate_field_size(field)
+ return unless @max_field_size
+ return if field.size <= @max_field_size
+ ignore_broken_line
+ message = "Field size exceeded: #{field.size} > #{@max_field_size}"
+ raise MalformedCSVError.new(message, @lineno)
+ end
+
def parse_no_quote(&block)
@scanner.each_line(@row_separator) do |line|
next if @skip_lines and skip_line?(line)
@@ -835,9 +936,16 @@ class CSV
if line.empty?
next if @skip_blanks
row = []
+ quoted_fields = []
else
line = strip_value(line)
row = line.split(@split_column_separator, -1)
+ quoted_fields = [false] * row.size
+ if @max_field_size
+ row.each do |column|
+ validate_field_size(column)
+ end
+ end
n_columns = row.size
i = 0
while i < n_columns
@@ -846,7 +954,7 @@ class CSV
end
end
@last_line = original_line
- emit_row(row, &block)
+ emit_row(row, quoted_fields, &block)
end
end
@@ -868,31 +976,37 @@ class CSV
next
end
row = []
+ quoted_fields = []
elsif line.include?(@cr) or line.include?(@lf)
@scanner.keep_back
@need_robust_parsing = true
return parse_quotable_robust(&block)
else
row = line.split(@split_column_separator, -1)
+ quoted_fields = []
n_columns = row.size
i = 0
while i < n_columns
column = row[i]
if column.empty?
+ quoted_fields << false
row[i] = nil
else
n_quotes = column.count(@quote_character)
if n_quotes.zero?
+ quoted_fields << false
# no quote
elsif n_quotes == 2 and
column.start_with?(@quote_character) and
column.end_with?(@quote_character)
+ quoted_fields << true
row[i] = column[1..-2]
else
@scanner.keep_back
@need_robust_parsing = true
return parse_quotable_robust(&block)
end
+ validate_field_size(row[i])
end
i += 1
end
@@ -900,13 +1014,14 @@ class CSV
@scanner.keep_drop
@scanner.keep_start
@last_line = original_line
- emit_row(row, &block)
+ emit_row(row, quoted_fields, &block)
end
@scanner.keep_drop
end
def parse_quotable_robust(&block)
row = []
+ quoted_fields = []
skip_needless_lines
start_row
while true
@@ -916,32 +1031,39 @@ class CSV
value = parse_column_value
if value
@scanner.scan_all(@strip_value) if @strip_value
- if @field_size_limit and value.size >= @field_size_limit
- ignore_broken_line
- raise MalformedCSVError.new("Field size exceeded", @lineno)
- end
+ validate_field_size(value)
end
if parse_column_end
row << value
+ quoted_fields << @quoted_column_value
elsif parse_row_end
if row.empty? and value.nil?
- emit_row([], &block) unless @skip_blanks
+ emit_row([], [], &block) unless @skip_blanks
else
row << value
- emit_row(row, &block)
+ quoted_fields << @quoted_column_value
+ emit_row(row, quoted_fields, &block)
row = []
+ quoted_fields = []
end
skip_needless_lines
start_row
elsif @scanner.eos?
break if row.empty? and value.nil?
row << value
- emit_row(row, &block)
+ quoted_fields << @quoted_column_value
+ emit_row(row, quoted_fields, &block)
break
else
if @quoted_column_value
+ if liberal_parsing? and (new_line = @scanner.check(@line_end))
+ message =
+ "Illegal end-of-line sequence outside of a quoted field " +
+ "<#{new_line.inspect}>"
+ else
+ message = "Any value after quoted field isn't allowed"
+ end
ignore_broken_line
- message = "Any value after quoted field isn't allowed"
raise MalformedCSVError.new(message, @lineno)
elsif @unquoted_column_value and
(new_line = @scanner.scan(@line_end))
@@ -1034,7 +1156,7 @@ class CSV
if (n_quotes % 2).zero?
quotes[0, (n_quotes - 2) / 2]
else
- value = quotes[0, (n_quotes - 1) / 2]
+ value = quotes[0, n_quotes / 2]
while true
quoted_value = @scanner.scan_all(@quoted_value)
value << quoted_value if quoted_value
@@ -1058,11 +1180,9 @@ class CSV
n_quotes = quotes.size
if n_quotes == 1
break
- elsif (n_quotes % 2) == 1
- value << quotes[0, (n_quotes - 1) / 2]
- break
else
value << quotes[0, n_quotes / 2]
+ break if (n_quotes % 2) == 1
end
end
value
@@ -1098,18 +1218,15 @@ class CSV
def strip_value(value)
return value unless @strip
- return nil if value.nil?
+ return value if value.nil?
case @strip
when String
- size = value.size
- while value.start_with?(@strip)
- size -= 1
- value = value[1, size]
+ while value.delete_prefix!(@strip)
+ # do nothing
end
- while value.end_with?(@strip)
- size -= 1
- value = value[0, size]
+ while value.delete_suffix!(@strip)
+ # do nothing
end
else
value.strip!
@@ -1132,22 +1249,22 @@ class CSV
@scanner.keep_start
end
- def emit_row(row, &block)
+ def emit_row(row, quoted_fields, &block)
@lineno += 1
raw_row = row
if @use_headers
if @headers.nil?
- @headers = adjust_headers(row)
+ @headers = adjust_headers(row, quoted_fields)
return unless @return_headers
row = Row.new(@headers, row, true)
else
row = Row.new(@headers,
- @fields_converter.convert(raw_row, @headers, @lineno))
+ @fields_converter.convert(raw_row, @headers, @lineno, quoted_fields))
end
else
# convert fields, if needed...
- row = @fields_converter.convert(raw_row, nil, @lineno)
+ row = @fields_converter.convert(raw_row, nil, @lineno, quoted_fields)
end
# inject unconverted fields and accessor, if requested...
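Note on the parser.rb hunks above: the removed check rejected a field when `value.size >= @field_size_limit`, while the added `validate_field_size` rejects only when `field.size > @max_field_size` and reports both numbers in the message. The sketch below reproduces just that comparison so the off-by-one difference is visible. It is a standalone illustration, not the library code: `FieldTooLargeError`, `old_check`, `new_check` and `rejected?` are hypothetical names, and the real method also calls `ignore_broken_line` and raises `MalformedCSVError` with a line number.

```ruby
# Hypothetical stand-in for CSV::MalformedCSVError, so the sketch runs
# without loading the csv library.
class FieldTooLargeError < StandardError; end

# Old rule (removed lines): reject when size >= field_size_limit.
def old_check(field, field_size_limit)
  return unless field_size_limit
  raise FieldTooLargeError, "Field size exceeded" if field.size >= field_size_limit
end

# New rule (added validate_field_size): reject only when size > max_field_size,
# and include both sizes in the message.
def new_check(field, max_field_size)
  return unless max_field_size
  return if field.size <= max_field_size
  raise FieldTooLargeError, "Field size exceeded: #{field.size} > #{max_field_size}"
end

# Returns true if the block raised FieldTooLargeError, false otherwise.
def rejected?
  yield
  false
rescue FieldTooLargeError
  true
end

field = "abcde" # 5 characters

puts(rejected? { old_check(field, 5) }) # true:  5 >= 5
puts(rejected? { new_check(field, 5) }) # false: 5 <= 5
puts(rejected? { new_check(field, 4) }) # true:  5 > 4
```

Under this reading, an old limit of N corresponds to a new maximum of N - 1.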
diff --git a/lib/csv/row.rb b/lib/csv/row.rb
index 62e429f..500adb1 100644
--- a/lib/csv/row.rb
+++ b/lib/csv/row.rb
@@ -703,7 +703,7 @@ class CSV
# by +index_or_header+ and +specifiers+.
#
# The nested objects may be instances of various classes.
- # See {Dig Methods}[https://docs.ruby-lang.org/en/master/doc/dig_methods_rdoc.html].
+ # See {Dig Methods}[https://docs.ruby-lang.org/en/master/dig_methods_rdoc.html].
#
# Examples:
# source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
diff --git a/lib/csv/table.rb b/lib/csv/table.rb
index c5daf1a..fb19f54 100644
--- a/lib/csv/table.rb
+++ b/lib/csv/table.rb
@@ -890,9 +890,8 @@ class CSV
if @mode == :row or @mode == :col_or_row # by index
@table.delete_if(&block)
else # by header
- deleted = []
headers.each do |header|
- deleted << delete(header) if yield([header, self[header]])
+ delete(header) if yield([header, self[header]])
end
end
@@ -999,9 +998,15 @@ class CSV
# Omits the headers if option +write_headers+ is given as +false+
# (see {Option +write_headers+}[../CSV.html#class-CSV-label-Option+write_headers]):
# table.to_csv(write_headers: false) # => "foo,0\nbar,1\nbaz,2\n"
- def to_csv(write_headers: true, **options)
+ #
+ # Limit rows if option +limit+ is given like +2+:
+ # table.to_csv(limit: 2) # => "Name,Value\nfoo,0\nbar,1\n"
+ def to_csv(write_headers: true, limit: nil, **options)
array = write_headers ? [headers.to_csv(**options)] : []
- @table.each do |row|
+ limit ||= @table.size
+ limit = @table.size + 1 + limit if limit < 0
+ limit = 0 if limit < 0
+ @table.first(limit).each do |row|
array.push(row.fields.to_csv(**options)) unless row.header_row?
end
@@ -1038,9 +1043,13 @@ class CSV
# Example:
# source = "Name,Value\nfoo,0\nbar,1\nbaz,2\n"
# table = CSV.parse(source, headers: true)
- # table.inspect # => "#<CSV::Table mode:col_or_row row_count:4>"
+ # table.inspect # => "#<CSV::Table mode:col_or_row row_count:4>\nName,Value\nfoo,0\nbar,1\nbaz,2\n"
+ #
def inspect
- "#<#{self.class} mode:#{@mode} row_count:#{to_a.size}>".encode("US-ASCII")
+ inspected = +"#<#{self.class} mode:#{@mode} row_count:#{to_a.size}>"
+ summary = to_csv(limit: 5)
+ inspected << "\n" << summary if summary.encoding.ascii_compatible?
+ inspected
end
end
end
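Note on the `to_csv(limit:)` hunk above: the three added lines normalize `limit` before calling `@table.first(limit)`. A `nil` limit means all rows, a negative limit counts back from the end (`-1` keeps everything, `-2` drops the last row), and a value that is still negative after that adjustment is clamped to zero. The sketch below applies the same arithmetic to a plain Array. `take_limited` is a hypothetical helper, not part of the csv API, and it leaves out the header-row handling that the real method performs.

```ruby
# Hypothetical helper: the same limit arithmetic as the new to_csv body,
# applied to a plain Array instead of a CSV::Table.
def take_limited(rows, limit = nil)
  limit ||= rows.size                         # nil means "no limit"
  limit = rows.size + 1 + limit if limit < 0  # negative counts from the end
  limit = 0 if limit < 0                      # still negative: nothing to emit
  rows.first(limit)
end

rows = %w[foo bar baz]

p take_limited(rows)      # => ["foo", "bar", "baz"]
p take_limited(rows, 2)   # => ["foo", "bar"]
p take_limited(rows, -1)  # => ["foo", "bar", "baz"]  (3 + 1 - 1 = 3)
p take_limited(rows, -2)  # => ["foo", "bar"]         (3 + 1 - 2 = 2)
p take_limited(rows, -10) # => []                     (clamped to 0)
p take_limited(rows, 10)  # => ["foo", "bar", "baz"]  (first(10) stops at the end)
```

The changed `inspect` relies on this by calling `to_csv(limit: 5)`, so its summary covers at most the first five table rows.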
diff --git a/lib/csv/version.rb b/lib/csv/version.rb
index d1d0dc0..e05d63d 100644
--- a/lib/csv/version.rb
+++ b/lib/csv/version.rb
@@ -2,5 +2,5 @@
class CSV
# The version of the installed library.
- VERSION = "3.2.2"
+ VERSION = "3.2.6"
end
diff --git a/lib/csv/writer.rb b/lib/csv/writer.rb
index 4a9a35c..030a295 100644
--- a/lib/csv/writer.rb
+++ b/lib/csv/writer.rb
@@ -1,11 +1,8 @@
# frozen_string_literal: true
require_relative "input_record_separator"
-require_relative "match_p"
require_relative "row"
-using CSV::MatchP if CSV.const_defined?(:MatchP)
-
class CSV
# Note: Don't use this class directly. This is an internal class.
class Writer
@@ -42,7 +39,10 @@ class CSV
@headers ||= row if @use_headers
@lineno += 1
- row = @fields_converter.convert(row, nil, lineno) if @fields_converter
+ if @fields_converter
+ quoted_fields = [false] * row.size
+ row = @fields_converter.convert(row, nil, lineno, quoted_fields)
+ end
i = -1
converted_row = row.collect do |field|
@@ -97,7 +97,7 @@ class CSV
return unless @headers
converter = @options[:header_fields_converter]
- @headers = converter.convert(@headers, nil, 0)
+ @headers = converter.convert(@headers, nil, 0, [])
@headers.each do |header|
header.freeze if header.is_a?(String)
end