New Upstream Snapshot - ruby-threach

Ready changes

Summary

Merged new upstream version: 0.2.0+git20220104.1.cc3c5ed (was: 0.2.0).

Resulting package

Built on 2022-11-20T12:52 (took 5m59s)

The resulting binary package can be installed (if you have the apt repository enabled) by running:

apt install -t fresh-snapshots ruby-threach

Diff

diff --git a/.gitignore b/.gitignore
deleted file mode 100644
index c1e0daf..0000000
--- a/.gitignore
+++ /dev/null
@@ -1,21 +0,0 @@
-## MAC OS
-.DS_Store
-
-## TEXTMATE
-*.tmproj
-tmtags
-
-## EMACS
-*~
-\#*
-.\#*
-
-## VIM
-*.swp
-
-## PROJECT::GENERAL
-coverage
-rdoc
-pkg
-
-## PROJECT::SPECIFIC
diff --git a/README.markdown b/README.markdown
index 02eb76b..defe8aa 100644
--- a/README.markdown
+++ b/README.markdown
@@ -1,14 +1,12 @@
 # threach
 
-`threach` adds to the Enumerable module to provide a threaded
-version of whatever enumerator you throw at it (`each` by default).
+-----
 
-## Warning: Deadlocks under JRuby if an exception is thrown
+**Deprecated and archived**. Never really worked reliably. Archived in case I ever want to revisit this space.
 
-`threach` works fine, so long as nothing goes wrong. In particular, there's no safe way (that I can find; see below) to break out of a `threach` loop without a deadlock under JRuby. This is, shall we say, an Issue. 
+-----
 
-Under vanilla ruby, `threach` will exit as expected, but who the hell wants to 
-use `threach` where there are no real threads???
+`threach` monkeypatches the Enumerable module with a new method `#threach` that provides a threaded version of `#each` (or whatever enumerator you throw at it). It's a very simple producer-consumer model. 
 
 ## Installation
 
@@ -19,59 +17,49 @@ use `threach` where there are no real threads???
 
 ## Use
 
-    # You like #each? You'll love...err.."probably like" #threach
     require 'rubygems'
     require 'threach'
     
     # Process with 2 threads. It assumes you want 'each'
     # as your iterator.
     (1..10).threach(2) {|i| puts i.to_s}  
+    
+    # If you want to watch it work...
+    (1..50).threach(2) do |i|
+      puts "Thread #{Thread.current[:tnum]}: #{i}"
+    end
 
-    # You can also specify the iterator
+    # You can also specify the iterator as the second argument
     File.open('mybigfile') do |f|
-      f.threach(2, :each_line) do |line|
+      f.threach(3, :each_line) do |line|
         processLine(line)
       end
     end
 
     # threach does not care what the arity of your block is
-    # as long as it matches the iterator you ask for
+    # as long as it matches the iterator specifed
 
     ('A'..'Z').threach(3, :each_with_index) do |letter, index|
       puts "#{index}: #{letter}"
     end
 
-    # Or with a hash
+    # Same thing with a hash, where the default #each actually returns two values
     h = {'a' => 1, 'b'=>2, 'c'=>3}
     h.threach(2) do |letter, i|
       puts "#{i}: #{letter}"
     end
 
-## Major problem
-
-I can't figure out how to exit gracefully from a threach loop. 
-
-  begin
-    ('a'..'z').threach(2, :each_with_index) do |letter, i|
-      break if i > 10  # will deadlock under jruby; fine under ruby
-      # raise StandardError if i > 10 # deadlock under jruby; find under ruby
-      puts letter
-    end
-  rescue 
-    puts "Rescued; broke out of the loop"
-  end
+## Things you need to know
 
-The `break` under jruby prints "Exception in thread "Thread-1" org.jruby.exceptions.JumpException$BreakJump," but if there's a way to catch that in the enclosing code I sure don't know how. 
+* The number you provide to `threach` is the number of *consumer* threads. It's assumed that the time to iterate once on the producer is much less than the work done by a consumer, so you need multiple consumers to keep up.
+* `threach` doesn't magically make your code thread-safe. That's still up to you.
+* Using `break` under JRuby works as expected but writes a log line to STDERR. This is something internal to JRuby and I don't know how to stop it.
+* Throwing exceptions as `raise "oops'` under JRuby is so slow that if you have more than one consumer, the time between the `raise` and the time you exit the `threach` loop is long enough that a *lot* of work will still get done. You need to use use the three-argument form `raise WhateverError, value, nil`. [The last `nil` tells JRuby to not bother making a full stack trace](http://jira.codehaus.org/browse/JRUBY-5534) and reduces the penalty, but you shouldn't use `raise` for flow control; use `catch` (or, if you can, just regular old `break`).
 
-Use of `catch` and `throw` seemed like an obvious choice, but they don't work across threads. Then I thought I'd use `catch` within the consumers and throw or raise an error at the producer, but that doesn't work, either. 
-
-I'm clearly up against (or well beyond) my knowledge limitations, here.
-
-If anyone has a solution to what should be a simple problem (and works under both ruby and jruby) boy, would I be grateful.
 
 ## Why and when to use it?
 
-Well, if you're using stock (MRI) ruby -- you probably shouldn't bother with `threach`. It'll just slow things down. But if you're using a ruby implementation that has real threads, like JRuby, this will give you relatively painless multi-threading.
+Well, if you're using stock (MRI) ruby -- you probably shouldn't bother with `threach` unless you're doing IO-intensive stuff. It'll just slow things down. But if you're using a ruby implementation that has real threads, like JRuby, this will give you relatively painless multi-threading.
 
 You can always do something like:
 
@@ -83,8 +71,15 @@ You can always do something like:
 
     my_enumerable.threach(numthreads) {|i| ...}
 
+...since `threach(0)` is exactly the same as `each`
+
 Note the "relatively" in front of "painless" up there. The block you pass still has to be thread-safe, and there are many data structures you'll encounter that are *not* thread-safe. Scalars, arrays, and hashes are, though, under JRuby, and that'll get you pretty far.
 
+## Change Notes
+
+* 0.3 Successfully deal with `break` without deadlocks by using another SizedQueue as, basically, a thread-safe counter of how many threads have finished.
+* 0.2 Undo attempts to deal with non-local exit
+* 0.1 first release
 
 
 ## Note on Patches/Pull Requests
@@ -99,4 +94,4 @@ Note the "relatively" in front of "painless" up there. The block you pass still
 
 ## Copyright
 
-Copyright (c) 2010 Bill Dueber. See LICENSE for details.
+Copyright (c) 2010-2011 Bill Dueber. See LICENSE for details.
diff --git a/VERSION b/VERSION
index 0ea3a94..0d91a54 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-0.2.0
+0.3.0
diff --git a/debian/changelog b/debian/changelog
index 9bd5737..6e1b596 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,4 +1,4 @@
-ruby-threach (0.2.0-3) UNRELEASED; urgency=medium
+ruby-threach (0.2.0+git20220104.1.cc3c5ed-1) UNRELEASED; urgency=medium
 
   [ Utkarsh Gupta ]
   * Add salsa-ci.yml
@@ -17,8 +17,9 @@ ruby-threach (0.2.0-3) UNRELEASED; urgency=medium
   * Bump debhelper from old 12 to 13.
   * Update standards version to 4.5.1, no changes needed.
   * Update standards version to 4.6.1, no changes needed.
+  * New upstream snapshot.
 
- -- Utkarsh Gupta <guptautkarsh2102@gmail.com>  Tue, 13 Aug 2019 07:59:02 +0530
+ -- Utkarsh Gupta <guptautkarsh2102@gmail.com>  Sun, 20 Nov 2022 12:49:16 -0000
 
 ruby-threach (0.2.0-2) unstable; urgency=medium
 
diff --git a/lib/threach.rb b/lib/threach.rb
index 809a9a0..3575381 100644
--- a/lib/threach.rb
+++ b/lib/threach.rb
@@ -1,35 +1,138 @@
 require 'thread'
+
+# Get a unique error class. We need this because
+# consumer threads may exit in a couple ways; first by
+# running out of input from the producer, and second by
+# having a 'break' thrown.
+class ThreachDone < LocalJumpError; end
+
+# Monkey-patch Enumerable to allow threach (threaded-each)
+
 module Enumerable
-  
+  # Provide an each-like iterator that uses the main thread to produce
+  # work (generally via the underlying #each, but can use any method with
+  # the optional arguments, e.g., each_with_index or each_line) and a
+  # set of consumer threads to do the work specified in the passed block.
+  #
+  # Under the hood, threach populates a thread-safe queue with work from a
+  # producer thread and uses the specified number of consumer threads to do the work.
+  #
+  # Note that threach is designed to be thread-safe internally, but the code you pass
+  # in via the block also has to be thread-safe (e.g., if your database isn't
+  # thread-safe, you can't be using it willy-nilly with three threads at a time). threach
+  # is syntactic sugar only; it doens't magically make things thread-safe.
+  #
+  # NOTE: threach is just sugar over normal Thread operations. You can set thread-local variables
+  # in the normal way -- by calling Thread.current[:var] = 'value'. If for some reason you want it, 
+  # the thread number is stored in Thread.current[:tnum]. 
+  #
+  # @param [Integer] threads The number of consumer threads to spin up. A value of zero indicates that work should just be done serially
+  # @param [Symbol] iterator The already-existing iterator to use to create work for the consumers. The output of the iterator is fed to the passed block, so if your chosen iterator produces two values (e.g.,  each_with_index) your block should, too (see below) 
+  # @param [Block] &blk The block representing the conumer's work.
+  #
+  #
+  # @example Use two threads to check URLs
+  #   urls.threach(2) {|url|
+  #     see_if_url_is_there(url)
+  #   }
+  #
+  # @example Process lines of a file using three threads
+  #   File.open('myfile') do |f|
+  #     f.threach(3, :each_line) do |line|
+  #       process_line(line)
+  #     end
+  #   end
+  #
+  # @example Process items in a hash (to show two-valued items for consumption)
+  #   myBigHash.threach(2, :each_with_index) do |k,v|
+  #    puts "The value of #{k} is #{v}"
+  #   end
+
   def threach(threads=0, iterator=:each, &blk)
+    
+    # If 0 is passed, just treat it like any sequential call. 
+    # Hence arr.threach(0) is exactly the same as arr.each
     if threads == 0
       self.send(iterator) do |*args|
         blk.call *args
       end
     else
-      bq = SizedQueue.new(threads * 2)
+      # Hang onto the main thread so we can bail out of it if need be
+      producer_thread = Thread.current
+      
+      # Create two SizedQueues (which are guaranteed thread-safe)
+      
+      # bq is where we put the work from the producer; make it quite a bit bigger
+      # than the number of threads so they don't spend too much time waiting on 
+      # the producer.
+      bq = SizedQueue.new(threads * 3)
+      
+      # doneq is, essentially, a thread-safe counter. MRI doesn't have thread-safe
+      # integer operations on variables, so I'm just using this because I'm lazy.
+      # We know when doneq.size == number_of_threads that the producer should be 
+      # bailing if it isn't already done. This can happen when there is a 
+      # "break" statement in the passed block.
+      doneq = SizedQueue.new(threads)
+      
+      # Build up the consumers.
       consumers = []
       threads.times do |i|
         consumers << Thread.new(i) do |i|
-          until (a = bq.pop) === :end_of_data
-            blk.call(*a)
+          begin
+            # Internal variable for debugging.
+            Thread.current[:tnum] = i
+            
+            # Check to see if the popped value is the magical symbol
+            # :end_of_data. If it is, stop, because the producer has 
+            # run out of work. Otherwise, make the call.
+            until (a = bq.pop) === :end_of_data
+              blk.call(*a)
+            end
+          ensure
+            # If we get to this ensure block, it means there was a non-normal
+            # exit from the block via break. If that's the case, we push another
+            # entry into the doneq.
+            doneq << :threach_all_done
+            
+            # When the size of doneq == the number of threads, that means all
+            # of the threads are done and we need to manually break out of the
+            # producer thread by raising an error
+            if doneq.size == threads
+              producer_thread.raise(ThreachDone.new, :all_threads_done, nil)
+            end            
+            
           end
+            
         end          
       end
     
       # The producer
-      count = 0
-      self.send(iterator) do |*x|
-        bq.push x
-        count += 1
-      end
-      # Now end it
-      threads.times do 
-        bq << :end_of_data
+      begin
+        count = 0
+        self.send(iterator) do |*x|
+          bq.push x
+          count += 1
+        end
+        # Here we've run out of stuff, so we need to signal to the 
+        # threads that it's time to die. Next time they pop a value
+        # off the queue, it'll be :end_of_data and they'll stop.
+        #
+        # Make sure we push one for each thread!
+        threads.times do 
+          bq << :end_of_data
+        end
+        
+        
+        # That's the end of the producer proper. Now we just join all the
+        # consumer threads and we're set.
+        consumers.each {|t| t.join}
+        
+      rescue ThreachDone => e
+        # Do nothing; if we get here, it's because all the consumer threads
+        # bailed via "break" for some reason.
       end
-      # Do the join
-      consumers.each {|t| t.join}
     end
+  ensure
   end
     
 end
diff --git a/metadata.yml b/metadata.yml
deleted file mode 100644
index fb111f0..0000000
--- a/metadata.yml
+++ /dev/null
@@ -1,121 +0,0 @@
---- !ruby/object:Gem::Specification 
-name: threach
-version: !ruby/object:Gem::Version 
-  hash: 23
-  prerelease: false
-  segments: 
-  - 0
-  - 2
-  - 0
-  version: 0.2.0
-platform: ruby
-authors: 
-- Bill Dueber
-autorequire: 
-bindir: bin
-cert_chain: []
-
-date: 2010-08-10 00:00:00 -04:00
-default_executable: 
-dependencies: 
-- !ruby/object:Gem::Dependency 
-  name: thoughtbot-shoulda
-  prerelease: false
-  requirement: &id001 !ruby/object:Gem::Requirement 
-    none: false
-    requirements: 
-    - - ">="
-      - !ruby/object:Gem::Version 
-        hash: 3
-        segments: 
-        - 0
-        version: "0"
-  type: :development
-  version_requirements: *id001
-- !ruby/object:Gem::Dependency 
-  name: yard
-  prerelease: false
-  requirement: &id002 !ruby/object:Gem::Requirement 
-    none: false
-    requirements: 
-    - - ">="
-      - !ruby/object:Gem::Version 
-        hash: 3
-        segments: 
-        - 0
-        version: "0"
-  type: :development
-  version_requirements: *id002
-- !ruby/object:Gem::Dependency 
-  name: cucumber
-  prerelease: false
-  requirement: &id003 !ruby/object:Gem::Requirement 
-    none: false
-    requirements: 
-    - - ">="
-      - !ruby/object:Gem::Version 
-        hash: 3
-        segments: 
-        - 0
-        version: "0"
-  type: :development
-  version_requirements: *id003
-description: An addition to the Enumerable module that allows easy use of threaded each and each-like iterators
-email: bill@dueber.com
-executables: []
-
-extensions: []
-
-extra_rdoc_files: 
-- LICENSE
-- README.markdown
-files: 
-- .document
-- .gitignore
-- LICENSE
-- README.markdown
-- Rakefile
-- VERSION
-- features/step_definitions/threach_steps.rb
-- features/support/env.rb
-- features/threach.feature
-- lib/threach.rb
-- test/helper.rb
-- test/test_threach.rb
-has_rdoc: true
-homepage: http://github.com/billdueber/threach
-licenses: []
-
-post_install_message: 
-rdoc_options: 
-- --charset=UTF-8
-require_paths: 
-- lib
-required_ruby_version: !ruby/object:Gem::Requirement 
-  none: false
-  requirements: 
-  - - ">="
-    - !ruby/object:Gem::Version 
-      hash: 3
-      segments: 
-      - 0
-      version: "0"
-required_rubygems_version: !ruby/object:Gem::Requirement 
-  none: false
-  requirements: 
-  - - ">="
-    - !ruby/object:Gem::Version 
-      hash: 3
-      segments: 
-      - 0
-      version: "0"
-requirements: []
-
-rubyforge_project: 
-rubygems_version: 1.3.7
-signing_key: 
-specification_version: 3
-summary: Threaded each
-test_files: 
-- test/helper.rb
-- test/test_threach.rb
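
The producer-consumer model that the `lib/threach.rb` change describes (a bounded `SizedQueue` filled by the calling thread, drained by N consumer threads, with one `:end_of_data` sentinel per consumer) can be sketched as below. This is a minimal illustration, not the upstream implementation: the helper name `each_in_threads` and the `results` collector are hypothetical, and the sketch deliberately omits the `doneq` / `ThreachDone` handling for `break` that the new version adds.

```ruby
require 'thread'

# Hypothetical helper (not part of threach): run the given block over
# `items` using `threads` consumer threads fed from a bounded queue.
def each_in_threads(items, threads, &work)
  # Bounded buffer; the diff sizes its queue at threads * 3 as well.
  queue = SizedQueue.new(threads * 3)

  consumers = Array.new(threads) do
    Thread.new do
      # Pop until the sentinel arrives; one sentinel is pushed per consumer.
      until (item = queue.pop) == :end_of_data
        work.call(item)
      end
    end
  end

  # The calling thread acts as the producer.
  items.each { |item| queue.push(item) }
  threads.times { queue.push(:end_of_data) }

  consumers.each(&:join)
  nil
end

# Usage: collect squares through a thread-safe Queue, since the block
# itself must be thread-safe.
results = Queue.new
each_in_threads(1..10, 2) { |i| results << i * i }

squares = []
squares << results.pop until results.empty?
```

Completion order across consumers is not deterministic, so `squares` should be sorted before being compared against an expected list.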

Debdiff

[The following lists of changes regard files as different if they have different names, permissions or owners.]

Files in first set of .debs but not in second

-rw-r--r--  root/root   /usr/share/rubygems-integration/all/specifications/threach-0.2.0.gemspec

Control files: lines which differ (wdiff format)

  • Ruby-Versions: all
