Ruby Tutorial: Finding the Most Frequent Words in English Novels! (Part 13)

Version17

Next, let's also define top_by_length,
a method that outputs the top 30 longest words,
like the one shown in Version07

 class WordDictionary
   def top_by_length(nth, &blk)
     list = take_by_key(nth, lambda { |key| -key.length }, &blk)
     list.map { |word, freq| [word, freq, word.length] }
   end
 
   private
   def take_by_value(nth, sort_opt)
     @freq_dic.select { |key, val| block_given? ? yield(val) : val }.take_by(nth) { |key, val| sort_opt[val] }
   end
 
   def take_by_key(nth, sort_opt)
     @freq_dic.select { |key, val| block_given? ? yield(val) : val }.take_by(nth) { |key, val| sort_opt[key] }
   end
 end

 wdic = WordDictionary.new(ARGF)
 p wdic.top_by_length(30) { |val| val > 100 }


Here, with an eye to the future,
we define take_by_key in the same way as take_by_value
and have top_by_length use it


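top_by_length returns the word length
in addition to the word and its count
It uses the Array class's map method here
Like inject, map is a very handy method
It returns a new array in which each element is replaced by the result of the block
The example above could also be written as list.map { |item| item << item[0].length }
although that version appends the length to each inner array in place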
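
The difference between the two forms can be seen in a minimal sketch. The method names with_length and with_length_in_place and the sample data are hypothetical, introduced only for illustration; only map, << and length come from the text above.

```ruby
# Hypothetical helpers; the [word, freq] pairs mimic the shape of the tutorial's list.

# Builds a new three-element array for every pair; the input is left untouched.
def with_length(list)
  list.map { |word, freq| [word, freq, word.length] }
end

# Appends the length to each inner array with <<, so the input pairs are modified.
def with_length_in_place(list)
  list.map { |item| item << item[0].length }
end

list = [["elizabeth", 636], ["bingley", 306]]

p with_length(list)  #=> [["elizabeth", 636, 9], ["bingley", 306, 7]]
p list               #=> [["elizabeth", 636], ["bingley", 306]]

with_length_in_place(list)
p list               #=> [["elizabeth", 636, 9], ["bingley", 306, 7]]
```

Both forms give the same return value, so the choice only matters if the original pairs are used again later.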


The output looks like this

#> [["illustration", 160, 12], ["therefore", 127, 9], ["catherine", 126, 9], ["jerusalem", 120, 9], ["gutenberg", 285, 9], ["elizabeth", 636, 9], ["prophecy", 322, 8], ["together", 105, 8], ["anything", 117, 8], ["pleasure", 103, 8], ["judgment", 134, 8], ["believe", 110, 7], ["collins", 180, 7], ["between", 114, 7], ["wickham", 194, 7], ["bingley", 306, 7], ["replied", 136, 7], ["history", 189, 7], ["himself", 178, 7], ["against", 164, 7], ["because", 116, 7], ["however", 179, 7], ["through", 185, 7], ["nothing", 235, 7], ["sabbath", 215, 7], ["herself", 312, 7], ["another", 144, 7], ["project", 262, 7], ["without", 263, 7], ["thought", 215, 7]]

Version18

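Trouble again!
This violates the DRY principle!
The two methods are identical except for whether key or val is passed to sort_opt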

   def take_by_value(nth, sort_opt)
     @freq_dic.select { |key, val| block_given? ? yield(val) : val }.take_by(nth) { |key, val| sort_opt[val] }
   end
 
   def take_by_key(nth, sort_opt)
     @freq_dic.select { |key, val| block_given? ? yield(val) : val }.take_by(nth) { |key, val| sort_opt[key] }
   end


We avoid this by defining
a take_by_key_or_val method

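   def take_by_value(nth, sort_opt, &blk)
     # selector that picks the value out of a key/value pair
     by_val = lambda { |key, val| val }
     take_by_key_or_val(nth, sort_opt, by_val, &blk)
   end
 
   def take_by_key(nth, sort_opt, &blk)
     # selector that picks the key out of a key/value pair
     by_key = lambda { |key, val| key }
     take_by_key_or_val(nth, sort_opt, by_key, &blk)
   end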
 
   def take_by_key_or_val(nth, sort_opt, by)
     @freq_dic.select { |key, val| block_given? ? yield(val) : val }.take_by(nth) { |key, val| sort_opt[by[key, val]] }
   end
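
To see the selector idea in isolation, here is a rough standalone sketch. take_by comes from an earlier installment and is not shown in this part, so the sketch substitutes sort_by plus first for it; that substitution, the name take_sorted, and the sample hash are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for take_by_key_or_val.
# Assumption: sort_by + first(nth) replaces the tutorial's take_by.
def take_sorted(freq_dic, nth, sort_opt, by)
  freq_dic.select { |key, val| block_given? ? yield(val) : val }
          .sort_by { |key, val| sort_opt[by[key, val]] }
          .first(nth)
end

by_key = lambda { |key, val| key }   # picks the key of each pair
by_val = lambda { |key, val| val }   # picks the value of each pair

dic = { "a" => 5, "pride" => 3, "prejudice" => 2 }   # made-up sample data

# Highest counts first
p take_sorted(dic, 2, lambda { |v| -v }, by_val)
#=> [["a", 5], ["pride", 3]]

# Longest words first
p take_sorted(dic, 2, lambda { |k| -k.length }, by_key)
#=> [["prejudice", 2], ["pride", 3]]

# Longest words first, limited to words that occur more than twice
p take_sorted(dic, 2, lambda { |k| -k.length }, by_key) { |val| val > 2 }
#=> [["pride", 3], ["a", 5]]
```

The only thing that differs between the calls is which lambda is passed as by, which is the duplication the refactoring removes.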

Phew

Version19

So, what's next?
Let's see...


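We went to the trouble of making a class,
so it is a pity that it can only read the files given as command-line arguments (ARGF)
Next, let's make the WordDictionary class
accept a file name or a string directly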


To do that, we define an input_to_string method
and have the initialize method convert the input appropriately

 class WordDictionary
   def initialize(input)
     input = input_to_string(input)
     @words = input.downcase.scan(/[a-z]+/)
     @freq_dic = @words.inject(Hash.new(0)) { |dic, word| dic[word] += 1 ; dic }
   end
   
   private
   def input_to_string(input)
     case input
     when String
       begin
         # first try to read the argument as a file name
         File.open(input, "r") { |f| return f.read }
       rescue
         # the file could not be opened, so treat the argument as text
         puts "Argument is assumed to be a text string"
         input
       end
     when ARGF.class
       input.read
     else
       raise "Wrong argument. ARGF, a file name or a string is acceptable."
     end
   end
 end

 wdic1 = WordDictionary.new(ARGF)
 wdic2 = WordDictionary.new('11.txt')
 wdic3 = WordDictionary.new(<<-EOS)
 It was all very well to say 'Drink me,' but the wise little Alice was not going to do THAT in a hurry. 'No, I'll look first,' she said, 'and see whether it's marked "poison" or not'; for she had read several nice little histories about children who had got burnt, and eaten up by wild beasts and other unpleasant things, all because they WOULD not remember
 the simple rules their friends had taught them: such as, that a red-hot poker will burn you if you hold it too long; and that if you cut your finger VERY deeply with a knife, it usually bleeds; and she had never forgotten that, if you drink much from a bottle marked 'poison,' it is almost certain to disagree with you, sooner or later.
EOS
 p wdic1.top_by_frequency(10)
 p wdic2.top_by_frequency(10)
 p wdic3.top_by_frequency(10)
 
 #> [["the", 4507], ["to", 4243], ["of", 3728], ["and", 3658], ["her", 2225], ["i", 2069], ["a", 2012], ["in", 1936], ["was", 1848], ["she", 1710]]
 [["the", 1818], ["and", 940], ["to", 809], ["a", 690], ["of", 631], ["it", 610], ["she", 553], ["i", 545], ["you", 481], ["said", 462]]
 [["it", 5], ["you", 5], ["and", 5], ["that", 4], ["had", 4], ["a", 4], ["if", 3], ["she", 3], ["to", 3], ["not", 3]]


In input_to_string,
a case expression distinguishes the kinds of input
The when String branch first tries to treat the argument as a file name,
and if that fails, treats it as a text string
It seems to be working
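
The file-or-text fallback can be tried on its own with a small sketch. The helper name read_file_or_text is hypothetical, and unlike the tutorial's bare rescue it rescues only SystemCallError and ArgumentError, which is a narrowing assumed here for illustration.

```ruby
require "tempfile"

# Hypothetical helper mirroring the String branch of input_to_string.
def read_file_or_text(input)
  case input   # each `when` is tested with ===, e.g. String === input
  when String
    begin
      # succeeds only if the string names a readable file
      File.open(input, "r") { |f| f.read }
    rescue SystemCallError, ArgumentError
      # not usable as a file name, so treat the argument as text
      input
    end
  else
    raise ArgumentError, "a file name or a string is expected"
  end
end

Tempfile.create("words") do |f|
  f.write("Pride and Prejudice")
  f.flush
  p read_file_or_text(f.path)   # prints the file's contents
end

p read_file_or_text("no such file")   # prints the string itself
```

One consequence of this design is that a text string which happens to match an existing file name is read as a file.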


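WordDictionary.new(<<-EOS)... uses
a notation called a here document
The lines between <<-EOS and the terminating EOS line
are interpreted as a string
(EOS is an arbitrary identifier)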
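
As a small illustration of the here-document notation, the following sketch compares <<- with the <<~ form available in Ruby 2.3 and later. The method names are hypothetical and exist only to give each string a name.

```ruby
# Hypothetical helper: with <<- the terminator may be indented,
# but the leading whitespace of the text lines is kept.
def dash_heredoc
  <<-EOS
  Drink me
  EOS
end

# Hypothetical helper: with <<~ (Ruby 2.3+) the common leading
# whitespace of the text lines is removed.
def squiggly_heredoc
  <<~EOS
    Drink me
  EOS
end

p dash_heredoc       # a string that still begins with spaces
p squiggly_heredoc   #=> "Drink me\n"
```

For word counting the kept indentation is harmless, because scan(/[a-z]+/) ignores whitespace anyway.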


(To be continued)