Ruby Tutorial: Let's Find the Most Frequent Words in an English Novel! (Part 13)
Version17
Next, let's also define top_by_length, a method that outputs the top 30 longest words like the one shown in Version07.
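```ruby
# Reopening the WordDictionary class from the earlier parts
class WordDictionary
  # Returns the nth longest words as [word, frequency, length] triples.
  # An optional block filters entries by frequency, e.g. { |val| val > 100 }.
  def top_by_length(nth, &blk)
    # Negate the length so that longer words sort first.
    list = take_by_key(nth, lambda { |key| -key.length }, &blk)
    list.map { |word, freq| [word, freq, word.length] }
  end

  private

  # Keeps entries accepted by the block (all of them if no block is given),
  # then takes nth of them sorted by sort_opt applied to the value.
  # take_by is the helper defined in an earlier part of this series.
  def take_by_value(nth, sort_opt)
    @freq_dic.select { |key, val| block_given? ? yield(val) : val }
             .take_by(nth) { |key, val| sort_opt[val] }
  end

  # Same as take_by_value, except that sort_opt is applied to the key (the word).
  def take_by_key(nth, sort_opt)
    @freq_dic.select { |key, val| block_given? ? yield(val) : val }
             .take_by(nth) { |key, val| sort_opt[key] }
  end
end

wdic = WordDictionary.new(ARGF)
p wdic.top_by_length(30) { |val| val > 100 }
```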
With future extensions in mind, we define take_by_key along the same lines as take_by_value, and have top_by_length use it.
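The take_by helper itself comes from an earlier part of the series and is not shown here. To try the take_by_key idea on its own, here is a minimal sketch that assumes take_by sorts by the block's result and keeps the first nth entries. This Enumerable#take_by is a hypothetical stand-in, not necessarily the series' own definition, and the small frequency hash is made up for illustration.

```ruby
# Hypothetical stand-in for the take_by helper from an earlier part:
# sort by the block's result, then keep the first nth entries.
module Enumerable
  def take_by(nth, &blk)
    sort_by(&blk).first(nth)
  end
end

# Made-up sample frequencies for illustration.
freq_dic = { "elizabeth" => 636, "the" => 4507, "gutenberg" => 285,
             "pleasure" => 103, "a" => 2012, "conversation" => 42 }

# Sort option on the key: negative length, so longer words sort first.
by_length = lambda { |key| -key.length }

# Same shape as take_by_key: filter by frequency, then sort on the key.
# "conversation" is the longest word but is dropped by the val > 100 filter.
list = freq_dic.select { |key, val| val > 100 }
               .take_by(3) { |key, val| by_length[key] }

p list
```

Because sort_by is not guaranteed to be stable, words of equal length (here elizabeth and gutenberg) may come out in either order; pleasure is always third.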
In addition to each word and its frequency, top_by_length also returns the word's length.
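Here we use the map method of the Array class. Like inject, map is a very handy method: it returns a new array in which each element is replaced by the result of the block. The original array is left untouched (map! is the in-place version).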
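The example above could also be written as list.map { |item| item << item[0].length }. Note, though, that << appends the length to each [word, freq] pair in place, so the pairs inside list are modified as well.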
The output of top_by_length(30) looks like this:
```
#> [["illustration", 160, 12], ["therefore", 127, 9], ["catherine", 126, 9], ["jerusalem", 120, 9], ["gutenberg", 285, 9], ["elizabeth", 636, 9], ["prophecy", 322, 8], ["together", 105, 8], ["anything", 117, 8], ["pleasure", 103, 8], ["judgment", 134, 8], ["believe", 110, 7], ["collins", 180, 7], ["between", 114, 7], ["wickham", 194, 7], ["bingley", 306, 7], ["replied", 136, 7], ["history", 189, 7], ["himself", 178, 7], ["against", 164, 7], ["because", 116, 7], ["however", 179, 7], ["through", 185, 7], ["nothing", 235, 7], ["sabbath", 215, 7], ["herself", 312, 7], ["another", 144, 7], ["project", 262, 7], ["without", 263, 7], ["thought", 215, 7]]
```
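To see the difference between the two map versions mentioned in Version17 in isolation, here is a small standalone comparison using two made-up [word, frequency] pairs. The first version builds fresh arrays; the << version returns the same triples but also modifies the original pairs.

```ruby
pairs = [["elizabeth", 636], ["pleasure", 103]]

# map builds a NEW array from the block's results; pairs itself is unchanged.
triples = pairs.map { |word, freq| [word, freq, word.length] }
p triples  # => [["elizabeth", 636, 9], ["pleasure", 103, 8]]
p pairs    # => [["elizabeth", 636], ["pleasure", 103]]

# The << variant returns the same triples, but << appends to each inner
# array in place, so pairs is modified as a side effect.
triples2 = pairs.map { |item| item << item[0].length }
p triples2 # => [["elizabeth", 636, 9], ["pleasure", 103, 8]]
p pairs    # => [["elizabeth", 636, 9], ["pleasure", 103, 8]]
```

In top_by_length the mutation is harmless as long as nothing else holds on to those pairs, but the first form is the safer default.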
Version18
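Another problem! It's a DRY (Don't Repeat Yourself) violation: take_by_value and take_by_key are identical except for whether sort_opt is given the value or the key.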
```ruby
def take_by_value(nth, sort_opt)
  @freq_dic.select { |key, val| block_given? ? yield(val) : val }
           .take_by(nth) { |key, val| sort_opt[val] }
end

def take_by_key(nth, sort_opt)
  @freq_dic.select { |key, val| block_given? ? yield(val) : val }
           .take_by(nth) { |key, val| sort_opt[key] }  # the only difference: key vs val
end
```
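We avoid this by defining a take_by_key_or_val method that takes, as an extra argument, a lambda that picks out either the key or the value.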
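```ruby
# (These stay in the private section of WordDictionary.)
def take_by_value(nth, sort_opt, &blk)
  # Named by_val so it does not share a name with the block parameter val.
  by_val = lambda { |key, val| val }   # extract the value (frequency)
  take_by_key_or_val(nth, sort_opt, by_val, &blk)
end

def take_by_key(nth, sort_opt, &blk)
  by_key = lambda { |key, val| key }   # extract the key (the word)
  take_by_key_or_val(nth, sort_opt, by_key, &blk)
end

# Shared body: `by` picks the key or the value,
# and sort_opt turns that choice into the actual sort key.
def take_by_key_or_val(nth, sort_opt, by)
  @freq_dic.select { |key, val| block_given? ? yield(val) : val }
           .take_by(nth) { |key, val| sort_opt[by[key, val]] }
end
```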
Phew.
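The idea behind take_by_key_or_val is to pass in a small lambda that picks out the part of each entry to sort on. The sketch below shows that pattern outside the class, as a top-level function over a plain hash. The BY_KEY/BY_VAL constants, the extra dic parameter, and the Enumerable#take_by stand-in (sort_by followed by first) are assumptions for illustration, and the frequency filter is left out for brevity.

```ruby
# Hypothetical stand-in for the series' take_by: sort by the block, keep nth.
module Enumerable
  def take_by(nth, &blk)
    sort_by(&blk).first(nth)
  end
end

# Extractor lambdas: given (key, val), return the part to sort on.
BY_KEY = lambda { |key, _val| key }
BY_VAL = lambda { |_key, val| val }

# The single shared body: `by` chooses key or value,
# and sort_opt turns that choice into the actual sort key.
def take_by_key_or_val(dic, nth, sort_opt, by)
  dic.take_by(nth) { |key, val| sort_opt[by[key, val]] }
end

freq = { "the" => 4507, "elizabeth" => 636, "to" => 4243 }

# Highest frequency first (negate the value).
p take_by_key_or_val(freq, 2, lambda { |val| -val }, BY_VAL)
# => [["the", 4507], ["to", 4243]]

# Longest word first (negate the key's length).
p take_by_key_or_val(freq, 1, lambda { |key| -key.length }, BY_KEY)
# => [["elizabeth", 636]]
```

Adding a new way to sort (say, alphabetically by key) now only needs a new sort_opt lambda, not another copy of the select/take_by chain.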
Version19
So, what's next? Let's see...
Having gone to the trouble of building a class, it's a pity that it can only take its input from the command-line arguments (ARGF).
So next, let's make the WordDictionary class accept a file name or a string directly.
For this we define an input_to_string method and have the initialize method convert the input appropriately.
```ruby
class WordDictionary
  def initialize(input)
    input = input_to_string(input)   # accept ARGF, a file name, or a text string
    @words = input.downcase.scan(/[a-z]+/)
    @freq_dic = @words.inject(Hash.new(0)) { |dic, word| dic[word] += 1; dic }
  end

  private

  def input_to_string(input)
    case input
    when String
      begin
        # First, try to read the string as a file name...
        File.open(input, "r") { |f| return f.read }
      rescue
        # ...and if that fails (e.g. Errno::ENOENT), treat it as the text itself.
        # warn writes to stderr, so the p output below is not interleaved with it.
        warn "Argument is assumed to be a text string"
        input
      end
    when ARGF.class
      input.read
    else
      raise ArgumentError, "Wrong argument: expected ARGF, a file name, or a string."
    end
  end
end

wdic1 = WordDictionary.new(ARGF)
wdic2 = WordDictionary.new('11.txt')
wdic3 = WordDictionary.new(<<-EOS)
It was all very well to say 'Drink me,' but the wise little Alice was not
going to do THAT in a hurry. 'No, I'll look first,' she said, 'and see
whether it's marked "poison" or not'; for she had read several nice little
histories about children who had got burnt, and eaten up by wild beasts and
other unpleasant things, all because they WOULD not remember the simple rules
their friends had taught them: such as, that a red-hot poker will burn you if
you hold it too long; and that if you cut your finger VERY deeply with a
knife, it usually bleeds; and she had never forgotten that, if you drink much
from a bottle marked 'poison,' it is almost certain to disagree with you,
sooner or later.
EOS

p wdic1.top_by_frequency(10)
p wdic2.top_by_frequency(10)
p wdic3.top_by_frequency(10)
```

```
#> [["the", 4507], ["to", 4243], ["of", 3728], ["and", 3658], ["her", 2225], ["i", 2069], ["a", 2012], ["in", 1936], ["was", 1848], ["she", 1710]]
#> [["the", 1818], ["and", 940], ["to", 809], ["a", 690], ["of", 631], ["it", 610], ["she", 553], ["i", 545], ["you", 481], ["said", 462]]
#> [["it", 5], ["you", 5], ["and", 5], ["that", 4], ["had", 4], ["a", 4], ["if", 3], ["she", 3], ["to", 3], ["not", 3]]
```
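In input_to_string, a case expression branches on the kind of input. case compares each when value against input with ===, and for a class such as String this checks whether input is an instance of that class. In the when String branch, we first try to open the string as a file name, and if that fails we treat it as the text itself. It seems to work.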
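The dispatch in input_to_string relies on case testing each when value with ===; for a class, Module#=== asks whether the object is an instance of it. The sketch below is a hypothetical standalone to_text function that follows the same try-the-file-first strategy, with a few assumed changes: File.read instead of File.open, rescuing only SystemCallError (which covers Errno::ENOENT), and IO listed next to ARGF.class so it can be exercised with an ordinary File. Tempfile.create needs Ruby 2.1 or later.

```ruby
require "tempfile"

# Hypothetical standalone version of the idea behind input_to_string.
# case/when tests each branch with ===, and Module#=== checks is_a?,
# so `when String` matches any String object.
def to_text(input)
  case input
  when String
    begin
      File.read(input)        # first, try the string as a file name
    rescue SystemCallError    # e.g. Errno::ENOENT when no such file exists
      input                   # otherwise treat the string as the text itself
    end
  when IO, ARGF.class         # ARGF is not an IO, so it is listed separately
    input.read
  else
    raise ArgumentError, "expected a file name, a text string, an IO, or ARGF"
  end
end

p String === "abc"            # => true (what `when String` evaluates)

from_file = from_io = nil
Tempfile.create("words") do |f|   # temporary file, deleted after the block
  f.write("Alice was beginning")
  f.flush                          # make sure the text is on disk before reading
  from_file = to_text(f.path)      # a String that names an existing file
  from_io = File.open(f.path) { |io| to_text(io) }  # File is a subclass of IO
end
from_text = to_text("no such file, just text")       # falls back to the text

p from_file   # => "Alice was beginning"
p from_io     # => "Alice was beginning"
p from_text   # => "no such file, just text"
```

Narrowing the rescue to SystemCallError keeps unrelated bugs from being silently turned into "treat it as text"; the trade-off is that a string containing a NUL byte would raise ArgumentError instead of falling back.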
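WordDictionary.new(<<-EOS)... uses a notation called a here document. The lines following <<-EOS, up to the line consisting of the terminator EOS, are read as a single string. The terminator name is up to you, and the - in <<- allows the closing EOS to be indented.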
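A here document can be written in a few forms. The sketch below compares <<-, <<~ (Ruby 2.3 or later), and plain <<, and shows that the terminator name is arbitrary and that a heredoc can be passed straight to a method, as WordDictionary.new(<<-EOS) does. count_words is a throwaway helper made up for this example.

```ruby
# <<-: the terminator may be indented, but the body keeps its indentation.
dash = <<-EOS
    Drink me
    EOS

# <<~ (Ruby 2.3+): also removes the common leading indentation of the body.
squiggly = <<~TEXT
    Drink me
  TEXT

# Plain <<: the terminator must start at the very beginning of its line.
plain = <<END
Drink me
END

# A heredoc can be passed directly as a method argument;
# its body starts on the line after the call.
def count_words(text)   # hypothetical helper for this example
  text.split.size
end
count = count_words(<<~EOS)
  Drink me
EOS

p dash      # => "    Drink me\n"
p squiggly  # => "Drink me\n"
p plain     # => "Drink me\n"
p count     # => 2
```

For WordDictionary the leading spaces do not matter, since scan(/[a-z]+/) ignores them, so <<- and <<~ give the same word counts.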
(To be continued)