
Rake Routes

by Stephen Ball

Let’s Use Hwacha to Scan URLs

I’ve written a gem, Hwacha, as a wrapper around Typhoeus to allow for quick and easy response checking for multiple URLs. Let’s go through some simple use cases!

I started to write a script to go through Ruby Weekly issues and add them to my Pinboard. I wanted to do this so I could use Pinboard’s awesome full-text archive to easily find interesting articles weeks later. Fun!

I’ve also really taken Donald’s Dependency Isolation talk to heart. So when I started to build out my simple script I began by isolating the dependencies like the library, Typhoeus, that I’d use to hit the URLs and check their response.

Well, one thing led to another and I ended up just turning that first dependency isolation into its own gem: Hwacha! At its core Hwacha is just a wrapper around Typhoeus’s hydra method, set up to yield the response and URL to the given block. I also added find_existing, which only executes the block if the page exists.
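To make the shape of that API concrete, here is a minimal sketch of the pattern. It is not Hwacha's actual source: it swaps Typhoeus for an injected `fetcher` callable so it runs without the network, and `UrlChecker`, `FakeResponse`, and `fetcher` are hypothetical names used only for illustration. The block argument order (URL first, then response) follows the usage later in this post.

```ruby
# A stand-in response with just enough behavior for the example.
FakeResponse = Struct.new(:code, :body) do
  def success?
    (200..299).cover?(code)
  end
end

# Hypothetical wrapper: not Hwacha's source, just the same calling shape.
class UrlChecker
  # fetcher is any callable that takes a URL and returns a response object.
  def initialize(fetcher)
    @fetcher = fetcher
  end

  # Yield every URL together with its response.
  def check(urls)
    Array(urls).each do |url|
      yield url, @fetcher.call(url)
    end
  end

  # Yield only the URLs whose response was successful.
  def find_existing(urls)
    check(urls) do |url, response|
      yield url if response.success?
    end
  end
end

# Canned responses instead of real HTTP requests.
pages = {
  'http://example.com/here' => FakeResponse.new(200, 'hello'),
  'http://example.com/gone' => FakeResponse.new(404, 'not found')
}
checker = UrlChecker.new(->(url) { pages.fetch(url) })

existing = []
checker.find_existing(pages.keys) { |url| existing << url }
puts existing.inspect
```

Keeping the HTTP library behind one small object like this is what makes the rest of a script easy to test: only the fetcher needs to change.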

So that’s cool, but what are some things we can do with it? How about that importing Ruby Weekly issues project?

Import Ruby Weekly Issues into Pinboard

Bookmarks and RubyWeekly are my dependency isolations for the Pinboard API and for generating the array of potential Ruby Weekly issue URLs. They aren’t perfect, but they got the job done (and were tested)!

# lib/bookmarks.rb
require 'pinboard'
require 'delegate'

module PinboardCredentials
  USERNAME = 'xyzzyb'
  PASSWORD = '[redacted]'

  def self.hash
    {
      :username => USERNAME,
      :password => PASSWORD,
    }
  end
end

class Bookmarks < SimpleDelegator
  include PinboardCredentials

  def initialize
    super(Pinboard::Client.new(PinboardCredentials.hash))
  end
end

# lib/rubyweekly.rb
require 'open-uri'

class RubyWeekly
  def self.base_url
    'http://rubyweekly.com'
  end

  def self.issue_url(number)
    "#{base_url}/issues/#{number}"
  end

  def self.potential_issue_urls
    1.upto(current_issue_number).map do |number|
      issue_url(number)
    end
  end

  def self.current_issue_number
    Integer(landing_page_text.match(/issues\/(\d+)/).to_a.last)
  end

  def self.landing_page_text
    # URI.open rather than Kernel#open, which stopped accepting URLs in Ruby 3.0
    URI.open('http://rubyweekly.com').read
  end
end

# The import script, in a directory alongside lib/
require 'hwacha'
require_relative '../lib/bookmarks'
require_relative '../lib/rubyweekly'

bookmarks = Bookmarks.new
hwacha = Hwacha.new

hwacha.check(RubyWeekly.potential_issue_urls) do |url, response|
  next if response.body == 'no such issue'

  number = url.scan(/\d+/).first
  description = "Ruby Weekly #{number}"
  tags = 'petercooper,ruby,rubyweekly'

  bookmarks.add(:url => url, :description => description, :tags => tags)

  p "Added #{url} ^_^"
end

I’d have liked to use the find_existing method, but rubyweekly.com responds with a successful HTTP response and the body “no such issue” for any issue URL that doesn’t exist. But by design, Hwacha lets us dig into the response so we can make smart decisions.
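That kind of check can be pulled into a small predicate so the block stays readable. This is only a sketch: `issue_exists?` and `StubResponse` are hypothetical names, and the only response methods assumed are `success?` and `body`, which Typhoeus responses provide. The `strip` call is an added guard against surrounding whitespace and is not in the script above.

```ruby
# Stand-in for a Typhoeus response, so the example runs offline.
StubResponse = Struct.new(:code, :body) do
  def success?
    (200..299).cover?(code)
  end
end

# True only when the request succeeded AND the body isn't the placeholder
# text that rubyweekly.com returns for issues that don't exist.
def issue_exists?(response)
  response.success? && response.body.strip != 'no such issue'
end

puts issue_exists?(StubResponse.new(200, '<html>Ruby Weekly</html>')) # true
puts issue_exists?(StubResponse.new(200, 'no such issue'))            # false
puts issue_exists?(StubResponse.new(500, ''))                         # false
```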

Check responses for all URLs on a given page

Here’s a fun one: let’s scan all the URLs we can find on a given webpage.

require 'open-uri'

# URI.open rather than Kernel#open, which stopped accepting URLs in Ruby 3.0
page_text = URI.open('http://rakeroutes.com').read
Hwacha.new.check(page_text.scan(/http:\/\/[^" ;]+\b/)) do |url, response|
  puts "#{url} is invalid!" unless response.success? || response.code == 302
end
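
The regular expression does all of the link extraction, so it's worth trying it against a string before pointing it at a live page. Here is a small offline sketch; the HTML snippet is made up, and the `s?` and the `uniq` call are variations that are not in the script above.

```ruby
html = <<~HTML
  <a href="http://rakeroutes.com/about/">About</a>
  <a href="https://example.com/feed.xml">Feed</a>
  <a href="http://rakeroutes.com/about/">About again</a>
  <img src="/images/logo.png">
HTML

# Same pattern as above, with `s?` added so https links are matched too.
# The trailing \b makes a match end on a word character, so a final
# slash is dropped from the captured URL.
url_pattern = /https?:\/\/[^" ;]+\b/

# uniq avoids checking the same link twice; relative paths are ignored.
urls = html.scan(url_pattern).uniq
puts urls
```

This prints `http://rakeroutes.com/about` and `https://example.com/feed.xml`: the duplicate link and the relative image path are both left out.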

Or maybe we want to see all the nice successful URLs.

require 'open-uri'

page_text = URI.open('http://rakeroutes.com').read
Hwacha.new.find_existing(page_text.scan(/http:\/\/[^" ;]+\b/)) do |url|
  puts "#{url} is valid!"
end

Since it’s just a wrapper around Typhoeus, Hwacha actually has a first-class response object to work with. We could even dig more into the response body for some content checking or light page scraping.

require 'open-uri'

page_text = URI.open('http://rakeroutes.com').read
Hwacha.new.check(page_text.scan(/http:\/\/[^" ;]+\b/)) do |url, res|
  puts res.body[0..50]
end
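
As a slightly more useful bit of light scraping, a block could pull each page's title out of the body. This is a sketch with a hypothetical `page_title` helper that works on any HTML string. A regular expression is fine for a quick look, though a real HTML parser would be more robust.

```ruby
# Return the text of the first <title> element, or nil if there isn't one.
def page_title(html)
  match = html.match(/<title[^>]*>(.*?)<\/title>/im)
  # Trim the edges and collapse internal runs of whitespace.
  match && match[1].strip.gsub(/\s+/, ' ')
end

puts page_title("<html><head><title>\n  Rake Routes\n</title></head></html>")
p page_title('no markup here')
```

Inside a check block this would be called with the response body, for example `puts "#{url}: #{page_title(res.body)}"`.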