My (Test) First Ruby Program
One of my reasons for starting this blog was to find out more about web application frameworks based on dynamic languages in general, and about Ruby on Rails in particular. The only problem being, I’d never actually written any Ruby before.
Now, back when I started out as a programmer I never took a huge interest in learning computer languages — I just figured out what existing code was doing then fiddled around with it until it seemed to do what I wanted. Some of the time I got away with it.
These days I’m more interested in computer languages, but I still think that reading and tweaking existing code is a good way to learn. Ruby, being a dynamic, interpreted language, is perfect for such experimentation. The Ruby on Rails framework turns out to be equally dynamic; by running the development environment, I could see my code changes instantly reflected in my Typo application. Even better, the exact same code that I tested at home on my Windows machine could be deployed on my live shared UNIX server. Best of all, I soon discovered the test framework for the module I needed to alter. By developing the tests and code in parallel, I deployed my first ever Ruby code with reasonable confidence that it worked.
The Requirement
I wanted to be able to post code snippets to this blog, and I wanted
the code to be nicely syntax-highlighted. Digging through the
Typo admin pages revealed that this was already supported for
Ruby (of course!), XML and YAML. Furthermore, the
syntax highlighting scheme was open to extension, which was good,
since I intended to highlight Python and C++ snippets — and possibly others too.
All you had to do was extend Syntax::Tokenizer, implementing the #step method.
A few minutes of googling didn’t turn up any existing solutions to this particular problem, so I decided to have a crack at it myself.
Emacs ruby mode
Before I could even contemplate working with Ruby code, I needed to get my editor to recognise it. This was straightforward.
Locating the code to change
Grepping the Typo code for syntax yielded several hits:
config/environment.rb                               # Adds vendor/syntax/lib to the load path
components/plugins/textfilters/code_controller.rb   # Does the syntax highlighting
vendor/syntax                                       # The syntax module itself
Fiddling around with the code
So, the first thing I did was start hacking at code_controller.rb, adding a new class and registering it, just like this:
class PythonTokenizer < Syntax::Tokenizer
  def step
    if digits = scan(/\d+/)
      start_group :digits, digits
    elsif words = scan(/\w+/)
      start_group :words, words
    else
      start_group :normal, scan(/./)
    end
  end
end

Syntax::SYNTAX['python'] = PythonTokenizer
This being my first ever attempt at Ruby code, I didn’t even write it myself: I simply cut-and-pasted it direct from the Ruby syntax highlight manual. As you can see, I made no attempt to implement a real Python tokenizer — I just wanted to see if I could get any syntax highlighter working. Sure enough, when I started up my Typo development environment and posted a code snippet
<code lang="python"> abc 123
and then examined the resultant HTML (Ctrl+U in Firefox), it read:
<div class="typocode"><pre> <code class="typocode_python "> <span class="words">abc</span> <span class="digits">123</span>
Perfect!
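For readers who want to see what that borrowed tokenizer does without installing Typo or Syntax, here is a minimal stand-alone sketch. It assumes that the scan call inside a Syntax tokenizer behaves like StringScanner#scan from the Ruby standard library; toy_tokens is a made-up name for illustration, not part of either project, and it returns plain pairs rather than HTML.

```ruby
require 'strscan'

# Hypothetical stand-in for the borrowed tokenizer: classify runs of digits,
# runs of word characters, and everything else, one step at a time.
def toy_tokens(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    if (digits = scanner.scan(/\d+/))
      tokens << [:digits, digits]
    elsif (words = scanner.scan(/\w+/))
      tokens << [:words, words]
    else
      # getch consumes exactly one character, whatever it is.
      tokens << [:normal, scanner.getch]
    end
  end
  tokens
end

p toy_tokens("abc 123")
# prints [[:words, "abc"], [:normal, " "], [:digits, "123"]]
```

Each pair is a group name plus the text it covers, which is roughly the information the HTML above encodes as a span class and its contents.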
Portability
Incidentally, my home development environment is on the Windows platform; my live blog runs on a shared server running FreeBSD. Identical Typo code runs on both — the only difference being that I use WEBrick as my development webserver and lighttpd on the live blog.
Hot updates
Wouldn’t it be nice if you could edit code_controller.rb, hit F5 in the web browser and see your changes take immediate effect? I gave it a go, switching words for worms for a bit of fun.
class PythonTokenizer < Syntax::Tokenizer
  ....
      start_group :worms, words
end
Sure enough, the updated HTML page read:
<span class="worms">abc</span>
which is how things should be. I was pleased to see that the syntax highlight module created the new CSS class "worms" without complaining. I didn’t even have to enter the string literal "worms" anywhere in the code; some sort of reflection must have figured out how to process the :worms symbol correctly.
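Strictly speaking, no reflection is needed for this. A plausible explanation, assumed here rather than checked against the Syntax source, is that the HTML convertor simply interpolates the group symbol into the class attribute, and string interpolation calls to_s on a symbol. The helper span_for below is hypothetical and exists only to show the idea.

```ruby
# Hypothetical helper, not Typo's or Syntax's actual code: wrap a token in a
# span whose CSS class is just the group symbol converted to a string.
def span_for(group, text)
  %(<span class="#{group}">#{text}</span>)
end

puts span_for(:worms, "abc")
# prints <span class="worms">abc</span>
```

Any symbol works the same way, which would explain why a brand new group name needed no registration anywhere.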
Overenthusiasm
Enthused by this early success, I tried editing my PythonTokenizer class to do what it was really meant to do: namely, identify comments, strings and keywords. Typo reported back the inevitable syntax errors through the web interface in a friendly enough way, but I soon realised that this was not the correct way to develop code. What I really ought to be doing was developing my new PythonTokenizer class in isolation, then integrating it into the Rails application.
Running the Syntax Unit Tests
So, I went looking in the vendor/syntax directory.
+---api
|   +---classes
|   |   \---Syntax
|   |       \---Convertors
|   \---files
|       \---lib
|           \---syntax
|               +---convertors
|               \---lang
+---doc
|   +---manual
|   |   +---parts
|   |   \---stylesheets
|   \---manual-html
|       \---stylesheets
+---lib
|   \---syntax
|       +---convertors
|       \---lang
\---test
    \---syntax
I found the Ruby, XML and YAML tokenizers in lib/syntax/lang/ruby.rb, lib/syntax/lang/xml.rb and lib/syntax/lang/yaml.rb respectively. I found accompanying unit tests in test/syntax/tc_ruby.rb, test/syntax/tc_xml.rb and test/syntax/tc_yaml.rb. Running test/ALL-TESTS.rb gave:
c:\thomas\typo\vendor\syntax\test>ALL-TESTS.rb ALL-TESTS.rb
Loaded suite c:/thomas/typo/vendor/syntax/test/ALL-TESTS
Started
............................................................
Finished in 0.359 seconds.

122 tests, 761 assertions, 0 failures, 0 errors
My new strategy was clear: develop lib/syntax/lang/python.rb and test/syntax/tc_python.rb in parallel until my new syntax highlighter passed all the tests, then integrate my new Python highlighter into Typo. I reverted my changes to code_controller.rb and started again.
Adding a testcase
So, I created tc_python.rb, using tc_ruby.rb as an example. Here’s what my first test looked like:
require File.dirname(__FILE__) + "/tokenizer_testcase"

class TC_Syntax_Python < TokenizerTestCase

  syntax "python"

  def test_empty
    tokenize ""
    assert_no_next_token
  end
end
Running ALL-TESTS.rb again gave me:
Started
...F........................................................
Finished in 0.282 seconds.

1) Failure:
test_empty(TC_Syntax_Python)
[./syntax/tokenizer_testcase.rb:34:in `assert_no_next_token'
./syntax/tc_python.rb:9:in `test_empty']:
<false> is not true.

123 tests, 762 assertions, 1 failures, 0 errors
This at least confirmed my test was being run. Actually, I was a little surprised to get a failure and not an error, since I hadn’t even registered a Python syntax highlighter.
Getting started on python.rb
My first cut at python.rb simply reproduced the simple tokenizer I’d put into code_controller.rb.
require 'syntax'

module Syntax

  class Python < Tokenizer

    # Step through a single iteration of the tokenization process.
    def step
      if digits = scan(/\d+/)
        start_group :digits, digits
      elsif words = scan(/\w+/)
        start_group :words, words
      else
        start_group :normal, scan(/./)
      end
    end
  end

  SYNTAX["python"] = Python
end
With this implementation, all the tests passed. Next I wrote a test case for finding comments, about the simplest syntactic element of a Python program. Perhaps “wrote” overstates things. Actually, I just cut-and-pasted a testcase from tc_ruby.rb.
def test_comment_eol
  tokenize "# a comment\nfoo"
  assert_next_token :comment, "# a comment"
  assert_next_token :normal, "\n"
  assert_next_token :ident, "foo"
end
This caused the tests to hang. By playing with the code, I soon figured out the problem. My tokenizer wasn’t getting past the newline. I’d seen enough Perl in my time to figure out what to do. Clearly the scan function accepted a regular expression, and the else case used the regex special character . to eat any single character except an end-of-line. I modified the regex so the code read start_group :normal, scan(/./m) (notice the m), and now my test failed instead of hanging:
1) Failure:
test_comment_eol(TC_Syntax_Python)
[./syntax/tokenizer_testcase.rb:29:in `assert_next_token'
./syntax/tc_python.rb:13:in `test_comment_eol']:
<[:comment, "# a comment", :none]> expected but was
<[:normal, "# ", :none]>.
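The hang and its cure can be reproduced in isolation with StringScanner from the standard library, on the assumption that the tokenizer's scan behaves the same way. When a pattern fails to match, scan returns nil and leaves the scan pointer where it was, so a loop that keeps stepping until the end of input would never get there.

```ruby
require 'strscan'

scanner = StringScanner.new("\nfoo")

# Without /m, "." refuses to match the newline, so scan returns nil...
miss = scanner.scan(/./)
# ...and the scan pointer has not moved, so retrying would fail forever.
pos_after_miss = scanner.pos

# With /m, "." matches any character, newline included, so the scanner advances.
hit = scanner.scan(/./m)
pos_after_hit = scanner.pos

p [miss, pos_after_miss, hit, pos_after_hit]
# prints [nil, 0, "\n", 1]
```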
It was time to start making my Python tokenizer look like it really wanted to tokenize Python.
class Python < Tokenizer
  def step
    if comment = scan(/#.*$/)
      start_group :comment, comment
    else
      start_group :normal, scan(/./m)
    end
  end
end
With this change, my failure moved on a line:
1) Failure:
test_comment_eol(TC_Syntax_Python)
[./syntax/tokenizer_testcase.rb:29:in `assert_next_token'
./syntax/tc_python.rb:14:in `test_comment_eol']:
<[:normal, "\n", :none]> expected but was
<[:normal, "\nfoo", :none]>.
Good! My tokenizer had at least recognised the comment. Hardly surprisingly, it then treated the rest of the string as normal, which is what the test failure indicates.
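The failure message also hints at something the code does not spell out: although the else branch consumes one character per step, the result comes back as a single "\nfoo" token. A reasonable guess, and only a guess, is that consecutive pieces assigned to the same group are merged. The stand-alone sketch below imitates that behaviour with StringScanner; comment_tokens is a hypothetical name and the merging rule is an assumption inferred from the test output.

```ruby
require 'strscan'

# Hypothetical miniature of the comment-aware tokenizer. Adjacent pieces in
# the same group are merged (assumed behaviour, inferred from the failure).
def comment_tokens(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    group, chunk =
      if (comment = scanner.scan(/#.*$/))
        [:comment, comment]
      else
        [:normal, scanner.scan(/./m)]
      end
    if tokens.last && tokens.last[0] == group
      tokens.last[1] << chunk        # extend the current group
    else
      tokens << [group, chunk.dup]   # start a new group
    end
  end
  tokens
end

p comment_tokens("# a comment\nfoo")
# prints [[:comment, "# a comment"], [:normal, "\nfoo"]]
```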
Rinse and Repeat
You can probably work out the rest. I added code and test cases until my Python syntax highlighter did all I wanted it to do: namely, pick out comments, strings, triple quoted strings. This post is far too long already — I’ll post my code and the accompanying tests in another post.
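As a rough idea of where this kind of iteration leads, here is a stand-alone sketch that picks out comments, ordinary strings and triple-quoted strings. It is not the code from this blog, which is left for the follow-up post: PY_RULES and python_tokens are invented names, the regular expressions are one possible choice, and string prefixes such as r"..." are ignored.

```ruby
require 'strscan'

# Hypothetical rule table, tried in order. Triple-quoted strings come before
# ordinary ones so that """ is not read as an empty string plus a stray quote.
PY_RULES = [
  [:comment, /#.*$/],
  [:string,  /("""|''').*?\1/m],
  [:string,  /"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*'/]
].freeze

def python_tokens(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    # find stops at the first rule whose pattern matches at the scan pointer.
    rule = PY_RULES.find { |_, pattern| scanner.scan(pattern) }
    group, chunk = rule ? [rule[0], scanner.matched] : [:normal, scanner.getch]
    if tokens.last && tokens.last[0] == group
      tokens.last[1] << chunk        # merge adjacent text of the same group
    else
      tokens << [group, chunk.dup]
    end
  end
  tokens
end

source = "x = \"a # b\" # c\n'''t\nq'''"
python_tokens(source).each { |group, chunk| puts "#{group}: #{chunk.inspect}" }
```

Trying the string rules before falling back to single characters is what keeps the # inside "a # b" from being treated as the start of a comment.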
Deploying the Python Highlighter
I didn’t need to do anything to deploy the code in my development environment. It was already there, since I’d developed it in place. I ran some system level tests to convince myself all was indeed OK, then copied it across to my shared server.
Just to show it all works, here’s a simple Python program to generate all the subsets of a set.
def generate_subsets(the_set, m):
    """ Generate all m element subsets of the input set.

    If the input set is empty or m is 0, yield the empty set.
    Otherwise, use a recursive solution. Pick any element from
    the set, and yield the subsets which contain this element,
    followed by those which don't.
    """
    if m > len(the_set):
        pass
    elif len(the_set) == 0 or m == 0:
        yield set()
    else:
        e = the_set.pop()
        for subset in generate_subsets(the_set, m - 1):
            subset.add(e)
            yield subset
        for subset in generate_subsets(the_set, m):
            yield subset
        the_set.add(e)