Group When

2014-07-16, , Comments

Phil Nash’s recent tweet intrigued me.

He later clarified what he was after — and had now found — linking to a solution posted a couple of years ago by Tomas Petricek. The function groupWhen splits a sequence into groups, starting a new group whenever the predicate returns true.

 module Seq =
   /// Iterates over elements of the input sequence and groups adjacent elements.
   /// A new group is started when the specified predicate holds about the element
   /// of the sequence (and at the beginning of the iteration).
   ///
   /// For example: 
   ///    Seq.groupWhen isOdd [3;3;2;4;1;2] = seq [[3]; [3; 2; 4]; [1; 2]]
   let groupWhen f (input:seq<_>) = seq {
     use en = input.GetEnumerator()
     let running = ref true
     
     // Generate a group starting with the current element. Stops generating
     // when it founds element such that 'f en.Current' is 'true'
     let rec group() = 
       [ yield en.Current
         if en.MoveNext() then
           if not (f en.Current) then yield! group() 
         else running := false ]
     
     if en.MoveNext() then
       // While there are still elements, start a new group
       while running.Value do
         yield group() |> Seq.ofList }

Here’s a nice Haskell version coded up by @sdarlington.

Maybe takewhile and dropwhile could power a Python solution, but my first choice would be itertools.groupby. Groupby chops a sequence into subsequences, where the elements of each subsequence have the same key value. A suitable key function, in this case, must change its return value every time the sequence yields an element for which the predicate holds. It could toggle between a pair of values, for example. Or it could just count the number of times the predicate holds.

class count_p:
    ''' Return a value which increments every time the predicate holds.
    '''
    def __init__(self, pred):
        self._n = 0
        self._pred = pred
    
    def __call__(self, v):
        self._n += self._pred(v)
        return self._n

def group_when(pred, xs):
    return (gp for _, gp in groupby(xs, count_p(pred)))

Here, group_when accepts an iterable and returns an iterable sequence of iterable groups. Clients choose how to consume the results.

>>> def odd(v): return v % 2
>>> xs = group_when(odd, [3, 3, 2, 4, 1, 2])
>>> print([list(g) for g in xs])
[[3], [3, 2, 4], [1, 2]]

Note that count_p does something very like itertools.accumulate. Here’s another version of group_when which takes advantage of this.

def group_when(pred, xs):
    xs, ys = tee(xs)
    accu = accumulate(map(pred, ys))
    return (gp for _, gp in groupby(xs, lambda _: next(accu)))

§

After a short break, here’s a third version of group_when. This is the first time I’ve found a use for takewhile and dropwhile. Beware: as the teed streams xs and ys diverge, the amount of backing storage required will grow … only for the stored values to then be dropped!

from itertools import *
 
def group_when(p, xs):
    def notp(x): return not p(x)
    xs = iter(xs)
    while True:
        x = next(xs)
        xs, ys = tee(xs)
        yield chain([x], takewhile(notp, xs))
        xs = dropwhile(notp, ys)
 
def odd(x):
    return x % 2
 
[list(g) for g in group_when(odd, [3, 3, 2, 4, 1, 2])] # [[3], [3, 2, 4], [1, 2]]

Word Aligned, hosted by Github

2014-03-13Comments

To anyone who subscribes to this site’s feed, my apologies if your reader recently got filled with items you’d already seen. I can explain.

About seven years ago, I signed up for a great deal on hosting for life with a company called TextDrive. For most of the time since then, this service was actually provided by Joyent — who took over TextDrive. Then they, Joyent, said the hosting for life deal was being “sunsetted”, i.e. canned. Happily TextDrive’s original founder, Dean Allen, stepped in to revive his company and honour the original offer, which has indeed happened, though it’s been all too clear that he’s been hard to get hold of whilst the operations staff have been over-stretched.

Last week, a tweet from one of these hardworking ops tipped me off that TextDrive would soon be gone.

Tomorrow!

What to do?

After some googling I’ve chosen Github pages as the new host for Word Aligned. I’ve had to relinquish a little control over URLs to make the site truly static. (A side-effect being old content appearing with new identifiers in the RSS feed). I also won’t be configuring a web server or rooting around in rotating log files. The experts at Github can take care of that.

I do get to keep my own domain, which was paramount. I don’t have to pay anything, which is nice.

Previously, I used version control to track my site’s contents and rsync over SSH to publish. Now I simply use version control: pushing a change to github is publication.

It’s early days yet, but I’m happy with the change. The feed should settle down now, so stay subscribed. Please let me know if you spot any problems with the site.

Go for short variable names

2014-03-06Comments

Recently Brad Fitzpatrick promoted the Go style guide on twitter, which prompted Tim Penhey to take issue with its advice on variable naming.

A brief and inconclusive exchange followed. Twitter’s fine for opinions and one-liners but flawed for discussions — even when the subject happens to be brevity.

I’m not going to tweet about it, but I like Go and I like its style. I’d rather read code which uses short variable names. Long descriptive names, which may appear to provide more information, often obscure the structure and flow of the code. The narrower the scope, the shorter names can be; so the style guide implicitly sanctions short functions and shuns globals. All good.

How short is short?

Those variables are certainly short rather than descriptive but they aren’t scary: c could be a character; ch for channel, maybe, or another character; d for data, difference or distance; m, midpoint; s string; st state. All guesses, of course, but in context I’d expect to see — in the space of just a few lines of code — where each variable lives and how it’s used, a more clear and correct indication of what it means than a lengthy name could ever be.

Single character variables are just fine, says the style guide:

Prefer c to lineCount. Prefer i to sliceIndex.

Some languages allow you to go further. Omit a variable in Perl and it defaults to being what you’d like it to be. Usually.

A single character variable name is an extreme form of abbreviation. It works nicely for small things, like pixels and characters.

Pixel pixel, pel, px, p;
Character character, char, ch, c;

Some less terse abbreviations hurt my ear: mngr, svr, cnt.

The style guide is, after all, a guide, and common sense applies. Some abbreviations are ugly and others save so little space they aren’t worth it.

Go’s advice on naming tips a hat to the language’s C heritage and to C’s great application, UNIX, which is unsurprising when you realise one of Go’s prominent contributors, Ken Thompson, had a hand in both. When Thompson was asked what he would do differently if he were redesigning UNIX he replied:

I’d spell creat with an “e”.

So, working on Go, he did.

You wait all day for a bus…

2013-10-02, Comments

Any and all didn’t appear in Python until version 2.5, released in 2006, when the language was already well into its teens.

Why the delay in offering such fundamental functions? An oversight? Or simply that they’re so easy to implement they weren’t thought necessary. Either way, they’re here now.

The functions are closely related and complementary. We can define any in terms of all and vice-versa.

def any_(xs):
    return not all(map(operator.not_, xs))

def all_(xs):
    return not any(map(operator.not_, xs))

C++ reached its 30s before introducing its own versions of these logical algorithms, any_of and all_of, but made up for lost time by finding room for a third, none_of, which is not any_of.

template <class Iter, class Pred>
bool none_of_(Iter b, Iter e, Pred p)
{
    return std::find_if(b, e, p) == e;
}

template <class Iter, class Pred>
bool any_of_(Iter b, Iter e, Pred p)
{
    return !none_of_(b, e, p);
}

template <class Iter, class Pred>
bool all_of_(Iter b, Iter e, Pred p)
{
    return !any_of_(b, e, std::not1(p));
}

Reverse, Esrever

2013-09-27, Comments

Reverse is a member of the C++ standard library, but its reverse, esrever, isn’t. Similarly keep isn’t but peek is.

Can anyone think of a C++ standard library member whose reverse is also a member?

Answers in the comments below.

Go!

Clown, Flee, Jump

2013-09-11Comments

The clown is running away from the circus. The contortionist wants nothing more to do with him. She’s confessed everything to her husband, the strongman, who’s after the clown’s blood. The clown has no time to pack. Hurrying from the big top he snatches up his most treasured possessions and some refreshments:

  • a makeup case
  • a box camera, with tripod attached
  • a cactus
  • a roasted goose
  • a magnum of champagne

Each item weighs exactly 3kg.

Soon he reaches the edge of a ravine. A rope bridge connected to the other side has a sign in front of it.

WARNING
UNSTABLE STRUCTURE
SAFE TO A MAXIMUM OF 75KG

The bridge is 100m long. The clown weighs 70kg. The strongman, who’s closing in, weighs considerably more. The clown must cross the bridge at once to effect his escape. How can he do so without abandoning any of his baggage?

?

Elsewhere, it’s school sports day. Conditions are perfect for the high jump — warm, sunny, still — and a talented young athlete has raised the bar to 1.85m, which happens to be his own height. On the first two attempts he fails. On the third attempt he succeeds.

High Jump

“Chapeau!” says the French teacher.

“Awesome!” says the Chaplain.

“Unbelievable!” says the head of Mathematics.

“Actually,” the sports coach says, “it’s quite simple: a combination of talent, training, and technique. He cleared his own height but his centre of gravity didn’t.”

“What nonsense!” says the mathematician.

??

Grimaldi

The clown barely breaks stride. Juggling with mismatched objects is part of his act and quick as a flash case, camera, cactus, fowl and fizz are in the air. At no point does he have more than one object in either hand so his weight never exceeds 73kg. The bridge holds. The clown gets away.

“Grrrr!” says the strongman, shaking his fists.

???

Angle brackets hurt my eyes

2013-05-01, Comments

“All these angle brackets, they’re hurting my eyes.”

I’m quoting a colleague who sits next to me and I sympathise. Perhaps a section from a build file was causing him pain?

<target name="all" description="Build product.">
  <mkdir dir="build" if="${not directory::exists('build')}" />
   ....
</target>

Or maybe it was some XSL to generate XML by applying XML to XML?

<xsl:if test="$index-make or @index!='false'">
  <xsl:if test="//index">
    <fo:block text-align-last="justify" space-before="5pt">
      <fo:basic-link internal-destination="index-page">
        <xsl:choose>
          <xsl:when test="(/doc/@lang = 'ja') or (/doc/@lang = '')</xsl:when>
          <xsl:otherwise>INDEX</xsl:otherwise>
        </xsl:choose>
      </fo:basic-link>
      <fo:page-number-citation ref-id="index-page"/>
    </fo:block>
  </xsl:if>
</xsl:if>

Or perhaps he was wrestling with a C++ compile error.

print.cpp: In function 'void print(const
std::map<std::basic_string<char, std::char_traits<char>,
std::allocator<char> >, int, std::less<std::basic_string<char,
std::char_traits<char>, std::allocator<char> > >,
std::allocator<std::pair<const std::basic_string<char,
std::char_traits<char>, std::allocator<char> >, int> > >&)':
print.cpp:7: error: no match for 'operator<<' in 'std::cout <<
details' /usr/include/c++/4.2.1/ostream:112: note: candidates are:
std::basic_ostream<_CharT, _Traits>& std::basic_ostream<_CharT,
_Traits>::operator<<(std::basic_ostream<_CharT, _Traits>&
(*)(std::basic_ostream<_CharT, _Traits>&)) [with _CharT = char,
_Traits = std::char_traits<char>]

What makes angle brackets so vexatious. Could it be their sharp corners?

Lisp is as elegant as XML is prickly, yet it too is famous for brackets, albeit round ones — and lots of them.

Imagine a parallel universe with angle-bracket Lisp. I wonder if the language would have proved so enduring?
Not so pretty now?
<define <euler-transform s>
  <let <<s0 <stream-ref s 0>>
        <s1 <stream-ref s 1>>    
        <s2 <stream-ref s 2>>>
    <cons-stream <- s2 </ <square <- s2 s1>>
                          <+ s0 <* -2 s1> s2>>>
                 <euler-transform <stream-cdr s>>>>>

Looking more closely at the first code snippet, the section from a build file, we can see the problem isn’t so much the angle brackets as all the others.
<mkdir dir="build" if="${not directory::exists('build')}" />

The braces and parentheses may be embedded in a double quoted string but that only makes things worse. The nested levels hurt my eyes and, if you can bear to look at the code long enough, you’ll realise there’s a simpler message trying to get out: mkdir -p build.

“Solutions”

2013-04-17, , Comments

I like working around enthusiasts and optimists but that doesn’t mean I want to use chirpy or bombastic software.

These days I build programs using visual studio. Sure, it’s a decent tool but part of me cringes every time I’m asked to open a “Solution” especially when what I get seems more like a “Muddle”. A progress bar slides into action after my program links: “Updating Intellisense …”

Who coined that galumphing portmanteau? It means auto-completion and I don’t want to know it’s going on — especially since I edit code using Emacs.

Perforce is new to me and I lean on a graphical client so heavily I sometimes trip over it. So when I’m trying to dance round client workspaces and their half-baked integration with microsoft muddles solutions, the last thing I want is to be asked to “Choose a Favorite Connection”. When it comes to Perforce Servers I don’t have favourites let alone favorites. Sorry.

ACCU 2013

2013-04-11, , Comments

ACCU 2013

Brian Marick opened the second day of ACCU 2013 with a keynote presentation entitled: “Cheating Decline: Acting now to let you program well for a really long time”. He drew a distinction between effortful and automatic thinking. For example, we can drive a car along a clear road automatically but it requires considerable concentration to parallel park that same car. By tuning out unwanted signals crickets can locate their mates using minimal brainpower, and cricket players have no need of Newtonian dynamics to track the trajectory of a ball — they apply a simple visual self-calibrating algorithm to catch out batsmen. Tango dancers disturb and re-establish invariants. A robot can walk for hours without thinking about what it’s doing. Actually, if it’s your job to park cars, you can probably do that without thinking; and this was Brian Marick’s main cheat — find the work practices which allow you to lean on your perceptions and so avoid effortful thinking.

Anthony Williams’ talk on C++11 Features and Real World code did require effortful thinking but that was what I’d hoped for. He provided a concise and expert summary of the new language features in action, focusing on the biggest early winners. Leading with auto, lambda, range-for, he went on to talk about concurrency and move semantics. I learned that lambda functions can have a mutable qualifier. Ha!

I couldn’t resist Olve Maudal’s C++11 pub quiz, appropriately held in the Marriot Hotel bar, for which we formed teams and mentally compiled and executed dodgy code, capturing standard output on an answer sheet. Some of the answers may well have have been implementation dependent but Olve specified an implementation: our answers should match this laptop running this software. I was simultaneously appalled by the limits of my knowledge on fundamental subjects such as integral promotion and initialisation order, and surprised by my ability to correctly predict the behaviour of some esoteric and perverse code. I’m chastened and will be studying the answers in the cold light of day. Brian Marick may have advocated programming after a beer or two in his morning session, but the afternoon pub quiz proved that coffee works better for me!

A programmer’s dozen (13, which is 12 counting from zero!) lightning talks kept the day crackling with energy. Ewan Milne chaired the session expertly, adeptly dispatching a birthday cake as proceedings commenced. I wish I could describe all the talks but you really had to be there. Phil Nash’s use of the little known left arrow operator ← got a well deserved response from the audience. Sander Hoogendoorn stuck the boot into “Agile Fluffiness”. James Grenning’s talk on embedded development was a lightning keynote: hilarious, moving and, ultimately, tragic.

An Exploration of the Phenomenology of Software Development

2013-04-02Comments

I was lucky to be in the audience last week, when Charles Tolman visited Accu Bristol to preview his Accu 2013 conference talk: “An Exploration of the Phenomenology of Software Development”.

The talk precis is entirely accurate — but maybe not so helpful. What this talk actually delivers is a highly original and thoughtful examination of what the software development revolution is and how we can make sense of it.

Charles Tolman’s central insight is that we have crossed a boundary. New tools and technologies have extended our powers. Just as machinery developed in the industrial revolution extended our physical abilities, so software developed in the information technology revolution extends our capacity for thought.

Essentially, software development is thinking, so to analyse it we need to think about thought. With this realisation we can view our industry as a continuation of the efforts of earlier thought workers — philosophers such as Descartes and Goethe.

If this all sounds a bit heavy, Charles made space for anecdotes, humourous insights and pictures of gliders. I look forward to future episodes.

Patience Sorted

2013-03-14, , Comments

I gave a lightning talk today about patience sorting and its application to the longest increasing subsequence problem. It’s a subject I’ve written about before. My computer has been put through several million simulations. I’ve even coded up a javascript demo which deals out virtual playing cards and sorts them at the click of a button.

Ace of Spades 10 of Spades 6 of Spades 7 of Spades 5 of Spades King of Spades 9 of Spades Queen of Spades 2 of Spades 8 of Spades 4 of Spades 3 of Spades Knave of Spades

Today I used real playing cards; a linen-finished standard deck. For any talk it’s nice to have a prop.

Rock & Pop Legends

Now, I thought I understood the patience sort algorithm but until yesterday I’d never actually played it with real cards. I’ve been surprised by how much this physical dimension has developed my understanding.

I’ll be testing my new cards on some other sorting algorithms; I have high hopes. It would be good to find a similarly simple prop for linked data structures so I can balance trees, flip lists and walk graphs.

Hosting for Life? TextDrive revived!

2013-03-06, Comments

Towards the end of last year I was emailed by Jason Hoffman from Joyent. Joyent owned TextDrive, the company which, 6 years ago, offered web hosting for life in return for a single up-front payment. Jason’s email said this lifetime hosting — legacy service, he called it — was nearing End of Life (!) and would be sunsetted (!!) at the end of October 2012.

TextDrive banner

As a TextDrive customer I reckon I’d had good value from the original deal, but while six years may seem like forever on the internet it’s surely not a lifetime. Happily TextDrive’s original founder, Dean Allen, felt the same way: he emailed me to say he’d be relaunching TextDrive on a new, modern, infrastructure.

Quite how this pans out remains to be seen but I’m sticking it out. If you’re reading this page, great: it’s been served up by the new, modern infrastructure which Dean mentioned. Whilst migrating wordaligned.org I’ve had to fiddle with rewrite rules and tinker with DNS records and file permissions. Any glitches or oddities, please let me know.

More adventures in C++

2013-02-21, Comments

bool operator<(version const & v1, version const & v2)
{
    if (v1.major != v2.major)
        return v1.major < v2.major;
    if (v1.minor != v2.minor)
        return v1.minor < v2.minor;
    if (v1.patch != v2.patch)
        return v1.patch < v2.patch;
    return v1.build < v2.build;
}

C++ programmers are sticklers for tradition and unlikely to be swayed by what’s in fashion. C++ suits those who want to control the machine, and who respect the rigour and discipline this imposes. C++ programmers are generally a conservative bunch.

Some history: C++ was standardized in 1998. The next major revision of the language was developed under the working title of C++0x, where the “0x” stood for the year the job would be finished. The X gave the standardizers some slack, but not enough. C++0x became C++11 which is now, thankfully, simply C++.

Although the language’s development has been painstakingly slow the developments themselves have been extensive and radical. What’s more, users are rushing to use the new features — even before they have access to compilers which support them! I’ve seen answers to C++ topics on Q&A sites which use aspects of the language the contributors cheerfully admit they have no access to. I’ve worked on a project which used elaborate shims to hide the fact that GCC 4.6 couldn’t compile C++ as well as GCC 4.7 does, and this despite the fact that GCC’s C++11 support remains, officially, “experimental”.

Singly Linked Lists in C++

2013-02-07, , Comments

In a recent post I wrote about removing items from a singly linked list. I presented a couple of alternative implementations, and in the comments section readers suggested yet more versions.

My implementations were written in C: the post was inspired by something Linus Torvalds had said about low-level programming skills, and I’m guessing he meant C programming skills. The fact is, C programmers are condemned to reimplement these basic functions on this basic structure because the C standard library has nothing to say about singly linked lists. Until recently the C++ standard library was similarly silent on the subject, only offering a doubly linked list.

C++ introduces … the linked list!

That’s all changed now with the introduction of std::forward_list. The class interface doesn’t mention links or pointers but a quick glance through its contents makes it clear that if you imagine the container to be a templated version of a classic singly-linked list, you won’t go far wrong.

This gives forward_list some members you won’t find elsewhere in the std:: namespace. For example, std::forward_list::before_begin(), which returns an iterator just before the beginning of the list — much as the more familiar end() is just past the end.

Why is before_begin() necessary? You can’t dereference it and you can’t reverse through a forward list till you get to it. Well, since forward list iterators can only go forwards, instead of the familiar sequence insert(), erase() and emplace() methods we have insert_after(), erase_after() and emplace_after(), not to mention splice_after(). The before-the-beginning iterator allows you to use these operations to modify the node at the head of the list.

Quick question: what’s the complexity of std::list::size()?

Trick question: and how about std::forward_list::size()?

Folded files and rainbow code

2013-01-23, Comments

In one of my first programming jobs I worked at a software house which grew its own tools. We had a home made version control system, build system, test harness and programming language. We even had our own editor!

The language was C, lightly extended to support the primitive types of our particular problem domain. The editor was more esoteric. You drove it using the numeric keypad and it modeled source code as nested blocks:

  • files contained blocks of:
    • includes
    • documentation
    • data
    • functions
  • functions contained groups of:
    • parameters:
      • in parameters
      • out parameters
    • initialisation blocks
    • assertions
    • code blocks
    • loops
    • machine dependent sections
    • returns

The editor facilitated navigation of this nested structure, with keypresses to fold and unfold.

You don’t need to write your own editor to get the benefits of code folding: most editors support it to some degree. With this particular editor, however, folding reached its apotheosis. You could crimp and tuck and pleat, nesting layer upon layer within a single source file.

Origami Dragon