Dworkinian Integrity and (Sub-)Symbol Minimisation: Prescriptions for Consistent Software

How the perfect unbuilt API already exists, floating in the zeitgeist, and how it is up to you to pay attention to the totality in front of you

This article is part of my Confessions of an Unintentional CTO book, which is currently available to read for free online.

I have a friend who is an outstandingly fast, imaginative, and accurate programmer. According to GitHub, he ranks within the top five contributors to open source in his homeland. Given all this, you’ll likely be as baffled as I to learn that he got rejected after applying for a job at a different buddy’s company (tech team size: about forty engineers). After speaking both to the programmer and the company owner, I have a theory about why this whiz-kid wasn’t a good fit: he’s too goddamn innovative.

Innovation is usually a wonderful and rare gift, but there can be a cost to exerting this talent. That cost is paid in project-wide consistency, and it is levied whenever the innovator prefers to use their own (objectively superior) solution rather than adhere to (objectively inferior) project-wide and community-wide standards.

This article is not about my friend, nor is it about the farce that is the technical interview. Instead, I want to take a tour of the valuable kinds of consistency in a software project with the hope that we programmers can consciously aim for more of the good stuff in our future work.

The Scale of the Beast: Why Consistency Is Needed (skip this if you don’t need convincing)

These days, it’s not at all unreasonable to build a thriving software business with 10,000 lines of code, assuming you program in a high-level language and leverage open source libraries. Case in point: Basecamp shipped with 2,000 lines of code in 2003 and had only 15,000 lines by 2013, despite being a formidable global brand. Amy Hoy’s Freckle generated serious revenue with about 6,000 lines. And, in case you’re curious, my baby, Oxbridge Notes, has about 9,000 lines.

Even though ten-thousand-line software projects would be considered concise from the perspective of software engineers, transformed into printed books, they would nevertheless be considered sizeable reads, not exactly the sort of texts one could expect to learn by heart and recite. Ten thousand lines of code, printed with twenty-five lines per page, would become four hundred pages not of plain English but of code, of formal logic. And these pages cannot be read linearly by starting on page one and ending on page four hundred; instead they form a labyrinth-like choose-your-own-adventure book in which the reader must frantically skip back and forth between the pages.

Consider next that the programmer is expected not only to understand this book, but also to surgically modify it, in order to make changes that nevertheless keep it coherent and sensible. This task, to be done right, demands an obscene familiarity with the book’s contents. But no one, bar the rare savant, can remember all the details and wording of even a regular, plain English book, never mind the daunting quest of remembering a program in detail.

For this reason, programmers need mental shortcuts: special techniques to represent software in concise yet accurate ways within their own brains. Amongst the greatest of techniques for simplifying representation is a reliance on consistency, be that with respect to programmer community memes, to programming history, to naming conventions, to the existing architecture of software, or to little details like presentation of code within a code editor.

Prescription A: Apply Dworkinian Integrity to Programming

“Judges who accept the interpretive ideal of integrity decide hard cases by trying to find, in some coherent set of principles about people’s rights and duties, the best constructive interpretation of the political structure and legal doctrine of their community.” - Ronald Dworkin

The quote above comes from Ronald Dworkin, a legal and political philosopher proffering philosophical guidance to judges faced with the problem of deciding a legal case when no perfectly fitting precedent exists for the present set of facts. Using his idea of “law as integrity”, Dworkin believes that the judge should decide the case by drawing on the entire body of active law along with the current political and moral climate. The kernel of his insight is that the law is already there, floating in the zeitgeist; it is for the judge to act as its conduit, its spokesperson.

As programmers, we work in an increasingly populous technical ecosystem, a zeitgeist of our own that is grounded in mathematics and algorithms, then built up with operating system components, programming languages, standard libraries, design patterns, command names, best practices, and even lore. When we consciously reference and imitate these memes within our own systems (e.g. in our software architectures, API designs, method and parameter names, etc.), we create a web of intuitiveness that enables our coworkers to repurpose their existing knowledge for the present project.

To illustrate my point with specifics, I’d like to give examples of which memes I consciously adopt. These suggestions come from my point of view as a Rails programmer on Linux. Those working with other languages or in different domains will, naturally, arrive at other conventions.

1. Linux command names

There are far fewer operating systems than programming languages, and we can reasonably assume that programmers know their operating system fairly well. As a result, consciously adopting the names of Linux commands is a great way to ensure that other Linux users will quickly grok your code. For example, imagine you were writing a FileUtilities library. If you had functions with names like rm, ls, rmdir, and cd, then it’s possible that programmers using your library would not even need to consult your documentation; by leveraging their existing knowledge of Linux, they would already understand (and perhaps even be able to guess) much of your API’s signature and functionality.
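A minimal sketch of what such a library could look like in Ruby. The FileUtilities module and its method bodies are assumptions made for illustration; each method simply delegates to Ruby's standard library. (Ruby's own FileUtils module follows the same idea with methods such as rm, rmdir, and cd.)

```ruby
require "tmpdir"

# Hypothetical FileUtilities module: the method names mirror shell commands,
# and each method delegates to Ruby's standard library.
module FileUtilities
  module_function

  # ls: list the entries of a directory (without "." and ".."), sorted by name
  def ls(dir = ".")
    Dir.children(dir).sort
  end

  # rm: delete a file
  def rm(path)
    File.delete(path)
  end

  # rmdir: delete an empty directory
  def rmdir(path)
    Dir.rmdir(path)
  end

  # cd: change the working directory (only for the block's duration, if a block is given)
  def cd(path, &block)
    Dir.chdir(path, &block)
  end
end

# Usage inside a throwaway directory
Dir.mktmpdir do |tmp|
  File.write(File.join(tmp, "notes.txt"), "hello")
  Dir.mkdir(File.join(tmp, "drafts"))

  p FileUtilities.ls(tmp)   # => ["drafts", "notes.txt"]
  FileUtilities.rm(File.join(tmp, "notes.txt"))
  FileUtilities.rmdir(File.join(tmp, "drafts"))
  p FileUtilities.ls(tmp)   # => []
end
```

A Linux user can guess what every one of these calls does before reading a line of documentation, which is the whole point of borrowing the names.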

2. Linux option/parameter names

Spend enough time with UNIX and you’ll develop intuition for the conventional names and shorthand designations of option flags. Some common ones are:

  • “-d” delimiter (e.g. cut command)

  • “-f” force

  • “-g” global (as in sed commands or Vim)

  • “-i” ignore case

  • “-p” port/pid

  • “-r”/“-R” recursive

  • “-s” substitute (as in sed commands or Vim)

  • “-U” user

  • “-v” invert match (grep) or verbose

  • “-V” version
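As a sketch of how these conventions might carry over into your own command-line tool, here is a hypothetical "purge" command parsed with Ruby's standard OptionParser. The tool name, the method name, and the help strings are assumptions for illustration; only the flag letters come from the list above.

```ruby
require "optparse"

# Hypothetical "purge" tool: the flag letters follow common UNIX conventions,
# so -f, -r and -v mean what a Linux user already expects them to mean.
def parse_purge_options(argv)
  options = { force: false, recursive: false, verbose: false }

  parser = OptionParser.new do |opts|
    opts.banner = "Usage: purge [options] PATH..."
    opts.on("-f", "--force", "Do not prompt before removing") { options[:force] = true }
    opts.on("-r", "--recursive", "Descend into directories") { options[:recursive] = true }
    opts.on("-v", "--verbose", "Explain what is being done") { options[:verbose] = true }
  end

  # parse returns the non-option arguments and leaves argv itself untouched
  paths = parser.parse(argv)
  [options, paths]
end

options, paths = parse_purge_options(%w[-r -f build/ tmp/])
p options  # force and recursive are true, verbose is false
p paths    # => ["build/", "tmp/"]
```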

3. Linux command argument order

The Linux command to copy a file (cp) expects the source file as its first argument and the destination of the copy as its second. This {source, destination} argument ordering is so ingrained into Linux users’ fingers that any deviations from this convention in your API design would not only be unintuitive but subtly dangerous. If users of your library strongly expect a certain argument ordering because of their past experiences elsewhere, they may not refer to your documentation and, as such, could be in for a nasty surprise when they discover you inverted the usual ordering.

To expand this idea a bit further, I’d like to give some other examples of deeply ingrained argument orderings:

  • {needle, haystack}

     $ ack wally train-station.txt
  • {input, output}

     # cp appears again when we view its functionality more abstractly
     $ cp original.txt copy.txt
     $ convert fox-pic.jpg fox-pic.png
  • {algorithm, receiving_object}

     $ sed 's/happy/sad/' greetings.txt
     $ chmod g+rw remove_author
  • {main_program, sub_program, command/parameter}

     $ xargs perl …
     # Heroku is a company providing servers
     $ heroku run …
     # rake is a command from the Ruby world
     $ rake db:migrate
     $ brew install …
  • {target/thing_to_do, what_to_do_afterwards}

     jQuery.get(url_to_hit, successCallback)

** OK, this last one is not from UNIX but rather from the jQuery library. I put that command here on purpose, because the principle of alluding to familiar argument orderings (or command names or flags) is dependent on your team’s domain and experience. If you do frontend development and happen to work primarily with jQuery, then jQuery memes are appropriate to adopt.
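To make the orderings concrete, here is a small Ruby sketch. Both helper methods are hypothetical and exist only to show the {source, destination} and {needle, haystack} orderings applied to your own method signatures.

```ruby
require "tmpdir"

# Hypothetical helper: writes an upcased copy of a text file.
# The arguments follow cp's {source, destination} order.
def copy_upcased(source, destination)
  File.write(destination, File.read(source).upcase)
  destination
end

# Hypothetical helper: counts the lines containing a string.
# The arguments follow the {needle, haystack} order of ack and grep.
def count_matches(needle, haystack_path)
  File.foreach(haystack_path).count { |line| line.include?(needle) }
end

Dir.mktmpdir do |tmp|
  original = File.join(tmp, "original.txt")
  copy     = File.join(tmp, "copy.txt")
  File.write(original, "wally waits\nat the station\n")

  copy_upcased(original, copy)
  puts File.read(copy)
  puts count_matches("WALLY", copy)  # prints 1
end
```

Had copy_upcased taken the destination first, a caller relying on cp habits would silently overwrite the original with an upcased version of whatever sat at the destination path, which is exactly the nasty surprise described above.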

4. Data structures and their corresponding algorithms

In Ruby there are a great many functional programming algorithms that operate on collections (e.g. arrays, sets, etc.). Some examples are each, map, select/filter, all?, any?, none?, and reject. I remember when Underscore.js added these functions to JavaScript’s collections, choosing pretty much the exact same function names as were used in Ruby. As a result, it was incredibly easy for Ruby on Rails programmers to adopt the Underscore library. In fact, I believe the API similarity was a strong driver for that library’s success in the Rails community.
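The same trick is available to your own collection classes. In Ruby, a class that defines each and includes the standard Enumerable module receives the familiar vocabulary for free. The Playlist class below is a hypothetical example, not something from a real library.

```ruby
# Hypothetical collection class; only #each is written by hand.
class Playlist
  include Enumerable

  def initialize(*tracks)
    @tracks = tracks
  end

  # Enumerable builds map, select, reject, all?, any?, none? and more on top of each
  def each(&block)
    return enum_for(:each) unless block

    @tracks.each(&block)
    self
  end
end

playlist = Playlist.new("Intro", "Anthem", "Outro")

playlist.map(&:upcase)                      # => ["INTRO", "ANTHEM", "OUTRO"]
playlist.select { |t| t.end_with?("tro") }  # => ["Intro", "Outro"]
playlist.reject { |t| t.start_with?("A") }  # => ["Intro", "Outro"]
playlist.any? { |t| t.length > 5 }          # => true
playlist.none?(&:empty?)                    # => true
```

Anyone who knows Ruby's arrays already knows how to query a Playlist.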

5. File naming conventions

Throughout the programming ecosystem, you can’t help but notice that certain file naming patterns signal concrete things about the file’s contents:

  • A filename ending in the letters “rc” (e.g. .ackrc, .bashrc) indicates that the file is a configuration file for whatever program is spelled out by the preceding letters of that filename (e.g. ack and bash).

  • A filename ending in a tilde (e.g. chapter4.txt~) indicates that the file is a backup, most often a short-term backup automatically generated by text editors and their ilk to store contents before you save.

  • A filename ending in .bak (e.g. chapter4.txt.bak) indicates the file is a backup, but it has the connotation, at least to me, that the backup is meant to stick around for longer.

  • A filename ending in _history (.bash_history, .pry_history) indicates the file contains a history of the commands run within that program.

6. Database column naming conventions

Rails timestamps its tables with created_at and updated_at columns, and Rails convention extends the pattern: time columns end in _at and date columns end in _on. As such, a Rails programmer is advised to adopt the same distinction in naming their other database columns (and in their general method signatures throughout their codebase, even for methods/attributes that are not backed by database tables).
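The distinction also works outside the database. Below is a plain-Ruby sketch; the Article struct and the suffix_matches_type? helper are hypothetical names invented for this illustration.

```ruby
require "date"

# Hypothetical value object: the attribute suffix announces the type.
Article = Struct.new(:published_on, :updated_at, keyword_init: true)

article = Article.new(
  published_on: Date.new(2024, 3, 1),            # _on: a Date (day precision)
  updated_at:   Time.utc(2024, 3, 1, 14, 30, 0)  # _at: a Time (second precision)
)

# Hypothetical check that a value's type agrees with its name's suffix.
def suffix_matches_type?(name, value)
  case name.to_s
  when /_on\z/ then value.instance_of?(Date)
  when /_at\z/ then value.is_a?(Time)
  else true
  end
end

article.each_pair.all? { |name, value| suffix_matches_type?(name, value) }  # => true
suffix_matches_type?(:shipped_on, Time.now)                                 # => false
```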

7. Significance of uppercase vs. lowercase

Sometimes a Linux command designates different yet connected meanings to option flag letters given in their uppercase and lowercase forms. This can be quite elegant when there is some underlying symmetry or continuum. Below are a few examples.

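  • {lowercase: old content, uppercase: new content} To upgrade a Postgres database cluster to a newer version, you run pg_upgrade, which pairs each lowercase flag for the old cluster with an uppercase flag for the new one: pg_upgrade -b old_bin_dir -B new_bin_dir -d old_data_dir -D new_data_dir.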

  • {lowercase: safe, uppercase: dangerous} Git distinguishes between a safe and a dangerous branch deletion operation with case: git branch -d mybranch throws an error when asked to delete a branch not yet merged, whereas the same command with an uppercase D deletes the branch no matter what.

  • {lowercase: instance, uppercase: class} In the Rails ecosystem, Customer is the class, whereas customer is an instance of that class.

8. Keyboard shortcuts

Every halfway geeky computer user on the planet—regardless of whether they program or not—associates keyboard combinations containing “v” with pasting, “c” with copying, “p” with printing, and “z” with undoing. As such, the intuitiveness of your software is increased when you adopt these same conventions in your program’s own keyboard shortcuts.

9. Method naming patterns

In the core library of the Ruby programming language, side-effect-free methods with boolean return values have names that end with a question mark (e.g. empty?), and the same convention carries over to application code (e.g. Product#on_sale?); methods that mutate the receiver in place end with an exclamation mark (e.g. map!).
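A short sketch of both suffixes in application code. The Product class, its attributes, and the discounted/discount! pair are assumptions made for illustration; the sort and sort! calls at the end are from Ruby's core Array.

```ruby
# Hypothetical Product class following the core library's suffix conventions.
class Product
  attr_reader :name, :discount_percent

  def initialize(name, discount_percent: 0)
    @name = name
    @discount_percent = discount_percent
  end

  # Ends in "?": answers a yes/no question and changes nothing
  def on_sale?
    discount_percent.positive?
  end

  # No suffix: returns a new Product and leaves the receiver alone
  def discounted(percent)
    Product.new(name, discount_percent: percent)
  end

  # Ends in "!": modifies the receiver in place
  def discount!(percent)
    @discount_percent = percent
    self
  end
end

kettle = Product.new("Kettle")
kettle.on_sale?                 # => false
kettle.discounted(20).on_sale?  # => true
kettle.on_sale?                 # => false (the receiver was not touched)
kettle.discount!(20)
kettle.on_sale?                 # => true

# The core library behaves the same way
prices = [3, 1, 2]
prices.sort   # => [1, 2, 3], while prices is still [3, 1, 2]
prices.sort!  # prices itself is now [1, 2, 3]
```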

10. Community code/style rules

At any point, there’ll be an evolving complex of style guidelines trending in your language’s community. These norms address touches like whitespace conventions, indentation rules, and project folder structures. These rules aren’t just about being finicky and not wanting to soil another programmer’s sense of elegance. Rather, predictable indentation/whitespace makes loops, iterations, abstraction levels, if/else branches, and other “blocks” stand out as cohesive chunks to other programmers, enabling them to quickly navigate your code while blocking out noise.

These days, we usually don’t do this styling work manually; in many environments, it has been automated to the point where we simply install code beautifiers into our text editors and set them to execute automatically upon saving. But you ought to ensure that these beautifiers are installed at the start of your project’s life; later integrations are more difficult, owing to the higher number of latent infractions these tools will then discover. Cleaning up these infractions causes additional commits in your source control, which muddies up its history by marking files as changed, when in reality, you have only made a cosmetic, presentational difference. This marking as changed is problematic because tools like git blame, which indicates who changed each line in a file and in what commit, will now be clobbered by the beautifier commits. For this reason, git blame will no longer be able to fulfill its primary purpose of enabling programmers to easily learn the intention behind each line and its author.

Prescription B: Minimise Your Symbol/Name Count

When I was at school, my teachers would return my English essays with red circles around words I had overused. I was taught I ought to vary my vocabulary, searching for suitable synonyms whenever possible. It turns out this reasoning is not only spurious but downright perilous when your goal is precise communication. Imagine for a second an instruction manual for a mobile phone that defines the same navigational menu variously as “the main menu”, “the top screen”, and “the home page”. Obviously you’ll end up confused, uncertain as to whether these three phrases point to the menu you already encountered or whether they refer to a menu you haven’t yet found. In general the problem is this: using another word carries the connotation that there is a relevant difference—either in author point of view or in referent. When no such difference exists, our expectations are thwarted and we become linguistically fuddled.

The parallel idea in programming is our use of symbols to refer to functions, variables, arguments, classes, files, URL routes, database column names, and so on. Even though we have the full English language available to us when choosing symbol names, we would do well to limit their number. Your database schema shouldn’t have “address_line_1” in the customers table and “address1” in the suppliers table: both should be harmonised to address_line_1. Similarly, it makes no sense to refer to an HTTP GET “tag” parameter as “?t=” in some of your URLs and “?tag=” in others.

The motivation behind this restriction is much the same as that behind the sensible prescription not to repeat yourself in code (the DRY principle). The difference here is that the dryness is now applied on the lexical plane. The goal is this: Each entity should have one name and one name only, no matter where it appears in the code. This sort of thinking is the bedrock of elegance, for it forces you to cluster up entities according to their underlying similarities, much like how the mathematician reuses the multiplication symbol (×) for scalar multiplication, vector cross product, matrix dimensions, etc.

Indeed, efforts to minimise your count of lexical symbols need not be limited to individual symbols. The same ideas can equally apply to your vocabulary of prefixes, suffixes, or (moving from single symbols to multiple ones) phrasal constructions. Think about the idea of negation as expressed in function names. Your method that checks whether a user #has_sent_messages? may be negated with the method #has_unsent_messages?. Elsewhere, your blog post model’s #live? method is negated by #not_live?, and your digital document upload’s #sampleable_filetype? method is negated by #non_sampleable_filetype?. Considered individually and as English phrases, these method names seem articulate, grammatical, and descriptive. But taken together and considered as a formal language intended for predictable and intuitive use, we could criticise these function signatures for failing to harmonise the idea of negation across the entire project. Across these three different signatures, the programmer employs three different negation prefixes: “un”, “not”, and “non”. Someone reading the docs for this project cannot simply command+f (search the page) for a known prefix and determine whether or not a negation method exists; they must read through all the method names before concluding their search. Similarly, metaprogramming and static code analysis tools are hampered, for they also depend on this same sort of predictability. Furthermore, inconsistent (and therefore unpredictable) negation techniques mean that refactors, such as the deletion of method “x”, run a greater risk of failing to also delete the now-defunct negative analogue.

To improve intuitiveness for programmers using our API (and remedy the issues above that arise from API unpredictability), I would strive to always denote negations with the same prefix. As a consequence, my negation function names won’t always sound particularly eloquent or even grammatical, but this is a tradeoff I’m willing to make. In my mind, the poetry of programming comes from simplicity in arrangement; the concerns of English grammar pale in comparison to this elegance. All this means that I would rewrite the method names given earlier as:
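A minimal sketch of such a rewrite, assuming not_ is the prefix chosen (the choice of prefix, the class names, and the placeholder method bodies are all assumptions made for illustration). The three negations become #not_has_sent_messages?, #not_live?, and #not_sampleable_filetype?:

```ruby
# Hypothetical classes; the bodies are placeholders, only the naming pattern matters.
class User
  def initialize(sent_messages)
    @sent_messages = sent_messages
  end

  def has_sent_messages?
    @sent_messages.any?
  end

  def not_has_sent_messages?
    !has_sent_messages?
  end
end

class BlogPost
  def initialize(live)
    @live = live
  end

  def live?
    @live
  end

  def not_live?
    !live?
  end
end

# With a single prefix, tooling can pair each negation with its positive method.
# Returns the not_ methods whose positive counterpart no longer exists.
def orphaned_negations(klass)
  names = klass.public_instance_methods(false)
  names.select { |name| name.to_s.start_with?("not_") }
       .reject { |name| names.include?(name.to_s.delete_prefix("not_").to_sym) }
end

orphaned_negations(User)      # => []
orphaned_negations(BlogPost)  # => []
```

The orphaned_negations helper shows the payoff: once the prefix is predictable, a few lines of metaprogramming can catch the refactoring slip where a positive method is deleted but its negation is left behind.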


Side note: Before someone objects that defining distinct negative methods is superfluous when the positive version already exists, let me point out that focused negation methods enable resource usage optimisations (e.g. more efficient database queries) which are not generally possible when one simply takes the inverse of the positive method.

Now, if this section were only about negation, it would be narrow indeed. As such, let’s broaden our conceptual net and ask what other conceptual candidates for function building blocks exist. Naturally, the specific conventions depend on your domain, but to seed your thinking I’d like to share a few candidates I pulled from my latest web application:

  • Permission - #may_x?

    • Before I started harmonising the linguistic building blocks of my functions, I had various function signatures for permission, such as #allowed_to_x, #authorised_to_x, and even #can_x. This last function name is not only inelegant but confusing, because the word “can” denotes ability more often than permission. That brings us to my next candidate…
  • Ability - #can_x?

    • This would harmonise method names such as #able_to_x, #check_that_it_xes
  • Validations - #must_be_x

    • Harmonising #should_be_x, #assert_x
  • Filtering by an attribute - #find_by_x

    • Harmonising #with_x, #filter_by
  • Run a command/service - #execute

    • Harmonising #run, #work
  • Removal - #remove_x

  • Inclusion - #includes_x?

    • Harmonising #contains_x?
  • Count - #x_count

    • Harmonising #x_number, #number_of_x, #size
  • Object/output building - #generate

    • Harmonising #build_x, #make_x, #create_x, #assemble_x
  • Sums - #total_x

    • Harmonising #sum_x
  • Try an action again - #retry_x

    • Harmonising #attempt_x_again
  • Sending messages - #alert

    • Harmonising #message, #tell, #contact, #send
  • State (when using state machines) - #state

    • Harmonising #status, #stage, #step
  • Setting variables - #x=

    • Harmonising #set_x, #change_x
  • Reformatting / massaging some data - #format_x

    • Harmonising #clean_x, #massage_x
  • Indicate conceptual relation / inheritance - (this one more usually applies to class names than function names) SubclassVariationFoundationalObject (e.g. TriangularMatrix, AuthorEmailer, where Matrix and Emailer are foundational objects)

  • Return a collection - object name in plural (e.g. #comments)

    • Harmonising #get_collection, #all_x
  • Age - #latest_x, #oldest_x, #current_x

    • Harmonising #newest, #most_recent, #active

The above examples were principally method names but this is just the start; the same harmonising idea can be applied to parameters/argument names, class names, URLs, and even filenames.

I said in an earlier paragraph that you want each entity to have one name only, no matter where it appears in your code. As you might glean from my other articles, my definition of code reaches beyond the confines of a program’s project folder; relevant too are connections to all the other nodes and entities that make your software tick. By way of example, many web applications mesh with third-party software services, and so you ought to minimise the symbol count used and sent to these external systems. By this I mean things like the event tags shuffled off to Google Analytics or conversion information sent off to various advertising platforms. Since some web applications send up to one hundred different parameters to analytics suites, a lack of consistency and elegance here could lead to a veritable lexical mess.

So how are we to keep tabs on our symbol usage and prevent inconsistencies from creeping in? There’s no easy answer right now. A good, if trite, start is simply to aspire to consistency and low symbol counts. This can be enhanced by using primitive technologies like text editor autocomplete features and project-wide searches (e.g. searching for the partial “address” before adding a new address_line database column). A final point for hacker types wishing to save time: This stuff matters more for your public interfaces than for your private implementation details. As such, you could cut down on these harmonising efforts when within the guts of a class definition.
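One modest step beyond searching by hand is a small script that flags discouraged synonyms in a class's public interface. What follows is only a sketch under stated assumptions: the PREFERRED_PREFIXES table, the naming_suggestions method, and the Account class are hypothetical, with the prefix pairs borrowed from the list of building blocks above.

```ruby
# Hypothetical project vocabulary: discouraged prefix => harmonised prefix.
PREFERRED_PREFIXES = {
  "allowed_to_"    => "may_",
  "authorised_to_" => "may_",
  "able_to_"       => "can_",
  "contains_"      => "includes_",
  "sum_"           => "total_"
}.freeze

# Inspects only the public methods a class defines itself, since public
# interfaces are where consistency matters most.
def naming_suggestions(klass)
  klass.public_instance_methods(false).sort.filter_map do |name|
    discouraged, preferred = PREFERRED_PREFIXES.find { |prefix, _| name.to_s.start_with?(prefix) }
    next unless discouraged

    "#{klass}##{name}: prefer #{preferred}#{name.to_s.delete_prefix(discouraged)}"
  end
end

# Hypothetical class with two names that stray from the vocabulary
class Account
  def allowed_to_edit?
    true
  end

  def may_delete?
    false
  end

  def contains_invoice?(_id)
    false
  end
end

puts naming_suggestions(Account)
# Account#allowed_to_edit?: prefer may_edit?
# Account#contains_invoice?: prefer includes_invoice?
```

A check like this could run in a test suite, so that a new synonym fails the build instead of quietly entering the vocabulary.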
