Todd Sedano

Software Engineering, Improv, Craftsmanship

OS X Filesystem and Case-Sensitivity Bugs

Problem:

On one project, our test cases would be green locally but then fail on CI due to filesystem case-sensitivity differences between OS X (our development environment) and Linux (our QA and production environments). By default, OS X treats ~/project/myFile as the same file as ~/project/MyFile. Thus if the code had a case typo in an import, it might pass locally when it shouldn't.
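A minimal Ruby sketch of the failure mode, assuming a hypothetical helper `case_only_matches` that compares the paths the code imports against the files that actually exist. An import that matches a file only when case is ignored resolves on a default OS X volume but not on Linux.

```ruby
require "set"

# Hypothetical helper: for each imported path, find files that match it only
# when case is ignored. String#downcase is a simplification of the case
# folding an OS X (HFS+) volume actually performs.
def case_only_matches(existing_files, imported_paths)
  exact = Set.new(existing_files)
  by_folded_case = existing_files.group_by(&:downcase)
  imported_paths.each_with_object({}) do |path, mismatches|
    next if exact.include?(path) # resolves on every filesystem

    candidates = by_folded_case[path.downcase]
    # Resolves on default OS X, fails on Linux: a latent CI failure.
    mismatches[path] = candidates if candidates
  end
end
```

For example, `case_only_matches(["lib/MyFile.rb"], ["lib/myFile.rb"])` returns `{"lib/myFile.rb" => ["lib/MyFile.rb"]}`, which is exactly the kind of import that stays green locally and goes red on CI.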

Solution:

I wanted to use the base Pivotal OS X image and sprout-wrap. The strategy is to create a case-sensitive partition and symlink the workspace directory to it. Sprout-wrap has issues with case sensitivity (pivotal_ide_prefs in particular), so we run it from the user's home directory but keep all project materials in the case-sensitive directory.

Alternative solution: getting the Pivotal OS X image onto a USB key might let us choose a case-sensitive format for the default partition during installation.

These steps might work on an existing machine, but I have only tested them on a clean install.

Step 1) Install the Pivotal Labs OS X image.

Step 2) Using Disk Utility, split the only partition in half. For the new partition, select "Mac OS Extended (Case-sensitive, Journaled)" and name it CASE_SENSITIVE, the mount point used in the next steps.

Step 3) Create a workspace directory on the new partition:

    mkdir /Volumes/CASE_SENSITIVE/workspace

Step 4) Symlink ~pivotal/workspace to the new partition's workspace directory:

    ln -s /Volumes/CASE_SENSITIVE/workspace /Users/pivotal/workspace

Step 5) Install sprout-wrap from ~, NOT from ~/workspace. I noticed that pivotal_ide_prefs wouldn't fully work with case sensitivity on. (Other recipes may have issues too.)

Step 6) Install your project code in ~/workspace.

Note that everything in /Applications and the other system directories stays on the original case-insensitive partition, as in a typical Mac setup, but we were fine with that.
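Before running the project, it may be worth confirming that ~/workspace really resolves to the case-sensitive volume. A minimal Ruby sketch, assuming a hypothetical `case_sensitive?` helper that writes a scratch file with a lowercase name and checks whether the uppercase name finds it:

```ruby
require "tmpdir"

# Hypothetical helper: returns true if +dir+ is on a case-sensitive filesystem.
# It writes "probe" in a scratch directory inside +dir+ and checks whether the
# name "PROBE" finds the same file. The scratch directory is removed afterwards.
def case_sensitive?(dir)
  Dir.mktmpdir("case-probe", dir) do |scratch|
    File.write(File.join(scratch, "probe"), "")
    !File.exist?(File.join(scratch, "PROBE"))
  end
end
```

After step 4, `case_sensitive?(File.expand_path("~/workspace"))` should return true; pointed at a directory on the default OS X partition, it should return false.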

Estimated vs Actual Story Points

On my last project, I tried an experiment looking at estimation accuracy.

My results run counter to the conventional wisdom of software engineering research and experience, but are consistent with my experience at Pivotal. Conventional wisdom says that engineers are optimistic and terrible at estimating work, which is why some managers "double" the estimates they are given.

On my last team, we tended to be cautious in estimating work and not overly optimistic about the risk and complexity of our work. This is similar to the “under commit, over deliver” adage.

Here is how we collected the data.

1) We limited our pointing scale to "Easy", "Medium" and "Hard"; one of the Pivots advocated this scale, and I liked its simplicity. We mapped "Easy" to 1 story point, "Medium" to 3 story points and "Hard" to 8 story points. In the IPM (iteration planning meeting), we held up 1, 2, or 3 fingers, I called out Easy, Medium, or Hard, and the PM recorded the corresponding story points.

While I don't have evidence for this, I felt that the IPMs were very efficient because there was less quibbling over minor point differences. We went with the majority point value. If a large number of people said Easy and a large number said Medium, we'd have a discussion. If most of the team said Easy and a few people said Hard, we'd also have a discussion.

2) At the end of each story, the pair assigned a Pivotal Tracker label to reflect the actual point value. We used the labels "Actual Easy Points", "Actual Medium Points", and "Actual Hard Points", and eventually we had to add "Actual Zero Points". I asked the pair not to look at the estimate on the story while recording the actual, but there may still be some anchoring bias in the data.

3) I monitored Tracker and reminded developers to label the stories they had finished.

Periodically (after one month, after two months, and at the end of the project), I showed the results to the team.
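The voting rule from step 1 can be written down as a small function. This is a sketch, not the team's actual tooling: the name `call_vote` and the one-third threshold for "a large number of people" are assumptions, since the post gives no number.

```ruby
FINGERS = { 1 => :easy, 2 => :medium, 3 => :hard }.freeze
POINTS  = { easy: 1, medium: 3, hard: 8 }.freeze
# Assumption: a group counts as "a large number of people" at one third of the votes.
LARGE_GROUP = 1.0 / 3

# Takes each person's finger count and returns the story points to record,
# or :discuss when the team should talk before settling on a value.
def call_vote(fingers)
  counts = Hash.new(0)
  fingers.each { |f| counts[FINGERS.fetch(f)] += 1 }

  # Votes at both ends of the scale (e.g. mostly Easy, a few Hard).
  return :discuss if counts.key?(:easy) && counts.key?(:hard)

  ranked = counts.sort_by { |_size, votes| -votes }
  runner_up = ranked[1]
  # Two sizeable camps (e.g. Easy vs. Medium): discuss rather than count.
  return :discuss if runner_up && runner_up[1] >= fingers.size * LARGE_GROUP

  POINTS.fetch(ranked.first.first) # otherwise the majority wins
end
```

With this rule, four Easy votes and one Medium vote record 1 point, while a 3 to 3 split between Easy and Medium returns `:discuss`.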

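Number of stories, by how the estimate compared with the actual points. "Conservative" means the estimate was higher than the actual, "Accurate" means they matched, and "Optimistic" means the estimate was lower. A dash marks a comparison that cannot occur at that end of the scale.

    Estimate   Conservative   Accurate   Optimistic
    0          -              5          1
    1          7              48         5
    3          13             14         2
    8          0              2          -

The same data as a percentage of the stories with each estimate (computed from the counts above):

    Estimate   Conservative   Accurate   Optimistic
    0          -              83.33%     16.67%
    1          11.67%         80.00%     8.33%
    3          44.83%         48.28%     6.90%
    8          0.00%          100.00%    -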
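The classification behind these tables can be sketched in Ruby. The label-to-points mapping follows the post; the input shape (a hypothetical list of `[estimate, actual_label]` pairs) and the helper names are assumptions.

```ruby
# Points per Tracker label, as described in the post.
LABEL_POINTS = {
  "Actual Zero Points"   => 0,
  "Actual Easy Points"   => 1,
  "Actual Medium Points" => 3,
  "Actual Hard Points"   => 8
}.freeze

# Conservative: estimated more points than the story took.
# Optimistic: estimated fewer points than the story took.
def classify(estimate, actual)
  { 1 => :conservative, 0 => :accurate, -1 => :optimistic }.fetch(estimate <=> actual)
end

# stories: hypothetical array of [estimate, actual_label] pairs.
# Returns { estimate => { conservative: n, accurate: n, optimistic: n } }.
def tally(stories)
  stories.each_with_object({}) do |(estimate, label), table|
    row = table[estimate] ||= { conservative: 0, accurate: 0, optimistic: 0 }
    row[classify(estimate, LABEL_POINTS.fetch(label))] += 1
  end
end

# Converts one row of counts into percentages of that row's total.
def row_percentages(row)
  total = row.values.sum
  row.transform_values { |n| (n * 100.0 / total).round(2) }
end
```

For instance, `row_percentages({ conservative: 13, accurate: 14, optimistic: 2 })` gives 44.83, 48.28 and 6.9, matching the 3-point row.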

Frustration Quote

I am frustrated when my expectations do not align with reality – Todd Sedano

Rethinking SEMAT Card Affordance

A possible solution

While I'm an avid player of card games and board games, the SEMAT card format does not reflect how I think about the alphas, each of which is an ordered collection of states.

Here is a mock prototype of an alternative physical format for the SEMAT alphas.

Each alpha is a strip of cards folded much like a scroll, with the "highest" state on the inside and the "lowest" state on the outside. Starting with the "lowest" state, the SEMAT user can incrementally unfold the strip, comparing the current state with the next possible state. If the next state has been achieved, the user can continue to unroll the strip.
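The unfolding procedure is essentially a scan that stops at the first unmet state. A minimal Ruby sketch, using the six states of the Essence Requirements alpha as sample data; the `current_state` helper and the block that answers "have we achieved this state?" are illustrative assumptions.

```ruby
# Sample data: the Requirements alpha's states, from the outside of the strip
# (lowest) to the inside (highest).
REQUIREMENTS = %w[Conceived Bounded Coherent Acceptable Addressed Fulfilled].freeze

# Unfold the strip one state at a time, asking the block whether the team has
# achieved that state; stop at the first one it has not. Returns the last state
# reached, or nil if even the first state is not achieved.
def current_state(states)
  reached = nil
  states.each do |state|
    break unless yield(state)
    reached = state
  end
  reached
end
```

Because the scan stops at the first unmet state, a state checked off out of order (say, Addressed before Acceptable) does not advance the strip, which mirrors the physical constraint of unrolling.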

Now it is impossible to accidentally lose a state in the alpha, and displaying the current state of all alphas in a project takes up roughly 1/6 of the space of the SEMAT board.

So what do we call these new SEMAT cards? SEMAT strips, SEMAT rolls, SEMAT scrolls? I’m open to suggestions.

Background — the problem

When I first saw a set of SEMAT cards, my instinct told me something wasn't quite right. I collect playing cards. I like unusual card sets, such as my agile estimation cards, XP training cards, and improv feeling cards.

When I laid out six cards for an alpha, it felt messy. I could easily get these out of order, and the order matters in a single alpha. If I piled up several alphas without a rubber band, I could easily mix them together.

I pictured myself introducing this at a training session. With agile estimation cards, I just hand out a deck, yet for SEMAT I would want to hand out one alpha at a time. If SEMAT cards were printed in a deck, taking out one alpha at a time would be time-consuming.

I do think agile estimation cards work well. There are four sets in one deck, just like regular playing cards. If my five-year-old daughter found the deck and shuffled it, sorting it wouldn't take too long with four sets. With SEMAT cards, however, there are many alphas, and sorting them would be tedious.

Abacus as an alternative metaphor

I started considering an abacus. Each rod of the abacus could represent an alpha, and each bead on a rod could represent a state card. The space between the beads on the left and the beads on the right could represent the current state. Yet creating an abacus for SEMAT seemed infeasible. Then it occurred to me that I could tape the SEMAT cards into a strip.

Improving Code Readability – Turning Comments Into Methods

I'm working with 21 developers to improve their code readability. In a code read-through, each developer listens while another developer tries to read their code. (See Code Readability Process for more details.)

In reviewing one programmer's code, I noticed that a sixty-line method had a visual rhythm to it: a blank line, a comment, then about ten lines of code, and the cycle would repeat. Each comment explained the code just following it.

The programmer senses that the narrative is lost in the code and feels compelled to add these comments to help the reader understand what is going on. The comments serve as section breaks or chapter headings.

Instead, the code could be split into smaller methods, where each method name clearly reveals the intent of the code. The comment is better expressed as a method invocation.

Here's the pattern:

    //determine interest rate  (comment about the code intention)
    code
    code
    code

Becomes

    determine_interest_rate()

Here’s the original code

    bool Tictactoe::determine_game(int row, int column, char move){
        bool flag = false;  // must start false, otherwise none of the checks below would run
        int i;

        // Check the row of latest play
        if(!flag){
            flag = true;
            i=0;
            while((i<dimension) && (flag==true)){
                if(board[row][i] != move){
                    flag = false;
                }
                i++;
            }
        }

        // Check the column of latest play
        if(!flag){
            flag = true;
            i=0;
            while((i<dimension) && (flag==true)){
                if(board[i][column] != move){
                    flag = false;
                }
                i++;
            }
        }
        ...
        return flag;
    }
  

Here’s the revised code.

    bool Tictactoe::determine_game(int row, int column, char move){
        bool winner_found;

        winner_found = check_row_of_latest_play(row, move) ||
                       check_column_of_latest_play(column, move) ||
                       check_...

        return winner_found;
    }
  

(Note that I have not shown other refactoring that I would do, as it would distract from the point.)

By changing the comment to a method call, the code is now more "self-documenting", and the method name makes the intent clear.
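For comparison, here is the same refactoring sketched in Ruby. Each former comment becomes a predicate method whose name states the intent, playing the role of check_row_of_latest_play and check_column_of_latest_play in the C++ version. The class layout, the board representation (an array of rows), and the two diagonal checks are assumptions; the original code elides its remaining checks.

```ruby
class TicTacToe
  def initialize(board)
    @board = board # array of rows, each an array of marks
    @dimension = board.size
  end

  # Each former "// Check the ..." comment is now a method name.
  def winner?(row, column, move)
    row_complete?(row, move) ||
      column_complete?(column, move) ||
      diagonal_complete?(move) ||
      anti_diagonal_complete?(move)
  end

  private

  def row_complete?(row, move)
    @board[row].all? { |mark| mark == move }
  end

  def column_complete?(column, move)
    @board.all? { |cells| cells[column] == move }
  end

  # Assumed checks: the original post elides whatever follows the column check.
  def diagonal_complete?(move)
    (0...@dimension).all? { |i| @board[i][i] == move }
  end

  def anti_diagonal_complete?(move)
    (0...@dimension).all? { |i| @board[i][@dimension - 1 - i] == move }
  end
end
```

Read top to bottom, `winner?` is the narrative the comments were trying to provide; each helper can be read, or skipped, on its own.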

When I suggested this to the programmer, she resisted the idea, noting that a method call would affect performance. For a tic-tac-toe problem, this is a specious argument. However, is there merit to it? Will a modern compiler optimize this kind of refactoring? And this brings up a broader question: should we optimize code for performance or for readability when we are writing it? Conventional wisdom says we should write code that is clean and easy to understand; once we are done and have performance analysis with production data, we know where to spend engineering effort optimizing critical sections. The one exception is algorithmic complexity and running time (e.g. O(N) vs. O(N²)).

I'm now curious: is this a "comment smell"? Can comments indicate to us programmers that the code we just wrote isn't very clear? The comment itself may tell us how to refactor the code to make it more readable.

Improv Game for Software Engineers: Program Counter

I invented this improv game, themed around software development, for some Brazilian computer science students visiting my campus. It's a variant of the many reaction-based warm-up games (e.g. "Whoosh Ball") that encourage quick responses and discourage overthinking or planning a response. I like playing it with software developers because it makes more sense to them than those other games.

Instructions

Have the group form a circle.

Explain that we are going to mimic a program counter moving around the circle. Each person, in turn, says the instruction that the program counter executes. The instructions are "Op", "Loop N", "Method Call", "If true", and "If false".

"Op": A normal CPU performs a variety of operations, such as add, subtract, and store to a register. For this game, we simplify all of these possible operations into a single command, "Op". Have the entire circle practice that command going around. Pretty simple. Let's make it more interesting.

"Loop N": A basic control-flow construct in most programming languages is the ability to loop and execute instructions multiple times. If someone says "Loop 3", this indicates that we are iterating over a collection with three elements. The next person says "one", indicating that the first iteration is now happening. The next person says "two". The next person says "three". Then we proceed as normal. (I've seen one group say "Loop Zero", which we treated as a finished loop; it's just like saying "Op".) We do this for a while until people get it.

"Method Call" (and point to someone or say their name out loud): Programs often reuse code by calling a method. If someone says "Method Call" and points to someone, we jump to that section of the code. (That person decides which way the program counter will continue.) We keep executing instructions until someone says "Return", at which point the program counter goes back to the person who said "Method Call". (Sometimes people think that it returns to the person who was pointed at, but it returns to the person who made the method call, just like a real program.) Yes, method calls can be nested, and they can even be recursive.

I personally like this operation. In many improv games, the equivalent operation is a chance for a person to pass control to someone else at no cost. For example: I panic, I don't want this thing, I think it's a "hot potato", so I'm going to give it to you quickly by saying "Zoom" so that I don't have to deal with it. In this game, however, saying "Method Call" has a cost: the person has to remember that they said it. Everyone else in the room really only needs to track the depth, the number of method calls still outstanding, whereas the people who said "Method Call" need to remember where they are on the stack.

"If true" and "If false": Eventually we get tired of going around in the same direction. Our program counter is pretty simple and can't do branch prediction, so whenever we use an IF statement, we pay a performance penalty and skip the next step, i.e. the next person. "If true" skips the next person. "If false" reverses the direction and skips the next person. (For example, if we are moving clockwise and you say "If true", we skip the person to your left. If you say "If false", we skip the person to your right and proceed counterclockwise.)

"Cheer": As soon as the first mistake happens, agree on a verbal saying that symbolizes "We are having fun, we made a mistake, and we get to restart!" In my improv training that's been "Ah-ooo-ga"; the Brazilian students preferred "Ciao!", and I've seen other positive vocalizations.
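For developers who want to check their understanding of the rules, the movement of the program counter can be modelled as a small state machine. A minimal Ruby sketch: positions run from 0 to size - 1 around the circle, direction is +1 or -1, and the stack holds the people who said "Method Call". Two details are assumptions rather than rules from the game: the person called keeps the current direction unless they choose a new one, and after "Return" the caller is the next to speak.

```ruby
# Where the "program counter" goes next. People sit at positions 0...size
# around the circle; direction is +1 or -1; stack holds the callers.
PC = Struct.new(:position, :direction, :stack)

def step(pc, size, instruction, target: nil, direction: nil)
  pos, dir, stack = pc.position, pc.direction, pc.stack.dup
  case instruction
  when :op, :loop, :count
    # "Loop N" and the counts after it move like "Op"; only the words differ.
    pos += dir
  when :if_true
    pos += 2 * dir # skip the next person
  when :if_false
    dir = -dir # reverse direction, then skip the next person
    pos += 2 * dir
  when :method_call
    stack.push(pos) # remember who made the call
    pos = target
    dir = direction if direction # assumption: callee keeps direction unless they choose
  when :return
    raise "Ah-ooo-ga! Return with an empty stack" if stack.empty?
    pos = stack.pop # back to the caller (assumed to speak next), not the callee
  else
    raise ArgumentError, "unknown instruction: #{instruction}"
  end
  PC.new(pos % size, dir, stack)
end
```

Starting at position 0 in a circle of six, "Op", "If true", "If false" lands back on position 1 heading the other way, and a "Method Call" followed later by "Return" always comes back to whoever made the call.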

Other considerations

After introducing these instructions, I allow the group to invent any programming constructs they can think of, and I say "yes" to any suggestion no matter how bizarre it is. Sometimes I'll allow the group to tweak a construct if it isn't clear. If you try something new, let me know.

One variant of "Op" is to let people name any normal single-instruction operation. They could say "add", "store", or "multiply" instead of "Op". I suspect this might work best at the beginning, but I have not tried that experiment.

Feel free to use this game. I'm assuming that no one will ever remember that I invented it. =)

Game History

On February 20, 2012, a student group from Uniasselvi University in Brazil visited Carnegie Mellon University in Silicon Valley. Since I don’t know Portuguese, Professor Jan Charles Gross graciously translated my instructions.

[Photo: Professor Gross' son, Professor Sedano, Chris Zeise, Professor Gross]

TDD: Small Ah-Ha Moment on When to Use a Hash Instead of an Array

I’m sharing a pleasant surprise I had during a recent Test Driven Development coding session. My tests had found a design that was delightful to me. TDD suggested that I use a hash where my natural tendency is to use an array.

For the purpose of clarity, I’m simplifying a very complicated data structure for this example. Let’s say we wanted to show the user the most popular cheat codes for a set of video games. For the sake of the example, let’s assume that this information is stored in the database in a way that is rather difficult to access. Thus the need for a method “most_popular_cheats” to do the heavy lifting.

Let’s recall some popular cheat codes. Contra’s cheat code is “UP, UP, DOWN, DOWN, LEFT, RIGHT, LEFT, RIGHT, B, A, START” and Mike Tyson’s cheat code is “007-373-5963”

From the control flow, I would already know the order of the video games that needed cheat codes, and expected that the method “most_popular_cheats” would just return an array.

However, as I wrote the test first, I realized that the test wouldn’t know the exact order of the video games. After I created some test data in the database, I wasn’t certain how they would be retrieved, would the default sorting be by ID, or by name? The test didn’t know and I didn’t think the test should care. If the method returned a hash, I could just see if the hash contained the key->value pairs that I expected.

Hash: {contra.id => "UP, UP….", mike_tyson.id => "007-373-5963"}

On previous projects, following the traditional "code then test" development style, I have generated two parallel arrays to solve this kind of problem: one that contained the answers (the values in my hash), and another that contained the indexes (the keys in my hash). On those projects, it had not occurred to me that a hash was a better data structure. My tests informed me of a programming nuance that I had previously missed.

Here's the simplified version of the test case that led me to this small ah-ha moment.

    contra = FactoryGirl.create(:video_game_with_popular_cheats)
    mike_tyson = FactoryGirl.create(:video_game_with_popular_cheats)

    popular_game_cheats = Game.most_popular_cheats
    popular_game_cheats.should be_a_kind_of(Hash)
    popular_game_cheats[contra.id].should == "UP, UP, DOWN, DOWN, LEFT, RIGHT, LEFT, RIGHT, B, A, START"
    popular_game_cheats[mike_tyson.id].should == "007-373-5963"
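Outside of Rails, the idea can be sketched in plain Ruby. The row format `[game_id, cheat_code, times_used]` and both helper names are hypothetical stand-ins for the hard-to-query database; the point is that the result is keyed by game, so a test can check it without knowing the retrieval order. The second helper shows how two parallel arrays collapse into the same hash.

```ruby
# Hypothetical rows standing in for the hard-to-query data:
# [game_id, cheat_code, times_used].
def most_popular_cheats_by_game(rows)
  rows.group_by(&:first).transform_values do |game_rows|
    game_rows.max_by { |_id, _code, uses| uses }[1] # the most-used code
  end
end

# The parallel-array version: two arrays that must stay in the same order.
# zip pairs each id with its answer, and to_h builds the hash.
def lookup_from_parallel_arrays(ids, answers)
  ids.zip(answers).to_h
end
```

Ruby's `Hash#==` ignores insertion order, so whether rows come back sorted by ID or by name, the comparison in a test stays the same.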

Learning Test Driven Development (TDD) Through Katas

In my graduate course, "craft of software development", students created individual learning plans to accomplish their goals. Many chose to enhance their testing and design skills by focusing on Test Driven Development (TDD).

While the sample is small (5 students), it appears that students preferred doing katas followed by a project to doing katas alone. By working through a kata, you practice the skill in a very focused, tactile manner on a small problem. Once done, you can compare your solution with the many kata solutions posted on the internet and use them for reflection. Then, by working on a project, you can practice TDD while dealing with domain-specific issues and complexities that arise from a larger problem. One student found that re-implementing a previous project was immensely valuable, as he was able to compare his new solution to his previous implementation.

Not all katas are created equal for the purpose of learning TDD. Some are too simple; some are too algorithmic in nature. (For these, creating the test suite is straightforward, yet improving running time is not.) Swapna Varghese ordered a set of katas by how easy they are to implement with TDD. Note that the ones at the end of the list are not necessarily better at teaching TDD; in fact, it may be hard to complete them using TDD.

A suggested path, then, would be to take an easy one (e.g. one of the first three) as a warm-up exercise to validate your test environment, and then move on to some in the middle. I'm partial to Gilded Rose. Mars Rover was a definite favorite among my students. As with Goldilocks, it wasn't too simple, it wasn't too algorithmic, it "was just right."

Exhibit 1: Katas sorted by how easy it is to apply TDD.

  1. Fizz Buzz
  2. Prime Factors
  3. String Calculator
  4. Gilded Rose
  5. Word Wrap
  6. Tennis Game
  7. Bowling Game
  8. Mars Rover
  9. Roman Numerals
  10. Coin Change
  11. Game of Life
  12. Potter

Not helpful in learning TDD: Weighing with Stones
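As an illustration of the warm-up step, here is where the Prime Factors kata (number 2 above) might end up after a short TDD session in Ruby. The implementation and the order of examples are one common path, not a prescribed solution; each entry in the example list is the kind of failing test that would force the next small change.

```ruby
# Prime Factors kata, in the state it might reach after a TDD session.
def prime_factors(n)
  factors = []
  divisor = 2
  while n > 1
    while (n % divisor).zero?
      factors << divisor
      n /= divisor
    end
    divisor += 1
  end
  factors
end

# Examples in the order a TDD session might add them.
EXAMPLES = [
  [1, []], [2, [2]], [3, [3]], [4, [2, 2]],
  [6, [2, 3]], [8, [2, 2, 2]], [9, [3, 3]]
].freeze

EXAMPLES.each do |input, expected|
  actual = prime_factors(input)
  raise "prime_factors(#{input}) returned #{actual.inspect}" unless actual == expected
end
```

Because the problem is small and the examples are easy to order, most of the session's attention goes to the red-green-refactor rhythm rather than to the algorithm, which is what makes it a reasonable warm-up.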

Software Engineering Isn’t a Solo Activity

Often as engineers we like to go it alone: me versus the machine. Yet involving others can help us in our careers. In my course "craft of software development", I challenged my students to think of ways to add a social component to their software development experience. Here is some of their feedback (list is unsorted):

  1. Ask a friend to review your code
  2. Pair program with a friend
  3. Pair program with a stranger (there are websites that do this)
  4. Attend a meetup or an unconference. Meet beginners, intermediates, and experts.
  5. Find out where people in your community hang-out (mailing list, IRC, etc)
  6. See who is around you that you might not be considering. (For example, students in a different masters program, PhD students.)
  7. Post to the CMU SV Facebook group.
  8. Contact alumni who might know working professionals with expertise in the field.
  9. Post code for review online, or post it on a blog and have people comment on it.
  10. Attend a "hackathon," "hackernoon," or spend time at a software dojo
  11. Use LinkedIn to search for the appropriate skill and see who is in your network
  12. CMU has a mentoring program in the Bay Area