`rsync`. I’ve been meaning to use Asciinema to show terminal screen recordings, and figured this would make a good example.
Asciinema is nice in that the recording is just a JSON file, not a video, so it is very small. It is played back via JavaScript, and you can even select the text that is displayed. Unfortunately, due to the JavaScript, it won’t play back in the Atom feed, so if you are reading this in a news reader, you’ll need to follow the link through to the page. I thought about using an animated GIF in the news feed, but then you’d lose part of the magic of Asciinema.
Unfortunately, there is no way that I know of to get the slope and Y-intercept from the Numbers plot besides visual inspection. I also wanted a way to do this from the command line. This post will explain how to do this using Python and the NumPy library.
While the rest of the post goes into more detail, here are two quick Python one-liners to find the slope and Y-intercept, given two NumPy arrays, `x` and `y`. First, with `Polynomial.fit()`:
b, m = np.polynomial.polynomial.Polynomial.fit(x, y, 1).convert().coef
And second, with `polyfit()`:
b, m = np.polynomial.polynomial.polyfit(x, y, 1)
Both of these give you the slope in `m` and the Y-intercept in `b`. I also created a Jupyter notebook demonstrating these APIs.
I’m honestly not sure why you would pick one over the other. `polyfit()` is less code, so that seems like the better option to me. But if you know, please contact me! And please read on for more details.
Here’s the sample data we’ll be using throughout the rest of the post:
| X | Y |
|---|---|
| 0 | -1 |
| 1 | 0.2 |
| 2 | 0.9 |
| 3 | 2.1 |
In Numbers, put this X and Y data into a table. Then add a Scatter Plot and enable a “Linear Trendline” with “Show Equation” checked. You should see a plot, as in this screenshot:
The plot shows the points in blue and a line in red as the “best fit” line for the points. The legend shows the formula of the line as:
\[y = x - 0.95\]

In other words, the “best fit” line has a slope of 1 and a Y-intercept of -0.95. The goal here is to take the same input data and come up with the same slope and Y-intercept using Python.
For those that don’t know, NumPy is a fantastic Python library for doing numerical calculations. And one of the many things it can do is a linear fit. Unfortunately, it also has multiple ways to do this, which I find a bit confusing. Here are all the ways I found:
The first option, `numpy.polyfit()`, is considered legacy, and the documentation says to use `numpy.polynomial`:

> As noted above, the `poly1d` class and associated functions defined in `numpy.lib.polynomial`, such as `numpy.polyfit` and `numpy.poly`, are considered legacy and should not be used in new code. Since NumPy version 1.4, the `numpy.polynomial` package is preferred for working with polynomials.
Because this is legacy, we’ll skip over this option and look at the other two.
Polynomial.fit
The Polynomial transition guide specifically says that `Polynomial.fit()` is the replacement for `numpy.polyfit()`, so we’ll look at this option first.
The first order of business is to get the data into Python. For demonstration purposes, we’ll create two separate NumPy arrays in code:
import numpy as np
x = np.array([0, 1, 2, 3])
y = np.array([-1, 0.2, 0.9, 2.1])
It is, however, better to store this data outside the code, for example in a CSV or JSON file. You can easily read these in Python using the standard library (the `csv` or `json` modules) or the pandas library (the `pandas.read_csv()` or `pandas.read_json()` functions). pandas is nice because it automatically parses the data into NumPy arrays of floating-point numbers.
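For instance, here is a minimal sketch of the CSV route using only the standard library’s `csv` module plus NumPy. The file name `data.csv` and its contents are hypothetical; the data is inlined with `io.StringIO` so the example is self-contained:

```python
import csv
import io

import numpy as np

# Hypothetical CSV contents; in practice this would be a file,
# e.g. rows = csv.DictReader(open("data.csv", newline=""))
csv_text = "X,Y\n0,-1\n1,0.2\n2,0.9\n3,2.1\n"

rows = csv.DictReader(io.StringIO(csv_text))
data = [(float(row["X"]), float(row["Y"])) for row in rows]

# Split the (x, y) pairs into two NumPy arrays of floats.
x = np.array([pair[0] for pair in data])
y = np.array([pair[1] for pair in data])
```

With pandas, this collapses to a single `pandas.read_csv("data.csv")` call followed by selecting the `X` and `Y` columns.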
With the data in arrays, we can use `Polynomial.fit()` to create a linear “polynomial” from the `x` and `y` arrays:
import numpy.polynomial.polynomial as poly
linear = poly.Polynomial.fit(x, y, 1)
This returns a `Polynomial` object of degree `1`, meaning it is linear. But what we really want are the coefficients of this polynomial. The docs tell us how to get them:

> If the coefficients for the unscaled and unshifted basis polynomials are of interest, do `new_series.convert().coef`.
So that’s what we’ll do:
coefs = linear.convert().coef
And if we print them out, we get:
>>> print(coefs)
[-0.95 1. ]
These are the same numbers we got from Numbers, but the Y-intercept is first and the slope is second. Why is that? Because `Polynomial` represents an \(n\)-degree polynomial (\(n\) is sometimes called the order of the polynomial) of this form:

\[y = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n\]

And with a degree of `1`, the equation is:

\[y = c_0 + c_1 x\]

Plugging the coefficient values of `[-0.95 1. ]` into that equation gives us:

\[y = -0.95 + 1 \cdot x\]

This is backwards from the slope-intercept form of \(y = mx + b\). The first coefficient, \(c_0\), is the Y-intercept \(b\), and the second coefficient, \(c_1\), is the slope \(m\). Python’s destructuring makes it easy to assign separate variables, `m` and `b`, from the coefficients:
b, m = coefs
This can even be done in one line:
b, m = poly.Polynomial.fit(x, y, 1).convert().coef
polynomial.polyfit
While you can use `Polynomial.fit()` in one line, I find it a bit verbose. It creates an intermediate `Polynomial` object, and I’m honestly not sure what `convert().coef` really does. There’s a simpler way to get the coefficients, using the module-level `polyfit()` function:
coefs = poly.polyfit(x, y, 1)
Again, the `1` is the degree, meaning a linear polynomial. Printing out `coefs` shows the same results as above:
>>> print(coefs)
[-0.95 1. ]
Also like above, these are in the opposite order from the slope-intercept form, so we can use destructuring to get `m` and `b`:
b, m = coefs
This gives us the expected slope of 1 and Y-intercept of -0.95:
>>> print(f"{m=:.2} {b=:.2}")
m=1.0 b=-0.95
All in one line, this looks like:
b, m = poly.polyfit(x, y, 1)
I personally find this the more readable option. There’s less code, and it does not create an intermediate `Polynomial` object. Again, if there’s some reason to prefer `Polynomial.fit()`, please let me know. And as a reminder, check out the Jupyter notebook demonstrating these APIs.
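As a quick sanity check, using the same sample data as above, you can verify that the two approaches produce identical coefficients:

```python
import numpy as np
import numpy.polynomial.polynomial as poly

x = np.array([0, 1, 2, 3])
y = np.array([-1, 0.2, 0.9, 2.1])

# Coefficients via the Polynomial class...
fit_coefs = poly.Polynomial.fit(x, y, 1).convert().coef
# ...and via the module-level function.
polyfit_coefs = poly.polyfit(x, y, 1)

# Both are [Y-intercept, slope], i.e. approximately [-0.95, 1.0].
assert np.allclose(fit_coefs, polyfit_coefs)
```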
One nice thing that Numbers gave us was a plot of the data and the line. Python can do this, too, using another fantastic library called Matplotlib. It has good integration with NumPy and can use the `x` and `y` arrays directly. This code (which is also in the notebook):
import matplotlib.pyplot as plt
plt.plot(x, y, "o", label="Original data")
plt.plot(x, m*x + b, "red",
label=f"Fitted line: y = {m:.2}*x + {b:.2}")
plt.grid(True)
plt.legend()
plt.show()
Produces this plot:
A nice recent feature of `rsync` is that it can display the overall progress of the transfer. And by “recent” I mean since version 3.1.0, released in September 2013. To use it, add these options instead of `-v` and/or `--progress`:
rsync --info=progress2 --human-readable --no-inc-recursive
For example:
> rsync --info=progress2 --human-readable --no-inc-recursive -a /Applications /tmp
9.53G 21% 317.26MB/s 0:00:28 (xfr#83063, to-chk=443926/538653)
One niggle here is that macOS ships with an old version of `rsync`. On macOS Sonoma 14.2.1, I have version 2.6.9:
> rsync --version
rsync version 2.6.9 protocol version 29
...
So you’ll need to install a newer version of `rsync`. I use Homebrew:
brew install rsync
And now I have version 3.2.7:
> rsync --version
rsync version 3.2.7 protocol version 31
...
I found out about this feature from this Server Fault question. `--info=progress2` is the main new option to display overall progress. From the rsync(1) man page:
> There is also a `--info=progress2` option that outputs statistics based on the whole transfer, rather than individual files. Use this flag without outputting a filename (e.g. avoid `-v` or specify `--info=name0`) if you want to see how the transfer is doing without scrolling the screen with a lot of names. (You don’t need to specify the `--progress` option in order to use `--info=progress2`.)
And also from --info=help
:
> rsync --info=help
...
PROGRESS Mention 1) per-file progress or 2) total transfer progress
...
The `--human-readable` option formats bytes nicely, like the `9.53G` above.
The `--no-inc-recursive` (or `--no-i-r`) option provides a more accurate progress, as it does an initial file scan up front. From the man page:
> Disables the new incremental recursion algorithm of the `--recursive` option. This makes rsync scan the full file list before it begins to transfer files. See `--inc-recursive` for more info.
While this can be beneficial, it may be slow for lots of files or over a network, so you may not want to use this one all the time. From one of the Server Fault answers:

> This will build the entire file list at the beginning, rather than incrementally discovering more files as the transfer goes on. Since it will know all files before starting, it will give a better report of the overall progress. This applies to the number of files; it does not report any progress based on file sizes.

This involves a trade-off. Building the entire file list ahead of time is more memory-costly, and it can significantly delay the start of the actual transfer. As you would expect, the more files there are, the longer the delay will be and the more memory it will require.
I’ve found this pretty useful and fast enough most of the time, so I typically use it.
Once I started digging into it, I found some great optimizations to make it fast, without losing any functionality. In fact, by the time I was done, I had a much better prompt than I previously had, yet it was orders of magnitude faster.
If you don’t want to read the whole post, the single best thing you can do is to use Powerlevel10k. And the next best thing is to avoid using `eval $(some other command)`, if possible. But read on for the details.
Before going into the changes I made, I need to talk about taking quantitative measurements. As with all optimizations, taking measurements is the first step. How do you measure the speed, and how do you know if you’ve improved anything? Fortunately, this is very easy with `zsh-bench`. Once you clone the `git` repository, you run the `zsh-bench` command, wait a bit, and it’ll print out some numbers. Here’s what mine first looked like:
creates_tty=0
has_compsys=1
has_syntax_highlighting=0
has_autosuggestions=0
has_git_prompt=1
first_prompt_lag_ms=446.035
first_command_lag_ms=451.599
command_lag_ms=328.553
input_lag_ms=0.606
exit_time_ms=99.853
There’s a lot here, but there are two sets of important numbers. The first set is the shell startup time:
first_prompt_lag_ms=446.035
first_command_lag_ms=451.599
`first_prompt_lag_ms` is the time until you see a prompt: basically, how long you are staring at a blank screen. This is almost half a second! `first_command_lag_ms` is how long before you can actually type a command. In this case, they are pretty much the same.
The second set of important numbers is:
command_lag_ms=328.553
input_lag_ms=0.606
`command_lag_ms` is probably the most important number. It represents the time between one command completing and getting a prompt to start typing another command. You see this delay for every command you type, even in the simplest case, when you just hit Return and wait for another prompt. 328ms here is very noticeable. `input_lag_ms` is the time between each keystroke.
Both `first_prompt_lag_ms` and `command_lag_ms` were noticeably slow for me. Hundreds of milliseconds is easily perceptible. The `zsh-bench` README has numbers for “indistinguishable from zero”: for `first_prompt_lag_ms`, it is 50ms, and for `command_lag_ms`, it is 10ms. Both of my numbers were an order of magnitude above this, at 446ms and 328ms, respectively.
After making some changes, I was able to drastically improve these numbers:
first_prompt_lag_ms=24.802
first_command_lag_ms=205.137
command_lag_ms=15.747
input_lag_ms=2.148
These are both very close to the “indistinguishable from zero” goals. And these numbers were on an Intel iMac Pro. On my shiny new M3 Max MacBook Pro, they clock in about twice as fast:
first_prompt_lag_ms=16.288
first_command_lag_ms=100.066
command_lag_ms=7.247
input_lag_ms=1.318
Both numbers are now well under the “indistinguishable from zero” goals. Let’s dig into how I made this improvement.
With a way to measure the speed and to tell whether I was making things better, I next needed to find out what part of my Zsh config was slow. It’s grown to hundreds of lines spread over many files, so it’s not possible to just intuit what is slow. Here are a number of resources I used:
I found Kevin Burke’s page most helpful. It boils down to adding a bit of code at the start of your `~/.zshenv`, as that is the first file in your home directory to run for interactive shells:
# Profiling via:
# https://kev.inburke.com/kevin/profiling-zsh-startup-time/
: "${PROFILE_STARTUP:=false}"
: "${PROFILE_ALL:=false}"
# Run this to get a profile trace and exit: time zsh -i -c echo
# Or: time PROFILE_STARTUP=true /bin/zsh -i --login -c echo
if [[ "$PROFILE_STARTUP" == true || "$PROFILE_ALL" == true ]]; then
# http://zsh.sourceforge.net/Doc/Release/Prompt-Expansion.html
PS4=$'%D{%H:%M:%S.%.} %N:%i> '
#zmodload zsh/datetime
#PS4='+$EPOCHREALTIME %N:%i> '
exec 3>&2 2>/tmp/zsh_profile.$$
setopt xtrace prompt_subst
fi
# "unsetopt xtrace" is at the end of ~/.zshrc
This produces a file named `/tmp/zsh_profile.<pid>`, which contains every command run, along with a timestamp, down to the millisecond. From this, you can infer which commands are slow. Here’s an example:
00:24:09.378 redacted:7> computer_name=
00:24:09.379 redacted:7> /usr/sbin/scutil --get ComputerName
00:24:09.378 redacted:7> computer_name=guts
00:24:09.385 redacted:29> redacted
It’s a bit convoluted, but you can see this `scutil` command takes 7ms to run: 09.385 - 09.378. While this is not a lot, this is basically death by a thousand cuts, and you have to identify all the places where you bleed out ~7ms or more.
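Eyeballing those timestamp subtractions gets tedious over a long trace. Here’s a small Python sketch (not part of the original workflow; the log excerpt is hypothetical) that computes the delta between consecutive traced lines, given the `PS4` timestamp format above:

```python
# A hypothetical excerpt of the xtrace log written to /tmp/zsh_profile.<pid>;
# in practice you would read the real file instead.
log = """\
00:24:09.378 redacted:7> computer_name=
00:24:09.379 redacted:7> /usr/sbin/scutil --get ComputerName
00:24:09.385 redacted:29> redacted
"""

def to_ms(stamp):
    """Convert an HH:MM:SS.mmm timestamp to milliseconds since midnight."""
    hms, ms = stamp.split(".")
    h, m, s = (int(part) for part in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)

lines = log.splitlines()
times = [to_ms(line.split()[0]) for line in lines]
deltas = [t1 - t0 for t0, t1 in zip(times, times[1:])]

# Print how long each traced command took before the next line appeared.
for delta, line in zip(deltas, lines):
    print(f"{delta:4d} ms  {line}")
```

Sorting those deltas in descending order surfaces the worst offenders immediately.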
Command lag was my priority, and the biggest slowdown, by far, was my Git status prompt. I was using the `git-prompt.sh` file found in the Git repository’s `contrib/` directory. This runs multiple `git` commands, and it turns out this gets really slow for large repositories.
I did a lot of searching for how to make my Git prompt faster, and eventually I found `gitstatus`. This project is truly amazing. The author, Roman, is obsessed with performance. He also wrote `zsh-bench`. `gitstatus` uses a combination of tricks, like async prompt updates and avoiding spawning a separate executable by using a daemon, to drastically speed up `git status`.
The easiest way to use `gitstatus` is to use Powerlevel10k, also written by Roman. I liked how my prompt was set up and was a little reluctant to switch, but I’m glad I did. I won’t go into the details of my Powerlevel10k setup, but suffice it to say, the speedup was impressive, and the benefits of Powerlevel10k don’t stop with Git status. Performance is the headlining feature of Powerlevel10k, and it shows.
It also turns out executing commands, any command, adds noticeable delay. There’s fixed overhead in spawning a process, so you want to avoid this, where possible.
I’m a big fan of `direnv`, but the way it works requires running a command as a pre-command hook, so it directly affects command lag. `direnv` adds about 5ms of command lag on my M3 Max MacBook Pro, but for me, it’s worth it. The total command lag of 8ms is still imperceptible. I wish there were a faster alternative that used the same tricks as `gitstatus`, but I don’t know of one.
While I can live with the `direnv` command lag, executing commands is more of an issue for shell startup time. For example, Homebrew says you should add this to your shell startup:
eval $(/opt/homebrew/bin/brew shellenv)
All this does is set some environment variables:
> brew shellenv
export HOMEBREW_PREFIX="/opt/homebrew";
export HOMEBREW_CELLAR="/opt/homebrew/Cellar";
export HOMEBREW_REPOSITORY="/opt/homebrew";
export PATH="/opt/homebrew/bin:/opt/homebrew/sbin${PATH+:$PATH}";
export MANPATH="/opt/homebrew/share/man${MANPATH+:$MANPATH}:";
export INFOPATH="/opt/homebrew/share/info:${INFOPATH:-}";
However, this executes the `brew` command just to set these environment variables. `brew` is written in Ruby, so this also has to start up the full Ruby runtime environment. The values of the environment variables almost never change, so it’s much faster to just paste the output of `brew shellenv` directly into your `.zshrc`.
Similarly, `rbenv` recommends you `eval` the output of `rbenv init - zsh`:
echo 'eval "$(~/.rbenv/bin/rbenv init - zsh)"' >> ~/.zshrc
Again, this has to run an executable, and again, it is much faster to just paste the output into your `.zshrc`.
Another way to speed up startup is to run some things asynchronously and defer them out of the startup “hot path”. `zsh-defer` (also written by Roman) is one way to do this, and Powerlevel10k has an Instant Prompt feature which builds upon it. By avoiding running commands and using Instant Prompt, my startup time is now quite fast.
Hopefully this gives you some tips to speed up your Zsh config. Bringing startup time and especially command lag down from hundreds of milliseconds will make your shell feel much faster. Also, use Powerlevel10k. Between its Git prompt, Instant Prompt, and deferred startup features, it’s incredible.
The `git` commit message for merge conflicts lists any files that had conflicts. However, it includes them as a comment with the `#` prefix, which means they’ll get stripped from the real commit message by default. I like to keep them in the commit message, because it can be useful later on to know which files had conflicts. To do this, I would manually remove the comment prefix. Until now… TLDR: Use `git commit --cleanup scissors`; the following example will explain how this works. I also answered this on Stack Overflow, but figured it would make a good blog post.
When you use `git commit`, there are different cleanup modes that determine how the message is automatically cleaned up, specified by the `--cleanup` option. Here’s an excerpt from the git-commit(1) man page:
--cleanup=<mode>
This option determines how the supplied commit message should be
cleaned up before committing. The <mode> can be strip, whitespace,
verbatim, scissors or default.
strip
Strip leading and trailing empty lines, trailing whitespace,
commentary and collapse consecutive empty lines.
whitespace
Same as strip except #commentary is not removed.
verbatim
Do not change the message at all.
scissors
Same as whitespace except that everything from (and including)
the line found below is truncated, if the message is to be
edited. "#" can be customized with core.commentChar.
# ------------------------ >8 ------------------------
default
Same as strip if the message is to be edited. Otherwise
whitespace.
The default can be changed by the commit.cleanup configuration
variable (see git-config(1)).
The default mode is essentially `strip`. Here’s an example of a commit message with a merge conflict using `strip` mode:
% git commit --cleanup strip
Merge branch 'branch'
# Conflicts:
# baz.txt
# foo.txt
#
# It looks like you may be committing a merge.
# If this is not correct, please remove the file
# .git/MERGE_HEAD
# and try again.
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# On branch master
# All conflicts fixed but you are still merging.
#
# Changes to be committed:
# modified: bar.txt
# modified: baz.txt
# modified: foo.txt
#
As you can see, it shows the list of file conflicts. But if you just accept the default message, the real commit message will be:
% git log -1
commit 86cb4bbcc5f2ed54641a0f2a58a4b03bb73be0a3 (HEAD -> master)
Merge: efd152d 474a8f4
Author: Dave Dribin <dave@example.com>
Date: Fri Oct 19 23:38:47 2018 -0500
Merge branch 'branch'
It stripped every line with the comment prefix, so you get a nice short message. However, that handy list of conflicts is missing. If you want them included, you need to remove the comment prefix. Let’s try that again with `scissors` mode:
% git commit --cleanup scissors
Merge branch 'branch'
# Conflicts:
# baz.txt
# foo.txt
# ------------------------ >8 ------------------------
# Do not modify or remove the line above.
# Everything below it will be ignored.
#
# It looks like you may be committing a merge.
# If this is not correct, please remove the file
# .git/MERGE_HEAD
# and try again.
# Please enter the commit message for your changes. Lines starting
# with '#' will be kept; you may remove them yourself if you want to.
# An empty message aborts the commit.
#
# On branch master
# All conflicts fixed but you are still merging.
#
# Changes to be committed:
# modified: bar.txt
# modified: baz.txt
# modified: foo.txt
#
This time, if you accept the default commit message, you get the following commit message:
% git log -1
commit 1eb84d98faf502e64f29ffca0766ad2e708aa61d (HEAD -> master)
Merge: efd152d 474a8f4
Author: Dave Dribin <dave@example.com>
Date: Fri Oct 19 23:43:32 2018 -0500
Merge branch 'branch'
# Conflicts:
# baz.txt
# foo.txt
The list of conflicts is included, even though those lines still start with the `#` prefix. This works because everything before the special “scissors” line is kept, including the conflict list, despite the comment prefix. I found this so useful that I’ve changed my default cleanup mode to `scissors` by setting `commit.cleanup` in my `~/.gitconfig`.
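For reference, that’s a one-line setting, equivalent to running `git config --global commit.cleanup scissors`; the resulting section of `~/.gitconfig` looks like this:

```
[commit]
	cleanup = scissors
```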
This change did break all my embedded YouTube videos, because those still used `http`. I needed to update the `iframe` URLs to also use `https`, and now all seems to be fine.
And finally, I updated all the links to my own site, replacing `http` with `https`. This wasn’t strictly necessary, as I have a redirect in place, but I figured it’d be nice to avoid the extra round trip for the redirect. I don’t think I’ve broken anything, but please let me know if you see anything wrong.
The downside is that you need to do all the admin yourself. But I find it’s not that hard to set up Apache and some email services. If you’re willing (or wanting) to get your hands a bit dirty, Linode is great. If you want to give them a shot, you can try them out with my referral code: fb7465a0d5bef7335873ccdfc31bb8d3367c1945
It’s actually pretty amazing how cheap hosting has become. When I first started hosting my own domain back in 2002, I used a colocation service. I supplied the Linux box, and for $100/month, I was ready to go with full root access. At the end of 2003, the hard drive on that machine died, and I moved my hosting to Server Matrix for $60/month. This was a dedicated physical server, but I did not own it. The upside is that if a disk died, I wasn’t responsible for fixing it. And I still had full root access to install whatever I wanted. I jumped between a few hosting providers around this time, some via acquisition. By 2012, I was paying $70/month on SoftLayer.
But virtual private servers were becoming more popular around this time, and I didn’t need my own physical server. The cheapest Linode plan was only $20/month. So in 2012, I switched to Linode. Adding in $5 for backup, it was $25/month, but that was still way cheaper than SoftLayer. And now, with the new price reduction, I’m down to $7/month. And that’s a quick summary of how, in the span of 15 years, my hosting costs have gone from $100/month down to $7/month, all with full root access. And I couldn’t be happier!
After my server died in 2012, I was able to bring over the static content to the new server just fine with a simple `rsync`. That alone is a huge testament to static content. However, I had no desire to try to get an old version of Movable Type up and running, which meant I could no longer edit posts or create new ones.

I tried twice to switch to Octopress, once in 2012 and again in 2013, but I kept running into various snags. It’s a good thing I waited, because Octopress has been pretty stagnant since then. Octopress is basically a customized Jekyll setup, and since 2013, Jekyll has improved enough on its own that I am able to use it straight-up.
I’m now up and running on Jekyll and fixed the link to my slides. The installation went pretty smoothly, and everything seems to look okay. If not, please let me know. Right now, this is just the standard Jekyll install with a slightly tweaked Minima theme, but I’d like to customize it a bit more at some point.
Because everything needs an FAQ:
I’m not sure that I can say, just yet. But it will involve heavy amounts of coding in Objective-C.
Nope. I’ll be staying right here in Chicago.
Unfortunately, I will be unable to continue writing The Road to Code for MacTech, and I’ve already submitted my last article. It has been an amazing three years, and I am truly grateful to have worked with MacTech.
I will also be unavailable for future speaking engagements. I’ve already had to turn down opportunities to speak at Voices that Matter, the MacTech conference, and SecondConf. I’m sure I will be an attendee at a number of the conferences, though, and I plan to stay active in the community as much as possible.
This is still preliminary and in progress, but Bit Maki Software, Inc., the joint company between Jonathan Rentzsch and me, is in the process of being dissolved. I plan to keep Bit Maki, Inc., my contracting company, alive for now, though I will not be doing any more contracting. Textcast will become free and be available from the Bit Maki website.
It has been a few years since I’ve been able to dedicate much time to MAME OS X, and this is the nail in the coffin. I will not be able to contribute to MAME OS X. If anyone wants to pick up the effort, please let me know. This also goes for most of my other open source software. Truth be told, with the two recent additions to our family, I’m going to be sufficiently distracted for a while, anyways.
Or, you could avoid floating point altogether. Thus, I wrote trigint, a 100% integer-based trigonometry library. It uses a 16-entry lookup table plus linear interpolation, so it’s quite memory efficient, yet still accurate enough for most purposes. Since it’s written in ANSI C99, it can be used on 8-bit microcontrollers, like an Atmel AVR with avr-gcc. Here’s the code and the API documentation.
How fast is it? Take a look at the numbers on a 1st generation iPod touch. In summary, `trigint_sin16()` is about 4.4 times faster than `sinf()` and 6.7 times faster than `sin()` in Thumb mode. Without Thumb mode, the gap closes a bit, to 3.8 times and 6.2 times faster, respectively. I haven’t yet run the benchmark on the iPhone 4 or the iPad, so it may not matter as much on more modern hardware; it’s still useful for 8-bit MCUs like the AVR, though.
trigint is essentially a C version of Scott Dattalo’s sine wave routine for the PIC microcontroller. Credit goes to Scott for coming up with the algorithm. He’s also got a whole write-up of sine wave theory. Scott is one of the most brilliant assembly coders I’ve ever run into. He helped me optimize a CRC-16 routine in PIC assembly years ago.
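To illustrate the table-plus-interpolation idea, here’s a rough Python sketch. This is not trigint’s actual code; the 14-bit angle range, table layout, and 16-bit amplitude are illustrative assumptions:

```python
import math

ANGLE_MAX = 1 << 14    # hypothetical: integer angle units per full circle
N = 16                 # lookup table intervals, as in trigint
STEP = ANGLE_MAX // N  # angle units per table interval
AMPLITUDE = 32767      # scale results into a signed 16-bit range

# Precompute N+1 samples so interpolation in the last interval wraps cleanly.
TABLE = [round(AMPLITUDE * math.sin(2 * math.pi * i / N)) for i in range(N + 1)]

def int_sin(angle):
    """Approximate sine using only integer math: table lookup + linear interpolation."""
    angle %= ANGLE_MAX
    i, frac = divmod(angle, STEP)
    lo, hi = TABLE[i], TABLE[i + 1]
    return lo + (hi - lo) * frac // STEP

print(int_sin(0), int_sin(ANGLE_MAX // 4))  # 0 32767
```

Even with only 16 table entries, the worst-case error over the whole circle is around 2% of the amplitude, which is plenty for generating audio waveforms on an MCU.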
You can listen to and download the MP3 from its 8bitcollective page. Or download the NSF (what’s an NSF?) and play it in an NSF player, such as my own Chip Player, or perhaps something a bit more stable like Audio Overload.
Structure initialization received a lot of love in C99. In C89, you could initialize a structure variable like this:
NSPoint point = {0, 0};
The caveat here is that the structure elements must be provided in the order they are listed in the definition. In this case, `NSPoint` is declared like this:
typedef struct _NSPoint {
CGFloat x;
CGFloat y;
} NSPoint;
So, when we initialize a variable, `x` comes first, followed by `y`. This can be a drawback: if the order in the definition ever changed, all existing code that depended on that order would break. As of C99, you can initialize a structure by specifying the structure element names:
NSPoint point = { .x = 0, .y = 0};
This not only decouples the order of the definition from the order of the initialization, but it’s more readable. As an added benefit, any structure elements not initialized are set to zero. This means you only need to fill out the portions of the structure that are relevant. And if new elements to the structure are added in later versions, they get initialized to a known value.
The benefit of this becomes even more clear for nested structures like `NSRect`:
typedef struct _NSRect {
NSPoint origin;
NSSize size;
} NSRect;
To initialize an `NSRect` in straight C89 looked something like this:
NSRect rect = {{0, 0}, {640, 480}};
Because this is kind of awkward, there’s a function to make this a bit easier:
NSRect rect = NSMakeRect(0, 0, 640, 480);
But, honestly, that’s not much of an improvement in readability. `NSMakeRect` is more useful in structure assignment, but we’ll see an alternate way to do this below. With C99, we can initialize this variable like:
NSRect rect = {
.origin.x = 0,
.origin.y = 0,
.size.width = 640,
.size.height = 480,
};
Again, we can order the structure elements however we please; we’re not required to list them in the order of the definition. We have some flexibility in how we assign the nested structures, too. This is also legal:
NSRect rect = {
.origin = {.x = 0, .y = 0},
.size = {.width = 640, .height = 480},
};
As is this:
NSRect rect = {
.origin = NSMakePoint(0, 0),
.size = NSMakeSize(640, 480),
};
And even this:
NSRect rect = {
.origin = otherRect.origin,
.size = NSMakeSize(640, 480),
};
So, as you can see, we get a lot more flexibility on how to initialize structures.
This new syntax for initializing structure variables is great, but it doesn’t help us much when assigning to existing structure variables. The example I used in my iPad Dev Camp talk was setting up an `AudioStreamBasicDescription` structure, colloquially known as an ASBD. This structure is used to describe an audio format for Core Audio. In this case, I had an ASBD instance variable. Because it’s an instance variable, you cannot set its value using the initialization syntax above. Thus, the traditional way to initialize one of these is to clear it to zero with `memset`, and then set the fields you need, one by one:
memset(&_dataFormat, 0, sizeof(_dataFormat));
_dataFormat.mFormatID = kAudioFormatLinearPCM;
_dataFormat.mSampleRate = SAMPLE_RATE;
_dataFormat.mBitsPerChannel = 16;
// And on and on...
Using new C99 syntax known as a compound literal, you can set an existing structure variable like this:
_dataFormat = (AudioStreamBasicDescription) {
.mFormatID = kAudioFormatLinearPCM,
.mSampleRate = SAMPLE_RATE,
.mBitsPerChannel = 16,
// And on and on...
};
It looks similar to a cast plus an initialization. And under the hood, the compiler is making an anonymous variable. So you can think of the above as equivalent to:
AudioStreamBasicDescription anon = {
.mFormatID = kAudioFormatLinearPCM,
.mSampleRate = SAMPLE_RATE,
.mBitsPerChannel = 16,
// And on and on...
};
_dataFormat = anon;
Again, the nice thing here is that unset fields are initialized to zero, so you get the equivalent of the `memset`. Plus, you don’t have to repeat the variable name over and over again.
Oh, and as a quick shorthand for setting the entire structure to zero, you can do something like this:
_dataFormat = (AudioStreamBasicDescription) {0};
Compound literals can be applied to primitive types, too. Most of the time this is not much use:
int i = (int) {3};
But because these are anonymous variables, you can take the address of them:
int * iPointer = &(int) {3};
This can be useful for some Core Audio APIs, such as `AudioUnitSetProperty`. Typically, you create a variable for the sole purpose of taking its address:
UInt32 maxFramesPerSlice = 4096;
AudioUnitSetProperty(converterAudioUnit,
kAudioUnitProperty_MaximumFramesPerSlice,
kAudioUnitScope_Global,
0,
&maxFramesPerSlice,
sizeof(UInt32));
Using compound literals, we can do this inline without an extra variable:
AudioUnitSetProperty(converterAudioUnit,
kAudioUnitProperty_MaximumFramesPerSlice,
kAudioUnitScope_Global,
0,
&(UInt32) {4096},
sizeof(UInt32));
While C99 was a relatively minor update to C89, there are quite a few gems buried away to make our code more flexible and readable, as you can see.
]]>The source code to the two projects I went over is up on BitBucket:
Use the ipaddevcamp2010
tag to see what they looked like at the time of the presentations.
I’ve posted the code up on BitBucket if you want to check it out beforehand or are unable to attend. I also hope to have a more complex and complete audio playing application ready to release by this weekend, too. Hint: it’ll have something to do with 8-bit chiptunes. So if you’re in Chicago, sign up! It’ll be fun!
]]>performSelectorOnMainThread:
. This method is a great way to get a background thread to safely interact with the user interface. But when you set a breakpoint or crash nested down in one of these method calls, sometimes it’s useful to get a backtrace of who called performSelectorOnMainThread:
. Unfortunately, by the time our method gets called, we don’t have that information. The same goes for performSelector:withObject:afterDelay:
.
After some good suggestions on Twitter, and a bit of casual recreational coding, I’ve come up with a general purpose solution that involves, of course, method swizzling.
Take this code in an application delegate as an example:
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification
{
dispatch_async(dispatch_get_global_queue(0, 0), ^{
[self performSelectorOnMainThread:@selector(echo:) withObject:@"foo" waitUntilDone:NO];
});
[self performSelector:@selector(echo:) withObject:@"bar" afterDelay:3.0];
}
- (void)echo:(id)object
{
NSLog(@"echo: %@", object);
}
If you set a breakpoint in the echo:
method, you’ll see a backtrace like:
(gdb) bt
#0 -[PerformDebugAppDelegate echo:] (self=0x10012c480, _cmd=0x100002462, object=0x1000030e8) at /Users/dave/Desktop/PerformDebug/PerformDebugAppDelegate.m:42
#1 0x00007fff83500647 in __NSThreadPerformPerform ()
#2 0x00007fff80477271 in __CFRunLoopDoSources0 ()
#3 0x00007fff80475469 in __CFRunLoopRun ()
#4 0x00007fff80474c2f in CFRunLoopRunSpecific ()
#5 0x00007fff8101ea4e in RunCurrentEventLoopInMode ()
#6 0x00007fff8101e7b1 in ReceiveNextEventCommon ()
#7 0x00007fff8101e70c in BlockUntilNextEventMatchingListInMode ()
#8 0x00007fff870b21f2 in _DPSNextEvent ()
#9 0x00007fff870b1b41 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#10 0x00007fff87077747 in -[NSApplication run] ()
#11 0x00007fff87070468 in NSApplicationMain ()
#12 0x0000000100001a09 in main (argc=1, argv=0x7fff5fbff1f0) at /Users/dave/Desktop/PerformDebug/main.m:29
This really isn’t all that helpful. Yes, echo:
was called, but we don’t know who called it. If echo:
is a method that tends to get called from a number of places with a performSelector...
variant, we don’t know which one it is.
This afternoon, I took the easy way out, and set an auto-continuing breakpoint on -[NSObject performSelector:onThread:withObject:waitUntilDone:modes:]
with a bt
action. This isn’t ideal, though. First, this method gets called quite a bit, so hitting the breakpoint can slow things down a lot. Plus, it dumps the stack trace for each call, cluttering the gdb console with mostly useless garbage.
The breakpoint actually helped me debug the issue today, but I’ve run into this situation a number of times in the past and never had a good way to debug it. Until now.
The key is to swizzle the NSObject
implementations of these two methods:
performSelector:withObject:afterDelay:inModes:
performSelector:onThread:withObject:waitUntilDone:modes:
In our swizzled implementations, we’d like to capture the backtrace at the time they are called. Later, we can dump out the backtrace for debugging, if we want to. To do this, we’ll create an intermediate object to hold the backtrace: DDPerformDebugger
. Here are its two most important methods:
- (id)initWithTarget:(id)target selector:(SEL)selector argument:(id)argument;
{
self = [super init];
if (self == nil)
return nil;
_target = [target retain];
_selector = selector;
_argument = [argument retain];
// Capture a backtrace at the time we're created
_frames = backtrace(_callstack, sizeof(_callstack)/sizeof(*_callstack));
return self;
}
- (void)perform;
{
[_target performSelector:_selector withObject:_argument];
}
In the initializer, we capture the target, selector, and argument. But we also capture a backtrace using the very handy backtrace(3) function.
In our swizzled performSelector...
, we create an instance of DDPerformDebugger
, but instead of passing the target and selector to the original implementation, we use this instance as the target and perform
as the selector. As you can see above, this just forwards it on to the passed in target and selector. However, as I demonstrate below, we’ll have access to the saved backtrace. Here’s what the swizzled method looks like:
- (void)dd_performSelector:(SEL)selector withObject:(id)argument afterDelay:(NSTimeInterval)delay inModes:(NSArray *)modes;
{
DDPerformDebugger * debugger = [[DDPerformDebugger alloc] initWithTarget:self
selector:selector
argument:argument];
[debugger autorelease];
// This calls the original implementation. It's not recursing.
[debugger dd_performSelector:@selector(perform)
withObject:nil
afterDelay:delay
inModes:modes];
}
The swizzled dd_performSelectorOnMainThread:...
looks similar. Now, when we stop on our breakpoint in echo:
, we get the following backtrace:
(gdb) bt 5
#0 -[PerformDebugAppDelegate echo:] (self=0x10012b170, _cmd=0x100002462, object=0x1000030e8) at /Users/dave/Desktop/PerformDebug/PerformDebugAppDelegate.m:42
#1 0x000000010000207c in -[DDPerformDebugger perform] (self=0x10101ae00, _cmd=0x7fff85a42a1a) at /Users/dave/Desktop/PerformDebug/DDPerformDebugger.m:157
#2 0x00007fff83500647 in __NSThreadPerformPerform ()
#3 0x00007fff80477271 in __CFRunLoopDoSources0 ()
#4 0x00007fff80475469 in __CFRunLoopRun ()
(More stack frames follow...)
This isn’t much help on its own. But if we jump to frame 1, where we’re inside the perform
method of our intermediate object, we have access to the backtrace of the original calling point. I’ve rigged up the description
method to show the backtrace, so it’s easy to get:
(gdb) po self
<DDPerformDebugger 0x10101ae00: target: <PerformDebugAppDelegate 0x10012b170>, selector: <echo:>, argument: <NSCFString 0x1000030e8>
0 PerformDebug 0x0000000100001f45 -[DDPerformDebugger initWithTarget:selector:argument:] + 268
1 PerformDebug 0x0000000100001d3c -[NSObject(DDPerformDebugger) dd_performSelector:onThread:withObject:waitUntilDone:modes:] + 99
2 Foundation 0x00007fff83512d54 -[NSObject(NSThreadPerformAdditions) performSelectorOnMainThread:withObject:waitUntilDone:] + 143
3 PerformDebug 0x0000000100001b42 __-[PerformDebugAppDelegate applicationDidFinishLaunching:]_block_invoke_1 + 66
4 libSystem.B.dylib 0x00007fff86eebce8 _dispatch_call_block_and_release + 15
5 libSystem.B.dylib 0x00007fff86eca279 _dispatch_worker_thread2 + 231
6 libSystem.B.dylib 0x00007fff86ec9bb8 _pthread_wqthread + 353
7 libSystem.B.dylib 0x00007fff86ec9a55 start_wqthread + 13
We can see we’re on a GCD thread inside applicationDidFinishLaunching:
. The only downside to the backtrace_symbols(3) function is that it does not utilize debugging information to show line numbers. We just get an offset into the method.
Fortunately, a command line tool called atos(1) can provide us with this information. Given an address, it’ll decode the symbol and line number information. We can even invoke this right from inside gdb:
(gdb) info pid
Inferior has process ID 11017.
(gdb) shell atos -p 11017 0x0000000100001b42
__-[PerformDebugAppDelegate applicationDidFinishLaunching:]_block_invoke_1 (in PerformDebug) (PerformDebugAppDelegate.m:35)
And now we know that performSelectorOnMainThread:
was originally called from line 35 of PerformDebugAppDelegate.m. Very helpful!
The full code for DDPerformDebugger
is part of my DDFoundation project on BitBucket, but you can grab just the one file on its own and include it in your project. By default, it does not swizzle anything, so you can compile it in for every build and not worry about taking a performance hit creating those intermediate objects. To enable the swizzling, just set the DDPerformDebug
environment variable to YES
. Grab a sample project, PerformDebug.tgz, and try it out for yourself.
Before settling on Objective-C, I was a Java guy for about six years or so. Overall, I much, much prefer coding in Objective-C to Java, and have no intention of going back. But that doesn’t mean there aren’t things I miss. Here’s a quick list, with more detail below:
Overall, I feel the java.io package is really well designed. I like the distinction between InputStream/OutputStream, which are byte-oriented, and Reader/Writer, which are text-oriented. Also, there are some cool subclasses such as GZIPInputStream, GZIPOutputStream, and LineNumberReader.
Java’s InputStream and OutputStream are easier to subclass than NSInputStream and NSOutputStream since there’s fewer methods to deal with. Plus, there’s a lingering bug in NSInputStream that makes it effectively impossible to subclass in some really handy situations. See this mailing list post from 2007 that mentions rdar://problem/322278.
Ever since I read Java Concurrency in Practice, I’ve been drooling over a lot of stuff Java programmers have available in their arsenal in java.util.concurrent. The Executor is roughly equivalent to a Grand Central Dispatch queue and the ExecutorService is roughly equivalent to an NSOperationQueue, but that’s where the similarities end. Granted, I think the Cocoa and GCD APIs are more palatable, especially since the introduction of blocks, but there’s a bunch of classes we don’t have that would have been really useful at some point or another:
Objective-C may have exceptions, but they are only used for programming errors or essentially fatal runtime errors. Error handling is done with NSError. This is different from Java, which uses exceptions for expected error cases, such as an error reading from a file or a failure to execute a database transaction. This isn’t just a Java thing. Most modern languages, like Python, Ruby, and C#, use exceptions as Java does. Not using exceptions for error handling was one of the harder habits for me to break; however, I’ve come to embrace NSError and eschew exceptions in Objective-C, even if I don’t like it.
I’ve debated this a number of times with people, but having dealt with both exceptions and NSError, I still prefer exceptions. The big argument against exceptions is that they are non-local jumps causing confusing behavior and subtle bugs. I guess I’ve never seen this happen. Plus there’s also the fact that Foundation and AppKit are not exception safe, so throwing exceptions through these frameworks can and will lead to memory leaks and probably unpredictable behavior. Thus, the best course of action to handle an exception is to gracefully terminate the app.
To me, NSError doesn’t result in more robust code than exceptions. 90% of the time, people just pass NULL or log the NSError in place instead of passing it up to where it’s better handled. A big reason for this is that NSError is returned as an output parameter. Thus, to pass an NSError up to a higher level requires that an NSError parameter be added to every method up the chain. Not only is this ungainly, but it’s sometimes not possible to add an NSError to an existing API. This is also a problem with checked exceptions in Java, which is why most Java people endorse unchecked exceptions these days.
Exceptions really shine when you call multiple methods in a row that can fail. In Objective-C, your best bet is to return early or use a goto. This still litters your code with error handling that drowns out the real code. With exceptions, the error handling code is nicely separated from the “happy” code path. Perhaps exceptions aren’t the be-all end-all of error handling, as I’ve recently been intrigued by the error handling of Haskell. However, this is not something we’ll see in Objective-C any time soon.
Classes in Objective-C live in a big, flat, global namespace. The common workaround is to prefix class names with two or three letters. Thus, I’d name my classes DDObject or some such. The problem with this approach is that two or three letters is just not enough of a namespace. It helps reduce collisions, but it does not eliminate them. Plus, Apple no longer uses only the NS prefix. It uses CA, CI, CV, QC, AB, UI, PS, IO, QL and probably more. A safe prefix today may be a collision tomorrow.
Things get even more dire when it comes to categories. Category smashing happens, and it’s hard to debug. The only way to avoid it is to prefix your category methods with some unique prefix, as well.
Packages or namespaces could theoretically solve both of these collisions. This is not an unknown issue. Many radars have been filed, such as rdar://problem/7025435, that are all duped to a very low-numbered radar: rdar://problem/2821039. The problem is that this isn’t an easy thing to bolt onto an existing language, because, of course, we want it done in a backwards-compatible manner that doesn’t impact the performance of objc_msgSend(). So I understand why we don’t have it, but that doesn’t mean I don’t want it.
There are some good things about the separation of the interface in the .h file from the implementation in the .m file. However, I really despise having to keep these two up to date. Many accumulated hours have been lost to compile warnings and errors due to updating one without the other. The repeated method definitions are very un-DRY.
Others suggested that they like this separation of interface from implementation details, only exposing what’s necessary to users of the class, and conceptually I agree. In practice, I find it a chore. Perhaps the repeated code could be mitigated with some Xcode assistance. For example, add a method in the .m and it offers to add it to the .h for you. I think I could deal with this.
However, some of the private details are still exposed in the headers, namely the instance variables. The modern Objective-C runtime, available in 64-bit on Mac OS X and the native iPhone environments, allows synthesized ivars, meaning you can declare properties and the ivars will automatically be created. But you can’t really use this until you drop 32-bit Mac OS X and/or the simulator supports the modern runtime.
And even so, I’m not sold on synthesized ivars, as they require using properties where I previously used ivars. To me, a property is conceptually different from an ivar, even if it’s non-public. Properties can be overridden, can have side effects, are available via key-value coding, and also have more overhead. Most of the time, I don’t want these things and prefer direct, explicitly declared ivars. Perhaps a way to solve the exposed-privates issue would be to declare them in the .m, say in a class extension, instead of the .h. The runtime supports adding ivars at runtime, so this sounds technically possible.
Enums in Objective-C are inherited from C. Thus we get all the weaknesses of C enums for free. C enums are not much more than syntactic sugar for integer constants. There’s no way to get type safety, to introspect them, print them, loop over them, or do anything remotely fancy. The one benefit over a #define used to be that the debugger could decode the integer value into a readable name. But these days, all enums in Cocoa are anonymous, defined as such for 64-bit compatibility:
enum {
// Pass in one of the "By" options:
NSStringEnumerationByLines = 0,
// ...
};
typedef NSUInteger NSStringEnumerationOptions;
Thus most types that are enums are actually typedef’d to NSUInteger. While this is necessary, it means we no longer get debugger integration with enum constants.
Annotations in Java are bits of metadata that can be added to classes and methods. Some of this metadata is used at compile time, but it is also available at runtime. The runtime uses are intriguing, but one of the compile-time annotations that I like is the @Override annotation. This allows you to detect some subtle, yet easy-to-make, mistakes when overriding methods from a superclass or a protocol:
I’ve requested similar functionality for Objective-C in rdar://problem/7215146.
This one has really nothing to do with the language. Xcode has really improved over the years. And while it is generally a very good text editor with project management, I’m afraid it still doesn’t compare to IntelliJ. I don’t know how other Java IDEs stack up (I was never very impressed with Eclipse), but IntelliJ was the first IDE that sold me on the concept of IDEs (I used Emacs prior to it).
Honestly, this probably deserves a whole post in and of itself, but for starters, Xcode could really use intention actions, see rdar://problem/7215136, and much better unit testing integration. IntelliJ has a bunch of other small niceties that all add up to a more pleasant development experience. Unfortunately, it’s been years since I used IntelliJ, so I don’t remember all of the details. Some I can remember are: real-time syntax checking, automatic import statement management, and customizable code reformatting (yes, I need to file bugs on these, too).
All I do remember is that it was the first and only development environment that really took much of the grunt work out of editing code. It not only got out of my way, it actually made me more productive. I dropped the $500 on it out of my own pocket, and it easily paid for itself (in real consulting dollars) in a matter of weeks.
]]>These are all code smells indicating that something isn’t right. Remember, TDD is about tests driving development; however, you must listen to the tests when they are crying out in pain. I’m going to take Matt’s Mac and iPhone projects with unit tests and re-work them so that they are much more manageable and cleaner.
The first smell I want to address is the duplicated code between the Mac and iPhone applications. Both apps have a CLLocationManager
delegate that listens for updates and formats the location into HTML for the web view, string labels for the buttons, and a Google Maps URL. The code between these two apps is identical and is a good example of a violation of the Don’t Repeat Yourself, or DRY, Principle.
So how do we remove this duplicated code? Part of what makes this difficult is that it is highly coupled to the UI. In the Mac app, this code lives in an NSWindowController
subclass, and in the iPhone app it lives in a UIViewController
subclass. Thus we can’t just share an existing class as-is between projects because you can’t use an NSWindowController
in an iPhone app and you can’t use a UIViewController
in a Mac app. The answer is to extract this code into a new class that can be used from both applications.
Because the primary function of this new class is to take core location updates and format them to various strings, I’m calling the class MyCoreLocationFormatter
. There may be a better name, since using the word formatter may imply an NSFormatter
subclass, but let’s stick with it for now.
The sticking point is that we somehow need this new class to update the UI, irrespective of Cocoa vs. Cocoa Touch. We need to break the direct dependency on Cocoa and Cocoa Touch, and I’ve chosen to use a delegate with a single method as a bit of dependency inversion:
@protocol MyCoreLocationFormatterDelegate <NSObject>
- (void)locationFormatter:(MyCoreLocationFormatter *)formatter
didUpdateFormattedString:(NSString *)formattedString
locationLabel:(NSString *)locationLabel
accuracyLabel:(NSString *)accuracyLabel;
@end
Thus, when the location changes, it formats the new location into appropriate strings and sends this message to its delegate. Both WhereIsMyMacWindowController
and WhereIsMyPhoneViewController
implement this protocol, and when they receive the message, they update their UI accordingly.
There are other ways to achieve similar decoupling. We could use an NSNotification
to send out the update with the strings in the user info dictionary. Or we could have properties for the three strings and allow interested parties to monitor their updates using key-value observing. On the Mac side, this enables you to use Cocoa bindings to wire up the UI. Or we could return a dictionary with the three strings. Or we could create a new value object and return that. Any of these alternatives are viable, however I think using a delegate makes the coupling a bit more explicit and easier to follow. What’s important is that the direct dependency to the UI layer is broken.
Here’s the full interface of MyCoreLocationFormatter
:
@interface MyCoreLocationFormatter : NSObject <CLLocationManagerDelegate>
{
id<MyCoreLocationFormatterDelegate> _delegate;
NSString * _formatString;
}
@property (nonatomic, assign, readwrite) id<MyCoreLocationFormatterDelegate> delegate;
@property (nonatomic, copy, readonly) NSString * formatString;
- (id)initWithDelegate:(id<MyCoreLocationFormatterDelegate>)delegate
formatString:(NSString *)htmlFormatString;
- (NSURL *)googleMapsUrlForLocation:(CLLocation *)currentLocation;
- (void)locationManager:(CLLocationManager *)manager
didUpdateToLocation:(CLLocation *)newLocation
fromLocation:(CLLocation *)oldLocation;
- (void)locationManager:(CLLocationManager *)manager
didFailWithError:(NSError *)error;
@end
Ignoring the CLLocationManager
delegate methods, there are only two methods. The format string is used as the HTML template that gets loaded from the application’s bundle. The -googleMapsUrlForLocation:
method is used to open the location up in a browser.
To test this class, we use a mock object to ensure the delegate is being called properly. We use instance variables and our fixture methods to set up an instance of MyCoreLocationFormatter
and the mock delegate:
- (void)setUp
{
// Setup
_mockDelegate = [OCMockObject mockForProtocol:@protocol(MyCoreLocationFormatterDelegate)];
_formatter = [[MyCoreLocationFormatter alloc] initWithDelegate:_mockDelegate
formatString:@"ll=%f,%f spn=%f,%f"];
}
- (void)tearDown
{
// Verify
[_mockDelegate verify];
// Teardown
[_formatter release]; _formatter = nil;
}
With these fixtures in place, writing our tests is fairly straightforward:
- (void)testUpdateToNewLocationSendsUpdateToDelegate
{
// Setup
CLLocation * location = [self makeLocationWithLatitude:-37.80996889 longitude:144.96326388];
[[_mockDelegate expect] locationFormatter:_formatter
didUpdateFormattedString:@"ll=-37.809969,144.963264 spn=-0.000018,-0.000014"
locationLabel:@"-37.809969, 144.963264"
accuracyLabel:[NSString stringWithFormat:@"%f", kCLLocationAccuracyBest]];
// Execute
[_formatter locationManager:nil didUpdateToLocation:location fromLocation:nil];
}
- (void)testUpdateToSameLocationDoesNotSendUpdateToDelegate
{
// Setup
CLLocation * location = [self makeLocationWithLatitude:-37.80996889 longitude:144.96326388];
// Execute
[_formatter locationManager:nil didUpdateToLocation:location fromLocation:location];
}
- (void)testFailedUpdateSendsUpdateToDelegate
{
// Setup
NSError * error = [self makeFakeErrorWithDescription:@"Some error description"];
[[_mockDelegate expect] locationFormatter:_formatter
didUpdateFormattedString:@"Location manager failed with error: Some error description"
locationLabel:@""
accuracyLabel:@""];
// Execute
[_formatter locationManager:nil didFailWithError:error];
}
In order to keep the tests short and concise, I’ve moved the lengthy code to create a CLLocation
and an NSError
out of the tests and into helper methods. Remember, you’ve got to keep test code clean and maintainable, too. All three of these tests are only about 35 lines of code.
In contrast, Matt’s two original test methods were over 200 lines of code. The problem is that Matt’s code is testing the string formatting by asserting the web view’s HTML and text field’s strings. Separating the responsibility of formatting the strings from updating the UI not only improves the design, but makes testing much simpler.
So we’ve got almost an order of magnitude less code in our test methods, it’s cleaner, and it has the same code coverage. As a bonus, this class can be used in both the Mac and iPhone applications, so we can re-use the core logic. From an MVC perspective, the new MyCoreLocationFormatter
class would be considered the model that is used with different views. Pushing logic out of the controller layer and into a model pays homage to the skinny controller, fat model design guideline.
With this class complete, we can now move on to testing the WhereIsMyMacWindowController
class. Matt’s test code uses a couple of hacks in order to enable testing that I consider code smells.
First, it uses runtime trickery, namely object_getInstanceVariable
and object_setInstanceVariable
, to gain access to the outlet instance variables. I think this is bad form and prefer to add public accessors for the outlets using properties:
@property (assign) IBOutlet WebView *webView;
@property (assign) IBOutlet NSTextField *locationLabel;
@property (assign) IBOutlet NSTextField *accuracyLabel;
@property (assign) IBOutlet NSButton *openInBrowserButton;
Using IBOutlet
on a property means the NIB loading code will go through this public interface as well. Since these properties are already settable by the NIB loading code, I don’t see a big downside to making them public.
Second, the tests use category smashing to inject mock objects of CLLocationManager
and NSWorkspace
for testing. Again, I think this is bad form and prefer explicit dependency injection as a cleaner way to do this. I’m going to use constructor injection by adding these two initializers to WhereIsMyMacWindowController
:
- (id)init;
// Designated Initializer
- (id)initWithLocationManager:(CLLocationManager *)locationManager
locationFormatter:(MyCoreLocationFormatter *)locationFormatter
workspace:(NSWorkspace *)workspace;
The no-argument initializer calls the designated initializer with a new location manager, new location formatter, and the shared workspace. This allows production code to use the no-argument initializer and allows test code to use the designated initializer in order to inject test doubles, such as mock objects.
Some may find it odd to pass in an instance of NSWorkspace
. I agree, it is a bit odd, but it’s necessary because it’s a singleton and hard to stub out for testing. There are other ways to achieve this, such as the category smashing Matt uses, or using a custom factory class that can be overridden by unit tests. However, eliminating the use of a singleton in the logic and using dependency injection is far more flexible. What if, for example, we wanted to run tests concurrently, like GHUnit allows? Now we’ve got to deal with thread safety issues when overriding the shared global instance. Remember, a singleton is a global in sheep’s clothing.
With these two hacks addressed, our tests in WhereIsMyMacWindowControllerTests
become simpler. Again, I update the fixture to create mock objects for the various dependencies:
- (void)setUp
{
// Setup
_mockLocationManager = [OCMockObject mockForClass:[CLLocationManager class]];
_mockLocationFormatter = [OCMockObject mockForClass:[MyCoreLocationFormatter class]];
_mockWorkspace = [OCMockObject mockForClass:[NSWorkspace class]];
_windowController = [[WhereIsMyMacWindowController alloc]
initWithLocationManager:_mockLocationManager
locationFormatter:_mockLocationFormatter
workspace:_mockWorkspace];
}
- (void)tearDown
{
// Verify
[_mockLocationManager verify];
[_mockLocationFormatter verify];
[_mockWorkspace verify];
// Teardown
[_windowController release]; _windowController = nil;
}
Here’s a few of the tests to show what a big difference these changes make:
- (void)testWindowDidLoadStartsLocationManager
{
// Setup
[[_mockLocationManager expect] setDelegate:_mockLocationFormatter];
[[_mockLocationManager expect] startUpdatingLocation];
// Execute
[_windowController windowDidLoad];
}
- (void)testOpenInDefaultBrowserActionOpensGoogleMapsUrlInWorkspace
{
// Setup
[[[_mockLocationManager stub] andReturn:nil] location];
NSURL * dummyUrl = [NSURL URLWithString:@"http://example.com/"];
[[[_mockLocationFormatter stub] andReturn:dummyUrl] googleMapsUrlForLocation:nil];
[[_mockWorkspace expect] openURL:dummyUrl];
// Execute
[_windowController openInDefaultBrowser:nil];
}
- (void)testCloseStopsLocationManager
{
// Setup
[[_mockLocationManager expect] stopUpdatingLocation];
// Execute
[_windowController close];
}
Note that I also moved the call to -stopUpdatingLocation
from -dealloc
to -close
. I try to use -dealloc
only for cleaning up memory related resources, which is especially important in a garbage collected environment. This also means we don’t have to stub out the call to -stopUpdatingLocation
everywhere, making our tests simpler, too.
The test to ensure that the location formatter delegate updates the web view and test fields is still a bit lengthy:
- (void)testLocationFormatterDelegateUpdatesUI
{
// Setup
id mockWebView = [OCMockObject mockForClass:[WebView class]];
id mockWebFrame = [OCMockObject mockForClass:[WebFrame class]];
id mockLocationLabel = [OCMockObject mockForClass:[NSTextField class]];
id mockAccuracyLabel = [OCMockObject mockForClass:[NSTextField class]];
_windowController.webView = mockWebView;
_windowController.locationLabel = mockLocationLabel;
_windowController.accuracyLabel = mockAccuracyLabel;
[[[mockWebView stub] andReturn:mockWebFrame] mainFrame];
[[mockWebFrame expect] loadHTMLString:@"html string" baseURL:nil];
[[mockLocationLabel expect] setStringValue:@"location"];
[[mockAccuracyLabel expect] setStringValue:@"accuracy"];
// Execute
[_windowController locationFormatter:_mockLocationFormatter
didUpdateFormattedString:@"html string"
locationLabel:@"location"
accuracyLabel:@"accuracy"];
// Verify
[mockWebFrame verify];
[mockLocationLabel verify];
[mockAccuracyLabel verify];
}
Personally, I’m not a big fan of these kinds of tests, nor of tests that ensure outlets are set up properly after the NIB loads. If you stick with the skinny controller, fat model principle, the controllers become so simple they’re almost not worth testing. Just as you don’t test simple accessors because there’s little benefit to doing so, I don’t know that it’s necessarily worth it to get 100% code coverage on your controller classes. You still get a big benefit if the bulk of your code is in testable model classes, so this isn’t much of a loss.
There are more changes I’ve made to the Mac application, and I’ve given the same treatment to the iPhone application, but I don’t want to make this post any longer than it already is. View the code I put up on Bitbucket, or download a tarball, to see the final result.
]]>However, I heartily disagree with a major point in the conclusion of his final post:
Unit testing an application is filled with difficulties and problems. In my development style, I consider the time cost of unit testing an application outweighs its benefits — especially since a unit tested application still requires system tests like user-interface and regression tests for proper validation.
Before I talk about why I disagree, I want to say that I do agree with the next part of the conclusion:
Regardless of whether you use unit tests, formalized system testing — either automated or manual and methodical — is required to fully validate an application and ensure the lowest possible low bug rates.
Unit testing, while practicing TDD, is not about testing, in the traditional sense. Unit testing does not tell you that your final application is actually easy to use by the end user. Unit testing does not tell you that you actually built an application that solves the problems you intended to address. Unit testing does not tell you if all your units work together once wired up. That’s what higher level acceptance and system tests are for. Acceptance and system tests are required to ensure the overall quality of your application.
However, this doesn’t make unit testing worthless. Applications have two kinds of quality: external and internal. External quality is how the end users perceive the quality of the application: does it crash, is it easy to use, etc. Acceptance and system tests keep the external quality high. Internal quality is how the developers perceive the quality of the code: is it easy to understand and maintain, etc. TDD and unit tests are about keeping the internal quality of the application high. High internal quality makes bugs easier to find and features easier to implement. Unit tests provide an essential safety net that lets you refactor your code, improve the design, and avoid code rot. Ultimately, this makes software cheaper (and more fun!) to write and makes it easier to stay on schedule.
Thus any real world application should have both unit tests and acceptance tests. They are not mutually exclusive, and they do not serve the same purpose. I honestly believe that if writing unit tests for TDD is slowing you down, then you are not properly practicing TDD. Remember, TDD is a learned skill, and like any learned skill, it’s not something you pick up overnight. It takes practice. Is TDD a panacea? No, it’s not going to cure cancer and bring world peace, but I think it is the best tool we currently have in our arsenal to keep code clean, flexible, and maintainable.
sudo
on Mac OS X comes set up with some sane defaults. But here are two lines I add to my /etc/sudoers
file on a fresh install:
Defaults passprompt="%u@%h's password: "
Defaults timestamp_timeout=15
The first line changes the password prompt from something generic:
% sudo whoami
Password:
root
To something a lot more useful:
% sudo whoami
dave@fuji's password:
root
The second line changes the time between needing to re-enter your password from 5 minutes to 15 minutes. This is handy if you’re confident you’re in a fairly secure physical environment, like an iMac at home, and you find sudo
asking for your password too frequently. If you work in a more paranoid physical environment, you may want to keep the timeout at 5 minutes.
Remember to edit the file with visudo(8)
. Don’t edit it directly.
% sudo EDITOR=emacs visudo
Some interesting stats about this program:
Much thanks to BunnyBoy and MetalSlime for their “Nerdy Nights” NES coding tutorials on Nintendo Age. The NES sound engine is all MetalSlime’s genius. I just plugged in the notes. The music itself was adapted from some sheet music I found. The lame “graphics” are all mine. I don’t know how to do much other than scroll the background and change the color palette, at the moment.
Also, the NesDev archive and wiki were invaluable as technical references. For the toolchain, I used cc65, which includes a very nice assembler and linker for 6502. It would be nice to run this on actual hardware at some point using a flash cartridge like the PowerPak.
I started learning NES programming only a few months ago. It’s just one of those useless skills I’ve wanted to learn for a while now. I probably had a bit of a head start since I learned 6502 assembly back in the day on an Apple ][. Granted, that was a long time ago, but it’s a very simple CPU architecture so it comes back pretty fast.
Mercurial 1.3 added official support for storing HTTP authentication information in .hg/hgrc
. The official documentation for this is in the hgrc.5 man page, but there’s also a good article on hgtip.com on how to set it up. The gist of it is you put something like this in your .hg/hgrc
:
[auth]
bb.prefix = https://bitbucket.org
bb.username = {username}
bb.password = {password}
The problem with this solution is that your password is stored as plaintext. Enter mercurial_keyring
, also known as the keyring extension. If you leave the password out of your hgrc
, the keyring extension will prompt you for the password and store it in a system specific password database, such as the OS X keychain, Gnome Keyring, or KDE KWallet.
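To enable the extension, a minimal hgrc setup might look like the following sketch (the bare extension name assumes mercurial_keyring is on your Python path; adjust if you installed it elsewhere). Note there is no bb.password line, since the keyring stores the password for you:

```
[extensions]
mercurial_keyring =

[auth]
bb.prefix = https://bitbucket.org
bb.username = {username}
```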
Earlier versions had one drawback: it stored passwords indexed on the repository’s URL. For example, if you had two repositories on BitBucket, you’d have to enter your password twice, once for each repository. This wasn’t ideal, especially when you have many repositories on BitBucket. Once I entered my BitBucket password, I shouldn’t have to enter it again.
So I forked the project and modified it so that saved passwords are indexed on the URL prefix you set up in hgrc
. Thus, all authenticated BitBucket repositories will share the same password. My patches were accepted and included in version 0.4.0 of mercurial_keyring
. I’ve only been using it for a day now, but it’s been working out well so far.
In Mac OS X 10.5, Apple added a new API to NSWindow
to add bottom bars, somewhat cryptically named setContentBorderThickness:forEdge:
. There was no support for bottom bars in Interface Builder 3.0 and 3.1, though, so you had to call this method in awakeFromNib
:
[window setContentBorderThickness:22 forEdge:NSMinYEdge];
In Snow Leopard, we finally get Interface Builder support for bottom bars. I somehow missed this in the new IB until just this week, and it’s easy to use.
Select the window object and choose the Size tab of the Info Panel. You can now set the Content Border of the window:
Putting this in the Size tab is a bit non-intuitive to me; it would seem to make more sense in the first Attributes tab. This is probably one reason why I never noticed it before, but, alas, that’s a minor complaint. I’m glad it’s there.
The HIG also has a section on positioning text and controls in a bottom bar. Unfortunately, IB does not provide automatic snapping guides that enforce this positioning. You have to count pixels yourself by holding down the Option key or using a separate tool like xScope, thus I’ve filed:
rdar://7483606: ER: Interface Builder should enforce positioning of controls in the bottom bar
However, I recently learned that version 5 will only support MySQL. This is a huge disappointment, as SQLite is one of the major reasons I chose Movable Type (see my comments about upgrading to MT 4). I don’t know what’s motivating this decision, but it really makes no sense. Is it really that hard to write database-neutral SQL? Surely this is a solved problem by now.
This is so bad that it’s a deal killer for me, and I’m starting to investigate my options going forward.
The simplest option is to just keep running version 4. Obviously, it works for me, and it’s the choice with the least amount of effort. But MT4 isn’t perfect, either.
The big issue is its templating system. I blame the templating system for why my blog looks like ass and doesn’t match the rest of the site.
Maybe I’m just an idiot, but the whole theming system hurts my brain. First off, the entire theme is stored inside the blog engine. In order to edit the templates, you have to edit them inside the browser. This is incredibly lame.
You can link individual templates to a file on the file system, but you have to manually do this for each and every template. In MT4, there’s something like 20 different templates, and I’m not about to go through each and every one linking them to a file. (Have I mentioned I’m lazy?)
I think you can use a plugin as a theme, but the documentation on that has been very poor. There’s no “simplest possible theme” plugin that you can use as a starting point and build up as necessary. You’ve got to somehow extract an existing theme into a plugin, which, of course, there’s no easy way to do, last I looked into it.
Here’s my must-have list of requirements for a blogging system:
Basically, this falls under the KISS principle. My posts and theme almost never change. There’s no reason to eat up CPU and memory generating a page each and every time someone hits it. The big downside to static is that it generally means no comments. Services like Disqus, though, mean I can add comments even to a statically generated blog.
Static also means it’s fast. I don’t need to install the latest and greatest super-duper-OMGWTFBBQ cache plugin. And getting Fireballed won’t take down my site. The few times my site has been linked by more popular sites, my server hasn’t so much as strained itself, let alone failed to serve up pages.
I don’t understand why so many blog systems require a full-blown RDBMS like MySQL or Postgres. Don’t get me wrong, I love Postgres, but it’s overkill for 99% of the blogs out there, including mine. An RDBMS complicates setup. It complicates backup. It’s one more thing I have to keep secure. It’s, somewhat ironically, a hindrance to scalability (*cough* “Cannot access MySQL” errors *cough*). There’s a lot of overlap here with static generation, but one doesn’t imply the other, as Movable Type itself shows.
If you’re writing a blog engine and you really feel the need to use SQL, for all that is good and holy, use SQLite or at least allow it as an option. SQLite will more than scale for most blogs. The database should be almost entirely read-only. Even if you support comments, I doubt you are getting comments added in the hundreds per second category, which means, SQLite will be able to handle your load.
Many static site generators forgo a database completely and just use the file system. You edit the posts by editing ordinary files. Plus, it means I can check the whole thing into version control, too. This is very tempting and seems like it’s even better than SQLite.
I’m a geek. I don’t need no stinkin’ GUI editor for my posts. Heck, I could even write them in plain HTML, but Markdown does make things like paragraphs, emphasis, and lists a little simpler. I’m also lazy, so making things simpler is a win.
I covered this a bit above in my current dislike for Movable Type 4, but I want a templating system I can edit completely in the file system. This way I can edit them with a real text editor, and I can store them in version control. Having stuff like this in version control is so awesome because it gives you a safety net for mistakes.
I love MarsEdit for posting. I’ve been using it ever since it was part of NetNewsWire. However, none of the static blogging systems have a remote API like this. I’m perfectly capable of just editing Markdown files, but there’s something to be said for a dedicated blog editing app. In my ideal world, I could post either by adding a file or through MarsEdit, whichever strikes my fancy.
Unfortunately, static generation and SQLite support rule out the most common blog systems like WordPress. However, I’ve already got a few contenders in mind. They’re all Ruby-based static generation systems, namely:
I’ve already got a good idea of the pros and cons of each. My shortlist is Rote, nanoc, or something custom. I need to evaluate each a bit more, though, to really find the winner. Or maybe I’ll just be lazy and stick with MT4.
@executable_path
, @loader_path
, and @rpath
. I wanted to chime in with a bit of advice: if at all possible use @rpath
.
The gist is, if you’re targeting 10.5 or later, use @rpath
. There’s no reason I can think of to still use @loader_path
. If you’re still on 10.4, use @loader_path
. And let’s hope that you’re still not targeting 10.3 and earlier, so forget that @executable_path
ever existed. There’s really never a good reason to use @executable_path
on 10.4 and later.
As Mike wrote, @rpath
is the most flexible of the three options. The big benefit that I see is that the same binary of a framework or dynamic library can be used embedded in an app bundle, by a command line tool, or put in ~/Library/Frameworks/
. Basically, it allows you to use the framework wherever you want by putting the onus on the linking app or bundle to define where to find it.
The only major downside is that @rpath
requires 10.5 or later; however, with 10.6 already shipping, there are fewer and fewer reasons to support anything earlier than 10.5.
So how do you actually use @rpath
? Ideally, I think that targets should be set up to use @rpath
out of the box (see rdar://7396127), but unfortunately it takes a bit of legwork.
As of Xcode 3.x, you set the Installation Directory build setting of the framework (or library) target to just @rpath
:
This will set the install name to something like this for frameworks:
@rpath/SpiffyKit.framework/Versions/A/SpiffyKit
And something like this for dynamic libraries:
@rpath/libspiffy.dylib
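After building, you can sanity-check the install name that actually got baked into the binary with otool(1), which prints the install name (the id) of a shared library. The build path here is hypothetical; point it at your own built product:

```
% otool -D build/Release/SpiffyKit.framework/Versions/A/SpiffyKit
build/Release/SpiffyKit.framework/Versions/A/SpiffyKit:
@rpath/SpiffyKit.framework/Versions/A/SpiffyKit
```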
The linking application now needs to define what the @rpath
expands out to. For any bundle, such as an application, framework, or plugin, you’d add @loader_path/../Frameworks
to the Runtime Search Paths build setting to find embedded frameworks:
Note that you’re still using @loader_path
here, as it’s still useful to find the framework relative to the actual binary.
If you want to embed dynamic libraries, it’s probably a good idea to put them in their own directory, say Libraries
alongside the Frameworks
directory in the bundle. If you do this, add @loader_path/../Libraries
to the Runtime Search Paths build setting, too. Remember, you can have more than one Runtime Search Path.
I’ve created a sample app project with an embedded framework project, both set up to use @rpath
, on BitBucket called rpath-demo
% hg clone http://bitbucket.org/ddribin/rpath-demo/
It should compile and run on 10.5 and 10.6 using Xcode 3.0 or newer.
Unfortunately, a lot of the discussions also revolved around “I’d like to start, but I just have a hard time getting going with Xcode” or some other Xcode or OCUnit limitation that got in their way. While I am grateful for Xcode’s unit testing integration, coming from IntelliJ IDEA for Java, which has great testing support, it does leave a lot to be desired.
Putting my money where my mouth is, I’ve recently opened up a bunch of bugs about improving unit testing in Xcode:
Please file duplicates or your own enhancement requests, if you have specific ideas, to let the Xcode team know we want better unit testing.
Refactoring: Improving the Design of Existing Code
Refactoring is a classic book on the art of cleaning up code as you go. Ten years later, this is still a fantastic book. It covers code “smells” that indicate code that needs refactoring and describes over seventy different refactorings you can apply.
Test Driven Development: By Example
Test-Driven Development: A Practical Guide
These two books are good if you want to learn more about test-driven development. I’ve only read the Kent Beck book and really enjoyed it, but I’ve heard good things about Astels’ book, too.
Working Effectively with Legacy Code
Unit testing and doing test-driven development is much easier on greenfield projects. If, however, you have an existing code base that you’d like to add tests to, then Working Effectively with Legacy Code is for you. I love that Michael Feathers controversially defines “legacy code” as “any code without tests”.
xUnit Test Patterns: Refactoring Test Code
This is a behemoth of a book, but it’s chock full of good information. It covers a lot of patterns and anti-patterns of test code, and it will almost certainly save you time if you’re new to unit testing.
Clean Code: A Handbook of Agile Software Craftsmanship
This book is basically Pimp My Code on steroids. There’s a few examples of how to clean up code using refactoring and unit testing, and it covers good OO design principles along the way.
NSURLConnection
. Unfortunately, the code in that post originally contained a serious error that broke when running on 10.6 (I’ve since updated it). The API documentation for NSOperation mentions new behavior for concurrent operations:
Note: In Mac OS X v10.6, operation queues ignore the value returned by
isConcurrent
and always call thestart
method of your operation from a separate thread. In Mac OS X v10.5, however, operation queues create a thread only ifisConcurrent
returnsNO
. In general, if you are always using operations with an operation queue, there is no reason to make them concurrent.
This new behavior raises a couple of issues. First, if you’re using a main-thread only API, it obviously won’t work well or at all when called from a background thread. Also, our start
method is designed to start an asynchronous operation and return as quickly as possible. The thread where start
is called will generally die after the start
method returns (most likely due to operation queues being implemented on top of Grand Central Dispatch). Thus, even if the operation is safe to run on a background thread, if it requires a run loop, it won’t work. This is because run loops are tied to a thread, and if the thread dies, then any run loop activity dies with it.
The simple fix I’ve found is to ensure that start
is called on the main thread. This makes it safe for main-thread only APIs as well as asynchronous APIs that rely on the run loop:
- (void)start
{
if (![NSThread isMainThread])
{
[self performSelectorOnMainThread:@selector(start) withObject:nil waitUntilDone:NO];
return;
}
[self willChangeValueForKey:@"isExecuting"];
_isExecuting = YES;
[self didChangeValueForKey:@"isExecuting"];
// Start asynchronous API
}
This fix works well on both 10.5 and 10.6.
I’ve known about this problem for some time, and I apologize for not updating the previous post as soon as Snow Leopard was released.
Granted, there are a few places to get good bagels in Chicago, one of my favorites being New York Bagel & Bialy in Niles; however, this isn’t very convenient for us city folk. While there are a few places in the city that import their bagels from Skokie (Eleven City Diner and Beans & Bagels come to mind), sometimes you just have to take matters into your own hands.
I ended up trying the recipe from the fine folks at Cooks Illustrated. It’s in their New Best Recipe book or Baking Illustrated book as well as on their website, if you have an account. The recipe is fairly straightforward, though you’ll probably need a few specialty ingredients, like high-gluten flour and barley malt syrup. I had to get both of those at King Arthur Flour. The high-gluten flour sounds like it can be used for pizza dough, too.
The end result was fabulous bagels! Boiling them before baking gave them a nice, chewy crust. The inside itself was not too soft and dense. And as good as they were straight out of the oven, I felt they were even better the next day. These were certainly as good as any bagel I’ve ever had. My wife even called them as good as Ess-a-Bagel in New York, which is quite a compliment! These were definitely worth making again.
If you like bagels, definitely try making them at home. The only downside is that you need a decent stand mixer. Because the dough is so dry, it’s very stiff and gave my stand mixer a real workout. In fact, I think it was responsible for stripping the gears on my KitchenAid Artisan mixer. I tried to make another batch of bagels a couple weeks ago, and it wouldn’t run at low speeds anymore. I can recall a couple other recipes that stressed the motor, too, but I think this dough just pushed it over the edge.
After investigating the repair options, we decided to get a factory refurbished KitchenAid Pro 600. I have too many attachments to switch to another brand, at this point. Talk about vendor lock-in, eh? The motor is more powerful and the gears and gearbox are all metal, so hopefully they won’t strip again. It’s only got a six month warranty so I just need to put it through its paces for those six months. The new mixer will arrive in about a week, and the first thing I’m going to do is make another batch of these.
NSOperation
and NSOperationQueue
are available on Leopard or the iPhone to help you parallelize your code. The idea is that if you have code that takes a long time to execute, you create an NSOperation
subclass, override main
, and put your long running code in there:
@implementation CalculatePiOperation
- (void)main
{
// Calculate PI to 1,000,000 digits
}
@end
To execute an operation, you typically add it to an NSOperationQueue
:
NSOperationQueue * queue = [[NSOperationQueue alloc] init];
NSOperation * piOperation = [[[CalculatePiOperation alloc] init] autorelease];
[queue addOperation:piOperation];
If you add multiple operations to a queue, they all execute in parallel on background threads, allowing your main thread to deal with the user interface. The queue will intelligently schedule the number of parallel operations based on the number of CPU cores your users have, thus effectively taking advantage of your users’ hardware.
The only caveat is that the lifetime of an operation is the main
method. Once that method returns, the operation is finished and it gets removed from the queue. If you want to use a class that has an asynchronous API, you have to jump through some hoops. Typically you have to play games with the run loop to ensure that the main
method doesn’t return prematurely.
While there are times when you want to do this, it can also be a pain. In other cases, you may not be allowed to use the API on a background thread because it is designed to only work on the main thread. Enter concurrent operations.
Operations come in two flavors: concurrent and non-concurrent. In an unfortunate case of confusing terminology, the default NSOperation
subclass is called non-concurrent. I say unfortunate because of the way they are used on an operation queue: they run in parallel. So, yes, operations that run in parallel are called non-concurrent.
Concurrent operations are created by overriding the isConcurrent
method in your subclass to return YES
:
- (BOOL)isConcurrent
{
return YES;
}
Update 2009-09-13: This is no longer true as of 10.6. The start
method is always called on a background thread as of 10.6. To work properly with main-thread only and asynchronous APIs that rely on the run loop, we need to shunt our work over to the main thread. More on this in a followup post.
In any case, another major difference with concurrent operations is that you override start
, instead of main
. Also, the operation is not finished once the start
method returns. This allows you to control the lifetime of the operation.
When dealing with asynchronous APIs, we can begin the asynchronous call on the main thread in start
and keep the operation running until it finishes.
We also have a few more responsibilities. We need to keep track of isExecuting
and isFinished
ourselves, and we need to modify them in a key-value coding compliant manner. I typically do this using instance variables. The operation is only considered finished when the isFinished
property changes to YES
.
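As a quick sketch, assuming the _isExecuting and _isFinished instance variables used in the code that follows, the KVC-compliant getters are just simple accessors over those ivars:

```objc
- (BOOL)isExecuting
{
    return _isExecuting;
}

- (BOOL)isFinished
{
    return _isFinished;
}
```

The operation queue observes these keys, so the willChangeValueForKey:/didChangeValueForKey: calls wrapped around each ivar change are what actually notify the queue.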
For example, if we want to write an operation that downloads data from a URL using NSURLConnection
, its initializer would be:
- (id)initWithUrl:(NSURL *)url
{
self = [super init];
if (self == nil)
return nil;
_url = [url copy];
_isExecuting = NO;
_isFinished = NO;
return self;
}
The start
method shunts itself to the main thread, kicks off an asynchronous NSURLConnection
, and returns:
- (void)start
{
if (![NSThread isMainThread])
{
[self performSelectorOnMainThread:@selector(start) withObject:nil waitUntilDone:NO];
return;
}
NSLog(@"operation for <%@> started.", _url);
[self willChangeValueForKey:@"isExecuting"];
_isExecuting = YES;
[self didChangeValueForKey:@"isExecuting"];
NSURLRequest * request = [NSURLRequest requestWithURL:_url];
_connection = [[NSURLConnection alloc] initWithRequest:request
delegate:self];
if (_connection == nil)
[self finish];
}
There are three important points here. First, we have to make sure we are running on the main thread. Second, we have to change the isExecuting
property to YES
. Third, our start
method returns before the NSURLConnection
has completed, but the operation is still executing. This means our operation stays on the queue while the NSURLConnection
is running, all without having to play games with the run loop.
We are using a private finish
method to end the operation:
- (void)finish
{
NSLog(@"operation for <%@> finished. "
@"status code: %d, error: %@, data size: %u",
_url, _statusCode, _error, [_data length]);
[_connection release];
_connection = nil;
[self willChangeValueForKey:@"isExecuting"];
[self willChangeValueForKey:@"isFinished"];
_isExecuting = NO;
_isFinished = YES;
[self didChangeValueForKey:@"isExecuting"];
[self didChangeValueForKey:@"isFinished"];
}
The key point here is that we change the isExecuting
and isFinished
flags. Only when these are set to NO
and YES
, respectively, will the operation be removed from the queue. The queue monitors their values using key-value observing.
The NSURLConnection
delegate methods accumulate data or end the operation, as appropriate:
- (void)connection:(NSURLConnection *)connection
didReceiveResponse:(NSURLResponse *)response
{
[_data release];
_data = [[NSMutableData alloc] init];
NSHTTPURLResponse * httpResponse = (NSHTTPURLResponse *)response;
_statusCode = [httpResponse statusCode];
}
- (void)connection:(NSURLConnection *)connection
didReceiveData:(NSData *)data
{
[_data appendData:data];
}
- (void)connectionDidFinishLoading:(NSURLConnection *)connection
{
[self finish];
}
- (void)connection:(NSURLConnection *)connection
didFailWithError:(NSError *)error
{
_error = [error copy];
[self finish];
}
As you can see, we don’t have to turn an asynchronous API into a synchronous one, and yet we are still able to package up this task as an operation. While it may seem a little counterintuitive to use an operation, it does have its benefits. For example, you can use the queue to limit the number of parallel downloads to two:
_queue = [[NSOperationQueue alloc] init];
[_queue setMaxConcurrentOperationCount:2];
Also, you can use operation dependencies to make sure tasks occur in a proper order.
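For example, here’s a quick sketch of dependencies. DownloadOperation and ParseOperation are hypothetical operation subclasses, but addDependency: is the standard NSOperation API:

```objc
// parseOp will not start until downloadOp finishes,
// even though both are on the same queue.
NSOperation * downloadOp = [[[DownloadOperation alloc] init] autorelease];
NSOperation * parseOp = [[[ParseOperation alloc] init] autorelease];
[parseOp addDependency:downloadOp];
[_queue addOperation:downloadOp];
[_queue addOperation:parseOp];
```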
In Textcast, we use concurrent operations almost exclusively. We package up NSSpeechSynthesizer
, PubSub
, and WebKit
as concurrent operations since they all have asynchronous APIs. All of these APIs also have thread safety issues of some sort and are better run on the main thread. Concurrent operations make this easier to manage.
Download a full example project demonstrating how to use concurrent operations: Concurrent.tgz