Dad's Project in the Garage

There's a trope about the dad who works constantly on his favorite old piece-of-junk project in the garage. He may never get it running, but getting it running is not the only point of the project. A bigger point is that it's where he goes to clear his head and recharge. It's something that he can focus intently on, and it's something that's entirely in his domain.

I have a number of slow-burning projects at home, too. But they're not in the garage, they're in the cloud. Usually, they're web services. At work, as part of a team, I write code that goes on embedded devices. But at home, the entire product is mine. It's my chance to be a "full stack" engineer, the CEO, and principal customer all at once. It's nice to take on these different roles.

Updating a File

Let's take a quick look at updating a file. It's one of the most useful and common operations in programming, so it's really very well understood. A naive Python snippet to update a file with new data would look like the following:

with open(filename, 'wb') as f:
    f.write(data)

What's beautiful about it is that it's cross-platform, and Python's "with" statement takes care of closing the file that was opened, even if an error occurs when writing the data.

But when you deploy code into the real world, you have to dive deeper, and think about what really can go wrong. For example, the automatic closing of the file I mentioned above? Closing the file flushes Python's buffers to the operating system, but doesn't necessarily sync the file to the physical disk. In my case, there are three more requirements: it really does have to be cross-platform, only one process can access the file at a time, and I don't want IO errors corrupting the data. That means finding a cross-platform file lock, doing all the work in a side file, and atomically moving the side file over the production file. The following snippet (elaborated a little more at this gist) fixes those problems.

import os
import platform
import tempfile

import filelock  # third-party, cross-platform file lock

with filelock.FileLock(filename):
    # Create the side file next to the target so the rename stays on one filesystem.
    with tempfile.NamedTemporaryFile(mode='wb',
            dir=os.path.dirname(os.path.abspath(filename)),
            delete=False) as f:
        f.write(data)
        f.flush()
        if platform.system() == "Windows":
            os.fsync(f.fileno())      # slower, but portable
            if os.path.exists(filename):
                os.unlink(filename)   # or else WindowsError
        else:
            os.fdatasync(f.fileno())  # faster, Unix only
        tempname = f.name
    os.rename(tempname, filename)     # Atomic on Unix

On the one hand, it's no longer a two-liner. On the other hand, I've got a function that works in all the environments I need it to, and I know exactly how it's going to behave in any exceptional situation.
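On Python 3.3 or later, the same idea can be sketched a little more compactly. This is only a sketch under a few assumptions: the helper name atomic_write is made up for illustration, the file lock is left out (so it assumes a single writer), and it leans on os.replace, which overwrites an existing target on Windows as well as on Unix.

```python
import os
import tempfile


def atomic_write(filename, data):
    """Write bytes to filename via a side file in the same directory."""
    dirname = os.path.dirname(os.path.abspath(filename))
    fd, tempname = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())        # push the bytes to disk before renaming
        os.replace(tempname, filename)  # overwrites an existing target
    except BaseException:
        # Remove the side file if anything failed before the rename.
        if os.path.exists(tempname):
            os.unlink(tempname)
        raise
```

Calling atomic_write("album.dat", b"...") either leaves the old file untouched or swaps in the complete new one. It trades the faster fdatasync for the portable fsync to keep the code path the same everywhere.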

Removing a Photo from a Web Album

My latest project is a web-based photo album. I could pay Flickr, Google or Apple for their web albums, of course, but they've made UX changes I don't like, and their costs are too high. So, I'm writing my own. It'll never be as polished as the production photo albums, but it'll be my jalopy, and I'll be able to tweak it in any way I choose.

One of the things my album owners need to do is delete photos they no longer want in the album. So, how should I implement that?

os.unlink(photo_filename)

That's the command to delete a file. You know it's not going to be that easy. Since we're talking about a web album, we'd want to consider a few things. We want an ideal user experience. It has to be fast, they have to be able to change their minds later, and concurrency can't be a problem. Here are some of the steps involved in deleting a photo from the album.

1. Use JavaScript to make them confirm their choice in their browser.
2. Schedule the deletion of the entry in the local metadata file. It'll be done in a side thread or at a later time. Just make sure the user's web page's data refreshes quickly.
3. Add a new entry to a list of timestamped-files-to-be-deleted-in-a-month.
4. Run a cronjob that works on a regular cycle that actually does timestamp checking and the physical file deletion. (It's actually an S3 key deletion, which gets propagated to multiple cloud static-file servers behind the scenes. Even more remote complexity is encapsulated behind a single function call.)

What almost looked like it'd be a one-line call turns into JavaScript, AJAX, worker-thread creation, data file manipulation, and a cronjob with its own health and status reporting scheme.
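The server-side half of those steps might be sketched roughly like this. Everything here is hypothetical and in-memory: the function names, the dict of metadata, the pending list, and the 30-day reading of "a month" are assumptions made for illustration, and delete_file stands in for whatever actually removes the stored file (an S3 key deletion in my case).

```python
import time

RETENTION_SECONDS = 30 * 24 * 60 * 60  # "a month", approximated as 30 days


def schedule_delete(metadata, pending, filename, now=None):
    """Hide the photo right away and queue its file for later deletion."""
    now = time.time() if now is None else now
    entry = metadata.pop(filename, None)  # the album stops listing it
    if entry is not None:
        pending.append((now, filename, entry))
    return entry is not None


def undo_delete(metadata, pending, filename):
    """Restore a photo that has not been purged yet."""
    for i, (_, name, entry) in enumerate(pending):
        if name == filename:
            metadata[name] = entry
            del pending[i]
            return True
    return False


def purge_expired(pending, delete_file, now=None):
    """Cron entry point: delete files whose grace period is over."""
    now = time.time() if now is None else now
    expired = [p for p in pending if now - p[0] >= RETENTION_SECONDS]
    for item in expired:
        delete_file(item[1])  # the physical deletion happens only here
        pending.remove(item)
    return [item[1] for item in expired]
```

Passing the clock and the delete function in as parameters keeps the sketch easy to exercise without waiting a month or touching real storage.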

This is fun! Amirite? This is what working on your own personal project is all about. So let's dive a little more deeply into step 2 above.

When the user confirms deletion of a photo, the first thing that should happen is that they get feedback. The photo needs to disappear from their view immediately. This happens before the server actually deletes the photo from persistent storage. JavaScript can manipulate the DOM right away in their browser. Over on the server, let's say you want to remove a photo's filename and its metadata from a list of photos.

Removing an Item from a Container

The task of removing the photo data from a list can be simplified to the task of removing a record from a container (l) when the first field of that record matches a given value (s).

There's the loop:

for item in l:
    if item[0] == s:
        l.remove(item)
        break

That's a straightforward, imperative, language-neutral, non-Pythonic implementation. Iterate across the container until you find the item, and then delete it right away. Unfortunately, l.remove(item) does its own O(n) search of the list, even though the loop has already found the item. That can be fixed with the enumerate call:

for idx,item in enumerate(l):
    if item[0] == s:
        del l[idx]
        break

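This is better. The enumerate() call returns an index, so del l[idx] can remove the item without searching for it a second time. (Deleting from the middle of a Python list still has to shift the later elements, so the del itself is O(n) in the worst case, not O(1).)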

Although that'd work, let's try to find a more modern and efficient solution. Use the enumerate call in a generator that returns the index of the item to be deleted:

idx = next((i for i,v in enumerate(l) if v[0] == s), None)
if idx is not None:
    del l[idx]

The generator call is very efficient, and maybe we're done. No we're not! You see, it's a trick question. The actual answer is:

DELETE FROM l WHERE f = s LIMIT 1;

When it's your own project, you can change the domain! The data never had to be in a container that Python could process. Why not store it in a SQL database? Or, hey, just for grins, why not keep it in Python, but why was the container accessed like an unordered list? Was it ordered? Then do a binary search.

idx = bisect.bisect_left(l, [s, ])
if idx != len(l) and l[idx][0] == s:
    del l[idx]

Nice, the search is O(log n). Or, wait. Maybe it doesn't have to be ordered. Each photo has a unique filename. That's a key. Each photo's data could be a record in a Python set (as long as the records are hashable, so tuples rather than lists). Python has a "set" container that resembles mathematical sets with all the performance features you'd hope for. So let's make "l" be a set, and "r" be the row that has s in it, then...

l.remove(r)

Yay, that'll usually occur in constant time! So have we found the best solution?

Of course not. We can make the theoretical problem as simple as we like. But in practice, the web album sometimes wants a database with different primary keys, sometimes it wants an ordered list of items, and sometimes it only needs a set of unique items.
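For the database flavor, here's a minimal sketch using Python's built-in sqlite3 module. The table name photos and its filename and caption columns are made up for illustration. Not every SQLite build accepts LIMIT on a DELETE, so this sketch picks a single rowid in a subquery instead.

```python
import sqlite3


def delete_photo(conn, filename):
    """Delete at most one row whose filename matches; return rows removed."""
    cur = conn.execute(
        "DELETE FROM photos WHERE rowid = "
        "(SELECT rowid FROM photos WHERE filename = ? LIMIT 1)",
        (filename,),
    )
    conn.commit()
    return cur.rowcount


# Hypothetical in-memory album with two photos.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photos (filename TEXT, caption TEXT)")
conn.executemany(
    "INSERT INTO photos VALUES (?, ?)",
    [("a.jpg", "Alpha"), ("b.jpg", "Beta")],
)

removed = delete_photo(conn, "a.jpg")
remaining = [
    row[0] for row in conn.execute("SELECT filename FROM photos ORDER BY filename")
]
```

The parameter placeholder (?) keeps the filename out of the SQL text, which matters once the filenames come from a web request.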

For that matter, sometimes my user will be on a desktop computer, sometimes they'll be on a tablet, and often they'll be on their phone. That's a lot of CSS to experiment with.

That's what fiddling with your own personal project is all about! Dive deeply into whichever problem piques your interest at that moment. Make something work better. Even if the rest of the world doesn't know why you bother. Sometimes it's just what you need to be doing.

Photo by tashland / CC BY-NC-ND 2.0

October 26, 2014 | Filed under life, python, code and technology

Climbing Lesson Number 10

Matthew Childs describes 9 Life Lessons from Rock Climbing in his TED talk.

They're great rules, and anyone can appreciate them. If you really are a rock climber, I think you can appreciate the nuances even better as they apply to each domain. I also think there's a tenth rule.

Rule #10: The first day is for route-finding.

When you go on climbing trips to new destinations, don't get too optimistic about climbing routes as soon as you get there. I think that's an easy pitfall for beginner climbers who are understandably excited to get on the rock. The first time one goes to a new place, getting lost and moving methodically is a part of the experience.

Allocate the time for it. Expect it and plan for it.

If you've set your expectations correctly, then the whole trip is more enjoyable, and it won't feel like a disappointment that you had to do some route-finding on the first day. That first day isn't a loss, it's a part of the preparation for the days of climbing to come, and they'll go more smoothly once you've got the site mapped out.

The same thing applies when going on other adventures in life. You may have a target activity in mind, like enjoying consistent profitability from the residuals of your micro-ISV, but plan on having to do some grunt work to get there. If you're entering uncharted territory, plan on doing a little of your own exploration and mapping.

May 1, 2009 | Filed under TED, climbing, life and rules