Monday, July 30, 2012

Lots of IT jobs in Boston (or "How not to network")

I attended a meetup last week for IT professionals, recruiters, and job-seekers in the Boston area.  For me it was a combination of wanting to start networking a bit more actively and maybe seeing if there were any PHP programmers who might fit in at Libboo.

The good and the bad...

My first observation was that there were far more people looking to hire than there were people looking for jobs.  This is good news if you're out looking!  At least this week in Boston.  (I'll have to wait to see if this is a trend or just coincidence.)

But my second observation was that most of the people there really didn't know how to make the most of an event like this.  (This applied to the job-seekers as well as the prospective employers, which surprised me a bit -- I think of recruiters or managers as being generally more outgoing.)

It turned out that the organizer of the event was running late and couldn't get there to start things off.  So most people were sitting or standing around by themselves, sort of staring at the walls and waiting for someone to come along and connect them with what they wanted.

Don't do that.

Don't be that person.  If you wait around for someone else, you aren't in control of what you get... if you get anything at all.

I know it's difficult to start talking to someone.  Believe me, I really do.  I'm the biggest introvert that I know, and have been all my life.  I still would rather stay home or hang out with a very small group of good friends than go out to a noisy club.  I get exhausted dealing with people, even when I'm having a good time.

But if you're like me, you've heard this advice before.  Again, I know.  I heard it all the time too.  "Just go start talking to people!"  Or maybe, "They're just as nervous as you, you know."  None of those things helped, and frankly I suspect there are a lot of people who really like meeting new people and feel energized afterwards.  Having people lie to me or make the problem sound trivial always hurt more than it helped.

So what does help?

I can't speak for everyone.  (Even if I could, I wouldn't want to.  That's way too many new people to meet.)  But what worked for me was putting myself in a situation where I was forced to be social and outgoing.  I got a job as a Sales Engineer.  (That's someone sitting more or less between sales and engineering; participating in the sales process and meetings but keeping more to the technical aspects of a sale.)

And you know what?  It was hard.  Really, really hard.  I was uncomfortable and stressed and unhappy.  But it got easier, and the more I did it the easier it got.  I still don't jump out of bed every morning wanting to go out and talk to a hundred strangers, but now I'm comfortable enough doing it that I was able to get people talking at a meetup.

I don't know that I'd recommend changing jobs to everyone, but do something that pushes you past your comfort zone.  It will be hard at first, but sticking with it makes it easier.  Join a club, take a class, see a professional if you have to, but forcing yourself to learn and improve these skills will repay you far more than the discomfort costs.  (And these things are learnable!  Some people seem to be born knowing how to make small talk and put others at ease.  But the rest of us can learn with enough effort.)

Friday, July 20, 2012

Altering ebooks -- adding pages to an existing epub


I've been working with ebooks (specifically epub files) lately, particularly with modifying them.  There are some great tools out there to help with this work.

Calibre is fantastic at converting between formats, and better yet it has a command-line interface, so it can be part of an automated script.  It can also do some simple editing of metadata, allowing you to update the author, title, etc. of a book.  But there's no functionality for editing the content of an ebook.

Sigil is another awesome tool.  This is the go-to program for editing the content of an ebook.  It has one major drawback for me though -- it's not scriptable.  There's no way to do something like adding an informational page into an existing ebook without doing it by hand.

I did some research but wasn't able to find anything that would let me add a page to an ebook non-interactively, so I did it myself.  I thought this might be useful to other people looking to modify epub files, so I've included it below.

A couple of notes to keep in mind:

  • I wrote this in PHP because I needed to interface with a large existing PHP codebase.  This would be even easier to do in Python, but the logic here is pretty straightforward and should be easily adapted.
  • The epub format is really simple.  At heart it's some XML, some HTML (or XHTML), maybe a few images, all wrapped up in a zip container.  That's fortunate in that there are a lot of libraries out there to work on exactly these formats.
  • That said, there are some tricky bits to how the epub zip has to be structured.  You can't just throw everything into a zip and rename it, and I couldn't find a way to get the built-in PHP zip library to write one correctly.
The code is here, and a quick text overview follows after.

<?php
    /*** Helper function from php.net ***/
    // This allows the delete of a directory and its contents
    function rrmdir($dir)
    {
        if (is_dir($dir))
        {
            $objects = scandir($dir);
            foreach ($objects as $object)
            {
                if ($object != "." && $object != "..")
                {
                    if (filetype($dir."/".$object) == "dir")
                        rrmdir($dir."/".$object);
                    else
                        unlink($dir."/".$object);
                }
            }
            reset($objects);
            rmdir($dir);
        }
    }


    /*** Setup ***/
    // Since this is an example, we can hard-code some things...
    $loc = '/home/fader/Projects/Libboo/epub-test'; // Where epubs live
    $epub = 'test.epub'; // The epub we will modify
    $newepub = 'new.epub'; // The new epub we will generate
    $added_page = 'newpage.xhtml'; // The new page we're going to insert into it


    // Allocate a directory to work in
    $temp_path = sys_get_temp_dir() . "/" . uniqid("epub-");
    mkdir($temp_path) or die("Couldn't create temporary path.");


    /*** Let's do this thing! ***/
    // Open the epub archive
    $zip = new ZipArchive;
    $res = $zip->open($epub);
    if ($res !== TRUE)
        die("Couldn't open epub as a zip.");


    // Unzip the epub into a temporary location
    if (!$zip->extractTo($temp_path))
        die("Couldn't extract the epub.");
    $zip->close();


    // *** Dig into the ebook container
    // The path is defined by the epub spec, so as long as this is a compliant
    // epub file, we should be able to find it at this location
    $container_path = $temp_path . "/META-INF/container.xml";
    $container_xml = file_get_contents($container_path);
    if ($container_xml === FALSE)
        die("Couldn't open container XML file.");


    // Look in the container to find the package (OPF) file, which holds both
    // the manifest (the "item"s) and the spine (the "itemref"s)
    $container = new SimpleXMLElement($container_xml);
    $spine_path = $temp_path . "/" . $container->rootfiles[0]->rootfile["full-path"];


    // Pull up the package file
    $spine_xml = file_get_contents($spine_path);
    if ($spine_xml === FALSE)
        die("Couldn't open the package file.");


    // Copy the new page into the correct location
    if (!copy($added_page, dirname($spine_path) . "/" . basename($added_page)))
        die("Unable to copy new page into temporary location.");


    // *** Decide where to insert a node
    // For this example, we'll just plug it in before the fourth element
    // (index 3).  Unfortunately, SimpleXML is too... simple to let us insert a
    // node into an arbitrary position, so we use the DOM object
    $dom = new DOMDocument;
    $dom->loadXML($spine_xml);
    // Fortunately the structure for an epub spine is pretty simple.  So we can
    // just get the list of pages ("item"s) and run down the tree a bit.
    $items = $dom->getElementsByTagName("item");
    $itemrefs = $dom->getElementsByTagName("itemref");


    // Make sure the fourth element (index 3) exists in both lists, since
    // that's our insertion point.
    // (NB: Pretty much any epub should have at least 4 items.
    // (ncx, css, title, pages...)  But boundary checks are always a Good Thing.)
    if ($items->length < 4 || $itemrefs->length < 4)
        die("Book is ridiculously short.");


    // *** Create and insert the new nodes
    // We'll need two nodes here -- one for the "item" and one for the "itemref".
    // Both need to be present for the new page to be found by the reader.
    $newitem = $dom->createElement("item");
    $newitem->setAttribute("id", "newpageid0");
    $newitem->setAttribute("href", basename($added_page));
    $newitem->setAttribute("media-type", "application/xhtml+xml");
    $insert_point_item = $items->item(3);
    $insert_point_item->parentNode->insertBefore($newitem, $insert_point_item);


    $newitemref = $dom->createElement("itemref");
    $newitemref->setAttribute("idref", "newpageid0");
    $newitemref->setAttribute("linear", "yes");
    $insert_point_itemref = $itemrefs->item(3);
    $insert_point_itemref->parentNode->insertBefore($newitemref, $insert_point_itemref);


    // *** Write it out
    $newxml = $dom->saveXML();
    $result = file_put_contents($spine_path, $newxml);
    if ($result === FALSE)
        die("Unable to write new XML file.");


    // *** Zip everything back up again
    // The mimetype needs to be the first entry and stored, not compressed.
    // Unfortunately I have not seen a way to do this with the PHP ZipArchive
    // object.  This is the quick, dirty, nonportable, ugly way to do it, and
    // since we're calling the system zip binary anyway, it's about 30 lines
    // smaller than using the PHP zip object for the rest:
    //   1. add mimetype alone, uncompressed (-0) and without extra fields (-X)
    //   2. add everything else recursively (-r), skipping mimetype
    $cmd = "cd " . escapeshellarg($temp_path)
         . " && zip -q0X " . escapeshellarg($newepub) . " mimetype"
         . " && zip -qXr " . escapeshellarg($newepub) . " * -x mimetype";
    system($cmd, $status);
    if ($status !== 0)
        die("Unable to zip up the new epub.");


    /*** Clean up after ourselves ***/
    // Move the new epub file to the working directory
    if (!rename($temp_path . "/" . $newepub, $loc . "/" . $newepub))
        die("Unable to move new epub file to $loc.");
    // Delete the temporary path
    rrmdir($temp_path);
?>

In short, here's what the above does:

  • Sets up a convenience function for cleaning up later
  • Extracts the contents of the epub file into a temporary location
  • Reads the container XML file (specified by the epub spec) to find the index of files (which could be in an arbitrary location inside the epub)
  • Copies in the new page to be added
  • Creates two XML nodes
    • One is the location of the file containing the new page
    • The other is a reference indicating where in the book that page should fall
  • Adds these nodes to the index
  • Zips everything back up
  • Moves the new epub to a specified location
  • Cleans up the temporary files created
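For comparison, here is a minimal sketch of the XML steps in Python, using only the standard library's xml.etree.ElementTree.  The function names (find_opf_path, add_page) are made up for illustration, and appending the manifest entry at the end is a simplification rather than a translation of the PHP above.  Real-world package files may need more care, for example around namespaced attributes.

```python
import xml.etree.ElementTree as ET

# Namespaces used by container.xml and by the package (OPF) file.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_NS = "http://www.idpf.org/2007/opf"


def find_opf_path(container_xml):
    """Return the full-path of the first rootfile listed in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find(
        "{%s}rootfiles/{%s}rootfile" % (CONTAINER_NS, CONTAINER_NS))
    if rootfile is None:
        raise ValueError("container.xml has no rootfile entry")
    return rootfile.attrib["full-path"]


def add_page(opf_xml, href, item_id, position):
    """Add a manifest item plus a matching spine itemref at `position`."""
    # Serialize OPF elements without a prefix, as they appear in most epubs.
    ET.register_namespace("", OPF_NS)
    root = ET.fromstring(opf_xml)
    manifest = root.find("{%s}manifest" % OPF_NS)
    spine = root.find("{%s}spine" % OPF_NS)
    if manifest is None or spine is None:
        raise ValueError("package file is missing a manifest or spine")

    # The manifest entry says where the new file lives.
    item = ET.Element("{%s}item" % OPF_NS, {
        "id": item_id,
        "href": href,
        "media-type": "application/xhtml+xml",
    })
    manifest.append(item)

    # The spine entry decides where the page falls in reading order.
    itemref = ET.Element("{%s}itemref" % OPF_NS,
                         {"idref": item_id, "linear": "yes"})
    spine.insert(min(position, len(spine)), itemref)
    return ET.tostring(root, encoding="unicode")
```

Calling add_page(opf_xml, "newpage.xhtml", "newpageid0", 3) would put the new itemref at index 3 of the spine, which is the same position the PHP example uses.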
It's pretty straightforward, all told.  The tricky bit is in zipping the files up -- epub requires that the mimetype file (specifying that it is an epub) must be the first file in the archive and stored rather than compressed.  This bit's tricky in PHP, so I copped out and just called the native system binary.
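As a point of comparison, Python's standard zipfile module lets you pick the compression method per entry, so the same packaging step can be sketched without shelling out.  This is only a sketch under a few assumptions: build_epub is a made-up name, source_dir is an already-unpacked epub that contains a mimetype file, and epub_path lives outside source_dir so the archive doesn't try to include itself.

```python
import os
import zipfile


def build_epub(source_dir, epub_path):
    """Zip an unpacked epub directory, writing mimetype first and uncompressed."""
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype entry has to come first and be stored, not deflated.
        archive.write(os.path.join(source_dir, "mimetype"), "mimetype",
                      compress_type=zipfile.ZIP_STORED)

        # Everything else can be compressed normally.
        for folder, _subdirs, files in os.walk(source_dir):
            for name in sorted(files):
                full_path = os.path.join(folder, name)
                # Archive names are relative to the epub root, with "/" separators.
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already added above
                archive.write(full_path, arcname,
                              compress_type=zipfile.ZIP_DEFLATED)
```

With the temporary directory from the PHP example, the call would look like build_epub(temp_path, "/some/other/place/new.epub").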

If anyone has any questions I'm happy to discuss this... it's a fun toy problem!



Wednesday, July 11, 2012

Configuring NFS on Ubuntu in Amazon EC2

(Quick note: if you're looking for the port ranges you need for NFS in EC2, check Step 4 below.)

When Libboo migrated from hosting everything ourselves to using Amazon's Elastic Compute Cloud (EC2), we decided to do a bit of rearchitecting at the same time to make scaling easier in the future.  Part of this was removing any assumptions that everything was running on a single server, so we wanted to put our data in one place that could be shared by any number of webservers.

As we are running everything on Linux, the obvious solution for this was the Network File System (NFS) protocol.  NFS is established and well understood in the Unix world... which means that there are a number of tools built for it and most of the bugs are worked out (or at least well understood) already.  We're using Ubuntu Server at Libboo, so that's what my examples use.  But this should work identically on any Debian-based distribution and be similar anywhere else.

It turns out to be easy to run NFS in EC2, but I didn't see any good documentation about exactly how to do it.  So to save others' time, here's what you need to do to set up NFS on Ubuntu 12.04 in EC2:

Step 1 - Install the NFS server

This is trivial on Ubuntu Server:
sudo apt-get update && sudo apt-get install nfs-kernel-server
This will download the latest version of the NFS server and set it up.

Step 2 - Configure the shared directories ("exports")

Getting to the file you need to work on is simple:
sudo nano /etc/exports
Actually configuring the exports is a bit more involved and unfortunately isn't a 'one size fits all' solution -- it just depends on what you want to share to whom and with what permissions.  But on the bright side there is a lot of good information out there about configuring NFS exports:
  • The NFS HOWTO has good, clear detail
  • The Ubuntu Help Wiki also has good information, though it's a bit verbose for my taste
Of course, searching Google for "NFS exports" will return a huge amount of help too.
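
As a starting point, an exports entry generally has the shape "directory client(options)".  The path and client address below are placeholders rather than values from our setup, and the options are just one common combination; adjust them to what you actually want to allow.

```text
# /etc/exports (hypothetical example: placeholder path and client address)
# Share /srv/shared read-write with a single client.
/srv/shared  10.0.0.12(rw,sync,no_subtree_check)
```

Note that there is no space between the client and the opening parenthesis; a space there changes who the options apply to.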

Step 3 - Tell the NFS server about the exports

Once you've configured the exports, you need to tell the NFS server that you've done it:
sudo service nfs-kernel-server reload
This will make the NFS server load the configuration you've done and start using it.

Step 4 - Configure EC2 security

This is the magic bit that has to be done for everything else to work.  You need to tell Amazon to allow other systems to connect to your server on the ports that NFS expects to use.

Go to the EC2 dashboard and select "Security Groups" under "NETWORK & SECURITY".  Choose the security group you've put your NFS server in and add the following rules:
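Inbound TCP ports 111, 2049, 44182, 54508
Inbound UDP ports 111, 2049, 32768, 32770 - 32800
Ports 111 (the portmapper) and 2049 (NFS itself) are fixed.  The higher-numbered ports belong to NFS's helper services (mountd, statd, lockd), which by default get their ports assigned dynamically, so they may differ on your server; running rpcinfo -p on the server lists the ports actually in use.

You should also be sure to limit these to a specific IP address (or range if you must).  Leaving these at the default of 0.0.0.0/0 will allow anyone on the Internet to connect to your server.  (You can -- and should -- also restrict this in the NFS configuration, but there's no sense in leaving ports open to anywhere you don't have to.)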
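
If you'd rather not open a range of high ports, one option is to pin the helper services to fixed ports so the security group rules stay valid across restarts.  The fragment below is a sketch, assuming the Debian/Ubuntu defaults files; the port numbers are arbitrary examples, and lockd is configured separately and isn't covered here.  You would need to restart the NFS services afterwards and open the chosen ports in the security group.

```text
# /etc/default/nfs-kernel-server (example port; pick your own)
RPCMOUNTDOPTS="--manage-gids --port 32767"

# /etc/default/nfs-common (example ports; pick your own)
STATDOPTS="--port 32765 --outgoing-port 32766"
```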

Step 5 - You're done!

At this point the server should be working!  There is a lot more that you can configure but the defaults should be enough to get you running.  Now it's just a matter of configuring the clients to mount the shares.
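
For the client side, a minimal sketch of a mount entry is below.  The server name, export path, and mount point are placeholders; it also assumes the client has the nfs-common package installed and that the mount point directory already exists.

```text
# /etc/fstab on a client (hypothetical server name, export path, and mount point)
nfs-server.example.internal:/srv/shared  /mnt/shared  nfs  defaults  0  0
```

After adding the line, sudo mount /mnt/shared should mount the share, and it will be mounted again at boot.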