cat /dev/random

Friday, May 14, 2004

Clusterfuck

I'm thinking about writing some programs to let jobs that can be split up into small, independent tasks (think S.E.T.I., genetic algorithms, neural networks, etc.) be run on any number of available computers. The basic idea is this: a client would check out a task from some server, run it in a sandbox, and then tar up and return the result data. I'm thinking about using chroot, plus a bunch of mount --binds and hard links, to get the needed binaries in place. For security purposes, all mounts would be made read-only, and only hard links to files that can't be modified would be allowed. Also, no suid executables would be allowed, and the client would su to nobody and cd into the new / before running anything, since otherwise the application could break out of jail. Eventually, I'd like to set it up so that standard utilities could be requested by some ID.
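
Roughly, the sandbox setup would look something like the sketch below. This is only an illustration, not anything I've actually written: the jail path, the set of bind-mounted directories, and the task command are all placeholders, and su (plus its libraries and /etc/passwd) has to be reachable from inside the jail for the last step to work.

    #!/bin/sh
    # Rough sketch of the sandbox setup described above.  The jail path,
    # the bind-mounted directories and the task command are hypothetical.
    set -e

    JAIL=/var/tmp/cf-jail          # hypothetical jail directory
    TASK_CMD=/work/run-task        # hypothetical command shipped with the task

    # Build the jail and pull in the binaries the task needs, read-only
    # so the task can't tamper with them.
    mkdir -p "$JAIL/bin" "$JAIL/lib" "$JAIL/etc" "$JAIL/work"
    chown nobody "$JAIL/work"      # the only place the task may write
    for d in bin lib etc; do
        mount --bind "/$d" "$JAIL/$d"
        mount -o remount,ro,bind "$JAIL/$d"   # make the bind mount read-only
    done

    # Run the task inside the jail as nobody; with no suid binaries and no
    # root privileges inside, it shouldn't be able to break back out.
    chroot "$JAIL" /bin/su -s /bin/sh nobody -c "cd / && $TASK_CMD"

    # Tar up the result directory to hand back to the server, then tear down.
    tar -cf result.tar -C "$JAIL" work
    for d in bin lib etc; do umount "$JAIL/$d"; done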



A task would consist of a template filesystem (which chroot would set / to), a set of utility package requests (probably in a later version), a command to run (in the new chrooted pathspace, of course), and a path to tar up and return as the result. A server for this system would, in effect, turn tasks into results, pushing the actual CPU grunt work onto the client machines. The clients would probably have to be run through nice, since they will likely eat up the CPU, and if this is meant to sop up spare CPU cycles it should do so politely.
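
The client side would then be a simple loop: check out a task, run it politely in the jail, and hand the result back. Here's a sketch of what that loop might look like; the server URL, its endpoints, and the 'cmd' and 'result-path' file names are invented for illustration, since the actual protocol is still undecided.

    #!/bin/sh
    # Sketch of the client loop: check out a task, run it, return the result.
    # The server URL, the /checkout and /return endpoints, and the 'cmd' and
    # 'result-path' files are invented for illustration.
    set -e

    SERVER=http://cluster.example.com            # hypothetical task server
    WORK=$(mktemp -d)

    # 1. Check out a task from the server.
    wget -q -O "$WORK/task.tar" "$SERVER/checkout"

    # 2. Unpack the template filesystem and the task description.
    mkdir "$WORK/root"
    tar -xf "$WORK/task.tar" -C "$WORK/root"
    CMD=$(cat "$WORK/root/cmd")                  # command to run in the jail
    RESULT_PATH=$(cat "$WORK/root/result-path")  # path (relative to the new /) to return

    # 3. Run the task in the jail (set up as in the earlier sketch), niced
    #    so it only sops up idle cycles.
    nice -n 19 chroot "$WORK/root" /bin/su -s /bin/sh nobody -c "cd / && $CMD"

    # 4. Tar up the result path and hand it back to the server.
    tar -cf "$WORK/result.tar" -C "$WORK/root" "$RESULT_PATH"
    wget -q --post-file="$WORK/result.tar" -O /dev/null "$SERVER/return"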



I'll probably get started on this within a few weeks. Any suggestions before I start?

4 Comments:

  • Dur... I forgot that the -v option could be used for more than just listing the files inserted/extracted with the -c/-x options. This should make things much easier.

    By Blogger aducore, at June 3, 2004 at 9:52 PM  

  • This comment has been removed by a blog administrator.

    By Blogger aducore, at June 3, 2004 at 9:54 PM  

  • I have a preliminary version working, but it uses a Perl module for processing .tar files which may not scale well to large filesystem images. It doesn't look like tar itself would work any better; I need to be able to read through the tar file with some kind of iterator that doesn't pre-process files, in order to get the security and efficiency this project needs. Time to break out the source for tar (ugh, maybe lager.)

    Clusterfuck accepts a tar file containing two files: 'fsimage' and 'fstype'. 'fstype' holds a string naming the filesystem type (ext2, ext3, ...) used to mount 'fsimage', and 'fsimage' is the filesystem image itself. (A sketch of building and mounting such a task archive appears below, after the comments.)

    Now I'm going to try to add a method of requesting packages of fairly standard utilities, so that tasks don't each need to include standard libraries and utilities. Any preferences on how these requests should be made? The method I'm thinking about is giving each package a unique identifier (like 'coreutils'), or a URL if it isn't standard enough to become a part of clusterfuck. It'll probably be a file containing one identifier or URL per line, perhaps named 'utils'.

    By Blogger aducore, at June 5, 2004 at 3:13 PM  

  • Interesting... I posted a comment after the "Dur..." one about how I couldn't post comments from Mozilla on either my Fedora machine at home or the Red Hat machine I work on at the astronomy department. It appears as though somebody has removed it. Any ideas? Tim and Ben, you are the only people outside of Blogger staff who should be able to do this... did one of you delete it, or am I being censored by Blogger? Let's see if this post stays up.

    By Blogger aducore, at June 7, 2004 at 3:16 PM  
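
For reference, here is a sketch of how one of the fsimage/fstype task archives described in the comments might be built and mounted. The image size, the mount point, and the example package entries are arbitrary; 'fsimage', 'fstype', and 'utils' are just the file names discussed above, and the 'utils' package-request file is still hypothetical at this point.

    #!/bin/sh
    # Sketch of building a task archive (fsimage + fstype + utils) and of
    # how the client could mount it.  Sizes, paths and package names are
    # arbitrary examples.
    set -e

    # --- build side: create a small ext2 image to hold the task ---
    dd if=/dev/zero of=fsimage bs=1M count=32   # 32 MB image; size is arbitrary
    mke2fs -F fsimage                           # put an ext2 filesystem on it
    echo ext2 > fstype                          # filesystem type used to mount fsimage
    printf 'coreutils\nhttp://example.com/pkgs/mytool.tar\n' > utils   # one id or URL per line
    tar -cf task.tar fsimage fstype utils

    # --- client side: unpack the archive and mount the image read-only ---
    mkdir -p /mnt/cf-task
    tar -xf task.tar
    mount -o loop,ro -t "$(cat fstype)" fsimage /mnt/cf-task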
