My God, it's full of files
pythonic filesystem abstractions
Overview of different filesystem(-like) APIs in Python and attempts to unify them
There are many different filesystem(-like) APIs in Python. I intend to provide an overview of existing projects, their status and capabilities, and hopefully inspire you to work on improving things.
I intend to cover at least:
- sync/async nature of access
- path.py
- twisted.vfs
- twisted.filepath
- FUSE
- git (including my own project)
- Allmydata Tahoe
- CouchDB
- SFTP, NFS, Plan 9
Me
So what's a filesystem
- well, it has files
- folders (or some such)
- open, read/write, close
- file handle, seek position
- unlink, rename, mmap?
Filesystems
ext3, HFS+, VFAT, NTFS
NFS, CIFS
9P
Ceph, Allmydata Tahoe, Hadoop FS, ...
SFTP
S3
CouchDB
Venti + Fossil
git
FUSE
http://fuse.sourceforge.net/wiki/index.php/FUSE%20Python%20tutorial
GnomeVFS
KDE Input/Output (KIO)
APIs
Current Python API
Spread all over the place, built-ins and miscellaneous stdlib
file(), open(), file objects
os.listdir(), os.walk()
os.path.exists(), os.chdir()
os.unlink(), os.rename()
stat, mmap
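To make "spread all over the place" concrete, here is a small sketch that handles a single file with calls from the built-ins, os, os.path and tempfile. The function name scattered_api_demo is made up for illustration.

```python
import os
import tempfile


def scattered_api_demo():
    """Handle one file using calls from several corners of the stdlib."""
    top = tempfile.mkdtemp()                 # tempfile: scratch directory
    name = os.path.join(top, "foo")          # os.path: path manipulation
    with open(name, "w") as f:               # built-in: open a file object
        f.write("foo bar\n")
    listing = os.listdir(top)                # os: directory listing
    os.rename(name, name + ".old")           # os: rename
    still_there = os.path.exists(name)       # os.path: existence check
    os.unlink(name + ".old")                 # os: unlink
    os.rmdir(top)
    return listing, still_there
```

Every call reaches for the one global, process-wide filesystem; there is no object you could swap out.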
... Considered Hurtful
Where the pain started
Twisted
async
network
other process
uncontrollable delays
Deferred
twisted.vfs
good idea, but don't use
(sorry Andy)
Conch ISFTPServer
limited-purpose reimplementation of part of twisted.vfs
Things learned
async doesn't look like sync
Deferred not going to stdlib?
♪♫ pig and elephant DNA
just won't splice ♪♫
Concentrating on sync
(for now)
(but don't forget async)
(network still important)
Why replace?
Lots of reasons
Quick list
More detail later
Here we go
Non-native filesystems
Mockability
Mock writing to /etc
Fault injection
See Petardfs for a generic FUSE fault-injecting filesystem.
Not nice for unit tests, but maybe for system tests.
Security
No .. from user
twisted.python.filepath
t.p.filepath
from twisted.python.filepath import FilePath

top = FilePath('toplevel')
sub = top.child('foo')
sub.createDirectory()
p = sub.child('bar')
with p.open('w') as f:
    f.write('foo bar\n')
Security
Invisible dotfiles
Security
Virtual chroot
Security
Custom ACL
Nicer API
More features, where they exist
Transactions
Current API won't do
Global is bad
→ Accessible via single object
f = self.fs.open("myfile")
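A minimal sketch of what "accessible via single object" could look like. LocalFS and Greeter are hypothetical names, and the only method assumed on the filesystem object is open(), as on the slide.

```python
import os


class LocalFS:
    """Hypothetical filesystem object backed by a real directory."""

    def __init__(self, root):
        self.root = root

    def open(self, name, mode="r"):
        return open(os.path.join(self.root, name), mode)


class Greeter:
    """Application code that touches files only through self.fs."""

    def __init__(self, fs):
        self.fs = fs

    def save(self, text):
        with self.fs.open("myfile", "w") as f:
            f.write(text)

    def load(self):
        with self.fs.open("myfile") as f:
            return f.read()
```

Because Greeter never calls the global open() or os.* itself, whoever constructs it decides which filesystem it sees.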
→ Mockability
implement needed part, KISS, croak on anything unwanted
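One way this might look in a unit test: an in-memory fake that implements only open() for plain reads and writes and complains about everything else. FakeFS and _MemFile are invented for this sketch and are not part of any library.

```python
import io


class _MemFile(io.StringIO):
    """StringIO that stores its contents back into the fake on close."""

    def __init__(self, store, name):
        super().__init__()
        self._store, self._name = store, name

    def close(self):
        if not self.closed:
            self._store[self._name] = self.getvalue()
        super().close()


class FakeFS:
    """Implements just enough for the code under test."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def open(self, name, mode="r"):
        if mode == "r":
            if name not in self.files:
                raise FileNotFoundError(name)
            return io.StringIO(self.files[name])
        if mode == "w":
            return _MemFile(self.files, name)
        raise NotImplementedError("FakeFS does not support mode %r" % mode)

    def __getattr__(self, name):
        if name.startswith("__"):
            raise AttributeError(name)
        # Croak loudly on anything the test did not plan for.
        raise NotImplementedError("FakeFS has no %s()" % name)
```

With this, "writing to /etc" in a test only ever touches a dict.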
→ Fault injection
wrap another implementation
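A sketch of the wrapping idea, assuming the wrapped object only needs an open() method. FaultyFS, its fail_on parameter and the DictFS backend are hypothetical.

```python
import errno
import io


class DictFS:
    """Tiny in-memory backend, used here only to have something to wrap."""

    def __init__(self, files):
        self.files = dict(files)

    def open(self, name, mode="r"):
        return io.StringIO(self.files[name])


class FaultyFS:
    """Wraps another filesystem object and fails chosen names on demand."""

    def __init__(self, fs, fail_on=()):
        self.fs = fs
        self.fail_on = set(fail_on)

    def open(self, name, mode="r"):
        if name in self.fail_on:
            raise OSError(errno.EIO, "injected I/O error", name)
        return self.fs.open(name, mode)

    def __getattr__(self, attr):
        # Everything else passes straight through to the wrapped object.
        return getattr(self.fs, attr)
```

The code under test cannot tell the difference between this and a disk that really is failing.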
→ No .. from user
p = self.fs.path("/my/safe/area")
p = p.child(user_input)
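A rough sketch of how child() could refuse to leave the safe area. The Path class here is hypothetical; the exception name InsecurePath is borrowed from twisted.python.filepath, but the checks are my own simplification and not Twisted's implementation.

```python
import posixpath


class InsecurePath(Exception):
    """Raised when a path segment could escape its parent."""


class Path:
    def __init__(self, pathname):
        self.pathname = pathname

    def child(self, segment):
        # Exactly one plain name is allowed: no separators, no "." or "..".
        if segment in ("", ".", "..") or "/" in segment or "\0" in segment:
            raise InsecurePath(segment)
        return Path(posixpath.join(self.pathname, segment))
```

User input can then only ever name something directly below the path it was applied to.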
→ Invisible dotfiles
self.fs = NoDotfilesFS(self.fs)
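What NoDotfilesFS might do internally, as a sketch only: the listdir() and open() signatures, and the DictFS backend, are assumptions made for the example.

```python
import io


class DictFS:
    """Tiny in-memory backend with a flat namespace."""

    def __init__(self, files):
        self.files = dict(files)

    def listdir(self):
        return sorted(self.files)

    def open(self, name, mode="r"):
        return io.StringIO(self.files[name])


def _is_hidden(name):
    # Hidden if any path component starts with a dot.
    return any(part.startswith(".") for part in name.split("/"))


class NoDotfilesFS:
    """Wrapper that pretends dotfiles do not exist."""

    def __init__(self, fs):
        self.fs = fs

    def listdir(self):
        return [n for n in self.fs.listdir() if not _is_hidden(n)]

    def open(self, name, mode="r"):
        if _is_hidden(name):
            raise FileNotFoundError(name)
        return self.fs.open(name, mode)
```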
→ Virtual chroot
self.fs = ChrootFS(
    fs=self.fs,
    root="/my/safe/area",
    )
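A sketch of the path translation such a wrapper would need, assuming POSIX-style names. The keyword arguments fs and root come from the slide; the method bodies and the RecordingFS stand-in are invented. It does not deal with symlinks inside the root.

```python
import posixpath


class RecordingFS:
    """Stand-in backend that only records which real path was requested."""

    def __init__(self):
        self.opened = []

    def open(self, name, mode="r"):
        self.opened.append(name)


class ChrootFS:
    def __init__(self, fs, root):
        self.fs = fs
        self.root = root

    def _real(self, name):
        # Normalise against "/" so ".." can never climb above the virtual root.
        inside = posixpath.normpath(posixpath.join("/", name))
        return posixpath.join(self.root, inside.lstrip("/"))

    def open(self, name, mode="r"):
        return self.fs.open(self._real(name), mode)
```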
→ Custom ACL
self.fs = AccessControlFS(
    fs=self.fs,
    acl=acl_rules,
    )
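The slide does not say what acl_rules looks like. The sketch below assumes a list of (path prefix, allowed open modes) pairs where the first matching prefix decides and anything unmatched is denied; that shape, and the DictFS backend, are purely illustrative.

```python
import io


class DictFS:
    """Tiny in-memory backend used only for this demo."""

    def __init__(self, files):
        self.files = dict(files)

    def open(self, name, mode="r"):
        return io.StringIO(self.files.get(name, ""))


class AccessControlFS:
    def __init__(self, fs, acl):
        self.fs = fs
        self.acl = acl  # assumed shape: [(path_prefix, allowed_modes), ...]

    def open(self, name, mode="r"):
        for prefix, allowed in self.acl:
            if name.startswith(prefix):
                if mode in allowed:
                    return self.fs.open(name, mode)
                break  # first matching rule decides
        raise PermissionError("%s denied for mode %r" % (name, mode))
```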
→ Nicer API
path.py?
The path.py website at http://www.jorendorff.com/articles/python/path fails to load unless you do it on a full moon, in front of a mirror, and reload three times.
In the end, I don't think path.py is a suitable base for this:
- It tries to be a string.
- It's cluttered; it even includes md5sum calculation and shutil calls. I think a base common API shouldn't include rmtree.
It's probably a nice pragmatic helper, just not a good common API.
path.py
from path import path

top = path('toplevel')
sub = top / 'foo'
sub.mkdir()
p = sub / 'bar'
with p.open('w') as f:
    f.write('foo bar\n')
→ Transactions
POSIX: atomic replace of single-file
could do atomic-write-on-close
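Atomic-write-on-close can be sketched on a plain POSIX filesystem with a temporary file plus os.replace(). AtomicWriter is a hypothetical helper name; the temp file is created in the target's directory so the rename stays on one filesystem.

```python
import os
import tempfile


class AtomicWriter:
    """Write to a temp file; rename it over the target only on success."""

    def __init__(self, path):
        self.path = path

    def __enter__(self):
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, self._tmp = tempfile.mkstemp(dir=directory)
        self._f = os.fdopen(fd, "w")
        return self._f

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self._f.flush()
            os.fsync(self._f.fileno())
            self._f.close()
            os.replace(self._tmp, self.path)  # atomic replace of one file
        else:
            self._f.close()
            os.unlink(self._tmp)              # leave the old contents alone
        return False
```

Readers see either the complete old file or the complete new one. Note that the result takes on the temp file's permissions, not the old file's.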
→ Transactions
with git: arbitrary!
(sql-style redo on conflict)
with self.fs.transact() as t:
    t.path("foo").rename("foo.old")
    with t.path("foo").open("w") as f:
        f.write("bar\n")
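To show the shape of transact() without git, here is an in-memory sketch: changes are staged in a copy and published in one step when the with-block exits cleanly. MemFS and the underscore-prefixed helpers are invented names, and conflict detection and redo are left out entirely.

```python
import contextlib
import io


class _StagedFile(io.StringIO):
    """Buffer that lands in the staging area when closed."""

    def __init__(self, staged, name):
        super().__init__()
        self._staged, self._name = staged, name

    def close(self):
        if not self.closed:
            self._staged[self._name] = self.getvalue()
        super().close()


class _TxnPath:
    def __init__(self, staged, name):
        self._staged, self._name = staged, name

    def rename(self, new_name):
        self._staged[new_name] = self._staged.pop(self._name)

    def open(self, mode="r"):
        if mode == "w":
            return _StagedFile(self._staged, self._name)
        return io.StringIO(self._staged[self._name])


class _Txn:
    def __init__(self, staged):
        self._staged = staged

    def path(self, name):
        return _TxnPath(self._staged, name)


class MemFS:
    def __init__(self, files=None):
        self.files = dict(files or {})

    @contextlib.contextmanager
    def transact(self):
        staged = dict(self.files)
        yield _Txn(staged)
        # Reached only if the with-body did not raise: publish everything at once.
        self.files = staged
```

If anything inside the block raises, self.files is never reassigned, so the rename and the write vanish together.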
→ Can implement near-identical async API
Thank You
Questions? Opinions? Rants?
Find me during the conference or sprints to talk more.
Slides etc up on eagain.net