Venti

From FBSD_tips

WARNING: THESE METHODS ARE DESTRUCTIVE. BACK UP ALL DATA BEFORE YOU START, OR IT WILL BE LOST.


What is Venti?

Venti is a network archival storage system. It has two main operations: write, which stores a given block of data on the server and returns a 'score' (the SHA-1 hash of the block that was written); and read, which, given a score, returns the data. There are no operations such as delete, modify, or even list. As such, Venti is 'permanent', and has a few key features:

  • Blocks are consolidated: if two writes contain the same data, they get the same score, so the data is stored only once.
  • Because scores are 160-bit SHA-1 hashes, accidental collisions are extremely unlikely and scores are impractical to guess. This means that Venti can run with no permission checks or access control checks. If you have the hash, you can retrieve the data.
  • Protection against man-in-the-middle attacks rests on the hashing algorithm: as long as SHA-1 has no collision attacks, you can be assured that the blocks returned by the server are the blocks you requested. Most importantly, you can verify this yourself by hashing each returned block and comparing the result with its score.
  • Compression and encryption are simple to add to Venti. Compression is already done by Venti, and encryption can be layered on top of it.
  • Because it's an append-only log, a lot of things are considerably more robust. Nothing is ever overwritten; changes are simply 'copy-on-difference'.
  • Multiple completely independent machines, with completely independent formats, can safely write to the Venti server at the same time. Because different blocks return different scores, their data is kept separate. But if data is duplicated, it is consolidated transparently.
  • Anyone on the network can use the Venti server for whatever they need. All it does is write blocks and keep indexes.
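
The write/read contract and the consolidation property can be mimicked locally with a directory whose file names are SHA-1 digests. This is only an illustrative sketch, not the Venti protocol or its on-disk format: score_of, put, get, and verify are hypothetical helper names, and it assumes either FreeBSD's sha1 or GNU sha1sum is installed.

```sh
#!/bin/sh
# Toy content-addressed store: a directory whose file names are SHA-1 scores.
# All function names are hypothetical; this is not how venti stores data.

store=$(mktemp -d "${TMPDIR:-/tmp}/store.XXXXXX")

# Print the SHA-1 hex digest of stdin (FreeBSD sha1, else GNU sha1sum).
score_of() {
    if command -v sha1 >/dev/null 2>&1; then
        sha1 -q
    else
        sha1sum | awk '{ print $1 }'
    fi
}

# "write": store stdin under its score and print the score.
put() {
    tmp=$(mktemp "$store/tmp.XXXXXX")
    cat >"$tmp"
    score=$(score_of <"$tmp")
    mv "$tmp" "$store/$score"    # identical data lands on the same name
    echo "$score"
}

# "read": print the block stored under the given score.
get() {
    cat "$store/$1"
}

# Re-hash a block and compare the result with the score it was requested by.
verify() {
    [ "$(get "$1" | score_of)" = "$1" ]
}
```

Writing the same bytes twice with put prints the same score and leaves a single file in the store, and verify succeeds only if the stored bytes still hash to the name they were requested by.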

Venti's on-disk format is logically split into two areas: the 'index' and the 'arenas' (there is also a performance optimization called 'bloom'; see the manpage for a description). The index maps scores to locations in the arenas, while the arenas are the append-only log, logically split into large blocks (512M by default) so they can be written to other media. The index is the most critical portion, so it's recommended that it be given small, dedicated, fast disks (my example uses four 10K RPM SCSI disks, which some consider overkill, but I had them free). Slow index sections (or index sections too small for the data log) will severely hamper Venti's speed, as it has to consult the index on every read and write.

There are some superficial similarities between Venti and ZFS. However, ZFS's transaction log is intended as a 'non-permanent' implementation of similar concepts, and the implementation resembles UFS with soft updates more than it resembles Venti.

Please visit [1] for a more thorough explanation.

Installing required ports

All you need is sysutils/plan9port installed.

Setting up the disks

Now, we are going to assume the following setup:

  • Four 10GB SCSI Ultra-160 10K RPM disks (da0, da1, da2, da3)
  • One 400GB SATA300 7200 RPM disk (ad6)

We're going to use the four SCSI disks as the index, and the SATA disk as the arena.

Due to a small bug in plan9port, you can't use disks directly; you have to use files. I'm hoping someone will step up to the plate and fix the bug. As far as I can tell it's only in the fmt utilities, so if something else formats them, Venti can use device nodes directly. So, we'll give the disks proper filesystems (and an MBR label, so things like fsck won't complain) and mount them under /disks:

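# WARNING: this wipes every disk listed here.
for i in da0 da1 da2 da3 ad6
do
    # One slice covering the whole disk, plus fresh boot code in sector 0
    fdisk -IB /dev/${i}
    # New UFS filesystem labelled vent<disk>; the label shows up under /dev/ufs/
    newfs -L vent${i} /dev/${i}s1
    mkdir -p /disks/vent${i}
    mount /dev/ufs/vent${i} /disks/vent${i}
done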

Now, we'll create a file as large as the filesystem, using dd. Let's get the size of the filesystem from df in 1k blocks:

# df
Filesystem       1K-blocks    Used     Avail Capacity  Mounted on
/dev/ad4s1a       18935374 3169118  14251428    18%    /
devfs                    1       1         0   100%    /dev
/dev/ufs/ventda0   8679996       4   7985594     0%    /disks/ventda0
/dev/ufs/ventda1   8679996       4   7985594     0%    /disks/ventda1
/dev/ufs/ventda2   8679996       4   7985594     0%    /disks/ventda2
/dev/ufs/ventda3   8679996       4   7985594     0%    /disks/ventda3
/dev/ufs/ventad6 378415398       4 348142164     0%    /disks/ventad6

We'll pass the 'Avail' figure to dd's 'seek' parameter, subtracting a single block:

dd if=/dev/zero of=/disks/ventda0/data bs=1k seek=7985593 count=1

Repeat for all disks, using each filesystem's own 'Avail' figure.
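
The per-disk arithmetic can be scripted rather than read off df by hand. The following is a minimal sketch under a few assumptions: last_block and make_data_file are hypothetical helper names, df is called with -kP so that each filesystem is reported on a single line in 1K blocks, and the mount points are the ones created earlier.

```sh
#!/bin/sh
# last_block: read `df -kP` output for ONE filesystem on stdin and print
# its Avail column minus one, i.e. the value to hand to dd's seek=.
last_block() {
    awk 'NR == 2 { print $4 - 1 }'
}

# make_data_file (hypothetical): create a file named "data" that fills the
# filesystem mounted at the given directory by writing only its last 1K block.
make_data_file() {
    dir=$1
    seek=$(df -kP "$dir" | last_block)
    dd if=/dev/zero of="$dir/data" bs=1k seek="$seek" count=1
}

# Usage, once the filesystems are mounted:
#   for i in da0 da1 da2 da3 ad6; do make_data_file /disks/vent${i}; done
```

Because only the final block is written, the file is sparse at first, just as with the manual dd command.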

Setting up Venti

First, we'll format the index disks, like so:

for i in da0 da1 da2 da3
do
    /usr/local/plan9/bin/venti/fmtisect isect${i} /disks/vent${i}/data
done

And our data log:

/usr/local/plan9/bin/venti/fmtarenas arena0 /disks/ventad6/data

Now, we'll create our venti.conf (and place it in /etc/venti.conf):

index main
isect /disks/ventda0/data
isect /disks/ventda1/data
isect /disks/ventda2/data
isect /disks/ventda3/data
arenas /disks/ventad6/data
mem 100m
bcmem 100m
icmem 100m
httpaddr tcp!*!http

I've tweaked the memory partitioning to match my machine (1GB of RAM, running nothing else); change it to suit yours, as the Venti server is not able to do this on its own.
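
Before formatting the index, it can be worth checking that every file named in the configuration actually exists, since a typo in a path is easy to miss. A minimal sketch, assuming the simple "keyword path" layout shown above; check_conf is a hypothetical helper, not part of plan9port.

```sh
#!/bin/sh
# check_conf (hypothetical): report any isect/arenas path in a venti.conf
# that does not exist. Returns 0 if all are present, 1 otherwise.
check_conf() {
    conf=$1
    rc=0
    while read -r key path rest; do
        case $key in
        isect|arenas)
            if [ ! -e "$path" ]; then
                echo "missing: $path" >&2
                rc=1
            fi
            ;;
        esac
    done <"$conf"
    return $rc
}

# Usage:
#   check_conf /etc/venti.conf && echo "all index and arena files present"
```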

Next, we'll format the venti system:

/usr/local/plan9/bin/venti/fmtindex /etc/venti.conf

And start our server:

/usr/local/plan9/bin/venti/venti -c /etc/venti.conf

Using Venti on FreeBSD

Note: The Venti client on FreeBSD isn't as flexible as it could be, due to unrelated issues. This is mostly for testing purposes.

First, you need the 'venti' environment variable set to the IP address of your Venti server (I'll use 127.0.0.1):

setenv venti 127.0.0.1

You'll also need to set the PLAN9 environment variable:

setenv PLAN9 /usr/local/plan9

And add the binary path to the end of your PATH:

setenv PATH ${PATH}:/usr/local/plan9/bin
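
The setenv lines above are csh syntax. For readers using sh or a compatible shell, an equivalent sketch (same variable names and values as above) would be:

```sh
# Bourne-shell equivalents of the csh setenv lines.
venti=127.0.0.1            # address of the Venti server
PLAN9=/usr/local/plan9     # plan9port install prefix used in this article
PATH=$PATH:$PLAN9/bin      # append the plan9port binaries to the search path
export venti PLAN9 PATH
```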

Now, we'll write something to the Venti archive:

eh# /usr/local/plan9/bin/venti/write < /COPYRIGHT
2f46cbc4997b25521103efa911281ca474e708e0

What it returns is the SHA-1 'score' of the data, which can then be read back like so:

eh# /usr/local/plan9/bin/venti/read 2f46cbc4997b25521103efa911281ca474e708e0
venti/read 2f46cbc4997b25521103efa911281ca474e708e0 0
# $FreeBSD: src/COPYRIGHT,v 1.8 2006/12/31 16:34:16 delphij Exp $
#       @(#)COPYRIGHT   8.2 (Berkeley) 3/21/94
...

Vbackup

Let's back up a disk to Venti. Note that this currently only works with ext2 and ufs2 filesystems.

/usr/local/plan9/bin/vbackup /dev/ad0s1a >>/etc/vnfs.conf

This creates a backup of the filesystem, using Venti's automatic block consolidation (which makes a full backup and an incremental backup identical), and appends the resulting entry to /etc/vnfs.conf.

Now we'll serve it via the NFS protocol:

/usr/local/plan9/bin/venti/vnfs /etc/vnfs.conf

Let's restore an image stored via vbackup, shall we? First, we'll open /etc/vnfs.conf (or look at the output of vbackup):

mount /eh/2007/1121/dev/md0 ffs:3f313b13cc3c51b485bde961db89535ad2ff6e55 2007/11
# 2007/1121 21:40:23.034 /dev/md0 ffs:3f313b13cc3c51b485bde961db89535ad2ff6e55
mount /dsbsd/2007/1122 ffs:9e2ab1a13dd3fa256147fb4023e0e6dcaf35b3ed 2007/1122/03
# 2007/1122 03:36:29.428 /dev/ad0s1a ffs:9e2ab1a13dd3fa256147fb4023e0e6dcaf35b3e

Notice how it writes the device name by default, as well as the hostname. Useful, eh? Let's restore our previous backup. See the long string after ffs:? That's the 'score'; with it, you can retrieve the matching piece of data from the Venti server (Venti cares little about things like user IDs, permissions, etc). We'll use that to restore an image:

vcat 3f313b13cc3c51b485bde961db89535ad2ff6e55 >/dev/md0
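
Rather than copying scores by hand, they can be pulled out of the file with awk. This is a minimal sketch that assumes every entry has the "mount path type:score date" shape of the sample lines above; list_scores is a hypothetical helper name.

```sh
#!/bin/sh
# list_scores (hypothetical): read vnfs.conf-style text on stdin and print
# "mountpoint score" for each mount line, ignoring the '#' comment lines.
list_scores() {
    awk '$1 == "mount" {
        sub(/^[^:]*:/, "", $3)   # drop the "ffs:" style prefix from the score
        print $2, $3
    }'
}

# Usage:
#   list_scores </etc/vnfs.conf
```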

Vac

Let's look at another application of Venti. There is a utility called 'vac', which is very similar to tar; however, instead of a tape archive, it produces a Venti archive.

cd /usr/bin
/usr/local/plan9/bin/vac -f /root/usr.bin.vac *

Note that /root/usr.bin.vac is only a few bytes; the actual data is stored in the Venti server. This is just a file containing the 'score' of the vac archive on the Venti server.

An interesting property is that if vac is passed the -m option, it will treat any .vac files as files to 'merge' into the archive. For example, I created an archive of all my vac backups of old UNIX systems with a simple vac -m *.vac in one directory. Each .vac appears to be expanded in place, in the directory that contains it, when in reality only its small score is copied (using trivial bandwidth and no disk space).

Also, if you have a .vac containing the score from the last time the directory was vac'd, there is a performance optimization: the -q and -d options. The former consults the metadata for changes, and the latter specifies the previous vac file, like so:

/usr/local/plan9/bin/vac -q -d /root/old-usr.bin.vac -f /root/usr.bin.vac *
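
For repeated snapshots, the choice between a first full run and a -q -d run can be made automatically. A minimal sketch using only the options described above; vac_args is a hypothetical helper, and it assumes the file names contain no spaces, since its output is word-split by the shell.

```sh
#!/bin/sh
# vac_args (hypothetical): print the vac options for writing a new archive
# file, adding -q -d only when a previous .vac file is available.
vac_args() {
    new=$1
    old=$2
    if [ -f "$old" ]; then
        echo "-q -d $old -f $new"
    else
        echo "-f $new"
    fi
}

# Usage:
#   cd /usr/bin
#   /usr/local/plan9/bin/vac $(vac_args /root/usr.bin.vac /root/old-usr.bin.vac) *
```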

vacfs

Next, we'll start a vacfs process to 'serve' our vac archive on the local machine (this can be run on any client machine; all that is needed is the .vac file, which is just a small string).

vacfs -f /root/usr.bin.vac

And, if you have fuse installed, you can connect to it like so:

9pfuse /tmp/ns.$USER.$DISPLAY/vacfs.usr.bin /mnt

Where usr.bin is the part of the vac file name before the .vac suffix.

Now an ls of /mnt will match /usr/bin at the point in time the vac was made.
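
The socket path passed to 9pfuse can be derived from the .vac file name. A minimal sketch that simply follows the naming described above (it assumes the /tmp/ns.$USER.$DISPLAY layout used in this article); vacfs_socket is a hypothetical helper name.

```sh
#!/bin/sh
# vacfs_socket (hypothetical): print the path where, per the naming above,
# vacfs posts its service for a given .vac file.
vacfs_socket() {
    echo "/tmp/ns.$USER.$DISPLAY/vacfs.$(basename "$1" .vac)"
}

# Usage:
#   9pfuse "$(vacfs_socket /root/usr.bin.vac)" /mnt
```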

Note that, unfortunately, fuse4bsd is still experimental; it can lock up under high load in some scenarios (for example, having my /usr/obj/ on a Venti archive and running make installworld caused an instant fu_read deadlock).

Also, there is no 'unvac' utility that could be used analogous to tar -xf or unrar.

Viewing the stats of the server

You may have noticed the httpaddr tcp!*!http line earlier in venti.conf. Venti has a built-in httpd that serves statistics about the server. For example, point your web browser to:

http://machineip/storage
http://machineip/index
http://machineip/log


Please view [2] for more information.
