

Architecture of TapiocaStor

©2001 The TapiocaStor Group



1 Overall architecture

See Figure 1.

TapiocaStor is a highly componentized distributed network backup program, architected so that something working could be produced quickly, with lots of people working independently on small parts of the project. Almost every component can be tested in a stand-alone manner, and we will be able to do actual network backups and restores long before the program is anywhere near finished.

TapiocaStor consists of these major components:

  1. tapio - the backup and restore engine. This is a highly componentized set of 'black boxes' which are strung together by 'glue'. They live in a 'gluepot' ('glue' won't glue together anything that's not in the 'gluepot'). Note that these are Unix-style components - they are stream based, not record based like COM or CORBA components, and most of them can be tested manually by feeding them pre-fab inputs and capturing their outputs via normal Unix/Linux shell redirections.

  2. glue - uses a recipe (provided by the Central Authority) to decide which components need to run on which systems in order to perform the requested operation. Uses various types of 'plumbing' to glue the components together, such as pipes, network connections, SAN addresses, etc.

  3. plumbers - some plumbing needs plumbers (plumbing services) running in order to operate. Plumbers are small servers that basically accept connections and then forward data back and forth from remote systems.

  4. pudding - The whole glued-together chain of components is called a Pudding. (Tapioca. Pudding. Yum).

  5. tapioca - the tapio central authority. This is responsible for deciding what glue to use to chain together tapio components to do the desired backup and restore operations, handling storage management, and handling the central data store (a MySQL database as of the time of this writing).
  6. Client programs (tapigui, tapiweb, tapicli) - these communicate with tapioca via the tapicom protocol in order to give it the proper instructions for performing backup and restore.

Figure 1: Overall Architecture
\begin{figure}\centerline{\epsfxsize=4in \epsfbox{overall.eps}}\end{figure}

2 Tapio Components

2.1 Overview

Tapio components are all Unix-style components. Unix-style components traditionally process a stream of data. They traditionally accept data on their stdin, process it, and put out data on their stdout. They have command line arguments that tell them what to do and, optionally, name other data sources or destinations. They also have a stderr where they produce error messages for consumption or logging.

The basic sorts of tapio components are:

  1. Sources - these obtain data from someplace and feed a stream of data into some other component in the Pudding. The filesystem tree reader part of tapio is an example - it walks the filesystem looking for files matching its input patterns, and outputs a backup stream on its stdout. The Plumbing then takes care of sending that backup stream onward to the next component in the pudding.

  2. Sinks - these absorb data and store it someplace. The tape writer part of tapio is an example - it accepts a stream of backup data on its input and stores it on tape, taking care to handle things like end of volume and reporting where particular chunks of data are located on the tape.
  3. Processors: These sit in the middle of a stream of data. They take in a stream of data on their input, perform some processing (such as, perhaps, inserting some log information into a database), and ship the data on outwards. The filelogger that sits in the stream of data that's about to go to the tape writer is an example of this - it sends data to the tape writer, the tape writer reports where it stored the data, then the filelogger writes the filename and its tape position into a database so that the file can be easily located later.
  4. Tees: These take an input stream, and divide it into multiple forks. For example, if you want to duplex a backup (e.g. to make both a nearline and an offsite backup), you would use a Tee to split the backup stream into multiple streams.
  5. Funnels: These take multiple input streams and funnel them down to one output stream. For example, when doing incremental backups of half a dozen systems, you might want to use a Funnel to multiplex that data. Otherwise your tape drive is going to be shoe-shining like mad, or will be writing data at far less density than it is capable of (in the case of something like an Ecrix).

2.2 Sources

Figure 2: Examples of Sources
\begin{figure}\centerline{\epsfxsize=3in \epsfbox{sampsources.eps}}\end{figure}

Sources (see Figure 2) take data from some device or external source, and output a backup stream. Some example sources are:

  1. src_filesys: given a list of files on its stdin, it creates a backup stream on its stdout. (Sort of cpio-ish, I know, but this can be hidden by higher layers).
  2. src_tape: Given file specifications on its input, reads one or more tapes and produces a backup stream (well, restore stream) on its output. Also needs a message_pipe and response_pipe for asking other components to handle end of tape and other such issues.

It is obvious that we can also easily have things like src_ndmp (to get a stream from an NDMP device), src_oracle (to get an Oracle backup stream without having to take Oracle offline), etc.

2.3 Sinks

Figure 3: Examples of Sinks
\begin{figure}\centerline{\epsfxsize=3in \epsfbox{sampsinks.eps}}\end{figure}

Sinks (see Figure 3) take a backup stream, and consume it by storing it on some sort of media. Some example sinks are:

  1. sink_filesys: Given a backup stream on its input, stores files extracted from that backup stream into the specified location. Output: a list of files stored, and any errors that occurred during the store.
  2. sink_tape: Given a backup stream on its input, stores the data on a tape drive. Produces a list of stream blocks and the location they were stored on the tape drive upon its output. Also has two other channels: message_pipe outputs control messages sent about, e.g., end of tape, while response_pipe reads responses that tell the sink what to do (e.g., resume writing, with tape ID now being xyz and volume ID now being abc).

It should be obvious by now that we can write sinks that can save data to a wide variety of destinations. If we want to write an archive to a ZIP disk rather than to tape, for example, we just need to write the appropriate sink and use it in our tapioca pudding rather than the sink_tape sink. Similarly, if we wish to restore data to an NDMP-based network appliance rather than to a filesystem, all we need to do is write the appropriate sink and use that in our pudding rather than the sink_filesys sink.

2.4 Processors

Processors (see Figure 4) take a stream, and do some processing on it. In the process they may touch the filesystem or a SQL database to obtain some information. Not all processors process the data stream itself; some process filename streams or otherwise do work that is important for backup and restore.

Some example processors are:

  1. proc_treewalk: Given a list of things to back up, along with possible actions associated with those things, builds a list of file names to feed to src_filesys. Works similarly to 'find' in that it can, e.g., handle 'find all files changed since a timestamp file was changed', but also has the ability to do things like, e.g., take a database offline or put it online, via a command execution facility. Requires the src_filesys module to feed it a list of files written, where a file is added to the list *after* it has already been written (this allows the database offline/online stuff to work).

  2. proc_record: Given an input stream from a source, and a block location stream from a sink, writes out a record of what files are located at what position on the backup media. This record is written out to the database and may also be appended to the backup media.

Figure 4: Examples of Processors
\begin{figure}\centerline{\epsfxsize=4.5in \epsfbox{sampproc.eps}}\end{figure}

2.5 Funnels and Tees

These basically either combine multiple streams (funnels) or split a single stream into multiple streams (tees).

An example funnel is fun_mux (Figure 5). The multiplexer funnel operates by taking incoming streams from multiple systems, and combining them into one stream. It understands the basic structure of the backup streams and thus is able to keep the backups from stomping on each other. This allows multiple backups to go to the same tape in parallel (useful for doing network incremental backups).

Block location data is passed back upstream to the appropriate source so that data can be properly logged and recorded by an upstream processor responsible for that particular stream of data.

Figure 5: Examples of Funnel
\begin{figure}\centerline{\epsfxsize=3in \epsfbox{sampfun.eps}}\end{figure}

An example tee is tee_mirror (Figure 6). It takes a single input stream, and splits it into two output streams, so that a single backup can be duplexed (written to two different destinations). At the moment, due to limitations on how the tapioca store maintains backup information, only one of these streams will get its details logged (the Central Authority tells the 'glue' which stream's details to pass upstream). Since upstream could have been fun_mux, the data logger processors are upstream of tee_mirror.

Figure 6: Examples of Tee
\begin{figure}\centerline{\epsfxsize=2in \epsfbox{samptee.eps}}\end{figure}

3 Plumbers and Plumbing

3.1 Types of Plumbing

There are basically three kinds of plumbing used in 'tapio': anonymous pipes (i.e., created by a pipe(2) call), named pipes (used for plumbing other than stdin/stdout/stderr), and sockets (used by the plumbers to simulate pipes on remote systems). As far as all components are concerned, they are all sucking on or writing to pipes. Sockets are used only by the plumbers to (transparently) run a component on the remote end.

Since pipes and named pipes as plumbing are covered by a number of excellent books (such as Stevens), the plumbers will be the focus here. The plumbers take a variety of input pipes, multiplex them over a socket, and demultiplex them on the other end of the socket. They do the same thing with output pipes.

3.2 Network Plumbing

The plumb command installs some plumbing and talks to the designated remote plumber. A command is passed to the plumber telling it what component in the gluepot to execute on the other side of the connection. All local read and write named pipes named in the plumb command are echoed across the network connection, as are stdin, stdout, and stderr. The protocol allows a maximum of 254 filehandles shipped across the network this way (the other two are used for control). In practice, more than 4 or 5 is unlikely.

plumb and netplumber (the server that lives on the other end of the connection) handle network encryption internally. They use the OpenSSL library to communicate via the ARC4 stream cipher with RSA key exchange. If you are concerned about performance, OpenSSL also allows connections to be 'plain text' (i.e., with no encryption), which still, however, provides for strong authentication using MD5 or SHA1 checksums.

4 Sample Puddings

A simple network backup looks like figure 7. Note that stderr for the entire pudding gets sent to tapioca (the tapio control authority), which presumably logs it. The record processor may also be sending some progress info to tapioca for display to the user. Neither of these are shown because neither affects the internal workings of the pudding. Note that the netplumber takes care of forwarding stderr as well as the two shown communication channels.

Figure 7: A simple network backup
\begin{figure}\centerline{\epsfxsize=4in \epsfbox{backupPudding.eps}}\end{figure}

A more sophisticated pudding will probably include 'lzop' on the data source side of things (system B) to compress the network stream, and 'lzop -d' between 'plumb' and the record keeper to uncompress the network stream. That way the data will be compressed going over the network.

5 Tapioca

5.1 Overview

The tapio central authority (see Figure 8) is a Java program which allows remote execution of operations on behalf of end users. It accepts commands from clients, dispatches them to the appropriate component for processing that command, and keeps the appropriate services running that are used by those components.

Despite the opinion of this author regarding threads, the tapio central authority is a multithreaded program. This is due to the fact that Java is not comfortable with the notion of a ``process''. Java has some language features that make it somewhat more ``thread safe'' than most languages, but it is envisioned that this will make the central server slightly less robust than an architecture based upon independent processes. It was judged that this was an acceptable tradeoff, given the advantages of Java in terms of portability and the like. Do note that puddings are spawned off as processes - the components therein are written in ``C'' or C++ and thus do not have Java's aversion to fork().

Figure 8: Overview of Central Authority
\begin{figure}\centerline{\epsfxsize=4.5in \epsfbox{caoverview.eps}}\end{figure}

The major components of the central authority are:

  1. Command dispatcher - accepts commands from clients, and dispatches them to the appropriate tapioca components.
  2. Component pool - these accept requests from clients, perform the required actions, and send data back to clients. Thus ``backup'' is a component in the component pool.
  3. Services pool - this is a set of services that the component pool accesses as needed to manage resources, allocate storage, etc.
  4. Glue controller - this special service allows creating puddings (a pool of interconnected processes possibly running in a distributed fashion to perform a set of tasks in the network).
  5. Device control - these are ``C'' or C++ components for doing the dirty work of moving tapes in loaders, positioning to a location on a tape, etc., under the control of the storage manager. Note that some components in the glue pot (set of components that can be run by the glue controller) themselves have some device control (e.g. the SCSI tape reader can position to various locations on a tape when restoring single files, to avoid having to read the entire tape).

5.2 To be continued

When we get to the details of the central authority, we will dissect the architecture further here. At the moment the central authority's design is only roughly sketched, and work on it is still a couple of months away, so there isn't anything else to describe here. The priority at the moment is the pudding - we need to be able to do network backups before we need to worry about how to manage them.

6 Clients

6.1 Overview

Clients may live anywhere on the network that is capable of connecting to the central authority. Clients communicate with the central authority via the tapicom protocol. The tapicom protocol is similar to Kerberos in that you check out a session ticket via a ``login'' process, and all further communications are authenticated using that ticket. All communications are encrypted for security purposes. All operations other than the initial login require that you have already associated a valid (machine, login id, password) triplet with your session. All logins are authenticated against the machine indicated, unless there is already a (machine, login id, password) triplet in tapioca's own user database for that user (i.e., unless it is a server-authenticated user). Tickets remain valid until you log them out. On the server side, ``tapioca'' stores the ticket information in a database, so tickets never time out. The password itself is not stored - it is discarded after being checked for validity.

The tapicom protocol is command-response oriented. Commands are issued to the central authority, and a response is received. More will be documented here as the details of tapicom are worked out.

All clients for Tapioca are written in Java.

There are three clients for Tapioca:

  1. Command Line Interface - this interface is the crudest and simplest, yet it is harder to write than it would first appear.

  2. GUI Interface - this is a Java Swing interface. It is intended to run as a standalone application, *not* as a web applet. It uses features of the latest Java Development Kit (1.3.1 as of this writing) and thus will not work with the antiquated JVM in most browsers.

  3. Web interface - this is a Java Server Pages application, written using the Tomcat application server ( http://jakarta.apache.org/tomcat ). It can be installed on any server on the network that can connect to the Central Authority. The generated pages contain JavaScript, but no Java, due to the instability of the JVM in most browsers.

7 About

This document was written in LaTeX on Red Hat 7.1 Linux. All figures were drawn using 'xfig' and included in the document via the epsfig package. No non-free software was used in the production of this document.

About this document ...

Architecture of TapiocaStor

This document was generated using the LaTeX2HTML translator Version 2K.1beta (1.47)

Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.

The command line arguments were:
latex2html -local_icons -no_auto_link -no_subdir -split 0 -show_section_numbers Main.tex

The translation was initiated by The Unknown Hacker on 2001-07-01

