FreeSBI, a concept for low cost secure distributed storage

FreeSBI (Free Storage Base Initiative, also a floating disc)

Abstract

FreeSBI is an attempt to create a viable long term redundant storage concept that handles more data than traditional RAID systems and provides more flexible handling of disks. The primary keys to the concept are to store each file intelligently on a subset of the disks in the disk pool and to allow the pool to span many separate servers connected through a network. Key ideas of the concept include minimal administration, flexibility, no technology lock-in, no need to plan ahead when creating storage schemes, reliability and the ability to handle any amount of data. As the amount of data grows, so does the need to administer and organize it. Therefore, FreeSBI also contains features to automate advanced administration of large amounts of data in complex environments through file system triggers and scripts.

Background

The storage needs of many organizations and even individual users are quickly becoming larger than backup technology can handle in a cost efficient way. There is also a need for flexible, long term storage solutions, even if the amounts of data are relatively small. Traditional RAID arrays can handle the needs up to a point, but as data grows beyond that point, they become impractical. There clearly is a need for reliable storage of large volumes of data, "large" in this case being, for practical purposes, unlimited. Traditional RAID arrays, while a better solution than non-redundant storage, have some serious drawbacks:
The FreeSBI concept tries to overcome these problems through a more dynamic solution, distributed over several servers. The basic concept has these advantages:
There are drawbacks, mainly concerning performance, which will be limited by network performance. As this solution is aimed at large data stores or long term storage, rather than high performance applications, this is a reasonable trade-off. A file server will not be able to serve data faster than the network can handle anyway.

In addition to the traditional storage requirements, today's complex environments require much manual work to maintain. FreeSBI tries to provide at least some measure of automation for this, through file system triggers and scripting, giving the system administrator or power user the possibility to automate common tasks. Not only does this reduce manual work, it guarantees that exactly the same actions will be taken every single time, without fail. This mechanism in many ways closely echoes triggers and stored procedures in databases.

Terminology

To avoid confusion, some recurring terms are defined for this document:
Basic theory of redundancy

To make things clearer, we'll start with a short introduction to redundancy and why it's used.

Mirroring

In its simplest form, mirroring, data is simply stored twice, on separate disks. This, obviously, is not very scalable as data amounts increase, and also doubles disk space requirements. Mirroring is used in RAID 1.

Parity

A slightly smarter approach is the parity set (sometimes called XOR set). In this case, we need three or more disks. Data is divided among all but one of the disks, and on the last disk, a parity block is stored. Let's do a simple example of a two byte file in a three disk array:

File contents (binary): 01011100 00110101
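The split of this file across the three disks can be worked out with a few lines of Python. This is only a minimal sketch of XOR parity; the function names parity_block and rebuild_missing are made up for the illustration and are not part of any FreeSBI implementation.

```python
from functools import reduce


def parity_block(blocks):
    """XOR equal-length blocks together, byte by byte, to get the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Rebuild one lost block: it is the XOR of all surviving blocks, parity included."""
    return parity_block(surviving_blocks)


# The two byte file from the text: one byte per data disk.
file_contents = bytes([0b01011100, 0b00110101])
disk1 = file_contents[:1]
disk2 = file_contents[1:]
disk3 = parity_block([disk1, disk2])

for name, block in (("Disk 1", disk1), ("Disk 2", disk2), ("Disk 3 (parity)", disk3)):
    print(f"{name}: {block[0]:08b}")
# Disk 1: 01011100
# Disk 2: 00110101
# Disk 3 (parity): 01101001

# Losing disk 2 is survivable: XOR the remaining two blocks to get it back.
print(rebuild_missing([disk1, disk3]) == disk2)  # True
```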
As you can see, the third disk contains a bit that is 0 or 1 in such a way that there is always an even number of ones in each bit position. This means that if any disk goes bad, we can always recreate it by counting the number of bits that are one on the remaining disks and setting the missing bit so that there is an even number of ones. Even better, we now have a lower overhead. This method works with any number of disks, so if you have 11 disks, the single parity disk adds only 10% on top of the ten data disks. It still has one important flaw, however. As disks frequently fail in groups, due to electrical errors or overheating, data loss is still likely. Parity is used in RAID 5.

Reed-Solomon error correction

To handle failures of multiple disks, a technique called Reed-Solomon error correction is used. The details of this algorithm are outside the scope of this paper, but a good starting point for more information can be found at http://en.wikipedia.org/wiki/Reed-Solomon. Basically, this algorithm is not that different from the parity bit; the main difference is that you can have an arbitrary number of redundant disks. For instance, it's perfectly possible to have 10 data disks and 4 redundant disks. In this setup, any four disks could fail and recovery is still possible. A side effect of this is that read performance can be improved, as any ten disks can be used when reading a file, because the data can be recreated from any ten parts in the set. On the other hand, write performance is reduced, as more data has to be written, although this can be buffered and done later to compensate for bursts of write activity. In a way, the redundant blocks can be seen as wildcards, which can replace any other block of the file. Reed-Solomon codes are used in, among other things, CDs and RAR files.

A word of caution

Redundancy in itself provides protection against drive failures, but it does nothing to protect from such things as accidental deletion or a virus wiping out or damaging files.
A deletion or file write will be faithfully carried out over all disks in the array, even if it's not intentional. For protection against that, a journalling/logging file system is needed, but that's outside the scope of this article, even though it's certainly possible to combine journalling with the FreeSBI concept. As a side effect, journalling also makes backups easier and less expensive. Naturally, a correctly configured security system with appropriate user privileges also greatly reduces the risk of this happening.

FreeSBI theory

FreeSBI is based on the same principle as Reed-Solomon correction and traditional RAID 5, but adapted to the special situation in several key ways. These adaptations are not overly complicated; it's more a matter of releasing some of the constraints imposed by traditional RAID systems and optimizing the layout of the files.

Adaptable redundancy

The system is flexible enough to allow varying degrees of redundancy by selecting the number of data segments and redundancy segments used for each file. This allows fine tuning of the balance between overhead and safety. It's also possible, within the concept, to have different levels of redundancy for different files. For example, this could be implemented as a configuration file with patterns such as "/users/*/temp/* 0", signifying that everything matching this pattern should not have any redundancy at all. This would allow for more efficient disk usage.

Data is not stored on all disks

As the number of disks grows, it becomes impractical to spread the segments of a file over all of them. In theory, it becomes faster, but in practice, on a busy system, it creates a lot of chatter as every single request needs to be handled by all disks. FreeSBI overcomes this by not storing that many segments. It's primarily, but not exclusively, intended to work for large disk pools, maybe hundreds of disks or more, so it's impractical to spread files over all of them.
Instead, the segments are spread over a smaller number of disks, for instance 10 disks for data and 2 for redundancy, even if the pool contains 20 disks. The data is stored on the disks with the most remaining free disk space. This has the side effect that the pool can mix disks of different sizes, as files will more often be placed on the larger drives. This also makes sense from a performance perspective, as larger drives are usually newer, and thus likely to have better performance. As this will quickly balance out the free space, we will soon get a choice of where to put files. This is good, as it allows us to lessen the risk of data loss if an entire server is lost, destroyed or stolen.

To further increase reliability, depending on the size of the server farm, the block spread between the physical servers could be adjusted. In a large server farm, where the complete loss of a single server can be survived, the blocks should be spread as evenly as possible between the physical servers. On the other hand, if there are few servers, it's better to keep the blocks of a file on a single server as far as possible, as a complete failure of one server will at least leave the files on the other servers intact. This is probably not going to make a big difference, and implementing it is probably not a high priority.

Another possible strategy for efficiently placing segments is to spread the data segments as evenly as possible between disks. The reason for this is that redundancy segments (with the help of extra redundancy segments as described below) can sometimes be deleted to balance free disk space. From a purely theoretical viewpoint, data segments are not more important than redundancy segments and could be deleted as well, but it's still preferable to keep all data segments, as that allows for a partial recovery even if more disks fail than the redundancy allows for. It's therefore probably best to always keep the data segments.
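The basic placement rule described above, putting each segment of a file on a separate disk and preferring the disks with the most free space, can be sketched as follows. This is a minimal illustration under simplifying assumptions: the pool is a plain dictionary of free space per disk, all segments of a file have the same size, and the spreading between physical servers is ignored. The names choose_disks and place_file are hypothetical.

```python
import heapq


def choose_disks(free_space, data_segments, redundancy_segments):
    """Pick one disk per segment, taking the disks with the most free space first."""
    needed = data_segments + redundancy_segments
    if needed > len(free_space):
        raise ValueError("the pool has fewer disks than the file has segments")
    return heapq.nlargest(needed, free_space, key=free_space.get)


def place_file(free_space, segment_size, data_segments, redundancy_segments):
    """Reserve room for every segment of a file and return the chosen disks."""
    chosen = choose_disks(free_space, data_segments, redundancy_segments)
    if any(free_space[disk] < segment_size for disk in chosen):
        raise ValueError("not enough free space for all segments")
    for disk in chosen:
        free_space[disk] -= segment_size
    return chosen


# A small mixed pool (free space in arbitrary units); "d" is nearly full.
pool = {"a": 100, "b": 300, "c": 200, "d": 50}
print(place_file(pool, 10, data_segments=2, redundancy_segments=1))
# ['b', 'c', 'a']
```

Because the largest disks are chosen first, repeated calls drain them faster than the small ones, which is the balancing effect the text describes.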
Disks can be added later

When a new disk is added, it becomes a disk with a lot of free space, which means that the algorithm above will tend to place segments on it more often, and it will fill faster until the free space is evenly divided. As disks may be added late, when the pool is already full, there may not be enough disk space on the other drives to handle new data, as the new drive can only take one segment from each file. This will automatically and transparently be balanced by extra redundancy as detailed later in this document, without the need for user interaction.

Disks can be removed

Removing a disk is simple. As soon as the system detects that it's missing, it will start to recreate the missing segments on other disks. To avoid doing this unnecessarily, the rebuild might have to be started through user intervention after an administrative alert has been issued, instead of running automatically. This is a result of the distributed nature of the pool, where a temporary loss of a network connection could otherwise result in a big and unnecessary rebuild.

Extra redundancy

As the observant reader may have noticed by now, the fact that we don't write segments of a file to all disks gives us a rare opportunity to get something for nothing. When the pool is not full, we can use free disk space to store additional redundancy segments on the disks not currently used by a file. This means that we will be able to recover some files even if more disks fail than the normal level of redundancy allows for. It's also a potential performance boost when reading, as there is a wider choice of possible disks to retrieve data from. When writing, to avoid performance loss, we simply write the required segments and invalidate the extra segments, which can be recreated later when there is idle time. These extra segments are deleted as disk space is needed for other uses, down to a minimum level of redundancy set by the user.
A side effect of this is that the unbalanced pool problem described above is solved, as the first extra segments to be deleted would be the ones on the disks with the least free space. It also solves the problem of rebalancing the pool when a new disk is added to a full pool. Without this extra redundancy, no files could have been added, as there is only one disk to put the segments on, but with it, extra redundancy segments will start to be written to the new disk immediately. When a new file is written, these extra redundancy segments mean that segments on other disks can be deleted, creating room for data on those disks.

File system triggers

Triggers are simply scripts that are run automatically when certain events occur. They can be used either to react automatically to changes or to prevent certain changes from taking place. Some typical uses are:
There is a need for two kinds of triggers, possibly three or four.
Of these trigger types, the first two are needed; the last two are handy, but a similar effect can be achieved with some creative scripting and the first two.

Implementation of storage

This is in no way a detailed description of how a FreeSBI file system should be implemented, merely a collection of thoughts and considerations that probably need to be taken into account.

Stacked on another file system

As we can't rely on having low level access to a disk on another computer, the obvious solution is to mount the disks on the server that will serve the file system to the clients. The file segments (both data and redundancy) are then stored in files with appropriate file names on these mounted disks. For example, myfile.dat would be stored as myfile.dat.d1, myfile.dat.d2 et cetera for the data blocks, and myfile.dat.r1, myfile.dat.r2 et cetera for the redundant blocks. The server that acts as a front for the FreeSBI volume to the clients thus has two mount directories.
This also has the side effect of making development easier, as the fidgety details of low level access can be ignored and it's more or less just an issue of providing a translation layer for the high level calls. There may be a small performance overhead in this, but it should be negligible, especially compared to the network bottleneck.

As the data is stored in ordinary files on mounted remote (or local) file systems, as opposed to raw blocks in its own file system, partial recovery of data is possible even if more disks fail than the redundancy allows for. It's possible for the admin to piece together the remaining file segments just by joining the files, thus at least saving some data. Of course, "fishing disk" like this will not be able to save all files, as many formats do not like missing parts, but at least there is a fighting chance. Managing to save 80% of one's source code is very much preferable to losing it all.

A very neat side effect of stacking it on top of another file system is that testing during development becomes much easier, as physical disks and servers will not be needed for most of the testing, because they can easily be faked by using directories instead. This means that even large use cases (i.e., with many disks/servers) can be tested on a single laptop if need be. The only thing that can't be tested reliably in a fake setup like this is performance.

Small files and block size

As most file systems only allocate full blocks, for instance 16 kB blocks, a file that is smaller than a block will still use the entire block. (Some file systems, most notably ReiserFS, have the capability to pack several small files into one block.) Full-block allocation can be a problem if many tiny files are stored, as there is a lot of waste. This problem becomes even worse in a RAID setup, as a tiny file is split up into many small segments, and each of these segments will use an entire block. So, if a 20 byte file is stored on a single disk with a 16 kB block size, it will use 16 kB.
If the same file were stored on a traditional RAID array with 8 disks, it would use 16 * 8 = 128 kB. That's over 6000 times the original size of the file. Of course, 128 kB may not look like much, but if you have many small files, it quickly adds up. The way to lessen the effect of this in FreeSBI is to never store more segments of a file than needed to fill blocks. A 20 byte file would be a single part, while a 30 kB file in a 16 kB block size example would get 2 parts. On top of that, of course, comes the ordinary number of redundancy segments needed for recovery. It still means that small files will have a larger overhead, but it will be much better than a traditional RAID. A nice side effect of this is that the different number of segments for different files makes it easier to compensate for different drive sizes, especially in smaller pools.

Index for speed

As the segments of a file are only stored on some of the disks in the pool, we need some mechanism to know which disks to look at to find the segments, or unnecessary chatter will degrade performance. This is done either by storing a tiny index file for each file, containing a list of which disk each part can be found on, or by having an indexed database with the same information. Regardless of how it's done, it's done just for performance reasons, and if the disk with the index fails, the index can easily be rebuilt, simply by scanning the disks in the pool and building a new index based on the files found. A consistent naming scheme, as mentioned above, will make this trivial.

To make the index even more robust, it could be implemented as a cache. In this case, it starts out empty, and as soon as it gets a request that's not in the cache, it searches the disks, fulfills the request and adds the result to the cache. If it encounters something that doesn't match the cached index, that record is removed and rebuilt with actual information from the disks.
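The cache behaviour can be sketched in a few lines. This is a rough illustration, not a proposed implementation: it assumes that the disks are mounted as plain directories, that all segments of a file sit directly in those directories (no subdirectories), and that the segment naming follows the myfile.dat.d1 / myfile.dat.r1 pattern used earlier. The class name IndexCache and its methods are hypothetical.

```python
import os
import re

# Segment files are assumed to be named <file>.d<N> (data) or <file>.r<N> (redundancy).
SEGMENT_SUFFIX = re.compile(r"\.[dr]\d+")


class IndexCache:
    """Maps a file name to {segment file name: disk directory}, filled on demand."""

    def __init__(self, disk_dirs):
        self.disk_dirs = list(disk_dirs)
        self.cache = {}

    def _scan(self, name):
        """Search every disk in the pool for segments belonging to one file."""
        found = {}
        for disk in self.disk_dirs:
            try:
                entries = os.listdir(disk)
            except OSError:
                continue  # the disk is unavailable right now; skip it
            for entry in entries:
                if entry.startswith(name) and SEGMENT_SUFFIX.fullmatch(entry[len(name):]):
                    found[entry] = disk
        return found

    def lookup(self, name):
        """Return the segment locations, rebuilding the record if it is missing or stale."""
        record = self.cache.get(name)
        if record and all(os.path.exists(os.path.join(d, s)) for s, d in record.items()):
            return record
        record = self._scan(name)
        if record:
            self.cache[name] = record
        else:
            self.cache.pop(name, None)
        return record
```

A stale record is detected here only by checking that the cached segment files still exist; a real implementation would presumably also have to notice segments that have been added or moved.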
In this way, the index becomes self-administrating, cannot be permanently corrupted, and will rebuild itself automatically after a loss. It also makes it much easier to have several machines fronting for a single pool in order to achieve 'no single point of failure'.

The index cache, if used like this, should in most cases be large enough to hold the index of the entire pool, to avoid unnecessary disk reads from the pool. In other words, it should never discard an entry for any reason other than that it's incorrect. The one exception is if large parts of the files are "dead" files that are seldom accessed, or if the array is huge. Seeding the cache, if one does not want to wait for it to build on the fly, is done by a simple recursive directory listing of the volume, which forces the file system to list every file and thus gives it the information it needs to fill the index.

No single point of failure

As the disks are spread over several servers, the file server serving the clients can be any machine with access to these servers, or even one of them. In a small network, it could even be one of the clients. However, as long as it's only one machine, it's a single point of failure, but there is an easy solution. As the disks are accessed through high level calls, with normal file locking semantics in place, there is no reason not to allow several servers to act as front ends for the file system. This means that if one of them goes down, the rest will still work as usual. It also provides some means of load balancing, as the front end server will be the most heavily loaded machine in the pool.

Implementation of file system triggers

Call architecture

The actual detection of the trigger events should be trivial, as the file system already needs to know when these events occur in order to function. This means that what we need to do is to provide a mechanism that fires external scripts and possibly responds to them. This should be fairly straightforward.
The suggested implementation is quite simple, and intended to be accessible to administrators and power users as well as professional developers. This is the reason for using scripts (Bash, Perl, PHP, Python or any other language that can be run from a shell) instead of a more technical system call API. There is some performance overhead, but given the intended use, it will probably be of minor impact. Some triggers, specifically the List triggers, will have a greater performance impact, but they are also by far the least used and are only really needed for very special situations.

When the trigger event occurs, FreeSBI looks for a file named according to a certain naming scheme in the directory or, if it's not found there, in the parent directories. When a script is found, it is called with appropriate information about the event as arguments. If a script is to be able to disallow an action, this is signaled by exiting with an error condition. For performance reasons, script locations, perhaps even the script contents, should probably be cached, which is easily done by hooking into the internal implementation of the trigger events and listening for changes to the scripts.

A typical example of how a call might look, where the prefix .FSTrigger_ is used to separate trigger scripts from ordinary files:

.FSTrigger_CreateFilePre username /dir/dir/filename

If the script doesn't want to allow this user to create this file, it returns an error, and the user receives an error of insufficient rights. If it wants to allow it, it just exits normally and the operation proceeds.

For performance reasons, it's also possible to target scripts at certain file types. For instance, the script named .FSTrigger_WriteFilePost_log would only be triggered if the file in question has the extension log. Possibly, one could create a more advanced syntax allowing wildcards for even higher flexibility, although the wildcard characters would, of course, have to be replaced with characters allowed in file names.
Wildcards do, however, create the problem of several scripts matching the file. The best way to handle that is probably to run all of them, and if they are of the responding type, only one needs to disallow for the result to be a disallow.

It's simple, even borderline crude, but it's accessible to non-programmers and it's easy to keep track of which triggers/scripts are used. As the scripts are stored along with the directories they serve, if the directories are moved, the triggers/scripts are moved with them, minimizing the risk of mistakes. Another benefit is that since the scripts are stored like any other file, next to the files they handle, we ensure that all servers acting as a front end for the pool use the same scripts, which means that all requests will be handled the same way, regardless of which way they came in.

Trigger events

Exactly which triggers are needed is not yet carved in stone, but the following is a fair assumption (all references to filename/dirname are assumed to be full paths):

CreateFile events

CreateFilePre username filename
  Called before a file is created, return error to disallow.
CreateFilePost username filename
  Called after a file is created. The file may still be open.
CreateFileFail username filename errormsg
  Called if an attempt to create a file fails.

WriteFile events

WriteFilePre username filename
  Called before a file is written to, return error to disallow.
WriteFilePost username filename
  Called after a file is written to and is closed.
WriteFileFail username filename errormsg
  Called if an attempt to write to a file fails.

ReadFile events

ReadFilePre username filename
  Called before a file is read, return error to disallow.
ReadFilePost username filename
  Called after a file is read and is closed.
ReadFileFail username filename errormsg
  Called if an attempt to read a file fails.

DeleteFile events

DeleteFilePre username filename
  Called before a file is deleted, return error to disallow.
DeleteFilePost username filename
  Called after a file is deleted.
DeleteFileFail username filename errormsg
  Called if an attempt to delete a file fails.

ListFile events

ListFilePre username filename
  Called before a file is shown, for instance in a directory list, return error to disallow. In this case, disallowing does not signal an error to the user, it just hides the file from the listing.
ListFilePost username filename
  Called after a file is shown.
ListFileFail username filename errormsg
  Called if an attempt to list a file fails.

RenameFile events

RenameFilePre username oldfilename newfilename
  Called before a file is renamed, return error to disallow.
RenameFilePost username oldfilename newfilename
  Called after a file is renamed.
RenameFileFail username oldfilename newfilename errormsg
  Called if an attempt to rename a file fails.

AttribFile events

AttribFilePre username filename
  Called before a file has its attributes changed, return error to disallow.
AttribFilePost username filename
  Called after a file has had its attributes changed.
AttribFileFail username filename errormsg
  Called if an attempt to modify attributes for a file fails.

OwnFile events

OwnFilePre oldusername newusername filename
  Called before a file has its owner changed, return error to disallow.
OwnFilePost oldusername newusername filename
  Called after a file has had its owner changed.
OwnFileFail oldusername newusername filename errormsg
  Called if an attempt to change owner for a file fails.

CreateDir events

CreateDirPre username dirname
  Called before a directory is created, return error to disallow.
CreateDirPost username dirname
  Called after a directory is created.
CreateDirFail username dirname errormsg
  Called if an attempt to create a directory fails.

DeleteDir events

DeleteDirPre username dirname
  Called before a directory is deleted, return error to disallow.
DeleteDirPost username dirname
  Called after a directory is deleted.
DeleteDirFail username dirname errormsg
  Called if an attempt to delete a directory fails.

ScanDir events (unclear if these are needed, and if it's possible to implement)

ScanDirPre username dirname
  Called before a directory list is scanned, return error to disallow.
ScanDirPost username dirname
  Called after a directory is scanned.
ScanDirFail username dirname errormsg
  Called if an attempt to scan a directory fails.

ListDir events

ListDirPre username dirname
  Called before a directory is shown, for instance in a directory list, return error to disallow. In this case, disallowing does not signal an error to the user, it just hides the directory from the listing.
ListDirPost username dirname
  Called after a directory is shown.
ListDirFail username dirname errormsg
  Called if an attempt to list a directory fails.

RenameDir events

RenameDirPre username olddirname newdirname
  Called before a directory is renamed, return error to disallow.
RenameDirPost username olddirname newdirname
  Called after a directory is renamed.
RenameDirFail username olddirname newdirname errormsg
  Called if an attempt to rename a directory fails.

AttribDir events

AttribDirPre username dirname
  Called before a directory has its attributes changed, return error to disallow.
AttribDirPost username dirname
  Called after a directory has had its attributes changed.
AttribDirFail username dirname errormsg
  Called if an attempt to modify attributes for a directory fails.

OwnDir events

OwnDirPre oldusername newusername dirname
  Called before a directory has its owner changed, return error to disallow.
OwnDirPost oldusername newusername dirname
  Called after a directory has had its owner changed.
OwnDirFail oldusername newusername dirname errormsg
  Called if an attempt to change owner for a directory fails.

If low priority triggers are implemented, they will exist for all triggers that don't expect a response, and be distinguished by having LowPrio added to the end of the trigger name.
A similar scheme could be used to handle delayed triggers, for instance by adding a time interval in parentheses at the end (number and time unit), such as CreateFilePost(4week). Possibly, this could be extended with a time to run the script, such as CreateFilePost(sunday2300), which would trigger it on the next Sunday at 23.00, or CreateFilePost(4week,sunday2300), which would trigger it at 23.00 on the first Sunday after four weeks. The trigger mechanism would recognize this, and schedule the script to run at that time, either as a cron job or in an internal scheduler.

Triggers and safety/reliability

As triggers actually execute code on the server, the default setting should be to only allow admins to create or modify them. In a trusted environment, power users could also be allowed to do it, but this should not be the default and should require an admin to specifically allow it.

Another issue is the possibility of triggers causing cascading events, in other words, a trigger doing something that inadvertently fires the same trigger again in an infinite loop. Given that only trusted users are allowed to use triggers, this is a minor problem, but some mechanism is needed to break such a loop. The easiest way is probably to just move or delete the script and fix it before putting it back, but it's possible that the script itself prevents that. In that case, a way to turn off all triggers completely is needed. This is also useful in some other special cases, such as when migrating a lot of files between servers, where you don't want to risk odd behaviour because data is temporarily not in compliance with the rules in the scripts. This is a lot like how triggers are disabled in databases for such operations.

Typical use cases

This section tries to show the versatility of FreeSBI through a few typical scenarios.
Small user with needs of simple, but reliable long term storage

The user puts a number of disks either in his desktop machine or in a simple file server, perhaps built from a discarded earlier desktop machine. A typical number of disks is four, but any number is possible. A relatively low redundancy is used, probably only one extra segment, although it's possible to use a higher redundancy for important files such as personal photos and documents. When the pool is full, the user either adds a new disk or, if there is no room for more disks, just removes the smallest and replaces it with a bigger one, and data will automatically be rebuilt on the bigger disk. For most users, disk technology advances faster than storage needs, so this will be sufficient. This user doesn't care about triggers, doesn't need them and thus ignores them completely.

Organisation with larger storage needs

This user is typically a medium size company, say around 100 employees, with a need for reliable storage with little downtime, some services such as web servers and legacy applications, and an overworked system administrator. In this case, the disk pool is spread over two or more servers, and expanded with new disks and servers as needed. As the pool looks like one big disk, there is no need to move data around and change links and paths on user computers when the pool is expanded, saving work and causing fewer interruptions for the users. Also, as disks can be added or replaced on the fly, there is very little downtime. All of the servers are set up to front for the pool, to increase reliability. A somewhat higher level of redundancy is used, probably two or three extra segments.

To reserve disk space for work related data, triggers are used to block certain file types. Triggers are also used to automatically restart the web server when the config file changes.
The old legacy system is a bit wonky, and no one really dares poke around in it, so another trigger watches the logs and restarts the system as needed. Triggers are also used to migrate unused files to offline media.

Huge data center with needs for huge data amounts and reliability

This user is most likely a big organization, such as a big web hosting provider, science facility or university. There is a huge need for data storage, and failure is not an option. There is a minor army of admins keeping things working. Here, we probably have one or more server halls, full of rack mounted servers with disks. The pool will contain hundreds, maybe thousands of disks, and tens or hundreds of servers. In a configuration like this, chances are that there will always be some server which is down. This means that we can't rely on any single server, so the files will be spread over many disks on several servers, with many redundancy segments, say five to ten. Hardware cost is a minor issue compared to reliability, so extra servers and disks are cheap. As disk space runs out, more disks are added to the pool. There are also several servers fronting for the pool, once again to spread the load and to remove the single point of failure. All of it is connected through the fastest possible network.

As backups of such amounts of data are complicated, triggers are set up to make backups of new or changed data only. Triggers are also used for various monitoring tasks and for logging of possible malicious activity. They are also used to scan for viruses only when a file is actually changed, as a full file system scan takes more time than is practical.

As all data is in a single pool, user management and user restrictions become imperative. Scripts outside FreeSBI are set up to manage users and to make sure they get the proper permissions and space allotments. In addition to this, triggers are used for some monitoring and removal of inappropriate content.
In this environment, expanding the pool, adding servers or replacing servers are not occasional events but an ongoing task. The ability to keep the pool online while such operations are done is necessary.

Appliance file server for end user

In this case, we have a company which manufactures NAS (Network Attached Storage) units, that is, dedicated file server boxes. By using FreeSBI, it would be possible to automatically scale the appliance according to the needs of the user. The NAS would run an embedded Linux, customized as a file server with FreeSBI as the top level file system. Say, for example, that the NAS has room for four disks. When the array is full, the user simply replaces the smallest disk with a larger one, or adds another NAS device with more disks, which will detect the first, and the two will cooperate and look like one big disk to the user. This way, growing storage needs become a simple matter, and no planning ahead is necessary. When the user runs out of disk space, he goes to the local computer store and buys either a new disk for an existing NAS (for an empty disk slot or to replace a smaller disk) or another NAS with one or more disks. Either way, he just plugs it in and the existing volume is expanded with the new space, with no further user interaction. This will make it possible even for non-specialists to have big storage capacities with the security of a multiply redundant distributed RAID array, bringing professional quality storage within reach of ordinary people.

Conclusion

The FreeSBI concept has the potential to serve as a solution for storing vast amounts of data reliably and cheaply, unlimited by such factors as disk size and the number of disks possible in a single server. It can use cheap hardware and still provide high reliability. It should also scale well, the main limiting factor being the speed of the network. It's not limited to large pools, though, and is just as useful for keeping viable, secure long term storage for smaller amounts of data.
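The use cases mention one extra segment for the small user, two or three for the medium organisation and five to ten for the data center. To get a feel for what that costs in disk space, here is a small sketch. It rests on assumptions not stated in this document: each file is split into a number of equally sized data segments, every extra segment is the same size as a data segment, and the file survives the loss of as many segments as there are extras (as with a typical erasure code). The data segment counts and the function names are invented for illustration.

```python
def storage_overhead(data_segments: int, extra_segments: int) -> float:
    """Extra raw space needed, as a fraction of the file's own size."""
    if data_segments < 1 or extra_segments < 0:
        raise ValueError("need at least one data segment and no negative extras")
    return extra_segments / data_segments


def raw_space_needed(file_size: int, data_segments: int, extra_segments: int) -> float:
    """Total space one file occupies in the pool, data plus redundancy."""
    return file_size * (1 + storage_overhead(data_segments, extra_segments))


# Illustrative layouts only; the data segment counts are assumptions.
examples = {
    "small user": (3, 1),
    "medium organisation": (6, 3),
    "data center": (20, 10),
}
for name, (data, extra) in examples.items():
    print(f"{name}: {data}+{extra} segments, survives {extra} lost segments, "
          f"overhead {storage_overhead(data, extra):.0%}")
```

Under these assumptions, spreading a file over more data segments lets a large pool tolerate many simultaneous failures without the overhead growing in step, which is one reason a big pool can be both cheap and reliable.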
The main drawback, given its distributed nature, is performance, and there is no easy way around that. However, most such huge data stores are used in ways where performance is not a key concern, and the FreeSBI pool should be able to serve data as fast as the network can carry it. It would, however, be much less suitable to put something like a database on a FreeSBI pool.

A key part of its usefulness is that there is no need to plan ahead. You don't need to know your future needs when you initially build your pool, as more space and more reliability can be added dynamically at any time, without having to rely on current storage technologies and disks still being available in the future. As long as some kind of storage device can be mounted on the front end machine(s), or is accessible from them as a network share, it can be added to the pool.

The simplicity of administering and planning a FreeSBI disk pool is best understood by looking at the solution to more or less every administrator task, which is one or both of these actions:
Everything else should be more or less automatic. Even the two tasks above could be simplified by a user interface that shows status (used and remaining space, status of the disks in the pool, which server is the preferred place for a new disk, current redundancy settings and the overhead they cause, and so on).

In addition to this, the trigger concept makes it possible to automate many administration tasks and data storage rules, ensuring not only that data is correctly stored, but also that the correct data is stored and that logical consistency is maintained. While the trigger concept is not necessary for the storage concept, it makes handling large amounts of data much less work-intensive and safer, so it's a very useful feature.

Remains to be done
|