Cloud Storage v. Web Site - a comparison
Posted: Wed 05 Jun 2013 12:01
Having recently almost hit the 5GB storage limit on the BCRA web site, I have gradually been moving files from the web site into cloud storage, so that there is now about 8GB in cloud storage and 3GB on the web site (some of which will go to the cloud sometime soon). Much of our archive material does not require the facilities provided by a web server. For example, the Cave & Karst Science Catalogue Listing web page contains hyperlinks to the PDFs at the cloud storage location of http://cavescience-cloud.bcra.org.uk. Here are a few points that might be helpful to anyone wishing to do a similar thing.
- The salient point about cloud storage (e.g. as provided by Memset) is that it is a file upload/download facility (i.e. you cannot run server-side scripting). It is useful for storing large files that are hardly ever going to be accessed.
- If you have enabled public HTTP access to the archive -- e.g. via hyperlinks on a web page such as Cave & Karst Science Catalogue Listing, or via an index page on the cloud storage site itself, such as http://cavescience-cloud.bcra.org.uk -- you might want to hide the hyperlinks from search engines. Otherwise, Google and dozens of other bots (not all of which take any notice of your robots.txt file) will download and index your entire archive and, if you're paying for download bandwidth, you will get an unexpectedly large bill for archive material that is only supposed to be accessed by visitors once in a blue moon! I hid the links using a simple bit of JavaScript. The hyperlinks (the Anchor tags) are written like this:
Code:
<A href="/nojscript.html" onmouseover="this.href=makeLink('1_Transactions/ckt008.pdf');">
so, if JS is not enabled, the user is referred to a suitable web page, e.g. /nojscript.html; but if JS is enabled, the makeLink() function turns the argument into a complete URL. If you've followed this so far, you'll know how to write the makeLink() function.
- You do not have to enable HTTP access; you can restrict access to FTP, and you can issue separate passwords for downloads and uploads.
- You ought to take a regular backup of the entire cloud site. I do this in a somewhat convoluted way. Firstly, I maintain a copy on my PC of what I have uploaded. Once a week I run a Windows 'scheduled task' that synchronises the cloud to my PC, i.e. it uploads any files that I have changed on my PC. Secondly, I run a separate sync task which downloads any files that have changed in the cloud storage to a separate copy of the cloud storage that I keep on a network drive. In other words - I back up my own work to the cloud storage, and I back up the cloud storage (which may include other people's uploads) to a third location.
- Generally, other people do not upload anything anyway but, where they do, I try to organise it so that they have their own folders and do not 'randomly' alter material created by someone else.
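For anyone who would rather not work out makeLink() from scratch, here is a minimal sketch of what such a function might look like. It is not the code actually used on the BCRA site; the constant name CLOUD_BASE and the choice to encode each path segment are assumptions made for illustration. The base URL is the cloud storage address mentioned above.

```javascript
// Hypothetical sketch of makeLink(); the real function on the site may differ.
// CLOUD_BASE is an illustrative name; the URL is the one quoted in the post.
var CLOUD_BASE = 'http://cavescience-cloud.bcra.org.uk/';

function makeLink(path) {
  // Drop any leading slashes so base and path join with exactly one '/'.
  var clean = String(path).replace(/^\/+/, '');
  // Encode each segment separately so the '/' separators are preserved
  // while spaces and other awkward characters are escaped.
  return CLOUD_BASE + clean.split('/').map(encodeURIComponent).join('/');
}
```

The point of the trick is only that the full URL never appears in the page source, so a bot that does not run JavaScript sees nothing but the /nojscript.html link. A bot that does run scripts would not be stopped by this.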
An issue that I'm still thinking about (and which is the reason for this posting) is: which is the best location for material that needs to remain confidential, let's say financial records or minutes of private meetings? There are a number of possibilities...
- Encrypt the document. The encryption facility provided with the latest versions of WinZip is very powerful. All you need to do is to zip your documents with encryption switched on. Unfortunately, many of the people I deal with are not confident with tasks like this. It's also a bit of a pain if you have a large document archive.
- Store the document on a web server which is password-protected. The standard authorisation method built in to web servers is not particularly secure, and the user's password is transmitted en clair, but the advantage is that the method is simple for the user. (Using an HTTPS page would be more secure.) When I use this method I often make use of a Web Bug which sends me an email each time the page is accessed, although this is a bit like locking the stable door after the horse has bolted.
- Store the documents on a web site, outside the HTTP document root. This is standard advice for storing material that you want to keep invisible to web site hackers, e.g. PHP scripts. The documents cannot be accessed by HTTP but they can be accessed by FTP. Of course, this method depends on having a secure FTP password and it has the disadvantage of not being able to warn you if security has been breached (as the web bug, above, can). Instead, you would need to inspect the FTP log regularly.
- Place the documents in cloud storage, with FTP access only. Here, you probably won't be able to get access to an FTP log, so you're very much in the dark as to how secure your data is.
A further possibility would be to develop the system I use for Cave & Karst Science Online, where the user has to log on to access the documents. The logon is transmitted en clair but it's somewhat easier to administer than the basic authorisation provided by web servers (i.e. when you password-protect a web page). The disadvantage is that, to avoid endless hand-writing of lists of files, the web server would need to run a 'dir' command to build up a list of documents to display to the user. That's easy if the documents are all on the web server; not so easy (but not impossible) if they're in cloud storage at a remote location. Hmmm - I shall have to work on that.
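For the easy case, where the documents sit on the web server itself, the 'dir' step might be sketched roughly as follows. This assumes a server that can run Node.js, which is purely an assumption for illustration (the post does not say what the existing logon system is written in). The function names listDocuments, renderList and escapeHtml are hypothetical. It does not address the harder case of listing files held in remote cloud storage.

```javascript
// Hypothetical sketch for Node.js: list the documents in a local folder
// and turn the list into an HTML fragment of links.
const fs = require('fs');
const path = require('path');

// Return the sorted names of plain files in dir whose extension
// (compared case-insensitively) is in the extensions array, e.g. ['.pdf'].
function listDocuments(dir, extensions) {
  return fs.readdirSync(dir, { withFileTypes: true })
    .filter(function (entry) {
      return entry.isFile() &&
        extensions.includes(path.extname(entry.name).toLowerCase());
    })
    .map(function (entry) { return entry.name; })
    .sort();
}

// Escape characters that have a special meaning in HTML text.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
    .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Build a <ul> of hyperlinks; baseUrl is expected to end with '/'.
function renderList(names, baseUrl) {
  const items = names.map(function (name) {
    return '<li><a href="' + baseUrl + encodeURIComponent(name) + '">' +
      escapeHtml(name) + '</a></li>';
  });
  return '<ul>\n' + items.join('\n') + '\n</ul>';
}
```

The fragment returned by renderList() would only be sent to a user who has already logged on; the logon check itself is outside the scope of this sketch.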