Difference between revisions of "Swestore-irods"

From SNIC Documentation

Revision as of 12:19, 31 October 2013


This is not official yet

SNIC is building a storage infrastructure to complement the computational resources.

Many forms of automated measurements can produce large amounts of data. In scientific areas such as high energy physics (the Large Hadron Collider at CERN), climate modeling, bioinformatics, bioimaging etc., the demands for storage are increasing dramatically. To serve these and other user communities, SNIC has appointed a working group to design a storage strategy, taking into account the needs on many levels and creating a unified storage infrastructure, which is now being implemented.

Swestore collaborates with ECDS, SND, Bioimage Sweden, BILS, UPPNEX, WLCG and Naturhistoriska riksmuseet.

National storage

The Swestore Nationally Accessible Storage, commonly called just Swestore, is a robust, flexible and expandable long term storage system aimed at storing large amounts of data produced by various Swedish research projects. It is based on the dCache and iRODS storage systems.

Swestore is distributed across the SNIC centres C3SE, HPC2N, Lunarc, NSC, PDC and Uppmax. Data is stored in two copies with each copy at a different SNIC centre. This enables the system to cope with a multitude of issues ranging from a simple crash of a storage element to losing an entire site while still providing access to the stored data.

One of the major advantages of the distributed nature of dCache and iRODS is the high aggregate transfer rate it makes possible. This is achieved by bypassing a central node and having transfers go directly to and from the storage elements if the protocol allows it. The Swestore Nationally Accessible Storage system can achieve aggregate transfer rates in excess of 100 Gbit/s, but in practice this is limited by the connectivity of each university (usually 10 Gbit/s) or, when only a few files are transferred, by the per-connection rate (typically at most 1 Gbit/s per file/connection).
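To get a feeling for what these limits mean in practice, the arithmetic can be sketched in a few lines of shell. This is only a rough estimate: the function name is made up for this example, sizes are taken as decimal gigabytes, and protocol overhead is ignored.

```shell
#!/bin/sh
# Estimate the transfer time in whole seconds.
# Arguments: size in gigabytes (10^9 bytes), rate in Gbit/s.
transfer_seconds() {
    size_gb=$1
    rate_gbit=$2
    # 1 byte = 8 bits, so size_gb gigabytes is size_gb * 8 gigabits.
    echo $(( size_gb * 8 / rate_gbit ))
}

# A single 1000 GB file, limited to about 1 Gbit/s per connection:
transfer_seconds 1000 1     # prints 8000 (a little over two hours)

# The same volume spread over many files on a 10 Gbit/s university link:
transfer_seconds 1000 10    # prints 800 (roughly 13 minutes)
```

Splitting a large data set into several files that are transferred in parallel is therefore usually the easiest way to get closer to the link speed.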

Support

If you have any issues using SweStore please do not hesitate to contact support@swestore.se.

Getting access

Apply for storage
Please follow the instructions on the Apply for storage on SweStore page.

dCache

Acquire an eScience client certificate

Follow the instructions on Requesting a certificate to get your client certificate. This step can be performed while waiting for the storage application to be approved and processed. Of course, if you already have a valid eScience certificate you don't need to acquire another one.
For Terena certificates
If intending to access SweStore from a SNIC resource, please make sure you also export the certificate, transfer it to the intended SNIC resource and prepare it for use with grid tools (not necessarily needed with ARC 3.x, see proxy certificates using Firefox credential store).
For Nordugrid certificates
Please make sure to also install your client certificate in your browser.
Request membership in the SweGrid VO
Follow the instructions on Requesting membership in the SweGrid VO to get added to the SweGrid Virtual Organisation (VO) and request membership to your allocated storage project.

iRODS

The SNIC iRODS system

The SNIC iRODS system is hosted at NSC and runs on two physical servers as a collection of virtual machines.

The iCAT server handles the metadata. It runs a Postgres database which contains information about where to find any particular file in the system.

There are four storage servers which have a small amount of local disk space and use the dCache system via NFS4 to store larger amounts of data.

There is a web server to facilitate access via iDROP web.

Accessing the data is supported via the iRODS command line clients, iDROP and iDROP web.

Authentication is preferably done using a YubiKey, but traditional password authentication is also possible.

Currently we are running e-iRODS 3.0, Postgres 9.4 with ODBC 2.3.1 on CentOS 6.4.

Using the SNIC iRODS system

Detailed documentation, papers and resources are available from the e-iRODS web site, http://www.eirods.org.

The web site for the community iRODS is http://www.irods.org.

To use the system we need to have the iRODS command line client or iDROP installed, or use iDROP web. For Unix systems the iRODS command line client is available as an installable package for various Linux platforms from the e-iRODS website downloads section.

The community iRODS clients should also work, but they won't support the latest e-iRODS features. They are downloadable from http://www.irods.org/. There are pre-built clients for Windows and Mac OS as well.

Command line client

The command line client is natural to use for Unix users. There are versions of the usual ls, rm, mv, mkdir, pwd, rsync commands prefixed with an i for iRODS, like irm, imv, imkdir etc.

As expected, iput and iget move files to and from the iRODS system. All these commands print a short help text when run with the -h option.

To use these we first need to initialize the iRODS environment. There is an environment file .irodsEnv in the .irods subdirectory of the home directory which contains information on where and how to access the iRODS metadata (iCAT) server.

There is a script named irods-client-setup which, when executed, will try its best to create an iRODS environment file to use.

It looks like (placeholders are in <>):

irodsHost '<fully qualified hostname with the dots>'
irodsPort 1247
irodsDefResource '<default iRODS resource name>'
irodsHome '/<irods zone name>/home/<user id>'
irodsCwd '/<irods zone name>/home/<user id>'
irodsUserName '<irods user id>'
irodsZone '<irods zone name>'
irodsAuthScheme 'PAM'

The iCAT server is irods.swestore.se. The default irods zone name is snicZone. The default resource is snicdefResc.

With the correct environment file in place, all we need is a YubiKey or a password, and we can run the iinit command to authenticate to the iCAT server. After that we can use the usual i commands.

More details on the i commands are available at https://www.irods.org/index.php/icommands

iDROP web client

The web client is accessible via the URL https://iweb.swestore.se/. A login screen will be presented first and your YubiKey should be used to log in. More documentation is available at https://www.irods.org/index.php/iRODS_Browser/.

Acquire a SweStore YubiKey

YubiKey one-time passwords (OTP) are used for authentication. With a simple touch of a button, a 44-character one-time password is generated and sent to the system.

When you apply for storage, please provide your email address and a physical address where the YubiKey should be sent.
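If a script needs to tell a YubiKey one-time password apart from an ordinary password, a simple format check can help. The sketch below assumes the standard Yubico OTP format, which is not described on this page: 44 characters taken from the 16-letter "modhex" alphabet. The function name is hypothetical, and the check only looks at the shape of the string; it does not validate the OTP.

```shell
# Return success if the argument looks like a YubiKey OTP:
# exactly 44 characters from the modhex alphabet (cbdefghijklnrtuv).
looks_like_yubikey_otp() {
    printf '%s' "$1" | grep -Eq '^[cbdefghijklnrtuv]{44}$'
}

# Example use in a script (reads the OTP from the terminal):
#   read -r otp
#   if looks_like_yubikey_otp "$otp"; then
#       echo "looks like a YubiKey OTP"
#   else
#       echo "not a YubiKey OTP; did you touch the button?"
#   fi
```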

Difference between dCache and iRODS

dCache

To protect against silent data corruption the dCache storage system checksums all stored data and periodically verifies the data using this checksum.

The system does NOT yet provide protection against user errors like inadvertent file deletions and so on.

Access protocols

Currently supported protocols
GridFTP - gsiftp://gsiftp.swestore.se/
Storage Resource Manager - srm://srm.swegrid.se/
Hypertext Transfer Protocol (read-only), Web Distributed Authoring and Versioning - http://webdav.swestore.se/ (unauthenticated), https://webdav.swestore.se/
NFS4.1

For authentication eScience certificates are used, which provide a higher level of security than legacy username/password schemes.

Download and upload data

Interactive browsing and manipulation of single files
SweStore is accessible in your web browser in two ways: as a directory index interface at https://webdav.swestore.se/ and with an interactive file manager at https://webdav.swestore.se/browser/. Note that the interactive file manager has many features and functions that are not supported in SweStore; only the basic file transfer features are supported.
To browse private data you need to have your certificate installed in your browser (default with Terena certificates, see above). Projects are organized under the /snic directory as https://webdav.swestore.se/snic/YOUR_PROJECT_NAME/.
Upload and delete data interactively or with automation

There are several tools that are capable of using the protocols provided by SweStore national storage. For interactive usage on SNIC clusters we recommend using the ARC tools which should be installed on all SNIC resources. As an integration point for building scripts and automated systems we suggest using the curl program and library.

Use the ARC client. Please see the instructions for Accessing SweStore national storage with the ARC client. Recommended method when logged in on SNIC resources.
Use lftp. Please see the instructions for Accessing SweStore national storage with lftp.
Use cURL. Please see the instructions for Accessing SweStore national storage with cURL.
Use globus-url-copy. Please see the instructions for Accessing SweStore national storage with globus-url-copy.
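As a rough illustration of the curl-based approach, the sketch below builds a project URL under the /snic directory and wraps curl for single-file download and upload. The function names and the default certificate and key locations are assumptions made for this example; depending on your setup, curl may also need to be told where the CA certificates are. Paths are used as-is and are not URL-encoded. The dedicated cURL instructions page remains the reference.

```shell
# Build the WebDAV URL for a path inside a project directory.
swestore_url() {
    project=$1
    path=$2
    printf 'https://webdav.swestore.se/snic/%s/%s\n' "$project" "$path"
}

# Download one file using a client certificate.
# CERT and KEY default to hypothetical locations; point them at
# wherever your exported certificate and key actually live.
swestore_get() {
    cert=${CERT:-$HOME/.globus/usercert.pem}
    key=${KEY:-$HOME/.globus/userkey.pem}
    curl --fail --cert "$cert" --key "$key" -O "$(swestore_url "$1" "$2")"
}

# Upload one local file into the top level of a project directory.
swestore_put() {
    cert=${CERT:-$HOME/.globus/usercert.pem}
    key=${KEY:-$HOME/.globus/userkey.pem}
    curl --fail --cert "$cert" --key "$key" -T "$2" \
        "$(swestore_url "$1" "$(basename "$2")")"
}

# Example:
#   swestore_get YOUR_PROJECT_NAME data/results.tar
#   swestore_put YOUR_PROJECT_NAME ./results.tar
```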

Tools and scripts

A number of externally developed tools and utilities can be useful. Here are some links:

Slides and more

Slides and material from seminar for Lund users on April 18th

Usage monitoring

iRODS

Supported clients

iDrop web - Point your Web browser to iweb.swestore.se
Command line client - e-iRODS icommands; see How to use icommands on SNIC clusters

The SweStore iRODS system

The SweStore iRODS system is hosted at NSC and runs on two physical servers as a collection of virtual machines.

The iCAT server handles the metadata. It runs a Postgres database which contains information about where to find any particular file in the system.

There are four storage servers which have a small amount of local disk space and use the dCache system via NFS4 to store larger amounts of data.

Using the SweStore iRODS system

Detailed documentation, papers and resources are available from the e-iRODS web site, http://www.eirods.org.

The web site for the community iRODS is http://www.irods.org.

To use the system you need to have the iRODS command line client installed, or use iDROP web. For Unix systems the iRODS command line client is available as an installable package for various Linux platforms from the e-iRODS website downloads section.

The community iRODS client also should work, but you need to modify configuration (iRODS/config/config.mk):

PAM_AUTH = 1
PAM_AUTH_NO_EXTEND = 1
USE_SSL = 1 

Command line client

The command line client is natural to use for Unix users. There are versions of the usual ls, rm, mv, mkdir, pwd, rsync commands prefixed with an i for iRODS, e.g. irm, imv, imkdir etc.

As expected, iput and iget move files to and from the iRODS system. All these commands print a short help text when run with the -h option.

To use these we first need to initialize the iRODS environment. There is an environment file .irodsEnv in the .irods subdirectory of the home directory which contains information on where and how to access the iRODS metadata (iCAT) server.

It looks like (placeholders are in <>):

irodsHost 'irods.swestore.se'
irodsPort 1247
irodsDefResource 'snicdefResc'
irodsHome '/snicZone/home/<email address>'
irodsCwd '/snicZone/home/<email address>'
irodsUserName '<email address>'
irodsZone 'snicZone'
irodsAuthScheme 'PAM'

The iCAT server is irods.swestore.se. The default irods zone name is snicZone. The default resource is snicdefResc.
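The environment file can also be written by a small script. The following is a minimal sketch using the values given above; the function name is made up for this example, and it overwrites any existing .irodsEnv in the target directory without asking.

```shell
# Write an iRODS environment file for the given user.
# Usage: write_irods_env <email address> [target directory]
# The target directory defaults to ~/.irods.
write_irods_env() {
    user=$1
    dir=${2:-$HOME/.irods}
    mkdir -p "$dir" || return 1
    cat > "$dir/.irodsEnv" <<EOF
irodsHost 'irods.swestore.se'
irodsPort 1247
irodsDefResource 'snicdefResc'
irodsHome '/snicZone/home/$user'
irodsCwd '/snicZone/home/$user'
irodsUserName '$user'
irodsZone 'snicZone'
irodsAuthScheme 'PAM'
EOF
}

# Example:
#   write_irods_env user@example.org
```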

With the correct environment file in place, all we need is a YubiKey, and we can run the iinit command to authenticate to the iCAT server. After that we can use the usual icommands. The ticket is valid for 8 hours.

More details on the i commands are available at https://www.irods.org/index.php/icommands
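A typical session might look like the sketch below: authenticate, create a collection, upload a file, list the collection and fetch the file back. The function name and the ".copy" suffix are chosen for this example only; relative collection names are resolved against the current iRODS working collection.

```shell
# Authenticate, upload a file into a collection, list it and fetch it back.
# Usage: swestore_roundtrip <local file> <collection name>
swestore_roundtrip() {
    file=$1
    coll=$2
    iinit || return 1                 # prompts for the YubiKey one-time password
    imkdir -p "$coll" || return 1     # create the collection if it is missing
    iput "$file" "$coll" || return 1  # upload the file into the collection
    ils -l "$coll" || return 1        # long listing of the collection
    # Download the object again under a different local name.
    iget "$coll/$(basename "$file")" "$file.copy"
}

# Example:
#   swestore_roundtrip ./measurements.dat run42
```

Each step stops the function on failure, so a failed login does not lead to a string of confusing errors from the later commands.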

iDROP web client

The web client is accessible via the URL https://iweb.swestore.se/. A login screen will be presented first and your Yubikey should be used to log in.

Centre storage

Centre storage, as defined by the SNIC storage group, is a storage solution that lives independently of the computational resources and can be accessed from all such resources at a centre. Key features include the ability to access the same filesystem the same way on all computational resources at a centre, and a unified structure and nomenclature for all centres. Unlike cluster storage, which is tightly associated with a single cluster and thus has a limited lifetime, centre storage does not require the users to migrate their own data when clusters are decommissioned, not even when the storage hardware itself is being replaced.

Unified environment

To make the usage more transparent for SNIC users, a set of environment variables is available on all SNIC resources:

  • SNIC_BACKUP – the user's primary directory at the centre
    (the part of the centre storage that is backed up)
  • SNIC_NOBACKUP – recommended directory for project storage without backup
    (also on the centre storage)
  • SNIC_TMP – recommended directory for best performance during a job
    (local disk on nodes if applicable)
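
A job script could use these variables roughly as follows. This is a sketch only: the function names, the fallback to /tmp and the "results" subdirectory are assumptions made for the example and are not part of the SNIC definition.

```shell
# Pick a scratch directory for a job: SNIC_TMP if set, otherwise /tmp.
scratch_dir() {
    printf '%s\n' "${SNIC_TMP:-/tmp}"
}

# Copy results from scratch to project storage without backup.
# Aborts with a message if SNIC_NOBACKUP is not set.
save_results() {
    src=$1
    dest=${SNIC_NOBACKUP:?SNIC_NOBACKUP is not set}/results
    mkdir -p "$dest" && cp -r "$src" "$dest/"
}

# Example inside a job script:
#   cd "$(scratch_dir)"
#   ./my_program > output.dat
#   save_results output.dat
```

Working in the scratch directory during the job and copying only the final results to centre storage keeps the heavy I/O on the local disk.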