Difference between revisions of "Accessing Swestore with the ARC client"

(arcproxy 1.0.1)
(Change examples to GridFTP, updates.)

Revision as of 09:33, 12 April 2013


This guide describes how to use the Nordugrid ARC client to store files on and retrieve files from SweStore National Storage. The ARC client is usually used for sending grid jobs to grid clusters, but it also contains commands for data management. A complete user guide for the ARC client can be found at http://www.nordugrid.org/documents/arc-ui.pdf.

Requirements

To access SweStore national storage using the ARC client you need to get a grid certificate and become a member of the SweGrid virtual organisation. If you want access to your own private storage area you need to have a SweStore storage project.

All SNIC systems have the ARC client installed. If yours doesn't, please contact support at your centre so they can fix this error as soon as possible. To install the ARC client on your own computer, please follow the ARC client installation instructions, or see the official Nordugrid ARC installation page (http://www.nordugrid.org/documents/arc-client-install.html) for more information.

Quickstart

Basic commands

arcproxy - unlock your certificate so you can use it. See Proxy certificates for details.
arcls - for listing files. Works similarly to ls. Example: arcls gsiftp://gsiftp.swestore.se/snic/YOUR_PROJECT_NAME
arcmkdir - for creating directories. Works similarly to mkdir. Example: arcmkdir gsiftp://gsiftp.swestore.se/snic/YOUR_PROJECT_NAME/newdir
arccp - for copying files. Works similarly to cp. Example: arccp myfile.txt gsiftp://gsiftp.swestore.se/snic/YOUR_PROJECT_NAME/myfile.txt
arcrm - for deleting files. Works similarly to rm. Example: arcrm gsiftp://gsiftp.swestore.se/snic/YOUR_PROJECT_NAME/whoops.txt

Use man and --help to get more info on each command. Examples: man arcrm or arcls --help

Paths

The ARC commands support multiple storage protocols. We recommend GridFTP, with paths of the form gsiftp://gsiftp.swestore.se/snic/YOUR_PROJECT_NAME/..., but SRM (Storage Resource Manager) paths of the form srm://srm.swegrid.se/snic/YOUR_PROJECT_NAME/... can also be used.
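
As a minimal sketch, the recommended GridFTP path form can be assembled by a small shell helper. The function name swestore_url is hypothetical and not part of the ARC client; only the host and the /snic/ prefix are taken from the path form above.

```shell
# Build a SweStore GridFTP URL from a project name and an optional path
# inside the project. swestore_url is a hypothetical helper for illustration.
swestore_url() {
    project=$1
    relpath=${2:-}
    # Drop one leading slash so the parts join with exactly one "/".
    relpath=${relpath#/}
    printf 'gsiftp://gsiftp.swestore.se/snic/%s/%s\n' "$project" "$relpath"
}

# Example use (needs a valid proxy, so it is left commented out):
# arcls "$(swestore_url YOUR_PROJECT_NAME)"
```

Keeping the prefix in one place makes it easier to switch a script to the SRM form later, should that be needed.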

Copying files

Copying files to and from resources is accomplished using the arccp command.

Copying single files

Copying single files is accomplished in the same way as with the normal cp command, as shown in the following example:

$ arccp archive.tar.gz gsiftp://gsiftp.swestore.se/snic/YOUR_PROJECT_NAME/

Please note the trailing / which marks the destination as a directory. Without a / the destination will be a file, which may or may not be what you wanted. All required directories are created when needed so the destination may be a nonexisting directory.
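
The trailing-slash rule can be illustrated with a short sketch that prints where a file would end up for a given destination. The helper name effective_destination is hypothetical; it only mirrors the rule described above and does not contact the storage.

```shell
# Print the remote path a local file would get for a given destination URL.
# effective_destination is a hypothetical helper that mirrors the rule above.
effective_destination() {
    src=$1
    dest=$2
    case $dest in
        # Trailing slash: the destination is a directory, the file keeps its name.
        */) printf '%s%s\n' "$dest" "$(basename "$src")" ;;
        # No trailing slash: the destination itself is the file name.
        *)  printf '%s\n' "$dest" ;;
    esac
}
```

For example, effective_destination archive.tar.gz gsiftp://gsiftp.swestore.se/snic/YOUR_PROJECT_NAME/ prints a path ending in YOUR_PROJECT_NAME/archive.tar.gz, while the same call without the trailing / prints the destination unchanged.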

Recursive copying

Recursive copying is accomplished using the --recursive option to arccp. The argument to the option determines the depth of the recursive copy.

$ arccp --recursive=3 foobar/ gsiftp://gsiftp.swestore.se/snic/YOUR_PROJECT_NAME/

Long-running operations

Note that copying large directory trees can take quite some time, and may fail unless you take the following into account:

  • Your login session created with the arcproxy command has a limited lifetime. Use arcproxy -I to show the remaining time. Use arcproxy -c validityPeriod=xxH, where xx is the number of hours, to start a session with a longer lifetime.
  • If you lose connectivity with the resource you're running arccp on, the command will abort. A utility such as screen or tmux can be used to create a terminal session you can reattach to.
  • Transfer rates depend largely on the average file size: many small files transfer more slowly than the same amount of data in large files.
  • We recommend limiting each transfer session (i.e. the directory tree copied with each arccp command) to 1TB if you have mostly large (100+MB) files and to 100GB if you have smaller files.
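
The size recommendation can be turned into a rough pre-flight check, sketched below. Both function names are hypothetical, and the thresholds are assumptions: "mostly large" is approximated as an average file size of at least 100 MB (10^8 bytes), and 1TB and 100GB are taken as 10^12 and 10^11 bytes.

```shell
# Average size in bytes of the regular files under a local directory (0 if none).
average_file_size() {
    find "$1" -type f -exec sh -c 'for f; do wc -c < "$f"; done' sh {} + |
        awk '{ n++; sum += $1 } END { if (n) printf "%.0f\n", sum / n; else print 0 }'
}

# Suggested upper bound, in bytes, for the tree copied by one arccp command.
# Assumption: an average of at least 10^8 bytes counts as "mostly large" files.
recommended_cap_bytes() {
    if [ "$1" -ge 100000000 ]; then
        echo 1000000000000   # 1TB, taken as 10^12 bytes
    else
        echo 100000000000    # 100GB, taken as 10^11 bytes
    fi
}
```

A call such as recommended_cap_bytes "$(average_file_size foobar/)" then gives a cap to compare against the total size of the tree before starting arccp inside screen or tmux.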

Listing files

Listing files on a resource is done using the arcls command. In its simplest form the command takes a URL as input and displays the names of files and directories without any extra information, as shown in the following example:

$ arcls gsiftp://gsiftp.swestore.se/snic/bils/db/uniprot/2012_05
reldate.txt
speclist.txt
uniprot_sprot.dat.gz
uniprot_sprot.fasta.gz
uniprot_trembl.dat.gz
uniprot_trembl.fasta.gz

Additional information can be listed by adding the --long option:

$ arcls --long gsiftp://gsiftp.swestore.se/snic/bils/db/uniprot/2012_05
<Name> <Type> <Size> <Creation> <Validity> <CheckSum> <Latency>
reldate.txt file 151 2012-05-23 03:00:19 (n/a) adler32:f3f52f1d (n/a)
speclist.txt file 1715169 2012-05-23 03:00:17 (n/a) adler32:91e59dae (n/a)
uniprot_sprot.dat.gz file 462895141 2012-05-23 02:57:18 (n/a) adler32:0f131bb2 (n/a)
uniprot_sprot.fasta.gz file 79935897 2012-05-23 03:00:20 (n/a) adler32:89844c57 (n/a)
uniprot_trembl.dat.gz file 9162678278 2012-05-23 02:52:01 (n/a) adler32:b2d7cfd5 (n/a)
uniprot_trembl.fasta.gz file 4456514443 2012-05-23 02:57:34 (n/a) adler32:2b73b2a1 (n/a)
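
The long listing is easy to post-process. The sketch below sums the <Size> column to get the total number of bytes in a directory; total_listed_bytes is a hypothetical helper, and it assumes that names contain no whitespace, so that the size is always the third field.

```shell
# Sum the <Size> column of `arcls --long` output read from standard input.
# Assumes names contain no whitespace, so the size is always the third field.
total_listed_bytes() {
    # Only rows whose <Type> is "file" are counted; the header row is skipped
    # because its second field is "<Type>".
    awk '$2 == "file" { total += $3 } END { printf "%.0f\n", total }'
}

# Example use (needs a valid proxy, so it is left commented out):
# arcls --long gsiftp://gsiftp.swestore.se/snic/bils/db/uniprot/2012_05 | total_listed_bytes
```

Applied to the listing above, the sizes add up to 14163739079 bytes.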

Metadata

Metadata for a specific file can be listed by specifying the -m or --metadata option. Note that the amount of metadata available depends on which protocol is used.

Examples:

$ arcls --metadata gsiftp://gsiftp.swestore.se/ops/nikke/smallfile
/ops/nikke/smallfile
checksum:adler32:762606eb
mtime:2013-04-12 11:06:56
path:/ops/nikke/smallfile
size:30
type:file

$ arcls --metadata srm://srm.swegrid.se/ops/nikke/smallfile
/ops/nikke/smallfile
accessperm:rw-r-----
checksum:adler32:762606eb
ctime:2013-04-12 11:06:56
filestoragetype:PERMANENT
group:25001
latency:ONLINE
lifetimeassigned:PT1S
lifetimeleft:PT1S
mtime:2013-04-12 11:06:56
owner:25001
path:/ops/nikke/smallfile
size:30
spacetokens:
type:file
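
Since the metadata is printed as key:value lines, a single field can be picked out with sed. The helper below is a sketch with a hypothetical name, metadata_field; it assumes the key is a plain word as in the output above.

```shell
# Print the value of one field from `arcls --metadata` output on standard input.
# Lines look like "key:value"; the value may itself contain colons.
metadata_field() {
    sed -n "s/^$1://p"
}

# Example use (needs a valid proxy, so it is left commented out):
# arcls --metadata gsiftp://gsiftp.swestore.se/ops/nikke/smallfile | metadata_field checksum
```

This can be used, for instance, to compare the size reported by the storage with the size of the local copy after a transfer.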

Creating directories

$ arcmkdir gsiftp://gsiftp.swestore.se/snic/YOUR_PROJECT_NAME/newdir

If the arcmkdir command is missing, the ARC utilities need to be upgraded. As a workaround, copy a dummy file to the path you want and then delete the dummy file.
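
The dummy-file workaround could be scripted roughly as follows. The function names run and mkdir_workaround are hypothetical, as is the DRY_RUN variable, which only prints the commands instead of running them; the sketch relies on arccp creating the required directories, as described under Copying single files.

```shell
# Execute a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create a remote directory without arcmkdir: copy a placeholder file into it,
# then delete the placeholder again.
mkdir_workaround() {
    dir_url=${1%/}/                      # make sure there is one trailing slash
    placeholder=$(mktemp) || return 1    # empty local file to upload
    run arccp "$placeholder" "${dir_url}dummyfile" &&
        run arcrm "${dir_url}dummyfile"
    status=$?
    rm -f "$placeholder"
    return $status
}
```

Running it first with DRY_RUN=1 shows the two commands that would be issued for a URL such as gsiftp://gsiftp.swestore.se/snic/YOUR_PROJECT_NAME/newdir.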

Removing files or directories

$ arcrm gsiftp://gsiftp.swestore.se/snic/YOUR_PROJECT_NAME/newdir/dummyfile
$ arcrm gsiftp://gsiftp.swestore.se/snic/YOUR_PROJECT_NAME/newdir/

Directories must be empty before they can be removed.

Known problems

ARC 0.8 versus 1.0

In late spring 2011 Nordugrid released the 1.0 version of ARC (sometimes called 11.05). One of the new features of 1.0 compared to the previous 0.8 release was a new command set. Most of the ng* commands were replaced with the new arc* commands. Some functionality moved between commands (ngstat became arcinfo and arcstat) and some new commands were introduced (arcproxy as a replacement for grid-proxy-init, which wasn't an ARC command at all but a part of the Globus Toolkit). There are still legacy compatibility binaries in place for the old ng* commands, but I strongly suggest that you use arc* when available.

If you switch between ng* and arc* commands on the same local account, you may get warnings:

Bad format detected in file /home/jens/.arc/srms.conf, in line srm.swegrid.se 8443 2.2
Unwrapped data does not fit into buffer
Connection to server failed: Connection refused
Connection to server failed: Connection refused

or

WARNING: Bad or old format detected in file /home/jens/.arc/srms.conf, in line srm.swegrid.se 8443 gsi 2.2
WARNING: Bad or old format detected in file /home/jens/.arc/srms.conf, in line srm.swegrid.se 8443 gsi 2.2

There is a file, srms.conf, that gets automatically updated when accessing a resource. ngls and arcls do not agree on the content of that file. There are bug reports about it. The warning is just confusing and shouldn't be displayed. Running the same command again will probably not display those errors.

arcproxy 1.0.1

There is a bug in dCache which makes proxy certificates from arcproxy 1.0.1 unusable. This is the version distributed in the 11.05-2 standalone and MacOS clients. The error you get from arcls is:

ERROR: Failed listing files

All other versions of arcproxy should be fine. If you encounter this version of arcproxy, please use grid-proxy-init if available. The generated proxy certificates should be equivalent.