Tue, 28 Mar 2006

IMSc mirror site of arXiv

The machine is a mirror site for the preprint archive for scientific papers that originated at Los Alamos (for physics) and at Duke University (for mathematics); these sites were later combined at the Los Alamos (LANL) centre and have since moved to Cornell University.

The idea for this mirror site was mooted in early 1997. We e-mailed the maintainers of the main site to ask what machine configuration was required at our end. The appropriate hardware was procured and the machine was configured to run GNU/Linux (based on the Debian distribution) with the "apache" web server. At that time the IMSc had a 19.2Kbps leased telephone line connection to the internet, so the mirror was "primed" with files on two DAT tape drives sent by post. The staff members at LANL then configured the software, using "ssh" to log in remotely. At this point the mirror became functional, but the daily download was often completed only at the end of the day. Consequently, there was a serious possibility that even a temporary shutdown of one hour could put the mirror out of sync.

Meanwhile, we had already written to ERNET requesting an upgrade of the link. Around late 1997 to early 1998 a radio link to VSNL with a bandwidth of 64Kbps was set up. ERNET provided this link by designating IMSc as a "node". We were also supposed to provide ISP services for some of the sites in Chennai via this link, and we got one project assistant post from ERNET to help out. With the improved bandwidth, the download usually finished by mid-afternoon. However, we had some non-trivial technical difficulties with the equipment provided for this link, and some files would block the link. Luckily this was rare, and a workaround was used until better equipment was provided.
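The effect of the two link speeds mentioned above can be checked with a little arithmetic. The following is only a rough sketch: it assumes the download is limited purely by the nominal link bandwidth and that nothing else uses the link, and the function names are illustrative, not part of any software used on the mirror.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_megabytes_per_day(link_kbps: float) -> float:
    """Upper bound on the data (in decimal megabytes) a link can carry in 24 hours."""
    bits_per_day = link_kbps * 1000 * SECONDS_PER_DAY
    return bits_per_day / 8 / 1_000_000


def scaled_hours(hours_on_old_link: float, old_kbps: float, new_kbps: float) -> float:
    """Time the same transfer would need on a faster link (assumes it is link-limited)."""
    return hours_on_old_link * old_kbps / new_kbps


if __name__ == "__main__":
    # The 19.2Kbps leased line can move at most about 207 MB in a full day.
    print(round(max_megabytes_per_day(19.2), 2))
    # The 64Kbps radio link raises that ceiling to about 691 MB.
    print(round(max_megabytes_per_day(64), 2))
    # A transfer that filled a whole day on the old link needs roughly 7 hours on the new one.
    print(round(scaled_hours(24, 19.2, 64), 2))
```

Under these assumptions a download that took nearly the whole day at 19.2Kbps would need roughly a third of that time at 64Kbps, which is consistent with finishing by mid-afternoon.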

Since then we have gone through a number of transitions: higher-bandwidth links, machines with more disk space and faster disks, additional disks for existing machines, and software upgrades. Other than this, the major routine maintenance task is to take backups. Currently, this is automated by maintaining a duplicate of the mirror. Periodically, other problems that require manual intervention show up, such as the overloading of the services by internet "robots"; these sometimes have to be "banned" manually. Compared with the work required at the time of installation, the activity level required is low.
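As an illustration of how overloading "robots" might be spotted before banning them, here is a minimal sketch that counts requests per client in web server access log lines. It assumes the client address is the first whitespace-separated field of each line (as in the Common Log Format); the function name, the threshold and the sample addresses are hypothetical and do not describe the procedure actually used on the mirror.

```python
from collections import Counter


def heavy_clients(log_lines, threshold):
    """Return the clients that made more than `threshold` requests, sorted by name."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1  # first field is assumed to be the client address
    return sorted(host for host, n in counts.items() if n > threshold)


if __name__ == "__main__":
    # Made-up sample lines using documentation-only addresses.
    sample = [
        '192.0.2.10 - - [28/Mar/2006:10:00:00 +0530] "GET /abs/1 HTTP/1.0" 200 512',
        '192.0.2.10 - - [28/Mar/2006:10:00:01 +0530] "GET /abs/2 HTTP/1.0" 200 512',
        '192.0.2.10 - - [28/Mar/2006:10:00:02 +0530] "GET /abs/3 HTTP/1.0" 200 512',
        '192.0.2.20 - - [28/Mar/2006:10:05:00 +0530] "GET /abs/1 HTTP/1.0" 200 512',
    ]
    print(heavy_clients(sample, threshold=2))
```

A list like this only suggests candidates; deciding whether a client is really a misbehaving robot, and banning it, remains a manual step.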

Currently, the mirror is connected to the internet with an 8Mbps link and is up to date within about 2-3 hours of the primary site being updated. There is enough processing power and disk space to last us for another year before we need to upgrade either.

People involved at IMSc: R. Basu, T. Jayaraman, K. Paranjape, G. S. Moni, Suresh Rao, Raveendra Reddy and IMSc administration.

People involved at primary site: P. Ginsparg, Brian May, Thorsten Schwander, Simeon Warner.


