Friday, March 30, 2012

Import RIS format to SQL Server?

Hello!

I have data exported from a Reference Manager 11, and need to import it
into and SQL database.

Each record has different number of fields. It is used to cite journal
articles.
(more about the format at
http://www.adeptscience.co.uk/kb/article/A626)

The format is very strange:

TY - RPRT
A1 - Esparza,J.
T1 - Report of a WHO workshop on the measurement and significance of
neutralizing antibody to HIV and SIV, London, 3-5 October 1988
Y1 - 1990
VL - 4
SP - 269
EP - 275
RP - Not In File
CY - San Francisco CA
PB - UC Berkeley
KW - HIV
KW - SIV
KW - AIDS
T3 - World Health Organisation Global Programme on AIDS
ER -

TY - CHAP
A1 - Franks,L.M.
T1 - Preface by an AIDS Victim
Y1 - 1991
VL - 10
SP - vii
EP - viii
RP - Not In File
T2 - Cancer, HIV and AIDS.
CY - Berkeley CA
PB - Berkeley Press
KW - HIV
KW - AIDS
M1 - 1
M2 - 1
SN - 0-679-40110-5
ER -

TY - CASE
A1 - Cary,A.
A1 - Friedenrich,W.
T1 - Redman v. State of California
Y1 - 1988/10/7
VL - 201
IS - 32
SP - 220
EP - 240
RP - Not In File
CY - ATLA Law Reporter
PB - San Diego County 45th Judicial District, California
KW - AIDS
KW - litigation
KW - AIDS litigation
KW - rape
U1 - ISSN 0456-8125
N1 - Raped inmate can press case against officials for contracting
AIDS
ER -

It looks like some of the columns are separted by CR but part of the
same colum. For instance 'KW' is seen multiple times per record but
should be one field called 'KW'.

Any idea how I would import this to SQL Server with the DTS?"Tmuld" <tmuldoon@.spliced.com> wrote in message
news:1113921439.658252.91360@.g14g2000cwa.googlegro ups.com...
> Hello!
> I have data exported from a Reference Manager 11, and need to import it
> into and SQL database.
> Each record has different number of fields. It is used to cite journal
> articles.
> (more about the format at
> http://www.adeptscience.co.uk/kb/article/A626)
> The format is very strange:

<snip
> It looks like some of the columns are separted by CR but part of the
> same colum. For instance 'KW' is seen multiple times per record but
> should be one field called 'KW'.
> Any idea how I would import this to SQL Server with the DTS?

The format looks rather awkward to handle in DTS transformations, so you'll
probably need some other solution. One would be to load the data into a
staging table, then use stored procedures to clean it up and INSERT it into
the destination tables. A better option is probably to parse the data
outside the database, using C#/Perl or whatever your preferred language is,
then load it. You might be able to transform it into XML, for example, then
use OPENXML or SQLXML Bulk Load to import it:

http://www.sqlxml.org/faqs.aspx?faq=13

Whatever solution you decide on, you could still use a DTS package to manage
all the steps, whether they are stored procedures or external programs.

Simon|||I think that your best bet would be to loop through the file in a VB
application which handles the logic. Usually, files that hold data in a
"record" format have the columns going across the file and rows going
down. They also usually have a set number of columns. While this format
isn't unheard of (I've worked with similar formats in the past) most of
the standard tools for importing files (BCP, DTS, BULK INSERT) are not
designed to deal with it. BCP might be able to get around it if you had
a fixed number of columns, but the variable number of columns would
present a problem there.

If you have no skill in a programming language like VB and you don't
have access to a resource who has those skills then you could BCP the
file into a staging table with an IDENTITY column. Make sure that the
ordering of the records in the file is preserved in the rows of the
table. From there you can loop through the rows sequentially with a
cursor. This is one of those VERY rare instances where I think a cursor
is called for. As I said though, I think you're better off using VB or
something similar if you can.

By looping through the records you can create a new row in your base
table every time that you hit a new record tag in the file then you can
create rows in children tables for each repeatable column type.

Good luck,
-Tom.

No comments:

Post a Comment