Wednesday, March 7, 2012

import complex Comma delimited text files into sql tables

I need to import comma delimited text files into sql tables. in one of the column, there is a comma in the string itself. e.g.

Cust_ID Name Phone Address1 Address2

Date that I have:

001,juia, anderson,4694568855,,Cedar Spring

The data does not have double quote as text qualifiers. but as you see, on the Name column, there is a comma, which is not a delimiter. can anybody give any suggestions on how i can deal with that? i would appreciate it so much.

thanksWink

There are several ways to accomplish this. Here are a few ideas.

1. The best, of course, would be to ALTER the table and add a FirstName column and use the current column for the LastName. It would then be so-o-o much easier to find someone by their LastName.

2. Change the delimiters on the input files.

3. Import into a staging table, and then combine the names (if you really must).

|||

Arnie, Thanks for the reply !

However, not all of the data has first name and last name. Some of them have no commas for first name and last name, some do, some even have firstname (and comma), middle name (and comma), and lastname. If i create another column, how do I deal with those that do not have comma in the name column?

Thanks,

|||


You have a complex problem.


It would be easiest to have the extraction process use a different delimiter. Then embedded commas would be simple to handle.


If you cannot effect a change in delimiter by the extraction process, you could create a staging table, with more than enough varchar columns to hold all possible combinations of falsely delimited data. And then execute a script that would assume that all columns up to the one containing a number (phone) was part of the name. You could then concatenate the parts into one field.

Not an ideal situation, but oftentimes, reality is messy.

|||Posted a similar response in the SSIS forums.
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1424223&SiteID=1

Basically, we all agree here. The data is poorly put together and will be almost impossible to determine which column is where in a progmatic fashion.

No comments:

Post a Comment