Wednesday, March 28, 2012

import large field from SSIS

Hi,

I am building an SSIS package that imports data from an application using a custom ODBC driver. The field in the application is defined as a "longvarchar" type and can hold anywhere from 2 characters to 2 MB of data.

I've created an ODBC data connection in the SSIS package and use a "DataReader Source" to read the data I need. The SQL statement is very simple:

SELECT log FROM tablename

When I try to run the SSIS package with that statement, the DataReader Source just turns yellow and stops. It stays like that until I stop the package. If I select other fields, leaving out that one, it works fine. I've also been able to retrieve the log field successfully when I select a log record that isn't too big. The largest one I've retrieved is 800 characters, but one with 2,500 characters just stops on yellow.
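To narrow down whether it really is the size of the field that triggers the hang, one thing I can do is ask the driver for the length of each log value first and only then pull the large rows individually. This is just a sketch; it assumes the driver's SQL dialect has a standard length function (CHAR_LENGTH, LENGTH or LEN, depending on the driver) and that the table has an id key column:

    -- List record sizes so the size cutoff can be found (id column assumed)
    SELECT id, CHAR_LENGTH(log) AS log_length
    FROM tablename
    ORDER BY log_length DESC

    -- Then pull one known-large record on its own
    SELECT log FROM tablename WHERE id = 12345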

In the Progress log the last line says:

[DTS.Pipeline] Information: Execute phase is beginning.

Does anyone have any ideas on how to resolve this?
Have you checked the ValidateExternalMetadata property of the data source? Does any warning message appear in your data reader?|||I've tried setting ValidateExternalMetadata to both false and true, but it doesn't make any difference. There are no error or warning messages in the progress log, and I don't know of any other places to look for error messages.

This is really starting to annoy me, but this is the only way we can get the data out of that system, so I need it to work...
|||

Hi,

DataReaderSrc is not particularly efficient at dealing with BLOB data, such as DT_NTEXT or DT_IMAGE columns. It may be that it is just being slow...

Unfortunately, DataReaderSrc does not utilize the perf counters for BLOB bytes read -- this is a known issue that is planned to be fixed in a future release. If you look at perfmon while the package is running, what's happening with the CPU and memory usage?

mark

|||Thanks for your answer. Looking at perfmon and Task Manager while running the package, I was a bit surprised.
First off, in perfmon the Memory object stays steady at 0, and the Physical Disk object goes up and down between 0 and 20, with the occasional spike up to 100.
Then there is the Processor object, which stays at around 50 constantly. In Task Manager, the DtsDebugHost.exe process stays at 50% CPU and uses 30,976 K of memory. Is this normal when executing a package?

I was also thinking that it might just be slow, but I changed my query to select only one record, based on the record's id, and it still ran for over 20 minutes (I stopped it after that). If I select a record with less data in the BLOB field, it completes within 10 seconds.
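For reference, the single-record test was nothing more complicated than this (the id value here is made up; the column and table names are the same ones as in my first post):

    -- Hypothetical id of a record with a large log value
    SELECT log FROM tablename WHERE id = 42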

My development server is a dual 3.6 GHz Intel Xeon with 3 GB of memory, so I don't think it's a case of our server not being good enough. :)

If the future fix resolves this issue, that will be good enough, because I have sort of told people that otherwise we'll have to find a different way to create the reports.
|||

The "future fix" I spoke of was just to add the performance counters to DataReaderSrc, which would help diagnose issues like this one.

I will try to set up a similar scenario here to see if I can reproduce the behaviour you are seeing.

thanks

Mark

|||

Hi Josh,

I created a package with a datareader source, and used connections of the type:

.NET Providers\SqlClient Data Provider

.NET Providers\Odbc Data Provider

In both cases, I was able to read 20 MB of TEXT data in a second or two.
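If it helps to compare setups, a rough sketch of the kind of test table I used is below. The table and column names are made up for illustration, and the REPLICATE/CAST trick assumes SQL Server 2005 T-SQL for building a roughly 20 MB value; a VARCHAR(MAX) column instead of TEXT should work just as well:

    -- Sketch only: illustrative names, SQL Server 2005 syntax assumed
    CREATE TABLE BlobTest (id INT IDENTITY PRIMARY KEY, log TEXT)

    -- Build roughly 20,000,000 characters (~20 MB) in a single row
    INSERT INTO BlobTest (log)
    VALUES (REPLICATE(CAST('x' AS VARCHAR(MAX)), 20000000))

    -- The DataReader source then simply reads it back
    SELECT log FROM BlobTest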

Is it possible for you to try using a different driver/provider?

Can you try using some other application with your driver/provider to see if you can read the data or if you have the same problem?

thanks

Mark
