The documentation on the fuzzy lookup transform mentions that only columns of type DT_WSTR and DT_STR can be used in fuzzy matching. I interpreted this as meaning that you could not create a mapping between an input column of type DT_NTEXT and a column from the reference table. I assumed that you could still have a DT_NTEXT column as part of the input and mark this as a pass through column so that it's value could be inserted in the destination, together with the result of the lookup operation. Apparently this is not the case. Validation fails with the following message: 'The data type of column 'fieldname' is not supported.' First, I'd like to confirm that this is really the case and that I have not misinterpreted this limitation.
Finally, given the following situation
- A data source with input columns
Field_A DT_STR
Field_B DT_NTEXT
- A fuzzy lookup is used to match Field_A to a row in the reference table and obtain Field_C.
- Finally, Field_B and Field_C must be inserted into the destination.
Can anyone suggest how this could be achieved?
Fernando Tubio
One possible workaround is using a multicast transform to route the input columns around the lookup transform. A merge join transform can then be used to join the outputs from the multicast and the fuzzy lookup to include the DT_NTEXT field back into the data flow.
I've tried this solution and it works but I wonder if it is really necessary to resort to all these contortions.
|||
It looks like your workaround is the best approach. The Fuzzy lookup does not support DT_NTEXT, DT_TEXT, or DT_IMAGE columns as copy columns OR pass-through columns. I am not sure why that is, but I will try to find out.
Mark
|||I guess one of the reasons was performance, and that these columns require special handling. You might want to put in a request for this feature for a future release.
Thanks
Mark
Thank you Mark.
Considering my limited knowledge about the inner workings of the data flow pipeline I am very likely wrong, but I would have guessed that a pass-through operation merely involved copying some pointers around. In any case, the package creator can control which columns to pass-through and if he is concerned with performance, then he is in a better position to decide whether to include these columns in the output. So I guess it would be nice to have this choice in a future release.
Fernando Tubio
No comments:
Post a Comment