Whilst I've managed databases for a number of years, I've never had much involvement with localisation for different character sets or client setups.
I am now faced with building a 'Data Warehouse', which I have configured with the AL32UTF8 character set.
I am now learning that this has to host data imported from around 15 globally dispersed databases, running versions from 9.0.1 up to 10.2.0.4. As far as I can see at the moment, these remote databases are all set up as US7ASCII databases.
All the ETL program code has been configured and has been running nicely, allowing for various in-house data "quirks", and data has been propagated into the repository. However, I'm now being asked why some of the data looks like 'nonsense' - i.e. square boxes and upside-down question marks.
I'm led to believe that although the remote DBs are running US7ASCII, their clients are all configured differently. Consequently, from France we're getting e-acutes inserted and from Germany we're getting umlauts. In China I have absolutely no idea what is going on - a lot of the data looks fine, but a good proportion of it doesn't.
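The situation described can be simulated outside Oracle. The sketch below (Python, purely illustrative - no Oracle involved) assumes the common situation where a client's NLS_LANG matches the database character set, so Oracle performs no conversion and the client's bytes are stored exactly as sent:

```python
# Illustration only: simulating a Latin-1 client whose bytes are stored
# untouched ("pass-through") by a nominally US7ASCII database.

# A French client types an e-acute; its OS encodes it as one Latin-1 byte.
stored = "café".encode("latin-1")   # b'caf\xe9' -- 0xE9 is above 0x7F

# The same client (or any other Latin-1 client) decodes the byte back
# happily and sees the accent it typed:
print(stored.decode("latin-1"))     # café

# A Unicode consumer that assumes the bytes are valid UTF-8 cannot decode
# the lone 0xE9 and substitutes a replacement character instead:
print(stored.decode("utf-8", errors="replace"))   # caf�
```

That replacement character is exactly the kind of 'square box' symptom being reported from the warehouse.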
I know that, using differing keyboard setups, I can insert foreign characters directly into the Data Warehouse. I have two versions of it, running 10g and 11g, and both are able to render the differing foreign characters entered into either.
I can insert the same 'test data' into a US7ASCII database, and it accepts it fine - it just can't be "re-read" the same way it went in, and when SELECTed across a DB link to the 10g/11g warehouses, those DBs have no idea what these characters are either.
I'm presuming that the US7ASCII DBs are undertaking some character-set conversion for the odd foreign characters, and whilst these may be 're-read' by the foreign-character-set client that inserted them, other DBs that want to read the data may have no idea how to convert these character representations back for another client configuration.
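If that presumption is right, the key point is that a conversion down to a 7-bit character set is lossy and one-way; the sketch below (Python, a simulation of the mechanism only, not Oracle code) shows why no downstream client configuration can ever get the originals back:

```python
# Illustration only: converting text down to 7-bit ASCII is one-way.
# Oracle substitutes a replacement character (an upside-down question
# mark or '?') for anything the target character set cannot represent.

original = "Größe"   # German text containing an umlaut and a sharp-s

# The accented characters cannot be represented in 7-bit ASCII,
# so each one is replaced:
converted = original.encode("ascii", errors="replace")
print(converted)     # b'Gr??e'

# No later conversion can recover the originals -- the information was
# destroyed at insert time, before the data ever reached the warehouse.
print(converted.decode("ascii"))   # Gr??e
```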
I'm currently of the mind that whilst the local apps, with their client config, may "seem" fine and "dandy", we're somewhat hobbled when it comes to viewing this data and having it meaningfully rendered in a Data Warehouse running Unicode.
Can anyone out there give me some enlightenment on what I might be faced with here?
Another concern I have is:
I am being called upon to migrate a 9.0.1 US7ASCII Enterprise Edition DB that contains these converted foreign characters (French, English, German and Dutch apps all use this database) to a 10.2.0.4 Standard Edition database.
I have informed the business that the only supported method for a direct migration would be Export/Import between these two versions - and definitely involving the Enterprise-to-Standard downgrade...
Now, I'm concerned that during the Export/Import I'm going to get bogged down with character-translation issues - such that what gets exported out and then imported back in might THEN not be seen quite as it always has been by the foreign clients thereafter...
Anyone out there got any words of wisdom on how I might deal with this, what 'gotchas' there will be, and if there is any "sane" way to circumvent them?
How are you exporting/importing the data?
If you are using the exp/imp utilities, you have to make sure the NLS_LANG environment variable at the source (where exp is run) is set to a character set compatible with the target (imp) database, and that NLS_LANG at the target is also set correctly.
The point is to avoid unnecessary character-set conversions, which could corrupt the data.
NLS_LANG has to be set to the 'larger' of the two character sets - the one that is a superset of the other - so that nothing is lost in conversion between source and target.
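The effect of getting NLS_LANG right or wrong can be sketched in a simulation (Python, purely illustrative - the exp/imp tools themselves are not involved). Assume the stored bytes are really Latin-1, even though the database describes itself as US7ASCII:

```python
# Illustration only: why the NLS_LANG setting matters during exp/imp.
stored = "café".encode("latin-1")   # b'caf\xe9', as stored in the source DB

# Matching NLS_LANG: no conversion is performed, the bytes travel through
# untouched, and they remain recoverable at the target.
passthrough = stored
print(passthrough.decode("latin-1"))   # café

# Mismatched NLS_LANG: a conversion is forced through a character set
# that cannot hold the data, and the accent is lost for good.
mangled = stored.decode("latin-1").encode("ascii", errors="replace")
print(mangled)                         # b'caf?'
```

In other words, the damage from a wrong NLS_LANG happens silently during the export itself, before the dump file even reaches the target.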
"The person who says it cannot be done should not interrupt the person doing it." --Chinese Proverb