Bikur Cholim בקור חולים

Tuesday, December 20, 2016

Data Import - Part VIII

1.       Properly Exporting / Importing Characters – see the lettered examples below, in four parts: Illegal Characters, Special Symbols, Non-alphabetical Characters and Foreign Languages; this is followed by a solution or two: 
Part I:          Illegal Characters
a.                            In English, there is a difference between ’ and ' - its dissimilar twin: Jacob’s or Jacob's? (Note the difference: the first apostrophe is slanted whilst the second one is straight.)
b.                           – means: - ( = a simple dash, as in Mars Record Code 4315)
c.                            – = - ( = a simple dash)
d.                           Ch’n means: ‘ ( = a simple apostrophe, as in “Ch’n” of Record Code 3337)
e.                           English extended characters like the apostrophe become mangled: “Int’l” should be “Int’l”.  The same goes for “Solly’s”, which should be: Solly’s
f.                             The same with “wife?��s mobile” for “wife’s mobile”
g.                            A Dash or Stroke: like the name of the following company: “Molkerei Niesky – Niesky Werk”, which means: “Molkerei Niesky – Niesky Werk”.
h.                           Similarly the apostrophes that were transferred from FileMaker and opened in Excel show the following text instead of “ ‘ “: “’”, as in “Shackelton’s Milling Ltd”, which should read: “Shackelton’s Milling Ltd”. (Replaced over seventy (!) occurrences).
i.                              Similarly the quotation marks “ “like these” ” that were transferred from FileMaker and opened in Excel show the following text instead of “ ‘ “: ““”. 
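One way to neutralise the look-alike characters listed in Part I before an export is to map each typographic character to its plain keyboard equivalent. The following is a minimal Python sketch rather than a FileMaker feature; the function name `straighten` and the choice of replacements are assumptions made for illustration.

```python
# Map typographic ("curly") punctuation to plain ASCII equivalents.
# The table is illustrative; extend it for other characters as needed.
PLAIN_EQUIVALENTS = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / curly apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
})


def straighten(text: str) -> str:
    """Return text with curly quotes and long dashes replaced by ASCII."""
    return text.translate(PLAIN_EQUIVALENTS)


print(straighten("Shackelton\u2019s Milling Ltd"))       # Shackelton's Milling Ltd
print(straighten("Molkerei Niesky \u2013 Niesky Werk"))  # Molkerei Niesky - Niesky Werk
```

Plain ASCII characters have the same bytes in nearly every character set, so text cleaned this way tends to survive a round trip between programs, at the cost of losing the typographic quotes.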
Part II:         Special Symbols / Characters
a.       The Registered Symbol (®) needs to be repaired, as it came over as “®” or “®”.
Part III:       Non-alphabetical, non-special Characters. These are also found in search strings copied from the Address Field in a web browser: when Paste Special is used to paste such a string into Word, the search operands and code strings are sometimes converted into computer lingo, i.e. percent-encoded codes such as %2F (See: [i])…
a.       Dash (-): %E2%80%8E
b.      Underscore or horizontal bar (_): %5F
c.       Period or full stop (.): %2E
d.      Comma (,): %2C
e.      Slash (\ /): HTML %2F
f.        Colon (:): %3A
g.       Ampersand (&): HTML &amp;
h.      Chevron: < or >
i.         Quotation mark (“”): HTML %22
j.        Open quotation mark (“): HTML %9C
k.       Close quotation mark (”): HTML %9D
l.         HTML code equivalent “€”: %80
m.    Plus sign (+): %2B
n.      â (small a, circumflex accent): %E2
o.      <: &lt;
p.      HTML code equivalent “Ž”: %8E
q.      >: &gt;
r.        Double apostrophe (quotation mark), also in the Hebrew character set: "
s.       weeks' for weeks’ (apostrophe); upper apostrophe ('), also in the Hebrew character set: '
t.        Non-breaking space: &nbsp;
u.      When I copy a non-breaking hyphen from MSWord into FileMaker, it pastes a ¨
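The percent codes in the list above can be checked with a few lines of Python, offered here as a hedged sketch using only the standard library. Two things it shows: a code such as %9C or %80 is a single byte of a longer UTF-8 sequence rather than a character of its own (%E2%80%9C together is the opening quotation mark), and %E2%80%8E decodes to U+200E, the invisible left-to-right mark, rather than to a visible dash.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# Decode some of the single-byte codes from the list above.
for code in ["%5F", "%2E", "%2C", "%2F", "%3A", "%22", "%2B"]:
    print(code, "->", unquote(code))

# A three-byte sequence is one character in UTF-8, not three.
left_quote = unquote_to_bytes("%E2%80%9C").decode("utf-8")
print(repr(left_quote))            # the opening quotation mark, U+201C

# %E2%80%8E is U+200E (left-to-right mark), which has no visible glyph.
print(repr(unquote("%E2%80%8E")))  # '\u200e'

# Going the other way: percent-encode a character for use in a URL.
print(quote("\u201c"))             # %E2%80%9C
```

When the three bytes E2, 80 and 8E are instead displayed one at a time in the Windows-1252 character set, they appear as â, € and Ž, which is why those three symbols turn up in the list.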
Part IV: Foreign languages
a.       Wherever the words “Tel. ” or “Fax. ” appear in Contact Fields, delete them, as the dialler reads them as “835.” and “329.” respectively (the telephone keypad digits for those letters).
b.      A space in Acrobat PDF originating in the USA, turns into a in FileMaker.
c.       Turkish characters are problematic, as they’re not recognised at all by Excel, like the name of the following company: “EVL?YA ?EKERLEME SAN. VE T?C. LTD. ?T?”, which means: “EVLİYA ŞEKERLEME SAN. VE TİC. LTD. ŞTİ” and “Lütfü Türközü” means “Lütfü Türközü”…
d.      Spanish characters like “á” are problematic, as they’re not recognised at all by Excel, like the name of the following company: “Andrea Sánchez”, which means: “Andrea Sánchez”.
e.      Spanish characters like “”” are problematic, as they’re not recognised at all by Excel, like the name of the following company: “Ingenio Azucarero “Roberto Barbery Paz””, which means: “Ingenio Azucarero “Roberto Barbery Paz””.
a.       The following four ingredients were exported as an Excel file from FileMaker Pro – UTF-8, the fifth is from the Contacts:
         i.      Ac Caprílico C-8 = Ac Caprílico C-8
         ii.     Ac Láctico = Ac Láctico
         iii.    Ac Mirístico = Ac Mirístico
         iv.     Ac.Acético = Ac.Acético
         v.      Gaétan = Gaétan
f.        Portuguese characters like “ã” are problematic, as they’re not recognised at all by Excel, like the name of the following company: “Amorim & Irmãos., SA - Equipar Plant”, which means: “Amorim & Irmãos., SA - Equipar Plant”.
g.       José = José
h.      Asín = Asín
i.         Frédéric = Frédéric
j.        Micheál = Michel
k.       ?¹ = “Ö” from “Österreich”
l.         f??r = für
m.    Swedish characters like “ö” are problematic, as they’re not recognised at all by Excel, like the name of the following company: “Arla Foods Götene Ost”, which means: “Arla Foods Götene Ost”.
n.      The German “Jörg” becomes “Jrg”
o.      Scandinavian characters are problematic, as they’re not recognised at all by Excel, like the name of the following company: “Høgelund Dairy”, which means: “Høgelund Dairy”.
p.      Polish characters are problematic, as they’re not recognised at all by Excel, like the following names:
a.       The company: “Spóldzielnia Mleczarska”, which means: “Spóldzielnia Mleczarska”.
b.      “Å‚” in “Malwina GaÅ‚ach”, which should read “Malwina Gałach”
c.       “KrążyÅ„ska”, which means “Krążyńska”. 
d.      The same for Wróblewski / Wróblewski. 
q.      Irish characters are problematic
a.       à and ¡ (lowered i) as in Micheál (for Michael in English) come out in the export file as “Micheál”.
r.        German characters are problematic, as they’re not recognised at all by Excel, like the following name: “Rückert”, which means: “Rückert”.
s.       The õ character creeps in when doing an export and then import…
t.        French ê from a factory name “Pêcheur” (Sensient), should read: “Pêcheur”
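Garbling of the “Å‚” kind typically arises when text saved as UTF-8 is opened as if it were Windows-1252, so that each accented letter turns into two characters. Where that is the cause, the damage can often be reversed by undoing the wrong step. The following Python sketch assumes exactly that cause; the function name `repair_mojibake` is hypothetical and the approach will not help with characters that were already replaced by “?” or dropped.

```python
def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252.

    Returns the input unchanged when the round trip is not possible,
    e.g. when the text was never garbled in the first place.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# "Malwina GaÅ‚ach" from the list above, written with escapes for clarity.
print(repair_mojibake("Malwina Ga\u00c5\u201aach"))   # Malwina Gałach
print(repair_mojibake("Miche\u00c3\u00a1l"))          # Micheál
print(repair_mojibake("Andrea S\u00e1nchez"))         # already fine, unchanged
```

A string that is only partly garbled, such as “KrążyÅ„ska”, is returned unchanged by this sketch, because its correct Polish letters cannot be expressed in Windows-1252; such cases need to be repaired piece by piece.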
I’ll start with the last problem: Foreign languages
Copy the text containing the foreign-language characters from the source document (e.g. an Outlook email message) and paste it into MSWord.  Then cut and paste the text into FileMaker.  Try to write a Script that does all of this for you on the fly: launch MSWord, paste the text that you copied at source, then paste it into FileMaker.  Try using the following functions to ‘massage’ the text further: Replace(), TextStyleRemove(), TextStyleAdd(), Proper(), Lower(), Upper().
The instructions below come from the FileMaker Pro Gurus Website (See [ii]); they help with cleaning the data so that it is ready for importing back into FileMaker (or any other database): See next page.
Remember that it is possible to set the import and export Character Set setting to one of the following:
·         ANSI
·         ASCII
·         Unicode
·         UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode, which is backwards compatible with ASCII.
·         Mac
·         ISO-8859-1 (informally also called Latin-1) is an 8-bit character set for Western European languages
·         Code Page
The resulting text in the output file will depend on the selection you made from the Character Sets in the above list, which explains why sometimes the exported data contain strange characters, as shown in the above examples.
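The effect of the Character Set choice can be reproduced outside FileMaker. The Python sketch below is an illustration under assumptions: the codec names "utf-8", "latin-1" (ISO-8859-1), "cp1252" (taken here as the Western European Windows "ANSI" code page) and "ascii" stand in for the options in the list above, and the sample names come from the examples earlier in this post.

```python
name = "Andrea S\u00e1nchez"               # Andrea Sánchez
turkish = "EVL\u0130YA \u015eEKERLEME"     # EVLİYA ŞEKERLEME

# The same text produces different bytes under different character sets.
print(name.encode("utf-8"))      # b'Andrea S\xc3\xa1nchez'
print(name.encode("latin-1"))    # b'Andrea S\xe1nchez'

# Exported as UTF-8 but opened as Windows-1252: one character becomes two.
print(name.encode("utf-8").decode("cp1252"))   # Andrea SÃ¡nchez

# Exported with a character set that lacks the letters: they turn into "?".
print(turkish.encode("latin-1", errors="replace").decode("latin-1"))
# EVL?YA ?EKERLEME

# ASCII has no accented letters at all.
print(name.encode("ascii", errors="replace"))  # b'Andrea S?nchez'
```

The practical consequence is that the exporting and the importing program must agree on the Character Set; choosing UTF-8 at both ends is usually the safest option because it can represent every character shown in the examples.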
