There are many ways to ETL the XML files, you can using c#(or any other language you’d like to) code parse it based on the schema and bulk insert into database, or import it into excel then load it into tables, or using SSIS, whatever you like and you feel you are comfortable with.
And today I would like to introduce, to those sql developers may not familiar or haven’t really practice it yet- using some light sql code to handle all the job at once, and it works like a charm. I will use an example to demonstrate it.
Let’s say you have received a XML project, to let you process the file hourly/daily/weekly. And your XML files are something like below:
<customer xmlns="customer" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> <customerRow xmlns="customer"> <customerid>10000</customerid> <name>Schweizerische Nationalbank</name> <street1>Bundesplatz 1</street1> <street2/> <zip>3011</zip> <city>Bern</city> <site/> <statut>A</statut> <language>DE</language> <alpha>bns ber</alpha> <encashmentmethod>10</encashmentmethod> <representant>9</representant> <customerkind>1</customerkind> <customergroup>0</customergroup> <category>1</category> <creation_date>2009-11-13</creation_date> <modification_date>2015-05-22</modification_date> <syncvalue>5620.55821</syncvalue> </customerRow> </customer>
Before you start it, the very first thing to do is build a staging table, store your XML data in one field.
CREATE TABLE XMLCustomer
(
Id INT IDENTITY(1,1) PRIMARY KEY,
FileName nvarchar(30),
XMLData XML,
LoadedDateTime DATETIME
)
The ID, FileName and LoadedDatetime is not necessary, it just give you some alternatives to quickly identify which file is about what, so when you have multiple XML files based on the name ID or datetime, you can call different procedures.
The second step is bulk insert: using query down below, and exchange the directory with your file directory.
DECLARE @DT DATETIME
SET @DT=GETDATE()
INSERT INTO vicxuan..XMLCustomer(XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn) AS BulkColumn, @DT
FROM OPENROWSET(BULK ‘V:\\ USER\FTP\Chatel\customer_group.xml’, SINGLE_BLOB) AS x;
Now we can query to parse the xml file, you don’t have to parse all element fields like I did, you can just pick the one you need to import into the table/tables.
DECLARE @x XML
select @x=XMLdata from vicxuan..XMLCustomer where id=1
;WITH XMLNAMESPACES(DEFAULT ‘customer’)
SELECT ‘CH_Chatel_St_Denis’ as SalesOfficeID,
x.v.value(‘customerid[1]’,’nvarchar(20)’) as CustID,
x.v.value(‘name[1]’,’nvarchar(50)’) AS Name,
x.v.value(‘street1[1]’,’nvarchar(120)’) AS Address1,
x.v.value(‘street2[1]’,’nvarchar(120)’) AS Address2,
x.v.value(‘city[1]’,’nvarchar(50)’) AS City,
x.v.value(‘zip[1]’,’nvarchar(20)’) AS postalCode,
case when x.v.value(‘statut[1]’,’varchar(1)’)=’A’ THEN 1 ELSE 0 END AS status,
x.v.value(‘site[1]’,’varchar(30)’) AS Site,
x.v.value(‘language[1]’,’varchar(5)’) AS language,
x.v.value(‘alpha[1]’,’varchar(30)’) AS alpha,
x.v.value(‘encashmentmethod[1]’,’int’) AS encashmentmethod, x.v.value(‘representant[1]’,’int’) AS representant, x.v.value(‘customerkind[1]’,’int’) AS customerkind, x.v.value(‘customergroup[1]’,’int’) AS customergroup, x.v.value(‘category[1]’,’varchar(5)’) AS category,
x.v.value(‘creation_date[1]’,’datetime’) AS creation_date, x.v.value(‘modification_date[1]’,’datetime’) AS modification_date, x.v.value(‘syncvalue[1]’,’decimal(17,4)’) AS syncvalue, x.v.value(‘language[1]’,’nvarchar(10)’) AS language,
INTO ##CUSTOMER_TEMP
FROM @x.nodes(‘/customer/customerRow ‘) x (v)
All the elements’ data is in temp table, you can alter data type, insert temp table data into destination tables, or call stored procedure to do data transformation, etc.
The whole query can be set up as a stored procedure, schedule in sql job, then you can automate all the process as the frequency you want.
Thank you for reading, if you have any questions please comment let me know.

Leave a comment