Recently I needed to get the marketing insight replicated data from the data lake where it is replicated to, to Customer Insight. However, that turned out to not be very easy as the data had some formatting issues. However, I found that the Marketing reports that Microsoft have release for Power BI can be used as inspiration for how to query the data.
I am currently working on a rather long Proof of concept of Customer Insight for a customers. One of the things they wanted to see if it worked to get into Customer Insight was EmailClicked which is part of the Marketing Insight data that Marketing stores in an internal database. This can rather easily be replicated to an external datalake, with some configuration in the admin tab in Marketing. However, when I tried to connect to that data, using the Azure Datalake gen2 adapter in Customer Insight, it consistently said that there was no data. After some digging I found that the main reason for this was that the files in the datalake did not have the propper file-ending for the datalake connector to understand them. In short, they are csv-files, but do not have the filename xyz.csv. Simple enought problem, I thought but as I am not super comfortable working with datalake data, I tried to figure out some way of easier solving this issue. First I tried using the dataflow connector to ADLS gen 2 but that got the same problem. Just so you know.
Then it struck me, data flows use PowerQuery/M which is the same thing that is used in Power BI, AND Microsoft have release some marketing reports for Power BI that utilize the data in the datalake combined with dataverse. I hence opened one of these and tried to copy the entire data query part. It turned out to be more than 20 different components. Datasources, configuration, functions and more. When pasted into the Data Flow, it simply didn’t work and as usual, I didn’t get a very good errormessage. But my hunch was that maybe the logic is too complicated with too many internal connections for the data flow. If you know the limitations here, please leave a comment. This didn’t stop me, my next step was to remove all unnecessary stuff from it, like the dataverse queries, config and such, but still, it didn’t work. So I attempted to move it all into one single M-script with a hardcoded referece to EmailClicked. When I did that, it worked! This is the final result:
let
Source = AzureStorage.Blobs("<datalakename>"),
#"ContainerContent" = Source{[Name="<containername>"]}[Data],
#"Removed Other Columns" = Table.SelectColumns(ContainerContent,{"Content", "Name", "Date modified", "Attributes"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each [Name] <> "model.json"),
#"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"Date modified", Order.Descending}}),
#"Expanded Attributes" = Table.ExpandRecordColumn(#"Sorted Rows", "Attributes", {"Size"}, {"Size"}),
#"File Name column" = Table.DuplicateColumn(#"Expanded Attributes", "Name", "File Name"),
#"Remove csv" = Table.ReplaceValue(#"File Name column","csv/","",Replacer.ReplaceText,{"File Name"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Remove csv", "File Name", Splitter.SplitTextByEachDelimiter({"/"}, QuoteStyle.Csv, true), {"Interaction Name", "File Name"}),
#"Transform" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Interaction Name", type text}, {"File Name", type text}, {"Size", Int64.Type}}),
#"Add Datestamp" = Table.DuplicateColumn(#"Transform", "Date modified", "Datestamp"),
#"DateStampFormat" = Table.TransformColumnTypes(#"Add Datestamp",{{"Datestamp", type date}}),
TodayFunction = DateTime.FixedLocalNow,
#"Add Today" = Table.AddColumn(#"DateStampFormat", "Today", each TodayFunction()),
#"Changed TodayType" = Table.TransformColumnTypes(#"Add Today",{{"Today", type date}}),
#"Add DaysFromToday" = Table.AddColumn(#"Changed TodayType", "DaysFromToday", each [Datestamp]-[Today]),
#"Changed DaysFromToday" = Table.TransformColumnTypes(#"Add DaysFromToday",{{"DaysFromToday", Int64.Type}}),
Result = Table.RemoveColumns(#"Changed DaysFromToday", "Today"),
result2 = Table.SelectRows(Result, each [DaysFromToday] >= -180),
FilteredByInteraction = Table.SelectRows(result2, each [Interaction Name] = "EmailClicked"),
#"AddFileContents" = Table.AddColumn(#"FilteredByInteraction", "FileContent", each
Table.PromoteHeaders(Csv.Document([Content],[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.Csv]), [PromoteAllScalars=true])),
#"ContentTable" = Table.SelectColumns(AddFileContents,{"FileContent"}),
#"NoDataFiles" = Table.IsEmpty(#"ContentTable"),
InteractionTable = Table.ExpandTableColumn(#"ContentTable", "FileContent", Table.ColumnNames(ContentTable{0}[FileContent])),
#"Transformed" = Table.TransformColumnTypes(InteractionTable,{{"Timestamp", type datetimezone}}, "en-US"),
#"Duplicated Column" = Table.DuplicateColumn(#"Transformed", "Timestamp", "Datestamp"),
#"Datestamp" = Table.TransformColumns(#"Duplicated Column",{{"Datestamp", DateTime.Date, type date}}),
#"RenameId" = Table.RenameColumns(#"Datestamp",{{"InternalMarketingInteractionId", "Id"}}),
#"keyedtable" = Table.AddKey(#"RenameId", {"Id"}, true)
in
#"keyedtable"
I have highlighted the datalake name, container and the table. I hope this helps if you are having a similar issue!
Trackbacks/Pingbacks