Page 1 of 1

First crack at using JREPL - Guidance

Posted: 18 Feb 2020 05:59
by tgalla010
Good morning! I'm hoping to toss a nice easy one out there for someone. Scripting is not my strength and however I can navigate my way through a .bat file.

I stumbled across this impressive tool while searching for a means to fix .csv files I'm getting from a software vendor. The source is mysql/unix and I'm "fixing" the format so my PowerBI system can easily process the data. Here are the details and data sample.

File Name c:\srs data\shop users.csv
Fix 1. Remove all instances of \n in the data and replace that with ","
- The \n (new line) is coming from the unix export and are actual characters that are generally but not always at the beginning of the line. I hope to remove all instances from the file.

Fix 2. Bring the data back to a single line where there have been obvious breaks. In the data example, per notepad++ a CF and LF exist on the line prior to the \n. I don't want to remove all instances of the CF LF.

Using an example from another thread, I gave it a stab with this syntax and while the process runs...no luck.

type test.csv | jrepl "\n?" "," /M

Would anyone be able to point me in the right direction?

Here is a sample data set. I have modified some of the data to protect the innocent. Customer record 7675 has an example where the CR LF on the previous line created a \n on the following line.

Code: Select all

"7673","NULL","NULL","NULL","Amanda Brown","NULL","NULL","tinkerbe54@gmail.com","NULL","NULL","NULL","NULL","NULL","NULL","NULL","0","NULL","NULL","0","NULL","0","NULL","NULL","NULL","NULL","NULL","2019-09-08 22:51:13","2019-09-16 15:39:43","NULL","NULL","{"welcome-email-sent":true,"first-purchase-discount":"7673RST"}","NULL","NULL","NULL","NULL","NULL","$2y$10$FydsQ0GoXl7bsblFIssEV.MhJuSdd/xoYgICG6ohGce7pi3whbalC","NULL","fhYyGPLKmHU:APA91bHqx4cafKGUmuwugSDhy4r9T_T_iI5rrHno4Yp0DBUrFHc4np5q_S8vFaUutnlniBvVW-UGsn4jQQ8morfvadHf_GV89CJN72bunQZbxCvFbbTjADOsIitvDlo2Ku3QnQLth-Fu","NULL","ios","NULL","NULL","1","0","NULL","NULL","NULL","NULL","0"
"7674","10162337917605357","NULL","NULL","Feryal Brown","NULL","NULL","fzeid@yahoo.com","https://graph.facebook.com/v2.10/10162337915357/picture?type=normal","NULL","NULL","NULL","NULL","NULL","NULL","0","25769054098","NULL","0","1434110415-7674","0","NULL","NULL","NULL","NULL","NULL","2019-09-08 23:12:20","2020-02-14 03:39:18","NULL","NULL","{"welcome-email-sent":true,"has_groups_access_member_info":true,"last-visit-time":1567984354,"saw-onboarding-popup":true,"first-purchase-discount":"7674FIRST","groups_access_member_info_confirmed_at":1581651558}","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","1","0","NULL","NULL","NULL","NULL","0"
"7675","101566618878206","NULL","NULL","Ashleigh Brown","NULL","NULL","abe345@aol.com","https://graph.facebook.com/v2.10/10156661880616/picture?type=normal","119 N Longcross Rd","NULL","Linthicum Heights","MD","21090","NULL","0","NULL","NULL","0","NULL","0","NULL","NULL","NULL","NULL","NULL","2019-09-09 00:47:50","2020-02-14 03:39:18","NULL","NULL","{"welcome-email-sent":true,"shopping-cart.coupon":"","shopping-cart.second-coupon":"","first-purchase-discount":"7675FIRST","has_groups_access_member_info":true,"groups_access_member_info_confirmed_at":1581651558}","NULL","US","NULL","NULL","NULL","$2y$10$gkUgF68tghdGUqwRTEXLAek44fRgGUZUVsd7BCwcnAS25VqwAKz4u","NULL","dMou2F9E6kA:APA91bGnyKYNOnjajKtZ8zo2-BVulRM8pIvptpXU_VYLwlLb5Ta2FTQjQpos6VStVY6s9vNNPYw11Gm5I120zWEymnEMy6qDNnCJ-Q6LJQ5ajuUnLY4T9KQHLbyYk6YnwFwQY2Yj0yJ_","f82fadc8ed9657b67ce4ab104ae5209e67af1ecb3a5a12480a68a14b15eff282","ios","NULL","NULL","1","0","NULL","NULL","NULL","NULL","0"
"7676","10219994827849390","2389791057742336","NULL","Stanna Veenis","NULL","NULL","dsves@gmail.com","https://graph.facebook.com/v2.10/10219827849390/picture?type=normal","PO BOX 721328","NULL","Pinon Hills","CA","92372","1111111111","0","NULL","NULL","0","NULL","0","cus_FqIzGIsUE3sbil","Visa","4920","NULL","NULL","2019-09-09 00:48:32","2020-02-14 03:39:18","NULL","NULL","{"welcome-email-sent":true,"has_groups_access_member_info":true,"last-visit-time":1574240445,"saw-onboarding-popup":true,"first-purchase-discount":"7676FIRST","shopping-cart.coupon":"","shopping-cart.second-coupon":"","shopping-cart.apply-balance":true,"groups_access_member_info_confirmed_at":1581651558}","NULL","US","NULL","NULL","NULL","$2y$10$SMiM2FQbkye4RbDIYXlELuGzJKgLSWHTVrf2DFXnoeoOYaNxvFQIK","422 Lost Trail Dr.
\nHenderson, NV
\n89014","dNc5OEgA7dA:APA91bGor-5h5bhbEwyZsBzIzt7HmJfopNuLAJFzEQjuTnZLL5pWEY_0FHTkTPgx6i0n4YP7oj0n240YcutpSbaX-1gTcNFmgFvCSbl0jkNiqyRGDi1R2D9gk93B4PrXtILXmGSF5KNP","NULL","android","NULL","NULL","1","0","5555550870","NULL","NULL","NULL","0"
"7677","10214506666721933","NULL","NULL","Haley Brown","NULL","NULL","hjcom@yahoo.com","https://graph.facebook.com/v2.10/10214506621933/picture?type=normal","NULL","NULL","NULL","NULL","NULL","NULL","0","NULL","NULL","0","NULL","0","NULL","NULL","NULL","NULL","NULL","2019-09-09 01:35:42","2020-02-14 03:39:18","NULL","NULL","{"welcome-email-sent":true,"has_groups_access_member_info":true,"last-visit-time":1567992966,"saw-onboarding-popup":true,"first-purchase-discount":"7677FIRST","groups_access_member_info_confirmed_at":1581651558}","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","1","0","NULL","NULL","NULL","NULL","0"
"7678","10217122963534570","NULL","NULL","Mary Brown","NULL","NULL","iloveplls2003@yahoo.com","https://graph.facebook.com/v2.10/10217963534570/picture?type=normal","NULL","NULL","NULL","NULL","NULL","NULL","0","NULL","NULL","0","NULL","0","NULL","NULL","NULL","NULL","NULL","2019-09-09 01:57:59","2020-02-14 03:39:18","NULL","NULL","{"welcome-email-sent":true,"has_groups_access_member_info":false,"last-visit-time":1567994287,"saw-onboarding-popup":true,"first-purchase-discount":"7678FIRST","groups_access_member_info_confirmed_at":1581651558}","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","1","0","NULL","NULL","NULL","NULL","0"
"7679","10209615670408722","NULL","NULL","Carrie Brown","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","0","NULL","NULL","0","NULL","0","NULL","NULL","NULL","NULL","NULL","2019-09-09 02:34:11","2020-02-14 03:39:18","NULL","NULL","{"has_groups_access_member_info":false,"groups_access_member_info_confirmed_at":1581651558}","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","1","0","NULL","NULL","NULL","NULL","0"
"7680","1669466763109748","NULL","NULL","Noelle Brown","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","0","NULL","NULL","0","NULL","0","NULL","NULL","NULL","NULL","NULL","2019-09-09 05:10:10","2020-02-14 03:39:18","NULL","NULL","{"has_groups_access_member_info":true,"groups_access_member_info_confirmed_at":1581651558}","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","NULL","1","0","NULL","NULL","NULL","NULL","0"

Re: First crack at using JREPL - Guidance

Posted: 18 Feb 2020 08:20
by dbenham
A couple points.

1) Linefeeds are valid characters for csv values if they are enclosed within double quotes. It looks like your 7676 record is supposed to contain a multi-line address field. It looks correct, except the \n strings should not be there. If you really need to remove the linefeeds, then I don't think you want to simply replace them with comma. Some of your rows have addresses with a separate field for each component (street address, city, state, zip code). Whereas 7676 has one big field for the entire address. I've never seen a valid csv that was inconsistent like that.

3) Your csv has a serious problem with some values. It looks like the data contains some fields that are supposed to be JSON. Such fields would need to be quoted, but then the JSON quotes would need to be doubled. Even if you manage to resolve your linefeed issue, your file would still be unreadable as csv.


Dave Benham

Re: First crack at using JREPL - Guidance

Posted: 19 Feb 2020 13:48
by tgalla010
Thank you for your input, yeah I agree the file has more than one issue and to be honest I've attempted to get the vendor to correct the export to no avail. So my approach now is to attempt to reconstruct it using multiple approaches if need be.

Re: First crack at using JREPL - Guidance

Posted: 19 Feb 2020 15:36
by dbenham
I don't mind helping you discover specific JREPL options that will perform the transformation you want, but you have to first establish precise rules as to what you want.

Based on what I can see, the rules you gave in your initial post cannot work.

An additional problem I saw - I'm not convinced the columns all have the same meaning from row to row. JSON seems to float between different columns, as do addresses. I'm not even sure how many columns there are supposed to be.

If a CSV file is to be useful, then you must have clearly defined meaning for each column.

If you can come up with some sensible rules, then I can help with the JREPL options.


Dave Benham