Forum Moderators: coopster

Message Too Old, No Replies

Parse CSV file that has extra (internal) commas

         

techtheatre

5:14 am on Jan 19, 2009 (gmt 0)

10+ Year Member



I am trying to parse a CSV file that occasionally has commas within a field. My approach (below) breaks in this instance, as the string is split at these "internal" commas as if they were deliminators. Unfortunately there is no way to change the CSV format, as it is coming from an external source. The "good" news is that the string that contains the comma(s) is enclosed in quote marks...which is interpreted correctly by Excel, but not in my parsing below. Any help is greatly appreciated.

SAMPLE CSV FILE (third row has the problem):

DEBIT,2009-01-07,"TELEPHONE SERVICE IN",-5.03
CREDIT,2009-01-06,"ATM CHECK DEPOSIT",75.00
DEBIT,2009-01-05,"COMPANY, INC. MONTHLY SERVICES",-35.96
DEBIT,2009-01-05,"PAYMENT transaction#: 184934",-200.00

MY CODE:

<?php

$csv_data = $_POST['csv_data'];
$csv_array = explode("\n",$csv_data);

// Loop through all CSV data rows and generate separate
$last_data_row = count($csv_array) - 1;//this is necessary because arrays start with position #0
for($counter = 0; $counter < $last_data_row; $counter++)
{
$data_row = explode(",",$csv_array[$counter]);

$TransactionType = $data_row['0'];
$TransactionDate = $data_row['1'];
$TransactionDescription = $data_row['2'];
$TransactionAmount = $data_row['3'];

$query = "INSERT INTO Transactions (TransactionType, TransactionDate, TransactionDescription, TransactionAmount) VALUES('$TransactionType','$TransactionDate','$TransactionDescription','$TransactionAmount'); ";

echo "$query <br />";
}

?>

techtheatre

6:39 am on Jan 19, 2009 (gmt 0)

10+ Year Member



Okay...so i have found a solution...though i am not sure if it is the most efficient/effective. I have been testing it and it seems to be working out... Basically i took the first couple array values for granted, since i know they will not contain commas. Then i did a count() on the array to see how many values it had. I then took the LAST one (regardless of the number) and made it my "Transaction Amount". Then whatever was "left over" in the middle I concatenated together to form my description (and i even added back in the commas for clarity). I hope this helps someone someday. If anyone has a better solution, please let me know! Thanks!

<?php

$csv_data = $_POST['csv_data'];
$csv_array = explode("\n",$csv_data);

// Loop through all CSV data rows and generate separate
$last_data_row = count($csv_array) - 1;//this is necessary because arrays start with position #0
for($counter = 0; $counter < $last_data_row; $counter++)
{
$data_row = explode(",",$csv_array[$counter]);

//just in case there were commas in the 'description' portion we are taking the first few and the last...then clumping the leftovers (2 + [3, 4, 5...]) in the 'description'
$RowsInArray = count($data_row) - 1;

$TransactionType = $data_row['0'];
$TransactionDate = $data_row['1'];
$TransactionDescription = $data_row['2'];
for( $descriptcounter = 3; $descriptcounter <= ($RowsInArray-1); $descriptcounter++)
{
$TransactionDescription .= ', '.$data_row[$descriptcounter];
}
$TransactionAmount = trim($data_row[$RowsInArray]);

$query = "INSERT INTO Transactions (TransactionType, TransactionDate, TransactionDescription, TransactionAmount) VALUES('$TransactionType','$TransactionDate','$TransactionDescription','$TransactionAmount'); ";
echo "$query <br />";
}

?>

henry0

12:07 pm on Jan 19, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might try the following
using: $TransactionDescription = $data_row['2'];

make believe that the whole element is textual
by placing it within escaped double quotes
$address = "\"$address\"";

A little trick I picked up from somewhere :)
wish I had this simple idea first!

bkeep

7:25 pm on Jan 19, 2009 (gmt 0)

10+ Year Member



Would something like this work


$sql = "INSERT INTO $table ($fields) VALUES (";

//Read the post string as csv
$row_count = 0;
while (($data = str_getcsv($_POST['csv_data'])) !== false) {
++$row_count;
foreach ($data as $key => $value) {
$clean_data = mysql_real_escape_string((get_magic_quotes_gpc())? stripslashes($value): $value);
$query_string .= "'" . $clean_data . "',";
}
$query_string = substr($query_string, 0, -4);
$query_string .="),(";
}
$sql = substr($query_string, 0, -2);

print $sql;

coopster

7:37 pm on Jan 21, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Or use the PHP function designed for this, fgetcsv [php.net]

bkeep

7:42 pm on Jan 21, 2009 (gmt 0)

10+ Year Member



fgetcsv parse the file that's why I suggested the str_getcsv basically the same thing

coopster

11:56 pm on Jan 21, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Yep, understood. From the first sentence in the original post ...

I am trying to parse a CSV file

So, I thought I would make it obvious that the exact function exists.

bkeep

12:30 am on Jan 22, 2009 (gmt 0)

10+ Year Member



You know I missed that part. I was looking at the $_POST['csv_data']; part and wasn't thinking it was a file, That snippet I posted can be interchanged with the fgetcsv as the original function I use is for importing csv files

I should clean the dust off my monitor

henry0

12:06 pm on Jan 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Did you try my suggestion,
I do not remember where I used it, but I know it did the job.

techtheatre

4:17 am on Jan 23, 2009 (gmt 0)

10+ Year Member



Henry0: I did not understand your suggestion. The problem happens when the $data_row() array is initially populated because i am exploding the string based upon commas. This means that $data_row['2'] does not (necessarily) contain the entire string it should contain, thus escaping it really doesn't do anything. Let me know if I misunderstand your suggestion.

coopster/bkeep: Thanks for pointing out this function. I had no idea that this existed. I have not yet tried playing around with it, because the scripting I wrote does do the job...however I plan to re-write it with fgetcsv just to learn how it works. You are both correct...the real (eventual) plan is to set up my script to import the actual file and parse that. However, at the moment I have it just parsing an incoming posted string for ease of troubleshooting. THANKS for letting me know about this function, I kinda feel like i just re-invented he wheel. Oh well...I guess it was a good exercise in problem solving :-)

henry0

11:24 am on Jan 23, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was under the impression that you were only concerned by comma/s within the str

techtheatre

2:15 pm on Jan 23, 2009 (gmt 0)

10+ Year Member



I am...but it is that comma that is causing the problem. The entire "row" is a single string, and i am breaking this up (explode) at each comma. Unfortunately this includes a break at a comma that is not meant to be a deliminator but rather is included as part of one of the values. Most of my commas are "good"...but there are these occasional extra "bad" commas where they don't belong.

I bet that the built-in CSV functions within PHP (fgetcsv and str_getcsv) will work beautifully, but in the meantime the workaround i wrote (posted above) is also working well.

Thanks!