====== Converting CSV to SQL ======
Plenty of data is available in CSV format, and many people's immediate way of dealing with this is to import it into a spreadsheet (possibly because it's the only tool they know of when they're dealing with anything other than paragraphs of text).
Quite often, though, the data itself is far better handled in a database, so getting the CSV file into SQL is a good starting point.
MySQL does have the [[https://dev.mysql.com/doc/refman/8.0/en/load-data.html|load data]] statement, which can easily read CSV formatted data, however you need to create the table schema for that data first, which can be fiddly.
If you have a CSV file with a header line, telling you the names of the fields of data, a simple Bash script can convert this header line into a [[https://dev.mysql.com/doc/refman/8.0/en/create-table.html|create table]] statement and then import the data itself into the table, which now automatically matches the data fields.
#!/bin/bash
# Take a CSV file with a header row and create a SQL table containing named fields
# Then import the data into the table.
if [ -z "$2" ]
then
echo "Please supply a CSV filename with a header row, and a table name to be created from it"
exit 1
fi
csvfile="$1"
table="$2"
DB=MyDatabaseName
length=64
( echo "drop table if exists $table;"
echo "create table $table (UniqueIndex int auto_increment unique"
head -n1 "$csvfile" | fromdos | sed 's/^"//; s/","/\n/g; s/"$//' | while read field
do
sqlfield=`echo "$field" | tr -cd '[:alpha:]'`
echo ",$sqlfield varchar($length)"
done
echo ");"
) | mysql $DB
tail -n+2 "$csvfile" | fromdos | sed "s/\(.*\)/insert into $table values (0,\1);/" | mysql $DB
An example of the sort of CSV file this script can handle:"Account","Service","Call Class","Charge Code","Destination","Peak Per Unit","Off Peak Per Unit","Weekend Per Unit","Billing Increment","Unit","As At"
"Customer","IP Direct","SIP","AFG","Afghanistan","0.2397","0.2397","0.2397","1","60","01/06/2022"
"Customer","IP Direct","SIP","AFGMOB","Afghanistan Mobile","0.2543","0.2543","0.2543","1","60","01/06/2022"
"Customer","IP Direct","SIP","ALB","Albania","0.1835","0.1835","0.1835","1","60","01/06/2022"
"Customer","IP Direct","SIP","ALBMOB","Albania Mobile","0.4563","0.4563","0.4563","1","60","01/06/2022"
"Customer","IP Direct","SIP","ALG","Algeria","0.0748","0.0748","0.0748","1","60","10/06/2022"
"Customer","IP Direct","SIP","ALGMOB","Algeria Mobile","0.4238","0.4238","0.4238","1","60","01/06/2022"
"Customer","IP Direct","SIP","ALGMBO","Algeria Mobile Other","0.5522","0.5522","0.5522","1","60","01/06/2022"
Result:
mysql> select * from wiki limit 1\G
*************************** 1. row ***************************
UniqueIndex: 1
Account: Customer
Service: IP Direct
CallClass: SIP
ChargeCode: AFG
Destination: Afghanistan
PeakPerUnit: 0.2397
OffPeakPerUnit: 0.2397
WeekendPerUnit: 0.2397
BillingIncrement: 1
Unit: 60
AsAt: 01/06/2022
1 row in set (0.00 sec)
**Notes**:
* The **DB** variable should contain the name of a (pre-existing) database into which you want the tables to be created
* The table name you specify will be deleted and re-created each time to run this script
* The **length** variable should specify the longest individual item of data in the CSV file
* All fields are created as **varchar** types, but note that this does not stop you using **sum()** and similar arithmetic functions on the data
* All SQL field names are created from the (upper- and lower-case) letters of the CSV field names with spaces and any punctuation omitted
* The header row and the data in the CSV file **must** be quoted with double quotes **"** and not single quotes **'**, and comma-separated (not tabs)
* An auto-increment index field is created as the first field in the table, partly because it can be useful, and partly because it makes the **create table** statement simpler in the script.
* A minor modification to the script could allow it to handle CSV files which are comma-separated but do not contain quotation marks around each field name or data value
* It also wouldn't be too hard to make it handle tab-separated values
----
[[.:|Go up]]\\
Return to [[:|main index]].