Converting CSV to SQL
Plenty of data is available in CSV format, and many people's immediate reaction is to import it into a spreadsheet (possibly because that's the only tool they know for handling anything other than paragraphs of text).
Quite often, though, the data itself is far better handled in a database, so getting the CSV file into SQL is a good starting point.
MySQL does have the load data statement, which can easily read CSV-formatted data; however, you need to create the table schema for that data first, which can be fiddly.
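For comparison, this is roughly what the load data route looks like once a matching table already exists. It is only a minimal sketch: the database name, table name and file path are placeholders, and recent MySQL versions may also need local_infile enabled on the server.

# Sketch only: import rates.csv into an existing, hand-created table called rates
mysql --local-infile=1 MyDatabaseName <<'EOF'
load data local infile 'rates.csv'
into table rates
fields terminated by ',' enclosed by '"'
lines terminated by '\n'
ignore 1 lines;
EOF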
If you have a CSV file with a header line giving the names of the data fields, a simple Bash script can convert that header line into a create table statement and then import the data itself into a table whose columns automatically match the data fields.
- csv2sql.sh
#!/bin/bash
# Take a CSV file with a header row and create a SQL table containing named fields
# Then import the data into the table.

if [ -z "$2" ]
then
  echo "Please supply a CSV filename with a header row, and a table name to be created from it"
  exit 1
fi

csvfile="$1"
table="$2"
DB=MyDatabaseName
length=64

(
  echo "drop table if exists $table;"
  echo "create table $table (UniqueIndex int auto_increment unique"
  head -n1 "$csvfile" | fromdos | sed 's/^"//; s/","/\n/g; s/"$//' |
  while read field
  do
    sqlfield=`echo "$field" | tr -cd '[:alpha:]'`
    echo ",$sqlfield varchar($length)"
  done
  echo ");"
) | mysql $DB

tail -n+2 "$csvfile" | fromdos | sed "s/\(.*\)/insert into $table values (0,\1);/" | mysql $DB
An example of the sort of CSV file this script can handle:
"Account","Service","Call Class","Charge Code","Destination","Peak Per Unit","Off Peak Per Unit","Weekend Per Unit","Billing Increment","Unit","As At" "Customer","IP Direct","SIP","AFG","Afghanistan","0.2397","0.2397","0.2397","1","60","01/06/2022" "Customer","IP Direct","SIP","AFGMOB","Afghanistan Mobile","0.2543","0.2543","0.2543","1","60","01/06/2022" "Customer","IP Direct","SIP","ALB","Albania","0.1835","0.1835","0.1835","1","60","01/06/2022" "Customer","IP Direct","SIP","ALBMOB","Albania Mobile","0.4563","0.4563","0.4563","1","60","01/06/2022" "Customer","IP Direct","SIP","ALG","Algeria","0.0748","0.0748","0.0748","1","60","10/06/2022" "Customer","IP Direct","SIP","ALGMOB","Algeria Mobile","0.4238","0.4238","0.4238","1","60","01/06/2022" "Customer","IP Direct","SIP","ALGMBO","Algeria Mobile Other","0.5522","0.5522","0.5522","1","60","01/06/2022"
Result:
mysql> select * from wiki limit 1\G
*************************** 1. row ***************************
     UniqueIndex: 1
         Account: Customer
         Service: IP Direct
       CallClass: SIP
      ChargeCode: AFG
     Destination: Afghanistan
     PeakPerUnit: 0.2397
  OffPeakPerUnit: 0.2397
  WeekendPerUnit: 0.2397
BillingIncrement: 1
            Unit: 60
            AsAt: 01/06/2022
1 row in set (0.00 sec)
Notes:
- The DB variable should contain the name of a (pre-existing) database in which you want the table to be created
- The table you specify will be deleted and re-created each time you run this script
- The length variable should be at least the length of the longest individual item of data in the CSV file
- All fields are created as varchar types, but note that this does not stop you using sum() and similar arithmetic functions on the data (see the example query after these notes)
- All SQL field names are created from the (upper- and lower-case) letters of the CSV field names, with spaces and any punctuation omitted (illustrated after these notes)
- The header row and the data in the CSV file must be quoted with double quotes " and not single quotes ', and comma-separated (not tabs)
- An auto-increment index field is created as the first field in the table, partly because it can be useful, and partly because it makes the create table statement simpler in the script.
- A minor modification to the script could allow it to handle CSV files which are comma-separated but do not contain quotation marks around each field name or data value; a sketch of this follows these notes
- It also wouldn't be too hard to make it handle tab-separated values
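To illustrate the note about varchar fields and arithmetic: MySQL converts the strings to numbers when they are used numerically, so a query along these lines works against the wiki table from the example run above (the grouping here is arbitrary, purely for illustration):

# The rate columns are varchar, but sum() still works because MySQL
# coerces the strings to numbers; wiki is the table from the example run
echo "select Destination, sum(PeakPerUnit) as TotalPeak from wiki group by Destination;" | mysql MyDatabaseName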
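The field-name cleanup mentioned above is just the tr command in the script stripping everything except letters, for example:

# "Off Peak Per Unit" becomes OffPeakPerUnit (spaces and punctuation removed)
echo "Off Peak Per Unit" | tr -cd '[:alpha:]'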
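And one way the modification for unquoted input might look, purely as a sketch and assuming the data values themselves contain no commas or quote characters (swapping the comma for a tab character in both sed commands would give tab-separated handling):

# Sketch of the two pipelines that would change for unquoted, comma-separated input
csvfile="$1"; table="$2"; DB=MyDatabaseName

# Header line: split on bare commas instead of stripping quotes and splitting on ","
head -n1 "$csvfile" | fromdos | sed 's/,/\n/g'

# Data lines: add the quotes the input lacks before wrapping each line in an insert
tail -n+2 "$csvfile" | fromdos |
  sed "s/,/','/g; s/\(.*\)/insert into $table values (0,'\1');/" |
  mysql "$DB"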