dependencies
| (this space intentionally left almost blank) | ||||||
Higher level CSV parsing/processing functionalityThe two most popular CSV parsing libraries for Clojure presently - Features
StructureSemantic CSV is structured around a number of composable processing functions for transforming data as it comes out of or goes into a CSV file. This leaves room for you to use whatever parsing/formatting tools you like, reflecting a nice decoupling of grammar and semantics. However, a couple of convenience functions are also provided which wrap these individual steps in an opinionated but customizable manner, helping you move quickly while prototyping or working at the REPL. | |||||||
Core API namespace | (ns semantic-csv.core
(:require [clojure.java.io :as io]
[clojure-csv.core :as csv]
[semantic-csv.impl.core :as impl :refer [?>>]])) | ||||||
To start, require this namespace,
Now let's take a tour through some of the processing functions we have available, starting with the input processing functions. | |||||||
Input processing functionsNote that all of these processing functions leave the rows collection as the final argument.
This is to make these functions interoperable with other standard collection processing functions ( Let's start with what may be the most basic and frequently needed function: | |||||||
mappify | |||||||
Takes a sequence of row vectors, as commonly produced by csv parsing libraries, and returns a sequence of
maps. By default, the first row vector will be interpreted as a header, and used as the keys for the maps.
However, this and other behaviour are customizable via an optional
| (defn mappify
([rows]
(mappify {} rows))
([{:keys [keyify header structs] :or {keyify true} :as opts}
rows]
(let [consume-header (not header)
header (if header
header
(first rows))
header (if keyify (mapv keyword header) header)
map-fn (if structs
(let [s (apply create-struct header)]
(partial apply struct s))
(partial impl/mappify-row header))]
(map map-fn
(if consume-header
(rest rows)
rows))))) | ||||||
Here's an example to whet our whistle:
Note that | |||||||
remove-comments | |||||||
Removes rows which start with a comment character (by default,
| (defn remove-comments
([rows]
(remove-comments {:comment-re #"^\#"} rows))
([{:keys [comment-re comment-char]} rows]
(let [commented? (if comment-char
#(= comment-char (first %))
(partial re-find comment-re))]
(remove
(fn [row]
(let [x (first row)]
(when x
(commented? x))))
rows)))) | ||||||
Let's see this in action with the same data we looked at in the last example:
Much better :-)
| |||||||
cast-with | |||||||
Casts the vals of each row according to
| (defn cast-with
([cast-fns rows]
(cast-with cast-fns {} rows))
([cast-fns {:keys [except-first exception-handler only] :as opts} rows]
(->> rows
(?>> except-first (drop 1))
(map #(impl/cast-row cast-fns % :only only :exception-handler exception-handler))
(?>> except-first (cons (first rows)))))) | ||||||
Let's try casting a numeric column using this function:
Alternatively, if we want to cast multiple columns using a single function, we can do so by passing a single casting function as the first argument.
Note that this function handles either map or vector rows.
In particular, if you’ve imported data without consuming a header (by either not using mappify or
by passing
| |||||||
except-first | |||||||
Takes any number of forms and a final | (defmacro except-first
[& forms-and-data]
(let [data (last forms-and-data)
forms (butlast forms-and-data)]
`((fn [rows#]
(let [first-row# (first rows#)
rest-rows# (rest rows#)]
(cons first-row# (->> rest-rows# ~@forms))))
~data))) | ||||||
This macro generalizes the
This could be useful if you know you want to do some processing on all non-header rows, but don't really need to know which columns are which to do this, and still want to keep the header row.
| |||||||
process | |||||||
This function wraps together the most frequently used input processing capabilities,
controlled by an
| (defn process
([{:keys [mappify keify header remove-comments comment-re comment-char structs cast-fns cast-exception-handler cast-only]
:or {mappify true
keify true
remove-comments true
comment-re #"^\#"}
:as opts}
rows]
(->> rows
(?>> remove-comments (semantic-csv.core/remove-comments {:comment-re comment-re :comment-char comment-char}))
(?>> mappify (semantic-csv.core/mappify {:keify keify :header header :structs structs}))
(?>> cast-fns (cast-with cast-fns {:exception-handler cast-exception-handler :only cast-only}))))
; Use all defaults
([rows]
(process {} rows))) | ||||||
Using this function, the code we've been building above is reduced to the following:
| |||||||
parse-and-process | |||||||
This is a convenience function for reading a csv file using | (defn parse-and-process
[csv-readable & {:keys [parser-opts]
:or {parser-opts {}}
:as opts}]
(let [rest-options (dissoc opts :parser-opts)]
(process
rest-options
(impl/apply-kwargs csv/parse-csv csv-readable parser-opts)))) | ||||||
| |||||||
slurp-csv | |||||||
This convenience function let's you | (defn slurp-csv
[csv-filename & {:as opts}]
(let [rest-options (dissoc opts :parser-opts)]
(with-open [in-file (io/reader csv-filename)]
(doall
(impl/apply-kwargs parse-and-process in-file opts))))) | ||||||
For the ultimate in programmer laziness:
| |||||||
Some casting functions for your convenienceThese functions can be imported and used in your | |||||||
->int | |||||||
Translate to int from string or other numeric. If string represents a non integer value, it will be rounded down to the nearest int. | (defn ->int
[v]
(if (string? v)
(-> v clojure.string/trim Double/parseDouble int)
(int v))) | ||||||
->long | |||||||
Translate to long from string or other numeric. If string represents a non integer value, will be rounded down to the nearest long. | (defn ->long
[v]
(if (string? v)
(-> v clojure.string/trim Double/parseDouble long)
(long v))) | ||||||
->float | |||||||
Translate to float from string or other numeric. | (defn ->float
[v]
(if (string? v)
(-> v clojure.string/trim Float/parseFloat)
(float v))) | ||||||
->double | |||||||
Translate to double from string or other numeric. | (defn ->double
[v]
(if (string? v)
(-> v clojure.string/trim Double/parseDouble)
(double v))) | ||||||
| |||||||
Note these functions place a higher emphasis on flexibility and convenience than performance, as you can likely see from their implementations. If maximum performance is a concern for you, and your data is fairly regular, you may be able to get away with less robust functions, which shouldn't be hard to implement yourself. For most cases though, the performance of those provided here should be fine. | |||||||
Output processing functions | |||||||
As with the input processing functions, the output processing functions are designed to be small, composable pieces which help you push your data through to a third party writer. And as with the input processing functions, higher level, opinionated, but configurable functions are offered which automate some of this for you. We've already looked at | |||||||
vectorize | |||||||
Take a sequence of maps, and transform them into a sequence of vectors. Options:
| (defn vectorize
([rows]
(vectorize {} rows))
([{:keys [header prepend-header format-header]
:or {prepend-header true format-header impl/stringify-keyword}}
rows]
;; Grab the specified header, or the keys from the first row. We'll
;; use these to `get` the appropriate values for each row.
(let [header (or header (-> rows first keys))
;; This will be the formatted version we prepend if desired
out-header (if format-header (mapv format-header header) header)]
(->> rows
(map
(fn [row] (mapv (partial get row) header)))
(?>> prepend-header (cons out-header)))))) | ||||||
Let's see this in action:
With some options:
| |||||||
batch | |||||||
Takes sequence of items and returns a sequence of batches of items from the original
sequence, at most | (defn batch [n rows] (partition n n [] rows)) | ||||||
This function can be useful when working with | |||||||
| |||||||
spit-csv | |||||||
Convenience function for spitting out CSV data to a file using
| (defn spit-csv
([file rows]
(spit-csv file {} rows))
([file
{:keys [batch-size cast-fns writer-opts header prepend-header]
:or {batch-size 20 prepend-header true}
:as opts}
rows]
(if (string? file)
(with-open [file-handle (io/writer file)]
(spit-csv file-handle opts rows))
; Else assume we already have a file handle
(->> rows
(?>> cast-fns (cast-with cast-fns))
(?>> (-> rows first map?)
(vectorize {:header header
:prepend-header prepend-header}))
; For safe measure
(cast-with str)
(batch batch-size)
(map #(impl/apply-kwargs csv/write-csv % writer-opts))
(reduce
(fn [w rowstr]
(.write w rowstr)
w)
file))))) | ||||||
Note that since we use | |||||||
One last example showing everything togetherLet's see how Semantic CSV works in the context of a little data pipeline. We're going to thread data in, transform into maps, run some computations for each row and assoc in, then write the modified data out to a file, all lazily. First let's show this with
Now let's see what this looks like with As mentioned above,
| |||||||
That's it for the core APIHope you find this library useful. If you have questions or comments please either submit an issue or join us in the dedicated chat room. | |||||||
This namespace consists of implementation details for the main API | (ns semantic-csv.impl.core) | ||||||
Translates a single row of values into a map of | (defn mappify-row
[header row]
(into {} (map vector header row))) | ||||||
Utility that takes a function f, any number of regular args, and a final kw-args argument which will be splatted in as a final argument | (defn apply-kwargs
[f & args]
(apply
(apply partial
f
(butlast args))
(apply concat (last args)))) | ||||||
Leaves strings alone. Turns keywords into the stringified version of the keyword, sans the initial | (defn stringify-keyword
[x]
(cond
(string? x) x
(keyword? x) (->> x str (drop 1) (apply str))
:else (str x))) | ||||||
Returns a function that casts casts a single row value based on specified casting function and optionally excpetion handler | (defn row-val-caster
[cast-fns exception-handler]
(fn [row col]
(let [cast-fn (if (map? cast-fns) (cast-fns col) cast-fns)]
(try
(update-in row [col] cast-fn)
(catch Exception e
(update-in row [col] (partial exception-handler col))))))) | ||||||
Format the values of row with the given function. This gives us some flexbility with respect to formatting both vectors and maps in similar fashion. | (defn cast-row
[cast-fns row & {:keys [only exception-handler]}]
(let [cols (cond
; If only is specified, just use that
only
(flatten [only])
; If cast-fns is a map, use those keys
(map? cast-fns)
(keys cast-fns)
; Then assume cast-fns is single fn, and fork on row type
(map? row)
(keys row)
:else
(range (count row)))]
(reduce (row-val-caster cast-fns exception-handler) row cols))) | ||||||
The following is ripped off from prismatic/plumbing: | |||||||
Conditional double-arrow operation (->> nums (?>> inc-all? (map inc))) | (defmacro ?>>
[do-it? & args]
`(if ~do-it?
(->> ~(last args) ~@(butlast args))
~(last args))) | ||||||
We include it here in lieue of depending on the full library due to dependency conflicts with other libraries. | |||||||