How do I read in a text file?
Short answer:
- (types; delimiter) 0: `:filename
- .Q.fs[chunk_handler; `:filename]
Although there are many ways to read an ASCII file in q – depending on the content, how big the file is, and what you want to do with it – most of the time you will use one of two methods. The first method is for files you want to read into memory in their entirety, while the other approach is for situations in which you want to deal with the file in chunks. The latter scenario is covered in this related faq.
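For the chunked case, a minimal sketch might look like the following. It assumes a headerless, comma-separated file such as the trade_big.csv used later in this answer. The table name trade and the column names time, sym, size and price are hypothetical, chosen only to match the "TSIF" type string.

```q
/ hypothetical table and column names; the types match the "TSIF" type string
trade: ([] time: `time$(); sym: `symbol$(); size: `int$(); price: `float$())

/ .Q.fs calls the handler with one chunk of complete lines at a time;
/ 0: accepts a list of strings in place of a file handle
chunk_handler: {`trade insert flip `time`sym`size`price ! ("TSIF"; ",") 0: x}

.Q.fs[chunk_handler; `:trade_big.csv]
```

Because each chunk is parsed and appended separately, only one chunk of raw text needs to be held in memory at a time.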
If the file is small enough (compared to the available memory in your system), you can read it all in as a list of lines in one go using read0. Given the following file, lines.txt,
foo=10
bar=20
baz=30
we can write
q)lines: read0 `:lines.txt
q)lines
"foo=10"
"bar=20"
"baz=30"
q)
To break up the lines, we use the vs (vector from scalar) function, applying the /: (each right) adverb so that we split each line:
q)split: "=" vs/: lines
q)split
"foo" "10"
"bar" "20"
"baz" "30"
q)
At this point, you probably want to parse each piece of text into its corresponding type to facilitate fast searching or arithmetic etc. You use the $ (cast) operator to do this, passing an uppercase type character as its left argument:
q)"S" $ "foo"
`foo
q)"I" $ "10"
10
q)
You may remember from this related faq that you can convert a list of items at once:
q)"S" $ ("foo"; "bar"; "baz")
`foo`bar`baz
q)"I" $ ("10"; "20"; "30")
10 20 30
q)
But wait! There’s more! If you pass a list of type characters as the left argument to $, you can parse multiple lists:
q)"SI" $ (("foo"; "bar"; "baz"); ("10"; "20"; "30"))
foo bar baz
10  20  30
q)
The list of type characters can be as long as you like:
q)"SSFI*" $ ("foo"; "bar"; "10.5"; "47"; "left as a string")
`foo
`bar
10.5
47
"left as a string"
q)
Thus, we can parse our file with the following code:
q)"SI" $ flip "=" vs/: read0 `:lines.txt
foo bar baz
10  20  30
q)
Since this sequence of operations is so common, it has been wrapped up in an overload of that workhorse of text I/O, 0: (load text). The trick is to pass a pair as the left argument to 0: where the first element of the pair is the string of type characters and the second element of the pair is the delimiter between each name and value in the file:
q)("SI"; "=") 0: `:lines.txt
foo bar baz
10  20  30
q)
Putting it all together, we can turn our file into a table like so:
q)flip `name`val ! ("SI"; "=") 0: `:lines.txt
name val
--------
foo  10
bar  20
baz  30
q)
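If you would rather look values up by name than query a table, the same pair of lists can be turned into a dictionary. This is a small sketch, not part of the recipe above; the variable name settings is hypothetical.

```q
/ apply ! to the (names; values) pair returned by 0:
settings: (!) . ("SI"; "=") 0: `:lines.txt

/ look up the value parsed from the line bar=20
settings `bar
```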
Using 0: instead of the combination of read0, vs and $ is faster and less memory-intensive. The differences become significant as the file size grows:
q)system "wc trade_small.csv"
" 1000001 1000001 29997921 trade_small.csv"
q)\ts ("TSIF"; ",") 0: `:trade_small.csv
554 20971840j
q)\ts "TSIF" $ flip "," vs/: read0 `:trade_small.csv
2649 232389280j
q)
As a consequence of the 0: function’s superior memory efficiency, it can handle much larger files than the other approach:
q)system "wc trade_big.csv"
" 10000001 10000001 299888328 trade_big.csv"
q)\ts ("TSIF"; ",") 0: `:trade_big.csv
5672 335544640j
q)\ts "TSIF" $ flip "," vs/: read0 `:trade_big.csv
'wsfull
q)
If you don’t actually need to parse the file contents, then read0 (by itself) is fine.
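For instance, counting or filtering raw lines needs no parsing at all. A minimal sketch, assuming the lines.txt file shown earlier:

```q
lines: read0 `:lines.txt

/ number of lines in the file
count lines

/ keep only the lines that start with "ba"
lines where lines like "ba*"
```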
By the way, if you only want to grab part of the file, you can pass a triple to read0 in order to read a subset of the bytes. You’ll still get a list of lines broken on newlines:
q)offset: 3
q)number_of_bytes_to_read: 6
q)read0 (`:lines.txt; offset; number_of_bytes_to_read)
"=10"
"ba"
q)
Unless your file has fixed-length records, however, you may find it easier to use the system function to get exactly the lines you want, assuming you have the head and tail utilities (or similar) available. For example,
q)first_line: first system "head -1 lines.txt"
q)
(Note the call to first; system always returns a list of strings, even when there is only one.) This particular example is handy when you need to examine the start of a file to figure out how to read it properly.
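One way to use that first line, sketched here under the assumption that the file is "="-delimited like lines.txt, is to count its fields and then load every column as a string until you know the real types. The names field_count and raw are hypothetical.

```q
first_line: first system "head -1 lines.txt"

/ number of delimited fields on the first line
field_count: count "=" vs first_line

/ read every field as a string ("*") for now; swap in real type characters later
raw: (field_count # "*"; "=") 0: `:lines.txt
```

Once you have looked at raw, you can replace the "*" characters with the appropriate type characters and load the file again.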