How to parse CSV with double quote (") character in Crystal
We use microservice written in Crystal to parse large CSV files (about 1.5Gb). Some rows in these files may contain no closed " characters:
,Y,FEDERAL NATIONAL MORTGAGE ASSOCIATION "F,,
With Crystal default CSV parse settings this row and everything after it won't be parsed correctly because
DEFAULT_QUOTE_CHAR constant is equal to
". Of couse you can override
quote_char param in
CSV contstructor with something that cannot be found in your document.
From my point of view the best is to use zero byte which is
'\u0000' in Crystal.
csv = CSV.new(file, headers: true, strip: true, quote_char: '\u0000') while csv.next # ... end
Migrate tags in Rails to PostgreSQL array from ActsAsTaggableOn
ActsAsTaggableOn is a swiss army knife solution if you need to add tags to your ActiveRecord model.
Just by adding one gem to your Gemfile and
acts_as_taggable to the model you get everything you need: adding tags, searching for a model by tag, getting top tags, etc. However, sometimes you don't need all these.
In our project, we used
acts_as_taggable to store tags for
Note model. Then we displayed a list of notes on several pages with assigned tags and had autocompleted input for tags on
Note form. Everything worked well, but since we use PostgreSQL, I decided to store tags as an array in Note model.
First of all, I added
tags Array<String> column to
Note, after this migrated actsastaggable tags to
notes table with migration.
class MigrateNoteTags < ActiveRecord::Migration[5.2] def change execute <<-SQL UPDATE notes SET tags = grouped_taggings.tags_array FROM ( SELECT taggings.taggable_id, ARRAY_AGG ( tags.NAME ) tags_array FROM taggings LEFT JOIN tags ON taggings.tag_id = tags.ID WHERE taggable_type = 'Note' GROUP BY taggings.taggable_id ) AS grouped_taggings WHERE notes.ID = grouped_taggings.taggable_id SQL end end
To have backward compatibility, I added
def tag_list tags.join(', ') end
The last thing is to add the ability to search for tags. Since there about 500k records in the Notes table, I decided to create an SQL view:
CREATE OR REPLACE VIEW note_tags AS SELECT UNNEST ( tags ) AS name, COUNT ( * ) AS taggings_count FROM notes GROUP BY name
That's it! It takes from 100ms to 150ms to search for tags in this view, which is fine for me.
If you have more significant data sets, then the best would be to create
tags table and add triggers to
notes table that will update
tags on INSERT/UPDATE/DELETE.