Inspired by articles such as "Why you should learn just a little Awk" and "Learn one sed command", I am trying to make use of Unix tools (sed, awk, grep, cut, uniq, sort, etc.) instead of writing short Python utility scripts.
Here is a Python script I wrote this week. It searches a file for a given regular expression pattern and prints a sorted list of the unique matches captured by the parentheses.
# grep2.py
import re
import sys


def main():
    patt = sys.argv[1]
    filename = sys.argv[2]
    with open(filename) as f:
        text = f.read()
    # Collect the first capturing group of every match; the set removes duplicates.
    matchlist = set(m.group(1) for m in re.finditer(patt, text, re.MULTILINE))
    for m in sorted(matchlist):
        print(m)


if __name__ == '__main__':
    main()
As an example, I used the script to list all the template markup in one of the Django admin template files.
$ python grep2.py '({{[^{}]+}}|{%[^{}]+%})' tabular.html
Output:
{% admin_media_prefix %}
{% blocktrans with inline_admin_formset.opts.verbose_name|title as verbose_name %}
{% cycle "row1" "row2" %}
{% else %}
{% endblocktrans %}
{% endfor %}
{% endif %}
{% endspaceless %}
{% for field in inline_admin_formset.fields %}
{% for field in line %}
{% for fieldset in inline_admin_form %}
{% for inline_admin_form in inline_admin_formset %}
{% for line in fieldset %}
{% if field.is_hidden %}
{% if field.is_readonly %}
{% if field.required %}
{% if forloop.first %}
{% if forloop.last %}
{% if inline_admin_form.form.non_field_errors %}
{% if inline_admin_form.has_auto_field %}
{% if inline_admin_form.original %}
{% if inline_admin_form.original or inline_admin_form.show_url %}
{% if inline_admin_form.show_url %}
{% if inline_admin_formset.formset.can_delete %}
{% if not field.widget.is_hidden %}
{% if not forloop.last %}
{% load i18n adminmedia admin_modify %}
{% spaceless %}
{% trans "Delete?" %}
{% trans "Remove" %}
{% trans "View on site" %}
{{ field.contents }}
{{ field.field }}
{{ field.field.errors.as_ul }}
{{ field.field.name }}
{{ field.label|capfirst }}
{{ forloop.counter0 }}
{{ inline_admin_form.deletion_field.field }}
{{ inline_admin_form.fk_field.field }}
{{ inline_admin_form.form.non_field_errors }}
{{ inline_admin_form.original }}
{{ inline_admin_form.original.id }}
{{ inline_admin_form.original_content_type_id }}
{{ inline_admin_form.pk_field.field }}
{{ inline_admin_formset.formset.management_form }}
{{ inline_admin_formset.formset.non_form_errors }}
{{ inline_admin_formset.formset.prefix }}
{{ inline_admin_formset.opts.verbose_name_plural|capfirst }}
{{ inline_admin_form|cell_count }}
{{ verbose_name }}
Here's my attempt at using Unix tools:
$ sed -rn 's/^.*(\{\{.*\}\}|\{%.*%\}).*$/\1/gp' tabular.html | sort | uniq
However, the output isn't quite the same; a number of the matches are missing:
{% admin_media_prefix %}
{% else %}
{% endblocktrans %}
{% endfor %}
{% endif %}
{% endspaceless %}
{% for field in inline_admin_formset.fields %}
{% for field in line %}
{% for fieldset in inline_admin_form %}
{% for inline_admin_form in inline_admin_formset %}
{% for line in fieldset %}
{% if field.is_readonly %}
{% if inline_admin_form.form.non_field_errors %}
{% if inline_admin_form.original or inline_admin_form.show_url %}
{% if inline_admin_formset.formset.can_delete %}
{% if not field.widget.is_hidden %}
{% load i18n adminmedia admin_modify %}
{% spaceless %}
{% trans "Remove" %}
{{ field.contents }}
{{ field.field }}
{{ field.field.errors.as_ul }}
{{ field.field.name }}
{{ field.label|capfirst }}
{{ inline_admin_form.fk_field.field }}
{{ inline_admin_form.form.non_field_errors }}
{{ inline_admin_formset.formset.management_form }}
{{ inline_admin_formset.formset.non_form_errors }}
{{ inline_admin_formset.formset.prefix }}
{{ inline_admin_formset.opts.verbose_name_plural|capfirst }}
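A likely explanation for the difference: the sed expression is anchored with `^.*` and `.*$`, so a single substitution consumes the whole line and the `g` flag has nothing left to repeat on. Because the leading `.*` is greedy, only the last tag on each line would survive. The sketch below imitates both behaviours in Python to make the difference visible. The sample line and the helper names `all_matches` and `sed_like_matches` are made up for illustration and are not taken from tabular.html or from the script above.

```python
import re

# Same pattern as passed to grep2.py, with the braces escaped for clarity.
TAG = r'(\{\{[^{}]+\}\}|\{%[^{}]+%\})'

# Rough Python rendering of the sed expression: anchored, with greedy ".*".
SED_LIKE = r'^.*(\{\{.*\}\}|\{%.*%\}).*$'

# Hypothetical sample line with three tags on it (an assumption for illustration).
line = '<td>{% if field.is_hidden %}{{ field.field }}{% endif %}</td>'


def all_matches(text):
    # Every tag in the text, the way re.finditer collects them in grep2.py.
    return [m.group(1) for m in re.finditer(TAG, text)]


def sed_like_matches(text):
    # At most one result per line: the whole line is matched once and
    # replaced by whatever the capturing group ended up holding.
    results = []
    for ln in text.splitlines():
        m = re.match(SED_LIKE, ln)
        if m:
            results.append(m.group(1))
    return results


print(all_matches(line))
# ['{% if field.is_hidden %}', '{{ field.field }}', '{% endif %}']
print(sed_like_matches(line))
# ['{% endif %}']
```

If this is indeed the cause, a tool that emits every match rather than one substitution per line may be a closer fit; grep's `-o` option, which prints only the matched parts, each on its own line, is one direction worth trying.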
Unix tools are powerful and concise, but I still need to get a lot more comfortable with their syntax. Please leave a comment if you know how to fix my command.