Julien Tayon: Abusing yahi -a log based statistic tool à la awstats- to plot histograms/date series
pre { line-height: 125%; } td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } .highlight .hll { background-color: #ffffcc } .highlight { background: #eeffcc; } .highlight .c { color: #408090; font-style: italic } /* Comment */ .highlight .err { border: 1px solid #FF0000 } /* Error */ .highlight .k { color: #007020; font-weight: bold } /* Keyword */ .highlight .o { color: #666666 } /* Operator */ .highlight .ch { color: #408090; font-style: italic } /* Comment.Hashbang */ .highlight .cm { color: #408090; font-style: italic } /* Comment.Multiline */ .highlight .cp { color: #007020 } /* Comment.Preproc */ .highlight .cpf { color: #408090; font-style: italic } /* Comment.PreprocFile */ .highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */ .highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */ .highlight .gd { color: #A00000 } /* Generic.Deleted */ .highlight .ge { font-style: italic } /* Generic.Emph */ .highlight .ges { font-weight: bold; font-style: italic } /* Generic.EmphStrong */ .highlight .gr { color: #FF0000 } /* Generic.Error */ .highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ .highlight .gi { color: #00A000 } /* Generic.Inserted */ .highlight .go { color: #333333 } /* Generic.Output */ .highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */ .highlight .gs { font-weight: bold } /* Generic.Strong */ .highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ .highlight .gt { color: #0044DD } /* Generic.Traceback */ .highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */ .highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */ .highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */ .highlight .kp { color: #007020 } /* Keyword.Pseudo */ .highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */ .highlight .kt { color: #902000 } /* Keyword.Type */ .highlight .m { color: #208050 } /* Literal.Number */ .highlight .s { color: #4070a0 } /* Literal.String */ .highlight .na { color: #4070a0 } /* Name.Attribute */ .highlight .nb { color: #007020 } /* Name.Builtin */ .highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */ .highlight .no { color: #60add5 } /* Name.Constant */ .highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */ .highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */ .highlight .ne { color: #007020 } /* Name.Exception */ .highlight .nf { color: #06287e } /* Name.Function */ .highlight .nl { color: #002070; font-weight: bold } /* Name.Label */ .highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */ .highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */ .highlight .nv { color: #bb60d5 } /* Name.Variable */ .highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */ .highlight .w { color: #bbbbbb } /* Text.Whitespace */ .highlight .mb { color: #208050 } /* Literal.Number.Bin */ .highlight .mf { color: #208050 } /* Literal.Number.Float */ .highlight .mh { color: #208050 } /* Literal.Number.Hex */ .highlight .mi { color: #208050 } /* Literal.Number.Integer */ .highlight .mo { color: #208050 } /* Literal.Number.Oct */ .highlight .sa { color: #4070a0 } /* Literal.String.Affix */ .highlight .sb { color: #4070a0 } /* Literal.String.Backtick */ .highlight .sc { color: #4070a0 } /* Literal.String.Char */ .highlight .dl { color: #4070a0 } /* Literal.String.Delimiter */ .highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */ .highlight .s2 { color: #4070a0 } /* Literal.String.Double */ .highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */ .highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */ .highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */ .highlight .sx { color: #c65d09 } /* Literal.String.Other */ .highlight .sr { color: #235388 } /* Literal.String.Regex */ .highlight .s1 { color: #4070a0 } /* Literal.String.Single */ .highlight .ss { color: #517918 } /* Literal.String.Symbol */ .highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */ .highlight .fm { color: #06287e } /* Name.Function.Magic */ .highlight .vc { color: #bb60d5 } /* Name.Variable.Class */ .highlight .vg { color: #bb60d5 } /* Name.Variable.Global */ .highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */ .highlight .vm { color: #bb60d5 } /* Name.Variable.Magic */ .highlight .il { color: #208050 } /* Literal.Number.Integer.Long */ div.admonition, div.topic, blockquote { clear: left; } div.admonition { margin: 20px 0px; padding: 10px 30px; background-color: #EEE; border: 1px solid #CCC; } div.admonition tt.xref, div.admonition code.xref, div.admonition a tt { background-color: #FBFBFB; border-bottom: 1px solid #fafafa; } div.admonition p.admonition-title { font-family: Georgia, serif; font-weight: normal; font-size: 24px; margin: 0 0 10px 0; padding: 0; line-height: 1; } div.admonition p.last { margin-bottom: 0; } /* -- topics ----------------------------------------
Yahiis a python module that can installed with pip to make a all in one static html page aggregating the data from a web server. Actually, asshown with the parsing of auth.log in the documentation, it is pretty much a versatile log analyser based on regexp enabling various aggregation : by geo_ip, by histograms, chronological series.
The demo ishere. It pretty much is close to awstats which is discontinued orgoaccess.
As the author of yahi I may not be very objective, but I claim it is having a quality other tools don't have : it is easy to abuse for format other than web logs and here is an example out of the norm : parsing CSV.
Foreword what is yahi?
Ploting histograms or time series from CSV
CSV that can be parsed as regexp
There are simple cases when CSV don’t have strings embedded and are litteraly comma separated integers/floats.
In this case, CSV can be parsed as a regexp and it’s all the more convenient when the CSV has no title.
Here is an example using the CSV coming from the CSV generated bytrollometre
A line is made off a timestamp followed by various (int) counters.
Then, all that remains to do is
You click on time series and can see the either the chronological time serie
Or the profile by hour
Raw approach with csv.DictReader
Let’s take the use case where my job insurance sent me the data of all the 10000 jobless persons in my vicinity consisting for each line of :
opaque id,civility,firstname, lastname, email,email of the counseler following the job less person
For this CSV, I have the title as the first line, and have strings that may countain “,”, hence the regexp approach is strongly ill advised.
What we want here is 2 histograms :
- the frequency of the firstname (that does not violates RGPD) and that I can share,
- how much each adviser is counseling.
Here is the code
Then, all that remains to do is
And here we can see that each counseler is following on average ~250 jobless persons.
And the frequency of the firstname
Which correlated with the demographic of the firstname as included here below tends to prove that the older you are the less likeky you are to be jobless.
I am not sayingageism, the data are doing it for me.
https://beauty-of-imagination.blogspot.com/2025/09/abusing-yahi-log-based-statistic-tool.html