3

I'm trying to figure out a way of counting how many attribute values (for a multi-valued attribute in LDAP) various different users have. For example, the data looks something like this...

dn: [email protected],ou=test,dc=acme,dc=com
accountid: a45ff948-e154-4c48-aa74-5b64ea876735

dn: [email protected],ou=test,dc=acme,dc=com
accountid: f8103174-7853-4b0c-8d0e-faa820c8eff8
accountid: 3bea64d3-98d5-4ff1-b654-d01e4e3128cd

dn: [email protected],ou=test,dc=acme,dc=com

dn: [email protected],ou=test,dc=acme,dc=com
accountid: 90ad7323-20ca-4087-9b13-62d5713ae57e

I'd like to have output along the lines of...

[email protected],ou=test,dc=acme,dc=com , 1
[email protected],ou=test,dc=acme,dc=com , 2
[email protected],ou=test,dc=acme,dc=com , 0
[email protected],ou=test,dc=acme,dc=com , 1

Or at the very least I'd just like to know (and print out) which DNs have multiple values for the accountid attribute.

Any ideas please?

Thanks in advance!

  • Does your input include any other attributes or is it ONLY accountids? If there are others, add a few to your sample input so we can test with that. Commented 6 hours ago

4 Answers

5

This seems very well suited to Perl in paragraph mode, where a record ("line") is a block of text delimited by one or more blank lines. Something like this:

$ perl -00 -ne '@n=split(/\n/); /(uid=.+?),/; print "$1, $#n\n"' file 
[email protected], 1
[email protected], 2
[email protected], 0
[email protected], 1

Explanation

  • -00: enable paragraph mode
  • -ne: read the input file line by line (here, because of the previous bullet point, a line is a paragraph) and apply the script given by -e to each line.
  • @n=split(/\n/): split the input line on \n characters (newline) into the array @n. This array will now have as many elements as there are actual lines in the paragraph.
  • /(uid=.+?),/: capture the text from uid= up to, but not including, the first comma, saving it as $1.
  • print "$1, $#n\n": print what was captured above followed by the last index of the array @n. Since arrays start from 0, the last index will be the same as the number of elements in the array minus one. Since we want the number of attribute lines which seems to be the number of lines in the paragraph excluding the first one, this gives the expected result.
3
  • Thanks so much for the quick response - it did exactly what I needed!! One final question please if you don't mind... do you know a way of printing out all/only the ones that have more than one accountid line please? I'd imagine it's something like "print the entire 'line' if the array count is greater than 1...? For example, in the above example it'd be only the whole of the user2 object that I'd be looking for please. Commented 15 hours ago
  • 1
    @darrensunley perl -00 -ne '@n=split(/\n/); print if $#n > 1' file should do it. Or, since this is Perl and TIMTOWTDI, perl -00 -ne 'print if scalar(split(/\n/)) > 2' file. Commented 14 hours ago
  • Thanks again - you've been super helpful!!! Commented 14 hours ago
5

Using Miller (mlr) to read the data as "xtab" input (a format where records are separated by empty lines and each field sits on its own line, with the field name followed by a separator and then the value), with the input pair separator set to : (colon+space) via --ips:

$ mlr --xtab --ips ': ' put -q 'print $dn, NF - 1' file
[email protected],ou=test,dc=acme,dc=com 1
[email protected],ou=test,dc=acme,dc=com 2
[email protected],ou=test,dc=acme,dc=com 0
[email protected],ou=test,dc=acme,dc=com 1

This simply outputs the dn field and the count of however many other fields there are in the record.

If you need commas in the output, use print $dn . ", " . string(NF - 1) as the put expression. Wrap that in a conditional that only prints the expression if NF > 1 (if you wish), like so:

$ mlr --xtab --ips ': ' put -q 'NF > 1 { print $dn . ", " . string(NF - 1) }' file
[email protected],ou=test,dc=acme,dc=com, 1
[email protected],ou=test,dc=acme,dc=com, 2
[email protected],ou=test,dc=acme,dc=com, 1

Alternatively, add the count as a new field and then cut out the dn field and your new field (the output is in a whitespace-delimited, indexed-fields format, "nidx"):

$ mlr --x2n --ips ': ' put '$c = NF - 1' then cut -f dn,c file
[email protected],ou=test,dc=acme,dc=com 1
[email protected],ou=test,dc=acme,dc=com 2
[email protected],ou=test,dc=acme,dc=com 0
[email protected],ou=test,dc=acme,dc=com 1

Add --ofs ', ' to the options if you want comma+space as the output field delimiter. Here's how that may look, together with filtering out any record whose calculated c value is zero:

$ mlr --x2n --ips ': ' --ofs ', ' put '$c = NF - 1' then filter -x '$c == 0' then cut -f dn,c file
[email protected],ou=test,dc=acme,dc=com, 1
[email protected],ou=test,dc=acme,dc=com, 2
[email protected],ou=test,dc=acme,dc=com, 1
2
  • No need to filter after put as in mlr --x2n --ips ': ' --ofs ', ' put '$c = NF - 1; $c == 0' then cut -f dn,c' file1. And You know it. Commented 11 hours ago
  • @PrabhjotSingh So it is, but it's not evident. Commented 10 hours ago
2

And here's an awk approach:

$ awk -F'[ ,]' -v RS='\n\n' '{n=split($0,a,"\n"); print $2,n-1}' file 
[email protected]  1
[email protected]  2
[email protected]  0
[email protected]  2   <-- WRONG! See text

Explanation

  • -F'[ ,]': set the field separator to space or comma.
  • -v RS='\n\n': set the record separator to \n\n, so that each record is a paragraph.
  • n=split($0,a,"\n");: split the current record (paragraph) on \n into the array a. The number returned (n) is the number of elements in this array, so the number of lines in this record, and therefore the number of attributes plus one.
  • print $2,n-1: print the second field (since we are using space and comma as the field separators, on your file this will be the string after the first space and before the first ,), and the value of n minus one, so the number of lines minus one.

Note that this gets the last record wrong. The file ends with a single newline (the normal end of a text file), not with two consecutive newlines, so that final newline stays inside the last record and split() counts one extra, empty element. You can get around this by adding an extra newline:

$ printf '%s\n\n' "$(cat file)" | 
   awk -F'[ ,]' -v RS='\n\n' '{n=split($0,a,"\n"); print $2,n-1}' 
[email protected] 1
[email protected] 2
[email protected] 0
[email protected] 1

Or print each result only after reading the next record, and handle the last one specially:

$ awk -F'[ ,]' -v RS='\n\n' '
    { if (last) { print last }; n = split($0, a, "\n"); last = $2 " " (n-1) }
    END { print $2, n-2 }
  ' file
[email protected] 1
[email protected] 2
[email protected] 0
[email protected] 1
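
To see where the extra count for the last record comes from, it may help to look at split() in isolation. This is only a minimal sketch, assuming any POSIX awk; the two strings are made-up stand-ins for a record without and with a trailing newline:

```shell
# split() returns the number of pieces it produced.
# A trailing "\n" yields one additional, empty piece.
without_nl=$(awk 'BEGIN { print split("dn: x\naccountid: 1", a, "\n") }')
with_nl=$(awk 'BEGIN { print split("dn: x\naccountid: 1\n", a, "\n") }')
echo "$without_nl $with_nl"   # prints "2 3"
```

The second call reports three pieces for a two-line record, which is the off-by-one seen on the last record above.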
3
  • 1
    My usual approach to handling this scenario is to put the block output processing in a function and then not only call it at the end of the block but also from END { } Commented 14 hours ago
  • You should mention that requires GNU awk for multi-char RS. With a POSIX awk RS='\n\n' would be treated the same as RS='\n' which is the default value. You should instead set RS='' to put any awk into paragraph mode and that'd also solve the problem you point out of getting the wrong value for the last record in the input. Commented 6 hours ago
  • If you didn't want to use RS='' for some reason, with GNU awk you could use RS='\n(\n|$)' instead to accommodate the case of just 1 newline at the end of the OPs input. Commented 3 hours ago
2

Using awk:

   $ awk '/^accountid: /{c++}
          /^dn: /{
              if (dn) print dn,c    # print the previous entry, if any
              c=0
              sub(/^dn: /,"")       # strip the "dn: " prefix
              dn=$0
          }
          END{print dn,c}' file
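
One way to avoid writing the print logic twice in a line-by-line approach like this is to put it in a function that is called both at each new dn line and from END. A minimal sketch, assuming a POSIX awk; the sample entries and the report() name are made up for illustration:

```shell
# Hypothetical sample data; the DNs are placeholders.
sample='dn: uid=user1,ou=test,dc=acme,dc=com
accountid: a45ff948-e154-4c48-aa74-5b64ea876735

dn: uid=user2,ou=test,dc=acme,dc=com
accountid: f8103174-7853-4b0c-8d0e-faa820c8eff8
accountid: 3bea64d3-98d5-4ff1-b654-d01e4e3128cd

dn: uid=user3,ou=test,dc=acme,dc=com'

counts=$(printf '%s\n' "$sample" | awk '
    # report() prints the entry collected so far, if there is one.
    function report() { if (dn != "") print dn " , " c }
    /^dn: /        { report(); dn = substr($0, 5); c = 0 }
    /^accountid: / { c++ }
    END            { report() }
')
printf '%s\n' "$counts"
```

Because only lines starting with accountid: are counted, other attributes in an entry would not affect the result.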

Or awk in paragraph mode:

$ awk -v RS= -F'\n' '{sub(/^dn: /,"");print $1"," NF-1}' file
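
If the entries may carry attributes other than accountid, or if only the multi-valued DNs are wanted, the paragraph-mode idea can be extended along these lines. This is a sketch rather than a definitive solution: it assumes a POSIX awk, assumes dn is the first line of every entry, and uses a hypothetical count_accountids helper and made-up sample data (including the cn line):

```shell
# Hypothetical sample data; the DNs and the cn attribute are placeholders.
sample='dn: uid=user1,ou=test,dc=acme,dc=com
cn: User One
accountid: a45ff948-e154-4c48-aa74-5b64ea876735

dn: uid=user2,ou=test,dc=acme,dc=com
accountid: f8103174-7853-4b0c-8d0e-faa820c8eff8
accountid: 3bea64d3-98d5-4ff1-b654-d01e4e3128cd

dn: uid=user3,ou=test,dc=acme,dc=com'

# count_accountids [MIN]: read entries on stdin and print "dn , count"
# for every entry with at least MIN accountid lines (default 0).
count_accountids() {
    awk -v RS= -F'\n' -v min="${1:-0}" '
        {
            n = 0
            for (i = 1; i <= NF; i++)        # one field per line
                if ($i ~ /^accountid: /) n++
            dn = $1
            sub(/^dn: /, "", dn)             # strip the "dn: " prefix
            if (n >= min) print dn " , " n
        }'
}

all=$(printf '%s\n' "$sample" | count_accountids)
multi=$(printf '%s\n' "$sample" | count_accountids 2)
printf '%s\n' "$all"
printf 'multi-valued: %s\n' "$multi"
```

Setting RS to the empty string puts any awk into paragraph mode, so this does not depend on a multi-character RS.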
