Counting IP Addresses in a Log File
By Pete Freitag
I've been using grep to search through files on Linux and Mac for years, but one flag I didn't use much until recently is the -o flag. It tells grep to output only the text that matched the pattern (instead of the whole lines that match the pattern).
This feature turns out to be pretty handy. Let's say you want to find all the IP addresses in a file. You just need to come up with a regular expression to match an IP. I'll use this: "[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+". It's not perfect, but it will work:
grep -o "[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+" httpd.log
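To see what -o changes, here is a minimal sketch against a tiny made-up log. The temporary file and its three lines are assumptions for illustration, not taken from a real server, and the \+ syntax assumes a grep that accepts it in basic regular expressions (GNU grep does).

```shell
# Hypothetical sample log; the contents are invented for illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.30 - - [11/Oct/2019:10:00:00 +0000] "GET / HTTP/1.1" 200 512
10.0.0.80 - - [11/Oct/2019:10:00:01 +0000] "GET /about HTTP/1.1" 200 1024
10.0.0.30 - - [11/Oct/2019:10:00:02 +0000] "GET /contact HTTP/1.1" 404 128
EOF

# Without -o, grep prints each whole line that contains a match.
grep "[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+" "$log"

# With -o, only the matched text is printed, one match per line.
ips=$(grep -o "[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+" "$log")
printf '%s\n' "$ips"

rm -f "$log"
```

The second command prints one IP per request line, duplicates included, which is the raw material for the counting steps.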
How can I find unique IP addresses in a log file?
We can use the uniq command to remove duplicate IP addresses, but uniq only collapses duplicates that sit on adjacent lines, so it needs sorted input. We can do that with the sort command, like so:
grep -o "[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+" httpd.log | sort | uniq
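As a small sketch of why the sort matters, the snippet below runs uniq on unsorted and sorted input. The sample addresses are made up, and sort -u is shown as an equivalent shortcut for sort | uniq rather than something the pipeline above requires.

```shell
# Made-up input, one IP per line, shaped like the output of grep -o.
sample=$(printf '10.0.0.30\n10.0.0.80\n10.0.0.30\n10.0.0.70\n')

# uniq only collapses adjacent duplicates, so the repeated IP survives here.
unsorted=$(printf '%s\n' "$sample" | uniq)

# Sorting first puts the duplicates next to each other.
with_uniq=$(printf '%s\n' "$sample" | sort | uniq)

# sort -u sorts and de-duplicates in a single step.
with_sort_u=$(printf '%s\n' "$sample" | sort -u)

printf 'uniq alone:\n%s\n' "$unsorted"
printf 'sort | uniq:\n%s\n' "$with_uniq"
printf 'sort -u:\n%s\n' "$with_sort_u"
```

Keeping sort | uniq as two steps is still useful when you want counts, because sort -u has no equivalent of uniq -c.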
Show me the number of times each IP shows up in the log
Now we can use the -c flag for uniq to display counts:
grep -o "[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+" httpd.log | sort | uniq -c
This will output something like:
7 10.0.0.30
1 10.0.0.80
3 10.0.0.70
The IP counts are not in order, so we can pass our results through sort again, this time with the -n flag to use a numeric sort:
grep -o "[0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+" httpd.log | sort | uniq -c | sort -n
The above will put them in order from least to greatest. You can pipe the result to tail (for example, tail -n 10) if you only want to see the top N IP addresses!
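A minimal sketch of the top-N step, assuming made-up input and N = 2. It shows the tail approach next to a variation that reverses the sort and uses head, so the busiest IP comes first.

```shell
# Made-up list of IPs, shaped like the output of grep -o.
counts=$(printf '10.0.0.30\n10.0.0.80\n10.0.0.30\n10.0.0.70\n10.0.0.30\n10.0.0.70\n' \
  | sort | uniq -c)

# Ascending numeric sort, then keep the last 2 lines (largest counts at the bottom).
top_tail=$(printf '%s\n' "$counts" | sort -n | tail -n 2)

# Variation: descending numeric sort, then keep the first 2 lines (largest on top).
top_head=$(printf '%s\n' "$counts" | sort -rn | head -n 2)

printf 'sort -n | tail -n 2:\n%s\n' "$top_tail"
printf 'sort -rn | head -n 2:\n%s\n' "$top_head"
```

Both variants select the same two addresses; they differ only in whether the highest count ends up last or first.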
Pretty handy, right? It works great for counting or finding IP addresses in nginx, Apache, or any other kind of log file that contains IP addresses.
Other regex patterns to match an IP address
As I mentioned, the pattern we are using above is not perfect, but it works pretty well and is reasonably easy to understand. Here are a few regular expressions that can be used to match IP addresses in a log file (note I have taken out some of the escaping, so these are written in extended syntax and need grep -E):
[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+ - this is the one used above. The shortfall is that it can match more than 3 digits in each octet position. We will improve the pattern in the next one.
[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3} - this improves the accuracy because each octet is now 1-3 digits, but it is a bit long and still not perfect, as it can match 999.999.999.999, which is not valid.
[0-9.]+ - a short, simple regex, but it will end up matching all kinds of numbers and not just IPs.
[0-9.]{6,15} - a simple and short pattern that takes advantage of an IP address being at most 15 characters long (the shortest, such as 1.1.1.1, is actually 7 characters, so {7,15} would be slightly tighter).
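To make the trade-offs between these patterns concrete, here is a rough sketch that runs each one over a handful of invented strings. The sample lines are assumptions chosen to provoke false positives, and the counts in the comments apply only to this input.

```shell
# Invented test strings: one real-looking IP and several near misses.
samples=$(printf '%s\n' '10.0.0.30' '1234.5.6.7' '999.999.999.999' \
  'version 3.14' 'date 2019.10.11')

# Unbounded octets: accepts all three dotted quads, including 1234.5.6.7.
loose=$(printf '%s\n' "$samples" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+')

# 1-3 digits per octet: with -o it still finds 234.5.6.7 inside 1234.5.6.7.
bounded=$(printf '%s\n' "$samples" | grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')

# Any run of digits and dots: also picks up 3.14 and 2019.10.11.
anychars=$(printf '%s\n' "$samples" | grep -oE '[0-9.]+')

# Length-limited run: drops 3.14 but keeps the date-like 2019.10.11.
by_length=$(printf '%s\n' "$samples" | grep -oE '[0-9.]{6,15}')

for name in loose bounded anychars by_length; do
  eval "value=\$$name"
  printf '%s matched %s line(s)\n' "$name" "$(printf '%s\n' "$value" | wc -l | tr -d ' ')"
done
```

On this input, none of the four patterns rejects 999.999.999.999, which is why checking real logs for false positives matters more than picking the shortest pattern.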
If you want an even more accurate regex pattern to match an IP address, it gets quite complex and lengthy. You have to account for the fact that the maximum number in each octet position is 255. This means using alternation (the | operator) within the pattern. It gets quite long, so if your log file is not producing any false positives, you might just stick to a simpler pattern.
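One way such a stricter pattern could look is sketched below. The octet and strict variable names are hypothetical helpers, the sample lines are invented, and the -w flag (match whole words only) is an addition of this sketch so that grep -o does not report a valid-looking fragment from inside a longer number such as 10.0.0.999.

```shell
# Hypothetical helper: one octet in the range 0-255, built with alternation.
octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'

# Four octets separated by literal dots.
strict="$octet\.$octet\.$octet\.$octet"

# Invented test strings: two valid addresses and three invalid ones.
samples=$(printf '%s\n' '192.168.1.1' '256.1.1.1' '10.0.0.999' \
  '255.255.255.255' '999.999.999.999')

# -E enables | and () without escaping; -w rejects matches that touch other digits.
valid=$(printf '%s\n' "$samples" | grep -owE "$strict")
printf '%s\n' "$valid"
```

Building the pattern from a reusable octet variable keeps the full expression readable, though it is still far longer than the simpler patterns listed earlier.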
Counting IP Addresses in a Log File was first published on October 11, 2019.
Comments
cat access.log | cut -d" " -f9 | sort | uniq -c | sort -rn > output.log
With Windows, though, the output includes the filename. Not too problematic. But I wish I were a better dark-arts-regex wizard, like you. I often scan my SSH server logs for hacking attempts (there are many!) and manually block the IP addresses at the firewall. Unfortunately, the *reason* associated with the IP address is on the NEXT line (I_LOGON_AUTH_FAILED), therefore the regex doesn't quite work for me. (It's all XML.)
But this is a great use of grep!
Thanks for the post.