How to download and use the log files of an OVH web hosting plan

Intro

I have a Web Hosting plan with OVH (in fact, you are on it right now…). I'm not really satisfied with the statistics tools on offer: Urchin is no longer maintained, OVHcloud Web Statistics is still young, and Awstats, which I find really good, only shows activity one day at a time.

That's why I decided to look into how to download the log files so that I can process them manually.

Configuration

Create a dedicated user for the logs

We cannot use the main credentials to retrieve the logs, so we need to create a special account dedicated to this task from the OVHcloud control panel.

Screenshots: OVH main web interface | credentials request | dashboard | "more +" menu | web hosting, creating a new user for statistics and logs | creating a new user, steps 1 to 3 | web interface, Statistics and logs URL

And there we go: we now have all the information we need to download our logs.

Downloading the logs

GNU/Linux

Define the variables:

[user@host ~]$ USR=ovhlogsuser
[user@host ~]$ PASS=Myverycomplexpassw0rD
[user@host ~]$ URL=https://log.clusterXXX.hosting.ovh.net/shebangthedolphins.net/
[user@host ~]$ DOMAIN=$(awk -F '/' '{ print $4 }' <<< $URL)
[user@host ~]$ DOMAIN=shebangthedolphins.net
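
The awk step simply takes the fourth slash-separated field of the URL, which is the domain directory, so either assignment gives the same value. As a quick check:

```shell
#!/bin/sh
URL=https://log.clusterXXX.hosting.ovh.net/shebangthedolphins.net/
# Splitting on "/": field 1 = "https:", field 2 = "", field 3 = host, field 4 = domain.
echo "$URL" | awk -F '/' '{ print $4 }'
# → shebangthedolphins.net
```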

Download

[user@host ~]$ wget --http-user="$USR" --http-password="$PASS" -A "*.gz" -r -nd "$URL/logs/logs-10-2020/"
[user@host ~]$ ls -lh
total 596K
-rw------- 1 std std  55 10 déc.   2019 robots.txt.tmp
-rw-r--r-- 1 std std 24K  2 oct.  02:56 shebangthedolphins.net-01-10-2020.log.gz
-rw-r--r-- 1 std std 17K  3 oct.  02:19 shebangthedolphins.net-02-10-2020.log.gz
-rw-r--r-- 1 std std 14K  4 oct.  02:32 shebangthedolphins.net-03-10-2020.log.gz
-rw-r--r-- 1 std std 15K  5 oct.  02:34 shebangthedolphins.net-04-10-2020.log.gz
[…]
-rw-r--r-- 1 std std 16K 28 oct.  06:18 shebangthedolphins.net-27-10-2020.log.gz
-rw-r--r-- 1 std std 16K 29 oct.  06:08 shebangthedolphins.net-28-10-2020.log.gz
-rw-r--r-- 1 std std 15K 30 oct.  06:08 shebangthedolphins.net-29-10-2020.log.gz
-rw-r--r-- 1 std std 14K 31 oct.  06:08 shebangthedolphins.net-30-10-2020.log.gz
-rw-r--r-- 1 std std 52K  1 nov.  06:08 shebangthedolphins.net-31-10-2020.log.gz
[user@host ~]$ wget --http-user="$USR" --http-password="$PASS" "$URL"/logs/logs-$(/bin/date --date='1 days ago' '+%m-%Y')/"$DOMAIN"-$(/bin/date --date='1 days ago' '+%d-%m-%Y').log.gz
[user@host ~]$ ls -lh
total 20K
-rw-r--r-- 1 std std 18K 25 nov.  06:29 shebangthedolphins.net-24-11-2020.log.gz
[user@host ~]$ perl-rename -v 's/(.*)-(\d\d)-(\d\d)-(\d\d\d\d)(.*)/$4-$3-$2-$1$5/' *gz
[user@host ~]$ ls -lh
total 596K
-rw-r--r-- 1 std std 24K  2 oct.  02:56 2020-10-01-shebangthedolphins.net.log.gz
-rw-r--r-- 1 std std 17K  3 oct.  02:19 2020-10-02-shebangthedolphins.net.log.gz
-rw-r--r-- 1 std std 14K  4 oct.  02:32 2020-10-03-shebangthedolphins.net.log.gz
[user@host ~]$ wget --http-user="$USR" --http-password="$PASS" "$URL"/logs/logs-$(/bin/date --date='1 days ago' '+%m-%Y')/"$DOMAIN"-$(/bin/date --date='1 days ago' '+%d-%m-%Y').log.gz -O /tmp/$(/bin/date --date='1 days ago' '+%Y-%m-%d')-"$DOMAIN".log.gz
[user@host ~]$ ls -lh /tmp/*gz
-rw-r--r-- 1 std std 18K 25 nov.  06:29 /tmp/2020-11-24-shebangthedolphins.net.log.gz
[user@host ~]$ for DAY in $(seq 1 30); do wget --http-user="$USR" --http-password="$PASS" "$URL"/logs/logs-$(/bin/date --date="$DAY days ago" '+%m-%Y')/"$DOMAIN"-$(/bin/date --date="$DAY days ago" '+%d-%m-%Y').log.gz -O /tmp/$(/bin/date --date="$DAY days ago" '+%Y-%m-%d')-"$DOMAIN".log.gz; done
[user@host ~]$ ls -lh /tmp/*gz
-rw-r--r-- 1 std std 17K 26 oct.  06:12 /tmp/2020-10-25-shebangthedolphins.net.log.gz
-rw-r--r-- 1 std std 14K 27 oct.  06:44 /tmp/2020-10-26-shebangthedolphins.net.log.gz
-rw-r--r-- 1 std std 16K 28 oct.  06:18 /tmp/2020-10-27-shebangthedolphins.net.log.gz
[...]
-rw-r--r-- 1 std std 15K 23 nov.  06:03 /tmp/2020-11-22-shebangthedolphins.net.log.gz
-rw-r--r-- 1 std std 18K 24 nov.  06:38 /tmp/2020-11-23-shebangthedolphins.net.log.gz
-rw-r--r-- 1 std std 18K 25 nov.  06:29 /tmp/2020-11-24-shebangthedolphins.net.log.gz
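
The date arithmetic used in the commands above can be isolated in a small helper. This is only a sketch: it assumes GNU date, the function name log_url is hypothetical, and URL/DOMAIN are the placeholder values used throughout this article. It builds the URL of the archive for N days before a reference date:

```shell
#!/bin/sh
# Build the URL of the log archive for "$2" days before the date "$1" (YYYY-MM-DD).
# Assumes GNU date; URL and DOMAIN are the placeholder values used in this article.
URL=https://log.clusterXXX.hosting.ovh.net/shebangthedolphins.net
DOMAIN=shebangthedolphins.net

log_url() {
	MONTH=$(date --date="$1 -$2 day" '+%m-%Y')   # e.g. 11-2020
	DAY=$(date --date="$1 -$2 day" '+%d-%m-%Y')  # e.g. 24-11-2020
	echo "$URL/logs/logs-$MONTH/$DOMAIN-$DAY.log.gz"
}

log_url 2020-11-25 1
# → https://log.clusterXXX.hosting.ovh.net/shebangthedolphins.net/logs/logs-11-2020/shebangthedolphins.net-24-11-2020.log.gz
```

Feeding the result to wget with the same --http-user/--http-password options as above then downloads that day's archive.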

Windows/PowerShell

Define the variables:

PS C:\Users\std> $user = "ovhlogsuser"
PS C:\Users\std> $pass = "Myverycomplexpassw0rD"
PS C:\Users\std> $secpasswd = ConvertTo-SecureString $pass -AsPlainText -Force
PS C:\Users\std> $credential = New-Object System.Management.Automation.PSCredential($user, $secpasswd)
PS C:\Users\std> $domain = "shebangthedolphins.net"
PS C:\Users\std> $url = "https://log.clusterXXX.hosting.ovh.net/$domain/"

Download

PS C:\Users\std> Invoke-WebRequest -Credential $credential -Uri ("$url" + "logs/logs-" + $((Get-Date).AddDays(-1).ToString("MM-yyyy")) + "/$domain" + "-" + $((Get-Date).AddDays(-1).ToString("dd-MM-yyyy"))  + ".log.gz") -OutFile "$((Get-Date).AddDays(-1).ToString("yyyy-MM-dd"))-$domain.log"
PS C:\Users\std> dir


    Directory: C:\Users\std


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----       05/12/2020     15:45         360238 2020-12-04-shebangthedolphins.net.log
PS C:\Users\std> 1..30 | ForEach-Object { Invoke-WebRequest -Credential $credential -Uri ("$url" + "logs/logs-" + $((Get-Date).AddDays(-$_).ToString("MM-yyyy")) + "/$domain" + "-" + $((Get-Date).AddDays(-$_).ToString("dd-MM-yyyy")) + ".log.gz") -OutFile "$((Get-Date).AddDays(-$_).ToString("yyyy-MM-dd"))-$domain.log" }

Extracting the information

GNU/Linux

Now that we have downloaded some log files, let's look at a few examples of how to extract information from them.

Statistics on pages viewed

[user@host ~]$ DOMAIN=shebangthedolphins.net
[user@host ~]$ zgrep -viE 'Bytespider|Trident|bot|404|GET \/ HTTP|BingPreview|Seekport Crawler' 2020-11-24-shebangthedolphins.net.log.gz | grep html | awk '{ print $1" "$11 }' | grep "$DOMAIN" | sort | uniq | awk '{ print $2 }' | sort | uniq -c | sort -n | tr -s "[ ]" | sed 's/^ //'
1 "http://shebangthedolphins.net/backup_burp.html"
1 "http://shebangthedolphins.net/prog_autoit_backup.html"
1 "http://shebangthedolphins.net/prog_powershell_kesc.html"
1 "http://shebangthedolphins.net/vpn_ipsec_06linux-to-linux_tunnel-x509.html"
1 "https://shebangthedolphins.net/fr/windows_grouppolicy_execute_powershell_script.html"
1 "https://shebangthedolphins.net/gnulinux_courier.html"
1 "https://shebangthedolphins.net/gnulinux_vnc_remotedesktop.html"
1 "https://shebangthedolphins.net/vpn_openvpn_windows_server.html"
1 "https://shebangthedolphins.net/windows_icacls.html"
1 "http://www.shebangthedolphins.net/vpn_ipsec_03linux-to-windows_transport-psk.html"
2 "https://shebangthedolphins.net/windows_mssql_alwayson.html"
3 "https://shebangthedolphins.net/fr/vpn_openvpn_buster.html"
7 "https://shebangthedolphins.net/ubiquiti_ssh_commands.html"
[user@host ~]$ DOMAIN=shebangthedolphins.net
[user@host ~]$ for i in *.log.gz; do echo "------------"; echo "$i"; zgrep -viE 'Bytespider|Trident|bot|404|GET \/ HTTP|BingPreview|Seekport Crawler' "$i" | grep html | awk '{ print $1" "$11 }' | grep "$DOMAIN" | sort | uniq | awk '{ print $2 }' | sort | uniq -c | sort -n | tr -s "[ ]" | sed 's/^ //' | wc -l; done
------------
2020-11-18-shebangthedolphins.net.log.gz
19
------------
2020-11-19-shebangthedolphins.net.log.gz
19
------------
2020-11-20-shebangthedolphins.net.log.gz
24
------------
2020-11-21-shebangthedolphins.net.log.gz
8
------------
2020-11-22-shebangthedolphins.net.log.gz
16
------------
2020-11-23-shebangthedolphins.net.log.gz
15
------------
2020-11-24-shebangthedolphins.net.log.gz
13
[user@host ~]$ DOMAIN=shebangthedolphins.net
[user@host ~]$ YEAR=2020
[user@host ~]$ for i in $(seq -w 1 12); do echo "------------"; echo "$YEAR-$i"; zgrep -viE 'Bytespider|Trident|bot|404|GET \/ HTTP|BingPreview|Seekport Crawler' $YEAR-"$i"*.log.gz | grep html | awk '{ print $1" "$11 }' | grep "$DOMAIN" | sort | uniq | awk '{ print $2 }' | sort | uniq -c | sort -n | tr -s "[ ]" | sed 's/^ //' | wc -l; done 2>/dev/null
------------
2020-01
101
------------
2020-02
73
------------
2020-03
92
------------
2020-04
91
------------
2020-05
87
------------
2020-06
73
------------
2020-07
81
------------
2020-08
97
------------
2020-09
135
------------
2020-10
151
------------
2020-11
154
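
These pipelines all rely on the whitespace layout of the log lines: with awk's default field splitting, field 1 is the client IP and field 11 the quoted referrer. A quick check on a fabricated line (the IP and pages are made up) shows what the awk '{ print $1" "$11 }' step extracts:

```shell
#!/bin/sh
# A fabricated log line in Apache combined format (all values are made up).
LINE='203.0.113.7 - - [24/Nov/2020:10:00:00 +0100] "GET /windows_icacls.html HTTP/1.1" 200 1234 "https://shebangthedolphins.net/index.html" "Mozilla/5.0"'

# Default awk splitting: field 1 = client IP, field 11 = quoted referrer.
echo "$LINE" | awk '{ print $1" "$11 }'
# → 203.0.113.7 "https://shebangthedolphins.net/index.html"
```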

Statistics on pages reached from a search engine

[user@host ~]$ zgrep "html HTTP.*200.*[0-9]\{4\} \"\(https://www.google\|https://www.bing\|https://www.qwant\|https://duckduckgo\)" 2020-11-24-shebangthedolphins.net.log.gz | sed 's/.*GET \(.*\.html\).*/\1/' | sort | uniq -c | sort -n
      1 /backup_burp.html
      1 /fr/windows_grouppolicy_execute_powershell_script.html
      1 /gnulinux_vnc_remotedesktop.html
      1 /prog_powershell_kesc.html
      1 /vpn_ipsec_03linux-to-windows_transport-psk.html
      1 /vpn_ipsec_06linux-to-linux_tunnel-x509.html
      1 /vpn_openvpn_windows_server.html
      1 /windows_icacls.html
      1 /windows_mssql_alwayson.html
      2 /fr/vpn_openvpn_buster.html
      6 /ubiquiti_ssh_commands.html
[user@host ~]$ zcat *.log.gz | grep "html HTTP.*200.*[0-9]\{4\} \"\(https://www.google\|https://www.bing\|https://www.qwant\|https://duckduckgo\)" | sed 's/.*GET \(.*\.html\).*/\1/' | sort | uniq -c | sort -n
[...]
     11 /gnulinux_vnc_remotedesktop.html
     11 /openbsd_packetfilter.html
     12 /fr/gnulinux_nftables_examples.html
     15 /windows_imule.html
     20 /backup_burp.html
     36 /fr/vpn_openvpn_buster.html
     38 /vpn_openvpn_windows_server.html
    112 /ubiquiti_ssh_commands.html
[user@host ~]$ YEAR=2020
[user@host ~]$ for i in $(seq -w 1 12); do echo "------------"; echo "$YEAR-$i"; zcat $YEAR-"$i"*.log.gz | grep "html HTTP.*200.*[0-9]\{4\} \"\(https://www.google\|https://www.bing\|https://www.qwant\|https://duckduckgo\)" | sed 's/.*GET \(.*\.html\).*/\1/' | sort | uniq -c | sort -n | wc -l; done 2>/dev/null
------------
2020-01
25
------------
2020-02
26
------------
2020-03
27
------------
2020-04
28
------------
2020-05
25
------------
2020-06
27
------------
2020-07
33
------------
2020-08
29
------------
2020-09
67
------------
2020-10
58
------------
2020-11
68

Script

A script I use to quickly see how the number of pages reached from search engines evolves over time.

Code

#! /bin/sh
for LOGS in *.log.gz; do
	echo "-----------------------------"
	echo "LOGS : $LOGS"
	for i in www.google www.bing www.qwant duckduckgo; do
		RESULT=$(zgrep "html HTTP.*200.*[0-9]\{4\} \"https://$i" "$LOGS" | wc -l)
		echo "$i = $RESULT"
	done
done

Output

-----------------------------
LOGS : 2020-11-21-shebangthedolphins.net.log.gz
www.google = 6
www.bing = 1
www.qwant = 0
duckduckgo = 4
-----------------------------
LOGS : 2020-11-22-shebangthedolphins.net.log.gz
www.google = 10
www.bing = 2
www.qwant = 0
duckduckgo = 6
-----------------------------
LOGS : 2020-11-23-shebangthedolphins.net.log.gz
www.google = 7
www.bing = 6
www.qwant = 2
duckduckgo = 1
-----------------------------
LOGS : 2020-11-24-shebangthedolphins.net.log.gz
www.google = 12
www.bing = 2
www.qwant = 0
duckduckgo = 3
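
To feed a spreadsheet or gnuplot, the same per-engine counts can be written as CSV instead. This sketch mirrors the script above but prints one file,engine,count line per pair; the demo log file and its single line are fabricated so the block runs standalone:

```shell
#!/bin/sh
# Demo data: one fabricated log line, compressed like OVH's daily archives.
DIR=$(mktemp -d) && cd "$DIR" || exit 1
printf '%s\n' '203.0.113.7 - - [24/Nov/2020:10:00:00 +0100] "GET /windows_icacls.html HTTP/1.1" 200 1234 "https://www.google.com/" "Mozilla/5.0"' \
	| gzip > 2020-11-24-example.log.gz

# CSV variant of the script above: one "file,engine,count" line per pair.
for LOGS in *.log.gz; do
	for i in www.google www.bing www.qwant duckduckgo; do
		RESULT=$(zgrep -c "html HTTP.*200.*[0-9]\{4\} \"https://$i" "$LOGS")
		echo "$LOGS,$i,$RESULT"
	done
done
```

zgrep -c counts matching lines directly, which replaces the | wc -l of the original script.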

Windows/PowerShell

Statistics on pages viewed

PS C:\Users\std> $domain = "shebangthedolphins.net"
PS C:\Users\std> Select-String .\2020-11-05-shebangthedolphins.net.log -NotMatch -Pattern "Bytespider","Trident","bot","404","GET / HTTP","BingPreview","Seekport Crawler" | Select-String -Pattern "html" | %{"{0} {1}" -f $_.Line.ToString().Split(' ')[0],$_.Line.ToString().Split(' ')[10]} | Select-String -Pattern "$domain.*html" | Sort-Object | Get-Unique | %{"{0}" -f $_.Line.ToString().Split(' ')[1]} | group -NoElement | Sort-Object Count |  %{"{0} {1}" -f $_.Count, $_.Name }
1 "https://shebangthedolphins.net/fr/prog_introduction.html"
1 "https://shebangthedolphins.net/fr/prog_sh_check_snmp_synology.html"
1 "http://shebangthedolphins.net/openbsd_network_interfaces.html"
1 "https://shebangthedolphins.net/fr/menu.html"
1 "https://shebangthedolphins.net/fr/windows_commandes.html"
1 "https://shebangthedolphins.net/fr/windows_run_powershell_taskschd.html"
1 "http://shebangthedolphins.net/prog_sh_check_snmp_synology.html"
1 "https://shebangthedolphins.net/fr/windows_grouppolicy_reset.html"
1 "https://shebangthedolphins.net/fr/windows_grouppolicy_update_policy.html"
1 "https://shebangthedolphins.net/virtualization_kvm_windows10.html"
1 "https://shebangthedolphins.net/windows_event_on_usb.html"
1 "https://shebangthedolphins.net/prog_powershell_movenetfiles.html"
1 "http://shebangthedolphins.net/prog_autoit_backup.html"
1 "https://shebangthedolphins.net/fr/vpn_openvpn_buster.html"
1 "http://shebangthedolphins.net/prog_powershell_kesc.html"
1 "https://shebangthedolphins.net/index.html"
1 "http://shebangthedolphins.net/gnulinux_courier.html"
4 "https://shebangthedolphins.net/ubiquiti_ssh_commands.html"
6 "https://shebangthedolphins.net/vpn_openvpn_windows_server.html"

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Contact: