VulnTool
Discover, Triage, and Remediate CVE Vulnerabilities Across Your Infrastructure
Project Type
End-to-end product design for a Meta internal tool
My Role
Product Designer
Target Users
350 Quarterly Active People
Duration
Ongoing (March 2020 - Present)
Contribution
-
User research
-
Stakeholder interviews
-
Product management
-
Hosted Ideation Sessions
-
High-fidelity prototypes
-
Design
-
Usability Testing
Impact
Remediated 105% more vuln issues
117,905 remediated vuln issues in H2 2020; to 242,476 remediated vuln issues in H1 2021.
Reduced the average remediation time by 57.7%
From 71 days to 30 days.
Median time to triage vulns decreased by 75%
From 51 days in June 2021, to 13 days in Jan 2021
Overview
Background
Meta’s code isn’t perfect: without regular patches and updates, our systems are likely to have vulnerabilities. For example, an application running on an outdated OS version could be breached if it isn’t upgraded to a version that fixes the identified vulnerability.
At Meta, any leaked data damages our brand integrity, so this is a critical issue.
The “EE Vuln Org Overview” dashboard highlights the large number of vulnerabilities across pillars, business units, and teams.
What is a Vulnerability?
-
CVE, short for Common Vulnerabilities and Exposures, is a list of publicly disclosed computer security flaws.
-
When someone refers to a CVE they mean a security flaw that's been assigned a CVE ID number.
-
Each CVE has a CVSS score that ranks its severity and indicates how urgently it should be fixed.
-
Each CVE has a generic remediation solution, usually in the form of a patch upgrade.
-
When a scanned asset is matched with a CVE ID, that match is a vuln.
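The definitions above can be sketched as a small data model. This is only an illustration, not VulnTool's actual schema: the class names, field names, CVE IDs, scores, and the `match_scan` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cve:
    cve_id: str        # made-up IDs are used below, not real CVEs
    cvss_score: float  # 0.0-10.0, higher means more severe
    solution: str      # generic remediation, usually a patch upgrade


@dataclass(frozen=True)
class Vuln:
    hostname: str  # the scanned asset
    cve: Cve       # the CVE it was matched with


def match_scan(hostname, detected_cve_ids, catalog):
    """Pair one scanned asset with every known CVE detected on it."""
    return [Vuln(hostname, catalog[c]) for c in detected_cve_ids if c in catalog]


# Hypothetical catalog of known CVEs, keyed by CVE ID.
catalog = {
    "CVE-2021-0001": Cve("CVE-2021-0001", 9.8, "Upgrade package A"),
    "CVE-2021-0002": Cve("CVE-2021-0002", 5.3, "Upgrade package B"),
}

vulns = match_scan("host-01", ["CVE-2021-0002", "CVE-2021-0001"], catalog)
# Rank the vulns so the most severe CVE comes first.
vulns.sort(key=lambda v: v.cve.cvss_score, reverse=True)
```

One asset matched with two CVEs yields two vulns, which is why vuln counts grow much faster than CVE counts.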
The Problem with Legacy Vuln Workflow
-
Inefficient Auto Remediation Task (Left)
-
Didn’t match CVEs to hostnames
-
Had to Google CVE info and solutions
-
Task Tool statuses are too ambiguous
-
Tracking & Visibility
-
No way to quickly confirm remediation
-
No display of all scanned assets & CVE IDs
-
No lifecycle statuses, e.g. “Fix Applied”
-
Lack of useful vuln-health & remediation charts
-
Navigation
-
No sorting & searching capabilities
-
No groupings of vulns that make sense, e.g. vulns that share the same solution
Auto Remediation Task before VulnTool
Pre-VulnTool Workflow
VulnTool UI When I Started Working at Meta
When I joined the team in March 2020, I was the first product designer in my org, Enterprise Engineering. Until then, a lead engineer had done all the UX/UI work himself, so there was plenty of room for improvement as I integrated my UX practice into our team’s product development process.
Early UX Work and Designs
XFN Ideation Session (Quip Board)
Usability Testing Script
Engineer User Journey
VulnTool - Mid 2020 UI
Vulns Table
1. Status Quick Filters
Users had trouble finding the vulns they needed. We added quick filters based on lifecycle states so they could drill down more quickly.
2. Entity Groups
Since engineers don’t attack vulns on individual hostnames, we added a column, with a selector in its header, to identify the affected entity groups.
3. Collapsed Navbar
Users complained there wasn’t enough screen space to view the vulns. We collapsed the navbar to free up screen real estate for the vulns table.
4. Saved Searches
Because it took users a while to find what they were looking for, we added saved searches that can be applied throughout the tool.
5. Split Views
To limit the number of search results and make vulns more discoverable, we created separate views for in-progress and remediated vulns. We also included a notice, with a view button, when matching vulns exist in the other category.
6. Customized Columns
Since various users requested more metadata in their own order of preference, we added customizable columns.
7. Last Scan Date
Users had trouble identifying false positives. We added the last scan date here so users could easily compare it to the “last seen at” date when making that decision.
Homepage Dashboard
1. Team Toggle
When entering the tool, users often missed the search bar token for team ownership and unknowingly looked at the wrong vulns. We added this selector to make the active team clearer.
2. Active CVEs
We discovered that the vulns table made it difficult to find all the CVEs and determine the impact each one had on our infrastructure. We added this widget to communicate the vuln count and surface all CVEs in order of severity.
3. Recent Remediations
Users found it difficult to track remediation progress via the auto remediation tasks outside of VulnTool. We added this widget to surface that information better.
4. Open Tasks
To limit the number of search results and make vulns more discoverable, we created separate views for in-progress and remediated vulns. We also included a notice, with a view button, when matching vulns exist in the other category.
5. Success & Productivity Tracking
We found the original progress charts and graphs were neither readable nor useful. I redesigned them on their own page, emphasizing the severity breakdown of the vulnerabilities.
But Users Were Still Unhappy....
Because I entered this project without much technical knowledge of the space, I treated many assumptions as facts. I later discovered the myriad pain points our users faced.
I crafted a large UX research plan to dig deeper. I had to be clever and organized in my documentation and presentation to convince the engineering project lead that major changes were needed.
"How can we disable auto-task creation for Vulns? The auto task creation is simply adding addition noise."
"This is not the first time I've wasted hours of my life because you are not using information that you could easily collect."
"How can I group by CVE's and the host it effects? I would rather fix a CVE that goes to every host (that I care about) than fixing CVE's by host."
Time to Redefine the Problem!
Research Planning
I defined a research plan and set out on a mission to reveal why VulnTool was creating more headaches than productivity.
Research Goals
-
Dig deeper into why users aren’t satisfied with VulnTool
-
Learn the needs of various user personas, teams, and dev platforms
-
Uncover the ideal remediation workflow to prevent errors and speed up the process
-
Define an optimized information architecture
Who
18 participants, mix of teams and personas
What
1:1 60 min interviews
When
Oct - Dec 2021
How
Remote Sessions through VC
Research Results
Research Data Points Whiteboard
Portion of Single Interview Synthesis
Research Themes
Reliability
Nexpose Vuln Data
Very generic. Doesn’t consider FB or OS specific steps
“The nexpose solution is pointless at best and probably harmful because the manually installed package would not get any further updates.”
False & Unknown Vulns
The tool can’t identify backports & has too many false/unknown CVSS vulns.
“95% of frustration dealing with backports & manually marking false positive.”
Task/Vuln Tool Relationship
It’s difficult to monitor remediation success via tasks due to poor data & complex patch schedules.
“I waste a lot of time when tasks don’t automatically close due to overlaps & gaps in scan cycles of general scanner vs when OS auto-patches.”
Discoverability
Solution Grouping Hides CVE Info
The task tool link view hides vulns & doesn’t communicate the highest priority ones.
“I don’t like the solution view. I want to view the highest CVSS & which hosts the vuln affects. I don’t see the information I need here.”
Active Vulns too Broad
The active vulns view has too many statuses that don’t reflect the true lifecycle of a vuln.
“I want to auto focus on vulns that need attention & in progress, rather than manually filtering out all the false positives.”
Scattered Metadata
The vuln data needed to make decisions and act on them is indirect and dispersed across the views and tasks.
“The task itself doesn’t provide any helpful info. It just says you have this vulnerability.”
Simplicity
Absent Vuln Timeline
There is no aggregated single vuln history that includes vuln published date, status changes, & scanner updates.
“The last scan date is misleading, because it makes it seem like physical host was scanned but it is actually a nexpose run date.”
Labeling of Vulns
Application versus OS issues (and the OS type) aren’t communicated in the tool.
“Engineers attack vulns by OS type & applications. It makes sense to split up by them like that.”
Notifications & Noise
Unimportant vuln tasks and misleading critical CVE references create distress and waste time.
“I am alarmed by the number of active vulns in home dashboard. In reality only a few are validated & in progress.”
Flexibility
Ownership of Entity Groups
Tasks and CVEs are matched to single hostnames that often are not assigned to the right person.
“Tasks are made on oncall rotations; it doesn’t consider vulns that need to be attacked on upstream dependencies.”
Overrides
There is no way to insert or override the correct versions and workarounds in the tool.
“I want to override solutions in bulk to easily communicate proper remediation steps to my team.”
Custom Columns & Saved Searches
The tool needs to be flexible to accommodate different types of users & edge cases.
“It’s too many clicks & filters to find the info I need to see. I want columns for the metadata in the extra vuln and host info.”
Ideation
After the large UX research study, I presented the research deck to a group of users, XFN partners, and core team members, and we then ideated on solutions using design thinking techniques.
I also worked closely with the product manager on an “Opportunity -> Solution” tree.
Shift of Focus to Triaging
Triaging = The process of first assessing the validity of a vulnerability, and then assigning it to the right person / team to fix.
The Production team’s legacy “Triage Decision” table before we integrated it into VulnTool.
We studied their workflow and created a plan to implement a triage process in our tool that caters to both Production and Corp environments.
Engineers need security analysts to complete a triage process before they receive a remediation task, so they can spend their time implementing the fix rather than investigating. This increases both the speed and the volume of remediation.
Triaging solves...
-
Make decisions at scale instead of individual vulns
-
Only Meta-verified vuln data in remediation tasks
-
Correct remediation owners & affected entities
-
No reliance on faulty auto remediation task logic
-
Decreases excess notifications and noise for false positives
Triaging Flow Diagram
Triager User Persona
Role
-
Discovers the most critical vulnerabilities
-
Groups vulns that can be remediated together
-
Manually creates remediation tasks for engineers
Frustrations
-
Can’t override vuln data in VulnTool, so auto remediation tasks could have the wrong solution
-
Decentralized communication about remediation across different tools
-
Hard to apply vuln statuses in bulk to the right groups
-
Hard to track remediation progress and success for a group of vulns
Goals
-
Facilitate efficient and quick remediation of vulns
-
Catch false positives and exceptions as early as possible
-
Provide engineers with clear and descriptive remediation steps
Needs
-
Triage multiple CVEs in bulk
-
Rescore a CVSS score based on the environmental vector as well
-
Triage only a subset based on entity group type, e.g. OS vulns
-
Create remediation tasks for engineers directly from a triage
Triage Designs
Triage Form Annotations
1. Triage Metadata
Manually entered CVE data that overrides our provider’s generic data. Engineers now have concrete, specific remediation steps for the Facebook context.
2. CVE List
Users are more interested in attacking vulns at the CVE level than on individual hostnames, so we listed the CVEs here, providing greater visibility of affected vulns.
3. Triage Statuses
Analysts can now enter a CVE status to mark all affected vulns at scale instead of one hostname at a time. False positives are marked sooner so engineers no longer receive unneeded remediation tasks.
4. Triage By Entity Groups
We added the ability to triage by specific affected entity groups for a single CVE, since different affected hosts can require different remediation solutions, e.g. Windows vs. Linux.
5. CVEs in Triage
Users can group CVEs that belong in the same triage, which makes it easy to create correct remediation data at scale. There is also a “CVE Info” dropdown so users can view CVE data while filling out the form.
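The triage-at-scale idea in these annotations (one decision applied to every affected vuln, optionally narrowed to one entity group) might look roughly like the sketch below. It is a hypothetical illustration only: the `Vuln` fields, the status strings, and the `triage` function are assumptions, not VulnTool's real data model or API.

```python
from dataclasses import dataclass


@dataclass
class Vuln:
    hostname: str
    cve_id: str
    entity_group: str          # hypothetical grouping, e.g. by OS
    status: str = "untriaged"  # hypothetical lifecycle status


def triage(vulns, cve_ids, status, entity_group=None):
    """Apply one triage status to every vuln matching the given CVEs.

    If entity_group is given, only vulns in that group are updated
    (a "sub triage"). Returns the number of vulns updated.
    """
    updated = 0
    for vuln in vulns:
        in_scope = entity_group is None or vuln.entity_group == entity_group
        if vuln.cve_id in cve_ids and in_scope:
            vuln.status = status
            updated += 1
    return updated


vulns = [
    Vuln("host-01", "CVE-2021-0001", "linux"),
    Vuln("host-02", "CVE-2021-0001", "windows"),
    Vuln("host-03", "CVE-2021-0002", "linux"),
]

# Mark the CVE as a false positive on Linux hosts only, in one action.
count = triage(vulns, {"CVE-2021-0001"}, "false_positive", entity_group="linux")
```

Passing several CVE IDs in `cve_ids` corresponds to grouping CVEs in the same triage; leaving `entity_group` unset corresponds to triaging the whole CVE at once.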
CVE View with Triage Cards Annotations
1. CVE Cards
Since users were most concerned about vulns at the CVE level, we designed this view so that selecting a CVE in the list dynamically shows its data on the right. Tabs were added to display non-triage CVE data, tasks, and hostnames.
If there is a triage, this section is collapsed by default to highlight the verified triage data below.
2. Affected Entity Groups Sidebar
Because vulns live on entity groups and not individual hostnames, we added this button to display the affected entity groups for each CVE. We kept it as a sidebar so users can also refer to it while filling out the triage form.
3. Sub Triage Icon
If a CVE has multiple sub triages (as opposed to a single triage for the whole CVE), we included this split icon for easy recognition. Triage cards that are not a sub triage don’t have this icon.
4. Triage Cards
Each triage receives its own card under the CVE data to distinguish it as verified data. We added triage-specific fields such as the CVSS score, triager, and number of entities. We also added collapsed states so the full list of sub triages can be seen at a glance.
5. Triage Activity
Because there is often a lot of discussion about vulns, we added a section where users can leave comments within the triage activity, which is also where remediation tasks are tracked.
6. Other Entities Card
If users wish to triage more entities than those already selected in a sub triage, they can click here.
7. Create a Task Button
We added the ability for triagers to create remediation tasks for engineers directly from the triage card. They no longer have to create tasks manually, and all data in the task will be valid.
Triaging Was a Win!
Remediated 105% more vuln issues
117,905 remediated vuln issues in H2 2020; to 242,476 remediated vuln issues in H1 2021.
Reduced the average remediation time by 57.7%
From 71 days to 30 days.
Median time to triage vulns decreased by 75%
From 51 days in June 2021, to 13 days in Jan 2021
EE Vuln oncall detected 171% more false positives
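As a quick check, the percentages above can be reproduced from the before and after figures with a standard percent-change formula. The helper name `percent_change` is just for illustration; the inputs are the numbers quoted in this section.

```python
def percent_change(before, after):
    """Relative change from before to after, as a percentage."""
    return (after - before) / before * 100


# Remediated vuln issues: H2 2020 vs. H1 2021.
remediated = percent_change(117_905, 242_476)

# Average remediation time, in days.
remediation_time = percent_change(71, 30)

# Time to triage, in days.
triage_time = percent_change(51, 13)

print(f"Remediated issues: {remediated:+.1f}%")
print(f"Remediation time:  {remediation_time:+.1f}%")
print(f"Time to triage:    {triage_time:+.1f}%")
```

The results come out to roughly +105.7%, -57.7%, and -74.5%, in line with the headline figures of 105%, 57.7%, and 75%.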
VulnTool "Group By" Redesign
The Problem
Previously, the tool lacked dynamic groupings that would make it easier to discover and triage vulns that belong in the same remediation task. The active vulns table only displayed individual scan results, and a triage could include hundreds of them. This resulted in endless scrolling, which was a huge pain point.
Without dynamic groupings that reflect true remediation efforts (such as grouping vulns by operating system, OS platform, and asset type), it was nearly impossible to track remediation progress.
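To illustrate what a dynamic grouping means here, the sketch below collapses individual scan results into groups by a chosen attribute. It is a minimal example under stated assumptions: the record fields (`os`, `asset_type`), the sample data, and the `group_by` helper are hypothetical and do not reflect VulnTool's implementation.

```python
from collections import defaultdict


def group_by(scan_results, key):
    """Collapse individual scan results into groups sharing one attribute."""
    groups = defaultdict(list)
    for result in scan_results:
        groups[result[key]].append(result)
    return dict(groups)


# Hypothetical scan results: one row per (host, CVE) match.
scan_results = [
    {"hostname": "host-01", "cve_id": "CVE-2021-0001", "os": "linux", "asset_type": "server"},
    {"hostname": "host-02", "cve_id": "CVE-2021-0001", "os": "linux", "asset_type": "laptop"},
    {"hostname": "host-03", "cve_id": "CVE-2021-0002", "os": "windows", "asset_type": "laptop"},
]

by_os = group_by(scan_results, "os")
counts = {os_name: len(rows) for os_name, rows in by_os.items()}
```

Switching the `key` argument (for example to `"asset_type"` or `"cve_id"`) regroups the same rows, so a table can show a handful of groups instead of hundreds of line items.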
Active Vulns Table With Too Many Line Items
User Stories, Design Criteria, & Inspiration Audit
Ideation Session
Based on the problem statements the PM and I identified through previous research, and on the user personas I had defined, I hosted an ideation session with the core team.
We generated many ideas, which we then categorized by feasibility into this MVP, next-year improvements, and long-term vision.