Text Processing with sed: A Comprehensive Guide
sed (stream editor) is a powerful Unix tool designed to perform basic text transformations on an input stream (a file or input from a pipeline). Unlike an interactive editor, sed processes text with commands supplied in scripts, allowing for non-interactive, repeatable workflows—a key advantage when automating text processing tasks.
1. Historical Background
- Origins: Developed in the mid-1970s by Lee E. McMahon at Bell Labs as part of the UNIX toolkit.
- Evolution: Standardized in POSIX, implemented in multiple environments (GNU sed, BSD sed).
- Philosophy: “Filter” design: reads input line by line, applies script, writes output.
2. Installation Version
Most Linux distributions include sed by default. To check:
sed --version
On Debian/Ubuntu:
sudo apt-get install sed
3. Core Concepts
- Input Stream: Lines fed to sed via file or pipeline.
- Patterns: Regular expressions that match text.
- Commands: Actions sed performs (substitute, delete, insert, etc.).
- Addresses: Specify lines or ranges to which commands apply.
4. Basic Syntax
General form:
sed [options] script inputfile
Example: Replace “foo” with “bar” globally:
sed s/foo/bar/g file.txt
5. Addresses Ranges
- Line Numbers: sed 2s/old/new/ (only line 2)
- Patterns: sed /start/,/end/d (delete lines from a match of “start” to “end”)
- Special Symbols:
- : last line
- ^ and : beginning/end of line in regex
6. Common Commands
| Command | Description | Example |
|---|---|---|
| s/pattern/replacement/flags | Substitute | s/hello/hi/g |
| d | Delete line | /foo/d |
| i text | Insert before line | 3i Header |
| a text | Append after line | 5a Footer |
| y/source/dest/ | Transliterate | y/abc/ABC/ |
7. Regular Expressions in sed
- Basic vs Extended: GNU sed -E enables extended regex (ERE).
- Special Characters: ., , [, ], ^, , .
- Examples:
- Match digits:
[0-9] - Group backreference:
([A-Za-z] )@1
- Match digits:
8. Scripts Multi-Command Files
Create a file script.sed:
/Error/d s/DEBUG/INFO/g 3i # Inserted line
Execute:
sed -f script.sed logfile.txt
9. Advanced Techniques
- Hold Space Pattern Space: Exchange data between two buffers using
h, H, g, G, x. - Branching: Use
bandtto jump within a script on success/failure. - Embedding Shell Commands:
sed s/VAR/date/e(GNU extension).
10. Performance Best Practices
- Avoid unnecessary regex overhead prefer precise addresses when possible.
- Chain multiple expressions with
-eor use scripts for readability. - Test complex scripts on representative data to catch edge cases.
- Use
LC_ALL=Cto speed up simple byte-wise operations.
11. Integration in Workflows
sed is often combined with other tools:
- grep for filtering relevant lines first.
- awk for field-oriented transformations.
- sort and uniq for aggregation.
- xargs to pass filenames or results to further commands.
12. Common Pitfalls
- Forgetting to escape regex metacharacters.
- Mixing GNU and POSIX behavior—test on target system.
- Overwriting files without backup (
sed -i.bakto create backups). - Locale-dependent patterns—use
LANG=Cfor consistency.
13. Example: Log File Sanitization
#!/usr/bin/env bash
# sanitize-logs.sh
LOG=1
sed -E -e /password/d
-e s/[0-9]{1,3}(.[0-9]{1,3}){3}/[REDACTED]/g
-e s/(user=)[^ ] /1[USER]/g LOG > sanitized.log
14. Security Note
When handling sensitive text—such as credentials or network configurations—ensure your scripts do not inadvertently expose data. If deploying over remote connections, consider encrypting your channel with OpenVPN or WireGuard to protect data in transit.
15. Further Reading
- GNU sed Manual
- POSIX sed Standard (IEEE Std 1003.1)
- Advanced Scripting Techniques in Unix/Linux Books
© 2024 Text Processing Mastery. All rights reserved.
Leave a Reply