Techniques for Manipulating Text (and why that’s useful)
presented by Evan Schiff
What do these files have in common?
ALE
Avid Bin Export
FCP XML
SubCap
STL Subtitles
They are all plain text.
You Can Do A Lot With Plain Text
• Reformat one type of file into another
• Add, remove, or fix data generated by other applications
• Quickly hunt down a specific piece or pattern of information within a large document
• Make changes en masse to avoid time-consuming manual adjustments
• Parse it using a Terminal command or script
Some Real World Examples
• Convert an EDL with Locators into a SubCap file for importing back into Avid
• Convert Avid Locators into a DVDSP or Compressor Chapter Markers file
• Create an EDL out of data in your Filemaker codebook
• Batch rename files with advanced substitution patterns
• Process a list of missing ProTools media to automatically hunt it down
What Tools Do You Need?
A good text editor. (Not TextEdit or Notepad.)
Textmate ($64)
Atom (Free)
Sublime ($70)
Starting Out
A) What type of data do you have?
B) What type of data do you need?
How do you turn A into B?
Starting Out
• Get familiar with everything a text editor can do.
• Learn to navigate using the keyboard.
• Experiment.
After Learning the Basics
• Learn Regular Expressions (RegEx)
• Learn a programming language such as Python, JavaScript, or Bash
• Combine scripting with RegEx to accomplish complex tasks
• Constantly reassess your workflow to find faster and easier methods
• If you come up with something cool, share it!
Text Manipulation Without Regular Expressions
Multiple Cursors Demo
• Most of the time when we want to manipulate text, we want to change a lot of it all at once.
• One way to do that is with Multiple Cursors.
• Let’s look at what that is, and how it can be useful.
Multiple Cursors Demo
• Cmd-F: Find
• Opt-Enter: Find All, Add a Cursor at every Occurrence
• Cmd-Click Text: Add a Cursor manually
• Cmd-Shift-L: For every line of selected text, add a Cursor
OS X Keyboard Navigation
• Cmd ←/→: Go to start/end of line
• Cmd ↑/↓: Go to top/bottom of document
• Opt ←/→: Go to previous/next word
• Add Shift to select or unselect text
Windows Keyboard Navigation
• Home/End: Go to start/end of line
• Ctrl Home/End: Go to top/bottom of document
• Ctrl ←/→: Go to previous/next word
• Add Shift to select or unselect text
Text Manipulation Using Regular Expressions
What are Regular Expressions?
• Regular Expressions (RegEx) are a way to define patterns of text.
• They enable you to find and select text that matches those patterns.
• With that matching text selected, you can then change it to suit your needs.
What Does It Look Like?
(\d{3}[^\n]*([0-9:]{11})\s([0-9:]{11})\s?\n[\s\S]*?(?=^\d{3}|^>|^$))
Wait, what the #\.$*^# is that?!
Hold it, Hold it, What the hell is that shit?!
• RegEx is made up of codes.
• Each code represents a set of characters such as letters, numbers, and $#!*
• When written in a specific sequence, they define a pattern of text.
• Let’s take a closer look.
What does RegEx look like?
Simple:
Timecode: \d{2}:\d{2}:\d{2}:\d{2}
Reference: 01:02:03:04
E-Mail Address: [\w.-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}
Reference: [email protected]
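As a quick check of the timecode pattern above, here is a minimal Python sketch. The function name `find_timecodes` and the sample line are only illustrative, not part of the original slides.

```python
import re

# The simple timecode pattern from the slide: four two-digit
# fields separated by colons (HH:MM:SS:FF).
TIMECODE = re.compile(r"\d{2}:\d{2}:\d{2}:\d{2}")

def find_timecodes(text):
    """Return every timecode found in the text, in order."""
    return TIMECODE.findall(text)

# Illustrative EDL-style event line (hypothetical sample data).
sample = "003  08P013V  V  C  13:31:20:14 13:31:28:16 04:00:42:22 04:00:51:00"
print(find_timecodes(sample))
# ['13:31:20:14', '13:31:28:16', '04:00:42:22', '04:00:51:00']
```

The same pattern works unchanged in a text editor's RegEx search or on Rubular.com.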
Complex:
EDL Event: (\d{3}[^\n]*([0-9:]{11})\s([0-9:]{11})\s?\n[\s\S]*?(?=^\d{3}|^>|^$))
Reference: 003 08P013V V C 13:31:20:14 13:31:28:16 04:00:42:22 04:00:51:00
Locator in an EDL: \* LOC.*[\d:]{11}\s+([\w]+) +\b([^\r\n]*?)\r?\n
Reference: * LOC: 04:00:47:20 BLUE CS0020
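To see what the Locator pattern captures, here is a rough Python sketch. It reuses the slide's pattern with one change that is my own assumption: an extra pair of parentheses around the timecode so it is captured too. The name `parse_locators` and the sample text are hypothetical.

```python
import re

# Locator pattern adapted from the slide. The parentheses around
# [\d:]{11} are an addition so the timecode is captured as well.
LOCATOR = re.compile(r"\* LOC.*([\d:]{11})\s+(\w+) +\b([^\r\n]*?)\r?\n")

def parse_locators(edl_text):
    """Return (timecode, color, comment) for each locator line."""
    return [m.groups() for m in LOCATOR.finditer(edl_text)]

# Illustrative sample; real EDLs contain event lines as well.
sample = "* LOC: 04:00:47:20 BLUE    CS0020\n"
print(parse_locators(sample))
# [('04:00:47:20', 'BLUE', 'CS0020')]
```

Note that the pattern ends with a line break, so a locator on the very last line of a file only matches if that line ends with one.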
What are some other patterns you can think of? Let’s test them with Rubular.com
RegEx Structure
Regular Expressions consist of character codes and quantifiers. In other words: what is the character you are looking for, and how many times do you expect to see it?
Search Criteria          Code      Example
3 Digits:                \d{3}     315
A word of any length:    \w+       Avid
Zero or one ‘u’:         colou?r   color or colour
RegEx Symbols
Symbol      What It Represents
.           Any character except line break, such as A-Z, 0-9, special characters
\w          Any character that could be part of a word, such as A-Z, 0-9, _
\d          Any digit, such as 0-9
\s          Whitespace (spaces, tabs, line breaks)
\t          Tab
\n and \r   Line break: \n is Mac/Linux, \r\n is Windows
[ ]         Match the letters or symbols inside the brackets. For example, [DEF] matches only the letters D, E, or F
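A few of these symbols in action, as a small Python sketch. The helper names and sample strings are made up for illustration; the line-break helper shows one common use, converting Windows line endings to Mac/Linux ones.

```python
import re

def normalize_line_breaks(text):
    # \r optionally followed by \n becomes a single \n.
    return re.sub(r"\r\n?", "\n", text)

def only_def(text):
    # [DEF] matches one character that is D, E, or F.
    return re.findall(r"[DEF]", text)

print(repr(normalize_line_breaks("TITLE: REEL 1\r\n001  AX  V  C\r\n")))
# 'TITLE: REEL 1\n001  AX  V  C\n'
print(only_def("ABCDEFG"))
# ['D', 'E', 'F']
print(re.findall(r"\w+", "Avid_Bin 24fps"))
# ['Avid_Bin', '24fps']
print(re.split(r"\s+", "001\tAX  V\nC"))
# ['001', 'AX', 'V', 'C']
```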
RegEx Quantifiers
Symbol   What It Represents
?        Zero or one occurrence
+        One or more occurrences
*        Zero or more occurrences
{2}      Exactly 2 occurrences
{2,5}    Between 2 and 5 occurrences
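The quantifiers can be tried out with a short Python sketch like the one below. The helper `matches` is a hypothetical name; it uses `re.fullmatch`, so the whole string has to fit the pattern.

```python
import re

def matches(pattern, text):
    """True if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"colou?r", "color"))    # True: zero 'u'
print(matches(r"colou?r", "colour"))   # True: one 'u'
print(matches(r"colou?r", "colouur"))  # False: two 'u'
print(matches(r"\d+", ""))             # False: + needs at least one digit
print(matches(r"\d*", ""))             # True: * allows zero digits
print(matches(r"\d{2}", "24"))         # True: exactly 2 digits
print(matches(r"\d{2,5}", "123456"))   # False: 6 digits is more than 5
```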
Some Real World Demos
• Convert an EDL with Locators into a SubCap file for importing back into Avid
• Convert Avid Locators into a FCP or Compressor Chapter Markers file
• Batch rename files with advanced substitution patterns
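The batch rename idea could look something like this in Python. This is only a sketch under assumptions: the file names, the pattern, and the helper `plan_renames` are all hypothetical, and it only builds a list of old and new names without touching any files.

```python
import re

def plan_renames(filenames, pattern, replacement):
    """Map each old name to its new name; skip names that don't change."""
    plan = {}
    for name in filenames:
        new_name = re.sub(pattern, replacement, name)
        if new_name != name:
            plan[name] = new_name
    return plan

# Hypothetical file names. \1 and \2 in the replacement refer back
# to the first and second groups captured by the pattern.
files = ["Scene_12_Take_3.mov", "Scene_12_Take_4.mov", "notes.txt"]
plan = plan_renames(files, r"^Scene_(\d+)_Take_(\d+)", r"S\1_T\2")
for old, new in plan.items():
    print(old, "->", new)
# Scene_12_Take_3.mov -> S12_T3.mov
# Scene_12_Take_4.mov -> S12_T4.mov
```

Reviewing the plan first and only then calling `os.rename(old, new)` for each pair is a cautious way to avoid renaming the wrong files.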
When are RegEx Not Useful?
• When you don’t know or can’t clearly define a pattern
• If it’s faster to make the changes by hand than figure out what the pattern is
• When there’s no variation in what you’re searching for. Sometimes a normal Find & Replace is all you need.
• When there’s too much variation, don’t try the all-in-one approach. Maybe you can break it down into multiple smaller patterns.
So What’s the Next Step?
Shell Scripting
What is Shell Scripting?
Shell scripting uses a programming language to execute a series of commands in order to accomplish a more complex task.
ProTools Example
In 40 lines of code, and using regular expressions, this script takes a text file of media that ProTools can’t find, locates it, and copies it to a directory on your desktop.
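The original script is not reproduced here, so the following is only a rough Python sketch of the same idea, not the presenter's code. It assumes the text file mentions each missing file by a name ending in .wav, .aif, or .aiff; the function names, the pattern, and the folder arguments are all hypothetical.

```python
import os
import re
import shutil

# Assumption: a media name is a run of characters without slashes,
# colons, tabs, or line breaks, ending in .wav, .aif, or .aiff.
MEDIA_NAME = re.compile(r"[^\\/:\t\r\n]+\.(?:wav|aiff?)", re.IGNORECASE)

def missing_names(report_text):
    """Pull the set of media file names out of the report text."""
    return {m.group(0).strip() for m in MEDIA_NAME.finditer(report_text)}

def collect_media(report_text, search_root, dest_dir):
    """Search search_root for the missing files and copy them to dest_dir."""
    wanted = missing_names(report_text)
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for folder, _subfolders, files in os.walk(search_root):
        for name in files:
            if name in wanted:
                shutil.copy2(os.path.join(folder, name),
                             os.path.join(dest_dir, name))
                copied.append(name)
    return sorted(copied)
```

A call might look like `collect_media(open("missing.txt").read(), "/Volumes/Media", os.path.expanduser("~/Desktop/Found Media"))`, where all three values are placeholders. The destination folder should sit outside the folder being searched.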
How Do I Learn?
• Google it!
• Check out sites like Codecademy, Code Avengers, Khan Academy, etc.
• Pick a language to learn. Of the many languages out there, I would probably start with Python.
Thanks!