Operating systems such as Linux and Windows store text files with different line-ending conventions, so a file written on one platform may not display or parse correctly on the other. To use a text file that was created or edited in a Windows or MS-DOS environment on Linux, we can convert it to the Unix format with the dos2unix tool.
Install dos2unix
Because dos2unix is a small, simple tool, it is packaged for most Linux distributions and can be installed easily as shown below.
Ubuntu, Debian, Mint, Kali
We will use the apt command with the dos2unix package name for installation.
$ sudo apt install dos2unix

Fedora, CentOS, RedHat
We will use the dnf command as shown below. On older CentOS and Red Hat releases that do not ship dnf, yum accepts the same arguments.
$ sudo dnf install dos2unix
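
After installation, a quick check can confirm that the tool is on the `PATH`. This is a minimal sketch; the `installed` variable is only an illustrative name, not part of dos2unix.

```shell
# command -v prints the path of the program and succeeds only if it is found.
if command -v dos2unix >/dev/null 2>&1; then
  installed=yes
  # Show which version the package manager installed.
  dos2unix --version
else
  installed=no
  echo "dos2unix not found; install it with your distribution's package manager" >&2
fi
echo "dos2unix installed: $installed"
```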

Hidden Characters Problem
As stated previously, Windows text files contain some characters that are not used in Linux text files. The points below summarize the differences and how dos2unix deals with them.
- In Windows, the end of a line is marked with a Carriage Return `CR` followed by a Line Feed `LF`, but in Linux only the Line Feed `LF` is used.
- Some Windows programs save text files as UTF-16, whereas UTF-8 is the usual encoding on Linux.
- dos2unix skips binary files automatically.
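
To make the hidden carriage returns visible, the following sketch writes the same two lines with both line-ending styles and dumps the bytes with `od -c`. The file names `crlf.txt` and `lf.txt` and the temporary directory are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)

# Write the same two lines with Windows (CRLF) and Unix (LF) line endings.
printf 'one\r\ntwo\r\n' > "$demo_dir/crlf.txt"
printf 'one\ntwo\n' > "$demo_dir/lf.txt"

# od -c prints every byte; the carriage return appears as \r before \n.
od -c "$demo_dir/crlf.txt"
od -c "$demo_dir/lf.txt"

# Count the carriage-return bytes in each file.
crlf_count=$(tr -cd '\r' < "$demo_dir/crlf.txt" | wc -c)
lf_count=$(tr -cd '\r' < "$demo_dir/lf.txt" | wc -c)
echo "CR bytes: crlf.txt=$crlf_count lf.txt=$lf_count"
```

The Windows-style file carries one extra `\r` byte per line, which is exactly what dos2unix removes.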
Print Given File Encoding and Type
We will start by printing the encoding and line terminator type of the given file. The Linux file command provides this information.
$ file a.txt

We can see that the file a.txt is encoded as ISO-8859 and has CRLF, or Windows, line terminators.
Convert From DOS To Unix Format
We will start with a simple example and convert a file named a.txt, which was created in a Windows or MS-DOS environment, for use in the Linux environment. The file is converted in place, so the result keeps the name a.txt.
$ dos2unix a.txt

After checking again with the file command, we can see that a.txt is now reported as a text file without CRLF line terminators.
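
The same check can be scripted without relying on the output of `file`. The sketch below assumes dos2unix is installed and uses a sample a.txt created in a temporary directory; the variable names are illustrative.

```shell
# Run the demo only when dos2unix is available.
if command -v dos2unix >/dev/null 2>&1; then
  work_dir=$(mktemp -d)
  printf 'first line\r\nsecond line\r\n' > "$work_dir/a.txt"

  # Count CR bytes before the conversion.
  cr_before=$(tr -cd '\r' < "$work_dir/a.txt" | wc -c)

  # Convert in place; the original content is overwritten.
  dos2unix "$work_dir/a.txt"

  # Count CR bytes again after the conversion.
  cr_after=$(tr -cd '\r' < "$work_dir/a.txt" | wc -c)
  echo "CR bytes before: $cr_before, after: $cr_after"
else
  echo "dos2unix is not installed" >&2
fi
```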
Specify Conversion Mode
dos2unix offers several conversion modes that also translate the character set while the line endings are converted. We can use the following options.
- `-iso` will convert from the default DOS code page to Unix Latin-1 (ISO-8859-1)
- `-850` will convert from DOS CP850 to Unix Latin-1
- `-1252` will convert from Windows CP1252 to Unix Latin-1
In this example, we will convert to Latin-1 with the -iso option.
$ dos2unix -iso a.txt

dos2unix prints a short status message for each file it converts.
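
When UTF-8 output is needed, one possible approach is to let `iconv` handle the character set and use dos2unix as a filter for the line endings. This is a sketch under the assumption that both tools are installed; the sample bytes and the names `in.txt` and `out.txt` are made up for illustration.

```shell
# Run the demo only when both tools are available.
if command -v dos2unix >/dev/null 2>&1 && command -v iconv >/dev/null 2>&1; then
  conv_dir=$(mktemp -d)

  # Octal 351 is "é" in Windows CP1252; the line ends with CRLF.
  printf 'caf\351\r\n' > "$conv_dir/in.txt"

  # iconv re-encodes the characters, dos2unix strips the carriage returns.
  iconv -f CP1252 -t UTF-8 "$conv_dir/in.txt" | dos2unix > "$conv_dir/out.txt"

  # Inspect the resulting bytes.
  od -c "$conv_dir/out.txt"
fi
```

Without a file argument, dos2unix reads standard input and writes standard output, which is what makes the pipeline work.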
Keep Date Stamp Of Old File
Timestamps are file attributes that hold information such as creation time, modification time, and access time. Converting a text file from DOS format to Linux format rewrites the file, so its date stamp changes. We can keep the date stamp of the original file with the -k option as shown below.
$ dos2unix -k a.txt
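
One way to confirm that the date stamp survived is to compare the converted file with a reference file that carries the same modification time. The sketch below assumes dos2unix is installed; the `reference` file, the chosen date, and the `kept` variable are illustrative only.

```shell
# Run the demo only when dos2unix is available.
if command -v dos2unix >/dev/null 2>&1; then
  ts_dir=$(mktemp -d)
  printf 'hello\r\n' > "$ts_dir/a.txt"

  # Give the file an old modification time (1 January 2020, 12:00).
  touch -t 202001011200 "$ts_dir/a.txt"
  # Create a reference file that carries the same timestamps.
  touch -r "$ts_dir/a.txt" "$ts_dir/reference"

  # Convert in place while keeping the date stamp.
  dos2unix -k "$ts_dir/a.txt"

  # The converted file should be neither newer nor older than the reference.
  if [ "$ts_dir/a.txt" -nt "$ts_dir/reference" ] || [ "$ts_dir/a.txt" -ot "$ts_dir/reference" ]; then
    kept=no
  else
    kept=yes
  fi
  echo "timestamp kept: $kept"
fi
```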
Specify New File Name
Up to now we have converted files in place, replacing the original content. We can instead write the converted content to a new file and keep the old file unchanged. We will use the -n option and specify the input file name followed by the new file name.
In this example, the newly converted file will be named b.txt.
$ dos2unix -n a.txt b.txt
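
A short sketch can show that the input file is left untouched while the new file loses one byte per line. It assumes dos2unix is installed; the sample content and variable names are made up for illustration.

```shell
# Run the demo only when dos2unix is available.
if command -v dos2unix >/dev/null 2>&1; then
  new_dir=$(mktemp -d)
  printf 'alpha\r\nbeta\r\n' > "$new_dir/a.txt"

  # -n takes the input file and the output file, in that order.
  dos2unix -n "$new_dir/a.txt" "$new_dir/b.txt"

  # Compare the sizes: b.txt is smaller by one CR byte per line.
  src_size=$(wc -c < "$new_dir/a.txt")
  dst_size=$(wc -c < "$new_dir/b.txt")
  echo "a.txt: $src_size bytes, b.txt: $dst_size bytes"
fi
```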

Recursive Convert with find Command
If there are a lot of files to convert to the Unix or Linux text file format, converting them one by one is tedious work, so we need to run the operation in bulk. We can use the find command to locate all files with the txt extension and run the dos2unix command on them.
In this example, we will find all text files matching -name "*.txt" under the current working directory, which is specified with ., and then run the dos2unix command on all of these files with the xargs command.
$ find . -name "*.txt" | xargs dos2unix
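
The plain pipe splits file names on whitespace, so a name such as `my notes.txt` would be passed to dos2unix as two arguments. A more defensive variant, sketched below, passes names separated by NUL bytes. It assumes dos2unix is installed and that `find` and `xargs` support `-print0` and `-0`, as the GNU and BSD versions do; the directory layout is made up for illustration.

```shell
# Run the demo only when dos2unix is available.
if command -v dos2unix >/dev/null 2>&1; then
  tree_dir=$(mktemp -d)
  mkdir -p "$tree_dir/sub dir"
  printf 'x\r\n' > "$tree_dir/top.txt"
  printf 'y\r\n' > "$tree_dir/sub dir/my notes.txt"

  # -type f skips directories; -print0 and -0 pass names NUL-separated,
  # so spaces in file names do not split the arguments.
  find "$tree_dir" -type f -name '*.txt' -print0 | xargs -0 dos2unix

  # Count the CR bytes left across all converted files.
  remaining=$(find "$tree_dir" -type f -name '*.txt' -exec cat {} + | tr -cd '\r' | wc -c)
  echo "CR bytes remaining: $remaining"
fi
```

An alternative that avoids xargs entirely is `find . -type f -name "*.txt" -exec dos2unix {} +`, which lets find pass the file names to dos2unix directly.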