#node #javascript #python

How to integrate Python/Ruby/PHP/shell script with Node.js using child_process.spawn or child_process.exec

There are occasions when running a Python/Ruby/PHP or shell script from Node.js is necessary. This post looks at best practices around leveraging child_process.spawn and child_process.exec to encapsulate this call in Node.js/JavaScript.

The goal here is to have an interoperability layer between Node.js and an outside shell. This is a quick workaround if some other part of your system isn’t developed in JavaScript.

spawn is used over exec because we’re talking about passing data, and potentially large amounts of it. To understand the difference between child_process.spawn and child_process.exec, see “Difference between spawn and exec of Node.js child_process”.

We’ll use exec to run arbitrary commands (eg. Pandoc generation); it’s ideal for small amounts of data (under 200KB) using a Buffer interface. We’ll use spawn for larger amounts of data, using a stream interface.

spawn has a more verbose syntax for some of the use-cases we’ll look at, but it’s more serviceable for integrating with Ruby/Python/PHP since we might get more data than a couple of lines of text.

exec is brilliant for integrating with system binaries (where we don’t care about the output).

Full examples are at github.com/HugoDF/node-run-python.

The following examples contain two parts: the part that actually runs the shell command, usually a function called run, and an IIFE (“immediately invoked function expression”) that calls it, ie. (async () => { await run() })(). This IIFE is a nice pattern enabled by async/await (see Async JS: history, patterns and gotchas), but it’s just there for illustration purposes, since it represents the call to the wrapped spawn call from another part of your application.

Integrating with child_process.exec

Generate a PDF/ePub from markdown files with pandoc and child_process.exec

The following script requires pandoc to be installed (see Installing pandoc).

We use util.promisify to turn exec from a callback-based to a Promise-based interface. See Async JavaScript: history, patterns and gotchas for more details on that.

const {promisify} = require('util');
const {exec} = require('child_process');
const execCmd = promisify(exec);

async function runPandoc(markdownPaths, outPath) {
  const options = `-s -V geometry:margin=.5in -V documentclass=report -V author="Author Name" --toc`;
  const cmd = `pandoc ${options} -o ${outPath} ${markdownPaths.join(' ')}`;
  await execCmd(cmd);
}

I use this function to generate the Sequelize ES6 Cheatsheet PDF/ePub and the upcoming Advanced Jest Handbook PDF/ePub.

runPandoc can then be used as follows. The beauty of Pandoc is that you can use it to generate multiple types of documents from the same code:

(async () => {
  await runPandoc(['part1.md', 'part2.md'], './hello.pdf'); // will collate markdown files and output a PDF.
  await runPandoc(['part1.md', 'part2.md'], './hello.epub'); // will collate markdown files and output an ePub file.
})();

Integrating with child_process.spawn

Call a shell command and log it with child_process.spawn

Using spawn is overkill in this situation since echo is only going to return what’s passed to it.

The example is pretty self-explanatory and shows how to use child_process.spawn to “shell out” and read that data back.

spawn takes the executable to call as the first parameter and optionally an array of options/parameters for the executable as the second parameter.

const { spawn } = require('child_process');

function run() {
  const process = spawn('echo', ['foo']);
  process.stdout.on(
    'data',
    (data) => console.log(data.toString())
  );
}

(() => {
  try {
    run()
    // process.exit(0)
  } catch (e) {
    console.error(e.stack);
    process.exit(1);
  }
})();

Output:

$ node run.js

foo

Call Python for its version with child_process.spawn

We’ll move quite quickly to showcase how we would do something similar to the above with Python. Note again how --version is passed inside an array.

We also create a nice logger to differentiate between stdout and stderr and bind to them. Since spawn returns an instance which has stdout and stderr event emitters, we can bind our logOutput function to the 'data' event using .on('data', () => { /* our callback function */ }).

Another interesting tidbit is that `python --version` outputs the version to stderr. The inconsistencies around whether *NIX executables use exit codes, stderr and stdout on success/error are a quirk that we’ll have to bear in mind while integrating Python/Ruby/other with Node.js.

const { spawn } = require('child_process')

const logOutput = (name) => (data) => console.log(`[${name}] ${data.toString()}`)

function run() {
  const process = spawn('python', ['--version']);

  process.stdout.on(
    'data',
    logOutput('stdout')
  );

  process.stderr.on(
    'data',
    logOutput('stderr')
  );
}

(() => {
  try {
    run()
    // process.exit(0)
  } catch (e) {
    console.error(e.stack);
    process.exit(1);
  }
})();
Output:

$ node run.js

[stderr] Python 2.7.13

Call a Python script from Node with child_process.spawn

We’ll now run a fully-fledged Python script (although it could just as well be Ruby, PHP, shell etc.) from Node.js.

This is script.py; it just logs out argv (the “argument vector”, ie. ['path/to/executable', /* command-line arguments */]).

import sys

print(sys.argv)

Like in the previous example, we’ll call spawn with python, passing the path to the Python script (./script.py) in the second parameter.

Here comes another gotcha of integrating scripts in this fashion. In this example, the path to the script is based on the working directory from which node is called.

There are workarounds, of course, using the path module and __dirname. For example, an other-script.py co-located with the JavaScript file/Node module calling spawn could be resolved using: require('path').resolve(__dirname, './other-script.py').

const { spawn } = require('child_process')

const logOutput = (name) => (data) => console.log(`[${name}] ${data.toString()}`)

function run() {
  const process = spawn('python', ['./script.py']);

  process.stdout.on(
    'data',
    logOutput('stdout')
  );

  process.stderr.on(
    'data',
    logOutput('stderr')
  );
}

(() => {
  try {
    run()
    // process.exit(0)
  } catch (e) {
    console.error(e.stack);
    process.exit(1);
  }
})();

Output:

$ node run.js

[stdout] ['./script.py']

Pass arguments to a Python script from Node.js using child_process.spawn

The next step of integration is being able to pass data from the Node.js/JavaScript code to the Python script.

In order to do this, we’ll just pass more shell arguments using the arguments array (the second parameter to spawn).

const { spawn } = require('child_process')

const logOutput = (name) => (data) => console.log(`[${name}] ${data.toString()}`)

function run() {
  const process = spawn('python', ['./script.py', 'my', 'args']);

  process.stdout.on(
    'data',
    logOutput('stdout')
  );

  process.stderr.on(
    'data',
    logOutput('stderr')
  );
}

(() => {
  try {
    run()
    // process.exit(0)
  } catch (e) {
    console.error(e.stack);
    process.exit(1);
  }
})();

Our script.py will also just log out argv, except for the first element (which is the path to the script).

import sys

print(sys.argv[1:])

Here’s the output:

$ node run.js

[stdout] ['my', 'args']

Read child_process.spawn output from Node.js

It’s nice to be able to pass data down to the Python script, but we’re still not able to get the data from the Python script back in a format that we’re able to leverage in our Node.js/JavaScript application.

The solution to this is to wrap the whole spawn-calling function in a Promise. This allows us to decide when we want to resolve or reject.

To keep track of the Python script’s output stream(s), we manually buffer the output using arrays (one for stdout and another for stderr).

We also add a listener for 'exit' using spawn().on('exit', (code, signal) => { /* probably call resolve() */ }). This is where we’ll tend to resolve/reject the Promise with value(s) from the Python/Ruby/other script.

const { spawn } = require('child_process')

const logOutput = (name) => (data) => console.log(`[${name}] ${data}`)

function run() {
  return new Promise((resolve, reject) => {
    const process = spawn('python', ['./script.py', 'my', 'args']);

    const out = []
    process.stdout.on(
      'data',
      (data) => {
        out.push(data.toString());
        logOutput('stdout')(data);
      }
    );


    const err = []
    process.stderr.on(
      'data',
      (data) => {
        err.push(data.toString());
        logOutput('stderr')(data);
      }
    );

    process.on('exit', (code, signal) => {
      logOutput('exit')(`${code} (${signal})`)
      resolve(out);
    });
  });
}

(async () => {
  try {
    const output = await run()
    logOutput('main')(output)
    process.exit(0)
  } catch (e) {
    console.error(e.stack);
    process.exit(1);
  }
})();

Output:

$ node run.js

[stdout] ['my', 'args']
[main] ['my', 'args']

Handle errors from child_process.spawn

Next up we need to handle errors from the Python/Ruby/shell script at the Node.js/JavaScript level.

The main way a *NIX executable signals that it errored is by using a non-zero exit code (conventionally 1). That’s why the .on('exit') handler now checks code === 0 before deciding whether to resolve or reject with value(s).

const { spawn } = require('child_process')

const logOutput = (name) => (data) => console.log(`[${name}] ${data}`)

function run() {
  return new Promise((resolve, reject) => {
    const process = spawn('python', ['./script.py', 'my', 'args']);

    const out = []
    process.stdout.on(
      'data',
      (data) => {
        out.push(data.toString());
        logOutput('stdout')(data);
      }
    );


    const err = []
    process.stderr.on(
      'data',
      (data) => {
        err.push(data.toString());
        logOutput('stderr')(data);
      }
    );

    process.on('exit', (code, signal) => {
      logOutput('exit')(`${code} (${signal})`)
      if (code === 0) {
        resolve(out);
      } else {
        reject(new Error(err.join('\n')))
      }
    });
  });
}

(async () => {
  try {
    const output = await run()
    logOutput('main')(output)
    process.exit(0)
  } catch (e) {
    console.error('Error during script execution ', e.stack);
    process.exit(1);
  }
})();

Output (with script.py modified to contain a deliberate typo, sy instead of sys):

$ node run.js

[stderr] Traceback (most recent call last):
    File "./script.py", line 3, in <module>
    print(sy.argv)[1:]
NameError: name 'sy' is not defined

Error during script execution  Error: Traceback (most recent call last):
    File "./script.py", line 3, in <module>
    print(sy.argv)[1:]
NameError: name 'sy' is not defined

    at ChildProcess.process.on (/app/run.js:33:16)
    at ChildProcess.emit (events.js:182:13)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:240:12)

Pass structured data from Python/Ruby to Node.js/JavaScript with child_process.spawn

The final step to full integration between Ruby/Python/PHP/shell scripts and our Node.js/JavaScript application layer is to be able to pass structured data back from the script up to Node.js/JavaScript.

The simplest structured data format that tends to be available in both Python/Ruby/PHP and Node.js/JavaScript is JSON.

In the Python script, we print the json.dumps() output of a dictionary, see script.py:

import sys
import json

send_message_back = {
  'arguments': sys.argv[1:],
  'message': """Hello,
This is my message.

To the world"""
}


print(json.dumps(send_message_back))

In Node, we add some JSON-parsing logic (using JSON.parse) in the 'exit' handler.

A gotcha at this point: if JSON.parse() fails (due to badly-formed JSON, for example), we need to propagate that error up, hence the try/catch where the catch clause reject-s the potential error.

const { spawn } = require('child_process')

const logOutput = (name) => (message) => console.log(`[${name}] ${message}`)

function run() {
  return new Promise((resolve, reject) => {
    const process = spawn('python', ['./script.py', 'my', 'args']);

    const out = []
    process.stdout.on(
      'data',
      (data) => {
        out.push(data.toString());
        logOutput('stdout')(data);
      }
    );


    const err = []
    process.stderr.on(
      'data',
      (data) => {
        err.push(data.toString());
        logOutput('stderr')(data);
      }
    );

    process.on('exit', (code, signal) => {
      logOutput('exit')(`${code} (${signal})`)
      if (code !== 0) {
        reject(new Error(err.join('\n')))
        return
      }
      try {
        // join the chunks in case the JSON spans multiple 'data' events
        resolve(JSON.parse(out.join('')));
      } catch(e) {
        reject(e);
      }
    });
  });
}

(async () => {
  try {
    const output = await run()
    logOutput('main')(output.message)
    process.exit(0)
  } catch (e) {
    console.error('Error during script execution ', e.stack);
    process.exit(1);
  }
})();

Output:

$ node run.js

[stdout] {"message": "Hello,\nThis is my message.\n\nTo the world", "arguments": ["my", "args"]}

[main] Hello,
This is my message.

To the world

I’ve got mentoring spots open at https://mentorcruise.com/mentor/HugoDiFrancesco/, so get in touch if you want Node.js/JavaScript/career mentoring, or feel free to tweet at me @hugo__df.

Photo by Elaine Casap on Unsplash.

Author

Hugo Di Francesco

Co-author of "Professional JavaScript", "Front-End Development Projects with Vue.js" with Packt, "The Jest Handbook" (self-published). Hugo runs the Code with Hugo website helping over 100,000 developers every month and holds an MEng in Mathematical Computation from University College London (UCL). He has used JavaScript extensively to create scalable and performant platforms at companies such as Canon, Elsevier and (currently) Eurostar.
