XML pipeline

From Wikipedia, the free encyclopedia

In software, an XML pipeline is formed when XML (Extensible Markup Language) processes, especially XML transformations and XML validations, are connected.

For instance, given two transformations T1 and T2, the two can be connected so that an input XML document is transformed by T1 and the output of T1 is then fed as the input document to T2. Simple pipelines like this one are called linear: a single input document always goes through the same sequence of transformations to produce a single output document.
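
As a minimal sketch, assuming that each transformation can be modeled as a Python function that takes and returns an `xml.etree.ElementTree` element, a linear pipeline is simply function composition: the output of each step becomes the input of the next. The transformations `t1` and `t2` and the helper `linear_pipeline` are hypothetical illustrations, not part of any pipeline language.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable

# Assumption for this sketch: a transformation is a function from tree to tree.
Transform = Callable[[ET.Element], ET.Element]


def t1(root: ET.Element) -> ET.Element:
    """Hypothetical T1: upper-case the text of every <name> element (in place)."""
    for name in root.iter("name"):
        name.text = (name.text or "").upper()
    return root


def t2(root: ET.Element) -> ET.Element:
    """Hypothetical T2: wrap the whole input document in a <result> element."""
    wrapper = ET.Element("result")
    wrapper.append(root)
    return wrapper


def linear_pipeline(*steps: Transform) -> Transform:
    """Connect steps so that each step's output becomes the next step's input."""
    return lambda doc: reduce(lambda acc, step: step(acc), steps, doc)


pipeline = linear_pipeline(t1, t2)
out = pipeline(ET.fromstring("<person><name>ada</name></person>"))
print(ET.tostring(out, encoding="unicode"))
# prints <result><person><name>ADA</name></person></result>
```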

Linear operations

Linear operations can be divided into at least two categories:

Micro-operations

Micro-operations operate at the inner document level:

  • Rename - renames elements or attributes without modifying the content
  • Replace - replaces elements or attributes
  • Insert - adds a new data element to the output stream at a specified point
  • Delete - removes an element or attribute (also known as pruning the input tree)
  • Wrap - wraps elements with additional elements
  • Reorder - changes the order of elements
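
The micro-operations above can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustration under simplifying assumptions: the helper names and signatures are hypothetical, elements are matched by tag name only (no namespaces or XPath), attribute-level renaming and deletion are not shown, and the sample input has no text between elements, so `tail` text is not handled.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Each hypothetical helper mutates the tree in place and returns it, so calls can be chained.


def rename(root: ET.Element, old: str, new: str) -> ET.Element:
    """Rename: change the tag of matching elements without touching their content."""
    for el in list(root.iter(old)):
        el.tag = new
    return root


def replace(root: ET.Element, tag: str, make_new: Callable[[ET.Element], ET.Element]) -> ET.Element:
    """Replace: substitute each matching child element with a newly built element."""
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == tag:
                parent[i] = make_new(child)
    return root


def insert(parent: ET.Element, index: int, new: ET.Element) -> ET.Element:
    """Insert: add a new element at a specified position under a parent."""
    parent.insert(index, new)
    return parent


def delete(root: ET.Element, tag: str) -> ET.Element:
    """Delete: prune every element with the given tag from the tree."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return root


def wrap(root: ET.Element, tag: str, wrapper_tag: str) -> ET.Element:
    """Wrap: enclose each matching child element in a new wrapper element."""
    for parent in list(root.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == tag:
                wrapper = ET.Element(wrapper_tag)
                parent[i] = wrapper  # the wrapper takes the child's position
                wrapper.append(child)
    return root


def reorder(root: ET.Element, key: Callable[[ET.Element], str]) -> ET.Element:
    """Reorder: sort the root's child elements by a key function."""
    root[:] = sorted(root, key=key)
    return root


doc = ET.fromstring("<list><item id='b'>2</item><tmp/><item id='a'>1</item></list>")
delete(doc, "tmp")
reorder(doc, key=lambda el: el.get("id", ""))
rename(doc, "item", "entry")
insert(doc, 0, ET.Element("header"))
replace(doc, "header", lambda old: ET.fromstring("<title>Items</title>"))
wrap(doc, "entry", "li")
print(ET.tostring(doc, encoding="unicode"))
# prints <list><title>Items</title><li><entry id="a">1</entry></li><li><entry id="b">2</entry></li></list>
```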

Document operations

Document operations take the input document as a whole:

  • Identity transform - makes a verbatim copy of its input to the output
  • Compare - takes two documents and compares them
  • Transform - executes a transformation on the input document using a specified XSLT stylesheet; the XSLT version (1.0 or 2.0) should be specified
  • Split - takes a single XML document and splits it into distinct documents
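
Three of the document operations can be sketched with `xml.etree.ElementTree` (Python 3.8 or later, for `ET.canonicalize`). Treating two documents as equal when their canonical (C14N 2.0) serializations match is one possible definition of "compare", assumed here rather than taken from any particular product. The Transform operation is omitted because the Python standard library includes no XSLT processor; the third-party lxml library offers XSLT 1.0 through `lxml.etree.XSLT`.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List


def identity(doc: ET.Element) -> ET.Element:
    """Identity transform: a verbatim (deep) copy of the input document."""
    return copy.deepcopy(doc)


def compare(a: ET.Element, b: ET.Element) -> bool:
    """Compare: equality after C14N 2.0 canonicalization (attribute order is ignored)."""
    def canon(doc: ET.Element) -> str:
        return ET.canonicalize(ET.tostring(doc, encoding="unicode"))
    return canon(a) == canon(b)


def split(doc: ET.Element, tag: str) -> List[ET.Element]:
    """Split: turn each element with the given tag into a standalone document."""
    parts = []
    for el in doc.iter(tag):
        part = copy.deepcopy(el)
        part.tail = None  # trailing text belonged to the old parent, not the new document
        parts.append(part)
    return parts


orders = ET.fromstring('<orders><order id="1" status="new"/><order id="2" status="paid"/></orders>')
same = ET.fromstring('<orders><order status="new" id="1"/><order status="paid" id="2"/></orders>')

print(compare(identity(orders), orders))  # True
print(compare(orders, same))              # True: only the attribute order differs
for part in split(orders, "order"):
    print(ET.tostring(part, encoding="unicode"))
# <order id="1" status="new" />
# <order id="2" status="paid" />
```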

Sequence operations

Sequence operations were introduced mainly in XProc and handle a sequence of documents as a whole:

  • Count - takes a sequence of documents and counts them
  • Identity transform - makes a verbatim copy of its input sequence of documents to the output
  • split-sequence - takes a sequence of documents as input and routes them to different outputs depending on matching rules
  • wrap-sequence - takes a sequence of documents as input and wraps them into one or more documents
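
Sequence operations can be sketched by treating a sequence as a plain Python list of documents. The function names loosely mirror the XProc steps `p:count`, `p:identity`, `p:split-sequence` and `p:wrap-sequence`, but the signatures, the `<count>` result element and the use of a Python predicate in place of an XPath test are assumptions made for brevity.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

# Assumption for this sketch: a "sequence" is a Python list of root elements.
Docs = List[ET.Element]


def count(docs: Docs) -> ET.Element:
    """Count: a small document holding the number of documents in the sequence."""
    result = ET.Element("count")  # hypothetical result vocabulary
    result.text = str(len(docs))
    return result


def identity_sequence(docs: Docs) -> Docs:
    """Identity: a verbatim copy of every document, in the same order."""
    return [copy.deepcopy(d) for d in docs]


def split_sequence(docs: Docs, test: Callable[[ET.Element], bool]) -> Tuple[Docs, Docs]:
    """split-sequence: route each document to a 'matched' or a 'not matched' output."""
    matched, not_matched = [], []
    for d in docs:
        (matched if test(d) else not_matched).append(d)
    return matched, not_matched


def wrap_sequence(docs: Docs, wrapper_tag: str) -> ET.Element:
    """wrap-sequence: wrap the whole sequence into a single document."""
    wrapper = ET.Element(wrapper_tag)
    wrapper.extend([copy.deepcopy(d) for d in docs])
    return wrapper


docs = [ET.fromstring(f'<order total="{t}"/>') for t in (5, 50, 12)]
big, small = split_sequence(docs, lambda d: int(d.get("total", "0")) >= 10)
print(ET.tostring(count(docs), encoding="unicode"))  # <count>3</count>
print(ET.tostring(wrap_sequence(big, "batch"), encoding="unicode"))
# <batch><order total="50" /><order total="12" /></batch>
```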

Non-linear

Non-linear operations on pipelines may include:

  • Conditionals - where a given transformation is executed if a condition is met, while another transformation is executed otherwise
  • Loops - where a transformation is executed on each node of a node set selected from a document, or a transformation is executed until a condition evaluates to false
  • Tees - where a document is fed to multiple transformations, potentially running in parallel
  • Aggregations - where multiple documents are aggregated into a single document
  • Exception handling - where failures in processing can result in an alternate pipeline being processed
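
The non-linear constructs can be sketched as higher-order functions that combine steps. In this hypothetical Python illustration, a step is a function from an element to an element; `for_each` selects nodes with an ElementTree path expression, `tee` runs its branches in parallel threads on independent copies, and `try_catch` falls back to an alternate step when the first one raises an exception. None of these names come from a specific pipeline language.

```python
import copy
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Assumptions for this sketch: steps map a tree to a tree; predicates test a tree.
Step = Callable[[ET.Element], ET.Element]
Predicate = Callable[[ET.Element], bool]


def choose(test: Predicate, when_true: Step, otherwise: Step) -> Step:
    """Conditional: run one step if the test holds and another step otherwise."""
    return lambda doc: when_true(doc) if test(doc) else otherwise(doc)


def for_each(path: str, body: Step) -> Callable[[ET.Element], List[ET.Element]]:
    """Loop over a node set: run the body on a copy of each element matched by `path`."""
    def run(doc: ET.Element) -> List[ET.Element]:
        results = []
        for el in doc.findall(path):
            item = copy.deepcopy(el)
            item.tail = None
            results.append(body(item))
        return results
    return run


def repeat_while(test: Predicate, body: Step) -> Step:
    """Loop on a condition: keep applying the body until the test evaluates to false."""
    def run(doc: ET.Element) -> ET.Element:
        while test(doc):
            doc = body(doc)
        return doc
    return run


def tee(*steps: Step) -> Callable[[ET.Element], List[ET.Element]]:
    """Tee: feed an independent copy of the document to every step, in parallel threads."""
    def run(doc: ET.Element) -> List[ET.Element]:
        copies = [copy.deepcopy(doc) for _ in steps]
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda step, d: step(d), steps, copies))
    return run


def aggregate(docs: List[ET.Element], root_tag: str) -> ET.Element:
    """Aggregation: combine several documents under one new root element."""
    root = ET.Element(root_tag)
    root.extend([copy.deepcopy(d) for d in docs])
    return root


def try_catch(step: Step, fallback: Step) -> Step:
    """Exception handling: if the step fails, run an alternate step instead."""
    def run(doc: ET.Element) -> ET.Element:
        try:
            return step(copy.deepcopy(doc))  # a failed attempt cannot corrupt `doc`
        except Exception:
            return fallback(doc)
    return run


def mark(status: str) -> Step:
    def step(el: ET.Element) -> ET.Element:
        el.set("status", status)
        return el
    return step


def fail(_doc: ET.Element) -> ET.Element:
    raise ValueError("simulated failure")


doc = ET.fromstring('<stock><item qty="0">bolt</item><item qty="7">nut</item></stock>')
check = choose(lambda el: int(el.get("qty", "0")) > 0, mark("in-stock"), mark("sold-out"))
report = aggregate(for_each("item", check)(doc), "report")
print(ET.tostring(report, encoding="unicode"))
# <report><item qty="0" status="sold-out">bolt</item><item qty="7" status="in-stock">nut</item></report>

original, marked = tee(lambda d: d, mark("checked"))(doc)
unwrapped = repeat_while(lambda d: d.tag == "wrap", lambda d: d[0])(ET.fromstring("<wrap><wrap><doc/></wrap></wrap>"))
recovered = try_catch(fail, lambda d: ET.Element("error"))(doc)
```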

Some standards also categorize transformations as macro (changes affecting an entire file) or micro (changes affecting only an element or attribute).

XML pipeline languages

XML pipeline languages are used to define pipelines. A program written in an XML pipeline language is run by software known as an XML pipeline engine, which creates the processes, connects them together, and finally executes the pipeline. Existing XML pipeline languages include:

Standards

  • W3C XML Pipeline Definition Language is specified in a W3C Note.[2]
  • W3C XML Pipeline Language (XPL) Version 1.0 (Draft)[3][4] is specified in a W3C Submission and is a component of Orbeon Presentation Server (OPS), now called Orbeon Forms. This specification provides an implementation of an earlier version of the language. XPL allows the declaration of complex pipelines with conditionals, loops, tees, aggregations, and sub-pipelines. XProc is roughly a superset of XPL.[5]

Product-specific

  • Cocoon sitemaps allow, among other functionality, the declaration of XML pipelines. Cocoon sitemaps are one of the earliest implementations of the concept of XML pipeline.
  • smallx XML Pipelines are used by the smallx project.
  • ServingXML defines a vocabulary for expressing flat-XML, XML-flat, flat-flat, and XML-XML transformations in pipelines.
  • PolarLake Circuit Markup Language is used by PolarLake's runtime to define XML pipelines. Circuits are collections of paths through which fragments of an XML stream flow (usually as SAX or DOM events). Components are placed on paths to interact with the stream (and/or the outside world) in a low-latency process.
  • xmlsh is a scripting language based on the Unix shells that natively supports XML and text pipelines.[1]
  • Stylus Studio XML Pipeline is a visual grammar which defines the following operations: Input, Output, XQuery, XSLT, Validate, XSL-FO to PDF, Convert To XML, Convert From XML, Choose, Warning, Stop.

Pipe granularity

Different XML pipeline implementations support different granularities of flow.

  • Document: whole documents flow through the pipe as atomic units. A document can only be in one place at a time, although multiple documents may usually be in the pipe at once.
  • Event: events for element and text nodes may flow through different paths, so a single document may be flowing through many components at the same time.
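
The two granularities can be contrasted with the Python standard library. At document granularity, the whole input is parsed with `ET.fromstring` and handed on as one unit; at event granularity, `ET.XMLPullParser` reports start and end events as chunks of input are fed to it, so several components can act on the same document before it has fully arrived. The chunked input and the two components are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

# Hypothetical input arriving in three chunks (for example, from a network stream).
CHUNKS = [
    "<feed><entry><title>A</title></entry>",
    "<entry><title>B</title></entry>",
    "</feed>",
]

# A component receives (chunk index, event name, element) for every parse event.
Component = Callable[[int, str, ET.Element], None]


def document_granularity(chunks: List[str]) -> List[str]:
    """Document level: wait for the whole document, then process it as one unit."""
    doc = ET.fromstring("".join(chunks))
    return [title.text or "" for title in doc.iter("title")]


def event_granularity(chunks: List[str], components: List[Component]) -> None:
    """Event level: pass each parse event to every component as soon as it is available."""
    parser = ET.XMLPullParser(events=("start", "end"))
    for i, chunk in enumerate(chunks):
        parser.feed(chunk)
        for event, elem in parser.read_events():
            for component in components:
                component(i, event, elem)
    parser.close()


titles: List[Tuple[int, str]] = []
started: List[str] = []


def collect_titles(i: int, event: str, elem: ET.Element) -> None:
    if event == "end" and elem.tag == "title":
        titles.append((i, elem.text or ""))


def log_starts(i: int, event: str, elem: ET.Element) -> None:
    if event == "start":
        started.append(elem.tag)


event_granularity(CHUNKS, [collect_titles, log_starts])
print(document_granularity(CHUNKS))  # ['A', 'B']
print(titles)   # [(0, 'A'), (1, 'B')]
print(started)  # ['feed', 'entry', 'title', 'entry', 'title']
```

In the event-based version, the first title is handled while only the first chunk has been read, and both components see every event of the same document.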

Standardization

Until May 2010, there was no widely used standard for XML pipeline languages. In May 2010, XProc became a W3C Recommendation,[6] and widespread adoption was expected to follow.

History

See also

References

  1. "XProc: An XML Pipeline Language". W3.org. Retrieved 2013-06-14.
  2. "W3C XML Pipeline Definition Language".
  3. "XML Pipeline Language (XPL) Version 1.0 (Draft)". W3.org. Retrieved 2013-06-14.
  4. "XML Pipeline Definition Language Version 1.0". W3.org. 2002-02-28. Retrieved 2013-06-14.
  5. "XML pipelines: XPL and XProc". Orbeon. 2007-05-22. Retrieved 2012-03-14.
  6. "XProc: An XML Pipeline Language". W3.org. Retrieved 2013-06-14.
  7. "Early Unix history and evolution". www.bell-labs.com. Archived from the original on 2015-04-08. Retrieved 2013-06-14.
  8. "FAQ". Xpipe.sourceforge.net. 2001-12-09. Retrieved 2013-06-14.