Data Pre-Processing Tasks for Knowledge Base Creation

This project collects the pre-processing tasks that must be run on input file(s) before they can be used for the respective Knowledge Base Creation (KBC).

Technologies Used

These are all Scala-based scripts / programs, each implementing an individual pre-processing task, and all are built using sbt.

Dependencies

  • cats - for typeclasses & data types
  • monix - for observables, non-blocking Task and parallel processing; in other words, for all side effects
  • pureconfig - for typed configuration (if and when required)
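
As a rough illustration of the role monix plays here, the following is a minimal sketch of processing lines in parallel with an Observable and Task. The object name, the `normalise` step and the `parallelism` parameter are hypothetical and not taken from this repository; the sketch assumes monix 3.x on the classpath.

```scala
import monix.eval.Task
import monix.reactive.Observable

object ParallelLinesSketch {
  // Hypothetical per-line step; stands in for a real pre-processing task.
  def normalise(line: String): Task[String] =
    Task.eval(line.trim.toLowerCase)

  // Drops blank lines, then runs up to `parallelism` tasks at once
  // while keeping the results in input order.
  def process(lines: List[String], parallelism: Int): Task[List[String]] =
    Observable
      .fromIterable(lines)
      .filter(_.trim.nonEmpty)
      .mapParallelOrdered(parallelism)(normalise)
      .toListL
}
```

Nothing runs until the resulting Task is executed with a Scheduler in scope, for example by importing `monix.execution.Scheduler.Implicits.global` and calling `runSyncUnsafe()` at the edge of the program.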

How to run

  • make the relevant changes to application.conf for the respective module (such as associatekbc or domain)
  • run the sbt run command; sbt will ask you to select the App you want to run
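
For orientation, here is a minimal sketch of how a module's settings could be read into a typed value with pureconfig. The `DomainConfig` case class and its keys (`input-file`, `parallelism`) are hypothetical; the real keys are whatever each module's application.conf defines. The sketch assumes Scala 2 with pureconfig 0.12 or later, and reads from a string so that it stays self-contained.

```scala
import pureconfig.ConfigSource
import pureconfig.error.ConfigReaderFailures
import pureconfig.generic.auto._

object ConfigSketch {
  // Hypothetical settings for the "domain" module.
  // Field names map to kebab-case keys: inputFile <-> input-file.
  final case class DomainConfig(inputFile: String, parallelism: Int)

  // Parses HOCON text and reads the "domain" section into DomainConfig.
  // Returns Left with the collected failures if a key is missing or mistyped.
  def load(hocon: String): Either[ConfigReaderFailures, DomainConfig] =
    ConfigSource.string(hocon).at("domain").load[DomainConfig]
}
```

In an actual module, `ConfigSource.default` would take the place of `ConfigSource.string(...)` so that application.conf is picked up from the classpath.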

TODO

  • replace the current multiple main classes with multiple sbt projects
  • better way to do parallel & non-blocking IO for huge files without non-daemonic threads
  • TODOS from Domain
    • refactor regex(s) and keep them in one place
    • introduce free monads for actions and make the current text-parsing implementation one interpreter among others, thereby making the whole parsing action extensible to any kind of input data
    • once a free monad structure is introduced for domain objects, create new interpreters with Akka Streams or FS2 as effects to see if they help improve performance
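
The free monad idea in the Domain TODOs could look roughly like the sketch below, which uses `cats.free.Free`. Everything named here (`ParseOp`, `ReadLine`, `Emit`, `copyNonBlank`, `inMemory`) is hypothetical and only illustrates the shape: the program describes parsing steps as data, and an interpreter decides how they run. The interpreter shown targets `Id` and in-memory collections; one backed by Akka Streams or FS2 would be a different natural transformation over the same program.

```scala
import cats.{Id, ~>}
import cats.free.Free
import scala.collection.mutable

object FreeParsingSketch {
  // Hypothetical algebra: the operations a text-parsing task needs.
  sealed trait ParseOp[A]
  final case class ReadLine(index: Int) extends ParseOp[Option[String]]
  final case class Emit(record: String) extends ParseOp[Unit]

  type Parse[A] = Free[ParseOp, A]

  def readLine(index: Int): Parse[Option[String]] =
    Free.liftF[ParseOp, Option[String]](ReadLine(index))

  def emit(record: String): Parse[Unit] =
    Free.liftF[ParseOp, Unit](Emit(record))

  // Program: walk the input, emit each non-blank line trimmed,
  // and return how many records were emitted.
  def copyNonBlank(index: Int = 0, emitted: Int = 0): Parse[Int] =
    readLine(index).flatMap {
      case None =>
        Free.pure[ParseOp, Int](emitted)
      case Some(line) if line.trim.isEmpty =>
        copyNonBlank(index + 1, emitted)
      case Some(line) =>
        emit(line.trim).flatMap(_ => copyNonBlank(index + 1, emitted + 1))
    }

  // One possible interpreter: reads from a Vector, writes to a mutable buffer.
  def inMemory(lines: Vector[String], out: mutable.Buffer[String]): ParseOp ~> Id =
    new (ParseOp ~> Id) {
      def apply[A](op: ParseOp[A]): Id[A] = op match {
        case ReadLine(i) => lines.lift(i)
        case Emit(record) =>
          out += record
          ()
      }
    }
}
```

Running the program is a matter of calling `FreeParsingSketch.copyNonBlank().foldMap(interpreter)`; swapping the interpreter changes the effect without touching the parsing logic.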

there will be bugs